Migrating to microservices isn’t always needed: first, you must fix your monolith.
The tech industry, and software engineers in particular, are wired for the new and the flashy. That’s also why dozens of new JavaScript frameworks get created every year, most of which end up burning into oblivion as they re-enter the Earth’s atmosphere (well, except for jQuery, which somehow keeps surviving).
The same can be said of architecture: almost every time I hear or read someone mention monolithic architecture, it comes with a negative connotation, because it feels old and outdated.
But the fact that monolithic architectures are older doesn’t mean they’re bad, or inferior to service-oriented or serverless architectures.
This reputation comes from the fact that most monolithic architectures end up turning into giant spaghetti monsters of code that are difficult to build, deploy, and evolve. I agree with that; it has also been my experience.
What I disagree with is how quickly many engineers are ready to discard monolithic architectures entirely and switch to something else, say microservices, which they understand even less, instead of trying to fix what’s already there. There’s a lot of hype behind such pushes, because they involve using cool new frameworks, new programming languages, and so on.
In addition, engineers often don’t understand that choosing an architecture must never depend only on technical considerations. In fact, the priorities should be the other way around: when picking an architecture, most of the weight should go to what will best serve the business needs and growth of the company over time, with technical considerations treated as a second-order concern.
And before discarding an entire architecture or codebase, one should always try to fix what’s already there.
In this article, I want to share techniques I’ve seen deployed in production and which have helped make monolithic codebases easier to work with. I’ve seen first-hand that they were successful in improving build times and consequently deployment times.
The six patterns for optimizing your monolithic architecture are:
- Remove build bottlenecks
- Extract frequently updated code areas into their own modules
- Clean up unneeded dependencies
- Clean up unused code
- Enable server-side caching in your CI/CD
- Use subviews to mock dependencies
In the rest of this article, I’ll share more details about these techniques and how to apply them.
I’ll use the term “microservices” from the perspective of a backend system, but this article also applies to frontend applications and to microfrontend architectures. I’m also going to stay deliberately generic: I want to focus on the big-picture ideas rather than on a particular language or toolchain.
The premise
Imagine you’re in the following situation:
- Monolithic architecture developed over 10 years.
- Over 500k lines of code.
- 100+ developers.
- Build times are very long, which hurts developer velocity (developers end up sitting idle while waiting for builds).
- Deployments to production have become difficult, despite a good CI/CD setup.
On top of that, there are also organizational problems:
- Many engineers in the org say it’s a pain to work with this setup, and that things would be so much better if only management could see that microservices were the true solution and funded teams to carry out the migration. They say this even though many of them have very limited experience with microservices at scale.
- This is also hurting the overall mood of the rest of the organization, and some engineers have started leaving the company, citing the old architecture as one of the reasons for their resignation.
What would you do? Do you cave in and spend the next 3-5 years migrating everything to microservices, or can you think of another option?
Migrating to microservices isn’t always possible
Another thing to consider from the start is that migrating to a microservices or microfrontend architecture isn’t always possible. For example, in the case of native mobile applications, the app needs to be built into a single binary so it can be shipped to users’ devices.
Mobile applications use monolithic architectures, and there’s no getting around that. Sure, one can split the application into strictly separated modules and have different business units or teams own and build their modules, which are then assembled at the final build stage. But I would argue that each of these modules ends up with the same issues as a single monolith anyway.
So sometimes, as is the case for mobile applications, migrating the codebase away from a monolithic architecture isn’t possible at all, and knowing how to optimize a monolith becomes even more critical.
The six patterns to optimize a monolith architecture
Before taking on the massive endeavor of migrating away from a monolithic architecture, especially at the scale I described above, one should spend serious time fixing and cleaning up the current codebase.
Monoliths by themselves are not bad: problems occur only when dependency management is poorly done. The first step to cleaning up a monolith is to generate a dependency graph, along with a map of the build process. These two look like the diagrams below.
Then, using the dependency graph and the build map, you want to profile the build process and figure out which modules or dependencies cause the most problems.
Below are the six patterns I’ve seen work best. Note that I’m assuming the monolith is written in a statically typed, compiled language that requires a build step; some of the problems below don’t apply to dynamic languages.
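As a minimal sketch of that profiling step, assuming the dependency graph has been exported as a Python dict mapping each module to the modules it depends on (the module names below are hypothetical), the snippet counts how many modules depend on each module, directly or transitively. The higher the count, the more rebuilds a change to that module triggers, which makes it a good first candidate to look at.

```python
from collections import defaultdict, deque


def transitive_dependents(deps: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each module, return every module that depends on it directly or indirectly."""
    # Invert the graph: reverse[b] holds the modules that list b as a dependency.
    reverse: dict[str, set[str]] = defaultdict(set)
    for module, requirements in deps.items():
        for req in requirements:
            reverse[req].add(module)

    all_modules = set(deps) | {r for reqs in deps.values() for r in reqs}
    result: dict[str, set[str]] = {}
    for module in all_modules:
        # Breadth-first walk up the reversed edges collects all dependents.
        seen: set[str] = set()
        queue = deque(reverse[module])
        while queue:
            dependent = queue.popleft()
            if dependent not in seen:
                seen.add(dependent)
                queue.extend(reverse[dependent])
        result[module] = seen
    return result


def rank_by_blast_radius(deps: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Sort modules by number of transitive dependents, largest first."""
    dependents = transitive_dependents(deps)
    return sorted(((m, len(d)) for m, d in dependents.items()), key=lambda x: (-x[1], x[0]))


if __name__ == "__main__":
    # Hypothetical graph: each key depends on the modules in its set.
    graph = {
        "app": {"payments", "search", "reviews"},
        "payments": {"core"},
        "search": {"core"},
        "reviews": {"core", "search"},
        "core": set(),
    }
    for module, count in rank_by_blast_radius(graph):
        print(f"{module}: {count} dependents")
```

In this toy graph, `core` comes out on top with four dependents, so any change to it rebuilds the whole application. Combining this count with per-module build times from the build map gives a rough sense of where the rebuild cost concentrates.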
1. Remove build bottlenecks
A common case is when a single module is a dependency of too many other modules: the build process stalls while it waits for that one module to compile before it can proceed, when the work could be parallelized better.
In this case, fixing the dependency chain by removing unneeded dependencies lets more modules build in parallel and makes the build faster.
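One way to see such a bottleneck is to compute the critical path of the build: the longest chain of dependent modules, weighted by build time. No amount of parallel hardware can make the build finish faster than that chain. The sketch below assumes per-module build durations are available from the build map; the graph and the timings in seconds are hypothetical, and the graph must be acyclic.

```python
from functools import lru_cache


def critical_path(deps: dict[str, set[str]], build_secs: dict[str, float]) -> tuple[float, list[str]]:
    """Return the longest build-time chain through the dependency graph and the modules on it.

    With unlimited parallelism, this is a lower bound on wall-clock build time.
    """

    @lru_cache(maxsize=None)
    def finish(module: str) -> tuple[float, tuple[str, ...]]:
        # A module can only start once all of its dependencies have finished building.
        best_time, best_chain = 0.0, ()
        for dep in deps.get(module, ()):
            t, chain = finish(dep)
            if t > best_time:
                best_time, best_chain = t, chain
        return best_time + build_secs[module], best_chain + (module,)

    total, chain = max((finish(m) for m in deps), key=lambda x: x[0])
    return total, list(chain)


if __name__ == "__main__":
    graph = {"app": {"payments", "search"}, "payments": {"core"}, "search": {"core"}, "core": set()}
    secs = {"core": 120, "payments": 30, "search": 45, "app": 20}
    print(critical_path(graph, secs))
```

Here the modules take 215 seconds to build in total, but the critical path `core → search → app` takes 185 seconds no matter how many workers are available, and `core` alone accounts for most of it. That is the signal to untangle what depends on `core` or to split it up.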
2. Extract frequently updated code areas into their own modules
Another common pattern is a giant module with hundreds or thousands of lines of code in which only a few lines keep changing. Each of those small changes triggers a recompilation of the entire module, increasing build time unnecessarily.
You can detect this with a quick script that reads the git history for that module and counts how many commits touched a particular set of lines over the last 6-12 months.
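A minimal sketch of such a script, working at file granularity rather than on individual line ranges (for a specific range, `git log -L <start>,<end>:<file>` can trace its history instead). It runs `git log --name-only` with an empty pretty format, so the output is just the files each commit touched, and counts how often each file appears. The module path in the usage note is hypothetical.

```python
import subprocess
from collections import Counter


def parse_name_only_log(log_output: str) -> Counter:
    """Count commits per file from `git log --name-only --pretty=format:` output.

    Each commit lists a file at most once, so the count per file equals the
    number of commits that touched it. Blank separator lines are skipped.
    """
    return Counter(line.strip() for line in log_output.splitlines() if line.strip())


def churn_by_file(module_path: str, since: str = "12 months ago") -> Counter:
    """Run git log for one module directory and return per-file commit counts."""
    result = subprocess.run(
        ["git", "log", f"--since={since}", "--name-only", "--pretty=format:", "--", module_path],
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. when not run inside a repository
    )
    return parse_name_only_log(result.stdout)
```

Run from the repository root, `churn_by_file("src/billing").most_common(10)` would list the ten most frequently changed files under that directory. Files with high churn inside a large, slow-to-compile module are the candidates for extraction.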
The fix for this pattern is better modularization: extract the frequently changing lines of code into their own unit of functionality, in a separate, smaller file or module. For the majority of incoming code changes, the compiler then only needs to rebuild this smaller module, not the original one, which is bigger and takes longer to build.
3. Clean up unneeded dependencies
When a module, say A, depends on another module, say B, any change to B triggers a recompilation of A. Repeat that a dozen times across a codebase, and you can end up waiting literally minutes for modules to rebuild because of changes in their dependencies.
Now, this recompilation is expected behavior, except when the dependencies are unneeded.
Imagine that a couple of years ago someone made a change that required a library, so that library was included. Later, that change was removed, but the include statement remained. Now every change to that library triggers rebuilds for no good reason, making everyone lose time without anyone realizing it.
The solution to this problem is simple: check for unused dependencies and systematically remove them. Many IDEs now flag unused dependencies, and some languages go further: Go, for example, refuses to compile a file that contains an unused import.
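To illustrate what such tooling checks, here is a minimal sketch for Python source using the standard `ast` module. It is only a heuristic: it flags imports whose bound names never appear as identifiers, so names used only in `__all__`, in string annotations, or imported purely for side effects would be false positives. In practice, a linter such as Pyflakes, or the build system's own dependency analysis in a compiled monolith, is the better tool.

```python
import ast


def unused_imports(source: str) -> list[tuple[int, str]]:
    """Return (line number, name) for each imported name never referenced in `source`."""
    tree = ast.parse(source)
    imported: dict[str, int] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import os.path` binds the top-level name `os`.
                name = alias.asname or alias.name.split(".")[0]
                imported[name] = node.lineno
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":  # star imports can't be checked this way
                    imported[alias.asname or alias.name] = node.lineno

    # Attribute chains such as `os.path.join` start with an ast.Name for `os`,
    # so collecting every Name node also covers attribute access.
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted((line, name) for name, line in imported.items() if name not in used)
```

For example, in a file that imports `os`, `sys`, `loads`, and `dumps` but only uses `sys` and `loads`, the function reports `os` and `dumps` along with the lines where they were imported.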
Tooling doesn’t fix everything, though: sometimes a method is called inside an if statement whose condition is always false in practice (for example, because it depends on a feature flag that is permanently off), so the method never executes, and that’s not something the compiler can detect. Humans are still your best bet for detecting such cases and cleaning them up.
4. Clean up unused code
You can consider this a corollary of the “unneeded dependencies” pattern. Some code was added for whatever reason (a feature that was built and then removed, an A/B experiment that was never cleaned up, and so on) and then left behind.
Regardless of the root cause, the outcome is the same: the compiler spends time building code that will never be used or executed. Some compilers can strip code they can prove will never execute, but they still have to compile it first; and in dynamic or interpreted languages, it’s often hard for the interpreter to know whether a piece of code will ever be needed.
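One systematic way to find candidates is a reachability check over a static call graph: anything that can't be reached from an entry point is probably dead. The sketch below assumes the call graph has already been extracted into a dict; the function names are hypothetical. It won't catch the always-false branch described earlier, because that call still appears in the graph, and it can misreport code invoked through reflection or dynamic dispatch, so treat its output as a list to review, not to delete blindly.

```python
def unreachable(call_graph: dict[str, set[str]], entry_points: set[str]) -> set[str]:
    """Return functions that cannot be reached from any entry point."""
    reachable: set[str] = set()
    stack = list(entry_points)
    # Depth-first traversal from every entry point.
    while stack:
        fn = stack.pop()
        if fn in reachable:
            continue
        reachable.add(fn)
        stack.extend(call_graph.get(fn, ()))
    all_functions = set(call_graph) | {c for callees in call_graph.values() for c in callees}
    return all_functions - reachable


if __name__ == "__main__":
    calls = {
        "main": {"checkout", "search"},
        "checkout": {"charge_card"},
        "search": set(),
        "charge_card": set(),
        # Left over from an A/B experiment that was never cleaned up.
        "legacy_banner_experiment": {"render_banner"},
        "render_banner": set(),
    }
    print(sorted(unreachable(calls, {"main"})))
```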
5. Enable server-side caching in your CI/CD
There are other ways to speed up build times, which don’t require refactoring the business logic.
One such way is to invest in build servers (i.e., basic CI/CD) and enable a shared, server-side build cache. With this in place, whether a developer builds the project locally or on a server, the build process first checks the shared cache for an artifact already built from the module’s current code version, and simply downloads it if one exists.
The tradeoff is between build time and network latency: if querying and transferring the cached artifact is faster than rebuilding it, it’s a win; otherwise, it’s better to rebuild it. That decision is often automated by the caching system itself, so you don’t have to worry about it.
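To make the mechanism concrete, here is a minimal sketch of a content-addressed build cache, assuming each module's source is available as a string and its dependencies are known. The key for a module hashes its own source together with the keys of its dependencies, so any upstream change produces a new key, while unrelated changes leave it untouched. A plain dict stands in for the remote cache store, and `should_fetch` expresses the latency tradeoff described above. Real build caches (Bazel's remote cache or Gradle's build cache, for instance) also hash compiler flags, toolchain versions, and other inputs, which this sketch omits.

```python
import hashlib
from typing import Callable


def cache_keys(sources: dict[str, str], deps: dict[str, list[str]]) -> dict[str, str]:
    """Compute a content-based key per module: its source hash plus its dependencies' keys."""
    keys: dict[str, str] = {}

    def key(module: str) -> str:
        if module not in keys:
            digest = hashlib.sha256(sources[module].encode())
            for dep in sorted(deps.get(module, [])):  # sorted for a stable key
                digest.update(key(dep).encode())
            keys[module] = digest.hexdigest()
        return keys[module]

    for module in sources:
        key(module)
    return keys


def should_fetch(artifact_bytes: int, bandwidth_bytes_per_s: float,
                 latency_s: float, rebuild_s: float) -> bool:
    """Download from the cache only if the transfer is faster than rebuilding locally."""
    return latency_s + artifact_bytes / bandwidth_bytes_per_s < rebuild_s


def get_artifact(module: str, keys: dict[str, str], cache: dict[str, bytes],
                 build: Callable[[str], bytes]) -> bytes:
    """Return the cached artifact on a hit; on a miss, build it and upload it for others."""
    key = keys[module]
    if key in cache:
        return cache[key]
    artifact = build(module)
    cache[key] = artifact
    return artifact
```

With this scheme, the first developer or CI job to build a given version of a module pays the build cost, and everyone else whose inputs hash to the same key downloads the result instead.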
6. Use subviews to mock dependencies
A common problem I see is when a module owned by one team depends on modules owned by other teams. As I explained in the “unneeded dependencies” section above, these dependencies might change a lot, forcing the team to rebuild its module even when its own code hasn’t changed. This can be very frustrating for engineers: they feel they have to endure pain because of changes to code further down the dependency chain that isn’t theirs and that they have no power over.
One way to improve build times in this situation is to create subviews of the application. You would create one subview for each team or group of teams that work on a logical or business slice of the big application. For instance: user reviews, payments, search results, etc.
Each subview would mock its dependencies, and through that, the subview wouldn’t need to build those dependencies each time there is a change in one of them. Now, the team that works on that subview can build and iterate as fast as they need, as if they were working on their own independent codebase.
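Here is a minimal sketch of the idea in Python, assuming a hypothetical reviews subview that needs one piece of information from a payments module owned by another team. Python has no compile step, so this only shows the shape of the technique: the subview depends on a narrow interface, and its build wires in a fake that the reviews team maintains. In a compiled monolith, the equivalent would be a subview build target that links against the interface and the fake instead of the real payments module. All class and method names are illustrative.

```python
from typing import Protocol


class PaymentsClient(Protocol):
    """The narrow interface the reviews subview needs from the payments module."""

    def has_purchased(self, user_id: str, product_id: str) -> bool: ...


class FakePayments:
    """Stand-in owned by the reviews team; updated when the real payments contract changes."""

    def __init__(self, purchases: set[tuple[str, str]]):
        self._purchases = purchases

    def has_purchased(self, user_id: str, product_id: str) -> bool:
        return (user_id, product_id) in self._purchases


class ReviewsView:
    """The reviews slice depends only on the interface, never on the payments implementation."""

    def __init__(self, payments: PaymentsClient):
        self._payments = payments

    def can_review(self, user_id: str, product_id: str) -> bool:
        # Hypothetical business rule: only verified buyers may leave a review.
        return self._payments.has_purchased(user_id, product_id)


def build_reviews_subview() -> ReviewsView:
    """Subview entry point: wires in the fake, so the real payments module is never built."""
    return ReviewsView(FakePayments({("u1", "p42")}))
```

The full application would construct `ReviewsView` with the real payments client instead; only the subview's entry point knows about the fake.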
This does require the team owning the subview to monitor changes in its dependencies and update the mocks in the subview accordingly. However, this doesn’t need to happen very often, typically no more than once every few weeks. In practice, the time spent updating the mocks is significantly less than the build time all engineers on the team would otherwise lose over the same period, so the tradeoff is worth the investment.
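The break-even argument can be written as a simple inequality. The symbols are introduced here for illustration only, all measured over the same period (say, one month) and in the same unit of time:

```latex
% T_mock : time the team spends updating its subview's mocks
% N      : number of engineers on the team
% b      : builds per engineer triggered only by upstream changes
% t      : average rebuild time for each of those builds
\[
  T_{\text{mock}} \;<\; N \cdot b \cdot t
\]
```

The subview pays off whenever the left side is smaller, and because the right side grows with team size, the case gets stronger as more engineers work on the same slice.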
Don’t migrate to microservices too soon
So there you have it: I’ve covered several tools and ideas for monolithic architectures that I’ve seen deployed in production, and that were successful in improving build times and, consequently, deployment times.
Migrating to the realm of microservices implies a ton of considerations that most engineers don’t realize at first: having to own your DevOps and operations, having to deal with distributed systems in which state is never fully defined, dealing with partial deployments and partial availability zones, and the list goes on. Such a migration should be undertaken only if the benefits of a microservice architecture outweigh the problems caused by the current monolithic architecture.
So next time you hear someone make the argument that “we should migrate this system to service-oriented or microservice architecture,” make sure you first ask them whether they have covered all the points I mentioned above, and whether the migration is warranted. They will probably give you some hand-wavy answers at first, or cite some vague success story from a random company.
Don’t be fooled: keep asking the questions until you hear the right answers. And if the right answers are not there, then it’s time to look at the monolith and apply the techniques I described above one by one, to the entire codebase, and see how far they take you in terms of improving the build times and the overall developer experience.
And if after trying all of them you do hit a wall, then yes, you should consider migrating to microservices or to serverless, but only then. How to approach such a migration will be the topic of a future article, so check my blog regularly to see when it will be out, and join my mailing list to be notified of future articles.
Do you know of other interesting patterns or techniques to improve the build time and the developer experience in monolithic architectures? Post a comment below!
Image credit: Zoltan Tasi, Lara Jameson
How long is too long for a monolith build and deploy?
In the example you gave above, how long do you think the build and deploy would take?
- Monolithic architecture developed over 10 years.
- Over 500k lines of code.
- 100+ developers
The number of deployments really depends on your industry and on the expectations of your customers.
If you’re running a B2B company with only a few large businesses as customers, and, for example, you’re in a slow-moving industry where tech isn’t a top priority, then you might get away with poorer engineering practices: if things break or are delayed, you only have to make a few phone calls and use your established relationships to smooth things over. And if needed, your sales teams can always offer discounts to make up for it.
If you’re running a B2C company and your codebase serves a large number of individual end users, then in an ideal world you would build and deploy every few commits, keeping the number of changes in any given deploy to a minimum. This minimizes the risk of a deploy containing one or more critical defects, and if one does, the defect is easier to detect and address: you can roll back a few commits to immediately bring the system back up, then look at a very small set of recently introduced changes to quickly find the defect. This assumes that you have top-notch monitoring and alerting, and that your developers have a culture of ownership and quality and look at the graphs after every deploy. With this, you’d deploy multiple times per day, where “multiple times” can mean anywhere from 2 to 20.
If you can’t afford the things I’ve listed above, then deploying at least twice per week is a fine compromise. My rule of thumb is that deploying at least once per week is needed so that teams can push features and minor bug fixes without causing too much impact on end users; aiming for two deployments per week ensures that if you miss one for whatever reason, there’s still another one coming within the next 3-4 days.
With two deployments per week, you get around 8 deployments per month, and from there you can set an SLA of 90% for your org, which is roughly 7 deploys per month. If the metric drops below 90%, I would book a meeting with the DevOps/tooling team, ask them what went wrong last month and what their plan is to get deployments back above the 90% target, and keep asking for updates and progress until the problem is solved.