At this point, I have two(+) years of experience with Microservices, and I’m not an expert, but I have some hard-earned knowledge distilled from working with them (and making lots of mistakes in the process). Here’s what I learned that I wish I had known going into it.
Microservices are not mini-monoliths
Jim Gaffigan has a rather funny skit about (American) Mexican food. Listen to it here before I butcher the punchline. The punchline of the skit is that all Mexican food basically consists of a tortilla with cheese, meat, or vegetables. We tend to think of deployable software in that same way. It’s all code, wrapped up with a deployment script, and sent to production. Monoliths are independent complete applications that fulfill a business function. So what’s a Microservice? An independent complete application that fulfills a business function. So why aren’t microservices just ‘mini-monoliths’? The answer comes from the idea that microservices collaborate. A monolith does not rely on another monolith for its uptime, data, or resiliency. It is generally a self-contained view of the world, and by its nature it does not care whether anyone else exists. Your company’s website is wholly independent of anything else. More critically though, multiple teams may work on your company’s website. They share code, branches, and a single production pipeline. Microservices, on the other hand, are independent complete applications that each fulfill a business function, but no more than one. A monolith fulfills many.
A Microservice understands that while it is independent, there are possibly zero or more people out there interested in what it has to say, and so it is designed with that understanding in mind. A Monolith is not, and does not have to be. Businesses eventually find out that they wish their monolith was designed to share its information in a de-coupled fashion, but often too late to do anything about it easily.
Microservices are not mini-monoliths; they’re collaborators that operate independently when they need to.
Microservices require a different way of thinking about problem solving
Developers love to write code. We’re so enamored with writing code that we’ll write code even when no one needs us to. We’ll write code to solve nagging problems on our own machines, or to automate silly things, or even write code to solve problems in our households. In fact, I have a new side project to set up a Raspberry Pi as a calendar viewer in my house. This is probably not unique to software development (though maybe it is? Do plumbers re-pipe their houses? Do electricians rewire theirs on a whim?) but the tenor of it is so overdone in software development that we exhort new developers to not write code first.
… And then we ask them to work on a monolith. Monoliths make writing more code easy. It gets to a point where the default state is “find problem”, “write code”, “ship”, without understanding whether or not the problem is best served by a bolt-on or add-on to the existing system. For small things this is not an emergent issue. Those small things can add up, and it will become a problem over time.
For instance, if you’ve ever tried to add a CSV import to an existing system, you’ve probably found out within days that the desired “CSV Import” feature is really a “CSV + Domain Specific Logic” import function. Almost as harmful is discovering that a ‘bulk’ method of inserting wasn’t part of the original requirements, which necessitates a change in the API. In a monolith, it’s really easy to write code that adds this functionality with baked-in assumptions that aren’t clear, and that potentially changes the API your system exposes or how it presents itself to the user. Because of the ease of ‘just’ writing code, it is easy to rush the implementation without regard to the design. Writing code quickly is not the job. Solving problems without causing more problems is the job, and a monolith makes that hard to do.
Microservices, on the other hand, require up-front planning before code is written, every time. Every new service, and any change to an existing service, comes with the option of completely replacing that service. Anything that has the potential to change a contract in the system (whether with the user or with other services) requires more understanding and up-front design than the same change in a monolith. To go back to our CSV import example, a potential way of doing it with microservices is to stand up a new CSV importer service that takes in a CSV file, does any Domain Specific Formatting, and then emits an event or sends an HTTP request to the correct service, using its existing API for adding/importing information.
Now, these services are necessarily coupled to each other (though the coupling does go in the right direction), and since the contract has not been changed for the original service, the guarantees of the original service are kept intact. Microservices make it harder to break existing consumers if done well. The trade-off is that more upfront planning is required when designing a solution in a microservices-based topology.
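As a rough sketch of the CSV importer idea, assuming a portfolio service whose existing API accepts one holding at a time: the column names (symbol, shares) and the add_holding callable are hypothetical stand-ins for whatever the real contract looks like. The point is only that the domain-specific formatting lives in the importer, and the original service’s contract is left alone.

```python
import csv
import io
from typing import Callable, Dict, Iterator

# Hypothetical payload shape for the existing service's "add one holding" API.
Holding = Dict[str, object]


def parse_holdings(csv_text: str) -> Iterator[Holding]:
    """Turn raw CSV rows into the payload the existing API already accepts."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Domain-specific formatting happens here, inside the importer,
        # not inside the service that owns the data.
        yield {
            "symbol": row["symbol"].strip().upper(),
            "shares": int(row["shares"]),
        }


def import_csv(csv_text: str, add_holding: Callable[[Holding], None]) -> int:
    """Push every parsed row through the existing single-item API."""
    count = 0
    for holding in parse_holdings(csv_text):
        # In a real system this would be an HTTP request or a published event.
        add_holding(holding)
        count += 1
    return count
```

Passing add_holding in as a callable keeps the sketch runnable without a network, and it also makes the direction of the coupling visible: the importer knows about the portfolio service’s API, and the portfolio service knows nothing about CSV.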
Domain boundaries are critical to Microservices success
There are three general flows to microservices (There may be more; but the types are escaping me right now):
1. Microservices that give new capabilities to an existing domain bounded context (the previous example of adding CSV import for a portfolio service as a separate microservice is an example of this — there are several trade-offs to doing that, and it depends on your constraints and desires)
2. Microservices that represent a stateless process (viz. validating a credit card)
3. Microservices that represent a stateful process or interaction (the portfolio service)
Notice that I said nothing about the size of these services, and depending on whom you speak to, the size of a microservice is a mystery. I have opinions on this, of course, but the one invariant I’ve seen is that good microservices topologies ensure the lines are drawn at the domain’s “bounded context”. This is a fancy Domain-Driven Design phrase that means to split up models and interactions by what they mean. To sales, a customer interaction is quite a different model and mode of interaction than a customer interaction for customer support. By splitting them up by their ‘context’ (and the boundaries being sales and customer support), the software can maintain independent ideas of how to interact with a customer depending on the context.
For microservices, this typically means that your customer support portal will be a different bounded context than your sales funnel; even if they share the same properties of a customer (at least demographically). There are three ways to handle the above problem:
Method 1: Set up a separate service with an independent customer model for each service (sales, customer support), and one created in one system is not necessarily referenced elsewhere (or it can be; customer_id, customer_support_id, sales_id)
Method 2: Set up a “Customer” service, a sales service, and a customer support service, and both sales and customer support get customer information from the “customer” service.
Method 3: Set up a customer service, a sales service, and a customer support service; and sales and customer support have duplicated data (received through events) of things that happen in the customer service, but they maintain their own disparate models for what a customer means to them. From a system perspective the internal identifier is the same; how it’s used varies from system to system. This means having a customer service that has demographic information; a sales service that may or may not have this same demographic information but adds on sales context, and a customer support service that maintains this duplicate information but adds on its customer support pieces.
Each method has its own trade-offs; but you can quickly see the maintenance issues with each:
- Method 1 has three different representations of a customer, potentially at different states in each service (a sales person sees a customer before they’ve signed on the dotted line, and a customer support person always has a “post sale” view of the customer). This is OK until you want sales to have the customer support information, and then you need to do a bit of juggling to ensure a customer from a sales context is indeed the same customer in a customer support context.
- Method 2 allows there to be one representation of a customer, and each service can “add on” to this representation. But each downstream service is still beholden to the customer service, and which context does that live in? Both. There is also a temporal coupling factor, as each service “gets” demographic information from the customer service.
- Method 3 allows each service to be de-coupled from the “customer” service. It allows each service to add its own data to what it means for there to be a customer, and it allows each service to change independently (since the customer service emits events, each service can listen to them and update its own model if it wants to). But this also means having a unified contract for what defines the demographics of a customer, ensuring each service is set up to listen to events pertaining to customers, and making sure each service copes with having been down when a customer event was emitted (event sourcing is a possible solution here).
None of these methods are “ideal” from an “easiest to develop” standpoint, and they have different levels of maintenance requirements. The crucial questions a team must answer are: what is the domain context, is this <thing> I’m dealing with talked about differently depending on who I talk to, and what is the maintenance cost of each approach?
If the team chooses method #1, then they have a lot of distributed systems problems that aren’t easily solved; they’ve made interacting with the system harder. If they choose #2, then two services depend on a third (not really ‘independent’ at that point), and they’ve added a Request/Response dependency between services that may not need to exist (and is harder to debug). If they choose approach #3, they have quite a bit of upfront work (defining contracts, defining patterns), but the maintenance work, reasoning about how a service interacts with another service, debugging, and future expansion are far easier.
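To make method #3 a little more concrete, here is a minimal sketch under some loud assumptions: the InMemoryBus class stands in for a real message broker, and the event name ("customer.updated"), the field names, and the service classes are all hypothetical. Each service keeps its own model of a customer, keyed by the shared identifier, and updates only the shared demographic fields when an event arrives.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class InMemoryBus:
    """Stand-in for a real message broker; delivery here is synchronous."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        for handler in self._handlers[event_type]:
            handler(payload)


@dataclass
class SalesCustomer:
    customer_id: str
    name: str
    pipeline_stage: str = "lead"  # context only sales cares about


@dataclass
class SupportCustomer:
    customer_id: str
    name: str
    open_tickets: List[str] = field(default_factory=list)  # support-only context


class SalesService:
    def __init__(self, bus: InMemoryBus) -> None:
        self.customers: Dict[str, SalesCustomer] = {}
        bus.subscribe("customer.updated", self.on_customer_updated)

    def on_customer_updated(self, event: dict) -> None:
        # Create the local record if needed, then refresh the shared fields.
        # Sales-owned data such as pipeline_stage is left untouched.
        customer = self.customers.setdefault(
            event["customer_id"], SalesCustomer(event["customer_id"], event["name"])
        )
        customer.name = event["name"]


class SupportService:
    def __init__(self, bus: InMemoryBus) -> None:
        self.customers: Dict[str, SupportCustomer] = {}
        bus.subscribe("customer.updated", self.on_customer_updated)

    def on_customer_updated(self, event: dict) -> None:
        customer = self.customers.setdefault(
            event["customer_id"], SupportCustomer(event["customer_id"], event["name"])
        )
        customer.name = event["name"]
```

Neither service ever asks the customer service for data at request time, so there is no temporal coupling. What this sketch does not cover is the hard part mentioned above: a service that was down when the event was published needs some way to catch up.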
Developer Tooling doesn’t support Microservices as well as Monoliths
We have about 25 years of experience as an industry creating tooling around building and deploying software, though it’s only really in the last 15-18 years that the tooling has accelerated. Even so, after those 18 years we have pretty solid tooling around developing and debugging monoliths. Debuggers and IDEs take monoliths for granted, as they likely should. If you write microservices that depend on other microservices over REST, you’re going to have a bad time debugging services locally. Your choices are standing up the parts of the system that collaborate, mocking out external dependencies, or dockerizing the system’s services so that they can be stood up independently. Of course, once you do this you’re diving into mixed networking land for Docker, and there’s not a lot of tooling that can make that experience seamless. A service running outside of Docker that you’re debugging is hard to set up to work with services running inside of a Docker network, or vice versa. Front-end development is even worse, as Node.js is a requirement for building front-ends these days. Try live-debugging with Docker for your UI where the source is kept locally. Not fun. Teams handle this problem in different ways, but the point is that this problem exists, and the solutions are not as mature as debugging a monolith.
If you use microservices, you need to allocate a sizable chunk of time to building the tooling necessary to allow people to develop against those services.
Deployment requires better tooling with Microservices
Deployment considerations are key if you want a fast-moving organization. You can’t respond to change without being able to change your software quickly. Even if you can develop changes quickly, if you can’t deploy them quickly you aren’t a fast-moving organization. Continuous Integration (CI) and Continuous Delivery (CD) are essential to being able to respond to change. The products in this space reflect a deployment view of the world that is monolithic in nature. Source control is built for it, CI/CD systems are built around it, and pretty much every commercial CD system is built with monoliths in mind. There are several deployment models where microservices are used, and none of them have good tooling for microservices:
1. Deploy on-premises as a packaged solution
2. Deploy to the cloud independently
3. Deploy to the cloud as a packaged solution
If you sell your product to customers, and they run it in their own data center, deployment method #1 is what you often deal with. Your solution must be packaged up and deployed together as a single unit. Should this necessitate that you develop as a monolith? No. It shouldn’t. However, if you have microservices, you necessarily have multiple deployable artifacts, and your CD pipeline must take that into account. Whether those artifacts live in a mono-repository (all services in one source control repository) or in micro-repositories (each service in its own source control repository) is a separate matter. The trade-offs change depending on that choice, but they still exist as problems not solved by current tooling. For instance, the tooling needs to account for tagging master or a release branch with what is in production, for your promotion model to different internal environments, and even for local deployments. If you choose method #2 and combine it with continuous delivery, some of those trade-offs go away: you can make a rule that the latest in master is always pushed to internal promotion environments, and that the only tag happens after a particular commit has been pushed to production. But again, tooling is still lacking to make this a seamless experience.
Microservices deliver on the promises of Object-Oriented Programming
I didn’t understand the hype of object oriented programming. I understood the fundamentals of encapsulation, abstraction, inheritance, message dispatching, and polymorphism, but I didn’t understand why they were so useful (I started with Perl, and then moved to Java, so I had nothing to compare Java’s OO nature to. At the time it just seemed like more work to do the same things I could do in Perl. Ahh, youth). The SOLID principles helped later on, but I always felt like there was more hype to OO than actual benefit. After several jobs maintaining and creating Object-Oriented solutions, I was convinced that Object Oriented Programming was a pipe-dream. To the 80% of us who are not “expert” programmers, it is a fad we can never make full use of and it causes more harm than good.
That was until I started researching microservices. This was it! A fully independent object that had agency that could collaborate with others; but encapsulation was ensured! The Open/Closed principle was a requirement! Single responsibility was almost ensured just by the nature of the service! (It says “micro” on the tin) Inheritance was far simpler — consume what the service gives you and modify it to suit your needs (the CSV example above). You couldn’t share information unless you had a common contract and used some sort of message dispatching!
This was absolutely huge for me. All of those principles that I’d been trying to bring to reality for years in codebases I’ve worked on were here — and best of all they didn’t have the downsides of OOP in practice! It’s really easy when modifying code to do something that breaks encapsulation, and business pressures make it even easier. With Microservices, that was no longer possible. Sure, other business induced pressures might cause problems, but they couldn’t alter the contract of a service; and that allowed the system to be reasoned about in ways OOP promised. Perhaps best of all, microservices put up guard rails that keep the mistakes of OOP from happening, and we’re all better for it.
Contracts, Patterns, and Practices should be Code generated
If you do something once, do it manually. If you do it twice, write down the steps, and do it manually. By the third time, automate it. Producing even a dozen services means either manually enforcing the structure of contracts (the format by which services communicate with each other or with the user), patterns (how you structure common infrastructural concerns), and practices (how you write software), or code generating them for commonality. If you don’t code generate them, entropy wins. Even across features, services start to do the same thing in different ways, or you find new patterns for structuring your events, and depending on which service you’re in, you could see a different pattern. It’s untenable from a development and maintenance perspective.
Method #3 above shows a world where the Customer Service emits events when a customer is added or updated, allowing interested services to listen for changes and update their own data stores as necessary. Without code generation this would be a tedious, error-prone process. With code generation and schema-defined models, this is a viable development model.
There are only two sane paths: package the commonalities (which can really only be done for dependencies) into utility functions, or code-generate everything.
Packaging utility classes/models (like the customer model and the events above) is a valid approach. The concerns with it are taking on dependencies (even internal ones), the overhead of internal infrastructure, and the fact that every service would be required to be written in the same programming language.
The latter path (code generation) is exactly what Michael Bryzek advocated in his talk “Designing Microservices Architectures the Right Way”, and coming from trying the other paths (packaging common functionality, and doing it manually), I can see its utility. The trade-off, of course, is that developing the code generation tooling is a heavy investment of time. It requires discipline to develop this tooling first without trying to develop features, and it would likely result in no visible movement on things the business cares about (features, revenue, etc). On the other hand, code generation also means that as long as your tooling supports a given language, you can implement those models in any language you’d like.
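For a feel of what schema-defined models plus code generation can look like at the smallest possible scale, here is a toy sketch. It is not the tooling from the talk; the schema format, the CustomerUpdated event, and its fields are assumptions made up for illustration. A single schema is the source of truth, and the model class is generated from it rather than written by hand in each service.

```python
# Hypothetical schema for one event. In practice this might live in a shared
# JSON or YAML file that every service's generator reads.
CUSTOMER_UPDATED_SCHEMA = {
    "name": "CustomerUpdated",
    "fields": {"customer_id": "str", "name": "str", "email": "str"},
}


def generate_dataclass_source(schema: dict) -> str:
    """Render Python source for an immutable event model from a schema."""
    lines = [
        "from dataclasses import dataclass",
        "",
        "",
        "@dataclass(frozen=True)",
        f"class {schema['name']}:",
    ]
    for field_name, type_name in schema["fields"].items():
        lines.append(f"    {field_name}: {type_name}")
    return "\n".join(lines) + "\n"


source = generate_dataclass_source(CUSTOMER_UPDATED_SCHEMA)

# A real generator would write the source to a file in each service's
# repository; executing it in a scratch namespace keeps this sketch runnable.
namespace = {"__name__": "generated_models"}
exec(source, namespace)
CustomerUpdated = namespace["CustomerUpdated"]

event = CustomerUpdated(customer_id="c-1", name="Ada", email="ada@example.com")
```

The same schema could feed a second template that emits the model in another language, which is where this approach parts ways with a shared utility package.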
You can’t punt on non-functional requirements
There are lots of non-functional requirements in a system that never appear on the roadmap, are never spoken about at the sales meetings, and are only tolerated by the product manager. Things like a user should be signed out after fifteen minutes; or the authorization system should incorporate roles and location; or some data is transient and not part of the backup strategy, and other data needs to be backed up every minute. Or, the system must allow 5000 concurrent users at a time. Those are non-functional requirements; they’re qualities of the system that aren’t part of the user-facing features being developed.
In a Monolith, there are typically very few places to go to implement a non-functional requirement. As we’ve discussed previously, IDE tooling is built for the refactoring necessary to ensure a change takes place everywhere it’s needed (only for the statically typed languages; the dynamic folks have their own problems to contend with). Even if you have to implement a new feature, there’s generally one place to do it.
Not so with microservices. If you implement authorization, you must implement it across all services. If you implement a timeout, you must implement it across all services. Unless your microservices are spread across hosts, any performance improvements must take into account that each service may share host resources with one or more other services. If services share the same server instance (e.g., every service that uses Postgres shares a Postgres server instance, even if they have separate databases in that instance), then performance tuning and backups must take that into account. This greatly complicates performance tuning and dealing with non-functional requirements, and for the system to be easily built, those non-functional requirements need to be known at the beginning! Every delay in implementing a non-functional requirement makes it more likely that some disparate changes will need to be made across several services, and that will take much longer once the services are built.
Event Driven Programming makes microservices work
In firmware programming, the finite state machine and events got me through the day. Each peripheral has separate states, and transitions between them are triggered by events that may come from user input or other peripherals (for instance, seeing a Bluetooth advertisement from a whitelisted address may trigger a connection). Since firmware by-and-large sits on a single-core System-on-Chip with limited use of threads or no threads at all, using an event loop and finite state machines is one of the best ways to make firmware work.
Finite State Machines coupled with Event Driven programming also have other nice properties that carry over well to microservices: events ensure each service is de-coupled from the others (there are no direct request/responses between services), and Finite State Machines dictate what happens based on the current state of the service plus its input. This makes debugging a matter of knowing which state the service is in, and what input was received. That’s it. This greatly reduces the complexity in standing up and debugging services, and allows problems to be de-composed into events and states. If you add event sourcing into the mix, you have an event stream that records the events that occurred, so playing back issues is as simple as replaying events.
This is possible because microservices operate on network boundaries. In a monolith you’re forced to debug the entire monolith at once, and hope someone didn’t write code with disastrous side-effects that are impossible to find through normal means. It’s easier to find a needle in a small jar of needles than in a giant haystack, and that’s possible because of the observable boundaries of microservices and the use of patterns that limit the amount of complexity involved in arriving at a certain state.
If you’re going to start writing microservices; I highly recommend going down the path of event-driven programming, state machines, and some sort of event stream (even if you decide against event sourcing).
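The combination of a state machine and a replayable event stream can be sketched in a few lines. The order lifecycle below (its states and event names) is a made-up example, not something from a real system; what matters is that the next state depends only on the current state plus the incoming event, so replaying the recorded events reproduces exactly how the service got where it is.

```python
from typing import Dict, Iterable, Tuple

# (current state, event) -> next state, for a hypothetical order lifecycle.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("new", "payment_received"): "paid",
    ("paid", "shipped"): "in_transit",
    ("in_transit", "delivered"): "complete",
    ("new", "cancelled"): "cancelled",
    ("paid", "cancelled"): "refund_pending",
}


def apply_event(state: str, event: str) -> str:
    """Return the next state, or fail loudly on an event the state forbids."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state!r}") from None


def replay(events: Iterable[str], initial: str = "new") -> str:
    """Rebuild the current state by applying a recorded event stream in order."""
    state = initial
    for event in events:
        state = apply_event(state, event)
    return state
```

With this shape, a bug report reduces to a state and an event stream: feed the recorded events to replay and the failing transition shows up as a ValueError naming both the state and the offending event.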
Choosing between REST and Events for supporting Microservices is tougher than you may think
If you’ve read the fallacies of distributed computing, then this section almost writes itself. Microservices are distributed systems, no matter how you shake it. One of the major problems when communicating across a network boundary is “is that service down, or am I just having a network timeout?” If you’re using REST, this means implementing the circuit-breaker pattern with some sort of timeout. It also means that if your services communicate with services that communicate with services through REST, then the availability of that chain will eventually hover just above zero (00:00-12:31). As the video rightfully says, don’t do that. I’d go so far as to say that, if at all possible, don’t make calls to other services through REST.
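Two pieces of that paragraph are easy to sketch. The first is the arithmetic behind chained availability, under the simplifying assumption that every hop fails independently and has the same availability. The second is a bare-bones circuit breaker; the class name, thresholds, and injected clock are illustrative choices, and a production breaker would need more (per-call timeouts, metrics, thread safety).

```python
import time
from typing import Callable, Optional


def chain_availability(per_service: float, depth: int) -> float:
    """Availability of a synchronous call chain, assuming independent failures."""
    return per_service ** depth


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses to call the downstream service."""


class CircuitBreaker:
    def __init__(
        self,
        max_failures: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func: Callable[[], object]) -> object:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                # Still cooling down: fail fast instead of waiting on a timeout.
                raise CircuitOpenError("downstream service is marked unavailable")
            # Cool-down elapsed: allow one trial call (the "half-open" state).
            # One more failure will re-open the breaker immediately.
            self._opened_at = None
            self._failures = self.max_failures - 1
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        return result
```

Ten services at 99% each, called in a chain, give chain_availability(0.99, 10), which is roughly 0.90, and the number only falls as the chain deepens. The breaker does nothing to improve that figure; it only stops a caller from piling up requests on a dependency that is already failing.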
If you need data, have the service publish an event, and consume that event. This sounds great; it’s de-coupled, and it’s resilient to failure. However, each service must now have means to publish to a bus, consume an event off of a bus, and support whatever serialization scheme you want to use. Oh, and now you need to be able to debug all of the above. If you want runtime resiliency, you must sacrifice development simplicity to get there.
Maintaining Microservices requires strong organizational and technical leadership
“The business” does not care what the topology of your system is. They don’t care about its architecture, and they don’t care about how easy it is to maintain, any more than you care whether they use Excel or QuickBooks for forecasting. The business wants two things (really it’s n of 9 things, but work with me here):
- Increase Revenue
- Reduce costs
They believe more features will increase revenue. It’s a fair belief (though correlation does not imply causation), but more features also increase development costs. To “the business”, the way to solve this problem is not by reducing the costs, but by increasing revenues. Again, this is also fair, and in a good number of cases is the right path.
Earlier, I mentioned that microservices keep those nasty shortcuts that cripple development teams from happening, and that’s a good thing, but, to the business, it can also be a bad thing. See, that crippling shortcut may never happen; but adding that feature (to their way of thinking) will increase revenue. If they have to choose between helping revenue but possibly hurting future maintenance, or delaying that feature by several weeks but helping future maintenance, they’ll pick the path to fastest revenue, every time.
The person or people that keep this from happening are hopefully the organization’s CTO and engineering leadership (VP or Director of Engineering, the Architect, and senior leaders of the team). They’re the people with the cachet and experience to know when this is going to hurt future maintenance, and they hopefully know enough to know it’s probably not a sure revenue bet either. But this requires discipline and trust on the part of the engineering leadership team. They must have gained the trust of the business by delivering what the business wants in the timeframe they want it; and they must be disciplined enough to stick to their guns. If someone says, “Well, we could do this in a week if we just hooked Service A up to Service B’s database”, you have now failed with microservices and are maintaining a future monolith. You’ve also lost the advantages of working with microservices.
Shortcuts are easy to say yes to, and shortcuts can greatly endanger the maintainability and health of a development team and the system.
Microservices are a technical solution to an organizational problem
While developers and consultants tend to espouse microservices in a cloud scenario, they tend to ignore that microservices are orthogonal to their deployment scenario, and they’re orthogonal to technology stacks. Take away all these advantages of microservices, and you’re still left with a topology that allows you to segment teams along domain boundaries, and have those teams operate independently of one another. At a small enough scale, you could even have individuals own services and scale out your feature creation to the number of people in your development organization. The Mythical Man-Month states that adding people to a late project makes it later, and it says that because those people have to communicate with each other. What if they didn’t? Or what if you could reduce the amount of communication needed to ship a feature? Microservices let you do that. (I fall firmly in the micro-repository camp as well, so I’m about to conflate the two on purpose.) Microservices development means independent repositories, and fewer issues with merge conflicts, branching, or collaboration needing to happen to push out a particular feature. It also means fewer avenues for the feature to clash with existing features, since by definition the service is independent and autonomous. It means fewer parts to reason about, and that results in faster development time.
Microservices (when architected well) let you go faster and further than you otherwise could, with less need to put organizational guardrails on the development team (code reviews, gated check-ins, code freezes) to resolve team performance issues. They minimize the effect a single developer can have on the whole system. This is a great benefit if the organization does not hire well or pay well (and if every organization did, we’d have a low turnover rate in software development), as it substitutes technology for some of the human training and improvements that organizations should do but don’t.
If you have all top-notch performers in a high-performing engineering organization with a high performing business with no turnover, you don’t need microservices because you’re not going to make the mistakes that microservices would fix. If, however, you’re in an organization that consists of humans that are fallible, microservices provide a benefit to development that monoliths cannot.
Microservices are another tool to help make software development better and to make systems easier to maintain. They provide many benefits and have many trade-offs with traditional monoliths, and it’s rarely clear whether or not a system should be developed as a monolith or as microservices. There are several factors that can steer the choice towards one or the other; but those factors depend greatly on the individuals, organizational leadership, business model, constraints, and politics of the organization implementing those services.
These are the things I wish I had known when I started with microservices. What do you wish you had known about Microservices before working with them?
Note: Special thanks to Adam Maras for spending part of his weekend giving me feedback on this post.
7 thoughts on “The Realities of Microservices”
Great post, thanks!!
Great post. I agree that microservices, despite of hype around them, have a number of disadvantages, and it is not always justified to use them.
I was excited and enthusiastic about this article until I got to the section titled “Choosing between REST and Events”. Your advice, to generally prefer asynchronous events over RPC, is precisely the opposite of my (and others’) hard-won experience. This post elegantly summarizes my position: https://programmingisterrible.com/post/162346490883/how-do-you-cut-a-monolith-in-half
Great article thanks George. In your experience, do you embrace or avoid distributed transaction handling in a microservice centric architecture? When do you reassess a microservice for further subdivision? What policies have you found to have worked to mitigate against data contract variations?
This is where I have opinions about the ‘sizes’ of Microservices. I’d rather design the system so there isn’t a need for a distributed ‘transaction’ (in the database sense of all or nothing). That is, if an ‘insertion’ is in one domain’s bounded context, it should belong in one place, in a single service. I think publishing events that other services can use to update their own representations or update for their own needs is a possible method of handling the communication of change in a system. I’d prefer to keep the services de-coupled and for them to communicate asynchronously. A distributed transaction (by its nature) is synchronous and blocking; and I’d prefer to avoid architecting that state into the system.
This is where I also love embedded systems; they work their magic off of event driven systems that pass messages asynchronously, and to me that’s a good architecture that is easy to reason about and can scale past a few services.
“If someone says, ‘Well, we could do this in a week if we just hooked Service A up to Service B’s database’, you have now failed with microservices and are maintaining a future monolith.”
Well, sometimes we have to support the business, so can’t that be OK and treated as technical debt to be sorted out later?
Excellent post. Thank you for your effort.