By now you’ve heard of event-driven architecture (EDA): a scheme in which every remote call in the system is replaced by one of four types of asynchronous messages, and the parts of the system communicate only asynchronously.

It’s as if someone told you from now on you must communicate through email instead of walking over to bug someone in the middle of their work.

With that sort of description, I can’t tell if that makes me love EDA more or less.

Anyway, back to our scheme.

Blindly sending emails all the time has its own problem: no one will read thousands of emails in any reasonable amount of time. At that point you may as well just go over and interrupt the person, because that would be the fastest way to get your answer. As long as they aren’t passed out at their desk from exhaustion. Or maybe they’re in a napping pod. That was, at least in the before times, the most recent addition to startup culture.

But this is not a post about startup culture; it’s about event-driven architecture.

So blindly sending someone thousands of emails isn’t going to work, and going over and interrupting them every two seconds isn’t going to work either.

So what do we do?

That’s where the theory of constraints (ToC) comes in.

Your software system, just like a production manufacturing line, has a natural limit to what each sub-system can do. To put this in concrete terms: Your database can only store items so fast. Your indexer can only index so fast. Your order system can only report so fast.

The theory of constraints provides a method for reasoning about and resolving bottlenecks in your system. Unchecked concurrency won’t make your system any faster, and we can use the theory of constraints to help the system run optimally. Its five steps are:

1. Identify the system’s constraint(s).
2. Decide how to exploit (make maximum use of) the system’s constraint(s).
3. Subordinate everything else to the above decision(s).
4. Alleviate the system’s constraint(s).
5. If this constraint is no longer a constraint, go back to step 1, but do not allow inertia (“how we’ve always done it”) to cause a system’s constraint.
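To make the first and third steps concrete, here is a minimal TypeScript sketch. It assumes the sub-systems form a simple pipeline in which every item passes through each stage in turn, so the slowest stage sets the pace. The `Stage` type, the function names, and the rates are all made up for illustration; they are not measurements from any real system.

```typescript
// Hypothetical stage description; names and rates are illustrative only.
interface Stage {
  name: string;
  maxPerSecond: number; // measured capacity of this stage, in items per second
}

// Step 1 (identify): in a sequential pipeline, the constraint is the stage
// with the lowest capacity.
function findConstraint(stages: Stage[]): Stage {
  if (stages.length === 0) throw new Error("need at least one stage");
  return stages.reduce((slowest, s) =>
    s.maxPerSecond < slowest.maxPerSecond ? s : slowest
  );
}

// Step 3 (subordinate): cap every stage at the constraint's rate, since
// running any stage faster only builds up a backlog in front of it.
function subordinate(stages: Stage[]): Stage[] {
  const cap = findConstraint(stages).maxPerSecond;
  return stages.map((s) => ({
    ...s,
    maxPerSecond: Math.min(s.maxPerSecond, cap),
  }));
}

// Made-up numbers for the three sub-systems mentioned above.
const pipeline: Stage[] = [
  { name: "database", maxPerSecond: 5000 },
  { name: "indexer", maxPerSecond: 800 },
  { name: "order-reporting", maxPerSecond: 2000 },
];

const constraint = findConstraint(pipeline);
const throttled = subordinate(pipeline);
console.log(constraint.name, throttled.map((s) => s.maxPerSecond));
```

With these numbers the indexer is the constraint, and nothing is gained by letting the database or the reporting stage run faster than it does.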

Even though you have an asynchronous and naturally concurrent system, you still have bottlenecks. Your bottlenecks determine how fast you can actually process data and get things done. Find the bottleneck and put your resources into resolving it. Rate limit or turn off whatever you need to in order to keep it from being the bottleneck. Fix the bottleneck. Iterate.
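One way to do the "rate limit" part is to put a bounded inbox in front of the bottleneck, so producers are told to back off instead of piling up work. The sketch below is one possible shape for that, not a description of any particular message broker; `BoundedInbox`, `offer`, and `take` are names invented for this example.

```typescript
// Hypothetical bounded inbox placed in front of the bottleneck service.
class BoundedInbox<T> {
  private readonly items: T[] = [];
  private readonly capacity: number;

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
  }

  // Refuses the item when the inbox is full, so the producer has to
  // slow down or retry later instead of flooding the bottleneck.
  offer(item: T): boolean {
    if (this.items.length >= this.capacity) return false;
    this.items.push(item);
    return true;
  }

  // The bottleneck pulls the oldest item at its own pace.
  take(): T | undefined {
    return this.items.shift();
  }

  get size(): number {
    return this.items.length;
  }
}

// Usage: with room for two events, the third offer is refused.
const inbox = new BoundedInbox<string>(2);
const accepted = ["a", "b", "c"].map((event) => inbox.offer(event));
console.log(accepted, inbox.size);
```

Returning `false` rather than throwing keeps the decision with the producer: it can drop the event, retry after a delay, or push the slowdown further upstream.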

Story time: In an event-driven system I had designed, we elected to use Node.js and TypeScript as the technology stack for a quasi-custom parsing DSL. (It was regex-based parsing plus an XML-based DSL so customers could add their own. That ability was a core requirement, and we wanted to use existing standards, so regex and XML were it.) Node.js is famously single-threaded, and we thought this was the limiting factor when we noticed that the parsing service had a lot of events waiting on it to finish processing messages. So we focused on speeding up the parsing service and turned everything else down: it wouldn’t have mattered how fast the rest of the system went, because the parsing service was the bottleneck. Ultimately, it turned out that some of the regexes we used, in combination with a particular line in our parsing DSL, were quadratic. Oops.

Resolving this bottleneck sped up parses considerably, and as a result the whole system sped up.
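The actual regexes aren’t reproduced here, so as a stand-in, here is a well-known pattern with the same kind of problem: stripping trailing whitespace with `/\s+$/`. In a backtracking regex engine this can take quadratic time when a long run of whitespace is followed by a non-space character. The function names and the input size are chosen for illustration, and the timings will vary by machine and runtime.

```typescript
// Stand-in pattern, not the one from the story. When the whitespace run is
// NOT at the end of the input, a backtracking engine retries the run from
// every starting offset, which is quadratic in the length of the run.
function trimEndRegex(line: string): string {
  return line.replace(/\s+$/, "");
}

// Linear alternative: walk back from the end of the string once.
function trimEndScan(line: string): string {
  let end = line.length;
  while (end > 0 && /\s/.test(line.charAt(end - 1))) end--;
  return line.slice(0, end);
}

// Crude timing helper; Date.now() only has millisecond resolution.
function timeMs(fn: () => void): number {
  const start = Date.now();
  fn();
  return Date.now() - start;
}

// Worst case for the regex: a long whitespace run followed by a non-space.
const hostile = " ".repeat(10000) + "x";
const regexMs = timeMs(() => trimEndRegex(hostile));
const scanMs = timeMs(() => trimEndScan(hostile));
console.log({ regexMs, scanMs });
```

Doubling the length of `hostile` and watching whether `regexMs` roughly quadruples is a quick way to spot this sort of behavior in a single-threaded service, where one slow match holds up every event queued behind it.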

Blind event-driven architecture failed us, but combining EDA and ToC helped us find joy [1].

[1] Yes, the title was actually a Game of Thrones reference.
