In many cases, when your application and/or your team starts growing, the only way to maintain a fast development and deployment pace is to split the application and the teams into smaller units. For teams and people, that creates some interesting challenges that are not necessarily easy to solve, but this post focuses on the problems and complexity created on the software and architecture side.
When you split your solution into many components there are at least two problems to solve:
- How to pass information from one component to another (e.g. how do you notify all the sub-components when a user signs up, so that you can send them notifications, start billing them, generate recommendations...)
- How to maintain the consistency of the partially overlapping data stored in the different components (e.g. how do you remove the user's data from all the sub-components when the user decides to leave your service)
Inter-component communication
At a very high level there are two communication models that are needed in most architectures:
- Synchronous request/response communication. This has its own challenges, and I recommend using gRPC together with some best practices around load balancing, service discovery, circuit breakers... (see my slides for TEFCON 2016), but it is usually a well-understood model.
- Asynchronous event-based communication, where a component generates an event and one or many components receive it and implement some logic in response to that event.
The elegant way to solve this second requirement is to put a bus or a queue in the middle (depending on the reliability guarantees required for the use case), where producers send events and from which consumers read them. There are many ways to implement this pattern, but the solution is not so obvious when you have to handle heterogeneous consumers (that consume events at different rates or with different guarantees) or a massive number of events or consumers.
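To make the pattern concrete, here is a minimal in-memory sketch of a bus where each subscriber gets its own queue, so consumers can drain events at different rates. The `EventBus` class, the `user_signed_up` event name and the subscriber names are all made up for illustration; a real deployment would use a proper broker rather than anything like this.

```python
from collections import defaultdict, deque


class EventBus:
    """Toy in-memory bus (hypothetical): one queue per subscriber and event type."""

    def __init__(self):
        # event type -> {subscriber name -> queue of pending events}
        self._queues = defaultdict(dict)

    def subscribe(self, event_type, subscriber):
        self._queues[event_type][subscriber] = deque()

    def publish(self, event_type, payload):
        # The producer does not know which components consume the event.
        for queue in self._queues[event_type].values():
            queue.append(payload)

    def poll(self, event_type, subscriber, max_events=1):
        # Each consumer drains its own queue at its own pace.
        queue = self._queues[event_type][subscriber]
        batch = []
        while queue and len(batch) < max_events:
            batch.append(queue.popleft())
        return batch


bus = EventBus()
for name in ("notifications", "billing", "recommendations"):
    bus.subscribe("user_signed_up", name)

bus.publish("user_signed_up", {"user_id": 1})
bus.publish("user_signed_up", {"user_id": 2})

# A fast consumer reads everything; a slow one takes a single event for now.
fast = bus.poll("user_signed_up", "billing", max_events=10)
slow = bus.poll("user_signed_up", "recommendations", max_events=1)
```

The slow `recommendations` consumer still finds its second event waiting on the next poll, and `billing` was never held back by it. What this toy version ignores (persistence, delivery guarantees, memory growth with many consumers) is exactly where the problem stops being obvious.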
Data consistency
The biggest problem to solve in pure microservices architectures is probably how to ensure data consistency. Once you split your application into different modules whose data is not completely independent (at the very least they all hold information about the same users), you have to figure out how to keep that information in sync.
Obviously you should try to keep these dependencies and the duplicated data as small as possible, but usually you at least have to solve the problem of having the same users created in all of them.
To solve it you need a way to propagate changes to duplicated data from the component where they happen to the other components that hold a copy. So basically you need a way to replicate data that ensures its eventual consistency.
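One way to picture this kind of replication is as an ordered stream of change events that every component applies to its own copy of the data. The sketch below is only an illustration under stated assumptions: `UserChange`, `Replica` and the field names are hypothetical, and it assumes changes arrive in order (possibly more than once).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserChange:
    seq: int                     # position of the change in the ordered stream
    op: str                      # "upsert" or "delete"
    user_id: int
    data: Optional[dict] = None  # user fields, only used for "upsert"


class Replica:
    """A component's local copy of the user data (hypothetical)."""

    def __init__(self):
        self.users = {}
        self.last_seq = 0

    def apply(self, change):
        # Ignore changes that were already applied, so redelivery is harmless.
        if change.seq <= self.last_seq:
            return
        if change.op == "upsert":
            self.users[change.user_id] = dict(change.data)
        elif change.op == "delete":
            self.users.pop(change.user_id, None)
        self.last_seq = change.seq


changes = [
    UserChange(1, "upsert", 42, {"email": "a@example.com"}),
    UserChange(2, "upsert", 42, {"email": "b@example.com"}),
    UserChange(3, "upsert", 7, {"email": "c@example.com"}),
    UserChange(4, "delete", 42),
]

billing = Replica()
notifications = Replica()

for change in changes:
    billing.apply(change)

# The second replica sees change 2 delivered twice and still converges.
for change in changes[:2] + changes[1:]:
    notifications.apply(change)
```

Both replicas end up with the same users once they have processed the whole stream, even though they may disagree while one of them is still catching up. That is the "eventual" part of eventual consistency.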
The Unified Log solution
If you look at those two problems, they can be reduced to a single one: having a real-time and reliable unified log that you can use to distribute events among different components with different needs and capabilities. That's exactly the problem LinkedIn had, and what they built Kafka to solve. The post "The Log: What every software engineer should know about real-time data's unifying abstraction" is highly recommended reading.
Kafka decouples the producers from the consumers, including the ability to have slow consumers without affecting the rest of the consumers. It does that while supporting very high event rates (hundreds of thousands per second are common) with very low latencies (easily under 20 ms). It offers all this while still being a very simple solution and providing some advanced features such as organizing events in topics, preserving the ordering of events (within a partition), and handling consumer groups.
Those characteristics make Kafka suitable for most of the inter-component communication use cases, including event distribution, log processing and data replication/synchronization. All with a single simple solution, by modeling all these communications as an infinite list of ordered events that multiple consumers can access through a centralized unified log.
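The "ordered list of events read by multiple consumers" model can be sketched in a few lines. This is a toy in-memory model, not Kafka's API: the `Log` class, its methods and the group names are assumptions made for illustration, and having a new group start reading from the beginning is a choice of this sketch.

```python
class Log:
    """Toy append-only log for a single topic (hypothetical, not Kafka's API)."""

    def __init__(self):
        self._events = []    # events are only appended, never removed
        self._offsets = {}   # consumer group -> next offset to read

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset assigned to the event

    def read(self, group, max_events):
        # Each group keeps its own position, so groups never block each other.
        start = self._offsets.get(group, 0)
        batch = self._events[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch

    def lag(self, group):
        # Number of events the group has not read yet.
        return len(self._events) - self._offsets.get(group, 0)


log = Log()
for user_id in range(5):
    log.append({"type": "user_signed_up", "user_id": user_id})

fast_batch = log.read("billing", max_events=100)       # reads everything
slow_batch = log.read("recommendations", max_events=2)  # falls behind

# A group added later can still replay the full history, in order.
replayed = log.read("analytics", max_events=100)
```

Unlike a queue, reading does not consume the events: the producer appends once, each group advances its own offset, and a slow group just accumulates lag (`log.lag("recommendations")` is 3 here) without affecting the others.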
This post was about Kafka, but most of it is equally applicable to Kinesis, Amazon's equivalent.
You can follow me on Twitter if you are interested in software and real-time communications.