Solving distributed data problems in a microservice architecture

To deliver a large complex application rapidly, frequently and reliably, you often must use the microservice architecture. The microservice architecture is an architectural style that structures the application as a collection of loosely coupled services.

One challenge with using microservices is that in order to be loosely coupled each service has its own private database. As a result, implementing transactions and queries that span services is no longer straightforward.

In this EDA Summit presentation, you will learn how event-driven microservices address this challenge. Chris Richardson, creator of microservices.io, describes how to use sagas, an asynchronous messaging-based pattern, to implement transactions that span services.

You will learn how to implement queries that span services using the CQRS pattern, which maintains easily queryable replicas using events.

2021 Summit Developers

Speaker

Chris Richardson

Creator of microservices.io

Microservices.io

Transcript

Hello. Welcome to my talk on event-driven microservices. In this talk I describe how the need for loose coupling in a microservice architecture makes it challenging to implement transactions and queries. You will learn how to overcome those challenges using patterns that are based on events. But first, a little bit about me. I've done a number of things over the years. Most notably, I wrote the book POJOs in Action, which came out back in 2006 and was all about how to build applications with Spring and Hibernate. And then in 2008, I created the original Cloud Foundry, which was a PaaS for deploying Java applications on AWS. These days I focus on the microservice architecture. I wrote the book Microservices Patterns, and I help organizations around the world adopt and use microservices successfully through consulting and training.

If you like, you can get discounts on my book and my online training.

Here's the agenda for my talk. First, I'm going to describe the essential characteristics of the microservice architecture and describe why services must be loosely coupled. And then I describe how the need for loose coupling makes it challenging to implement transactions and queries that span services. After that, I describe how to use events to implement transactions. And then finally, I describe how to implement queries using events. So let's get started.

The microservice architecture is an architectural style that structures an application as a set of services. The services are loosely coupled. Each service is owned by a small team and each service is independently deployable. And the lead time for each service, which is the time between a developer committing a change and that change being deployed into production, must be under 15 minutes. So why use microservices?

Back in 2011, Marc Andreessen used the phrase "software is eating the world". What this phrase means is that a business's products and services are increasingly powered by software. It doesn't matter whether your company is a financial services company, an airline, or a mining company. Software is key to how you operate your business. Now, the challenge is that the world is becoming increasingly volatile, uncertain, complex and ambiguous. Sadly, there's no better example of that than COVID, which has been the ultimate disrupter. Because of the dynamic nature of the world, businesses need to be nimble, they need to be agile, and they need to innovate faster. And because software is powering those businesses, IT must deliver software much more rapidly, frequently and reliably. You achieve that using what I call the success triangle. You need a combination of three things: process, organization and architecture. The process is DevOps, which embraces concepts like continuous delivery and deployment, and delivers a stream of small, frequent changes to production.

You need to structure your organization as a network of autonomous, empowered, loosely coupled, long-lived product teams. And you need an architecture that is loosely coupled and modular. If you're developing a large, complex application, you must typically use microservices. That's because the microservice architecture gives you the testability and deployability that you need in order to do DevOps. And it gives you the loose coupling that enables your teams to be loosely coupled. That's an application of Conway's law. So, I've talked a lot about loose coupling, but what is that exactly?

Services in a microservice architecture naturally collaborate. For example, in the order and customer example that I use throughout this talk, to create an order, the order service needs to reserve credit in the customer service. As a result, these two services must collaborate. This results in coupling, which is the degree of connectedness between the services. However, while some amount of coupling is inevitable, it's essential that you design your services to be loosely coupled. To understand why, let's look at the different types of coupling.

The first type of coupling is runtime coupling. Now let's imagine that the order service reserves credit by making a PUT request to the customer service. In this design, the order service cannot respond to a POST request until it receives a response from the customer service. While this seems simple and in some ways natural, it's actually an example of very undesirable tight coupling. The create order endpoint requires both services to be available, which reduces availability. In fact, a very common anti-pattern is a brittle architecture that has many long chains of service calls. Now each service ends up being a point of failure.
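To put rough numbers on why call chains reduce availability, here is a small worked sketch. It assumes that service failures are independent and uses a hypothetical figure of 99.5% availability per service; neither assumption comes from the talk.

```python
from math import prod

def chain_availability(availabilities):
    """Availability of an endpoint that needs every service in a synchronous
    call chain to be up, assuming failures are independent (an assumption)."""
    return prod(availabilities)

# Hypothetical figure: each service is available 99.5% of the time.
two_services = chain_availability([0.995, 0.995])
five_services = chain_availability([0.995] * 5)

print(f"order + customer service: {two_services:.4f}")
print(f"chain of five services:   {five_services:.4f}")
```

Under these assumptions, every service added to the chain multiplies in another factor below one, so the endpoint is always less available than its least available participant.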

The second type of coupling is design-time coupling. Because the order service consumes the API of the customer service, there is design-time coupling in this example. There is a risk that a change to the customer service that impacts its API will require lockstep changes to be made to the order service. Lockstep changes are undesirable because they require teams to coordinate their work, which reduces productivity. The third type of coupling is infrastructure coupling. If both services are deployed to the same infrastructure, then they are competing for the same resources. As a result, there is a risk of resource starvation. For example, the customer service might consume all available resources and prevent the order service from functioning.

There is also the risk that an erroneous configuration change for one service will make all services unavailable. The need to minimize these types of coupling is one of the main considerations when defining an application's microservice architecture. In particular, it has a dramatic impact on how databases are used in a microservice architecture. Let's imagine that you refactored your monolith to services but left the database unchanged. In this partially refactored architecture, the order service reserves credit by directly accessing the customer table. It seems simple, but there are two kinds of tight coupling. First, there is design-time coupling. If the team that owns the customer service changes the customer table, the order service would need to be updated in lockstep. Second, there is infrastructure coupling. Both services are competing for the resources of the single database server. Also, a misconfiguration could impact both services. In order to ensure loose design-time coupling, services must only communicate via APIs. They should not share database tables. Each service must have its own private database.

Furthermore, each service should ideally have its own database server in order to minimize infrastructure coupling. Using a database per service avoids design-time and infrastructure coupling.
The challenge, however, is that there are commands and queries that span services. In this example, the create order command spans both databases. And there are queries such as find orders for customer that also span databases. We need a way to implement these commands and queries in a loosely coupled way. You might be tempted to use traditional distributed transactions, which are also known as two-phase commit. Whether this is straightforward depends on your technology stack. For example, Java microservices can use JTA. However, one major drawback of two-phase commit is that it is a form of runtime coupling. All of the participants must be available in order for the transaction to commit. Another issue is how to implement queries that span services. You might be tempted to simply write SQL statements that join across tables belonging to multiple services.

On the surface, this seems simple. However, these kinds of queries bypass service APIs, violate encapsulation, and result in design-time coupling. A solution to both these problems is to use an event-driven architecture. In the rest of this section, I'm going to define what I mean by an event and then describe the basics of how they are used in a microservice architecture. After that, I'll look at how events are used to implement transactions and queries.

Conceptually, an event is something notable that happens in a domain. If your domain is e-commerce, then examples of events include order created or order canceled. If your business is an airline, then events include flight departed, flight landed, flight arrived at the gate. From a domain-driven design perspective, an event is emitted by an aggregate, which is another term for business object. A command creates, updates, or deletes an aggregate. The aggregate responds by emitting an event. An event is also a type of message delivered by a message broker. My favourite model of messaging comes from the book Enterprise Integration Patterns, which is a great collection of messaging patterns. A sender sends a message via a message channel and a recipient reads from a message channel. A channel is an abstraction of the capabilities of the message broker. There are two types of channels: point-to-point, which delivers each message to a single recipient, and pub-sub, which delivers each message to all recipients. A message consists of headers, also known as metadata, and a payload. There are different kinds of messages. There are events, which communicate the occurrence of something notable. There are also commands, which are requests to the recipient to invoke an operation, and reply messages that contain the outcome of invoking the operation.
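The message model just described can be sketched as a small data structure. This is only an illustration: the header names (message-id, type, aggregate-id), the make_event helper and the example payload fields are assumptions made for the sketch, not part of any particular broker or framework.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict

class MessageKind(Enum):
    EVENT = "event"      # something notable has happened
    COMMAND = "command"  # a request to invoke an operation
    REPLY = "reply"      # the outcome of invoking an operation

@dataclass(frozen=True)
class Message:
    kind: MessageKind
    payload: Dict[str, Any]
    headers: Dict[str, str] = field(default_factory=dict)  # metadata

def make_event(event_type: str, aggregate_id: str, payload: Dict[str, Any]) -> Message:
    """Build an event message; the header names here are hypothetical."""
    headers = {
        "message-id": str(uuid.uuid4()),
        "type": event_type,
        "aggregate-id": aggregate_id,
    }
    return Message(MessageKind.EVENT, payload, headers)

order_created = make_event(
    "OrderCreated", "order-1", {"customerId": "customer-7", "orderTotal": 120}
)
```

Keeping the event type and aggregate ID in the headers lets a consumer route or filter a message without parsing the payload, which is one common reason for separating metadata from the body.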

In a microservice architecture, services publish and consume events. The events published by a service are part of its API, along with the more traditional commands and queries. And a service might consume events published by other services. The events are transported by a message broker. Now, since the message broker plays such an important role, it needs to be highly available and scalable in order to avoid any concerns related to infrastructure coupling. Also, the patterns for transactions and queries that I discuss in this presentation are easier to implement if the broker implements certain features, including at-least-once delivery, ordered delivery, and durable subscriptions, which preserve events when a consumer is down.

As I mentioned earlier, an aggregate emits events in response to a command. In other words, the service handles a request by updating a business object and publishing an event. In order for requests to be handled reliably, the service must atomically update the database and publish an event. If only one of these actions occurred, the application would be in a permanently inconsistent state. The traditional approach is to use a distributed transaction spanning the database and message broker. However, since that's another form of runtime coupling, it's best avoided. Instead, a service can use the transactional outbox pattern. First, the database transaction that updates the business object inserts the message to send into an outbox table. Since this is an ACID transaction, it's inherently atomic.
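A minimal sketch of the transactional outbox pattern follows, using an in-memory SQLite database. It shows the atomic write described above together with the relay process that reads the outbox table. The table layout, the column names, and the create_order and relay_outbox functions are assumptions made for illustration, and the publish callback stands in for a real message broker client.

```python
import json
import sqlite3

def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id TEXT, "
        "total INTEGER, state TEXT)"
    )
    conn.execute(
        "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "event_type TEXT, payload TEXT, published INTEGER DEFAULT 0)"
    )
    return conn

def create_order(conn, customer_id, total):
    # One local ACID transaction: the order row and the outbox row are
    # committed together or rolled back together.
    with conn:
        cur = conn.execute(
            "INSERT INTO orders (customer_id, total, state) VALUES (?, ?, 'PENDING')",
            (customer_id, total),
        )
        order_id = cur.lastrowid
        payload = json.dumps(
            {"orderId": order_id, "customerId": customer_id, "total": total}
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderCreated", payload),
        )
    return order_id

def relay_outbox(conn, publish):
    # Separate relay step: publish unpublished rows in insertion order, then
    # mark them. A crash between publish and mark causes a re-publish, so
    # delivery is at-least-once rather than exactly-once.
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)

conn = open_db()
published = []
order_id = create_order(conn, "customer-7", 120)
relayed = relay_outbox(conn, lambda event_type, body: published.append((event_type, body)))
```

In this sketch the relay polls the table; reading the database's transaction log instead is another way to implement the same step.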

Second, another process reads the events inserted into the outbox table and publishes them to the message broker. This pattern guarantees that if the service updates the database, an event will be published. Let's now look at how events implement transactions and queries. I will first describe how to implement transactions. After that, I will describe how to implement queries. In a microservice architecture, I'd recommend implementing transactions that span services using the saga pattern. This is not a new idea. The idea of a saga was first proposed back in 1987. The key idea is that instead of a single distributed transaction that spans multiple services, you use a saga. A saga is a sequence of local transactions that are coordinated through the exchange of messages. In other words, a service performs an update and then sends a message that triggers the next step. Here's how you would implement the create order operation using a saga.

First, the order service gets a request that initiates the saga. The first step of the saga creates the order. It creates the order in a pending state, which indicates that it is in the process of being created. It's actually known as a semantic lock, which I don't have time to talk about, but it's one of the key issues with saga design. The order service then sends back a response containing the order ID. It also sends a message that triggers the next step, which occurs in the customer service. This step reserves credit for the order. Then, assuming that the customer's credit can be successfully reserved, the customer service sends a message which triggers the third step in the order service. This third and final step in the order service changes the state of the order to approved. So instead of one global transaction, there are three local transactions, two in the order service and one in the customer service.

An interesting challenge when designing sagas is handling failure scenarios. If a business rule is violated, an application can simply roll back an ACID transaction. Any updates that were previously made simply disappear. A saga cannot automatically roll back when a step fails, since the changes made by the previous steps have already been committed. A saga must instead use compensating transactions to undo what was done previously. Here's how the create order saga uses compensating transactions.

If the reserve credit step fails because either the customer ID is invalid or the customer has insufficient credit, the saga executes a compensating transaction to undo the creation of the order. This compensating transaction performs a semantic undo by changing the state of the order to rejected. Abstractly, we can view a saga as a sequence of steps: T1, T2, T3, and so on. Each step, except the last, has a corresponding compensating transaction. C1 undoes what T1 did, C2 undoes T2, and so on. The T transactions are executed one after the other. If a T transaction fails, the C transactions for the preceding steps are executed sequentially in the reverse order. A saga's implementation logic needs to coordinate the execution of these steps.
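The abstract T and C sequence can be sketched as a small in-process runner. This is a simplification: a real saga coordinates its steps through messages between services, whereas here the steps are plain functions. The names run_saga and StepFailed are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Action = Callable[[], None]
Step = Tuple[Action, Optional[Action]]  # (T transaction, C compensation or None)

class StepFailed(Exception):
    """Raised by a T transaction when a business rule is violated."""

def run_saga(steps: List[Step]) -> bool:
    compensations: List[Optional[Action]] = []
    for transaction, compensation in steps:
        try:
            transaction()
        except StepFailed:
            # Undo the already committed steps in reverse order.
            for undo in reversed(compensations):
                if undo is not None:
                    undo()
            return False
        compensations.append(compensation)
    return True

log: List[str] = []

def reserve_credit_fails() -> None:
    raise StepFailed("insufficient credit")

succeeded = run_saga([
    (lambda: log.append("T1: create order as PENDING"),
     lambda: log.append("C1: mark order REJECTED")),
    (reserve_credit_fails, lambda: log.append("C2: release credit")),
    (lambda: log.append("T3: approve order"), None),
])
```

Because T2 fails before it commits anything, only C1 runs; C2 is never needed and T3 is never attempted.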

There are two coordination mechanisms. The first is orchestration, where a centralized orchestrator tells the participants what to do. Orchestration-based sagas use command and reply messages. If you want to know more about those, please look at either my website, book or online boot camp. In this talk, I'm going to focus on the second option, choreography-based sagas, which is a decentralized approach. A choreography-based saga uses events to coordinate the execution of the steps. Each step of the saga updates an aggregate and publishes an event. That event is consumed by an event handler in some other service, which updates an aggregate, publishes an event, and so on. Let's look at the choreography-based version of the create order saga. When the order service receives a request to create an order, it creates the order and publishes an order created event. It sends that event to an order events channel, which is a publish-subscribe channel. One of the consumers of that channel is the customer service. It receives the order created event and attempts to reserve credit. It then publishes an event indicating the outcome to the customer events channel. It publishes either a credit reserved event or a credit limit exceeded event. One of the consumers of the customer events channel is the order service. Depending on the event type, it will either approve or reject the order.
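The choreography described above can be sketched with an in-memory publish-subscribe broker. This is an illustration under simplifying assumptions: the Broker class delivers events synchronously inside the publish call, whereas a real message broker delivers them asynchronously, and the class names, channel names and event fields are all hypothetical.

```python
from collections import defaultdict

class Broker:
    """In-memory stand-in for a publish-subscribe message broker."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, handler):
        self.subscribers[channel].append(handler)

    def publish(self, channel, event):
        for handler in list(self.subscribers[channel]):
            handler(event)

class OrderService:
    def __init__(self, broker):
        self.broker = broker
        self.orders = {}
        broker.subscribe("customer-events", self.on_customer_event)

    def create_order(self, customer_id, total):
        order_id = len(self.orders) + 1
        self.orders[order_id] = "PENDING"
        self.broker.publish("order-events", {
            "type": "OrderCreated", "orderId": order_id,
            "customerId": customer_id, "total": total,
        })
        return order_id

    def on_customer_event(self, event):
        if event["type"] == "CreditReserved":
            self.orders[event["orderId"]] = "APPROVED"
        elif event["type"] == "CreditLimitExceeded":
            self.orders[event["orderId"]] = "REJECTED"

class CustomerService:
    def __init__(self, broker, available_credit):
        self.broker = broker
        self.available = dict(available_credit)
        broker.subscribe("order-events", self.on_order_event)

    def on_order_event(self, event):
        if event["type"] != "OrderCreated":
            return
        # Simplification: an unknown customer is treated as having no credit.
        available = self.available.get(event["customerId"], 0)
        if event["total"] <= available:
            self.available[event["customerId"]] = available - event["total"]
            outcome = "CreditReserved"
        else:
            outcome = "CreditLimitExceeded"
        self.broker.publish("customer-events",
                            {"type": outcome, "orderId": event["orderId"]})

broker = Broker()
order_service = OrderService(broker)
customer_service = CustomerService(broker, {"customer-7": 100})
first = order_service.create_order("customer-7", 80)
second = order_service.create_order("customer-7", 50)
```

Neither service calls the other directly. Each one only knows the channels it publishes to and subscribes to, which is what keeps the design decentralized.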

As you can see, events enable you to implement transactions that span services in a loosely coupled way. As you might expect, however, choreography-based sagas are not a silver bullet. Sometimes orchestration-based sagas are a better choice. Moreover, sagas in general are not a silver bullet either, and have both benefits and drawbacks. Let's now look at how to implement queries that span services using events. It's quite common in a microservice architecture to have queries that retrieve data from multiple services. For example, the query find orders for customer returns the orders along with information about the customer.

There are two query patterns. One option is API composition. This pattern implements a query by simply issuing subqueries to each of the services that contain the required data and then joining the results together. While it is relatively simple, it has two main drawbacks. First, since multiple services are invoked, there is a high degree of runtime coupling. Fortunately, we can reduce the impact of that coupling by using patterns such as the circuit breaker and implementing fallback strategies for each of the participants. The second drawback is that some queries cannot be efficiently implemented using API composition. They would require too many round trips or transfer too much data over the network. In those situations, it's often best to use the second option, CQRS.
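As a rough sketch of API composition with a fallback, the composer below calls two functions that stand in for remote calls to the customer service and the order service. The function names, the sample data and the choice of ConnectionError as the failure signal are assumptions for illustration. The fallback shown is a plain try/except, not a full circuit breaker.

```python
# Hypothetical data owned by each service.
CUSTOMERS = {"customer-7": {"name": "Ada", "creditLimit": 100}}
ORDERS = [
    {"orderId": 1, "customerId": "customer-7", "state": "APPROVED"},
    {"orderId": 2, "customerId": "customer-7", "state": "REJECTED"},
    {"orderId": 3, "customerId": "customer-9", "state": "APPROVED"},
]

def fetch_customer(customer_id):
    # Stand-in for a request to the customer service API.
    return CUSTOMERS[customer_id]

def fetch_orders(customer_id):
    # Stand-in for a request to the order service API.
    return [o for o in ORDERS if o["customerId"] == customer_id]

def failing_fetch_orders(customer_id):
    raise ConnectionError("order service unavailable")

def find_orders_for_customer(customer_id, get_customer, get_orders):
    """Issue one subquery per service and join the results in memory."""
    customer = get_customer(customer_id)
    try:
        orders = get_orders(customer_id)
        complete = True
    except ConnectionError:
        # Fallback: return partial data instead of failing the whole query.
        orders, complete = [], False
    return {"customer": customer, "orders": orders, "complete": complete}

full = find_orders_for_customer("customer-7", fetch_customer, fetch_orders)
partial = find_orders_for_customer("customer-7", fetch_customer, failing_fetch_orders)
```

The complete flag lets the caller tell a customer with no orders apart from a response degraded by an unavailable service.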

CQRS stands for Command Query Responsibility Segregation. Instead of a single data model for commands and queries, you have a different model for each one. Applying CQRS in the microservice architecture means that you have a dedicated data model, or in other words a replica, for a query or a group of related queries. That replica is designed to implement those queries efficiently. The replica is kept up to date by subscribing to events that are published by the services that own that data. One key design decision you must make is the type of database to use for the CQRS replica. The database only needs to efficiently support a specific set of queries along with the updates that are made by the event handlers. And since the goal is performance and scalability, a NoSQL database is often a good choice. If the goal is to store blobs of JSON for a REST API, then a document database like MongoDB or DynamoDB is a good choice. If you need to implement text search queries, then a text search database such as Elasticsearch is often a good fit.
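A CQRS replica for the find orders for customer query can be sketched as a set of event handlers that maintain one document per customer. Here a Python dictionary stands in for the document database, and the class name, event names and fields (including OrderApproved and OrderRejected) are assumptions made for the sketch.

```python
class CustomerOrdersView:
    """Replica that stores one JSON-like document per customer."""
    def __init__(self):
        self.documents = {}  # stand-in for a document database collection

    def handle(self, event):
        # Every event carries the customer ID, so each update touches one document.
        doc = self.documents.setdefault(
            event["customerId"], {"name": None, "orders": {}}
        )
        kind = event["type"]
        if kind == "CustomerCreated":
            doc["name"] = event["name"]
        elif kind == "OrderCreated":
            doc["orders"][event["orderId"]] = {
                "total": event["total"], "state": "PENDING",
            }
        elif kind in ("OrderApproved", "OrderRejected"):
            state = "APPROVED" if kind == "OrderApproved" else "REJECTED"
            doc["orders"][event["orderId"]]["state"] = state

    def find_orders_for_customer(self, customer_id):
        # The query is a single lookup with no cross-service calls.
        return self.documents.get(customer_id)

view = CustomerOrdersView()
for event in [
    {"type": "CustomerCreated", "customerId": "customer-7", "name": "Ada"},
    {"type": "OrderCreated", "customerId": "customer-7", "orderId": 1, "total": 80},
    {"type": "OrderApproved", "customerId": "customer-7", "orderId": 1},
]:
    view.handle(event)

result = view.find_orders_for_customer("customer-7")
```

The replica is shaped around the query rather than around either service's own data model, which is why the read path stays a single lookup.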

Alternatively, you might even use a graph database for social network style searches or financial fraud analysis. There are also scenarios where a relational database is a good fit. For example, a traditional business intelligence tool can query a CQRS view implemented by a SQL database. There's a lot more I can say about using events and, more generally, asynchronous messaging. But in summary, loose coupling is an essential characteristic of the microservice architecture. Many of the traditional database patterns suffer from tight coupling. A shared database creates tight design-time coupling. Traditional distributed transactions suffer from runtime coupling.

Consequently, in order to ensure loose coupling, you need to use different patterns. First, you should use the database per service pattern. Each service should have its own database and ideally its own database server. Second, you should use the saga pattern to implement transactions that span services. Finally, you should consider implementing queries that span services using the CQRS pattern. So that's my talk. Thank you for listening. I hope that you found it useful.