Service Bus

Today I’m writing about one use of the general service bus concept: writing the same data to multiple data stores efficiently. The goal is for an application to save data once and have it routed to multiple destinations reliably, while the application knows nothing about the destinations, and each destination knows nothing beyond itself and its subscription.

I like thinking of a service bus as a generic publish/subscribe utility. It can use anything to host what is posted to its queues. The point is that your application registers with the service bus to persist data. Your different data stores subscribe to the service bus, registering to be notified (push/pull, who cares) when data has been submitted.

Using a service bus, the application submits Create/Update/Delete (CUD) activities to a single location: the service bus. It knows nothing else about what happens to those data modification commands. The only success it knows about is that the service bus received the data. It knows nothing of the other destinations, and really doesn’t need to care.

Any number of consumers can subscribe, and each one gets its own instance of the data submitted by the client application. They consume that data on whatever schedule they consider timely. Using this technique, you could reasonably have multiple consumers register for the application’s CUD activities. Only the service bus cares about each consumer’s registration.
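To make the idea concrete, here is a minimal in-memory sketch of the publish/subscribe behavior described above. It is a toy, not a real service bus product: the ServiceBus class, its method names, and the subscriber names are all made up for illustration, and a real bus would add durable storage, acknowledgements, and retries.

```python
import copy
from collections import deque


class ServiceBus:
    """Toy in-memory bus (hypothetical API): one queue per subscriber."""

    def __init__(self):
        self.queues = {}

    def subscribe(self, name):
        # Only the bus tracks registrations; the publisher never sees them.
        self.queues[name] = deque()

    def publish(self, action, payload):
        # Give every subscriber its own independent copy of the message.
        message = {"action": action, "payload": payload}
        for queue in self.queues.values():
            queue.append(copy.deepcopy(message))
        return True  # the only thing the application ever learns

    def pull(self, name, max_messages=1):
        # Each consumer drains its own queue on its own schedule.
        queue = self.queues[name]
        batch = []
        while queue and len(batch) < max_messages:
            batch.append(queue.popleft())
        return batch


bus = ServiceBus()
bus.subscribe("order_store")
bus.subscribe("warehouse")

# The application submits one CUD activity to a single location.
accepted = bus.publish("create", {"order_id": 1, "total": 25.0})

# One consumer reads right away; the other has not pulled yet.
first = bus.pull("order_store")
pending_for_warehouse = len(bus.queues["warehouse"])
```

The application only ever calls publish and checks that it was accepted. Adding a third subscriber changes nothing in the application code.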

A good example would be a web site like an online store. It has a lot of different ways that the data needs to be consumed. First, order entry needs something that is extremely fast and doesn’t cross a lot of relationships. A user’s orders fit nicely in JSON objects. A complete order is not a large amount of data, and can easily be persisted and retrieved as a single object. Locking and blocking are not a concern. Saving the object already behaves like a transaction: either the whole thing saves or it doesn’t. So even without an ACID-capable engine, you have the essence of an ACID transaction without the overhead.

There is also a lot of data that crosses multiple customer orders and needs to be managed in near real time: inventory, available products, and other data of this nature. This kind of data fits best in a relational data store. It may perform more slowly than the individual order writes, because it has to maintain ACID transactions, but it is just as important for this data to be persisted.

Another common data store would be some sort of staging tables for an OLAP system, or a large unstructured data system that might be used for data mining for machine learning. These data stores don’t necessarily need to gather data in real time. In fact, they probably don’t want real-time data.

Using a service bus, all of these different data stores can subscribe to be notified when the client pushes new activity onto the service bus. They consume the data from the service bus at the appropriate rate, while the application moves happily along in pure ignorance. All the application needs to know is whether the data made it successfully to the service bus.
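Here is a rough sketch of the online store example, with three consumers reading the same order events at different rates. Everything in it is a stand-in chosen for illustration: plain Python dicts and lists play the role of the document store, the relational inventory, and the OLAP staging tables, and the function names and event fields are assumptions, not part of any real product.

```python
import json
from collections import deque

# Hypothetical per-subscriber queues held by the bus.
subscriptions = {"order_docs": deque(), "inventory": deque(), "analytics": deque()}

order_docs = {}          # stands in for a JSON document store
stock = {"widget": 10}   # stands in for relational inventory tables
staging = []             # stands in for OLAP staging tables


def publish(event):
    # The application's only job: hand the event to the bus.
    for queue in subscriptions.values():
        queue.append(json.dumps(event))  # serialized copy per subscriber
    return True


def consume_order_docs():
    # Fast path: store each order as a single object.
    queue = subscriptions["order_docs"]
    while queue:
        event = json.loads(queue.popleft())
        order_docs[event["order_id"]] = event


def consume_inventory():
    # Near real time: adjust stock levels that span many orders.
    queue = subscriptions["inventory"]
    while queue:
        event = json.loads(queue.popleft())
        for item in event["items"]:
            stock[item["sku"]] -= item["qty"]


def consume_analytics(batch_size):
    # Not real time: wait until a batch has accumulated, then load it.
    queue = subscriptions["analytics"]
    if len(queue) < batch_size:
        return 0
    loaded = 0
    while queue:
        staging.append(json.loads(queue.popleft()))
        loaded += 1
    return loaded


publish({"order_id": 1, "items": [{"sku": "widget", "qty": 2}]})
consume_order_docs()
consume_inventory()
loaded_early = consume_analytics(batch_size=2)   # batch not full yet

publish({"order_id": 2, "items": [{"sku": "widget", "qty": 3}]})
consume_order_docs()
consume_inventory()
loaded_late = consume_analytics(batch_size=2)    # both orders load together
```

Note that the analytics consumer falling behind never slows down publish, and none of the consumers reads another consumer’s queue.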

It’s important to note that this is not replication, mirroring, or any other data duplication technique. Each data store is completely unaware of the others. They each handle the event independently, so there are no dependencies between them. Technically, you could use this same method for co-location. In this fashion, the remote consumers could get the same data without holding up the application until everything is replicated to a data store with slower response times.

If you want to be more elegant, a service bus at a remote location could be a subscriber to a local service bus. In this fashion, you could have only one process gathering data across a slower network. It then publishes the results at the remote location. Other consumers, such as data stores at the remote location, subscribe to the remote service bus publications and perform at high speed. Your slower network traffic is now reduced to a single consumer, and routed to multiple remote consumers at high network speeds.
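The bus-to-bus arrangement can be sketched in a few lines. This is an illustration under made-up names: the Bus class, the relay function, and the "remote_bridge" subscription are all hypothetical, and the slow network is only simulated by counting how many messages the relay carries.

```python
from collections import deque


class Bus:
    """Toy in-memory bus (hypothetical API): one queue per subscriber."""

    def __init__(self):
        self.queues = {}

    def subscribe(self, name):
        self.queues[name] = deque()

    def publish(self, message):
        for queue in self.queues.values():
            queue.append(message)

    def pull_all(self, name):
        queue = self.queues[name]
        items = list(queue)
        queue.clear()
        return items


def relay(local, remote, name="remote_bridge"):
    # Each message crosses the slow link once, then fans out remotely.
    crossed = 0
    for message in local.pull_all(name):
        remote.publish(message)
        crossed += 1
    return crossed


local_bus = Bus()
remote_bus = Bus()

# The remote bus is just one more subscriber as far as the local bus knows.
local_bus.subscribe("remote_bridge")

# Remote data stores subscribe to the remote bus, not the local one.
remote_bus.subscribe("remote_reporting")
remote_bus.subscribe("remote_search")

local_bus.publish({"action": "create", "order_id": 1})
local_bus.publish({"action": "update", "order_id": 1})

crossed = relay(local_bus, remote_bus)
reporting = remote_bus.pull_all("remote_reporting")
search = remote_bus.pull_all("remote_search")
```

Two messages cross the slow link, but four deliveries happen at the remote site. Adding more remote consumers increases the second number without touching the first.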

There are plenty of service bus frameworks available. If you aren’t sure where to start, Microsoft has built service bus capabilities into the Azure platform, and it’s really powerful. If you would rather host your own, they have packaged the Azure implementation so that it can run under Windows.

Not every application needs a service bus architecture. However, when you find yourself copying the same data from one data store to another, it may be a viable solution.

Cheers,

Ben