Editorials

Eventually Consistent

SSWUGtv – Security Tips
With Stephen Wynkoop
When you are protecting your software assets, encryption is a common method. What types of things need to be encrypted? Tune in to today's show for valuable tips from security expert Patrick Townsend.
Watch the Show

Eventually Consistent
This is a term with increasing importance as companies move toward NoSQL data storage solutions. The term represents the replication model used when persisting data. If the writes to multiple nodes are synchronous, then the data is consistent on all nodes. However, if the writes are asynchronous, then the data across all nodes is considered "eventually consistent".
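The difference between the two write styles can be sketched with a toy model. This is only an illustration under stated assumptions: the `Cluster` class, its method names, and the in-memory queue are hypothetical and do not represent any particular NoSQL product.

```python
from collections import deque


class Cluster:
    """Toy cluster: each node is a plain dict; replication is a queue."""

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]
        self.pending = deque()  # (node_index, key, value) not yet applied

    def write_sync(self, key, value):
        # Synchronous: every node is updated before the write returns.
        for node in self.nodes:
            node[key] = value

    def write_async(self, key, value):
        # Asynchronous: only the first node is updated immediately;
        # the other nodes receive the change later.
        self.nodes[0][key] = value
        for index in range(1, len(self.nodes)):
            self.pending.append((index, key, value))

    def replicate(self):
        # Apply every queued change, in the order it was written.
        while self.pending:
            index, key, value = self.pending.popleft()
            self.nodes[index][key] = value

    def read(self, node_index, key):
        return self.nodes[node_index].get(key)


cluster = Cluster(3)

cluster.write_sync("price", 10)
sync_reads = [cluster.read(i, "price") for i in range(3)]

cluster.write_async("price", 12)
stale_reads = [cluster.read(i, "price") for i in range(3)]

cluster.replicate()
converged_reads = [cluster.read(i, "price") for i in range(3)]
```

After the synchronous write every node returns 10. After the asynchronous write only the first node returns 12 until `replicate()` runs, at which point all three nodes agree again.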

What are the implications of Eventually Consistent for your applications? The concept is not really new to database-centric applications. To increase performance, I often use a NOLOCK or READUNCOMMITTED hint. In a SQL Server database, these kinds of hints or connection settings allow a query to read data modified by a transaction that has not yet been committed. If that transaction rolls back, the data you read never reflected a permanent state.
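To make the idea of a dirty read concrete, here is a small simulation. It is a toy model and not SQL Server's locking behavior: the `ToyTable` class and its methods are hypothetical, and it tracks only one open transaction at a time.

```python
class ToyTable:
    """Toy key/value table with a single open transaction."""

    def __init__(self):
        self.committed = {}
        self.uncommitted = {}  # changes made by the open transaction

    def update(self, key, value):
        # A change made inside a transaction that has not committed yet.
        self.uncommitted[key] = value

    def commit(self):
        self.committed.update(self.uncommitted)
        self.uncommitted.clear()

    def rollback(self):
        self.uncommitted.clear()

    def read_committed(self, key):
        # Sees only committed data. A real engine may block or use row
        # versions here; this toy simply returns the last committed value.
        return self.committed.get(key)

    def read_uncommitted(self, key):
        # Sees in-flight changes first: this is the "dirty" read.
        if key in self.uncommitted:
            return self.uncommitted[key]
        return self.committed.get(key)


table = ToyTable()
table.update("balance", 100)
table.commit()

table.update("balance", 40)                    # transaction still open
dirty = table.read_uncommitted("balance")      # sees the in-flight value
clean = table.read_committed("balance")        # sees the committed value

table.rollback()                               # the change never happened
after_rollback = table.read_uncommitted("balance")
```

The dirty read returns 40, a value that was never committed, while the committed read and the read after the rollback both return 100.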

Generally, for some reports, this is accurate enough. When a query returns thousands or millions of rows, the few records that are not yet committed have little effect on the overall result. For those times when complete accuracy is required, we often use a different data store, such as a data warehouse or data marts. These storage systems are typically updated less frequently, reducing the risk of a dirty read.

The same concept applies when working with NoSQL data stores that replicate asynchronously. You could read data on one node that differs from the data on another node if replication has not yet completed. In my experience, if I am trying to get the latest data from a data store, it will never be complete. Data is continuously being modified; as a result, any read could exclude data currently in flight. In this kind of scenario, data that has not yet reached a node is no different from data that arrived on the same node just after my query completed. Ultimately the data will be on every node.

Some systems vote. More sophisticated multi-node data stores will actually perform a vote for a data element. The record is stored on multiple nodes. Each node reports its copy of the data, and a voting algorithm across the participating nodes determines the answer. I would think that if a timestamp is retained for the data, the store would return the record with the most recent timestamp. Some systems let you define your own voting algorithm.
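A minimal sketch of what such a vote might look like, assuming each node returns its copy of the record along with a timestamp. The `Replica` type, the logical timestamps, and both resolver functions are hypothetical; real data stores use their own rules.

```python
from collections import Counter
from typing import NamedTuple


class Replica(NamedTuple):
    value: str
    timestamp: int  # hypothetical logical clock; larger means newer


def majority_vote(replicas):
    # Return the value held by the largest number of nodes.
    counts = Counter(replica.value for replica in replicas)
    return counts.most_common(1)[0][0]


def latest_timestamp(replicas):
    # Return the value carrying the most recent timestamp.
    return max(replicas, key=lambda replica: replica.timestamp).value


def read(replicas, resolve=latest_timestamp):
    # The resolver is pluggable, standing in for a user-defined
    # voting algorithm.
    return resolve(replicas)


# Two nodes still hold the old value; one node has the newer write.
copies = [Replica("blue", 1), Replica("blue", 1), Replica("green", 2)]

by_timestamp = read(copies)
by_majority = read(copies, resolve=majority_vote)
```

With these three copies, the timestamp rule returns "green" and the majority rule returns "blue", which shows how much the choice of voting algorithm matters while replication is still in progress.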

Is Eventually Consistent good enough for your applications? If not, what are the issues it raises? Do you have processes in place allowing you to deal with data in flight, or variance across time or nodes? Why not share your experience by writing to btaylor@sswug.org.

Cheers,

Ben


Featured Article(s)
Secrets to Taking Command of Your Own Performance Review Part III
What makes up a Personal Business Commitment plan? Who creates it? Who approves it? How does it fit in the performance rating process?

Featured White Paper(s)
How to Use SQL Server’s Extended Events and Notifications to Proactively Resolve Performance Issues