CAP Compared with ACID

CAP describes the trade-offs that most NoSQL storage engines are designed around. ACID is the set of guarantees most commonly provided by relational database engines. These acronyms represent two different approaches to persisting data.

CAP names three goals for a distributed data store:

  1. Consistency
  2. Availability
  3. Partition tolerance

Put simply, a system can fully guarantee only two of the three CAP goals at the same time. Some hybrids appear to move closer to delivering all three. More likely, they are postponing the impact of partitioned data, an approach often described as eventual consistency.
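To make the two-out-of-three trade-off concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the `TwoNodeStore` class, the "CP" and "AP" mode labels, and the `partitioned` flag do not come from any real engine. It only shows the choice a store faces when its two copies cannot talk to each other.

```python
class TwoNodeStore:
    """Two copies of one value, with a switchable network partition (toy model)."""

    def __init__(self, mode):
        self.mode = mode                         # "CP" or "AP" (assumed labels)
        self.copies = {"left": None, "right": None}
        self.partitioned = False

    def write(self, node, value):
        if not self.partitioned:
            # Healthy network: both copies are updated together.
            for name in self.copies:
                self.copies[name] = value
            return "ok"
        if self.mode == "CP":
            # Favor consistency: refuse the write, giving up availability.
            return "rejected"
        # Favor availability: accept the write locally and let the copies diverge.
        self.copies[node] = value
        return "ok"

    def read(self, node):
        return self.copies[node]


cp = TwoNodeStore("CP")
ap = TwoNodeStore("AP")
for store in (cp, ap):
    store.write("left", "v1")    # replicated to both copies
    store.partitioned = True     # the network between the copies fails

cp_result = cp.write("left", "v2")   # refused; both copies still agree on "v1"
ap_result = ap.write("left", "v2")   # accepted; "right" still holds "v1"
```

In this sketch the CP store stays consistent by turning the write away, while the AP store stays available and has to reconcile its copies later.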

When data is partitioned, that is, duplicated across multiple data stores, there are many techniques for communicating a change to every copy. To increase availability across partitions, you accept less consistency in the data at any given moment. However, changes can be propagated in the background, and in some designs as the data is read, so that every store ultimately receives them and users are most likely to retrieve the latest data.
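One of the simplest of those techniques can be sketched as a queue of pending changes. This is an illustration only: the `Replica` and `ReplicatedStore` classes and the explicit `sync()` call are assumptions made for the example, and a real engine would propagate changes in the background using its own protocol.

```python
from collections import deque


class Replica:
    """A toy in-memory key-value store standing in for one data store."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class ReplicatedStore:
    """Writes land on the first replica; the others catch up when sync() runs."""

    def __init__(self, replica_names):
        self.replicas = [Replica(name) for name in replica_names]
        self.pending = deque()   # changes not yet applied to the other replicas

    def write(self, key, value):
        # Apply to the first replica immediately and acknowledge the caller.
        self.replicas[0].data[key] = value
        for replica in self.replicas[1:]:
            self.pending.append((replica, key, value))
        return "ok"

    def read(self, key, replica_index):
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        # Deliver every queued change, in order, to the replica that needs it.
        while self.pending:
            replica, key, value = self.pending.popleft()
            replica.data[key] = value


store = ReplicatedStore(["a", "b"])
store.write("price", 10)
stale = store.read("price", 1)   # replica "b" has not seen the change yet
store.sync()
fresh = store.read("price", 1)   # replica "b" has now caught up
```

Between the write and the sync, a reader on the second replica sees old data. That window is the consistency you give up in exchange for a fast acknowledgement.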

Relational databases are not designed around the CAP trade-offs. Instead, they follow the ACID goals. You’ll need to learn these for most database certifications.

  • Atomic – My transactions are atomic. Everything is completed, or nothing is changed.
  • Consistent – Every transaction preserves the database rules, such as constraints and relationships.
  • Isolated – My transactions are not impacted by the transactions of others.
  • Durable – Committed data is retained when a server is disrupted and restarted.
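The atomic and consistent goals are easy to see in a small example. The sketch below uses Python's built-in sqlite3 module rather than SQL Server, purely so it runs anywhere. The `account` table, the `transfer` function, and the balances are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"   # a database rule
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates happen or neither does."""
    try:
        with conn:   # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit, so the credit was undone too.
        return False


ok = transfer(conn, "alice", "bob", 30)        # succeeds
failed = transfer(conn, "alice", "bob", 500)   # would overdraw alice, so it fails
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

The second transfer credits bob before the debit to alice fails. Because the two updates are one transaction, the credit is rolled back and the balances are left exactly as the first transfer set them.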

While relational databases embrace the ACID goals completely, this does not mean that NoSQL data stores ignore ACID entirely. The main difference is the point at which a write is considered “done”.

Take SQL Server as an example. Your transaction's changes are first written to the transaction log and then applied to the in-memory image of your table(s). When you commit, the log records are flushed to disk and the transaction is marked as committed. At this point, the transaction is complete, and the response is sent back to the client. Later, background processes such as checkpoint and the lazy writer write the modified pages in memory to permanent storage.
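The sequence above can be sketched as a toy model. This is not how SQL Server is actually implemented: `ToyEngine`, its three dictionaries and lists, and the `lazy_write` and `crash_and_recover` methods are hypothetical stand-ins that only show why writing the log first makes it safe to answer the client before the data file is updated.

```python
class ToyEngine:
    """Toy model of log-first commits; not SQL Server's actual internals."""

    def __init__(self):
        self.log = []      # stands in for the transaction log on disk
        self.memory = {}   # in-memory image of the table
        self.disk = {}     # the data file on permanent storage

    def commit(self, changes):
        entry = {"changes": dict(changes), "committed": False}
        self.log.append(entry)        # 1. write the change to the log first
        self.memory.update(changes)   # 2. update the in-memory image
        entry["committed"] = True     # 3. mark the log entry as committed
        return "committed"            # the client hears back at this point

    def lazy_write(self):
        # Later, in the background: copy the memory changes to the data file.
        self.disk.update(self.memory)

    def crash_and_recover(self):
        # Memory is lost; rebuild it from the data file plus the committed log.
        self.memory = dict(self.disk)
        for entry in self.log:
            if entry["committed"]:
                self.memory.update(entry["changes"])


engine = ToyEngine()
status = engine.commit({"order-1": "shipped"})
disk_at_ack = dict(engine.disk)   # the data file has not been touched yet
engine.crash_and_recover()        # simulate a restart before any lazy write
recovered = dict(engine.memory)   # the committed change survives via the log
```

Even though the data file was still empty when the client was told the transaction had committed, the change survives the simulated restart because the log already held it.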

In contrast, a CAP-oriented system can notify the user of success as soon as the equivalent of the transaction log entry is complete. It is less concerned that a read in the next millisecond gets exactly what was written. It can take a little time to synchronize and replicate the change without slowing the user down.

In many systems this is more than adequate. That’s why we continue to see new data engines being implemented, and existing technologies merging the two strategies.

This high-level overview of the two approaches might encourage you to research data persistence strategies further.

Cheers,

Ben