Editorials

Distributed DRI

What do you do when you need to enforce referential integrity across databases on the same SQL Instance, or maybe on a different instance or server altogether? This kind of eventuality occurs often in data warehousing where sharding I used to increase performance, or sharded OLTP databases separated for performance, often seen in MySQL large scale implementations.

Some implementations attempt to have only one instance of all data. For example, all lookup tables exist only in a centralized database. This makes it difficult for all shards to manage referential integrity against the lookup tables because they are hosted on a remote instance.

If your data engine supports remote database access some applications will create triggers to manage the integrity against the remote database. Imagine how fragile this becomes as you hard code the remote data store. Now if you need to move the database for load balancing or any other reason all your remote tables are hard coded.

To make things less fragile, some applications will use Synonyms and Remote Servers in an attempt to reduce the number of objects requiring updating if a database is moved. This approach is somewhat more elegant, but performance in some scenarios is incredibly slow.

Probably the best approach I have seen is to shard some data and replicate other data. Lookup tables which are generally static can be replicated to multiple databases fairly easily. Once replicated, traditional DRI techniques may be used across all database instances. All the remains to be implemented is a replication strategy, resolving all issues for all shard instances. This is the technique used by the Data Appliance Warehouse offered by Microsoft. Data is either replicated or sharded.

Share your implementation strategies for DRI in a sharded environment in comments here, or by Email to btaylor@sswug.org.

Cheers,

Ben