Replicating Data from Multiple Locations to a Centralized Database

SSWUGtv – Should you Buy a Surface Tablet?
With Stephen Wynkoop
Before you go and buy the Surface tablet, listen to Steve’s thoughts on it — it might surprise you!
Watch the Show

Replicating Data from Multiple Locations to a Centralized Database
Yesterday we started this discussion regarding replicating data from multiple locations into a single centralized database. The question was how to remotely assign unique keys that won’t overlap when merged into the centralized database.

Here are some thoughts and experiences from our readers…

Jason:
Years ago, when we were sitting down to write a new database architecture, the thought came up about merging multiple databases into a single database, with the idea being that we’d sell individual offices our software, then bring in corporate down the line.. So, we’d need an easy way to merge those individual databases together. We settled on using a varchar primary key and encoding so much unique data into it that the chance of a collision was so remote, we could accept the risk.

We had a little function in our code that gave us back a big string of <date><time of day down to the millisecond><username><unique three digit random value>… An example would be ‘20121204102733445janedoe431’ for a record created by the user janedoe, on December 4th, 2012, at
10:27:33:445 am. For there to be a collision, you’d have to have the same user name generate the record at the same time (again, down to the millisecond), with the same randomly generated three digit number. It also had the handy side benefit of telling us when a record was created, and by who, without needing extra fields.

There are some obvious performance issues if you have to scale that up a lot, but it worked for us and the database sizes we deal with.

A question I always had was how much performance benefit/penalty swapping around the parts of the key would give us, ie putting the more volatile items first, with the more static items in the back, and at what point that benefit/penalty would become visible. I don’t know enough about the guts of sqlserver to be able to answer that question…
In a tree in code, it would probably be beneficial to swap something more volatile up front, as long as it wasn’t too volatile (maybe month or something), but again, I don’t know much about the internals of SQLServer.

Mark:
In addition to the domain problem as you presented it, we add the additional business requirement that says that users in one region can and do make transactions for other regions within the data at the home region. We use merge replication with no horizontal partitioning and therefore all data exists at all regions as a result. So our primary key problem also extends to transactions of the same type for the same region being generated at two different locations. Additionally, the users want to keep some similarity of “next IDs” the same. So not only is data uniqueness an issue, the “next invoice ID” needs to be in the same sequence regardless of which copy of the data for a given region contains the transaction.

Our solution has several levels of support for this. The first level is direct connection between regions to get the appropriate next ID. So if a user in Region B, working on the local copy of the data for Region A wants to make a transaction for Region A, the local client application makes a WAN call to the service running at Region A – which calls a singleton at Region A to generate the appropriate next ID for that transaction. From there, all the data will be uniquely keyed with a code for Region A and the new ID. The person at Region B can work in local data at Region B but be doing the transactions for Region A with the same performance that the users at Region B see for transactions at Region B.

That works fine until there are WAN (Internet) outages. We have secondary and tertiary solutions when the network is out. No matter how you look at it, we do not consider this an ideal solution, but it has the least tradeoffs for performance and data integrity.

Ajay:
Add location Id string to regional databases primary key viz. L001, L002 so on.

Constantin:
Use GUIDs as keys.

Thanks for the comments. Tomorrow I’ll share some of the techniques I have used or read about. If you want to send your ideas, drop them in an Email to btaylor@sswug,.org.

Cheers,

Ben

$$SWYNK$$

Featured Article(s)
Tips for using SQL Server 2012 triggers
In this article, Alexander Chigrik shows some helpful tips to performance tune and optimize SQL Server 2012 triggers.

Featured White Paper(s)
How to Use SQL Server’s Extended Events and Notifications to Proactively Resolve Performance Issues
read more)

Replicating Data from Multiple Locations to a Centralized Database

Recent Posts

Debugging Multi-Cloud Performance

Mixing Flavors of SQL Server

July Spotlight – Db2 LUW: Types of I/O

Part II: Overview of B- Tree and B+ Tree

Is it undermining or rude to email the boss to ask him to get his act together?

Debugging Multi-Cloud Performance

Mixing Flavors of SQL Server

July Spotlight – Db2 LUW: Types of I/O

Part II: Overview of B- Tree and B+ Tree

Is it undermining or rude to email the boss to ask him to get his act together?

Getting Started With Deep Learning in Your Browser Using TensorFlow.js

SQL Server Collation Overview and Examples

MySQL Escaping on the Client-Side With Go

VS Code Gets New Python Language Server, Named After Monty Python Character

How Hello World! changed – top level statements and functions (C# 9)

SSWUG.ORG