Editorials

Distributed Databases – One Real World Solution

Have You Heard? (STARTS in 1 day!)
The largest online event for SQL Server, Business Intelligence, SharePoint and .NET is about to happen – it’s just days away! The SSWUG.ORG Ultimate Virtual Conference is next week – it’s easy to register and save your spot – there is even a quick video on the home page of the conference that gives all the details –

But you need to act now – it’s only a few days away. When was the last time you attended a conference (70 sessions, 20 speakers) with a money-back guarantee, no travel, incredible content and full access to presenters? This is the biggest online event, period – check it out!

Watch the video here or Get Registered

(You could even win an Xbox 360 or MSDN subscription!)

Featured Article(s)
SQL 2008 encryption, keeping your data safe on multiple servers
Dealing with database level encryption in SQL 2008

New SelecTViews Show Available –
Ted Malone on the show today talking about all sorts of experience, tips and ideas. Also, Rick Pleczko, CEO of Idera Software joins us to talk about some interesting new developments.
[Watch the Show Here]

Distributed Databases – One Real World Solution
Following is a detailed example of a distributed database system provided by one of our members. It provides insight into some methods for implementing this kind of infrastructure, and some of the difficulties they had to overcome.

One of the key issues they had to overcome was database loss and failover. As you read this response you will note that creating multiple databases resulted in a more difficult failover process.

Thanks, Vimal, for responding with such detail.

Feel free to respond with your own solutions, questions or insights. Send your responses to btaylor@sswug.org.

Cheers,
Ben

Vimal Says:

Interesting note on Distributed Databases. Last year we set out on redesigning our desktop based “Dealer Management System” to a web based system. In addition to challenges like providing a rich user interface similar to the desktop version and accessing local resources like dot matrix printers for pre-printed stationary printing, we also had to solve scalability and availability issues. The web based application eliminated the need of local SQL server at each dealerships, but it also meant that any downtime on our datacenters would halt most operations at the dealerships. Also, a single SQL server based design would not allow us to scale well. We decided to distribute our database load and implemented a custom solution similar to data-dependent-routing. This solution has worked out well.

We implemented a combination of P-2-P replication, disk block based replication and custom application logic to solve the scalability and availability problems. Our goal was to perform a onetime lookup of the dealer during sign-in and redirect the user to the correct site. Once the user is within a site, all access to data and services and served from database and service instance residing on the same site. This reduces the need for costly WAN bandwidth across sites. This forces a design that makes a site standalone that can withstand failures at other sites.

A routing control database stores information about the sites, servers, databases, partition groups (a logical grouping of databases that hold the same set of dealers data of which only one database can be active at any given point of time) and mapping of dealerships to a partition group. This database has very few updates and serves only read requests – hence an ideal candidate for P-2-P replication. We keep two copies of this database per site – for distributing the read workload and also to reduce the need of site failover due to loss of single routing control.

The second type of database worth mentioning is company profiles. This database stores dealer information, their users, their preferences and relationships with various lenders. Data in this database changes more often than routing control. However, updates are far fewer than reads on this database too. Hence this database was replicated across sites using P-2-P replication. Similar to routing control, we keep two instances of this database per site.

The actual databases that host dealers application data are replicated using disk based replication. This forces us to maintain only one active instance of the database at any given point in time. As usage grows, we can create new databases and move dealers data to a new partition group. Similarly, failover involves updating the Routing control database to redirect users to the new instance of database.

Documents and images uploaded by dealers are stored on the file system of a distributed document management system. The document management system stores indexes of documents and associated metadata in a P-2-P replicated SQL server. Since copying files and images can take a long time on WAN, we keep copies of documents synched on all sites. This ensures that most of the documents and images will be available from other sites in case of failure on one of the sites.

Lastly, we have databases that log user activity and databases that store user sessions. These databases are not replicated at all. These database contain transient data hence we maintain separate copies on each server. Nightly jobs pull data from these different databases into a staging server for further processing – before we load into data warehouse.

Some of the benefits we realized were:

  1. There is no single database, server or site that can be a performance bottle neck.
  2. A rare query that might need a table scan now hits only a subset of the data and does not affect the entire user base.
  3. Database or server level downtime does not affect all the dealerships.
  4. Not being tied to a SAN vendor’s replication scheme helped us negotiate a good deal from our storage vendor. We can swap the disk based replication between software based solutions or disk based without affecting the application. Only the failover management tools change.
  5. Typically, the failover site requires processing, power and bandwidth similar to the main site (to meet your peak usage). However, the failover site is not used except for testing the failover settings. Our current setup does not have a designated failover site. All sites are active and serve customers. The infrastructure setup, application code and deployments are constantly tested. Moving dealers from databases to servers to sites also ensure that constant testing is going on for failover processes.

Some of the challenges that we are working on are:

  1. Collecting all the data from different databases, servers and sites into one location for data warehousing.
  2. To ensure uniqueness of IDs across sites and servers, we have to use GUIDs – not a good choice for clustered index.
  3. Third party applications and legacy applications are designed without our design constraints in mind. We end up creating interfaces (with our own ID schemes) on top of the original implementation.
  4. Existing API/interfaces provided by the application to other vendors cannot be changed – we need to maintain backward compatibility. This is not easy since we now have the concept of sites and databases and likely change in IDs to handle global uniqueness.
  5. Reverting back to the original site after a site failover is time consuming and complicated. I am not sure if this will ever be simplified.
  6. P-2-P replication and TDE are supported only on Enterprise editions of SQL server. The licensing cost adds up soon. Actual dealer’s data can be on SQL standard.
  7. P-2-P replication has some bugs. Our application ensure that we do not end up with conflicts due to updates in two locations. However, occasionally we get errors about replication conflicts on the same instance. This stops the replication altogether leaving us in a dangerous spot where data is not being replicated. We are testing out the new SQL 2008 service pack to see if it resolves the issue.
  8. There are use cases emerging up where an owner might own several dealerships. The owner needs the ability to log into our application once and be able to see all data from all his dealerships. This is tricky to implement – since the data for individual dealers can reside on multiple servers or sites. Firing off a query to all databases at once is one solution – but implementing sorting and paging on results from several databases is yet not clear to us.

I would be interested in hearing about your thoughts/recommendations and also from fellow SSWUG readers.