Remote Databases – Tips from the Trenches
Today I’ll continue the topic regarding accessing data from remote databases. For clarification, in this case we are talking about a Microsoft SQL Server database accessing data from a different SQL Server Database, maybe even mounted on a different Instance of SQL Server. This does not include querying data in other data repositories such as other DBMS engines, or ODBC compatible data stores.
David has been working with SQL Server distributed databases for many years. He wrote in to share some of his insight from managing and integrating different data islands. While there are different techniques for consolidating distributed data, one method is to have SQL Server access the remote data itself. Here are David’s thoughts:
Certainly using a single database is simpler and better-performing than accessing remote databases.
That said, there are a number of reasons why one may need to utilize remote databases. You mentioned some of these in your post (separating confidential data, optimizing admin tasks). Another big reason is having multiple applications from different vendors, each of which has its own database.
I have had to support multiple database environments for a number of years, both in single-server and multiple-server environments. Here are some things I have learned:
- MSSQL (2005 and 2008) seems to do a pretty good job of basic optimization of simple heterogeneous queries, and seems to make use of indexes if available.
- Complex or even modestly-complex queries can be problematic.
- Sometimes it is faster to query data from the remote server to an indexed temporary table, and do a JOIN to the temporary table
- Using VIEWs on the remote server works well: in this way data can be filtered and joined on the remote, before trying to access.
- JOINs that cannot make use of an index can be unbearably slow, especially in a multiple server environment
- Transaction control with remote servers is problematic. If you are updating data in a heterogeneous query, DTC (Distributed Transaction Coordinator) is required—and this is ugly to get working and to secure.
- DTC is not needed for just reading data from a remote server, or for single-server queries, or if only the remote is being updated, with no local data involved
- Transaction control between multiple databases on the same server seem to work fine.
- SNAPSHOT transaction isolation cannot be used against a remote server. If your local database requires SNAPSHOT, switching isolation levels can be tricky (but not impossible).
- While it is true that you must update hard-coded remote server identifiers when a server moves, this is not all that difficult to accomplish with a query that searches for that identifier in sys.SQL_modules
- Updating data on a remote server can behave differently than updating data on a remote database on the same server. I can reproduce, but not fully explain, a scenario in which a simple update on a remote server gets turned by SQL into an agonizing row-by-row cursored operation. (The identical query runs fine on a remote database on the local server.)
If given the choice, I would generally try to avoid intentionally architecting a solution that employed multiple databases. However, utilizing multiple databases can be made to work reasonably well, and some commercially available applications (notably Microsoft Dynamics GP / Great Plains) intentionally do incorporate multiple databases in its core architecture.
Finally, back to your assertion “Don’t write stored procedures only accessing data in a remote database”, I would say:
- A view on the remote server would probably be a better choice than a stored procedure
- A stored procedure on the remote to filter and gather data would probably be a better choice than a local stored procedure to get data from the remote.
- When updates are involved, generally updates should be done on the local database. In other words, create a stored procedure on the remote to do an update, and call from the remote if necessary
- Think about what you are asking SQL to do, and how the optimizer can meet your request: if it involves slurping a lot of data across the wire to do a table scan, of course this is going to perform poorly.
- Said another way, use of remote databases will magnify poor optimization choices that may go unnoticed in a single database.
I have found that Table Functions are also a nice technique for calling data from a remote database in that you can pass criteria when retrieving data. Sometimes this works better than views.
My biggest caution in this architecture is one that David alluded to. When you access data from a remote database, you must somehow identify where that data resides. That identification is hard coded and has to be maintained.
If you don’t access remote data through SQL Server, you must have a different method for handling the data consolidation through replication, data warehousing, or an advanced data tier capable of doing the heterogeneous joins outside of the SQL Server environment.
TOP (10) Worst Things to Do in a Database Progress
Voting has officially ended. I’ll start tallying the votes today…and the results should be posted sometime this week.
Trivia
What was the first version of SQL Server to ship with Enterprise Manager? What made it Unique?
Enterprise Manager originally shipped with SQL Server 4.21a, and was greatly enhanced with SQL Server 6.0. This was the first version of SQL Server Microsoft shipped without influence from Sybase regarding what was released.
What made it unique was that at the time, there were no GUI database management tools packaged with database engines. DBAs simply worked from SQL Scripts or used interactive queries.
Do you have thoughts you’d like to share about this or other topics. You know how to reach me. Send your Email to btaylor@sswug.org.
Cheers,
Ben
$$SWYNK$$
Featured Article(s)
Troubleshooting SQL Server 2008 Joins (Part 2)
In this article, Alexander Chigrik explains some problems that you can have when you work with SQL Server 2008 joins. He also tells how to resolve these problems.
Featured White Paper(s)
Upgrading to Microsoft SQL Server 2008 R2 from Microsoft SQL Server 2008, SQL Server 2005, and SQL Server 2000
More than ever, organizations rely on data storage and analysis for business operations. Companies need the ability to deploy… (read more)