Distributed Database Feedback

SSWUG.Org Virtual Workshop: Query Plans and Processing – TODAY
This morning, January 21, from 9AM – 12PM PST

Do you have slow-running queries? Do you have times when your database just slogs along, and you find it difficult to determine the cause? Do you need to improve your query writing and database tuning skills? This is the virtual conference for you. And it is this morning…

You won’t find a better instructor when it comes to understanding what makes your queries run in SQL Server. Kalen provides an authoritative, in-depth understanding of what makes your queries work, and how to improve performance with the tools available right out of the box. Don’t wait…go register and get in there now.

Featured Article(s)
Queries – Also for the Layman (Part 1 of 3)
OK, so you’ve gotten your data into your database. Now how do you get it out? This session will take you through some real-life situations showing how to write a query that not only returns your information, but also doesn’t kill your system when it runs, uses the indexes you’ve built into the system, has the bugs worked out of it (trust me, there will be bugs), and is ready to run on your production systems.

Distributed Database Feedback
I received some feedback from Jeremiah Peschka with more insight about distributed databases and SQL Server. This has been an area of focus for Jeremiah as the Emerging Technology Expert at Quest Software, a company that specializes in tools for SQL Server.

To put his response in context, I have been discussing the potential merits of a widely distributed database in the form of many SQL Server database instances. Technically, we can do that today using federated queries. A federated query simply uses a view that does a UNION ALL against multiple tables on multiple SQL Server instances. The problem with this method is that it is too immature.

It requires that you write the view, put together the infrastructure for related tables, handle replication, and meet a number of other requirements.
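To make the federated view idea concrete, here is a minimal sketch of the pattern. It is an illustration only: it uses Python's built-in sqlite3 module, and each attached in-memory database stands in for a separate SQL Server instance. The member names (east, west), the orders table, and its columns are all made up for the example; on SQL Server the members would be linked servers rather than attached databases.

```python
import sqlite3

# One coordinating connection. Each ATTACHed in-memory database is a
# stand-in for a separate SQL Server instance (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
for member in ("east", "west"):  # hypothetical member names
    conn.execute(f"ATTACH DATABASE ':memory:' AS {member}")
    conn.execute(
        f"CREATE TABLE {member}.orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )

# Each member holds only its own slice of the data.
conn.execute("INSERT INTO east.orders VALUES (1, 'east', 10.0), (2, 'east', 25.5)")
conn.execute("INSERT INTO west.orders VALUES (3, 'west', 40.0)")

# The "federated" view: a UNION ALL over the same table on every member.
# SQLite only allows a TEMP view to reference attached databases.
conn.execute("""
    CREATE TEMP VIEW all_orders AS
    SELECT order_id, region, amount FROM east.orders
    UNION ALL
    SELECT order_id, region, amount FROM west.orders
""")

# Callers query the view as if it were one table.
rows = conn.execute(
    "SELECT order_id, region, amount FROM all_orders ORDER BY order_id"
).fetchall()
total = conn.execute("SELECT SUM(amount) FROM all_orders").fetchone()[0]
```

Notice that the view has to name every member by hand. Adding a member means rewriting the view, which is a small taste of the manual work described above.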

The idea is to have something as sophisticated as the Parallel Data Warehouse (PDW) edition of SQL Server 2008 R2, with a centralized management and querying infrastructure. The difference is that the databases are not on some BIG MASSIVE SAN with lots of expensive servers; instead, the data is spread across lots of small servers with small instances of SQL Server, each handling partitions of the data.
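As a rough sketch of what "each handling partitions of the data" could look like from the central querying side, the example below routes each row to one node by hashing its key, then answers a query by asking every node for a partial result and merging them. This is not how any Microsoft product does it; it is a toy under stated assumptions. In-memory sqlite3 databases stand in for the small SQL Server instances, and NODE_COUNT, the sales table, and the function names are all hypothetical.

```python
import sqlite3
import zlib

NODE_COUNT = 3  # hypothetical number of small servers

# Each in-memory database stands in for one small SQL Server instance.
nodes = [sqlite3.connect(":memory:") for _ in range(NODE_COUNT)]
for node in nodes:
    node.execute("CREATE TABLE sales (customer TEXT, amount INTEGER)")


def node_for(customer: str) -> int:
    """Pick the partition for a key using a stable hash (CRC32 here)."""
    return zlib.crc32(customer.encode("utf-8")) % NODE_COUNT


def insert_sale(customer: str, amount: int) -> None:
    """Route the row to the one node that owns this customer's partition."""
    nodes[node_for(customer)].execute(
        "INSERT INTO sales VALUES (?, ?)", (customer, amount)
    )


def total_by_customer() -> dict:
    """Send the aggregate to every node, then merge the partial results."""
    totals = {}
    for node in nodes:
        query = "SELECT customer, SUM(amount) FROM sales GROUP BY customer"
        for customer, subtotal in node.execute(query):
            totals[customer] = totals.get(customer, 0) + subtotal
    return totals


for customer, amount in [("acme", 100), ("globex", 40), ("acme", 25), ("initech", 7)]:
    insert_sale(customer, amount)

result = total_by_customer()
```

The central piece only needs to know the routing rule and the list of nodes. Each node does its own aggregation locally, so only small partial results travel back to be merged.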

Turns out there has been some work in that area…

Jeremiah Says:

Interestingly enough, Microsoft have done a bit of that with Dryad and DryadLINQ. Unfortunately, Dryad and DryadLINQ are more akin to Hadoop than to PDW. So, what are we to do?

Fear not, there is an answer. HadoopDB (http://db.cs.yale.edu/hadoopdb/hadoopdb.html) is a hybrid of MapReduce and an RDBMS. While there’s a bit of extra configuration and set up, it is freely available and it will talk to any database that can communicate over ODBC/JDBC.

I don’t know how much HadoopDB has been used against SQL Server, but it’s been used extensively with PostgreSQL and now that they’ve abstracted the database layer, it’s seeing more use against other RDBMSes.

Generally, I don’t share readers’ personal information in editorials, so that you don’t get a lot of unwanted contact. In this case, I’m sharing Jeremiah’s contact information (with his permission) for those of you who wish to follow up on this topic more thoroughly.

Jeremiah Peschka | Emerging Technology Expert | Quest Software
Editor in Chief, http://nosqlpedia.com
Twitter: @peschkaj
blog: http://facility9.com

Do you have thoughts or experiences you’d like to add to this dialog? Send your comments to btaylor@sswug.org.

Cheers,

Ben