Editorials

Help, My Data is Sharded!

Help. My Data Is Sharded!
I’ve come across a company that has very large databases that work together as a whole. They were separated in order to maintain disaster recovery requirements. In some databases a great amount of the data is modified or replaced daily. In others, the data is static in comparison.

Because they already have their data sharded, they were considering SQL Azure as a hosting platform. SQL Azure has a restricted database size…but you may have more than one database allowing for a large amount of aggregate data through sharding.

This company found that there would be a bit of work required in order to fit into SQL Azure. They were using stored procedures and triggers that addressed other databases from where the procedure or trigger was hosted, which SQL Azure does not support.

One fictitious way to describe the separation would be to have a database containing customers, and a second database containing purchase orders. In order to get a list of purchase orders for a customer a stored procedure in the Customer database would query the purchase order database. Or, perhaps a customer would be deleted. When that happened, a trigger would fire on the Customer table. The delete trigger would then remove all the purchase orders from the Orders database.

The triggers and procedure technique was used because there was no data access layer in their code. All code was embedded in web pages, and nothing was centralized for database manipulation.

As a result of using the sharded databases the company has found this architecture to be extremely powerful, and wish to maintain that aspect of their design. The question remaining is how to replace heterogeneous queries embedded in code or stored procedures (queries accessing two or more databases) with something that follows modern software design techniques.

I have a number of different patterns using different frameworks that solve this design readily. Rather than restrict our readers to my knowledge, let me first open this as a discussion to you and see what ideas you have. Why not write to btaylor@sswug.org and share your ideas or experiences? How would you resolve joining data from separate data stores? This is a real world scenario many companies must solve before deploying large databases on SQL Azure; so the exercise is definitely not in vain.

Please note that the example is not sharding the same data across databases of the same structure. Instead, different aspects of an overall database design are separated into different databases with different subject areas. As a result, one solution that is not viable in this scenario is to use Federated tables supported in SQL Azure.

Cheers,

Ben

SSWUGtv
With Stephen Wynkoop
How do you know if your systems (and your data) are protected? How can you be sure? Patrick Townsend has the answers!
Watch the Show

$$SWYNK$$

Featured Article(s)
Working with PEX Framework
Microsoft’s PEX Framework is an automated white box testing tool that can be integrated with Visual Studio. It has been developed by Microsoft research team and generates test suites with high code coverage. It can investigate and explore the managed code and suggest the tests which a particular test requires. PEX framework from Microsoft provides support for code contracts which annotates methods with conditions such as pre-conditions, post-conditions and invariants which needs verification before running a code successfully. You can test your business and data access layer with PEX and design and develop applications that are robust. In this article we will discuss the basics of unit testing and how we can use PEX Framework to unit test and application.

Featured White Paper(s)
Key Management in the Multi-Platform Environment
Written by Townsend Security

This White Paper discusses the challenges … (read more)

Featured Script
Create a DB Size Trending report
This proc will take the data from TblSizes and create a new table with database sizes and date… (read more)