Editorials

Non-Traditional Data Stores

Why do data warehouses and relational database designs continue to grow when many large companies with massive-scale systems have sought alternatives? Companies like Amazon, Facebook, Google, eBay, etc. have all had to find alternative persistence strategies to keep up with their workloads. In our readers’ responses today I think the answer starts to become clear.

The key reason is that every data storage technique has different behavioral and performance characteristics. There is definitely no one-size-fits-all option. In fact, many of these companies still rely on other, more traditional kinds of data storage for their business.

Following are some thoughts from Mark and Jim. Mark wants to know more. Jim wants to know why.

Mark
I haven’t yet had the time to experiment with either, but I’m considering some cloud DB options right now, so I’d be interested to see if any of your users have experience and could compare SQLFire with FatDB. Also, does anyone know if either of these can be easily extended (as with SQL Server’s .NET CLR assembly integration)? We currently have some proprietary security algorithms that we make available to the SQL Server engine via the native .NET assembly support.

It would be greatly appreciated if any of your users could shed some light and save me some legwork.
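
For readers less familiar with the CLR integration Mark describes, here is a minimal sketch of registering a .NET assembly with SQL Server and exposing one of its methods as a T-SQL function. The assembly, class, and function names are hypothetical; the point is only the CREATE ASSEMBLY / EXTERNAL NAME pattern.

    -- Register the compiled .NET assembly with the database engine
    -- (path and PERMISSION_SET are illustrative; SAFE is the most restrictive level)
    CREATE ASSEMBLY SecurityAlgorithms
    FROM 'C:\libs\SecurityAlgorithms.dll'
    WITH PERMISSION_SET = SAFE;
    GO

    -- Expose one static method from the assembly as a scalar T-SQL function
    CREATE FUNCTION dbo.HashToken (@input NVARCHAR(MAX))
    RETURNS VARBINARY(64)
    AS EXTERNAL NAME SecurityAlgorithms.[Security.Algorithms].HashToken;
    GO

    -- Callers then use it like any built-in function:
    -- SELECT dbo.HashToken(N'some value');

Whether SQLFire or FatDB offers an equivalent extension point is exactly the question Mark is asking, so if you have experience there, please weigh in.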

Jim
I am continually amazed at the gullibility of businesses chasing after the holy grail of IT: unlimited low-cost computing. Whether it be client/server, open source, or horizontal scaling of databases on low-cost machines, there will always be business leaders who won’t partner with IT to check out the latest silver bullet. Instead, they will mandate a solution on IT, be given acclaim for reining in IT costs, and be promoted to the next level, away from any responsibility for the long-term consequences of their actions.

I checked out SQLFire and it doesn’t seem much different from the other shared-nothing database systems of the last 20 years (e.g., early Teradata or IBM’s shared-nothing version of DB2). Shared-nothing database systems rarely scale the way the marketing claims. For OLTP processing, they are virtually useless due to inefficiencies in global lock management. For BI/OLAP, they fare better, but generally not great. Sure, you can smear your data over a number of nodes, but what happens if your application’s SQL workload scales in a different way from how your data is distributed? Now you face inefficient inter-nodal communication with intermediate result sets. You can replicate small amounts of commonly joined data across the nodes, but BI queries can be unpredictable. Basically, you’re faced with replicating large amounts of data to avoid the inter-nodal communication problem. The problem is amplified should you have “monster dimensions” in your data warehouse. One last point to consider on shared nothing: your SQL will run only as fast as the slowest node.
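
To make the trade-off Jim describes concrete, here is a rough sketch in SQLFire-style DDL (the syntax is approximate and the tables are hypothetical): a large table is partitioned across the cluster, a small lookup table is replicated in full to every node, and a related table is colocated so joins on the partitioning key stay node-local.

    -- Large table spread across the cluster by its primary key
    CREATE TABLE orders (
      order_id    BIGINT PRIMARY KEY,
      customer_id BIGINT,
      amount      DECIMAL(10,2)
    ) PARTITION BY PRIMARY KEY REDUNDANCY 1;

    -- Small, commonly joined reference data copied to every node
    CREATE TABLE country_codes (
      code CHAR(2) PRIMARY KEY,
      name VARCHAR(64)
    ) REPLICATE;

    -- Related detail rows kept on the same node as their parent order,
    -- so joins on order_id avoid inter-nodal traffic
    CREATE TABLE order_items (
      order_id BIGINT,
      sku      VARCHAR(32),
      qty      INT
    ) PARTITION BY COLUMN (order_id) COLOCATE WITH (orders);

The catch is the one Jim points out: a query that joins or aggregates on anything other than the chosen partitioning key still has to ship intermediate result sets between nodes, no matter how cleverly the tables were laid out up front.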

Unless you get the chance to run full-volume, comprehensive tests against your production data, I would take the marketing claims with a grain of salt. If you are considering a shared-nothing architecture, you might also want to consider data warehouse appliances.

Get into the conversation with your thoughts, observations and/or experience by leaving a comment below or sending an email to btaylor@sswug.org.

Cheers,

Ben


Featured White Paper(s)
Encryption Key Management Simplified
Managing encryption keys is a fundamental step to securing encrypted data, meeting compliance requirements, and preventing da… (read more)