Editorials

Hierarchical Storage vs. Relational Storage

I recently had an interesting discussion with a colleague about data structures and data storage. We were comparing object-oriented databases that store data hierarchically, hybrid object-oriented databases that allow both relational and object storage, and pure relational storage.

The interesting part of our discussion was how the engine performs under different kinds of data access. A request to get all sales for a time period was simple to write against a relational database. It was more complicated against an object-oriented database, because each customer had to be interrogated for sales activity.
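To make that contrast concrete, here is a minimal sketch in Python. The sample sales, the table layout, and the function names are all hypothetical; SQLite stands in for the relational engine and a nested dictionary stands in for the object store. It is an illustration of the two access patterns, not a benchmark.

```python
import sqlite3

# Hypothetical sample data: (customer, sale_date, amount).
SALES = [
    ("acme", "2024-01-15", 100.0),
    ("acme", "2024-03-02", 250.0),
    ("globex", "2024-01-20", 75.0),
    ("globex", "2023-12-30", 40.0),
]

def relational_sales_total(start, end):
    """Relational style: one declarative query over a flat sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, sale_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", SALES)
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales "
        "WHERE sale_date BETWEEN ? AND ?",
        (start, end),
    ).fetchone()
    conn.close()
    return row[0]

# Hierarchical style: sales live underneath the customer that owns them.
CUSTOMERS = {}
for customer, sale_date, amount in SALES:
    CUSTOMERS.setdefault(customer, {"sales": []})["sales"].append(
        {"date": sale_date, "amount": amount}
    )

def hierarchical_sales_total(start, end):
    """Each customer has to be interrogated for its own sales activity."""
    total = 0.0
    for record in CUSTOMERS.values():
        for sale in record["sales"]:
            # ISO-8601 date strings compare correctly as plain text.
            if start <= sale["date"] <= end:
                total += sale["amount"]
    return total
```

Both functions return the same total for the same period. The difference is who does the walking: the SQL engine in the first case, your own code in the second.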

Although the request was easier to write in a SQL environment, the real difference in storage is exposed as the volume of data grows. When you have millions, billions, or even trillions of records to review, the cost of running the query grows with them. With NoSQL, you can spread the request across the multiple machines where the data resides, thus reducing the cost of the query. With SQL, you can do something similar by sharding the data across multiple relational databases, thus increasing overall performance.
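The idea is the same in both cases: send the request to every partition, let each one answer for its own slice, then combine the partial answers. A rough sketch in Python, assuming three hypothetical shards held as in-memory lists and threads standing in for separate machines:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each list stands in for the data held by one machine.
SHARDS = [
    [("2024-01-15", 100.0), ("2024-03-02", 250.0)],
    [("2024-01-20", 75.0), ("2023-12-30", 40.0)],
    [("2024-01-05", 25.0)],
]

def shard_total(shard, start, end):
    """Work done locally on one shard: total its sales in the period."""
    return sum(amount for sale_date, amount in shard if start <= sale_date <= end)

def scatter_gather_total(shards, start, end):
    """Scatter the request to every shard, then gather and add the partials."""
    workers = max(1, len(shards))  # the executor rejects a pool of zero workers
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda shard: shard_total(shard, start, end), shards)
        return sum(partials)
```

In a real deployment the per-shard step would be a network call to a database node rather than a local function, but the shape of the work is the same.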

As you can see, either option can be optimized to handle volume. One thing I have found is that when you are working with rules that are hierarchically based, the performance gain is much more dramatic when the data is stored hierarchically. You arrive at the parent node, and all the data of interest to you is immediately at your fingertips. At that point, the only question is whether you need to ask cross-hierarchy questions. If not, perhaps object-oriented persistence is a better choice.
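Here is a small sketch of that trade-off in Python. The rule tree, the discount values, and both function names are invented for illustration. Resolving a rule for one branch touches only that branch, while a cross-hierarchy question has to visit every branch.

```python
# Hypothetical rule hierarchy: a child inherits its parent's discount
# unless it defines its own.
RULES = {
    "west": {
        "discount": 0.10,
        "children": {
            "seattle": {"discount": 0.15, "children": {}},
            "portland": {"children": {}},
        },
    },
    "east": {
        "discount": 0.05,
        "children": {"boston": {"children": {}}},
    },
}

def effective_discount(path):
    """Within one hierarchy: walk down from the parent, nothing else is read."""
    node = {"children": RULES}
    discount = 0.0
    for key in path:
        node = node["children"][key]
        discount = node.get("discount", discount)
    return discount

def nodes_with_discount_at_least(threshold):
    """Cross-hierarchy question: every branch of every parent is visited."""
    found = []

    def visit(children, prefix, inherited):
        for key, node in children.items():
            discount = node.get("discount", inherited)
            path = prefix + [key]
            if discount >= threshold:
                found.append("/".join(path))
            visit(node["children"], path, discount)

    visit(RULES, [], 0.0)
    return sorted(found)
```

If most of your questions look like the first function, hierarchical storage fits well. If they look like the second, a relational layout with an index on the rule value is likely to serve you better.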

Tomorrow we’ll talk about some of the reasons NoSQL is not embraced.

Cheers,

Ben