Editorials

NoSQL Utilization

There are many different reasons to place data in NoSQL data stores. Most often it is performance: you can shard massive volumes of data and send a small program out to different machines for execution. This is the reverse of having a centralized program pull from multiple data stores. Instead, the program is (potentially) passed around so that it runs near the data, and only the results are sent to a centralized server, or servers, for aggregation.
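
To make that concrete, here is a toy sketch of the pattern in Python, with multiprocessing workers standing in for separate machines; the SHARDS data and local_total function are invented for illustration, not taken from any real system.

    # A toy sketch of the "send the program to the data" pattern.
    # Each worker stands in for a machine holding one shard; only the
    # small per-shard result travels back for central aggregation.
    from multiprocessing import Pool

    # Hypothetical shards: in a real system each list would live on a
    # different machine, not in one process's memory.
    SHARDS = [
        [{"amount": 10}, {"amount": 25}],
        [{"amount": 7}],
        [{"amount": 3}, {"amount": 40}, {"amount": 5}],
    ]

    def local_total(shard):
        """The small program that runs next to each shard of data."""
        return sum(row["amount"] for row in shard)

    if __name__ == "__main__":
        with Pool() as pool:
            partials = pool.map(local_total, SHARDS)  # runs near the data
        print(sum(partials))                          # central aggregation: 90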

Other popular reasons for using NoSQL, however, are simplicity and cost. It may be that you can store your objects in serialized form, without modification, when saving to NoSQL. Sometimes your data can simply be converted to JSON or XML and stored as is.
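
As a minimal sketch of that "store it as is" idea, the Python below serializes an object to JSON and writes it to a key-value store; the Order class and save_document helper are hypothetical stand-ins for a real NoSQL client, not part of any library.

    # A minimal sketch of storing an object "as is": serialize it to JSON
    # and persist the string, with no relational schema or mapping layer.
    import json
    from dataclasses import dataclass, asdict

    @dataclass
    class Order:
        order_id: int
        customer: str
        items: list

    def save_document(store: dict, key: str, obj) -> None:
        """Stand-in for a NoSQL put: the value is just the serialized object."""
        store[key] = json.dumps(asdict(obj))

    store = {}  # an in-memory dict standing in for the NoSQL store
    save_document(store, "order:1001", Order(1001, "Acme", ["widget", "gadget"]))
    print(store["order:1001"])  # {"order_id": 1001, "customer": "Acme", ...}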

I have compared and implemented low-cost solutions for archival data using NoSQL in the cloud. In this case it was much cheaper to save the data in NoSQL than in my relational database. The cost for storage was pennies on the dollar, and the performance was actually better, freeing the SQL Server storage for more important OLTP work.

What I lost in the process was the impetus for my article yesterday: when using NoSQL, I did not have anything available to tell me record counts, record sizes, and so on. I rolled all of that myself because it was needed. I’m just wondering if that isn’t a common need. Certainly others would like to have metadata about their data in order to predict trends, costs, etc.
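
As a rough illustration, rolling those numbers yourself against Azure Table Storage can look like the sketch below, written with the azure-data-tables Python SDK; the connection string and the ArchiveData table name are placeholders, and because Table Storage has no server-side count, the code simply scans and tallies every entity.

    # A sketch of rolling your own counts and sizes for an Azure table,
    # using the azure-data-tables Python SDK (pip install azure-data-tables).
    # CONNECTION_STRING and "ArchiveData" are placeholders.
    import json
    from azure.data.tables import TableServiceClient

    CONNECTION_STRING = "<your storage account connection string>"

    service = TableServiceClient.from_connection_string(CONNECTION_STRING)
    table = service.get_table_client(table_name="ArchiveData")

    record_count = 0
    total_bytes = 0
    for entity in table.list_entities():  # no COUNT available: full scan
        record_count += 1
        # Approximate stored size from the JSON-serialized entity.
        total_bytes += len(json.dumps(dict(entity), default=str))

    print(f"records: {record_count}")
    print(f"approx size: {total_bytes / 1024:.1f} KiB")
    print(f"avg record: {total_bytes / max(record_count, 1):.0f} bytes")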

I understand that it would be difficult to know exactly where that data is located. However, if you look at NoSQL as an engine, regardless of the number of servers or storage devices behind it, there is value in knowing the capacity of the overall system and its current utilization.

Is that crazy? Are there already tools that surface that information? I haven’t seen them in Azure Table Storage, although that is still a pretty young product. What do you think?

Cheers,

Ben