Editorials

Why Put Unstructured Data in SQL?

There have been a lot of comments regarding the reason behind storing undefined data in your SQL Server in the first place. I think those are exceptional, reasonable questions you should ask yourself for any data you take the time to store.

Let me start out by saying that there should be rare cases of unknown or evolving data in your database. Many systems have none. This is not a replacement for doing a good job of analysis, or simplifying things for stuff we may dream up at a later time. This is for things that are important to use in a relational way without the complete understanding or query capability of it’s contents.

Often this kind of data would be documents. For example, I worked on a system that was a warehouse of mortgage rates and the criteria for qualification. To support the values we had in our warehouse we stored thousands of PDF, Excel and HTML documents to support our warehouse contents. When questions were raised, we would go back to the source document to resolve any conflict. Do those documents belong in a database? Well, putting them in an SQL Database was an option. You could put them in a blob or filestream object. Putting them in a database kept them in synch with our entire disaster recover plan.

Another system where I stored data in SQL without knowledge of the contents was due to security reasons. We needed to track a number of data points necessary for privacy. Rather than allow it to be broken up in the database, the data was encrypted, and stored in a blob. Again, the reason for putting it in SQL Server was disaster recovery, and ease of getting to the data. We didn’t need another mechanism to go lookup the encrypted data elsewhere. We dimply returned it with the data that was human readable.

A third scenario that exemplifies storing un-structured data in a structured database once again stems for the fact that the unstructured data is directly related to a single record of structured data. The structured data is used for human readable data which is used to configure the building of some software tool. The unstructured data is ultimately what is desired. My system doesn’t need to know what it contains. It simply needs to make the selection simple for the end user. So, I need to have the data, but I don’t need to be able to read, filter, sort, seek on any of the data.

In this case, I find storing this kind of data in JSON, XML or even a blob make sense. I have written queries to find data where I mix SQL and XPath to seek data stored in XML Columns. Every time I do I wish someone had taken the time to normalize the XML into SQL tables. It’s a pain to optimize and maintain, and fragments tables rather quickly. In contrast, the examples I provided above would never be used like an XML column. The data cannot be queried from SQL except to manage the contents as a single variable. That variable cannot be parsed by SQL, at least not in its native form.

Some common properties of these of all of these examples are:

  • No other storage tool currently available in the environment
  • Disaster Recovery is simplified with a single data store
  • Storing it elsewhere may be less efficient
  • The contents of the data is irrelevant to the relational data
  • The contents of the data does not need to be human readable within an SQL query

I hope this clarifies the reason I would even write on this topic. I’m trying to point out that there may be times when it is reasonable to save data in your SQL database that SQL doesn’t recognize the contents in any meaningful way.

Cheers,

Ben