Editorials

BLOBS in SQL

Today I want to return to the topic of FileStream and other BLOB (Binary Large Object) data types storage in SQL Server with the exception of XML. Let me start with a near dogmatic statement that BLOB data doesn’t belong in a relational database; especially when there is a lot of it. Let me provide a few reasons why.

  • Using BLOB data types puts a heavy load on the SQL Server engine (the reason for which they came up with the FileStream data type in the first place as discussed yesterday)
  • Because the BLOB data is contained in the database it is all backed up whenever the file object containing the BLOB data is backed up. You cannot simply backup individual BLOB objects like you can if they are located on a file server
  • BLOB data reduces performance of your SQL Server
    • You have less records per database read unless the BLOB data plus the reset of the table data is greater than a page (8k)
    • If stored natively in SQL Server BLOB data types, and the data is larger than a page, it is stored in a linked list rather than in a contiguous file. As a result, BLOB data itself may become fragmented
    • SQL Server storage should be optimized for random access activity. File access, the best suited for BLOB data should be optimized for sequential access
    • Including BLOB data in your SQL transactions increases the length of the transaction operation, and can block other processes needing SQL resources
  • Disaster recovery of BLOB data is more painful in SQL Server
    • If you use clustering the file data must be stored on a SAN in order for all nodes to access the BLOB
    • If you use replication, mirroring, log shipping, or always on protection the storage of your data can increase the time required to modify BLOB data, and anything else associated with the transaction
    • Backups are radically impacted, increasing as the volume of blob data is grows

In short, using BLOBs in a database is a really nice, easy, simple solution from a development perspective. It is really painful when it comes to performance and disaster recovery. While I wouldn’t say you should never use it, I will say that it should not be used when there is a lot of BLOB data.

Tomorrow we’ll take a look at some other ways to save your BLOB data that can scale and provide disaster recovery.

Now’s the time to keep me honest. If you don’t agree with my opinion please leave a comment here or drop an email to btaylor@sswug.org. Don’t worry if I don’t agree; your opinion counts just as much.

Cheers,

Ben