Editorials

BLOB Feedback – Write to File System

SSWUGtv
With Stephen Wynkoop
Managers often consider the impact of social media on the performance of their employees. Things aren’t getting done…but they used to. Then they start watching and restricting texting, IM, facebook, twitter, etc) to get a handle on things. Sometimes these restrictions make it harder to do your job. What is the appropriate use of social media in the workplace? Laura Rose has some answers.
Watch the Show

BLOB Feedback – Write to File System
A number of people have sent feedback on BLOB storage. Today I include a number in favour of storing BLOB data in the file system.

Mike Writes:
Ben, I think you should read a paper by the late Jim Gray that analyses some of the considerations when storing BLOB data in a relational database. You can find the paper at the Microsoft Research site, here: http://research.microsoft.com/apps/pubs/default.aspx?id=64525

Editor:
I read this paper and found the information it contains really helpful. It addresses the performance factor variance between storing data in the database and on an NT File System. It makes some recommendations based on performance.

However, performance is not the only factor to consider when working with BLOB data. It may also be important to factor in ACID transaction participation, replication, Backup and Recovery, Security, and other factors including performance. I highly recommend you review this paper if you are considering putting BLOB data in your database.

Tim writes:
Use FileStream instead of BLOBs
http://www.simple-talk.com/sql/learn-sql-server/an-introduction-to-sql-server-filestream/

Editor:
Our reader with the Document Management Database seemed to consider FileStream a factor in creating backups. I’m not sure why that may be. Perhaps someone may have a link for comparisons using FileStream objects in their database.

Tim Writes:
A brief scolding (with affection): Large Objects should just not be called Big Data, not with such disparate meanings. All forgiven. ‘Nuff said.

MS is behind the LOB curve. Some competitors distinguish CLOB, BLOB, LOB, GLOB (character, Binary, general and Graphical data)and even MIME LOB. Some specialized databases (actually packages built around a database) have a two level solution, similar to SQL’s “in row” option, except that they use the file system (SQL’s File Stream) as the 2nd level. SQL’s XML support is a step in the right direction, but more data type awareness would be to the advantage of the database community in general. Interesting when you consider that MS Access can display an image (can it play a sound file? Not sure). SharePoint leans in this direction for portal apps, but leaves operational apps lacking.

To the point: While “it depends” is a correct answer, the general answer has to be (IMNSHO): consider File Stream first; with 2012, also consider File Table; finally consider varchar/varbinary (max) or and XML types, if appropriate.

Exceptions: a small number of objects; mostly small LOBs that are frequently referenced (for a non-profit, we allowed volunteers and fund raisers to upload images that they used in personalized appeals); roll-your-on solutions with the LOBs in a separate (thus separately maintained) database or filegroup; and of course, more, but similar.

In a former job where we were stuck in 2000, moving to 2005, we developed a hand rolled File Stream solution.

We provided circulation audits for ad supported publications. This involved the clients uploading cover images, constructing charts and graphs into draft Word documents that the auditors worked, and saving the final PDF documents. Our first implementation used text and image data types.

It worked. But once we got up to volume, it was a nightmare.

The backup time and media consumption, and performance impact soon forced us to abandon these in favor of NTFS. We had to write .Net code to write and read the file data, develop filenames, deal with transaction integrity, etc. SQL, especially 2012, handles this so well, that considering any other solution for similar problems is …, well I can’t find a nice way of saying it.

These data, once written tend to be static (or in our case, some were static and some became so after a short period of activity), so repeated backups are wasteful. Unlike more typical DB data, we don’t tend to search, update or repeatedly retrieve LOBs. With a little smarts, multiple databases (e.g., DEV, QA, INT) can share one set of the data in the file system.

Want to add your comments to this thread? Send them to btaylor@sswug.org.

Cheers,

Ben

$$SWYNK$$

Featured Article(s)
How to secure SQL Server infrastructure?
This article details the SQL Server security best practices, which you can follow to secure your SQL Server infrastructure.

Featured White Paper(s)
All-At-Once Operations
Written by Itzik Ben-Gan with SolidQ

SQL supports a concept called all-at-onc… (read more)