Editorials

SQL Optimization Through File Segregation

We’ve talked about different ways to optimize disk access for your SQL Server database. Memory may be used for SQL Server cache, tempdb. Disk controllers may have their own cache. You can purchase many different kinds of disks, each with its own performance capabilities. Today we are going to take a brief look at combining disks to work as a unit, thus increasing performance.

There are two different ways to combine disks for a SQL Server instance. You can use separate disks to host different data files for your application, reducing contention to access different files. You can combine disks through a RAID controller, allowing multiple disks to service a single request, splitting the load.

Using the first technique, the easiest thing to implement is to place tempdb on its own drive. Tempdb is used by all databases on that instance of SQL Server. So, placing it on a separate drive reduces contention for those processes that are Tempdb intensive.

The operating system and swap files also benefit from not being on a drive shared with your database files. I’m not talking about a separate partition, although that is a good practice. I’m talking about a separate disk, when possible.

The next thing you would want to separate out would be the database log files. Because log files are written mostly to the tail, if you don’t have data files on the same disk, it is more likely that the heads will be closer to where they are needed, than if the heads need to move to support data file activity.

Data files consist of three different kinds of data. BLOB (Binary Large Object) data is found in the form of XML, and other data types such as BINARY(MAX) or CHAR(MAX). This data type is stored in link lists across multiple pages. It is a big benefit to separate this to a different database file. You can also separate it to a separate disk. Pre-allocation of the database file helps with performance…but, you still have contention for the disk resources.

A second kind of data found in a database is the relational data. The goal for this data is to have as many records on a page as possible. You want to increase the probability that when you read a page of data from the disk, that the data you require is contained in that single read. So, if you’re creating a report with 500 records, if you can read them all consecutively without having to jump around, your performance increases.

Index data is the third kind of data found in a database. I separate it out because the majority of the indexes in a database are often non-clustered indexes. A non-clustered index contains the sort criteria (Columns), and pointers to the actual data. So, if a clustered index is hosted on a separate disk from the relational data, then both files may be accessed simultaneously without impacting each other.

This is more than just theory. I have experienced implementations where a slower server out performed a faster server when the disk access was segregated for the different kinds of access.

At this point we haven’t touched on how RAID can be used to optimize this process. We’ll take a look at that tomorrow.

Cheers,

Ben