Editorials

Data Dissonance

Today I was talking with a colleague about a performance issue he was having on a SQL Server instance. It reminded me of a few war stories that may be of interest. The key point I want to make is based on data dissonance. Data dissonance is when you are joining or searching for data where one or more columns must be cast in order to make the desired comparison.

The first story comes from my friend today. He had a staging table where data was imported from another database. The primary key from the source had a column type of UNIQUEIDENTIFIER, sometimes referred to as a GUID. However, the staging table where data was imported for a MERGE process converted the primary key to VARCHAR(?). SSIS automatically casts the data during the import, so there are no errors. However, when consuming the data from the staging table, and attempting to merge it into the active tables, the performance was horrible, because the string value had to be converted back to a UNIQUEIDENTIFIER in order to merge the records correctly. Other queries against the staging tables from an application also performed poorly. Lookups against the primary key once again had to be converted from GUID to string in order for the query to find the desired key.

Another time years ago I was asked to assist with a poorly performing query, once again based on a staging table being merged into production data. This time the issue was again data dissonance. It was a little harder to find. The staging table had the key value based on NVARCHAR, but the production table key was based on the VARCHAR data type. Merging data between the two tables required that one or the other be cast to the same data type. Since SQL Server implicitly converts data types between VARCHAR and NVARCHAR, it was only by reviewing the query plan that we were able to discover the conversion process, and the lack of index utilization for the merge queries. Changing the data type of the staging table to match the production table reduced the time to process the merge from over a minute to under 10 seconds.

The same kinds of performance killers are found in many other kinds of queries. I have seen performance hits when doing math against different numeric data types. SQL Server will once again implicitly convert numbers for most data types, using a plan based on its own rules of precedence. Not only will you get unexpected results from rounding, etc., but your query performance, if you are doing a lot of math on a lot of records, may suffer from the constant conversion.

The key thing to take away from these war stories is that it does matter what data types you choose. Moreover, consistency can also make a big difference on how things perform. How about adding your war stories to our comments? I’m sure they will be of interest to all of us.

Cheers,

Ben