Editorials

Create a HASH Value in SQL Server

SQL Server has two built in functions for generating a hash value, CHECKSUM and HASHBYTES. CHECKSUM is not as sophisticated as HASHBYTES resulting in a higher probability of a duplicate hash for different input values. Still, it provides a lot of power, and is used for the HASH JOIN technique built into SQL Server itself.

One thing that is really cool about the CHECKSUM function is that it produces a value based on one or more inputs. In fact, it can produce a hash for an entire record in a table by creating a hash value for every column found in a row. You can even produce CHECKSUM(*) which builds the hash value for every column based on the current design of the table, introducing the columns in the ordinal position for which they are found in the table definition. Or you can specify the column in a specific order you determine.

I like using the CHECKSUM TSQL function when comparing the changes of sql objects over time. For example, if I have a history table containing the definition of a stored procedure, it would be a very heavy process to compare two instances with the complete TSQL used to create them, especially for complex stored procedures with a lot of text to generate them. However, if I create a CHECKSUM of the TSQL it results in a 64bit integer. I can compare this value quickly to determine if the stored procedure has changed from one instance to another.

Another way CHECKSUM may be used is to produce a hash value of data not contained in tables, etc. It doesn’t have to be the TSQL of a stored procedure. It can create a hash for any data including things you make up or for which you are interested.

Creating Hash values is a capability that has been around for a long time, and libraries are found in most modern languages, not just in a database engine. So what is the best place to generate a hash value? Should you create the hash values in your application code, and submit that to SQL Server already calculated, or, do you create the hash values in SQL Server itself? I don’t think there is any one definitive solution? Why not share your opinion here, or drop me an email at btaylor@sswug.org.

Cheers,

Ben