Editorials

Hash Join

SQL Server uses three different techniques when joining tables depending on the size of the table and the indexes available. When no good index exists on either table to implement an efficient join process, the query optimizer will often select a HASH Join.

A hash join is performed by creating a hash value for the attributes used to join both tables, once on each table, and then the rows are joined based on the hash values. It can use an existing index one the tables that may help reduce the number of records requiring a hash value. Once the hash is created on both tables it can then join the results on the resulting hash. There’s more to it than that, but you have an overview of what is going on in this case.

What if you had to create an index on tables every time you had to join them? Things could be pretty slow, depending on the number of records in the table. Essentially that is what a hash join does. I know from experience that if an index does not exist during a join there comes a point where it is faster to create that index, and then do a join; even if you are creating that index on a temporary table. That is what you are doing here.

So, now when you see a hash join in a SQL Server query plan, you have an idea of what is going on. You now have the option to decide if the performance of the hash join is good enough, or to determine if there is a better index scenario or query method to execute the join of the two tables.

Cheers,

Ben