Editorials

Set Operations Prevail

I was recently reminded of differences in performance for different SQL Server persistence techniques. I had a set of data that was generated by my application needing to be merged into an existing relational database consisting of 8 tables. A root table with three dependent tables. Each dependent table had a one or two dependent tables.

Level 1-1

Level 2-1 Level 2-2 Level 2-3

Level 3-1 Level 3-2 Level 3-3 Level 3-4

This was a SQL Server database, and the client used the traditional Microsoft technique of an Identity column for the primary key. So, not using natural keys, the merge process was a bit painful. They were smart enough to create a hash value of the contents of all 8 tables, and place the hash in the Level 1 table simplifying the merge.

The original implementation used a single record by record implementation. When a new hash was found the application would insert the Level 1-1 record and get the identity value back. Each parent table having a child table had to be performed one at a time in order to return the assigned identity value. With the overall objects broken down the resulting data of 2,000 Level 1-1 records resulted in nearly 20,000 records for the entire structure. This wasn’t incredibly distressing, just the time to perform the work if the overall ACID transaction was the entire 20,000 records.

The answer is to do the whole thing in sets. You could use User Defined Table Variables and perform set operations for each set of data, making a round trip for each. This requires the use of an ADO transaction which could slow things down.

You could write a single stored procedure and pass the whole thing in table variables, and then the transaction can be done inside the stored procedure. Moreover, all of the SQL commands can be performed as sets instead of individually for each record. An ACID transaction can be maintained inside the single stored procedure.

Our final solution used temporary tables created on the SQL Connection prior to calling the stored procedure. The temp tables were populated using Bulk Copy commands resulting in performance of 2 seconds compared to two minutes using the original one record at a time method.

The moral of the story: Set operations in relational databases generally perform much faster than single value operations.

Cheers,

Ben