Editorials

Following up on EXCEPT

Today I am following up on my earlier editorial on the EXCEPT operation, I did some performance comparisons as well as expanded the question to the INTERSECT operation. The INTERSECT operation is the exact opposite of EXCEPT. The query returns all unique records from the first set also found in a second set. This is the equivalent to selecting a distinct set of records in one query, with a test for existence in a second set.

I compared the performance using the older LEFT JOIN syntax with EXCEPT, INNER JOIN syntax with INTERSECT and EXISTS syntax with INTERSECT. I didn’t do a scientific comparison with multiple executions; I simply ran some BIG queries testing the extraction of ½ terabyte record tables for an ETL process I was working on. EXCEPT and INTERSECT had the best performance hands down in SQL Server from a non-scientific point of view. Initial results were near immediate, and final results were a significant amount faster. When I ran queries across linked servers the EXCEPT and INTERSECT demonstrated even better performance.

Maybe someone has some actual performance comparisons comparing the different techniques.

One of the things I like about the EXCEPT and INTERSECT clauses is that it includes not only the existence or non-existence of data from two sets, but the results are grouped by default as well.

Sean Maloney wrote in with the observation that EXCEPT and INTERSECT are more clear when it comes to the intention of the query. While an EXCEPT Operation is clear in a single word, if you use a LEFT OUTER JOIN and a WHERE clause where the outer join set is null you will get the same result, but the fact that your intention is to return the EXCEPT is harder to ascertain because the intention is broken up in two different clauses, and other things may be going on in those clauses as well.

Some insights from Sean Maloney:

In this case I generally will use and encourage the use of the except construction. As one of the commenters indirectly suggested at, if the vendor is not asleep the vendor staff responsible for optimizer algorithms may hopefully be able to leverage the additional freedom to make the except version perform better then older less explicit T SQL statements.


Similar sorts of questions have often been asked in the past with respect to the use of explicit join statement logic versus an "equivalent" group of where criteria constructs. As in many of those cases, beyond possible vendor based optimizations it may sometimes be quite helpful to others going forward to be able to determine from an explicit except intersect or join construct what the intent of the original T SQL author may have been. Whereas the "equivalent" where constructs may typically require significantly more time to understand from a relational set oriented point of view.

Thanks for the contributions folks. Feel free to respond here online, or drop an email to btaylor@sswug.org.

Cheers,

Ben