Editorials

Set Operation Final Thoughts

Today I’m sharing some final thoughts on the Set Operations sent from Maurice who regularly to this column with practical examples from his experience.

Maurice Writes:

Some examples to says that achieving row exclusion using outer join / not exists is not the equivalent of EXCEPT, and considerations on performance. EXCEPT as other set operations is based on internal optimizer operators that are very quick to deal with a comparing rows between them in a merged set of rows. The other SET operators are INTERSECT, DISTINCT, UNION AND GROUP BY. I don’t know which secret algorithm they used but it is damn quick.

But before talking about this, you need to understand that SET techniques is not a perfect all fit to find matching/unmatching rows.

Let’s say a table A with a key field K, and a table B with a matching Key. This query says "give me rows identified by K that doesn’t exists in B for this K table values".

Select K From A

except

Select K from B

Now if you want to get more details on A table rows, you can’t add those columns to the except query. First you need to add them to both selects.

Select K, someOtherData From A

except

Select K, someOtherData from B

But this changes the results. In the first query, if K value from B exists also in A it will be discarded. In the next one, K values may exists in both tables, if there is a difference only on someOtherData column. If your goal was to just about returning A rows details for which there is no matching K, it doesn’t anymore. You need a rewrite of your query this way:

;With ARowsHavingNoKValuesInB as

(Select K From A except Select K from B) — this is fast for large sets

Select ARowsNotInB.*

From

ARowsHavingNoKValuesInB as ARowsNotInB — if this input has not many rows this doesn’t cost much.

Join A

ON A.K = ARowsNotInB.K

You need to self join K values obtained from the Except query back to A table to get the rest of the details of A’s rows. If the number of A rows Except B rows is expected to be few this is still good. Except is much faster, so you may still pay an extra (the join) and have a winner query. If the expected source non-matching rows is large, you are better to directly use an Not Exists of Left Join which resolve to about the same query plan.

Select A.* From A Where not exists (select * From B Where B.K = A.K)

Or

Select A.* From A left outer join B ON B.K = A.K Where B.K is NULL

There is another way to reproduce Except’s behavior by using union all + group + having on pre SQL2005 versions. This gives more information about the differences.

With

BothRowSet as (Select ‘A’ as RowSrc, K From A Union ALL Select ‘B’ as RowSrc, K From B)

, NonMatchingRows as (Select K, Min(RowSrc) as RowSrc From BothRowSet Group by K Having Min(rowSrc) <> Max (rowSrc) — Having says not there for both Source and Target)

Select NM.RowSrc as RowSrc, A.*

From NonMatchingRows NM Join A On A.K = NM.K And NM.rowSrc = ‘A’

Union ALL

Select NM.RowSrc as RowSrc, B.*

From NonMatchingRows NM Join B On B.K = NM.K And NM.rowSrc = ‘B’

Order By K — interesting to look results in that orderToday I’m sharing some final thoughts on the Set Operations sent from Maurice who regularly to this column with practical examples from his experience.


Using except which is more costly, because the SET operation is done twice, but much more easier to read.

With

A_ExceptB as (Select K From A Except Select K From B)

, B_ExceptA as (Select K From B Except Select K From A)

Select 'A' as RowSrc, * From A_ExceptB Join A On A.K = A_ExceptB.K

Union All

Select 'B' as RowSrc, * From B_ExceptA Join B On B.K = B_ExceptA.K

Order By K -- interesting to look results in that order

When comparison was done using linked server I got a very good performance also, but in that case rows had to travel from one server to the other, but it was still better than a client program getting rows from both sources to do the same. Performance is also faster if the column set is not large.

Thanks for the examples Maurice. This is a great way to share with the SSWUG community.

Cheers,

Ben