Max Degrees of Parallelism and Query Optimization

SSWUGtv
With Stephen Wynkoop
Laura Rose is back again in this edition of SSWUG tv with some great advice on "What do you do when your peer is promoted – how do you shift your relationship?"
Watch the Show

Max Degrees of Parallelism and Query Optimization
In response to the editorial from last Friday regarding SQL Server Parallel Queries, Derek shares his experience using Max Degrees of Parallelism in SQL Server 2008 R2 while optimizing the performance of his query.

Derek writes:
First, I wanted to let you know that I appreciate the professional demeanor you convey in your newsletters and the humility in the tone of your language; it is a good reminder to leave my hubris at home and be professional in all my interactions.

That said, I wanted to share an experience with parallelism in SQL Server 2008 R2. I recently had to construct a stored procedure that needed to return a rather large and wide result set for a pharmacy claims extract file requiring several joins including an indexed view with several joins inside the view and the results needed to be "distinct" and ordered. The problem was that it seemed the query would never stop. I managed to get orders of magnitude performance improvement by separating the jobs of querying from ordering and distinct.

Based on the way the query engine works, it doesn’t guarantee the order of results, and the process of ordering and making distinct the results in the same process as the query proved to be extremely costly because of parallelism. Parallelism was causing the results to be merge joined in memory during the order and distinct process, which was very inefficient since it didn’t have all of the results at the same time that it was trying sort them. Make sense?

So, I removed any sorting on the results AND using the query option MAXDOP 1 to force the server to put more load on the disc (I had very heavy memory usage compared to disc I/O) and spit the results into a temp table.

After that operation completed, I then ran query off the temp table using DISTINCT and ORDER BY and allowed SQL Server to automagically determine how much parallelism to use. For this operation, parallelism is very useful since sorting is a very memory intensive operation.

What used to be a query that would never stop became a series of steps that ran in a very reasonable amount of time.

I’d love to read about the experiences and issues with parallelism your other readers have had to see how they solved them and why.

Editor’s Note:
I’m pretty good at tuning queries too. However, I would say that I work more from intuition rather than a statistical and analytic approach. There are others that can tell you why the things I do work…I appreciate their expertise an awful lot.

When doing ranking, sorting, aggregations and the like, one thing I have found for increasing performance is to do so with the least amount of data possible. For example, if I had a query with a number of key columns that could be linked to additional data related to that key, I would leave all the other data out of the Group By query so I did not have to include that in the group by clause or do the other punt technique of returning MAX(). Using this technique, I group the smallest set of data possible, then join back to the fact tables on the key when the grouping is complete.

This technique is not always the most efficient…but I have seen it work often enough to consider it when I have a query returning say 30 columns, and I am only grouping or aggregating 20 of those columns. The other columns are simply descriptive data; dropping them off the heavy workload results in better performance sometimes.

Surprisingly, this technique often benefits from multiple parallel processes. In my testing for my previous editorial I had to turn Parallel Processing off when I had too many concurrent queries using the same technique. After a while, there was too much context switching. It was more efficient to do things in a serial fashion.

Tomorrow we’ll be digging into other threading techniques using a Dot Net Timer.

If you would like to share your experiences or questions with threading in SQL Server, Dot Net, Java, Oracle, etc. you can send your comments to btaylor@sswug.org.

Cheers,

Ben

$$SWYNK$$

Featured Article(s)
Troubleshooting problems with bulk copy in SQL Server 2008 R2
In this article, Alexander Chigrik explains some problems that you can have when you work with bulk copy in SQL Server 2008 R2.

Featured White Paper(s)
Top 10 Tips for Optimizing SQL Server Performance
read more)

Max Degrees of Parallelism and Query Optimization

Recent Posts

Debugging Multi-Cloud Performance

Mixing Flavors of SQL Server

July Spotlight – Db2 LUW: Types of I/O

Part II: Overview of B- Tree and B+ Tree

Is it undermining or rude to email the boss to ask him to get his act together?

Debugging Multi-Cloud Performance

Mixing Flavors of SQL Server

July Spotlight – Db2 LUW: Types of I/O

Part II: Overview of B- Tree and B+ Tree

Is it undermining or rude to email the boss to ask him to get his act together?

Getting Started With Deep Learning in Your Browser Using TensorFlow.js

SQL Server Collation Overview and Examples

MySQL Escaping on the Client-Side With Go

VS Code Gets New Python Language Server, Named After Monty Python Character

How Hello World! changed – top level statements and functions (C# 9)

SSWUG.ORG