Editorials

Turn On Auto Create Statistics

Turn On Auto Create Statistics
In response to the previous editorial, Maurice, one of our most frequent contributing readers, provides this insight from his personal experience.

In one of our development team, there was a practice of not turning on auto create statistics option at database level. There was a fear of slowing queries, which is a misconception I will explain later.

When we turned on the option, even for a few seconds, ran the query once, and then ran it again, we got a much better plan the next time. I decided to have a look at all the new statistics created. It was stunning to see the large number of statistics created in this short amount of time on many tables even those not related to the query we were working on. Other statistics not related to our query were produced from other queries running on this database. But even those produced for other queries were useful for the query we were working on, because both query had some search argument in common.

This mean one important thing : The auto-created statistics are an accurate portrait of what search arguments used by an application. They are essentials to the optimizer to guest costs of using index or not, taking this path or not. They give a glimpse to the optimizer on the relative amount of rows in each table given the query search argument. These amounts are crucial to guess the less costly path, given an index is present or not. SQL Server doesn’t only take into account indexes, it also estimates the number of row to process in each path.

When SQL Server scan a table or an index for given search arguments, it gathers at the same time distribution statistics that match theses search arguments. So there is no need to rescan data to get them. There is a misconception that auto created statistics are expensive. This is not our experience. Hundreds of them were created in matter of seconds, without any perceptible performance degradation. I had a case where the plan changed between two consecutives executions. In reality I think this happens frequently, when new statistics are found to be added.

In the case of the query we were working on, with auto-created statistics it was easy to get a good plan, with no query rewrite. Before auto-creating them we tried some good rewrites, even asked to force an index use into the plan, without success. Statistics are almost mandatory today to get a good optimizer job, on SQL2005 and above. With statistics the original query was using the index that we were trying unsuccessfully to force. Frankly the access plan even made more sense than the query.

One last thing. Some access plan may look the same graphically but sometimes you need to display properties of each plan node to see the true thing, and specially look at seek predicates on seek operations. For example you can have an index seek on a table, but if it is a composite key (key made of more than one column like Year, city), some columns may not be used in the seek. The seek may performed using the first column (ex: year) but without using city column. Externally the plan have the same structure but internally you get a different operation. This is a seek but not the way you would have done it. Seeking on the year only implies that a lot more index row are checked especially if the seek is looped. In your case you would have added to the seek condition, a condition also on the city column. When you compare plan take a look at the width of links between plan nodes. If they are thicker this means that they are a lot of rows going from one operator to the other. The thickness mean “expensive” and as it grows, the number of rows grows exponentially.

So auto-created statistics are a must, and also updating statistics from time to time, through maintenance by update statistics statement, is a good guess for most use cases.

I have now changed my practice of turning off Auto Create Statistics. I still turn off Auto Update Statistics because I have scheduled jobs to fulfill that need that occur during off-peak database access hours. Thanks, Maurice, for sending your response to btaylor@sswug.org.

Cheers,

Ben

$$SWYNK$$

Featured Article(s)
Using 11gR2’s DBFS Database File System
The Oracle Database File System (DBFS) is an Oracle Database 11g feature providing a standard file system interface to store and access files within the database. The files are stored in a database table as SecureFiles LOBs. An important point to note here is that a DBFS file system is mounted on a regular mount point, which means you can access it like any other file system in your Linux system.

Featured White Paper(s)
How to Use SQL Server’s Extended Events and Notifications to Proactively Resolve Performance Issues
read more)