Understanding the architecture of SQL Server Relational Engine
Author: Basit A. Farooq
Editor’s note: In this article, learn about the architecture of SQL Server Relational Engine.
Relational Engine also known as the query processor because it produces the optimal execution plan for your query or each query stored in a batch or stored procedure, and then executes this query plan to deliver the desired results to the client in a format specified in the submitted Transact-SQL statement.
The following figure outlines the query optimization process:
The key phases of the query optimization process is described here:
Parsing and binding
Relational Engine includes a command parser that checks the query to make sure that its syntax is valid. If the query has a syntax error, the error is returned to the client through protocol layer. If the query syntax is valid, the command parser generates a parse tree and proceeds to algebrizer. Algebrizer primary function is to perform binding, which means validating whether the tables and columns used in the query exist, loading the metadata information for the tables and columns, identifying all data types used for the query, adding information about the required implicit data conversions (typecasting) to the algebrizer tree, replacing views with definitions, verifying GROUP BY and aggregate functions are used in the right place, and performing simple syntax-based optimizations. The output of this phase is the tree of logical operators that are necessary to carry out the work the query has requested.
Query optimization
It is one of the most complex phases of the query processing. The first step of this phase is to perform trivial optimization where SQL Server tries to determine whether trivial plan for a query exists. A trivial plan is one that has a known, constant CPU and I/O cost. The trivial plan results when SQL Server optimizer determines that there is only one viable plan to run the query. SQL Server query optimizer performs trivial optimization based on the complexity of a submitted query. For example, SQL Server query optimizer creates a trivial query plan for a SELECT query on a single table but with no indexes, or for a SELECT on a single table but with no ORDER BY or GROUP BY clause, or for a SELECT on a single table with an SARG (Search Arguments) on a unique key, or for a SELECT query that involves no parameters, or for a SELECT query that uses predefined system functions, or an INSERT statement using a VALUES clause to insert data into a single table. This is because; there is only one viable way to execute such Transact-SQL statements.
So when you submit a query, SQL Server query optimizer first determines that whether the query plan would be trivial. To do that, it investigates the query and relevant metadata to determine that there is only one viable method to run the query. Due to the trivial optimization, SQL Server query optimizer can avoid lot of the work that is required to initiate and carry out the cost-based optimization. If a query has a trivial plan, SQL Server query optimizer returns the trivial query plan to the query executor and no additional work is required.
If a trivial plan is not available, SQL Server query optimizer retrieves all available statistics for columns and indexes that can help it to find the optimal execution plan from the plan cache. At this stage, SQL Server query optimizer performs more syntactical transformations of the query itself, such as cumulative properties and operations that can be rearranged. After this, SQL Server query optimizer begins optimization process.
Note: Plan cache is an area of memory that is used to store the query execution plans. You can use sys.dm_exec_cached_plans dynamic management view, to find query plans that are cached in SQL Server plan cache.
Initially, SQL Server query optimizer looks in the plan cache for the simple query plan. The simple query plan is one that usually use nested-loop join and one index per table. If the SQL Server query optimizer does not find a simple query plan in the plan cache, it then looks for a more complex query plan possibilities by analyzing multiple indexes on the table to find a good enough query plan. In a situation where table is used in a join, and it has not got a suitable index for the join query criteria, SQL Server query optimizer attempts to seek possibilities for complex query plan by assessing the cost of using a hash join.
If SQL Server query optimizer unable to find the appropriate query plan in the plan cache, it enters into full cost-based optimization phase. In this phase, SQL Server query optimizer uses the logical tree to devise every possible way to run the query. If your machine has multiple processors and the “Cost threshold for parallelism” and “Max degree of parallelism” configuration options are correctly configured, SQL Server query optimizer only then creates the parallel query execution plan for parallel processing. Note that the SQL Server query optimizer chooses nonparallel plan over a parallel query plan only when the cost of the least expensive parallel query plan is greater than the cost of least expensive nonparallel query plan.
SQL Server query optimizer then chooses the least expensive query plan in terms of the required CPU processing and I/Os, and then passes it along to the query executor for processing. Note that the reason I used the word “least expensive” here is because, SQL Server query optimizer is “cost-based” optimizer. That means; the faster or better query plan may exist in the plan cache, but the query optimizer will always choose the execution plan that it deems will have the lowest cost in terms of the required CPU processing and I/Os.
So, all in all SQL Server query optimizer main objective is to create a low-cost execution plan, and then passes it along to the query executor for processing. You can query sys.dm_exec_query_optimizer_info dynamic management view, to see detailed statistics about the operation of the SQL Server query optimizer.
Query execution, plan caching
After the execution plan is created, or retrieved from the plan cache, SQL Server query executor uses the selected query execution plan and work in conjunction with the storage engine to run the query and return the results the client through protocol layer in the format specified in the submitted Transact-SQL statement.
Note that SQL Server query optimizer may change estimated execution plan during actual execution process, if:
- The tables and columns statistics were out of date.
- Nonparallel plan exceeds the threshold for a parallel plan execution.
- Data in the underlying query tables changes significantly.
In addition, recompilation of an execution plan also results, if the underlying table’s data, indexes or statistics change significantly between each run. If not already, this estimated plan is then stored in the plan cache. At most, SQL Server caches two instances of the query plan at any time in the plan cache: parallel executio plan and nonparallel execution plan.
Query plan aging
SQL Server saves each query plan with age and the cost factor. The cost factor reflects the total cost when compiling the query. The cost factor is incremented by 1, each time query plan is referenced. SQL Server does not decrement this cost factor until the size of the plan cache reaches the 50% of the size of SQL Server buffer pool. When this happens, and the next time plan cache is accessed, SQL Server decrements the cost factor for all cached query plans by 1. SQL Server periodically cleans the plan cache. This happens when:
- SQL Server buffer pool requires more memory for another object.
- Cost factor of the query plan reaches 0.
- Query plan is not reference by any connection.
SQL Server 2014 improved design for cardinality estimation
Cardinality refers to number of unique values that exist in the data. To improve the quality of the query plan, Microsoft re-designed the query optimizer cardinality estimator algorithm logic in SQL Server 2014. By default, all databases created with SQL Server 2014 have this feature enabled. To enable new cardinality estimator on databases created with prior versions, the compatibility level for the databases must be set to 120. For information about new cardinality estimator, see Books Online topic “Cardinality Estimation (SQL Server)”.
Optimizing for ad-hoc workloads
By default, SQL Server caches all query plans in the plan cache. If your SQL Server is experiencing memory pressure, it is recommended to optimize SQL Server for ad-hoc workloads. You can do this by running the following Transact-SQL code:
USE [master];
GO
EXEC [sp_configure] ‘show advanced options’, 1;
GO
RECONFIGURE;
GO
EXEC [sp_configure] ‘optimize for ad hoc workloads’, 1;
GO
RECONFIGURE;
GO
EXEC [sp_configure] ‘show advanced options’, 0;
GO
RECONFIGURE;
GO
Setting this option will only affect plans that are already in the plan cache. This option is only available on SQL Server 2008 and higher SQL Server versions.
Manually clearing the plan cache
If you want to clear the plan cache manually, you run DBCC FREEPROCCACHE. You must avoid running this command in production environment, because clearing the plan cache forces queries and store procedures to recompile, which reduces the query performance temporarily.