SQL Server

Top 20 tips to optimize your T-SQL code (Part-2)

Author: Basit A. Farooq

In this second article of the two-part series, I will share the remaining 10 tips that you can follow to optimize your T-SQL code.

Tip#11: Avoid overindexing tables

Indexes are the solution to many performance problems, but too many indexes on a table hurt the performance of INSERT, UPDATE, and DELETE statements. This is because SQL Server must update every index on the table when you add (INSERT), change (UPDATE), or remove (DELETE) data in that table. Therefore, it is recommended that you create only the indexes the application or users actually require, based on an analysis of their data access patterns.
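
One way to spot candidates for removal is to check SQL Server's index usage counters. Here is a minimal sketch using the standard sys.dm_db_index_usage_stats DMV; because its counters reset at every server restart, treat the output as a hint rather than proof that an index is redundant:

    -- Nonclustered indexes with no seeks, scans, or lookups since the last restart
    SELECT OBJECT_NAME(i.object_id) AS table_name,
           i.name AS index_name
    FROM sys.indexes AS i
    LEFT JOIN sys.dm_db_index_usage_stats AS s
           ON s.object_id = i.object_id
          AND s.index_id = i.index_id
          AND s.database_id = DB_ID()
    WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
      AND i.index_id > 1   -- skip heaps (0) and clustered indexes (1)
      AND ISNULL(s.user_seeks + s.user_scans + s.user_lookups, 0) = 0;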

Tip#12: Some more tips for designing and optimizing indexes

Here are some more tips and guidelines that you can follow to make indexes more effective and improve performance during the creation, implementation, and maintenance of indexes:

  • When using clustered indexes, create the clustered index before creating nonclustered indexes – The leaf level of a clustered index is made up of data pages that contain the table rows, while the leaf level of a nonclustered index is made up of index pages that contain pointers to the data rows. In addition, SQL Server sorts table rows in clustered index order based on the key column values, whereas a nonclustered index does not affect the table's sort order. If you define a nonclustered index on a table first, each nonclustered index entry contains the nonclustered index key value and a row locator that points to the row in the heap. However, if the table has a clustered index, the leaf node of the nonclustered index instead points to the corresponding leaf-node location in the clustered index. So, when you create or rebuild the clustered index, the leaf-node structure of every nonclustered index also changes. Therefore, follow this rule, because creating or changing the clustered index will also force the nonclustered indexes on the table to be rebuilt.
  • Index columns used in foreign keys – Foreign key columns are always good candidates for nonclustered indexes because they are mostly used in JOIN operations.
  • Index columns frequently used in joins – Be sure to create nonclustered indexes on columns that are frequently used in JOIN operations; this improves query performance by reducing the time required to locate the required rows in each table while the JOIN is being performed.
  • Use composite indexes and covering indexes to give the query optimizer greater flexibility – When you use composite or covering indexes, you create fewer indexes for your queries. A composite index is defined on two or more columns of the same table, and a covering index additionally contains all of the data needed to satisfy a query within the index itself. Both types improve query performance because they require less disk I/O than the same query served by single-column indexes (see the example after this list).
  • Limit key columns to columns with a high level of selectivity – The more selective a column, the stronger a key-column candidate it is, because a highly selective index lets the optimizer quickly narrow down the qualifying rows. Good candidates for index key columns are the ones used in the DISTINCT, WHERE, ORDER BY, GROUP BY, and LIKE clauses.
  • Pad indexes and specify the fill factor to reduce page splits – When the database engine needs to add a row to a full index page, it has to split the page to make room for the new row. This page-split process keeps the index hierarchy intact, but it is resource intensive, with a cost that depends on the size of the index and other activity in the database, and it can result in a significant loss in performance. To prevent splits, or at least reduce the need for them, pad the index and specify the fill factor value. The fill factor specifies the percentage of space on each leaf-level page to be filled with data, reserving the remainder for future growth. It can be set either to 0 or to a percentage between 1 and 100. The server-wide default is 0, which means the leaf-level pages are filled to capacity. Padding an index leaves open space on each page at the intermediate level of the index; the padding option is useful only when a fill factor is specified, because it uses the same percentage. By default, SQL Server ensures that each index page has enough space to accommodate at least one row of the maximum index size, given the set of keys on the intermediate pages. However, when you pad an index, if the percentage specified for the fill factor is not large enough to accommodate a row, SQL Server internally overrides the percentage to allow the minimum. For more information, refer to the Specify Fill Factor for an Index article at http://msdn.microsoft.com/en-us/library/ms177459.aspx.
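
For example, the following index combines several of these guidelines in one statement; the Sales.Orders table, column names, and fill factor value are assumptions chosen purely for illustration:

    -- Composite, covering index: key on (CustomerID, OrderDate), with TotalDue
    -- included so a query touching only these columns is answered from the index
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_OrderDate
    ON Sales.Orders (CustomerID, OrderDate)
    INCLUDE (TotalDue)
    WITH (FILLFACTOR = 80, PAD_INDEX = ON);  -- reserve 20% free space to reduce page splits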

Tip#13: Use the char/varchar data types instead of the nchar/nvarchar data types

Use the char/varchar data types instead of nchar/nvarchar if you do not need to store Unicode data. A char/varchar value uses only one byte to store each character, whereas an nchar/nvarchar value uses two bytes per character, so char/varchar columns use half the space that nchar/nvarchar columns need for the same data.
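
You can verify the difference with the DATALENGTH function, which returns the storage size of an expression in bytes:

    SELECT DATALENGTH('SQL')  AS varchar_bytes,   -- 3 bytes (one per character)
           DATALENGTH(N'SQL') AS nvarchar_bytes;  -- 6 bytes (two per character)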

Tip#14: Know when to use correlated subqueries and when to use non-correlated subqueries

When writing a subquery (a SELECT statement within the WHERE or HAVING clause of another SQL statement), use a correlated subquery (one that refers to at least one value from the outer query) when the result set is relatively small and/or the other criteria are efficient, that is, when the tables within the subquery have efficient indexes. Use a non-correlated subquery (one that does not refer to the outer query) when dealing with large tables from which you expect a large result set (many rows) and/or when the tables within the subquery do not have efficient indexes. Also, ensure that multiple subqueries are arranged in the most efficient order. Finally, remember that rewriting a subquery as a join can sometimes increase efficiency.
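
The two forms look similar but behave differently. In this sketch (the Sales.Customers and Sales.Orders tables are hypothetical), the correlated subquery is re-evaluated for each outer row, while the non-correlated subquery is evaluated once:

    -- Correlated: references c.CustomerID from the outer query
    SELECT c.CustomerID, c.CustomerName
    FROM Sales.Customers AS c
    WHERE EXISTS (SELECT 1
                  FROM Sales.Orders AS o
                  WHERE o.CustomerID = c.CustomerID);

    -- Non-correlated: the inner SELECT is independent of the outer query
    SELECT c.CustomerID, c.CustomerName
    FROM Sales.Customers AS c
    WHERE c.CustomerID IN (SELECT o.CustomerID FROM Sales.Orders AS o);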

Tip#15: Use stored procedures instead of heavy-duty queries

Always prefer stored procedures over ad hoc T-SQL batches. The code within a stored procedure is executed as a single unit, or batch, which greatly reduces network traffic: the individual T-SQL statements contained in the stored procedure do not have to travel across the network, because only the name of the stored procedure and its parameters are transmitted. Moreover, SQL Server caches a stored procedure's execution plan in an area of memory called the procedure cache, where it is likely to remain and be reused (provided there is enough memory available and the procedure is not executed with the WITH RECOMPILE option), while the plans created for ad hoc T-SQL statements might or might not be stored in the procedure cache. Therefore, SQL Server does not have to retrieve and reload the stored procedure from disk, nor parse, optimize, and compile it, each time it runs.
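
As a simple sketch (the procedure, table, and parameter names are illustrative), only the one-line EXEC call below travels across the network, and its cached plan is reused on subsequent executions:

    CREATE PROCEDURE Sales.usp_GetCustomerOrders
        @CustomerID int
    AS
    BEGIN
        SET NOCOUNT ON;   -- suppress row-count messages, saving network chatter
        SELECT OrderID, OrderDate, TotalDue
        FROM Sales.Orders
        WHERE CustomerID = @CustomerID;
    END;
    GO

    EXEC Sales.usp_GetCustomerOrders @CustomerID = 42;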

Tip#16: Use TRUNCATE TABLE instead of the DELETE statement

To remove all rows from a table, use the TRUNCATE TABLE statement instead of a DELETE statement. TRUNCATE TABLE is much faster than a DELETE statement without a WHERE clause because it uses fewer system and transaction log resources. Unlike DELETE, when you run a TRUNCATE TABLE statement, SQL Server does not log individual row deletions in the transaction log; instead, it logs only the page deallocations that occur as a result of the operation. (Note that TRUNCATE TABLE cannot be used on a table that is referenced by a foreign key constraint, and it resets any identity column to its seed value.)
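
Both statements below empty the (hypothetical) staging table, but they are logged very differently:

    DELETE FROM dbo.StagingOrders;     -- logs every deleted row individually

    TRUNCATE TABLE dbo.StagingOrders;  -- logs only page deallocations: much faster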

Tip#17: Keep Transactions Small

Keep your transactions as short as possible, because running several large transactions simultaneously increases the likelihood of a deadlock. If possible, break one large transaction down into several small transactions and then execute these transactions in batches. Exclusive and update locks are held longer for large transactions, which eventually blocks other activity and can lead to deadlock situations. Executing a large transaction in batches helps minimize network round trips during the transaction, reducing possible delays in completing the transaction and releasing its locks.
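
A common pattern is to delete or update in small batches so that each batch commits, and releases its locks, on its own. A minimal sketch, assuming a hypothetical dbo.OrderHistory table and an arbitrary batch size:

    DECLARE @rows int = 1;

    WHILE @rows > 0
    BEGIN
        DELETE TOP (5000)            -- one small transaction per batch
        FROM dbo.OrderHistory
        WHERE OrderDate < '20200101';

        SET @rows = @@ROWCOUNT;      -- loop until no rows remain to delete
    END;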

Tip#18: Reduce Transaction Time

Reduce transaction time by making sure that you are not performing the same reads over and over again. If your application needs to read the same data more than once, cache the data in variables, temporary tables, or table variables, and then reread it from that cache. This helps reduce the time locks are held on the actual resource. You can also reduce lock time by making sure that your application acquires locks at the latest possible moment and releases them at the earliest possible moment.
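
For example, rather than querying the same base table repeatedly, read it once into a table variable and work from the cached copy (the Production.ProductPrice table is an assumption for illustration):

    DECLARE @Prices table (ProductID int PRIMARY KEY, ListPrice money);

    -- Locks on the base table are held only for this single read
    INSERT INTO @Prices (ProductID, ListPrice)
    SELECT ProductID, ListPrice
    FROM Production.ProductPrice;

    -- Later reads hit the cached copy, not the base table
    SELECT ProductID, ListPrice
    FROM @Prices
    WHERE ListPrice > 100;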

Tip#19: Avoid cursors, where possible

Avoid cursors as much as possible. Instead, use a set-based approach to updating or inserting data from one table to another. The same locking rules apply to a SELECT statement in a cursor definition as to any other SELECT statement, so when using cursors, ensure that you have the correct isolation level or locking hint specified for your cursor's SELECT statement. SQL Server holds the locks for both the SELECT statements within a cursor and an independent SELECT statement until the transaction completes (this applies only when SQL Server is running in explicit or implicit transaction mode).
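
As an illustration of the set-based approach (table and column names are hypothetical), a single UPDATE replaces an entire DECLARE CURSOR / FETCH NEXT / UPDATE ... WHERE CURRENT OF loop:

    -- One statement updates every qualifying row in a single pass
    UPDATE o
    SET    o.TotalDue = o.TotalDue * 1.05
    FROM   Sales.Orders AS o
    INNER JOIN Sales.Customers AS c
            ON c.CustomerID = o.CustomerID
    WHERE  c.Region = 'EMEA';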

Tip#20: Choose the right data types for your tables, stored procedures, and variables

A data type determines the type of data that can be stored in a database table column. When you create a table, you must decide on the data type to be used for the column definitions. You can also use data types to define variables and stored procedure input and output parameters. You must select a data type for each column or variable that is appropriate for the data stored in that column or variable. In addition, you must consider storage requirements and choose data types that allow for efficient storage. For example, you should always use tinyint instead of smallint, int, or bigint if you want to store whole positive integers between 0 and 255, because tinyint is a fixed 1-byte field, whereas smallint is 2 bytes, int is 4 bytes, and bigint is a fixed 8-byte field. Similarly, use the char/varchar data types instead of nchar/nvarchar if you do not need to store Unicode data, because char/varchar uses one byte to store each character, whereas nchar/nvarchar uses two.
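
A short sketch of these choices in a table definition (the table and columns are hypothetical):

    CREATE TABLE dbo.ProductRating
    (
        RatingID int IDENTITY(1,1) PRIMARY KEY,
        Score    tinyint      NOT NULL,  -- 1 byte covers whole numbers 0-255
        Comment  varchar(500) NULL,      -- no Unicode needed: 1 byte per character
        RatedOn  datetime     NOT NULL   -- only valid dates can be stored here
    );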

Choosing the right data types for your tables, stored procedures, and variables not only improves performance by ensuring a correct execution plan, but it also improves data integrity by ensuring that the correct data is stored within a database. For example, if you use a datetime data type for a column of dates, then only valid dates will be stored in this column. However, if you use a character or numeric data type for the column, then eventually someone will be able to store any character or numeric value in the column, whether or not it represents a date.