Editorials

Bad SQL With Strings

Today I want to take a look at what could be considered bad SQL. The difference is that the bad, or at least less efficient, SQL is the result of design decisions when creating the tables. The root cause is the choice of the string data type when storing characters in a table. When fixed length data types are used, CHAR and NCHAR, it results in a lot of extra string manipulation whenever you address the string.

In earlier version of SQL Server, using a fixed string length data type had some performance benefits. The reason was that when the string length is fixed, SQL Server might have been able to update a record in place, instead of deleting the old record and inserting a new record. If you had a variable length string in the table, and it was being modified, then the engine didn’t consider updating in place in most cases. This resulted in slower performance if you needed to do a lot of updates.

I came across a system that integrated with an external database. Since the external database had fixed length strings, the new database also included fixed length strings, in order to more easily integrate with the legacy system. As a result, a number of in-efficient SQL queries were found when interacting with the strings. Be aware that fixed length strings allow both leading and trailing space characters, whereas variable length strings only allow retain leading spaces, unless the source data has trailing spaces. A fixed length data type will pad trailing spaces to get to the desired string length, if not enough characters are provided.

Frequently your client does not wish to have trailing spaces. So, in order to consume the data it must know to remove trailing spaces. This is a lot of work that is not necessary, especially if you are constantly trimming the data. Worse yet, I have seen a lot of queries on the database server, something that is a precious resource, doing string trimming. For example, one query put together a USA address value by concatenating the City, State and Postal Code. Here was the concatenation syntax:

LTRIM(RTRIM(City)) + ‘, ‘ + StateAbbreviation + LTRIM(RTRIM(PostalCode))

Using a variable length data type means you can do the concatenation without the RTRIM functions. If you do not have leading spaces when you save the data, then the LTRIM function may also be abandoned. Now you have a more efficient query running on your SQL Server, or in your client code.

Another issue that arises with fixed length strings is when you perform comparisons for a where clause or join criteria. If you compare a fixed length string with a value that is not fixed length, they will not be equal. If you pass in a string for a where clause, make sure it has the appropriate trailing spaces. You can accomplish this by padding it to the correct length, use a stored procedure with a fixed length data type which will automatically pad the value with trailing spaces. The issue is that most of the time in our application we don’t want or care to have trailing spaces.

Back to my legacy application integration. How do you handle the difference there? My thought is that, “It is a legacy application.” Why would you create a new application using an old technique? That’s the reason for creating the new application in the first place. So, use an adapter pattern when working with the legacy system. Put the load for conversion in the portion of the application that interacts with the old and new systems. The legacy system is what you want to go away eventually. So why maintain a method you want to get rid of? Use variable length data types in the new database, and adapt them when working with the legacy system. Otherwise you are going to have to adapt the data constantly forever, even once the legacy system is retired.

In summary, if you are seeing a lot of LTRIM(RTRIM in your SQL code, or .Trim() in your application code, then there is a high probability you could resolve that by using variable length data types in your database.

Cheers,

Ben