Editorials

Integrating Regular Expressions with SQL

Integrating Regular Expressions with SQL
A number of people have responded with their take on the use of Regular Expressions in SQL. Many point out how valuable Regular Expressions have been to them using engines already supporting regular expressions. Others extend on this with links to CLR user defined Regular Expression extensions for SQL Server 2005 and later.

Based on these responses it appears to me that including Regular Expressions in a future ANSI standard would be after the fact in that many SQL engines already support them, and others have user definable extensions that are often used.

Here’s what folks are saying…

John:
For data cleansing, this is essential. I end up having to BCP the data out to a flat file then use Perl to do this. I definitely encourage you to master this skill – it will pay you handsomely in dividends.

Note that there is a dll you can load the dll to allow VBScript regular expressions. See attached file (the code is not mine – you would need to Google the original author).

JAW:
REGX first showed up in the UNIX world.
Oracle supported them since 9i and we are on the cusp of 12c. I believe DB2 supports them as well.
I have installed CLR packages in SQL server to support them.
If you are porting code from say Oracle to SQL Server and REGX is used then it makes the conversion easier.
I was hoping SQL 2012 would support them.

Jason:
I certainly wouldn’t be opposed to a Match() function to be used in place of LIKE but you’ll probably want to compile and cache the Regex object because it’s a fairly complex scripting language. A regex replace function would also be useful with the same caveat about caching.

I haven’t done enough SQL CLR to know for sure but I’m assuming you can use a static instance of System.Runtime.Caching.ObjectCache. You can build your own cache but it’s a fairly complicated thing to get right involving synchronization, weak references and other somewhat esoteric tricks to avoid memory leaks.

There’s even a decent chance that regex matches on larger text blobs on the SQL side might cause less load on the SQL server than a client-side filter because of all the separate requests necessary to get at the contents of a blob that’s over the in-row storage limit.

Nubar:
Oracle already has regexp support:

http://docs.oracle.com/cd/B19306_01/appdev.102/b14251/adfns_regexp.htm

and it follows the POSIX standard too.

Steven:
Should SQL Support Regular Expressions? I think so. I was out yesterday looking at SQL CLR and trying to decide if I should use it and want I wanted was Regular Expressions. Unfortunately, SQL CLR is not something that I want to use because of the following reasons.

  1. I don’t like to make my code even more locked in to Microsoft.
  2. SQL CLR doesn’t use the same routines that VB.Net does and so you have to learn another set of functions.

But, if Regular Expressions were made some thing that SQL, both SQL Server and Oracle, had it would be so much better.


Walden:
I’ve got to admit, I read your article this morning and was saying to myself "Huh? But I use regexes in my SQL statements." Then I remembered that we did exactly what you’re suggesting could be done, we wrote a couple CLR extensions that perform the regex testing for us. This seems to fit the bill for our needs and is simple enough to do that it could be done by anyone. Therefore I’d argue that there are other more pressing issues for the SQL Server team to work on and time should be spent on those issues, and people can extend SQL Server with the CLR if they way regexes.

Chris:
PostgreSQL actually supports regular expressions and tries to use indexes where this is possible. Indexes can be used where the beginning of the string is an exact match, as part of the regex.

Clever uses of functions like reverse (to reverse the order of characters in the string) combined with function-based indexes allow you to use indexes essentially where you are matching the end of the string as well.


I actually use regular expressions on PostgreSQL from time to time. I find them very very helpful when they are needed. My only hesitation would be the fact that I don’t know enough about Microsoft SQL Server indexing to know how one might be able to make this index friendly.

My guess based on my limited knowledge is that you’d have to add a computed column that would then be indexed.

Maurice:
There is plently of source code written in CLR that support this requirement. It was not supplied by Microsoft but was implement in the way the CLR support regular expressions. So this solution should be in no way different as the one Microsoft would supply except may be in the function names. Here is an example.
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx

If you want to write your own CLR implementation of Regular Expressions, the link from Maurice to the MSDN Magazine will get you there. I’m still open for anyone with ideas for 2012 goals/resolutions. If you have something others may want to consider, send it on in to me at btaylor@sswug,org.

Cheers,

Ben

$$SWYNK$$

Featured Article(s)
Building a Solid and Workable Disaster Recovery Strategy
Even for those of you who are leveraging cloud solutions for hosting your infrastructure, you still need to consider what happens if your provider has a major outage.

Featured White Paper(s)
Structuring the Unstructured: How to Dimensionalize Semi-Structured Business Data
Written by Interactive Edge

The Business Intelligence industry … (read more)