[sql-server] SQL Server Regular expressions in T-SQL

Is there any regular expression library written in T-SQL (no CLR, no extended SP, pure T-SQL) for SQL Server, and that should work with shared hosting?

Edit:

  • Thanks, I know about PATINDEX, LIKE, xp_ sps and CLR solutions
  • I also know it is not the best place for regex, the question is theoretical :)
  • Reduced functionality is also accepted

This question is related to sql-server regex tsql

The answer is


If you are using SQL Server 2016 or above, you can use sp_execute_external_script along with R. It has functions for Regular Expression searches, such as grep and grepl.

Here's an example for email addresses. I'll query some "people" via the SQL Server database engine, pass the data for those people to R, let R decide which people have invalid email addresses, and have R pass back that subset of people to SQL Server. The "people" are from the [Application].[People] table in the [WideWorldImporters] sample database. They get passed to the R engine as a dataframe named InputDataSet. R uses the grepl function with the "not" operator (exclamation point!) to find which people have email addresses that don't match the RegEx string search pattern.

EXEC sp_execute_external_script 
 @language = N'R',
 @script = N' RegexWithR <- InputDataSet;
OutputDataSet <- RegexWithR[!grepl("([_a-z0-9-]+(\\.[_a-z0-9-]+)*@[a-z0-9-]+(\\.[a-z0-9-]+)*(\\.[a-z]{2,4}))", RegexWithR$EmailAddress), ];',
 @input_data_1 = N'SELECT PersonID, FullName, EmailAddress FROM Application.People'
 WITH RESULT SETS (([PersonID] INT, [FullName] NVARCHAR(50), [EmailAddress] NVARCHAR(256)))

Note that the appropriate features must be installed on the SQL Server host. For SQL Server 2016, it is called "SQL Server R Services". For SQL Server 2017, it was renamed to "SQL Server Machine Learning Services".

Closing Thoughts Microsoft's implementation of SQL (T-SQL) doesn't have native support for RegEx. This proposed solution may not be any more desirable to the OP than the use of a CLR stored procedure. But it does offer an additional way to approach the problem.


How about the PATINDEX function?

The pattern matching in TSQL is not a complete regex library, but it gives you the basics.

(From Books Online)

Wildcard  Meaning  
% Any string of zero or more characters.

_ Any single character.

[ ] Any single character within the specified range 
    (for example, [a-f]) or set (for example, [abcdef]).

[^] Any single character not within the specified range 
    (for example, [^a - f]) or set (for example, [^abcdef]).

In case anyone else is still looking at this question, http://www.sqlsharp.com/ is a free, easy way to add regular expression CLR functions into your database.


There is some basic pattern matching available through using LIKE, where % matches any number and combination of characters, _ matches any one character, and [abc] could match a, b, or c... There is more info on the MSDN site.


If you are using SQL Server 2016 or above, you can use sp_execute_external_script along with R. It has functions for Regular Expression searches, such as grep and grepl.

Here's an example for email addresses. I'll query some "people" via the SQL Server database engine, pass the data for those people to R, let R decide which people have invalid email addresses, and have R pass back that subset of people to SQL Server. The "people" are from the [Application].[People] table in the [WideWorldImporters] sample database. They get passed to the R engine as a dataframe named InputDataSet. R uses the grepl function with the "not" operator (exclamation point!) to find which people have email addresses that don't match the RegEx string search pattern.

EXEC sp_execute_external_script 
 @language = N'R',
 @script = N' RegexWithR <- InputDataSet;
OutputDataSet <- RegexWithR[!grepl("([_a-z0-9-]+(\\.[_a-z0-9-]+)*@[a-z0-9-]+(\\.[a-z0-9-]+)*(\\.[a-z]{2,4}))", RegexWithR$EmailAddress), ];',
 @input_data_1 = N'SELECT PersonID, FullName, EmailAddress FROM Application.People'
 WITH RESULT SETS (([PersonID] INT, [FullName] NVARCHAR(50), [EmailAddress] NVARCHAR(256)))

Note that the appropriate features must be installed on the SQL Server host. For SQL Server 2016, it is called "SQL Server R Services". For SQL Server 2017, it was renamed to "SQL Server Machine Learning Services".

Closing Thoughts Microsoft's implementation of SQL (T-SQL) doesn't have native support for RegEx. This proposed solution may not be any more desirable to the OP than the use of a CLR stored procedure. But it does offer an additional way to approach the problem.


There is some basic pattern matching available through using LIKE, where % matches any number and combination of characters, _ matches any one character, and [abc] could match a, b, or c... There is more info on the MSDN site.


If anybody is interested in using regex with CLR here is a solution. The function below (C# .net 4.5) returns a 1 if the pattern is matched and a 0 if the pattern is not matched. I use it to tag lines in sub queries. The SQLfunction attribute tells sql server that this method is the actual UDF that SQL server will use. Save the file as a dll in a place where you can access it from management studio.

// default using statements above
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;

namespace CLR_Functions
{   
    public class myFunctions
    {
        [SqlFunction]
        public static SqlInt16 RegexContain(SqlString text, SqlString pattern)
        {            
            SqlInt16 returnVal = 0;
            try
            {
                string myText = text.ToString();
                string myPattern = pattern.ToString();
                MatchCollection mc = Regex.Matches(myText, myPattern);
                if (mc.Count > 0)
                {
                    returnVal = 1;
                }
            }
            catch
            {
                returnVal = 0;
            }

            return returnVal;
        }
    }
}

In management studio import the dll file via programability -- assemblies -- new assembly

Then run this query:

CREATE FUNCTION RegexContain(@text NVARCHAR(50), @pattern NVARCHAR(50))
RETURNS smallint 
AS
EXTERNAL NAME CLR_Functions.[CLR_Functions.myFunctions].RegexContain

Then you should have complete access to the function via the database you stored the assembly in.

Then use in queries like so:

SELECT * 
FROM 
(
    SELECT
        DailyLog.Date,
        DailyLog.Researcher,
        DailyLog.team,
        DailyLog.field,
        DailyLog.EntityID,
        DailyLog.[From],
        DailyLog.[To],
        dbo.RegexContain(Researcher, '[\p{L}\s]+') as 'is null values'
    FROM [DailyOps].[dbo].[DailyLog]
) AS a
WHERE a.[is null values] = 0

How about the PATINDEX function?

The pattern matching in TSQL is not a complete regex library, but it gives you the basics.

(From Books Online)

Wildcard  Meaning  
% Any string of zero or more characters.

_ Any single character.

[ ] Any single character within the specified range 
    (for example, [a-f]) or set (for example, [abcdef]).

[^] Any single character not within the specified range 
    (for example, [^a - f]) or set (for example, [^abcdef]).

You can use VBScript regular expression features using OLE Automation. This is way better than the overhead of creating and maintaining an assembly. Please make sure you go through the comments section to get a better modified version of the main one.

http://blogs.msdn.com/b/khen1234/archive/2005/05/11/416392.aspx

DECLARE @obj INT, @res INT, @match BIT;
DECLARE @pattern varchar(255) = '<your regex pattern goes here>';
DECLARE @matchstring varchar(8000) = '<string to search goes here>';
SET @match = 0;

-- Create a VB script component object
EXEC @res = sp_OACreate 'VBScript.RegExp', @obj OUT;

-- Apply/set the pattern to the RegEx object
EXEC @res = sp_OASetProperty @obj, 'Pattern', @pattern;

-- Set any other settings/properties here
EXEC @res = sp_OASetProperty @obj, 'IgnoreCase', 1;

-- Call the method 'Test' to find a match
EXEC @res = sp_OAMethod @obj, 'Test', @match OUT, @matchstring;

-- Don't forget to clean-up
EXEC @res = sp_OADestroy @obj;

If you get SQL Server blocked access to procedure 'sys.sp_OACreate'... error, use sp_reconfigure to enable Ole Automation Procedures. (Yes, unfortunately that is a server level change!)

More information about the Test method is available here

Happy coding


How about the PATINDEX function?

The pattern matching in TSQL is not a complete regex library, but it gives you the basics.

(From Books Online)

Wildcard  Meaning  
% Any string of zero or more characters.

_ Any single character.

[ ] Any single character within the specified range 
    (for example, [a-f]) or set (for example, [abcdef]).

[^] Any single character not within the specified range 
    (for example, [^a - f]) or set (for example, [^abcdef]).

There is some basic pattern matching available through using LIKE, where % matches any number and combination of characters, _ matches any one character, and [abc] could match a, b, or c... There is more info on the MSDN site.


If anybody is interested in using regex with CLR here is a solution. The function below (C# .net 4.5) returns a 1 if the pattern is matched and a 0 if the pattern is not matched. I use it to tag lines in sub queries. The SQLfunction attribute tells sql server that this method is the actual UDF that SQL server will use. Save the file as a dll in a place where you can access it from management studio.

// default using statements above
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;

namespace CLR_Functions
{   
    public class myFunctions
    {
        [SqlFunction]
        public static SqlInt16 RegexContain(SqlString text, SqlString pattern)
        {            
            SqlInt16 returnVal = 0;
            try
            {
                string myText = text.ToString();
                string myPattern = pattern.ToString();
                MatchCollection mc = Regex.Matches(myText, myPattern);
                if (mc.Count > 0)
                {
                    returnVal = 1;
                }
            }
            catch
            {
                returnVal = 0;
            }

            return returnVal;
        }
    }
}

In management studio import the dll file via programability -- assemblies -- new assembly

Then run this query:

CREATE FUNCTION RegexContain(@text NVARCHAR(50), @pattern NVARCHAR(50))
RETURNS smallint 
AS
EXTERNAL NAME CLR_Functions.[CLR_Functions.myFunctions].RegexContain

Then you should have complete access to the function via the database you stored the assembly in.

Then use in queries like so:

SELECT * 
FROM 
(
    SELECT
        DailyLog.Date,
        DailyLog.Researcher,
        DailyLog.team,
        DailyLog.field,
        DailyLog.EntityID,
        DailyLog.[From],
        DailyLog.[To],
        dbo.RegexContain(Researcher, '[\p{L}\s]+') as 'is null values'
    FROM [DailyOps].[dbo].[DailyLog]
) AS a
WHERE a.[is null values] = 0

There is some basic pattern matching available through using LIKE, where % matches any number and combination of characters, _ matches any one character, and [abc] could match a, b, or c... There is more info on the MSDN site.


How about the PATINDEX function?

The pattern matching in TSQL is not a complete regex library, but it gives you the basics.

(From Books Online)

Wildcard  Meaning  
% Any string of zero or more characters.

_ Any single character.

[ ] Any single character within the specified range 
    (for example, [a-f]) or set (for example, [abcdef]).

[^] Any single character not within the specified range 
    (for example, [^a - f]) or set (for example, [^abcdef]).

In case anyone else is still looking at this question, http://www.sqlsharp.com/ is a free, easy way to add regular expression CLR functions into your database.


You can use VBScript regular expression features using OLE Automation. This is way better than the overhead of creating and maintaining an assembly. Please make sure you go through the comments section to get a better modified version of the main one.

http://blogs.msdn.com/b/khen1234/archive/2005/05/11/416392.aspx

DECLARE @obj INT, @res INT, @match BIT;
DECLARE @pattern varchar(255) = '<your regex pattern goes here>';
DECLARE @matchstring varchar(8000) = '<string to search goes here>';
SET @match = 0;

-- Create a VB script component object
EXEC @res = sp_OACreate 'VBScript.RegExp', @obj OUT;

-- Apply/set the pattern to the RegEx object
EXEC @res = sp_OASetProperty @obj, 'Pattern', @pattern;

-- Set any other settings/properties here
EXEC @res = sp_OASetProperty @obj, 'IgnoreCase', 1;

-- Call the method 'Test' to find a match
EXEC @res = sp_OAMethod @obj, 'Test', @match OUT, @matchstring;

-- Don't forget to clean-up
EXEC @res = sp_OADestroy @obj;

If you get SQL Server blocked access to procedure 'sys.sp_OACreate'... error, use sp_reconfigure to enable Ole Automation Procedures. (Yes, unfortunately that is a server level change!)

More information about the Test method is available here

Happy coding


Examples related to sql-server

Passing multiple values for same variable in stored procedure SQL permissions for roles Count the Number of Tables in a SQL Server Database Visual Studio 2017 does not have Business Intelligence Integration Services/Projects ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database How to create temp table using Create statement in SQL Server? SQL Query Where Date = Today Minus 7 Days How do I pass a list as a parameter in a stored procedure? SQL Server date format yyyymmdd

Examples related to regex

Why my regexp for hyphenated words doesn't work? grep's at sign caught as whitespace Preg_match backtrack error regex match any single character (one character only) re.sub erroring with "Expected string or bytes-like object" Only numbers. Input number in React Visual Studio Code Search and Replace with Regular Expressions Strip / trim all strings of a dataframe return string with first match Regex How to capture multiple repeated groups?

Examples related to tsql

Passing multiple values for same variable in stored procedure Count the Number of Tables in a SQL Server Database Change Date Format(DD/MM/YYYY) in SQL SELECT Statement Stored procedure with default parameters Format number as percent in MS SQL Server EXEC sp_executesql with multiple parameters SQL Server after update trigger How to compare datetime with only date in SQL Server Text was truncated or one or more characters had no match in the target code page including the primary key in an unpivot Printing integer variable and string on same line in SQL