[java] How to implement a SQL like 'LIKE' operator in java?

I need a comparator in java which has the same semantics as the sql 'like' operator. For example:

myComparator.like("digital","%ital%");
myComparator.like("digital","%gi?a%");
myComparator.like("digital","digi%");

should evaluate to true, and

myComparator.like("digital","%cam%");
myComparator.like("digital","tal%");

should evaluate to false. Any ideas how to implement such a comparator or does anyone know an implementation with the same semantics? Can this be done using a regular expression?

This question is related to java sql regex string wildcard

The answer is


Apache Cayanne ORM has an "In memory evaluation"

It may not work for unmapped object, but looks promising:

Expression exp = ExpressionFactory.likeExp("artistName", "A%");   
List startWithA = exp.filterObjects(artists); 

This's my take on this, it's in Kotlin but can be converted to Java with little effort:

val percentageRegex = Regex("""(?<!\\)%""")
val underscoreRegex = Regex("""(?<!\\)_""")

infix fun String.like(predicate: String): Boolean {

    //Split the text by every % not preceded by a slash.
    //We transform each slice before joining them with .* as a separator.
    return predicate.split(percentageRegex).joinToString(".*") { percentageSlice ->

        //Split the text by every _ not preceded by a slash.
        //We transform each slice before joining them with . as a separator.
        percentageSlice.split(underscoreRegex).joinToString(".") { underscoreSlice ->

            //Each slice is wrapped in "Regex quotes" to ignore all
            // the metacharacters they contain.
            //We also remove the slashes from the escape sequences
            // since they are now treated literally.
            Pattern.quote(
                underscoreSlice.replace("\\_", "_").replace("\\%", "%")
            )
        }

    }.let { "^$it$" }.toRegex().matches(this@like)
}

It might not be the most performant of all the solutions here, but it's probably the most accurate.

It ignores all the other Regex metacharacters other than % and _ and also supports escaping them with a slash.


You could turn '%string%' to contains(), 'string%' to startsWith() and '%string"' to endsWith().

You should also run toLowerCase() on both the string and pattern as LIKE is case-insenstive.

Not sure how you'd handle '%string%other%' except with a Regular Expression though.

If you're using Regular Expressions:


public static boolean like(String source, String exp) {
        if (source == null || exp == null) {
            return false;
        }

        int sourceLength = source.length();
        int expLength = exp.length();

        if (sourceLength == 0 || expLength == 0) {
            return false;
        }

        boolean fuzzy = false;
        char lastCharOfExp = 0;
        int positionOfSource = 0;

        for (int i = 0; i < expLength; i++) {
            char ch = exp.charAt(i);

            // ????
            boolean escape = false;
            if (lastCharOfExp == '\\') {
                if (ch == '%' || ch == '_') {
                    escape = true;
                    // System.out.println("escape " + ch);
                }
            }

            if (!escape && ch == '%') {
                fuzzy = true;
            } else if (!escape && ch == '_') {
                if (positionOfSource >= sourceLength) {
                    return false;
                }

                positionOfSource++;// <<<----- ???1
            } else if (ch != '\\') {// ????,?????????
                if (positionOfSource >= sourceLength) {// ?????source????
                    return false;
                }

                if (lastCharOfExp == '%') { // ??????%,?????
                    int tp = source.indexOf(ch);
                    // System.out.println("?????=%,?????=" + ch + ",position=" + position + ",tp=" + tp);

                    if (tp == -1) { // ????????,????
                        return false;
                    }

                    if (tp >= positionOfSource) {
                        positionOfSource = tp + 1;// <<<----- ????

                        if (i == expLength - 1 && positionOfSource < sourceLength) { // exp??????????,????source?????????
                            return false;
                        }
                    } else {
                        return false;
                    }
                } else if (source.charAt(positionOfSource) == ch) {// ????????ch??
                    positionOfSource++;
                } else {
                    return false;
                }
            }

            lastCharOfExp = ch;// <<<----- ??
            // System.out.println("?????=" + ch + ",position=" + position);
        }

        // expr???????,???????,??source???????????source???
        if (!fuzzy && positionOfSource < sourceLength) {
            // System.out.println("?????=" + lastChar + ",position=" + position );

            return false;
        }

        return true;// ????true
    }
Assert.assertEquals(true, like("abc_d", "abc\\_d"));
        Assert.assertEquals(true, like("abc%d", "abc\\%%d"));
        Assert.assertEquals(false, like("abcd", "abc\\_d"));

        String source = "1abcd";
        Assert.assertEquals(true, like(source, "_%d"));
        Assert.assertEquals(false, like(source, "%%a"));
        Assert.assertEquals(false, like(source, "1"));
        Assert.assertEquals(true, like(source, "%d"));
        Assert.assertEquals(true, like(source, "%%%%"));
        Assert.assertEquals(true, like(source, "1%_"));
        Assert.assertEquals(false, like(source, "1%_2"));
        Assert.assertEquals(false, like(source, "1abcdef"));
        Assert.assertEquals(true, like(source, "1abcd"));
        Assert.assertEquals(false, like(source, "1abcde"));

        // ????case?????
        Assert.assertEquals(true, like(source, "_%_"));
        Assert.assertEquals(true, like(source, "_%____"));
        Assert.assertEquals(true, like(source, "_____"));// 5?
        Assert.assertEquals(false, like(source, "___"));// 3?
        Assert.assertEquals(false, like(source, "__%____"));// 6?
        Assert.assertEquals(false, like(source, "1"));

        Assert.assertEquals(false, like(source, "a_%b"));
        Assert.assertEquals(true, like(source, "1%"));
        Assert.assertEquals(false, like(source, "d%"));
        Assert.assertEquals(true, like(source, "_%"));
        Assert.assertEquals(true, like(source, "_abc%"));
        Assert.assertEquals(true, like(source, "%d"));
        Assert.assertEquals(true, like(source, "%abc%"));
        Assert.assertEquals(false, like(source, "ab_%"));

        Assert.assertEquals(true, like(source, "1ab__"));
        Assert.assertEquals(true, like(source, "1ab__%"));
        Assert.assertEquals(false, like(source, "1ab___"));
        Assert.assertEquals(true, like(source, "%"));

        Assert.assertEquals(false, like(null, "1ab___"));
        Assert.assertEquals(false, like(source, null));
        Assert.assertEquals(false, like(source, ""));

public static boolean like(String toBeCompare, String by){
    if(by != null){
        if(toBeCompare != null){
            if(by.startsWith("%") && by.endsWith("%")){
                int index = toBeCompare.toLowerCase().indexOf(by.replace("%", "").toLowerCase());
                if(index < 0){
                    return false;
                } else {
                    return true;
                }
            } else if(by.startsWith("%")){
                return toBeCompare.endsWith(by.replace("%", ""));
            } else if(by.endsWith("%")){
                return toBeCompare.startsWith(by.replace("%", ""));
            } else {
                return toBeCompare.equals(by.replace("%", ""));
            }
        } else {
            return false;
        }
    } else {
        return false;
    }
}

may be help you


Ok this is a bit of a weird solution, but I thought it should still be mentioned.

Instead of recreating the like mechanism we can utilize the existing implementation already available in any database!

(Only requirement is, your application must have access to any database).

Just run a very simple query each time,that returns true or false depending on the result of the like's comparison. Then execute the query, and read the answer directly from the database!

For Oracle db:

SELECT
CASE 
     WHEN 'StringToSearch' LIKE 'LikeSequence' THEN 'true'
     ELSE 'false'
 END test
FROM dual 

For MS SQL Server

SELECT
CASE 
     WHEN 'StringToSearch' LIKE 'LikeSequence' THEN 'true'
     ELSE 'false'
END test

All you have to do is replace "StringToSearch" and "LikeSequence" with bind parameters and set the values you want to check.


Java strings have .startsWith() and .contains() methods which will get you most of the way. For anything more complicated you'd have to use regex or write your own method.


Regular expressions are the most versatile. However, some LIKE functions can be formed without regular expressions. e.g.

String text = "digital";
text.startsWith("dig"); // like "dig%"
text.endsWith("tal"); // like "%tal"
text.contains("gita"); // like "%gita%"

The Comparator and Comparable interfaces are likely inapplicable here. They deal with sorting, and return integers of either sign, or 0. Your operation is about finding matches, and returning true/false. That's different.


Every SQL reference I can find says the "any single character" wildcard is the underscore (_), not the question mark (?). That simplifies things a bit, since the underscore is not a regex metacharacter. However, you still can't use Pattern.quote() for the reason given by mmyers. I've got another method here for escaping regexes when I might want to edit them afterward. With that out of the way, the like() method becomes pretty simple:

public static boolean like(final String str, final String expr)
{
  String regex = quotemeta(expr);
  regex = regex.replace("_", ".").replace("%", ".*?");
  Pattern p = Pattern.compile(regex,
      Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
  return p.matcher(str).matches();
}

public static String quotemeta(String s)
{
  if (s == null)
  {
    throw new IllegalArgumentException("String cannot be null");
  }

  int len = s.length();
  if (len == 0)
  {
    return "";
  }

  StringBuilder sb = new StringBuilder(len * 2);
  for (int i = 0; i < len; i++)
  {
    char c = s.charAt(i);
    if ("[](){}.*+?$^|#\\".indexOf(c) != -1)
    {
      sb.append("\\");
    }
    sb.append(c);
  }
  return sb.toString();
}

If you really want to use ? for the wildcard, your best bet would be to remove it from the list of metacharacters in the quotemeta() method. Replacing its escaped form -- replace("\\?", ".") -- wouldn't be safe because there might be backslashes in the original expression.

And that brings us to the real problems: most SQL flavors seem to support character classes in the forms [a-z] and [^j-m] or [!j-m], and they all provide a way to escape wildcard characters. The latter is usually done by means of an ESCAPE keyword, which lets you define a different escape character every time. As you can imagine, this complicates things quite a bit. Converting to a regex is probably still the best option, but parsing the original expression will be much harder--in fact, the first thing you would have to do is formalize the syntax of the LIKE-like expressions themselves.


Check out https://github.com/hrakaroo/glob-library-java.

It's a zero dependency library in Java for doing glob (and sql like) type of comparisons. Over a large data set it is faster than translating to a regular expression.

Basic syntax

MatchingEngine m = GlobPattern.compile("dog%cat\%goat_", '%', '_', GlobPattern.HANDLE_ESCAPES);
if (m.matches(str)) { ... }

To implement LIKE functions of sql in java you don't need regular expression in They can be obtained as:

String text = "apple";
text.startsWith("app"); // like "app%"
text.endsWith("le"); // like "%le"
text.contains("ppl"); // like "%ppl%"

Yes, this could be done with a regular expression. Keep in mind that Java's regular expressions have different syntax from SQL's "like". Instead of "%", you would have ".*", and instead of "?", you would have ".".

What makes it somewhat tricky is that you would also have to escape any characters that Java treats as special. Since you're trying to make this analogous to SQL, I'm guessing that ^$[]{}\ shouldn't appear in the regex string. But you will have to replace "." with "\\." before doing any other replacements. (Edit: Pattern.quote(String) escapes everything by surrounding the string with "\Q" and "\E", which will cause everything in the expression to be treated as a literal (no wildcards at all). So you definitely don't want to use it.)

Furthermore, as Dave Webb says, you also need to ignore case.

With that in mind, here's a sample of what it might look like:

public static boolean like(String str, String expr) {
    expr = expr.toLowerCase(); // ignoring locale for now
    expr = expr.replace(".", "\\."); // "\\" is escaped to "\" (thanks, Alan M)
    // ... escape any other potentially problematic characters here
    expr = expr.replace("?", ".");
    expr = expr.replace("%", ".*");
    str = str.toLowerCase();
    return str.matches(expr);
}

http://josql.sourceforge.net/ has what you need. Look for org.josql.expressions.LikeExpression.


i dont know exactly about the greedy issue, but try this if it works for you:

public boolean like(final String str, String expr)
  {
    final String[] parts = expr.split("%");
    final boolean traillingOp = expr.endsWith("%");
    expr = "";
    for (int i = 0, l = parts.length; i < l; ++i)
    {
      final String[] p = parts[i].split("\\\\\\?");
      if (p.length > 1)
      {
        for (int y = 0, l2 = p.length; y < l2; ++y)
        {
          expr += p[y];
          if (i + 1 < l2) expr += ".";
        }
      }
      else
      {
        expr += parts[i];
      }
      if (i + 1 < l) expr += "%";
    }
    if (traillingOp) expr += "%";
    expr = expr.replace("?", ".");
    expr = expr.replace("%", ".*");
    return str.matches(expr);
}

.* will match any characters in regular expressions

I think the java syntax would be

"digital".matches(".*ital.*");

And for the single character match just use a single dot.

"digital".matches(".*gi.a.*");

And to match an actual dot, escape it as slash dot

\.

Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to sql

Passing multiple values for same variable in stored procedure SQL permissions for roles Generic XSLT Search and Replace template Access And/Or exclusions Pyspark: Filter dataframe based on multiple conditions Subtracting 1 day from a timestamp date PYODBC--Data source name not found and no default driver specified select rows in sql with latest date for each ID repeated multiple times ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database

Examples related to regex

Why my regexp for hyphenated words doesn't work? grep's at sign caught as whitespace Preg_match backtrack error regex match any single character (one character only) re.sub erroring with "Expected string or bytes-like object" Only numbers. Input number in React Visual Studio Code Search and Replace with Regular Expressions Strip / trim all strings of a dataframe return string with first match Regex How to capture multiple repeated groups?

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript

Examples related to wildcard

Matching strings with wildcard Rename multiple files in cmd Google Spreadsheet, Count IF contains a string using wildcards in LDAP search filters/queries Can the "IN" operator use LIKE-wildcards (%) in Oracle? Using SED with wildcard Need to perform Wildcard (*,?, etc) search on a string using Regex Check if a file exists with wildcard in shell script Pattern matching using a wildcard wildcard * in CSS for classes