[java] How to compare character ignoring case in primitive types

I am writing these lines of code:

String name1 = fname.getText().toString();
String name2 = sname.getText().toString();
aru = 0;

count1 = name1.length();
count2 = name2.length();
for (i = 0; i < count1; i++)
{  
    for (j = 0; j < count2; j++)
    { 
        if (name1.charAt(i)==name2.charAt(j))
            aru++;
    }
    if(aru!=0)
        aru++;
}

I want to compare the Characters of two Strings ignoring the case. Simply using IgnoreCase doesn't work. Adding '65' ASCII value doesn't work either. How do I do this?

This question is related to java string character case-sensitive case-insensitive

The answer is


You can't actually do the job quite right with toLowerCase, either on a string or in a character. The problem is that there are variant glyphs in either upper or lower case, and depending on whether you uppercase or lowercase your glyphs may or may not be preserved. It's not even clear what you mean when you say that two variants of a lower-case glyph are compared ignoring case: are they or are they not the same? (Note that there are also mixed-case glyphs: \u01c5, \u01c8, \u01cb, \u01f2 or ?, ?, ?, ?, but any method suggested here will work on those as long as they should count as the same as their fully upper or full lower case variants.)

There is an additional problem with using Char: there are some 80 code points not representable with a single Char that are upper/lower case variants (40 of each), at least as detected by Java's code point upper/lower casing. You therefore need to get the code points and change the case on these.

But code points don't help with the variant glyphs.

Anyway, here's a complete list of the glyphs that are problematic due to variants, showing how they fare against 6 variant methods:

  1. Character toLowerCase
  2. Character toUpperCase
  3. String toLowerCase
  4. String toUpperCase
  5. String equalsIgnoreCase
  6. Character toLowerCase(toUpperCase) (or vice versa)

For these methods, S means that the variants are treated the same as each other, D means the variants are treated as different from each other.

Behavior     Unicode                             Glyphs
===========  ==================================  =========
1 2 3 4 5 6  Upper  Lower  Var Up Var Lo Vr Lo2  U L u l l2
- - - - - -  ------ ------ ------ ------ ------  - - - - -
D D D D S S  \u0049 \u0069 \u0130 \u0131         I i I i   
S D S D S S  \u004b \u006b \u212a                K k K     
D S D S S S  \u0053 \u0073        \u017f         S s   ?   
D S D S S S  \u039c \u03bc        \u00b5         ? µ   µ   
S D S D S S  \u00c5 \u00e5 \u212b                Å å Å     
D S D S S S  \u0399 \u03b9        \u0345 \u1fbe  ? ?   ? ? 
D S D S S S  \u0392 \u03b2        \u03d0         ? ß   ?   
D S D S S S  \u0395 \u03b5        \u03f5         ? e   ?   
D D D D S S  \u0398 \u03b8 \u03f4 \u03d1         T ? ? ?   
D S D S S S  \u039a \u03ba        \u03f0         ? ?   ?   
D S D S S S  \u03a0 \u03c0        \u03d6         ? p   ?   
D S D S S S  \u03a1 \u03c1        \u03f1         ? ?   ?   
D S D S S S  \u03a3 \u03c3        \u03c2         S s   ?   
D S D S S S  \u03a6 \u03c6        \u03d5         F f   ?   
S D S D S S  \u03a9 \u03c9 \u2126                O ? ?     
D S D S S S  \u1e60 \u1e61        \u1e9b         ? ?   ?   

Complicating this still further is that there is no way to get the Turkish I's right (i.e. the dotted versions are different than the undotted versions) unless you know you're in Turkish; none of these methods give correct behavior and cannot unless you know the locale (i.e. non-Turkish: i and I are the same ignoring case; Turkish, not).

Overall, using toUpperCase gives you the closest approximation, since you have only five uppercase variants (or four, not counting Turkish).

You can also try to specifically intercept those five troublesome cases and call toUpperCase(toLowerCase(c)) on them alone. If you choose your guards carefully (just toUpperCase if c < 0x130 || c > 0x212B, then work through the other alternatives) you can get only a ~20% speed penalty for characters in the low range (as compared to ~4x if you convert single characters to strings and equalsIgnoreCase them) and only about a 2x penalty if you have a lot in the danger zone. You still have the locale problem with dotted I, but otherwise you're in decent shape. Of course if you can use equalsIgnoreCase on a larger string, you're better off doing that.

Here is sample Scala code that does the job:

def elevateCase(c: Char): Char = {
  if (c < 0x130 || c > 0x212B) Character.toUpperCase(c)
  else if (c == 0x130 || c == 0x3F4 || c == 0x2126 || c >= 0x212A)
    Character.toUpperCase(Character.toLowerCase(c))
  else Character.toUpperCase(c)
}

Generic methods to compare a char at a position between 2 strings with ignore case.

public static boolean isEqualIngoreCase(char one, char two){
    return Character.toLowerCase(one)==Character .toLowerCase(two);
}

public static boolean isEqualStringCharIgnoreCase(String one, String two, int position){
    char oneChar = one.charAt(position);
    char twoChar = two.charAt(position);
    return isEqualIngoreCase(oneChar, twoChar);
}

Function call

boolean isFirstCharEqual = isEqualStringCharIgnoreCase("abc", "ABC", 0)

This is how the JDK does it (adapted from OpenJDK 8, String.java/regionMatches):

static boolean charactersEqualIgnoringCase(char c1, char c2) {
  if (c1 == c2) return true;

  // If characters don't match but case may be ignored,
  // try converting both characters to uppercase.
  char u1 = Character.toUpperCase(c1);
  char u2 = Character.toUpperCase(c2);
  if (u1 == u2) return true;

  // Unfortunately, conversion to uppercase does not work properly
  // for the Georgian alphabet, which has strange rules about case
  // conversion.  So we need to make one last check before
  // exiting.
  return Character.toLowerCase(u1) == Character.toLowerCase(u2);
}

I suppose that works for Turkish too.


You have to consider the Turkish I problem when comparing characters/ lowercasing / uppercasing:

I suggest to convert to String and use toLowerCase with invariant culture (in most cases at least).

public final static Locale InvariantLocale = new Locale(Empty, Empty, Empty); str.toLowerCase(InvariantLocale)

See similar C# string.ToLower() and string.ToLowerInvariant()

Note: Don't use String.equalsIgnoreCase http://nikolajlindberg.blogspot.co.il/2008/03/beware-of-java-comparing-turkish.html


You can change the case of String before using it, like this

String name1 = fname.getText().toString().toLowerCase(); 
String name2 = sname.getText().toString().toLowerCase();

Then continue with rest operation.


Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript

Examples related to character

Set the maximum character length of a UITextField in Swift Max length UITextField Remove last character from string. Swift language Get nth character of a string in Swift programming language How many characters can you store with 1 byte? How to convert integers to characters in C? Converting characters to integers in Java How to check the first character in a string in Bash or UNIX shell? Invisible characters - ASCII How to delete Certain Characters in a excel 2010 cell

Examples related to case-sensitive

Are PostgreSQL column names case-sensitive? How can I convert uppercase letters to lowercase in Notepad++ How do I commit case-sensitive only filename changes in Git? In VBA get rid of the case sensitivity when comparing words? Changing capitalization of filenames in Git How to compare character ignoring case in primitive types Contains case insensitive Should URL be case sensitive? MySQL case sensitive query Are table names in MySQL case sensitive?

Examples related to case-insensitive

Are PostgreSQL column names case-sensitive? How to make String.Contains case insensitive? SQL- Ignore case while searching for a string String contains - ignore case How to compare character ignoring case in primitive types How to perform case-insensitive sorting in JavaScript? Contains case insensitive Case insensitive string as HashMap key Case insensitive searching in Oracle How to replace case-insensitive literal substrings in Java