How to compare character ignoring case in primitive types

Question

I am writing these lines of code:

    String name1 = fname.getText().toString();
    String name2 = sname.getText().toString();
    aru = 0;

    count1 = name1.length();
    count2 = name2.length();
    for (i = 0; i < count1; i++)
    {
        for (j = 0; j < count2; j++)
        {
            if (name1.charAt(i) == name2.charAt(j))
                aru++;
        }
        if (aru != 0)
            aru++;
    }

I want to compare the characters of two Strings ignoring case. Simply using IgnoreCase doesn't work. Adding the ASCII value 65 doesn't work either. How do I do this?

User · Answer

You can't actually do the job quite right with toLowerCase, either on a string or on a character. The problem is that there are variant glyphs in either upper or lower case, and depending on whether you uppercase or lowercase, your glyphs may or may not be preserved. It's not even clear what you mean when you say that two variants of a lower-case glyph are compared ignoring case: are they or are they not the same? (Note that there are also mixed-case glyphs: \u01c5, \u01c8, \u01cb, \u01f2 or ǅ, ǈ, ǋ, ǲ, but any method suggested here will work on those as long as they should count as the same as their fully upper or fully lower case variants.)

There is an additional problem with using Char: there are some 80 code points not representable with a single Char that are upper/lower case variants (40 of each), at least as detected by Java's code point upper/lower casing. You therefore need to get the code points and change the case on these.

But code points don't help with the variant glyphs.

Anyway, here's a complete list of the glyphs that are problematic due to variants, showing how they fare against 6 variant methods:

1. Character.toLowerCase
2. Character.toUpperCase
3. String.toLowerCase
4. String.toUpperCase
5. String.equalsIgnoreCase
6. Character.toLowerCase(Character.toUpperCase(_)) (or vice versa)

For these methods, S means that the variants are treated the same as each other, and D means the variants are treated as different from each other.

    Behavior     Unicode                             Glyphs
    -----------  ----------------------------------  ----------
    1 2 3 4 5 6  Upper  Lower  Var Up Var Lo Vr Lo2  U L u l l2
    - - - - - -  ------ ------ ------ ------ ------  - - - - -
    D D D D S S  \u0049 \u0069 \u0130 \u0131         I i İ ı
    S D S D S S  \u004b \u006b \u212a                K k K
    D S D S S S  \u0053 \u0073        \u017f         S s   ſ
    D S D S S S  \u039c \u03bc        \u00b5         Μ μ   µ
    S D S D S S  \u00c5 \u00e5 \u212b                Å å Å
    D S D S S S  \u0399 \u03b9        \u0345 \u1fbe  Ι ι    ͅ ι
    D S D S S S  \u0392 \u03b2        \u03d0         Β β   ϐ
    D S D S S S  \u0395 \u03b5        \u03f5         Ε ε   ϵ
    D D D D S S  \u0398 \u03b8 \u03f4 \u03d1         Θ θ ϴ ϑ
    D S D S S S  \u039a \u03ba        \u03f0         Κ κ   ϰ
    D S D S S S  \u03a0 \u03c0        \u03d6         Π π   ϖ
    D S D S S S  \u03a1 \u03c1        \u03f1         Ρ ρ   ϱ
    D S D S S S  \u03a3 \u03c3        \u03c2         Σ σ   ς
    D S D S S S  \u03a6 \u03c6        \u03d5         Φ φ   ϕ
    S D S D S S  \u03a9 \u03c9 \u2126                Ω ω Ω
    D S D S S S  \u1e60 \u1e61        \u1e9b         Ṡ ṡ   ẛ

Complicating this still further is that there is no way to get the Turkish I's right (i.e. the dotted versions are different from the undotted versions) unless you know you're in Turkish. None of these methods gives correct behavior, and none can unless you know the locale (non-Turkish: i and I are the same ignoring case; Turkish: they are not).

Overall, using toUpperCase gives you the closest approximation, since you have only five uppercase variants (or four, not counting Turkish).

You can also try to specifically intercept those five troublesome cases and call toUpperCase(toLowerCase(c)) on them alone. If you choose your guards carefully (just toUpperCase if c < 0x130 || c > 0x212B, then work through the other alternatives), you can get only a ~20% speed penalty for characters in the low range (as compared to ~4x if you convert single characters to strings and equalsIgnoreCase them) and only about a 2x penalty if you have a lot in the danger zone. You still have the locale problem with dotted I, but otherwise you're in decent shape. Of course, if you can use equalsIgnoreCase on a larger string, you're better off doing that.

Here is sample Scala code that does the job:

    def elevateCase(c: Char): Char = {
      if (c < 0x130 || c > 0x212B) Character.toUpperCase(c)
      else if (c == 0x130 || c == 0x3F4 || c == 0x2126 || c >= 0x212A)
        Character.toUpperCase(Character.toLowerCase(c))
      else Character.toUpperCase(c)
    }
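The remark about code points can be illustrated in Java. The following is only a minimal sketch and not part of the original answer: the class name CodePointCompare and both method names are hypothetical, and it uses the uppercase-then-lowercase double conversion (method 6 in the list), so it still treats the variant glyphs and the Turkish I's as equal.

```java
final class CodePointCompare {
    // Compare two code points ignoring case: uppercase both, then
    // lowercase the results as a second check (method 6 in the list).
    static boolean codePointEqualsIgnoreCase(int cp1, int cp2) {
        if (cp1 == cp2) return true;
        int u1 = Character.toUpperCase(cp1);
        int u2 = Character.toUpperCase(cp2);
        return u1 == u2 || Character.toLowerCase(u1) == Character.toLowerCase(u2);
    }

    // Walk both strings one code point at a time, so that characters
    // stored as surrogate pairs are compared as single units.
    static boolean equalsIgnoreCase(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (!codePointEqualsIgnoreCase(ca, cb)) return false;
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // Equal only if both strings were consumed completely.
        return i == a.length() && j == b.length();
    }
}
```

For example, CodePointCompare.equalsIgnoreCase("\uD801\uDC00", "\uD801\uDC28") compares the Deseret letters U+10400 and U+10428, an upper/lower case pair that no single char can hold.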

User · Answer

This is how the JDK does it (adapted from OpenJDK 8, String.java/regionMatches):

    static boolean charactersEqualIgnoringCase(char c1, char c2) {
      if (c1 == c2) return true;

      // If characters don't match but case may be ignored,
      // try converting both characters to uppercase.
      char u1 = Character.toUpperCase(c1);
      char u2 = Character.toUpperCase(c2);
      if (u1 == u2) return true;

      // Unfortunately, conversion to uppercase does not work properly
      // for the Georgian alphabet, which has strange rules about case
      // conversion. So we need to make one last check before
      // exiting.
      return Character.toLowerCase(u1) == Character.toLowerCase(u2);
    }

I suppose that works for Turkish too.
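To connect this back to the question, here is a sketch of the asker's nested loop built on that helper. The class name MatchCounter and the method countMatches are hypothetical names chosen for illustration, and the sketch only counts matching character pairs.

```java
final class MatchCounter {
    // Same logic as the helper shown in this answer.
    static boolean charactersEqualIgnoringCase(char c1, char c2) {
        if (c1 == c2) return true;
        char u1 = Character.toUpperCase(c1);
        char u2 = Character.toUpperCase(c2);
        if (u1 == u2) return true;
        return Character.toLowerCase(u1) == Character.toLowerCase(u2);
    }

    // Count every (i, j) pair whose characters match ignoring case.
    static int countMatches(String name1, String name2) {
        int count = 0;
        for (int i = 0; i < name1.length(); i++) {
            for (int j = 0; j < name2.length(); j++) {
                if (charactersEqualIgnoringCase(name1.charAt(i), name2.charAt(j))) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

With this, countMatches("aB", "Ab") returns 2. Note that 'i' and '\u0130' (dotted capital I) compare as equal here only because of the final lowercase step, so the helper is locale-blind: it treats all four Turkish I's as the same letter.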

User · Answer

The Character class of the Java API has various functions you can use.

You can convert your char to lowercase on both sides:

    Character.toLowerCase(name1.charAt(i)) == Character.toLowerCase(name2.charAt(j))

There are also methods you can use to check whether a letter is uppercase or lowercase:

    Character.isUpperCase('P')
    Character.isLowerCase('P')

User · Answer

Generic methods to compare a char at a given position in two strings, ignoring case:

    public static boolean isEqualIgnoreCase(char one, char two) {
        return Character.toLowerCase(one) == Character.toLowerCase(two);
    }

    public static boolean isEqualStringCharIgnoreCase(String one, String two, int position) {
        char oneChar = one.charAt(position);
        char twoChar = two.charAt(position);
        return isEqualIgnoreCase(oneChar, twoChar);
    }

Function call:

    boolean isFirstCharEqual = isEqualStringCharIgnoreCase("abc", "ABC", 0);

User · Answer

You can change the case of the Strings before using them, like this:

    String name1 = fname.getText().toString().toLowerCase();
    String name2 = sname.getText().toString().toLowerCase();

Then continue with the rest of the operation.

User · Answer

You have to consider the Turkish I problem when comparing characters, lowercasing, or uppercasing.

I suggest converting to String and using toLowerCase with an invariant culture (in most cases at least):

    public final static Locale InvariantLocale = new Locale(Empty, Empty, Empty);
    str.toLowerCase(InvariantLocale)

See the similar C# string.ToLower() and string.ToLowerInvariant().

Note: Don't use String.equalsIgnoreCase. See http://nikolajlindberg.blogspot.co.il/2008/03/beware-of-java-comparing-turkish.html
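As a hedged illustration of the locale effect described in this answer, the sketch below uses Java's built-in Locale.ROOT, which has an empty language, country, and variant, in place of the hand-built InvariantLocale. That substitution is an assumption on my part, and the class and method names are hypothetical.

```java
import java.util.Locale;

final class InvariantLowerCase {
    // Lowercase with the root (language-neutral) locale.
    static String lowerInvariant(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Lowercase with Turkish rules, where 'I' maps to dotless U+0131.
    static String lowerTurkish(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr"));
    }

    // Compare two strings after invariant lowercasing.
    static boolean equalsIgnoreCaseInvariant(String a, String b) {
        return lowerInvariant(a).equals(lowerInvariant(b));
    }
}
```

Here lowerInvariant("TITLE") gives "title", while lowerTurkish("TITLE") gives "t\u0131tle" with a dotless i, which is why a comparison that relies on the default locale can behave differently on a Turkish system.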
