Case-insensitive string comparison in C++

Question

What is the best way of doing case-insensitive string comparison in C++ without transforming a string to all uppercase or all lowercase? Please indicate whether the methods are Unicode-friendly and how portable they are.

User · Answer

See std::lexicographical_compare:

    // lexicographical_compare example
    #include <iostream>   // std::cout, std::boolalpha
    #include <algorithm>  // std::lexicographical_compare
    #include <cctype>     // std::tolower

    // a case-insensitive comparison function:
    bool mycomp (char c1, char c2)
    { return std::tolower(c1) < std::tolower(c2); }

    int main () {
      char foo[] = "Apple";
      char bar[] = "apartment";

      std::cout << std::boolalpha;

      std::cout << "Comparing foo and bar lexicographically (foo < bar):\n";

      std::cout << "Using default comparison (operator<): ";
      std::cout << std::lexicographical_compare(foo, foo + 5, bar, bar + 9);
      std::cout << '\n';

      std::cout << "Using mycomp as comparison object: ";
      std::cout << std::lexicographical_compare(foo, foo + 5, bar, bar + 9, mycomp);
      std::cout << '\n';

      return 0;
    }
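The same idea can drive an ordered container. What follows is only a sketch under my own assumptions: the names CaseInsensitiveLess and CaseInsensitiveMap are made up for illustration, and the comparison only folds single-byte characters through std::tolower, so it is not Unicode-aware.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical comparator: orders strings while ignoring case
// (single-byte characters only).
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// A map in which "Apple" and "APPLE" refer to the same entry.
using CaseInsensitiveMap = std::map<std::string, int, CaseInsensitiveLess>;
```

Taking the characters as unsigned char avoids passing negative values to std::tolower, which would be undefined behaviour for bytes above 127.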

User · Answer

If you have to compare a source string against other strings often, one elegant solution is to use a regex.

    std::wstring first = L"Test";
    std::wstring second = L"TEST";

    std::wregex pattern(first, std::wregex::icase);
    bool isEqual = std::regex_match(second, pattern);
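One caveat with the regex approach: the source string is interpreted as a pattern, so characters such as "." or "(" change its meaning. A minimal sketch of escaping them first, assuming the default ECMAScript grammar; escape_for_regex and make_icase_pattern are hypothetical helper names, not part of the standard library.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: backslash-escape ECMAScript metacharacters so the
// source string is matched literally.
std::wstring escape_for_regex(const std::wstring& text) {
    static const std::wstring special = L"\\^$.|?*+()[]{}";
    std::wstring out;
    for (wchar_t ch : text) {
        if (special.find(ch) != std::wstring::npos) out += L'\\';
        out += ch;
    }
    return out;
}

// Build the pattern once, then reuse it for many comparisons.
std::wregex make_icase_pattern(const std::wstring& source) {
    return std::wregex(escape_for_regex(source), std::wregex::icase);
}
```

The pattern would then be used exactly as above, e.g. std::regex_match(candidate, pattern).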

User · Answer

Short and nice. No other dependencies than the extended std C lib.

    strcasecmp(str1.c_str(), str2.c_str()) == 0

returns true if str1 and str2 are equal. strcasecmp may not exist; there could be analogs such as stricmp, strcmpi, etc.

Example code:

    #include <iostream>
    #include <string>
    #include <string.h> // For strcasecmp(). Also could be found in <mem.h>

    using namespace std;

    /// Simple wrapper
    inline bool str_ignoreCase_cmp(std::string const& s1, std::string const& s2) {
        if(s1.length() != s2.length())
            return false;  // optimization since std::string holds length in variable.
        return strcasecmp(s1.c_str(), s2.c_str()) == 0;
    }

    /// Function object - comparator
    struct StringCaseInsensetiveCompare {
        bool operator()(std::string const& s1, std::string const& s2) {
            if(s1.length() != s2.length())
                return false;  // optimization since std::string holds length in variable.
            return strcasecmp(s1.c_str(), s2.c_str()) == 0;
        }
        bool operator()(const char *s1, const char * s2){
            return strcasecmp(s1,s2)==0;
        }
    };

    /// Convert bool to string
    inline char const* bool2str(bool b){ return b?"true":"false"; }

    int main()
    {
        cout<< bool2str(strcasecmp("asd","AsD")==0) <<endl;
        cout<< bool2str(strcasecmp(string("aasd").c_str(),string("AasD").c_str())==0) <<endl;
        StringCaseInsensetiveCompare cmp;
        cout<< bool2str(cmp("A","a")) <<endl;
        cout<< bool2str(cmp(string("Aaaa"),string("aaaA"))) <<endl;
        cout<< bool2str(str_ignoreCase_cmp(string("Aaaa"),string("aaaA"))) <<endl;
        return 0;
    }

Output:

    true
    true
    true
    true
    true

User · Answer

    bool insensitive_c_compare(char A, char B) {
        static const char up2lo = 'a' - 'A';  // the offset between upper and lowers

        // check that the character is in fact a letter
        // (trying to turn a 3 into an E would not be pretty)
        const bool a_upper = ('A' <= A && A <= 'Z');
        const bool b_upper = ('A' <= B && B <= 'Z');

        // convert all uppercase letters to lowercase ones
        if (a_upper) A = A + up2lo;
        if (b_upper) B = B + up2lo;
        return A == B;
    }

This could probably be made much more efficient, but here is a bulky version with all its bits bare. It is not all that portable (it assumes an ASCII-like character set), but works well with whatever is on my computer (no idea, I am of pictures not words).

User · Answer

For my basic case-insensitive string comparison needs I prefer not to have to use an external library, nor do I want a separate string class with case-insensitive traits that is incompatible with all my other strings.

So what I've come up with is this:

    bool icasecmp(const string& l, const string& r)
    {
        return l.size() == r.size()
            && equal(l.cbegin(), l.cend(), r.cbegin(),
                [](string::value_type l1, string::value_type r1)
                    { return toupper(l1) == toupper(r1); });
    }

    bool icasecmp(const wstring& l, const wstring& r)
    {
        return l.size() == r.size()
            && equal(l.cbegin(), l.cend(), r.cbegin(),
                [](wstring::value_type l1, wstring::value_type r1)
                    { return towupper(l1) == towupper(r1); });
    }

A simple function with one overload for char and another for wchar_t. It doesn't use anything non-standard, so it should be fine on any platform.

The equality comparison won't consider issues like variable-length encoding and Unicode normalization, but basic_string has no support for that that I'm aware of anyway, and it isn't normally an issue.

In cases where more sophisticated lexicographical manipulation of text is required, you simply have to use a third-party library like Boost, which is to be expected.

User · Answer

As of early 2013, the ICU project, maintained by IBM, is a pretty good answer to this.

http://site.icu-project.org/

ICU is a "complete, portable Unicode library that closely tracks industry standards." For the specific problem of string comparison, the Collation object does what you want.

The Mozilla Project adopted ICU for internationalization in Firefox in mid-2012; you can track the engineering discussion, including issues of build systems and data file size, here:

https://groups.google.com/forum/#!topic/mozilla.dev.platform/sVVpS2sKODw
https://bugzilla.mozilla.org/show_bug.cgi?id=724529 (tracker)
https://bugzilla.mozilla.org/show_bug.cgi?id=724531 (build system)

User · Answer

I wrote a case-insensitive version of char_traits for use with std::basic_string in order to generate a std::string that is not case-sensitive when doing comparisons, searches, etc. using the built-in std::basic_string member functions.

So in other words, I wanted to do something like this:

    std::string a = "Hello, World!";
    std::string b = "hello, world!";

    assert( a == b );

...which std::string can't handle. Here's the usage of my new char_traits:

    istring a = "Hello, World!";
    istring b = "hello, world!";

    assert( a == b );

...and here's the implementation:

    // -----------------------------------------------------------------------
    //  Case-Insensitive char_traits for std::string's
    //
    //  Use:
    //      To declare a std::string which preserves case but ignores case in
    //      comparisons & search, use the following syntax:
    //          std::basic_string<char, char_traits_nocase<char> > noCaseString;
    //
    //      A typedef is declared below which simplifies this use for chars:
    //          typedef std::basic_string<char, char_traits_nocase<char> > istring;
    // -----------------------------------------------------------------------

    template<class C>
    struct char_traits_nocase : public std::char_traits<C>
    {
        typedef typename std::char_traits<C>::int_type int_type;

        static bool eq( const C& c1, const C& c2 )
        {
            return ::toupper(c1) == ::toupper(c2);
        }

        static bool lt( const C& c1, const C& c2 )
        {
            return ::toupper(c1) < ::toupper(c2);
        }

        static int compare( const C* s1, const C* s2, size_t N )
        {
            return _strnicmp(s1, s2, N);  // Microsoft-specific
        }

        static const C* find( const C* s, size_t N, const C& a )
        {
            for( size_t i=0 ; i<N ; ++i )
            {
                if( ::toupper(s[i]) == ::toupper(a) )
                    return s+i;
            }
            return 0;
        }

        static bool eq_int_type( const int_type& c1, const int_type& c2 )
        {
            return ::toupper(c1) == ::toupper(c2);
        }
    };

    template<>
    struct char_traits_nocase<wchar_t> : public std::char_traits<wchar_t>
    {
        static bool eq( const wchar_t& c1, const wchar_t& c2 )
        {
            return ::towupper(c1) == ::towupper(c2);
        }

        static bool lt( const wchar_t& c1, const wchar_t& c2 )
        {
            return ::towupper(c1) < ::towupper(c2);
        }

        static int compare( const wchar_t* s1, const wchar_t* s2, size_t N )
        {
            return _wcsnicmp(s1, s2, N);  // Microsoft-specific
        }

        static const wchar_t* find( const wchar_t* s, size_t N, const wchar_t& a )
        {
            for( size_t i=0 ; i<N ; ++i )
            {
                if( ::towupper(s[i]) == ::towupper(a) )
                    return s+i;
            }
            return 0;
        }

        static bool eq_int_type( const int_type& c1, const int_type& c2 )
        {
            return ::towupper(c1) == ::towupper(c2);
        }
    };

    typedef std::basic_string<char, char_traits_nocase<char> > istring;
    typedef std::basic_string<wchar_t, char_traits_nocase<wchar_t> > iwstring;

User · Answer

    str1.size() == str2.size() && std::equal(str1.begin(), str1.end(), str2.begin(), [](auto a, auto b){return std::tolower(a)==std::tolower(b);})

You can use the above code in C++14 if you are not in a position to use Boost. You have to use std::towlower for wide chars.
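For wide strings, the std::towlower variant mentioned above might look like the following sketch. The function name iequals_wide is my own, and the comparison is still per code unit, so it does not handle surrogate pairs or normalization.

```cpp
#include <algorithm>
#include <cwctype>
#include <string>

// Wide-character counterpart of the one-liner above (hypothetical name).
bool iequals_wide(const std::wstring& a, const std::wstring& b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(),
                      [](wchar_t x, wchar_t y) {
                          return std::towlower(x) == std::towlower(y);
                      });
}
```

Which non-ASCII characters std::towlower actually folds depends on the C locale in effect, so results outside ASCII may vary between platforms.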

User · Answer

Boost includes a handy algorithm for this:

    #include <boost/algorithm/string.hpp>
    // Or, for fewer header dependencies:
    //#include <boost/algorithm/string/predicate.hpp>

    std::string str1 = "hello, world!";
    std::string str2 = "HELLO, WORLD!";

    if (boost::iequals(str1, str2))
    {
        // Strings are identical
    }

User · Answer

If you are on a POSIX system, you can use strcasecmp(). This function is not part of standard C, though, nor is it available on Windows. It performs a case-insensitive comparison on 8-bit chars, so long as the locale is POSIX. If the locale is not POSIX, the results are undefined (so it might do a localized compare, or it might not). A wide-character equivalent is not available.

Failing that, a large number of historic C library implementations have the functions stricmp() and strnicmp(). Visual C++ on Windows renamed all of these by prefixing them with an underscore because they aren't part of the ANSI standard, so on that system they're called _stricmp or _strnicmp. Some libraries may also have wide-character or multibyte equivalent functions (typically named e.g. wcsicmp, mbcsicmp and so on).

C and C++ are both largely ignorant of internationalization issues, so there's no good solution to this problem, except to use a third-party library. Check out IBM ICU (International Components for Unicode) if you need a robust library for C/C++. ICU is for both Windows and Unix systems.
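A small wrapper can hide the naming difference between the two platforms. This is a sketch under the assumption that _WIN32 is a good enough test for "Microsoft-style C runtime"; compare_ignore_case and equals_ignore_case are hypothetical names of mine.

```cpp
#include <string>

#if defined(_WIN32)
#include <string.h>    // _stricmp (Microsoft C runtime)
#else
#include <strings.h>   // strcasecmp (POSIX)
#endif

// Hypothetical wrapper: returns <0, 0 or >0 like strcmp, ignoring case.
inline int compare_ignore_case(const char* a, const char* b) {
#if defined(_WIN32)
    return _stricmp(a, b);
#else
    return strcasecmp(a, b);
#endif
}

// Equality for std::string; the length check skips the scan when sizes differ.
inline bool equals_ignore_case(const std::string& a, const std::string& b) {
    return a.size() == b.size() &&
           compare_ignore_case(a.c_str(), b.c_str()) == 0;
}
```

Like the functions it wraps, this works byte by byte, so it is only reliable for ASCII or other single-byte text.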

User · Answer

FYI, strcmp() and stricmp() are vulnerable to buffer overflow, since they just process until they hit a null terminator. It's safer to use strncmp() and _strnicmp(), which take a maximum number of characters to compare.
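To illustrate what a length-bounded comparison does, here is a rough standard-only sketch. It is not the implementation of strncmp or _strnicmp; bounded_icompare is a made-up name and it only folds single-byte characters.

```cpp
#include <cctype>
#include <cstddef>

// Hypothetical helper: compares at most n characters, stopping early at a
// null terminator, so it never reads beyond n bytes of either buffer.
int bounded_icompare(const char* a, const char* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const int ca = std::tolower(static_cast<unsigned char>(a[i]));
        const int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb) return ca < cb ? -1 : 1;
        if (ca == 0) break;  // both strings ended at this position
    }
    return 0;
}
```

The bound matters when a buffer is not guaranteed to be null-terminated; with std::string and c_str() the terminator is always present.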

User · Answer

My first thought for a non-Unicode version was to do something like this:

    bool caseInsensitiveStringCompare(const string& str1, const string& str2) {
        if (str1.size() != str2.size()) {
            return false;
        }
        for (string::const_iterator c1 = str1.begin(), c2 = str2.begin(); c1 != str1.end(); ++c1, ++c2) {
            if (tolower(static_cast<unsigned char>(*c1)) != tolower(static_cast<unsigned char>(*c2))) {
                return false;
            }
        }
        return true;
    }

User · Answer

It looks like the solutions above aren't using the compare method and are implementing the whole comparison again, so here is my solution; I hope it works for you (it's working fine).

    #include <iostream>
    #include <string>
    #include <cctype>
    using namespace std;

    // returns a lowercase copy of the string
    string tolow(string a)
    {
        for (unsigned int i = 0; i < a.length(); i++)
        {
            a[i] = tolower(static_cast<unsigned char>(a[i]));
        }
        return a;
    }

    int main()
    {
        string str1, str2;
        cin >> str1 >> str2;
        int temp = tolow(str1).compare(tolow(str2));
        if (temp > 0)
            cout << 1;
        else if (temp == 0)
            cout << 0;
        else
            cout << -1;
    }

User · Answer

Take advantage of the standard char_traits. Recall that a std::string is in fact a typedef for std::basic_string<char>, or more explicitly, std::basic_string<char, std::char_traits<char> >. The char_traits type describes how characters compare, how they copy, how they cast etc. All you need to do is typedef a new string over basic_string, and provide it with your own custom char_traits that compare case-insensitively.

    struct ci_char_traits : public char_traits<char> {
        static bool eq(char c1, char c2) { return toupper(c1) == toupper(c2); }
        static bool ne(char c1, char c2) { return toupper(c1) != toupper(c2); }
        static bool lt(char c1, char c2) { return toupper(c1) <  toupper(c2); }
        static int compare(const char* s1, const char* s2, size_t n) {
            while( n-- != 0 ) {
                if( toupper(*s1) < toupper(*s2) ) return -1;
                if( toupper(*s1) > toupper(*s2) ) return 1;
                ++s1; ++s2;
            }
            return 0;
        }
        static const char* find(const char* s, int n, char a) {
            while( n-- > 0 && toupper(*s) != toupper(a) ) {
                ++s;
            }
            return n >= 0 ? s : 0;  // null when the character is not found
        }
    };

    typedef std::basic_string<char, ci_char_traits> ci_string;

The details are on Guru of The Week number 29.

User · Answer

Are you talking about a dumb case-insensitive compare or a full normalized Unicode compare?

A dumb compare will not find strings that might be the same but are not binary equal.

Example:

    U212B (ANGSTROM SIGN)
    U0041 (LATIN CAPITAL LETTER A) + U030A (COMBINING RING ABOVE)
    U00C5 (LATIN CAPITAL LETTER A WITH RING ABOVE)

All of these are equivalent, but they have different binary representations.

That said, Unicode Normalization should be a mandatory read, especially if you plan on supporting Hangul, Thai and other Asian languages.

Also, IBM pretty much patented most optimized Unicode algorithms and made them publicly available. They also maintain an implementation: IBM ICU.
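To make the example concrete, here are the three forms written out as UTF-8 bytes. This is only a demonstration that a byte-wise comparison cannot see the equivalence; the variable and function names are mine, and real normalization needs a library such as ICU.

```cpp
#include <string>

// The same visible character in three UTF-8 encodings
// (byte values written out explicitly).
const std::string angstrom_sign = "\xE2\x84\xAB";  // U+212B
const std::string decomposed    = "A\xCC\x8A";     // U+0041 U+030A
const std::string precomposed   = "\xC3\x85";      // U+00C5

// A byte-wise comparison sees three different strings.
inline bool bytewise_equal(const std::string& a, const std::string& b) {
    return a == b;
}
```

All three strings have different lengths (3, 3 and 2 bytes, with different contents), so even the "compare sizes first" optimization rejects them immediately.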

User · Answer

I've had good experience using the International Components for Unicode libraries. They're extremely powerful, and provide methods for conversion, locale support, date and time rendering, case mapping (which you don't seem to want), and collation, which includes case- and accent-insensitive comparison (and more). I've only used the C++ version of the libraries, but they appear to have a Java version as well.

Methods exist to perform normalized compares as referred to by @Coincoin, and can even account for locale. For example (and this is a sorting example, not strictly equality), traditionally in Spanish (in Spain), the letter combination "ll" sorts between "l" and "m", so "lz" < "ll" < "ma".

User · Answer

An easy way to compare strings that differ only in lowercase and capital letters is to do an ASCII comparison. Each capital letter and its lowercase counterpart differ by 32 in the ASCII table; using this information we have the following:

    bool match = string1.length() == string2.length();
    for (size_t i = 0; match && i < string2.length(); i++)
    {
        // same character, or the two code values are exactly 32 apart
        match = string1[i] == string2[i]
             || int(string1[i]) == int(string2[i]) + 32
             || int(string1[i]) == int(string2[i]) - 32;
    }
    if (match)
    {
        // then we have a match
    }
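One thing to watch with the offset trick: non-letters such as '@' and '`' are also 32 apart in ASCII, so the offset should only be applied to letters. A minimal sketch that assumes an ASCII execution character set; ascii_lower and ascii_iequals are hypothetical names.

```cpp
#include <string>

// ASCII only: map 'A'..'Z' to 'a'..'z' by adding 32; leave other bytes alone.
inline char ascii_lower(char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c + 32) : c;
}

bool ascii_iequals(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::string::size_type i = 0; i < a.size(); ++i) {
        if (ascii_lower(a[i]) != ascii_lower(b[i])) return false;
    }
    return true;
}
```

With the letter check in place, "@" and "`" compare as different, while "Hello" and "hELLO" compare as equal.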

User · Answer

A simple way to compare two strings in C++ (tested on Windows) is using _stricmp:

    // Case insensitive (could use equivalent _stricmp)
    result = stricmp( string1, string2 );

If you are looking to use it with std::string, an example:

    std::string s1 = string("Hello");
    if ( _stricmp(s1.c_str(), "HELLO") == 0)
        std::cout << "The string are equals.";

For more information, see: https://msdn.microsoft.com/it-it/library/e0z9k731.aspx

User · Answer

You can use strcasecmp on Unix, or stricmp on Windows.

One thing that hasn't been mentioned so far is that if you are using STL strings with these methods, it's useful to first compare the lengths of the two strings, since this information is already available to you in the string class. This can avoid the costly string comparison if the two strings you are comparing aren't even the same length in the first place.

User · Answer

If you don't want to use the Boost library, here is a solution using only C++ standard headers.

    #include <algorithm>  // std::equal
    #include <cctype>     // std::toupper
    #include <iostream>
    #include <string>

    struct iequal
    {
        bool operator()(int c1, int c2) const
        {
            // case insensitive comparison of two characters.
            return std::toupper(c1) == std::toupper(c2);
        }
    };

    bool iequals(const std::string& str1, const std::string& str2)
    {
        // use std::equal() to compare range of characters using the functor above.
        // The size check keeps std::equal from reading past the end of str2.
        return str1.size() == str2.size()
            && std::equal(str1.begin(), str1.end(), str2.begin(), iequal());
    }

    int main(void)
    {
        std::string str_1 = "HELLO";
        std::string str_2 = "hello";

        if (iequals(str_1, str_2))
        {
            std::cout << "String are equal" << std::endl;
        }
        else
        {
            std::cout << "String are not equal" << std::endl;
        }

        return 0;
    }

User · Answer

The Boost.String library has a lot of algorithms for doing case-insensitive comparisons and so on.

You could implement your own, but why bother when it's already been done?

User · Answer

Visual C++ string functions supporting Unicode: http://msdn.microsoft.com/en-us/library/cc194799.aspx

The one you are probably looking for is _wcsnicmp.

User · Answer

Assuming you are looking for a method and not a magic function that already exists, there is frankly no better way. We could all write code snippets with clever tricks for limited character sets, but at the end of the day, at some point you have to convert the characters.

The best approach for this conversion is to do so prior to the comparison. This allows you a good deal of flexibility when it comes to encoding schemes, which your actual comparison operator should be ignorant of.

You can of course "hide" this conversion behind your own string function or class, but you still need to convert the strings prior to comparison.
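A rough sketch of "convert first, then compare with an ordinary operator", hidden behind a small class. Everything here (fold_case, CaseInsensitiveSet) is a hypothetical illustration, and the folding only covers single-byte characters; a Unicode-aware fold would have to come from a library.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_set>

// Hypothetical helper: returns a lowercase copy (single-byte characters only).
std::string fold_case(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Stores folded keys once; every later lookup is an ordinary comparison.
class CaseInsensitiveSet {
public:
    void insert(const std::string& s) { keys_.insert(fold_case(s)); }
    bool contains(const std::string& s) const {
        return keys_.count(fold_case(s)) != 0;
    }
private:
    std::unordered_set<std::string> keys_;
};
```

The comparison itself (here the hash set's default equality) stays ignorant of case; only fold_case would need to change to support another encoding.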

User · Answer

The trouble with Boost is that you have to link with and depend on Boost. Not easy in some cases (e.g. Android).

And using char_traits means all your comparisons are case-insensitive, which isn't usually what you want.

This should suffice. It should be reasonably efficient. It doesn't handle Unicode or anything, though.

    bool iequals(const string& a, const string& b)
    {
        unsigned int sz = a.size();
        if (b.size() != sz)
            return false;
        for (unsigned int i = 0; i < sz; ++i)
            if (tolower(a[i]) != tolower(b[i]))
                return false;
        return true;
    }

Update: Bonus C++14 version (#include <algorithm>):

    bool iequals(const string& a, const string& b)
    {
        return std::equal(a.begin(), a.end(),
                          b.begin(), b.end(),
                          [](char a, char b) {
                              return tolower(a) == tolower(b);
                          });
    }

User · Answer

boost::iequals is not UTF-8 compatible in the case of string.

You can use boost::locale.

    comparator<char,collator_base::secondary> cmpr;
    cout << (cmpr(str1, str2) ? "str1 < str2" : "str1 >= str2") << endl;

Primary -- ignore accents and character case, comparing base letters only. For example "facade" and "Façade" are the same.

Secondary -- ignore character case but consider accents. "facade" and "façade" are different but "Façade" and "façade" are the same.

Tertiary -- consider both case and accents: "Façade" and "façade" are different. Ignore punctuation.

Quaternary -- consider all case, accents, and punctuation. The words must be identical in terms of Unicode representation.

Identical -- as quaternary, but compare code points as well.

User · Answer

Just use strcmp() for case-sensitive and strcmpi() or stricmp() for case-insensitive comparison. Both are declared in <string.h>, although strcmpi() and stricmp() are non-standard and only available on compilers that provide them.

Format:

    int strcmp(const char*, const char*);   // for case sensitive
    int strcmpi(const char*, const char*);  // for case insensitive

Usage:

    string a="apple", b="ApPlE", c="ball";
    if (strcmpi(a.c_str(), b.c_str()) == 0)      // if it is a match it will return 0
        cout << a << " and " << b << " are the same" << "\n";
    if (strcmpi(a.c_str(), c.c_str()) < 0)
        cout << a[0] << " comes before " << c[0] << ", so " << a << " comes before " << c;

Output:

    apple and ApPlE are the same
    a comes before b, so apple comes before ball

User · Answer

Just a note on whatever method you finally choose, if that method happens to include the use of strcmp as some answers suggest:

strcmp doesn't work with Unicode data in general. In general, it doesn't even work with byte-based Unicode encodings such as UTF-8, since strcmp only makes byte-per-byte comparisons and Unicode code points encoded in UTF-8 can take more than 1 byte. The only specific Unicode case strcmp handles properly is when a string encoded with a byte-based encoding contains only code points below U+00FF; then the byte-per-byte comparison is enough.
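A small demonstration of the point for the case-insensitive variants, assuming UTF-8 input: a comparison that folds bytes one at a time cannot match "É" with "é", because the difference sits in the second byte of a two-byte sequence. The function below is my own stand-in for a byte-wise comparison that folds ASCII letters only, not any library's actual implementation.

```cpp
#include <string>

// Byte-wise comparison that folds only the ASCII letters 'A'..'Z'.
bool bytewise_iequals(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::string::size_type i = 0; i < a.size(); ++i) {
        unsigned char x = static_cast<unsigned char>(a[i]);
        unsigned char y = static_cast<unsigned char>(b[i]);
        if (x >= 'A' && x <= 'Z') x = static_cast<unsigned char>(x + 32);
        if (y >= 'A' && y <= 'Z') y = static_cast<unsigned char>(y + 32);
        if (x != y) return false;
    }
    return true;
}

// U+00C9 and U+00E9 (E with acute, upper and lower case) encoded as UTF-8.
const std::string upper_e_acute = "\xC3\x89";
const std::string lower_e_acute = "\xC3\xA9";
```

Plain ASCII such as "CAFE" and "cafe" still compares equal; the two accented strings do not, even though they differ only in case.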

User · Answer

Late to the party, but here is a variant that uses std::locale, and thus correctly handles Turkish:

    auto tolower = std::bind1st(
        std::mem_fun(
            &std::ctype<char>::tolower),
        &std::use_facet<std::ctype<char> >(
            std::locale()));

This gives you a functor that uses the active locale to convert characters to lowercase, which you can then use via std::transform to generate lower-case strings:

    std::string left = "fOo";
    transform(left.begin(), left.end(), left.begin(), tolower);

This also works for wchar_t based strings.
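std::bind1st and std::mem_fun were deprecated in C++11 and removed in C++17, so on newer compilers the same idea may be easier to write with a lambda. A sketch under that assumption; to_lower and locale_iequals are hypothetical names, and the locale defaults to a copy of the global one.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Lowercase a copy of the string using the ctype facet of the given locale.
std::string to_lower(std::string s, const std::locale& loc = std::locale()) {
    const std::ctype<char>& facet = std::use_facet<std::ctype<char> >(loc);
    std::transform(s.begin(), s.end(), s.begin(),
                   [&facet](char c) { return facet.tolower(c); });
    return s;
}

// Compare two strings after lowercasing both with the same locale.
bool locale_iequals(const std::string& a, const std::string& b,
                    const std::locale& loc = std::locale()) {
    return to_lower(a, loc) == to_lower(b, loc);
}
```

Passing std::locale::classic() gives plain "C" locale behaviour; a named locale such as a Turkish one has to be installed on the system before std::locale can construct it.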

User · Answer

Doing this without using Boost can be done by getting the C string pointer with c_str() and using strcasecmp:

    std::string str1 = "aBcD";
    std::string str2 = "AbCd";
    if (strcasecmp(str1.c_str(), str2.c_str()) == 0)
    {
        // case insensitive equal
    }

User · Answer

I'm trying to cobble together a good answer from all the posts, so help me edit this:

Here is a method of doing this. Although it does transform the strings and is not Unicode-friendly, it should be portable, which is a plus:

    bool caseInsensitiveStringCompare( const std::string& str1, const std::string& str2 ) {
        std::string str1Cpy( str1 );
        std::string str2Cpy( str2 );
        std::transform( str1Cpy.begin(), str1Cpy.end(), str1Cpy.begin(), ::tolower );
        std::transform( str2Cpy.begin(), str2Cpy.end(), str2Cpy.begin(), ::tolower );
        return ( str1Cpy == str2Cpy );
    }

From what I have read this is more portable than stricmp(), because stricmp() is not in fact part of the std library, but only implemented by most compiler vendors.

To get a truly Unicode-friendly implementation it appears you must go outside the std library. One good third-party library is IBM ICU (International Components for Unicode).

Also, boost::iequals provides a fairly good utility for doing this sort of comparison.
