Replace non-ASCII characters from string

Question

I have strings such as "A função" and "Ãugent" in which I need to replace characters like ç, ã, and Ã with empty strings. How can I match only those non-ASCII characters? I am using this function:

    public static String matchAndReplaceNonEnglishChar(String tmpsrcdta) {
        String newsrcdta = null;
        char array[] = Arrays.stringToCharArray(tmpsrcdta);
        if (array == null)
            return newsrcdta;

        for (int i = 0; i < array.length; i++) {
            int nVal = (int) array[i];
            boolean bISO =
                    // Is character ISO control
                    Character.isISOControl(array[i]);
            boolean bIgnorable =
                    // Is Ignorable identifier
                    Character.isIdentifierIgnorable(array[i]);
            // Remove tab and other unwanted characters
            if (nVal == 9 || bISO || bIgnorable)
                array[i] = ' ';
            else if (nVal > 255)
                array[i] = ' ';
        }
        newsrcdta = Arrays.charArrayToString(array);
        return newsrcdta;
    }

It is not working properly. What improvement does it need? I also have a second problem: the unwanted characters are being replaced by a space character, which leaves extra spaces in the final string.

User · Answer

FailedDev's answer is good, but can be improved. If you want to preserve the ASCII equivalents, you need to normalize first:

    String subjectString = "öäü";
    subjectString = Normalizer.normalize(subjectString, Normalizer.Form.NFD);
    String resultString = subjectString.replaceAll("[^\\x00-\\x7F]", "");

=> will produce "oau"

That way, characters like "öäü" will be mapped to "oau", which at least preserves some information. Without normalization, the resulting String will be blank.
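To make the difference concrete, here is a minimal sketch that runs the same regex with and without the normalization step. The class name `NormalizeThenStrip` and its two helper methods are made up for this illustration, and the non-ASCII input is written with Unicode escapes so the result does not depend on the source file encoding.

```java
import java.text.Normalizer;

public class NormalizeThenStrip {

    // Drops every char outside U+0000..U+007F without normalizing first.
    static String stripOnly(String s) {
        return s.replaceAll("[^\\x00-\\x7F]", "");
    }

    // Decomposes accented letters (NFD) into base letter + combining mark,
    // then drops the non-ASCII marks so the base letters survive.
    static String normalizeThenStrip(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        String input = "\u00F6\u00E4\u00FC"; // o, a, u, each with a diaeresis
        System.out.println("[" + stripOnly(input) + "]");          // prints []
        System.out.println("[" + normalizeThenStrip(input) + "]"); // prints [oau]
    }
}
```

Whether losing the accent is acceptable depends on the data: "oau" is readable, but it is no longer the original word.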

User · Answer

Updated solution: you can use "Normalize" (canonical decomposition) together with "replaceAll" to replace accented characters with their base characters.

    import java.text.Normalizer;
    import java.text.Normalizer.Form;
    import java.util.regex.Pattern;

    public final class NormalizeUtils {

        public static String normalizeASCII(final String string) {
            // Split each accented letter into base letter + combining mark
            final String normalize = Normalizer.normalize(string, Form.NFD);

            // Remove the combining marks, keeping the base letters
            return Pattern.compile("\\p{InCombiningDiacriticalMarks}+")
                          .matcher(normalize)
                          .replaceAll("");
        }
    }
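One limitation worth knowing: removing only combining marks leaves behind characters that have no canonical decomposition, such as ß (U+00DF) or ø (U+00F8). The sketch below assumes you still want a pure ASCII result and chains a second replacement after the first. The class `AsciiFolding` and its method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class AsciiFolding {

    private static final Pattern MARKS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
    private static final Pattern NON_ASCII = Pattern.compile("[^\\x00-\\x7F]");

    // Step 1 only: remove combining diacritical marks after NFD.
    static String removeMarks(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return MARKS.matcher(decomposed).replaceAll("");
    }

    // Step 1 + step 2: also drop whatever non-ASCII characters remain.
    static String toAscii(String s) {
        return NON_ASCII.matcher(removeMarks(s)).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "fun\u00E7\u00E3o, Stra\u00DFe";
        System.out.println(removeMarks(input)); // accents gone, sharp s still present
        System.out.println(toAscii(input));     // prints: funcao, Strae
    }
}
```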

User · Answer

This would be the Unicode solution:

    String s = "A função, Ãugent";
    String r = s.replaceAll("\\P{InBasic_Latin}", "");

\p{InBasic_Latin} is the Unicode block that contains all characters in the Unicode range U+0000..U+007F (see regular-expressions.info).

\P{InBasic_Latin} is the negated \p{InBasic_Latin}.
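For comparison, Java's regex engine offers more than one spelling of the same idea. This sketch, with a hypothetical class name `AsciiPatterns`, checks that the block-based pattern, the POSIX-style `\P{ASCII}` class, and an explicit range all strip the same characters from a sample string.

```java
public class AsciiPatterns {

    // Negated Unicode block Basic Latin (U+0000..U+007F).
    static String viaBlock(String s) {
        return s.replaceAll("\\P{InBasic_Latin}", "");
    }

    // Negated POSIX-style ASCII class.
    static String viaPosixClass(String s) {
        return s.replaceAll("\\P{ASCII}", "");
    }

    // Explicit negated range.
    static String viaRange(String s) {
        return s.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        String s = "A fun\u00E7\u00E3o, \u00C3ugent";
        System.out.println(viaBlock(s));      // prints: A funo, ugent
        System.out.println(viaPosixClass(s)); // prints: A funo, ugent
        System.out.println(viaRange(s));      // prints: A funo, ugent
    }
}
```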

User · Answer

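You can try something like this. The range of accented letters starts at 192, so you can leave such characters out of the result:

    String name = "A função";
    StringBuilder result = new StringBuilder();
    for (char val : name.toCharArray()) {
        if (val < 192) result.append(val);
    }
    System.out.println("Result " + result.toString());

Note that this keeps characters 128 to 191, which are not ASCII. To remove every non-ASCII character, compare against 128 instead.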
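A small sketch of how the choice of threshold changes the result. The class `ThresholdDemo` and its `keepBelow` helper are invented for this example; the sample string starts with the copyright sign (U+00A9, value 169), which sits between the two cutoffs.

```java
public class ThresholdDemo {

    // Keeps every char whose numeric value is below the given limit.
    static String keepBelow(String s, int limit) {
        StringBuilder result = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < limit) {
                result.append(c);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String s = "\u00A9 A fun\u00E7\u00E3o";
        // With 192 the copyright sign survives, so the result is not pure ASCII.
        System.out.println(keepBelow(s, 192).equals("\u00A9 A funo")); // prints true
        // With 128 only ASCII remains.
        System.out.println("[" + keepBelow(s, 128) + "]"); // prints [ A funo]
    }
}
```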

User · Answer

The ASCII table contains 128 codes, with a total of 95 printable characters, of which only 52 characters are letters:

    [0-127]  ASCII codes
    [32-126] printable characters
    [48-57]  digits [0-9]
    [65-90]  uppercase letters [A-Z]
    [97-122] lowercase letters [a-z]

You can use the String.codePoints method to get a stream over the int values of the characters of this string and filter out non-ASCII characters (Character.toString(int) requires Java 11 or later):

    String str1 = "A função, Ãugent";

    String str2 = str1.codePoints()
            .filter(ch -> ch < 128)
            .mapToObj(Character::toString)
            .collect(Collectors.joining());

    System.out.println(str2); // A funo, ugent

Or you can explicitly specify character ranges. For example, filter out everything except letters:

    String str3 = str1.codePoints()
            .filter(ch -> ch >= 'A' && ch <= 'Z'
                    || ch >= 'a' && ch <= 'z')
            .mapToObj(Character::toString)
            .collect(Collectors.joining());

    System.out.println(str3); // Afunougent

See also: How do I not take Special Characters in my Password Validation (without Regex)?
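If you are on Java 8, where `Character.toString(int)` is not available, the same filter can collect into a `StringBuilder` instead. This is a sketch under that assumption; the class name `CodePointFilter` is hypothetical. Because it works on code points, a character outside the Basic Multilingual Plane is dropped as a whole rather than as two separate surrogates.

```java
public class CodePointFilter {

    // Keeps only code points in the ASCII range 0..127.
    static String asciiOnly(String s) {
        return s.codePoints()
                .filter(cp -> cp < 128)
                .collect(StringBuilder::new,
                         StringBuilder::appendCodePoint,
                         StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        // The last character is U+1F600, stored as a surrogate pair.
        String s = "A fun\u00E7\u00E3o, \u00C3ugent \uD83D\uDE00";
        System.out.println("[" + asciiOnly(s) + "]"); // prints [A funo, ugent ]
    }
}
```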

User · Answer

CharMatcher.retainFrom can be used, if you're using the Google Guava library:

    String s = "A função";
    String stripped = CharMatcher.ascii().retainFrom(s);
    System.out.println(stripped); // Prints "A funo"

User · Answer

This will search for and remove all non-ASCII characters:

    String resultString = subjectString.replaceAll("[^\\x00-\\x7F]", "");
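The question also mentions stray spaces left behind when unwanted characters are swapped for a space. Here is a sketch of two ways to handle that, using a hypothetical helper class `SpaceHandling`: replace with the empty string, or replace with a space and then collapse runs of whitespace. Neither is the "right" one in general; the second keeps a word boundary where a character was removed, at the cost of splitting words.

```java
import java.util.regex.Pattern;

public class SpaceHandling {

    // Compiled once so repeated calls do not recompile the regex.
    private static final Pattern NON_ASCII = Pattern.compile("[^\\x00-\\x7F]+");

    // Deletes non-ASCII characters; existing spaces are left untouched.
    static String remove(String s) {
        return NON_ASCII.matcher(s).replaceAll("");
    }

    // Replaces each run of non-ASCII characters with a space,
    // then collapses whitespace runs and trims the ends.
    static String replaceWithSingleSpace(String s) {
        String spaced = NON_ASCII.matcher(s).replaceAll(" ");
        return spaced.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String s = "A fun\u00E7\u00E3o \u00A9 2011";
        System.out.println(remove(s));                 // prints: A funo  2011
        System.out.println(replaceWithSingleSpace(s)); // prints: A fun o 2011
    }
}
```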

User · Answer

Or you can use the function below to remove non-ASCII characters from the string. It also shows how the filtering works internally:

    private static String removeNonASCIIChar(String str) {
        StringBuilder buff = new StringBuilder();
        char chars[] = str.toCharArray();

        for (int i = 0; i < chars.length; i++) {
            // ASCII covers 0 to 127 inclusive, so NUL and DEL are kept too
            if (chars[i] < 128) {
                buff.append(chars[i]);
            }
        }
        return buff.toString();
    }
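A quick way to check the bounds of a hand-written loop like this is to feed it every ASCII code and confirm that none goes missing. The sketch below uses an inclusive 0 to 127 range; the class name `AsciiBoundaryCheck` and the method `removeNonAscii` are illustrative only.

```java
public class AsciiBoundaryCheck {

    // Keeps chars 0..127 inclusive and drops everything else.
    static String removeNonAscii(String str) {
        StringBuilder out = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c <= 127) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Build a string holding every ASCII code from 0 to 127.
        StringBuilder all = new StringBuilder();
        for (char c = 0; c <= 127; c++) {
            all.append(c);
        }
        String ascii = all.toString();

        System.out.println(removeNonAscii(ascii).length());                  // prints 128
        System.out.println(removeNonAscii(ascii + "\u0080\u00E7").length()); // prints 128
    }
}
```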
