[java] Replace non ASCII character from string

I have strings A função, Ãugent in which I need to replace character like ç,ã,Ã with empty strings.

How can I match only those non ASCII characters?

I am using a function

public static String matchAndReplaceNonEnglishChar(String tmpsrcdta) {
    String newsrcdta = null;
    char array[] = Arrays.stringToCharArray(tmpsrcdta);
    if (array == null)
        return newsrcdta;

    for (int i = 0; i < array.length; i++) {
        int nVal = (int) array[i];
        boolean bISO =
                // Is character ISO control
                Character.isISOControl(array[i]);
        boolean bIgnorable =
                // Is Ignorable identifier
                Character.isIdentifierIgnorable(array[i]);
        // Remove tab and other unwanted characters..
        if (nVal == 9 || bISO || bIgnorable)
            array[i] = ' ';
        else if (nVal > 255)
            array[i] = ' ';
    }
    newsrcdta = Arrays.charArrayToString(array);

    return newsrcdta;
}

but it is not working properly..what improvement it is needed...here I have one more problem is that final string is getting replaced by space character which create the extra space in string.

This question is related to java regex string replace char

The answer is


Or you can use the function below for removing non-ascii character from the string. You will get know internal working.

private static String removeNonASCIIChar(String str) {
    StringBuffer buff = new StringBuffer();
    char chars[] = str.toCharArray();

    for (int i = 0; i < chars.length; i++) {
        if (0 < chars[i] && chars[i] < 127) {
            buff.append(chars[i]);
        }
    }
    return buff.toString();
}

CharMatcher.retainFrom can be used, if you're using the Google Guava library:

String s = "A função";
String stripped = CharMatcher.ascii().retainFrom(s);
System.out.println(stripped); // Prints "A funo"

[Updated solution]

can be used with "Normalize" (Canonical decomposition) and "replaceAll", to replace it with the appropriate characters.

import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Pattern;

public final class NormalizeUtils {

    public static String normalizeASCII(final String string) {
        final String normalize = Normalizer.normalize(string, Form.NFD);

        return Pattern.compile("\\p{InCombiningDiacriticalMarks}+")
                      .matcher(normalize)
                      .replaceAll("");
    } ...

The ASCII table contains 128 codes, with a total of 95 printable characters, of which only 52 characters are letters:

  • [0-127] ASCII codes
    • [32-126] printable characters
      • [48-57] digits [0-9]
      • [65-90] uppercase letters [A-Z]
      • [97-122] lowercase letters [a-z]

You can use String.codePoints method to get a stream over int values of characters of this string and filter out non-ASCII characters:

String str1 = "A função, Ãugent";

String str2 = str1.codePoints()
        .filter(ch -> ch < 128)
        .mapToObj(Character::toString)
        .collect(Collectors.joining());

System.out.println(str2); // A funo, ugent

Or you can explicitly specify character ranges. For example filter out everything except letters:

String str3 = str1.codePoints()
        .filter(ch -> ch >= 'A' && ch <= 'Z'
                || ch >= 'a' && ch <= 'z')
        .mapToObj(Character::toString)
        .collect(Collectors.joining());

System.out.println(str3); // Afunougent

See also: How do I not take Special Characters in my Password Validation (without Regex)?


This would be the Unicode solution

String s = "A função, Ãugent";
String r = s.replaceAll("\\P{InBasic_Latin}", "");

\p{InBasic_Latin} is the Unicode block that contains all letters in the Unicode range U+0000..U+007F (see regular-expression.info)

\P{InBasic_Latin} is the negated \p{InBasic_Latin}


You can try something like this. Special Characters range for alphabets starts from 192, so you can avoid such characters in the result.

String name = "A função";

StringBuilder result = new StringBuilder();
for(char val : name.toCharArray()) {
    if(val < 192) result.append(val);
}
System.out.println("Result "+result.toString());

FailedDev's answer is good, but can be improved. If you want to preserve the ascii equivalents, you need to normalize first:

String subjectString = "öäü";
subjectString = Normalizer.normalize(subjectString, Normalizer.Form.NFD);
String resultString = subjectString.replaceAll("[^\\x00-\\x7F]", "");

=> will produce "oau"

That way, characters like "öäü" will be mapped to "oau", which at least preserves some information. Without normalization, the resulting String will be blank.


Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to regex

Why my regexp for hyphenated words doesn't work? grep's at sign caught as whitespace Preg_match backtrack error regex match any single character (one character only) re.sub erroring with "Expected string or bytes-like object" Only numbers. Input number in React Visual Studio Code Search and Replace with Regular Expressions Strip / trim all strings of a dataframe return string with first match Regex How to capture multiple repeated groups?

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript

Examples related to replace

How do I find and replace all occurrences (in all files) in Visual Studio Code? How to find and replace with regex in excel How to replace text in a column of a Pandas dataframe? How to replace negative numbers in Pandas Data Frame by zero Replacing few values in a pandas dataframe column with another value How to replace multiple patterns at once with sed? Using tr to replace newline with space replace special characters in a string python Replace None with NaN in pandas dataframe Batch script to find and replace a string in text file within a minute for files up to 12 MB

Examples related to char

How can I convert a char to int in Java? C# - How to convert string to char? How to take character input in java Char Comparison in C Convert Char to String in C cannot convert 'std::basic_string<char>' to 'const char*' for argument '1' to 'int system(const char*)' How to get the real and total length of char * (char array)? Why is conversion from string constant to 'char*' valid in C but invalid in C++ char *array and char array[] C++ - How to append a char to char*?