[java] How to replace � in a string

I have a string that contains a character � I haven't been able to replace it correctly.

String.replace("�", "");

doesn't work, does anyone know how to remove/replace the � in the string?

This question is related to java string unicode encoding mojibake

The answer is


You are asking to replace the character "?" but for me that is coming through as three characters 'ï', '¿' and '½'. This might be your problem... If you are using Java prior to Java 1.5 then you only get the UCS-2 characters, that is only the first 65K UTF-8 characters. Based on other comments, it is most likely that the character that you are looking for is '?', that is the Unicode replacement character. This is the character that is "used to replace an incoming character whose value is unknown or unrepresentable in Unicode".

Actually, looking at the comment from Kathy, the other issue that you might be having is that javac is not interpreting your .java file as UTF-8, assuming that you are writing it in UTF-8. Try using:

javac -encoding UTF-8 xx.java

Or, modify your source code to do:

String.replaceAll("\uFFFD", "");

Use the unicode escape sequence. First you'll have to find the codepoint for the character you seek to replace (let's just say it is ABCD in hex):

str = str.replaceAll("\uABCD", "");

dissect the URL code and unicode error. this symbol came to me as well on google translate in the armenian text and sometimes the broken burmese.


for detail

import java.io.UnsupportedEncodingException;

/**
 * File: BOM.java
 * 
 * check if the bom character is present in the given string print the string
 * after skipping the utf-8 bom characters print the string as utf-8 string on a
 * utf-8 console
 */

public class BOM
{
    private final static String BOM_STRING = "Hello World";
    private final static String ISO_ENCODING = "ISO-8859-1";
    private final static String UTF8_ENCODING = "UTF-8";
    private final static int UTF8_BOM_LENGTH = 3;

    public static void main(String[] args) throws UnsupportedEncodingException {
        final byte[] bytes = BOM_STRING.getBytes(ISO_ENCODING);
        if (isUTF8(bytes)) {
            printSkippedBomString(bytes);
            printUTF8String(bytes);
        }
    }

    private static void printSkippedBomString(final byte[] bytes) throws UnsupportedEncodingException {
        int length = bytes.length - UTF8_BOM_LENGTH;
        byte[] barray = new byte[length];
        System.arraycopy(bytes, UTF8_BOM_LENGTH, barray, 0, barray.length);
        System.out.println(new String(barray, ISO_ENCODING));
    }

    private static void printUTF8String(final byte[] bytes) throws UnsupportedEncodingException {
        System.out.println(new String(bytes, UTF8_ENCODING));
    }

    private static boolean isUTF8(byte[] bytes) {
        if ((bytes[0] & 0xFF) == 0xEF && 
            (bytes[1] & 0xFF) == 0xBB && 
            (bytes[2] & 0xFF) == 0xBF) {
            return true;
        }
        return false;
    }
}

As others have said, you posted 3 characters instead of one. I suggest you run this little snippet of code to see what's actually in your string:

public static void dumpString(String text)
{
    for (int i=0; i < text.length(); i++)
    {
        System.out.println("U+" + Integer.toString(text.charAt(i), 16) 
                           + " " + text.charAt(i));
    }
}

If you post the results of that, it'll be easier to work out what's going on. (I haven't bothered padding the string - we can do that by inspection...)


Character issues like this are difficult to diagnose because information is easily lost through misinterpretation of characters via application bugs, misconfiguration, cut'n'paste, etc.

As I (and apparently others) see it, you've pasted three characters:

codepoint   glyph   escaped    windows-1252    info
=======================================================================
U+00ef      ï       \u00ef     ef,             LATIN_1_SUPPLEMENT, LOWERCASE_LETTER
U+00bf      ¿       \u00bf     bf,             LATIN_1_SUPPLEMENT, OTHER_PUNCTUATION
U+00bd      ½       \u00bd     bd,             LATIN_1_SUPPLEMENT, OTHER_NUMBER

To identify the character, download and run the program from this page. Paste your character into the text field and select the glyph mode; paste the report into your question. It'll help people identify the problematic character.


profilage bas� sur l'analyse de l'esprit (french)

should be translated as:

profilage basé sur l'analyse de l'esprit

so, in this case � = é


No above answer resolve my issue. When i download xml it apppends <xml to my xml. I simply

xml = parser.getXmlFromUrl(url);

xml = xml.substring(3);// it remove first three character from string,

now it is running accurately.


Change the Encoding to UTF-8 while parsing .This will remove the special characters


Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript

Examples related to unicode

How to resolve TypeError: can only concatenate str (not "int") to str (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape UnicodeEncodeError: 'ascii' codec can't encode character at special name Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP) HTML for the Pause symbol in audio and video control Javascript: Unicode string to hex Concrete Javascript Regex for Accented Characters (Diacritics) Replace non-ASCII characters with a single space UTF-8 in Windows 7 CMD NameError: global name 'unicode' is not defined - in Python 3

Examples related to encoding

How to check encoding of a CSV file UnicodeEncodeError: 'ascii' codec can't encode character at special name Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? The character encoding of the plain text document was not declared - mootool script UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) How to encode text to base64 in python UTF-8 output from PowerShell Set Encoding of File to UTF8 With BOM in Sublime Text 3 Replace non-ASCII characters with a single space

Examples related to mojibake

How to convert these strange characters? (ë, Ã, ì, ù, Ã) How do I remove  from the beginning of a file? "’" showing on page instead of " ' " How to replace � in a string