[java] "Unmappable character for encoding UTF-8" error

I'm getting a compile error at the following method.

public static boolean isValidPasswd(String passwd) {
    String reg = "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!\"'%*=¬.,-])(?=[^\\s]+$).{8,24}$";
    return Pattern.matches(reg, passwd);
}
at Utility.java:[76,74] unmappable character for 
enoding UTF-8. 74th character is' " '

How can I fix this? Thanks.

This question is related to java maven-2 encoding utf-8

The answer is


I observed this issue while using Eclipse. I needed to add encoding in my pom.xml file and it resolved. http://ctrlaltsolve.blogspot.in/2015/11/encoding-properties-in-maven.html


Thanks Michael Konietzka (https://stackoverflow.com/a/4996583/1019307) for your answer.

I did this in Eclipse / STS:

Preferences > General > Content Types > Selected "Text" 
    (which contains all types such as CSS, Java Source Files, ...)
Added "UTF-8" to the default encoding box down the bottom and hit 'Add'

Bingo, error gone!


The compiler is using the UTF-8 character encoding to read your source file. But the file must have been written by an editor using a different encoding. Open your file in an editor set to the UTF-8 encoding, fix the quote mark, and save it again.

Alternatively, you can find the Unicode point for the character and use a Unicode escape in the source code. For example, the character A can be replaced with the Unicode escape \u0041.

By the way, you don't need to use the begin- and end-line anchors ^ and $ when using the matches() method. The entire sequence must be matched by the regular expression when using the matches() method. The anchors are only useful with the find() method.


For IntelliJ users, this is pretty easy once you find out what the original encoding was. You can select the encoding from the bottom right corner of your Window, you will be prompted with a dialog box saying:

The encoding you've chosen ('[encoding type]') may change the contents of '[Your file]'. Do you want to reload the file from disk or convert the text and save in the new encoding?

So if you happen to have a few characters saved in some odd encoding, what you should do is first select 'Reload' to load the file all in the encoding of the bad characters. For me this turned the ? characters into their proper value.

IntelliJ can tell if you most likely did not pick the right encoding and will warn you. Revert back and try again.

Once you can see the bad characters go away, change the encoding select box in the bottom right corner back to the format you originally intended (if you are Googling this error message, that will likely be UTF-8). This time select the 'Convert' button on the dialog.

For me, I needed to reload as 'windows-1252', then convert back to 'UTF-8'. The offending characters were single quotes (‘ and ’) likely pasted in from a Word doc (or e-mail) with the wrong encoding, and the above actions will convert them to UTF-8.


"error: unmappable character for encoding UTF-8" means, java has found a character which is not representing in UTF-8. Hence open the file in an editor and set the character encoding to UTF-8. You should be able to find a character which is not represented in UTF-8.Take off this character and recompile.


In eclipse try to go to file properties (Alt+Enter) and change the Resource → 'Text File encoding' → Other to UTF-8. Reopen the file and check there will be junk character somewhere in the string/file. Remove it. Save the file.

Change the encoding Resource → 'Text File encoding' back to Default.

Compile and deploy the code.


The Java compiler assumes that your input is UTF-8 encoded, either because you specified it to be or because it's your platform default encoding.

However, the data in your .java files is not actually encoded in UTF-8. The problem is probably the ¬ character. Make sure your editor (or IDE) of choice actually safes its file in UTF-8 encoding.


You have encoding problem with your sourcecode file. It is maybe ISO-8859-1 encoded, but the compiler was set to use UTF-8. This will results in errors when using characters, which will not have the same bytes representation in UTF-8 and ISO-8859-1. This will happen to all characters which are not part of ASCII, for example ¬ NOT SIGN.

You can simulate this with the following program. It just uses your line of source code and generates a ISO-8859-1 byte array and decode this "wrong" with UTF-8 encoding. You can see at which position the line gets corrupted. I added 2 spaces at your source code to fit position 74 to fit this to ¬ NOT SIGN, which is the only character, which will generate different bytes in ISO-8859-1 encoding and UTF-8 encoding. I guess this will match indentation with the real source file.

 String reg = "      String reg = \"^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!\"'%*=¬.,-])(?=[^\\s]+$).{8,24}$\";";
 String corrupt=new String(reg.getBytes("ISO-8859-1"),"UTF-8");
 System.out.println(corrupt+": "+corrupt.charAt(74));
 System.out.println(reg+": "+reg.charAt(74));     

which results in the following output (messed up because of markup):

String reg = "^(?=.[0-9])(?=.[a-z])(?=.[A-Z])(?=.[~#;:?/@&!"'%*=?.,-])(?=[^\s]+$).{8,24}$";: ?

String reg = "^(?=.[0-9])(?=.[a-z])(?=.[A-Z])(?=.[~#;:?/@&!"'%*=¬.,-])(?=[^\s]+$).{8,24}$";: ¬

See "live" at https://ideone.com/ShZnB

To fix this, save the source files with UTF-8 encoding.


I'm in the process of setting up a CI build server on a Linux box for a legacy system started in 2000. There is a section that generates a PDF that contains non-UTF8 characters. We are in the final steps of a release, so I cannot replace the characters giving me grief, yet for Dilbertesque reasons, I cannot wait a week to solve this issue after the release. Fortunately, the "javac" command in Ant has an "encoding" parameter.

 <javac destdir="${classes.dir}" classpathref="production-classpath" debug="on"
     includeantruntime="false" source="${java.level}" target="${java.level}"

     encoding="iso-8859-1">

     <src path="${production.dir}" />
 </javac>

The following compiles for me:

class E{
   String s = "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!\"'%*=¼.,-])(?=[^\\s]+$).{8,24}$";
}

See:

enter image description here


Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to maven-2

Maven:Non-resolvable parent POM and 'parent.relativePath' points at wrong local POM Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12:test (default-test) on project. Plugin org.apache.maven.plugins:maven-clean-plugin:2.5 or one of its dependencies could not be resolved Maven Error: Could not find or load main class MAVEN_HOME, MVN_HOME or M2_HOME Where is my m2 folder on Mac OS X Mavericks Maven won't run my Project : Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:exec 'mvn' is not recognized as an internal or external command, operable program or batch file SLF4J: Class path contains multiple SLF4J bindings Create local maven repository

Examples related to encoding

How to check encoding of a CSV file UnicodeEncodeError: 'ascii' codec can't encode character at special name Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? The character encoding of the plain text document was not declared - mootool script UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) How to encode text to base64 in python UTF-8 output from PowerShell Set Encoding of File to UTF8 With BOM in Sublime Text 3 Replace non-ASCII characters with a single space

Examples related to utf-8

error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte Changing PowerShell's default output encoding to UTF-8 'Malformed UTF-8 characters, possibly incorrectly encoded' in Laravel Encoding Error in Panda read_csv Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? what is <meta charset="utf-8">? Pandas df.to_csv("file.csv" encode="utf-8") still gives trash characters for minus sign UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) Android Studio : unmappable character for encoding UTF-8