[java] Byte and char conversion in Java

If I convert a character to byte and then back to char, that character mysteriously disappears and becomes something else. How is this possible?

This is the code:

char a = 'È';       // line 1       
byte b = (byte)a;   // line 2       
char c = (char)b;   // line 3
System.out.println((char)c + " " + (int)c);

Until line 2 everything is fine:

  • In line 1 I could print "a" in the console and it would show "È".

  • In line 2 I could print "b" in the console and it would show -56, that is 200 because byte is signed. And 200 is "È". So it's still fine.

But what's wrong in line 3? "c" becomes something else and the program prints ? 65480. That's something completely different.

What I should write in line 3 in order to get the correct result?

This question is related to java encoding unicode utf-16

The answer is


A character in Java is a Unicode code-unit which is treated as an unsigned number. So if you perform c = (char)b the value you get is 2^16 - 56 or 65536 - 56.

Or more precisely, the byte is first converted to a signed integer with the value 0xFFFFFFC8 using sign extension in a widening conversion. This in turn is then narrowed down to 0xFFC8 when casting to a char, which translates to the positive number 65480.

From the language specification:

5.1.4. Widening and Narrowing Primitive Conversion

First, the byte is converted to an int via widening primitive conversion (§5.1.2), and then the resulting int is converted to a char by narrowing primitive conversion (§5.1.3).


To get the right point use char c = (char) (b & 0xFF) which first converts the byte value of b to the positive integer 200 by using a mask, zeroing the top 24 bits after conversion: 0xFFFFFFC8 becomes 0x000000C8 or the positive number 200 in decimals.


Above is a direct explanation of what happens during conversion between the byte, int and char primitive types.

If you want to encode/decode characters from bytes, use Charset, CharsetEncoder, CharsetDecoder or one of the convenience methods such as new String(byte[] bytes, Charset charset) or String#toBytes(Charset charset). You can get the character set (such as UTF-8 or Windows-1252) from StandardCharsets.


new String(byteArray, Charset.defaultCharset())

This will convert a byte array to the default charset in java. It may throw exceptions depending on what you supply with the byteArray.


Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to encoding

How to check encoding of a CSV file UnicodeEncodeError: 'ascii' codec can't encode character at special name Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? The character encoding of the plain text document was not declared - mootool script UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) How to encode text to base64 in python UTF-8 output from PowerShell Set Encoding of File to UTF8 With BOM in Sublime Text 3 Replace non-ASCII characters with a single space

Examples related to unicode

How to resolve TypeError: can only concatenate str (not "int") to str (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape UnicodeEncodeError: 'ascii' codec can't encode character at special name Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP) HTML for the Pause symbol in audio and video control Javascript: Unicode string to hex Concrete Javascript Regex for Accented Characters (Diacritics) Replace non-ASCII characters with a single space UTF-8 in Windows 7 CMD NameError: global name 'unicode' is not defined - in Python 3

Examples related to utf-16

Byte and char conversion in Java Convert UTF-8 with BOM to UTF-8 with no BOM in Python Difference between UTF-8 and UTF-16? What is Unicode, UTF-8, UTF-16? UTF-8, UTF-16, and UTF-32