Isn t the size of character in Java 2 bytes

Question

I used RandomAccessFile to read a byte from a text file   public static void readFile RandomAccessFile fr        byte   cbuff   new byte 1       fr read cbuff 0 1       System out println new String cbuff        Why am I seeing one full character being read by this

User · Answer

Looks like your file contains ASCII characters  which are encoded in just 1 byte  If text file was containing non-ASCII character  e g  2-byte UTF-8  then you get just the first byte  not whole character

User · Answer

Java stores all it s  chars  internally as two bytes  However  when they become strings etc  the number of bytes will depend on your encoding   Some characters  ASCII  are single byte  but many others are multi-byte   Java supports Unicode  thus according to   Java Character Docs  The max value supported is   uFFFF   hex FFFF  dec 65535   or 11111111 11111111 binary  two bytes

User · Answer

There are some great answers here but I wanted to point out the jvm is free to store a char value in any size space    2 bytes   On many architectures there is a penalty for performing unaligned memory access so a char might easily be padded to 4 bytes   A volatile char might even be padded to the size of the CPU cache line to prevent false sharing   https   en wikipedia org wiki False sharing  It might be non-intuitive to new Java programmers that a character array or a string is NOT simply multiple characters   You should learn and think about strings and arrays distinctly from  multiple characters     I also want to point out that java characters are often misused   People don t realize they are writing code that won t properly handle codepoints over 16 bits in length

User · Answer

A char represents a character in Java      It is 2 bytes large  at least that s what the valid value range suggests    That doesn t necessarily mean that every representation of a character is 2 bytes long  In fact many encodings only reserve 1 byte for every character  or use 1 byte for the most common characters    When you call the String byte    constructor you ask Java to convert the byte   to a String using the platform default encoding  Since the platform default encoding is usually a 1-byte encoding such as ISO-8859-1 or a variable-length encoding such as UTF-8  it can easily convert that 1 byte to a single character   If you run that code on a platform that uses UTF-16  or UTF-32 or UCS-2 or UCS-4 or      as the platform default encoding  then you will not get a valid result  you ll get a String containing the Unicode Replacement Character instead    That s one of the reasons why you should not depend on the platform default encoding  when converting between byte   and char   String or between InputStream and Reader or between OutputStream and Writer  you should always specify which encoding you want to use  If you don t  then your code will be platform-dependent       that s not entirely true  a char represents a UTF-16 codepoint  Either one or two UTF-16 codepoints represent a Unicode codepoint  A Unicode codepoint usually represents a character  but sometimes multiple Unicode codepoints are used to make up a single character  But the approximation above is close enough to discuss the topic at hand

User · Answer

Java allocates 2 of 2 bytes for character as it follows UTF-16  It occupies minimum 2 bytes while storing a character  and maximum of 4 bytes  There is no 1 byte or 3 bytes of storage for character

User · Answer

The constructor String byte   bytes  takes the bytes from the buffer and encodes them to characters   It uses the platform default charset to encode bytes to characters  If you know  your file contains text  that is encoded in a different charset  you can use the String byte   bytes  String charsetName  to use the correct encoding  from bytes to characters

User · Answer

In ASCII text file each character is just one byte

[java] Isn't the size of character in Java 2 bytes?

Examples related to java

Examples related to string

Examples related to char