I would imagine that it has to do with encoding. A `char` is 16 bits (2 bytes), and some encodings use one byte per character whereas others use two or even more. When Java was originally designed, it was assumed that any Unicode character would fit in 2 bytes, whereas now a Unicode code point can require up to 4 bytes (as in UTF-32). There is no way for `Scanner` to represent such a code point in a single `char`.
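As a minimal sketch of the problem, here is how a supplementary character (one outside the Basic Multilingual Plane) shows up in Java: it occupies two `char` values (a surrogate pair), but counts as a single code point.

```java
public class CodePointDemo {
    public static void main(String[] args) {
        // U+1F600 (grinning face emoji), which lies outside the BMP
        String s = "\uD83D\uDE00";

        System.out.println(s.length());                       // 2 -- two chars (surrogate pair)
        System.out.println(s.codePointCount(0, s.length()));  // 1 -- one code point
        System.out.println(Integer.toHexString(s.codePointAt(0))); // 1f600
    }
}
```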
You can specify an encoding to `Scanner` when you construct an instance; if you don't, it will use the platform's default character set. But this still doesn't handle the issue of Unicode characters that need 3 or 4 bytes (those outside the BMP), since they cannot be represented as a single `char` primitive (a `char` is only 16 bits). So you would end up getting inconsistent results.