[unicode] What is the difference between UTF-8 and Unicode?

This article explains all the details http://kunststube.net/encoding/

WRITING TO BUFFER

if you write to a 4 byte buffer, symbol ? with UTF8 encoding, your binary will look like this:

00000000 11100011 10000001 10000010

if you write to a 4 byte buffer, symbol ? with UTF16 encoding, your binary will look like this:

00000000 00000000 00110000 01000010

As you can see, depending on what language you would use in your content this will effect your memory accordingly.

e.g. For this particular symbol: ? UTF16 encoding is more efficient since we have 2 spare bytes to use for the next symbol. But it doesn't mean that you must use UTF16 for Japan alphabet.

READING FROM BUFFER

Now if you want to read the above bytes, you have to know in what encoding it was written to and decode it back correctly.

e.g. If you decode this : 00000000 11100011 10000001 10000010 into UTF16 encoding, you will end up with ? not ?

Note: Encoding and Unicode are two different things. Unicode is the big (table) with each symbol mapped to a unique code point. e.g. ? symbol (letter) has a (code point): 30 42 (hex). Encoding on the other hand, is an algorithm that converts symbols to more appropriate way, when storing to hardware.

30 42 (hex) - > UTF8 encoding - > E3 81 82 (hex), which is above result in binary.

30 42 (hex) - > UTF16 encoding - > 30 42 (hex), which is above result in binary.

enter image description here

Examples related to unicode

How to resolve TypeError: can only concatenate str (not "int") to str (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape UnicodeEncodeError: 'ascii' codec can't encode character at special name Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP) HTML for the Pause symbol in audio and video control Javascript: Unicode string to hex Concrete Javascript Regex for Accented Characters (Diacritics) Replace non-ASCII characters with a single space UTF-8 in Windows 7 CMD NameError: global name 'unicode' is not defined - in Python 3

Examples related to encoding

How to check encoding of a CSV file UnicodeEncodeError: 'ascii' codec can't encode character at special name Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? The character encoding of the plain text document was not declared - mootool script UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) How to encode text to base64 in python UTF-8 output from PowerShell Set Encoding of File to UTF8 With BOM in Sublime Text 3 Replace non-ASCII characters with a single space

Examples related to utf-8

error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte Changing PowerShell's default output encoding to UTF-8 'Malformed UTF-8 characters, possibly incorrectly encoded' in Laravel Encoding Error in Panda read_csv Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? what is <meta charset="utf-8">? Pandas df.to_csv("file.csv" encode="utf-8") still gives trash characters for minus sign UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) Android Studio : unmappable character for encoding UTF-8

Examples related to character-encoding

Changing PowerShell's default output encoding to UTF-8 JsonParseException : Illegal unquoted character ((CTRL-CHAR, code 10) Change the encoding of a file in Visual Studio Code What is the difference between utf8mb4 and utf8 charsets in MySQL? How to open html file? All inclusive Charset to avoid "java.nio.charset.MalformedInputException: Input length = 1"? UTF-8 output from PowerShell ERROR 1115 (42000): Unknown character set: 'utf8mb4' "for line in..." results in UnicodeDecodeError: 'utf-8' codec can't decode byte How to make php display \t \n as tab and new line instead of characters

Examples related to terminology

The differences between initialize, define, declare a variable What is the difference between a web API and a web service? What does "opt" mean (as in the "opt" directory)? Is it an abbreviation? What's the name for hyphen-separated case? What is Bit Masking? What is ADT? (Abstract Data Type) What exactly are iterator, iterable, and iteration? What is a web service endpoint? What is the difference between Cloud, Grid and Cluster? How to explain callbacks in plain english? How are they different from calling one function from another function?