[database] Strange Characters in database text: Ã, Ã, ¢, â‚ €,

I encountered today quite a similar problem : mysqldump dumped my utf-8 base encoding utf-8 diacritic characters as two latin1 characters, although the file itself is regular utf8.

For example : "é" was encoded as two characters "é". These two characters correspond to the utf8 two bytes encoding of the letter but it should be interpreted as a single character.

To solve the problem and correctly import the database on another server, I had to convert the file using the ftfy (stands for "Fixes Text For You). (https://github.com/LuminosoInsight/python-ftfy) python library. The library does exactly what I expect : transform bad encoded utf-8 to correctly encoded utf-8.

For example : This latin1 combination "é" is turned into an "é".

ftfy comes with a command line script but it transforms the file so it can not be imported back into mysql.

I wrote a python3 script to do the trick :

#!/usr/bin/python3
# coding: utf-8

import ftfy

# Set input_file
input_file = open('mysql.utf8.bad.dump', 'r', encoding="utf-8")
# Set output file
output_file = open ('mysql.utf8.good.dump', 'w')

# Create fixed output stream
stream = ftfy.fix_file(
    input_file,
    encoding=None,
    fix_entities='auto', 
    remove_terminal_escapes=False, 
    fix_encoding=True, 
    fix_latin_ligatures=False, 
    fix_character_width=False, 
    uncurl_quotes=False, 
    fix_line_breaks=False, 
    fix_surrogates=False, 
    remove_control_chars=False, 
    remove_bom=False, 
    normalization='NFC'
)

# Save stream to output file
stream_iterator = iter(stream)
while stream_iterator:
    try:
        line = next(stream_iterator)
        output_file.write(line)
    except StopIteration:
        break