[csv] How to check encoding of a CSV file

I have a CSV file and I want to find out its encoding. Is there a menu option in Microsoft Excel that can help me detect it, or do I need to use a programming language such as C# or PHP to deduce it?

This question is related to csv encoding

The answer is


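If you use Python, you can print the file object to see the encoding it was opened with. Note that this is the encoding Python selected (the platform default unless you pass encoding=), not one detected from the file's contents. For example: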

with open('file_name.csv') as f:
    print(f)

The output is something like this:

<_io.TextIOWrapper name='file_name.csv' mode='r' encoding='utf8'>
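To illustrate why the reported value reflects how the file was opened rather than what it contains, here is a minimal sketch. The file demo_latin1.csv is a hypothetical demo file created by the snippet itself; it is written as Latin-1 and then opened as UTF-8.

```python
import os
import tempfile

# Hypothetical demo file: "café" stored as Latin-1 bytes (é is 0xE9).
path = os.path.join(tempfile.mkdtemp(), "demo_latin1.csv")
with open(path, "wb") as f:
    f.write("name\ncafé\n".encode("latin-1"))

# The wrapper reports the encoding it was asked to use,
# without inspecting the bytes on disk.
with open(path, encoding="utf-8") as f:
    reported = f.encoding

# Only actually decoding the contents exposes the mismatch.
try:
    with open(path, encoding="utf-8") as f:
        f.read()
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(reported, decoded_ok)
```

The file object happily reports UTF-8 even though reading the data fails, so the printed encoding should be treated as "what Python will try", not as a detection result.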

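In Python, you can try each encoding Python knows and see which ones read the file without error. A successful read only means the bytes are decodable in that encoding, not that the result is correct, so inspect the data afterwards:

import pandas as pd
from encodings.aliases import aliases

# Every distinct codec name Python knows about
alias_values = set(aliases.values())

for encoding in sorted(alias_values):
    try:
        df = pd.read_csv("test.csv", encoding=encoding)
        print('successful', encoding)
    except Exception:
        # Decode errors, parse errors, and non-text codecs all land here
        pass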
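A lighter variant of the same idea, sketched with the standard library only: decode the raw bytes against a short list of likely encodings instead of every codec. The function name candidate_encodings and the default shortlist are assumptions for illustration, not part of any library.

```python
def candidate_encodings(data: bytes,
                        candidates=("utf-8", "gb2312", "cp1250", "cp1252")):
    """Return the candidate encodings that decode `data` without error."""
    matches = []
    for name in candidates:
        try:
            data.decode(name)
        except (UnicodeDecodeError, LookupError):
            # Undecodable bytes, or a codec name Python does not know
            continue
        matches.append(name)
    return matches


# Hypothetical sample; in practice use open("test.csv", "rb").read()
sample = "naïve,café\n".encode("utf-8")
print(candidate_encodings(sample))
```

Several encodings can decode the same bytes without error, so this narrows the possibilities rather than identifying a single answer. Single-byte code pages such as cp1252 accept almost any input, which is why strict encodings like UTF-8 are the more informative entries in the result.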

You can also use the Python chardet library:

# install the chardet library
# (the leading ! runs a shell command from a Jupyter notebook)
!pip install chardet

# import the chardet library
import chardet 

# use the detect method to find the encoding
# 'rb' means read in the file as binary
with open("test.csv", 'rb') as file:
    print(chardet.detect(file.read()))
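For large files, reading everything just to guess the encoding can be slow. The sketch below reads only the first bytes and rejects low-confidence guesses. The helper detect_encoding, its sample_size and min_confidence defaults, and the injectable detector parameter are all assumptions made for illustration; only chardet.detect and the 'encoding' and 'confidence' keys of its result come from the library.

```python
def detect_encoding(path, sample_size=64 * 1024, min_confidence=0.5,
                    detector=None):
    """Guess the encoding of the file at `path` from its first bytes.

    Returns the encoding name, or None when the detector is unsure.
    """
    if detector is None:
        import chardet  # third-party: pip install chardet
        detector = chardet.detect

    # 'rb' reads raw bytes so nothing is decoded before detection
    with open(path, "rb") as f:
        sample = f.read(sample_size)

    result = detector(sample)
    if result["encoding"] is None or result["confidence"] < min_confidence:
        return None
    return result["encoding"]
```

A call such as detect_encoding("test.csv") would then return a name that can be passed to open() or pd.read_csv(). The thresholds are arbitrary starting points; a sample cut off in the middle of a multi-byte character may also lower the detector's confidence.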

Use chardet (https://github.com/chardet/chardet); the documentation is short and easy to read.

Install Python, run pip install chardet, and then use the chardetect command-line tool it provides.

I tested it on a GB2312 file and it was pretty accurate. Make sure the file contains at least a few characters; a sample with only one character can easily fail.

The file command, by contrast, is not reliable.


On Linux systems, you can use the file command. It reports its best guess at the encoding:

Sample:

file blah.csv

Output:

blah.csv: ISO-8859 text, with very long lines

Or you can run this in a Python console or in a Jupyter Notebook:

data = open("arch.csv", "r")
data

You will see information about the data object like this:

<_io.TextIOWrapper name='arch.csv' mode='r' encoding='cp1250'>

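As you can see, it contains encoding information. Keep in mind that this is the encoding Python chose when opening the file (here the platform default), not one detected from the file's contents.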