If you use Python, just use a print() function to check the encoding of a csv file. For example:
with open('file_name.csv') as f:
print(f)
The output is something like this:
<_io.TextIOWrapper name='file_name.csv' mode='r' encoding='utf8'>
In Python, You can Try...
from encodings.aliases import aliases
alias_values = set(aliases.values())
for encoding in set(aliases.values()):
try:
df=pd.read_csv("test.csv", encoding=encoding)
print('successful', encoding)
except:
pass
You can also use python chardet library
# install the chardet library
!pip install chardet
# import the chardet library
import chardet
# use the detect method to find the encoding
# 'rb' means read in the file as binary
with open("test.csv", 'rb') as file:
print(chardet.detect(file.read()))
Use chardet https://github.com/chardet/chardet (documentation is short and easy to read).
Install python, then pip install chardet, at last use the command line command.
I tested under GB2312 and it's pretty accurate. (Make sure you have at least a few characters, sample with only 1 character may fail easily).
file
is not reliable as you can see.
In Linux systems, you can use file command. It will give the correct encoding
Sample:
file blah.csv
Output:
blah.csv: ISO-8859 text, with very long lines
Or you can execute in python console or in Jupyter Notebook:
import csv
data = open("file.csv","r")
data
You will see information about the data object like this:
<_io.TextIOWrapper name='arch.csv' mode='r' encoding='cp1250'>
As you can see it contains encoding infotmation.
Source: Stackoverflow.com