[postgresql] invalid byte sequence for encoding "UTF8"

I'm trying to import some data into my database. So I've created a temporary table,

create temporary table tmp(pc varchar(10), lat decimal(18,12), lon decimal(18,12), city varchar(100), prov varchar(2));

And now I'm trying to import the data,

 copy tmp from '/home/mark/Desktop/Canada.csv' delimiter ',' csv

But then I get the error,

ERROR:  invalid byte sequence for encoding "UTF8": 0xc92c

How do I fix that? Do I need to change the encoding of my entire database (if so, how?) or can I change just the encoding of my tmp table? Or should I attempt to change the encoding of the file?

This question is related to: postgresql, import

The answers are


psql=# copy tmp from '/path/to/file.csv' with delimiter ',' csv header encoding 'windows-1251';

Adding the encoding option worked in my case.
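
If psql is running on a different machine than the server, the server-side copy above cannot read your local file. psql's client-side \copy meta-command accepts the same options, including encoding. A sketch of the equivalent invocation, where mydb is a placeholder database name:

psql -d mydb -c "\copy tmp from '/path/to/file.csv' with delimiter ',' csv header encoding 'windows-1251'"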


I got the same error when I was trying to copy a csv generated by Excel to a Postgres table (all on a Mac). This is how I resolved it:

1) Open the file in Atom (the editor that I use)

2) Make an insignificant change in the file. Save the file. Undo the change. Save again.

Presto! The copy command now worked.

(I think Atom saved it in a format which worked)


Some solutions can be very simple.

A space in a column name can cause this problem.

Review every column name. For example, "column_name " is wrong and "column_name" is right (note the trailing space).
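
One quick way to spot a trailing space or other invisible characters in the header row is to print it with non-printing characters made visible. A sketch, assuming the file path from the question:

head -n 1 /home/mark/Desktop/Canada.csv | cat -A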


I ran into this problem under Windows while working exclusively with psql (no graphical tools). To fix it, permanently change the default encoding of psql (the client) to match the default encoding of the PostgreSQL server. Run the following command in CMD or Powershell:

setx PGCLIENTENCODING UTF8

Close and reopen your command prompt/Powershell for the change to take effect.

Change the encoding of the backup file from Unicode to UTF8 by opening it with Notepad and going to File -> Save As. Change the Encoding dropdown from Unicode (which in Notepad means UTF-16) to UTF8. (Also change the Save as type from Text Documents (*.txt) to All Files to avoid adding a .txt extension to your backup file's name.) You should now be able to restore your backup.
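
On Linux or macOS, the one-off equivalent of setx is to set the same libpq environment variable for a single invocation. A sketch, where mydb and backup.sql are placeholder names:

PGCLIENTENCODING=UTF8 psql -d mydb -f backup.sql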


It depends on what type of machine/encoding generated your import file.

If you're getting it from an English or Western European version of Windows, your best bet is probably setting it to 'WIN1252'. If you are getting it from a different source, consult the list of character encodings here:

http://www.postgresql.org/docs/8.3/static/multibyte.html

If you're getting it from a Mac, you may have to run it through the "iconv" utility first to convert it from MacRoman to UTF-8.
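
A hedged iconv sketch for that Mac case, using the file name from the question; MACINTOSH is the name GNU iconv uses for MacRoman, and iconv -l lists the names your build supports:

iconv -f MACINTOSH -t UTF-8 Canada.csv > Canada.utf8.csv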


Follow the steps below to solve this issue in pgadmin:

  1. SET client_encoding = 'ISO_8859_5';

  2. COPY tablename(column names) FROM 'D:/DB_BAK/csvfilename.csv' WITH DELIMITER ',' CSV ;


I had the same problem, and found a nice solution here: http://blog.e-shell.org/134

This is caused by a mismatch between your database encodings, most likely because the database the SQL dump came from was encoded as SQL_ASCII while the new one is encoded as UTF8. Recode is a small tool from the GNU project that lets you change the encoding of a given file on the fly.

So I just recoded the dumpfile before playing it back:

postgres> gunzip -c /var/backups/pgall_b1.zip | recode iso-8859-1..u8 | psql test

On Debian or Ubuntu systems, recode can be installed via the package manager.
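
For example, the install is a one-liner there:

sudo apt-get install recode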


A short example of solving this problem in PHP:

$val = "E'\377'";
// Detect the source encoding, then convert the value to UTF-8 before inserting it.
$utf8 = iconv(mb_detect_encoding($val, mb_detect_order(), true), "UTF-8", $val);

Error detail: because a UTF8-encoded Postgres database does not accept bytes that are not valid UTF-8, passing input like the above to a column raises the error "invalid byte sequence for encoding "UTF8": 0xab".

So just convert the value to UTF-8 before inserting it into the Postgres database.


Alternative cause on Windows with pgadmin v4.4:

Column names with non-ASCII characters will somehow mess up the psql import command, and give you this unintuitive error message. Your UTF8 csv data is probably fine.

Solution:

Rename your fields.

Example:

"Résultat" -> resultat

It is also quite possible with this error that the field is encrypted in place. Be sure you are looking at the right table; in some cases administrators will create an unencrypted view that you can use instead. I recently encountered a very similar issue.


You can replace the backslash character with, for example, a pipe character, using sed:

sed -i -- 's/\\/|/g' filename.txt

This error may occur if the input data contains the escape character itself. By default the escape character is the "\" symbol, so if your input text contains "\", try changing the default using COPY's ESCAPE option.
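
A sketch of that option for a CSV import; note that ESCAPE is only accepted in CSV format, and mydb is a placeholder (the table and path are from the question):

psql -d mydb -c "COPY tmp FROM '/home/mark/Desktop/Canada.csv' WITH (FORMAT csv, ESCAPE '|')"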


copy tablename from 'filepath\filename' DELIMITER '=' ENCODING 'WIN1252';

You can try this to get around the UTF8 error when the file is actually WIN1252-encoded.


This error means that the encoding of the records in the file differs from the encoding of the connection. In this case iconv may return an error too, sometimes even despite the //IGNORE flag:

iconv -f ASCII -t UTF-8//IGNORE < b.txt > a.txt

iconv: illegal input sequence at position (some number)

The trick is to find the incorrect characters and replace them. To do this on Linux, open the file in the vim editor, press Esc, and type ":goto (number returned by iconv)".

To find non-ASCII characters you may use the following command (yourfile.txt is a placeholder; supply your own file name):

grep --color=auto -P -n "[\x80-\xFF]" yourfile.txt

After removing the incorrect characters, check whether you still really need to convert your file: quite possibly the problem is already solved.


If your CSV is exported from SQL Server, is huge, and contains Unicode characters, you can export it with the encoding set to UTF-8:

Right-Click DB > Tasks > Export > 'SQL Server Native Client 11.0' >> 'Flat File Destination > File name: ... > Code page: UTF-8 >> ...

On the next page it asks whether you want to copy data from a table or write a query. If you have char or varchar data types in your table, select the query option and cast those columns to nvarchar(max). E.g., if myTable has two columns where the first one is varchar and the second one is int, I cast the first one to nvarchar:

select cast (col1 as nvarchar(max)) col1
       , col2
from myTable

Well, I was facing the same problem, and this is what solved it:

1. In Excel, click Save As and from Save as type choose .csv.

2. Click Tools, then choose Web Options from the drop-down list.

3. Under the Encoding tab, set the document to be saved as Unicode (UTF-8).

4. Click OK and save the file. Done!


I had the same problem: my file was not encoded as UTF-8. I solved it by opening the file with Notepad++ and changing the encoding of the file.

Go to "Encoding" and select "Convert to UTF-8". Save changes and that's all!


Open the CSV file in Notepad++. Choose Encoding > Convert to UTF-8, then fix the few broken cells manually.

Then try the import again.


For Python, you need to use:

pg8000.types.Bytea(str): Bytea is a str-derived class that is mapped to a PostgreSQL byte array

or

pg8000.Binary(value): constructs an object holding binary data.


Apparently I can just set the encoding on the fly,

 set client_encoding to 'latin1'

And then re-run the query. Not sure what encoding I should be using though.
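
One way to make an educated guess at the right encoding is to ask file for its opinion of the CSV (a sketch using the path from the question; the guess is heuristic, not definitive):

file -b --mime-encoding /home/mark/Desktop/Canada.csv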


latin1 made the characters legible, but most of the accented characters were upper-case where they shouldn't have been. I assumed this was due to a bad encoding, but I think it's actually that the data itself was bad. I ended up keeping the latin1 encoding, but pre-processed the data to fix the casing issues.


If you are OK with discarding nonconvertible characters, you can use the -c flag:

iconv -c -t utf8 filename.csv > filename.utf8.csv

and then copy the result into your table.