[perl] How can I output UTF-8 from Perl?

I am trying to write a Perl script using the utf8 pragma, and I'm getting unexpected results. I'm using Mac OS X 10.5 (Leopard), and I'm editing with TextMate. All of my settings for both my editor and operating system are defaulted to writing files in utf-8 format.

However, when I enter the following into a text file, save it as a ".pl", and execute it, I get the friendly "diamond with a question mark" in place of the non-ASCII characters.

#!/usr/bin/env perl -w

use strict;
use utf8;

my $str = 'Çirçös';
print( "$str\n" );

Any idea what I'm doing wrong? I expect to get 'Çirçös' in the output, but I get '?ir??s' instead.

This question is related to perl unicode utf-8

The answer is


do in your shell: $ env |grep LANG

This will probably show that your shell is not using a utf-8 locale.


TMTOWTDI, chose the method that best fits how you work. I use the environment method so I don't have to think about it.

In the environment:

export PERL_UNICODE=SDL

on the command line:

perl -CSDL -le 'print "\x{1815}"';

or with binmode:

binmode(STDOUT, ":utf8");          #treat as if it is UTF-8
binmode(STDIN, ":encoding(utf8)"); #actually check if it is UTF-8

or with PerlIO:

open my $fh, ">:utf8", $filename
    or die "could not open $filename: $!\n";

open my $fh, "<:encoding(utf-8)", $filename
    or die "could not open $filename: $!\n";

or with the open pragma:

use open ":encoding(utf8)";
use open IN => ":encoding(utf8)", OUT => ":utf8";

You also want to say, that strings in your code are utf-8. See Why does modern Perl avoid UTF-8 by default?. So set not only PERL_UNICODE=SDAL but also PERL5OPT=-Mutf8.


You can use the open pragma.

For eg. below sets STDOUT, STDIN & STDERR to use UTF-8....

use open qw/:std :utf8/;

Thanks, finally got an solution to not put utf8::encode all over code. To synthesize and complete for other cases, like write and read files in utf8 and also works with LoadFile of an YAML file in utf8

use utf8;
use open ':encoding(utf8)';
binmode(STDOUT, ":utf8");

open(FH, ">test.txt"); 
print FH "something éá";

use YAML qw(LoadFile Dump);
my $PUBS = LoadFile("cache.yaml");
my $f = "2917";
my $ref = $PUBS->{$f};
print "$f \"".$ref->{name}."\" ". $ref->{primary_uri}." ";

where cache.yaml is:

---
2917:
  id: 2917
  name: Semanário
  primary_uri: 2917.xml

Examples related to perl

The program can't start because api-ms-win-crt-runtime-l1-1-0.dll is missing while starting Apache server on my computer "End of script output before headers" error in Apache Perl - Multiple condition if statement without duplicating code? How to decrypt hash stored by bcrypt Split a string into array in Perl Turning multiple lines into one comma separated line String compare in Perl with "eq" vs "==" how to remove the first two columns in a file using shell (awk, sed, whatever) Find everything between two XML tags with RegEx Difference between \w and \b regular expression meta characters

Examples related to unicode

How to resolve TypeError: can only concatenate str (not "int") to str (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape UnicodeEncodeError: 'ascii' codec can't encode character at special name Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP) HTML for the Pause symbol in audio and video control Javascript: Unicode string to hex Concrete Javascript Regex for Accented Characters (Diacritics) Replace non-ASCII characters with a single space UTF-8 in Windows 7 CMD NameError: global name 'unicode' is not defined - in Python 3

Examples related to utf-8

error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte Changing PowerShell's default output encoding to UTF-8 'Malformed UTF-8 characters, possibly incorrectly encoded' in Laravel Encoding Error in Panda read_csv Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? what is <meta charset="utf-8">? Pandas df.to_csv("file.csv" encode="utf-8") still gives trash characters for minus sign UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) Android Studio : unmappable character for encoding UTF-8