[html] Question mark characters displaying within text, why is this?

I have a backup server that automatically backs up my live site, both files and database.

On the live site, the text looks fine, but when you view the mirrored version of it, it displays '?' within some of the text. This text is stored within the news database table.

Here is a screen shot of it being on the live server and of it on the mirrored server.

What could be going wrong in the process of backing it up to the mirrored server?


The answer is


Check the character set being emitted by your mirrored server. It appears to differ from the main server's -- the live site appears to be outputting Unicode, whereas the mirror is not. Also, it's usually a good idea to scrub Unicode characters in your incoming content and replace them with their appropriate HTML entities.

Your specific issue involves "smart quotes," "em dashes" and "en dashes." I know you can replace em dashes with &mdash; and en dashes with &ndash; (which should be done on the input side of your database); I don't know what the correct replacement for the smart quotes would be. (I usually just replace all curly single quotes with ' and all curly double quotes with " ... Typography geeks may feel free to shoot me on sight.)

I should note that some browsers are more forgiving than others with this issue -- Internet Explorer on Windows tends to auto-magically detect and "fix" this; Firefox and most other browsers display the question marks.
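As a rough illustration of the entity replacement described above, here is a minimal sketch in JavaScript. The helper name `toEntities` and the `ENTITY_MAP` table are made up for this example. The entity names themselves are standard HTML named character references, including `&lsquo;`, `&rsquo;`, `&ldquo;` and `&rdquo;` for the curly quotes.

```javascript
// Typographic characters mapped to standard HTML named entities.
const ENTITY_MAP = {
  "\u2018": "&lsquo;", // left single quote
  "\u2019": "&rsquo;", // right single quote / apostrophe
  "\u201C": "&ldquo;", // left double quote
  "\u201D": "&rdquo;", // right double quote
  "\u2013": "&ndash;", // en dash
  "\u2014": "&mdash;", // em dash
};

// Hypothetical helper: replace each mapped character with its entity.
function toEntities(text) {
  return text.replace(
    /[\u2018\u2019\u201C\u201D\u2013\u2014]/g,
    (ch) => ENTITY_MAP[ch]
  );
}

console.log(toEntities("\u201CHello\u201D \u2013 it\u2019s here"));
// prints: &ldquo;Hello&rdquo; &ndash; it&rsquo;s here
```

Running something like this on content before it is stored would keep the stored text pure ASCII, so it should display the same whatever charset the server declares.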


This is going to be something to do with character encodings.

Are you sure the mirrored site has the same properties with regard to character encodings as your main server?

Depending on what sort of server you have, this may be a property of the server process itself, or it could be an environment variable.

For example, if this is a UNIX environment, perhaps try comparing LANG or LC_ALL?
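One way to make that comparison is to print the locale-related variables on each machine and compare the output. The sketch below assumes Node.js is available on both servers. The function name `localeSettings` is invented here, and `LC_CTYPE` is included only as another variable that commonly affects character handling.

```javascript
// Collect the locale-related environment variables from an env object.
function localeSettings(env) {
  const names = ["LANG", "LC_ALL", "LC_CTYPE"];
  const settings = {};
  for (const name of names) {
    settings[name] = env[name] || "(unset)";
  }
  return settings;
}

// Run this on the live server and on the mirror, then compare the results.
console.log(localeSettings(process.env));
```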



I usually curse MS Word and then run the following WScript (JScript) script.

// Replace with the path to the file that needs cleaning.
var PATH = "test.html";

var fso = WScript.CreateObject("Scripting.FileSystemObject");
var content = fso.GetFile(PATH).OpenAsTextStream().ReadAll();
var out = fso.CreateTextFile("clean-" + PATH, true); // true = overwrite an existing file

// Symbols: straighten curly quotes and dashes, turn the rest into HTML entities.
content = content.replace(/“/g, '"');
content = content.replace(/”/g, '"');
content = content.replace(/’/g, "'");
content = content.replace(/–/g, "-");
content = content.replace(/©/g, "&copy;");
content = content.replace(/®/g, "&reg;");
content = content.replace(/°/g, "&deg;");
content = content.replace(/¶/g, "<p>");
content = content.replace(/¿/g, "&iquest;");
content = content.replace(/¡/g, "&iexcl;");
content = content.replace(/¢/g, "&cent;");
content = content.replace(/£/g, "&pound;");
content = content.replace(/¥/g, "&yen;");

out.Write(content);
out.Close();


Your browser hasn't interpreted the encoding of the page correctly (either because you've forced it to a particular setting, or because the page declares the wrong encoding), and thus cannot display some of the characters.
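To see which encoding a page declares, it can help to compare the `Content-Type` header value that each server sends. The following is a small sketch of pulling the charset out of such a value. The helper name `charsetOf` is made up for illustration, and the header strings are examples rather than values taken from the sites in the question.

```javascript
// Hypothetical helper: extract the charset parameter from a Content-Type value.
function charsetOf(contentType) {
  const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || "");
  return match ? match[1].toLowerCase() : null;
}

console.log(charsetOf("text/html; charset=UTF-8"));        // utf-8
console.log(charsetOf("text/html; charset=windows-1252")); // windows-1252
console.log(charsetOf("text/html"));                       // null
```

If the two servers report different values for the same page, that difference is a likely cause of the question marks.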


Edit the Apache configuration file on the "mirror" server (the server with the problem), and comment out the following line:

AddDefaultCharset UTF-8

Then restart Apache:

service httpd restart

The problem is that the "AddDefaultCharset UTF-8" line overrides the Content-Type specified in the .html files; e.g.:

<meta http-equiv=Content-Type content="text/html; charset=windows-1252">

The most common symptom is that character codes above 127 display as black diamonds with question marks on them (in Chrome, Safari or Firefox), or as little boxes (in IE and Opera). HTML files generated by Microsoft Word usually have many such characters, the most common one being character code 160 = 0xA0, which is equivalent to "&nbsp;" in the Windows-1252 encoding, and is often found between span tags, like this:

<span style="mso-spacerun: yes">ááá </span>
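The effect can be reproduced outside the browser. The sketch below assumes a JavaScript runtime with the standard `TextDecoder` (such as Node.js or any modern browser) and uses made-up sample bytes. It decodes a lone 0xA0 byte as UTF-8, which yields the replacement character U+FFFD (the "black diamond"), and then decodes the same character correctly encoded as UTF-8.

```javascript
// Non-fatal UTF-8 decoder: invalid bytes are replaced with U+FFFD.
const utf8 = new TextDecoder("utf-8");

// "a", then byte 0xA0 (a non-breaking space in Windows-1252), then "b".
const legacyBytes = Uint8Array.of(0x61, 0xa0, 0x62);
const misread = utf8.decode(legacyBytes);

// The same text encoded as UTF-8: U+00A0 is the two bytes 0xC2 0xA0.
const utf8Bytes = Uint8Array.of(0x61, 0xc2, 0xa0, 0x62);
const correct = utf8.decode(utf8Bytes);

console.log(misread === "a\uFFFDb"); // true
console.log(correct === "a\u00A0b"); // true
```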

Unicode or other character set characters falling through?

I have seen similar "strange" characters show up on sites I have worked on, often when the text was copied from an email or some other document format (e.g. Word) into a text editor. The editor can display the non-ASCII characters but the browser can't. For the website, I would suggest looking up the HTML entity code for the character and inserting that instead ... or switching to more standard characters.


I got here looking for a solution for text output by JavaScript in the browser. My case was not directly related to a database, but the symptom was the same.

In my case I copied and pasted some text I found on the internet into a JavaScript file and saved it with Windows Notepad.

When the page that uses that JavaScript file output the strings, there were question marks (like the ones shown in the question) in place of special characters such as accented letters.

I opened the file using Notepad++. Right after opening it, I saw in the status bar at the bottom of the window that the character encoding was set to ANSI.

To solve the issue, click the Encoding menu in Notepad++ and select Encode in UTF-8. You should be good to go. :)
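For files that cannot conveniently be opened in an editor, the same kind of conversion can be scripted. This is a minimal sketch under two assumptions: that Node.js is available, and that the legacy bytes are ISO-8859-1 (`"latin1"` in Node.js), which matches Windows "ANSI" (Windows-1252) for accented letters but not for bytes 0x80 to 0x9F such as curly quotes. The function name `latin1ToUtf8` is invented for this example.

```javascript
// Assumption: the input bytes are ISO-8859-1 ("latin1" in Node.js).
function latin1ToUtf8(bytes) {
  const text = Buffer.from(bytes).toString("latin1"); // decode legacy bytes to a string
  return Buffer.from(text, "utf8");                   // re-encode the string as UTF-8
}

// "café" in ISO-8859-1: the é is the single byte 0xE9.
const converted = latin1ToUtf8([0x63, 0x61, 0x66, 0xe9]);
console.log(converted.toString("hex")); // 636166c3a9
```

In UTF-8 the é becomes the two bytes 0xC3 0xA9, which a page served as UTF-8 will display correctly.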


I had this issue, so I just took all my content, copy/pasted it into Notepad, made a new PHP file, pasted it back in, re-saved and overwrote the original, and... that worked! It really was some relic of Microsoft Word editing...


