[html] HTML encoding issues - "Â" character showing up instead of " "

I've got a legacy app just starting to misbehave, for whatever reason I'm not sure. It generates a bunch of HTML that gets turned into PDF reports by ActivePDF.

The process works like this:

  1. Pull an HTML template from a DB with tokens in it to be replaced (e.g. "~CompanyName~", "~CustomerName~", etc.)
  2. Replace the tokens with real data
  3. Tidy the HTML with a simple regex function that property formats HTML tag attribute values (ensures quotation marks, etc, since ActivePDF's rendering engine hates anything but single quotes around attribute values)
  4. Send off the HTML to a web service that creates the PDF.

Somewhere in that mess, the non-breaking spaces from the HTML template (the  s) are encoding as ISO-8859-1 so that they show up incorrectly as an "Â" character when viewing the document in a browser (FireFox). ActivePDF pukes on these non-UTF8 characters.

My question: since I don't know where the problem stems from and don't have time to investigate it, is there an easy way to re-encode or find-and-replace the bad characters? I've tried sending it through this little function I threw together, but it turns it all into gobbledegook doesn't change anything.

Private Shared Function ConvertToUTF8(ByVal html As String) As String
    Dim isoEncoding As Encoding = Encoding.GetEncoding("iso-8859-1")
    Dim source As Byte() = isoEncoding.GetBytes(html)
    Return Encoding.UTF8.GetString(Encoding.Convert(isoEncoding, Encoding.UTF8, source))
End Function

Any ideas?

EDIT:

I'm getting by with this for now, though it hardly seems like a good solution:

Private Shared Function ReplaceNonASCIIChars(ByVal html As String) As String
    Return Regex.Replace(html, "[^\u0000-\u007F]", " ")
End Function

This question is related to html vb.net encoding utf-8 iso-8859-1

The answer is


Well I got this Issue too in my few websites and all i need to do is customize the content fetler for HTML entites. before that more i delete them more i got, so just change you html fiter or parsing function for the page and it worked. Its mainly due to HTML editors in most of CMSs. the way they store parse the data caused this issue (In My case). May this would Help in your case too


If any one had the same problem as me and the charset was already correct, simply do this:

  1. Copy all the code inside the .html file.
  2. Open notepad (or any basic text editor) and paste the code.
  3. Go "File -> Save As"
  4. Enter you file name "example.html" (Select "Save as type: All Files (.)")
  5. Select Encoding as UTF-8
  6. Hit Save and you can now delete your old .html file and the encoding should be fixed

In my case I was getting latin cross sign instead of nbsp, even that a page was correctly encoded into the UTF-8. Nothing of above helped in resolving the issue and I tried all.

In the end changing font for IE (with browser specific css) helped, I was using Helvetica-Nue as a body font changing to the Arial resolved the issue .


In my case this (a with caret) occurred in code I generated from visual studio using my own tool for generating code. It was easy to solve:

Select single spaces ( ) in the document. You should be able to see lots of single spaces that are looking different from the other single spaces, they are not selected. Select these other single spaces - they are the ones responsible for the unwanted characters in the browser. Go to Find and Replace with single space ( ). Done.

PS: It's easier to see all similar characters when you place the cursor on one or if you select it in VS2017+; I hope other IDEs may have similar features


The reason for this is PHP doesn't recognise utf-8.

Here you can check it for all Special Characters in HTML

http://www.degraeve.com/reference/specialcharacters.php


I was having the same sort of problem. Apparently it's simply because PHP doesn't recognise utf-8.

I was tearing my hair out at first when a '£' sign kept showing up as '£', despite it appearing ok in DreamWeaver. Eventually I remembered I had been having problems with links relative to the index file, when the pages, if viewed directly would work with slideshows, but not when used with an include (but that's beside the point. Anyway I wondered if this might be a similar problem, so instead of putting into the page that I was having problems with, I simply put it into the index.php file - problem fixed throughout.


Problem: Even I was facing the problem where we were sending '£' with some string in POST request to CRM System, but when we were doing the GET call from CRM , it was returning '£' with some string content. So what we have analysed is that '£' was getting converted to '£'.

Analysis: The glitch which we have found after doing research is that in POST call we have set HttpWebRequest ContentType as "text/xml" while in GET Call it was "text/xml; charset:utf-8".

Solution: So as the part of solution we have included the charset:utf-8 in POST request and it works.


Examples related to html

Embed ruby within URL : Middleman Blog Please help me convert this script to a simple image slider Generating a list of pages (not posts) without the index file Why there is this "clear" class before footer? Is it possible to change the content HTML5 alert messages? Getting all files in directory with ajax DevTools failed to load SourceMap: Could not load content for chrome-extension How to set width of mat-table column in angular? How to open a link in new tab using angular? ERROR Error: Uncaught (in promise), Cannot match any routes. URL Segment

Examples related to vb.net

How to get parameter value for date/time column from empty MaskedTextBox HTTP 415 unsupported media type error when calling Web API 2 endpoint variable is not declared it may be inaccessible due to its protection level Differences Between vbLf, vbCrLf & vbCr Constants Simple working Example of json.net in VB.net How to open up a form from another form in VB.NET? Delete a row in DataGridView Control in VB.NET How to get cell value from DataGridView in VB.Net? Set default format of datetimepicker as dd-MM-yyyy How to configure SMTP settings in web.config

Examples related to encoding

How to check encoding of a CSV file UnicodeEncodeError: 'ascii' codec can't encode character at special name Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? The character encoding of the plain text document was not declared - mootool script UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) How to encode text to base64 in python UTF-8 output from PowerShell Set Encoding of File to UTF8 With BOM in Sublime Text 3 Replace non-ASCII characters with a single space

Examples related to utf-8

error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte Changing PowerShell's default output encoding to UTF-8 'Malformed UTF-8 characters, possibly incorrectly encoded' in Laravel Encoding Error in Panda read_csv Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? what is <meta charset="utf-8">? Pandas df.to_csv("file.csv" encode="utf-8") still gives trash characters for minus sign UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) Android Studio : unmappable character for encoding UTF-8

Examples related to iso-8859-1

What is the difference between UTF-8 and ISO-8859-1? C# Convert string from UTF-8 to ISO-8859-1 (Latin1) H HTML encoding issues - "Â" character showing up instead of "&nbsp;" Converting UTF-8 to ISO-8859-1 in Java - how to keep it as single byte How do I convert between ISO-8859-1 and UTF-8 in Java? Convert utf8-characters to iso-88591 and back in PHP How do I write out a text file in C# with a code page other than UTF-8? Setting the character encoding in form submit for Internet Explorer