HTML encoding issues - "Â" character showing up instead of "&nbsp;"

Question

I've got a legacy app just starting to misbehave, for whatever reason I'm not sure. It generates a bunch of HTML that gets turned into PDF reports by ActivePDF.

The process works like this:

1. Pull an HTML template from a DB with tokens in it to be replaced (e.g. CompanyName, CustomerName, etc.)
2. Replace the tokens with real data
3. Tidy the HTML with a simple regex function that properly formats HTML tag attribute values (ensures quotation marks, etc., since ActivePDF's rendering engine hates anything but single quotes around attribute values)
4. Send off the HTML to a web service that creates the PDF

Somewhere in that mess, the non-breaking spaces from the HTML template (the &nbsp;s) are encoding as ISO-8859-1 so that they show up incorrectly as an "Â" character when viewing the document in a browser (Firefox). ActivePDF pukes on these non-UTF8 characters.

My question: since I don't know where the problem stems from and don't have time to investigate it, is there an easy way to re-encode or find-and-replace the bad characters? I've tried sending it through this little function I threw together, but it doesn't change anything.

    ' Requires: Imports System.Text
    Private Shared Function ConvertToUTF8(ByVal html As String) As String
        Dim isoEncoding As Encoding = Encoding.GetEncoding("iso-8859-1")
        Dim source As Byte() = isoEncoding.GetBytes(html)
        Return Encoding.UTF8.GetString(Encoding.Convert(isoEncoding, Encoding.UTF8, source))
    End Function

Any ideas?

EDIT: I'm getting by with this for now, though it hardly seems like a good solution:

    ' Requires: Imports System.Text.RegularExpressions
    Private Shared Function ReplaceNonASCIIChars(ByVal html As String) As String
        Return Regex.Replace(html, "[^\u0000-\u007F]", "&nbsp;")
    End Function
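As a rough illustration of why the ConvertToUTF8 attempt changes nothing, here is a small Python sketch (Python is used only for illustration, and the function names are hypothetical). Turning a string into ISO-8859-1 bytes, transcoding those bytes to UTF-8, and decoding them as UTF-8 again is a round trip that hands back the original string. The sketch also shows a narrower find-and-replace that targets only U+00A0 instead of every non-ASCII character. It assumes the input contains only characters that exist in ISO-8859-1.

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def round_trip(html: str) -> str:
    """Mirror the ConvertToUTF8 attempt: ISO-8859-1 bytes -> UTF-8 bytes -> string."""
    iso_bytes = html.encode("iso-8859-1")
    # Transcoding step: decode the ISO-8859-1 bytes, then re-encode them as UTF-8.
    utf8_bytes = iso_bytes.decode("iso-8859-1").encode("utf-8")
    return utf8_bytes.decode("utf-8")


def nbsp_to_entity(html: str) -> str:
    """Replace only literal non-breaking spaces with the &nbsp; entity."""
    return html.replace(NBSP, "&nbsp;")


sample = f"Total:{NBSP}42 caf\u00e9"
print(round_trip(sample) == sample)  # the round trip is an identity
print(nbsp_to_entity(sample))        # the accented letter is left untouched
```

Unlike a blanket non-ASCII regex, the narrower replacement leaves legitimate characters such as accented letters alone.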

User · Answer

In my case I was getting a Latin cross sign instead of &nbsp;, even though the page was correctly encoded as UTF-8. Nothing above helped resolve the issue, and I tried it all.

In the end, changing the font for IE (with browser-specific CSS) helped: I was using Helvetica Neue as the body font, and changing it to Arial resolved the issue.

User · Answer

If anyone has the same problem as me and the charset is already correct, simply do this:

1. Copy all the code inside the .html file.
2. Open Notepad (or any basic text editor) and paste the code.
3. Go to File -> Save As.
4. Enter your file name, e.g. "example.html".
5. Under "Save as type", select "All Files".
6. Select UTF-8 as the Encoding.
7. Hit Save. You can now delete your old .html file, and the encoding should be fixed.
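The same re-save can be scripted. The following is a minimal Python sketch, assuming the old file was saved as Windows-1252 (`cp1252`); that source encoding is a guess and should be swapped for whatever the file really uses. The function name `resave_as_utf8` is hypothetical.

```python
from pathlib import Path
import tempfile


def resave_as_utf8(path: Path, source_encoding: str = "cp1252") -> None:
    """Rewrite the file at `path` as UTF-8, decoding it with `source_encoding` first."""
    text = path.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))


# Demo on a throwaway file so nothing real is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "example.html"
    page.write_bytes("<p>caf\u00e9</p>".encode("cp1252"))  # legacy single-byte file
    resave_as_utf8(page)
    resaved = page.read_bytes()

print(resaved)
```

Decoding with the wrong source encoding would bake the garbling into the file, so it is worth checking the result in a browser before deleting the original.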

User · Answer

I ran into this issue on a few of my websites too, and all I needed to do was customize the content filter for HTML entities. Before that, the more of them I deleted, the more I got. So just change your HTML filter or parsing function for the page; that worked for me. It is mainly due to the HTML editors in most CMSs: the way they store and parse the data caused this issue, at least in my case. Maybe this will help in your case too.

User · Answer

I was having the same sort of problem. Apparently it's simply because PHP doesn't recognise UTF-8. I was tearing my hair out at first when a sign kept showing up as garbled characters, despite appearing fine in Dreamweaver. Eventually I remembered I had been having problems with links relative to the index file, where pages with slideshows would work if viewed directly but not when used with an include (but that's beside the point). Anyway, I wondered if this might be a similar problem, so instead of putting the charset declaration into the page I was having problems with, I simply put it into the index.php file. Problem fixed throughout.

User · Answer

Problem: I faced this too. We were sending a special character along with some string content in a POST request to a CRM system, but when we made the GET call to the CRM, the string came back garbled. What we observed was that the character was being converted into a different sequence of characters.

Analysis: After some research, the glitch we found was that in the POST call we had set HttpWebRequest.ContentType to "text/xml", while in the GET call it was "text/xml; charset=utf-8".

Solution: We included charset=utf-8 in the POST request's content type as well, and it works.
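To make the fix concrete, here is a hedged Python sketch of a POST request whose body is encoded as UTF-8 and whose Content-Type says so. The URL and the `build_xml_post` helper are made up for illustration, and the request is only constructed, never sent; the answer itself used .NET's HttpWebRequest, not this API.

```python
import urllib.request


def build_xml_post(url: str, xml_text: str) -> urllib.request.Request:
    """Build a POST request whose declared charset matches the body encoding."""
    body = xml_text.encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# Hypothetical endpoint; nothing is sent over the network here.
request = build_xml_post("https://crm.example.invalid/api", "<note>caf\u00e9</note>")
print(request.get_header("Content-type"))
print(request.data)
```

The point is that the declared charset and the actual byte encoding agree, so the receiver has no reason to guess.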

User · Answer

> Somewhere in that mess, the non-breaking spaces from the HTML template (the &nbsp;s) are encoding as ISO-8859-1 so that they show up incorrectly as an "Â" character

That'd be encoding to UTF-8 then, not ISO-8859-1. The non-breaking space character is byte 0xA0 in ISO-8859-1; when encoded to UTF-8 it'd be 0xC2 0xA0, which, if you (incorrectly) view it as ISO-8859-1, comes out as "Â" followed by a non-breaking space. That includes a trailing nbsp which you might not be noticing; if that byte isn't there, then something else has mauled your document and we need to see further up to find out what.

What's the regexp, and how does the templating work? There would seem to be a proper HTML parser involved somewhere if your &nbsp; strings are (correctly) being turned into U+00A0 NON-BREAKING SPACE characters. If so, you could just process your template natively in the DOM, and ask it to serialise using the ASCII encoding to keep non-ASCII characters as character references. That would also stop you having to do regex post-processing on the HTML itself, which is always a highly dodgy business.

Well anyway, for now you can add one of the following to your document's <head> and see if that makes it look right in the browser:

for HTML4: <meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
for HTML5: <meta charset="utf-8">

If you've done that, then any remaining problem is ActivePDF's fault.
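The byte-level reasoning in this answer can be checked with a few lines of Python (used here purely as an illustration; the variable names are arbitrary). The sketch encodes U+00A0 both ways, misreads the UTF-8 bytes as ISO-8859-1, reverses that misread, and finally serialises to ASCII with character references, which is one way to approximate the "serialise using the ASCII encoding" suggestion without a DOM.

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

iso_bytes = NBSP.encode("iso-8859-1")  # single byte 0xA0
utf8_bytes = NBSP.encode("utf-8")      # two bytes 0xC2 0xA0

# Misreading the UTF-8 bytes as ISO-8859-1 yields U+00C2 followed by U+00A0.
misread = utf8_bytes.decode("iso-8859-1")

# If text really is UTF-8 that was read as ISO-8859-1, reversing the misread restores it.
repaired = misread.encode("iso-8859-1").decode("utf-8")

# Serialising to ASCII with character references sidesteps the encoding question.
ascii_html = f"Price:{NBSP}10".encode("ascii", "xmlcharrefreplace")

print(iso_bytes.hex(), utf8_bytes.hex())
print(misread == "\u00c2\u00a0", repaired == NBSP)
print(ascii_html)
```

The repair step only applies when the whole string was misdecoded in exactly this way; on text that is already correct it would raise an error or corrupt other characters.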

User · Answer

The reason for this is that PHP doesn't recognise UTF-8. Here you can check it for all special characters in HTML: http://www.degraeve.com/reference/specialcharacters.php

User · Answer

In my case this (an "a" with a caret) occurred in code I generated from Visual Studio using my own code-generation tool. It was easy to solve:

1. Select a single space in the document. You should be able to see lots of single spaces that look different from the other single spaces, because they are not selected.
2. Select one of these other single spaces. They are the ones responsible for the unwanted characters in the browser.
3. Go to Find and Replace and replace them with a single regular space.

Done.

PS: It's easier to see all similar characters when you place the cursor on one, or if you select it in VS2017. I hope other IDEs have similar features.
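The manual find-and-replace above can also be done programmatically. This is a small Python sketch, assuming the odd-looking spaces are U+00A0 non-breaking spaces (the answer does not say which character they were); `find_nbsp` and `replace_nbsp` are hypothetical helper names.

```python
NBSP = "\u00a0"  # assumed culprit: U+00A0 NO-BREAK SPACE


def find_nbsp(text: str) -> list[tuple[int, int]]:
    """Return 1-based (line, column) pairs for every non-breaking space."""
    return [
        (line_no, col_no)
        for line_no, line in enumerate(text.splitlines(), start=1)
        for col_no, ch in enumerate(line, start=1)
        if ch == NBSP
    ]


def replace_nbsp(text: str) -> str:
    """Swap every non-breaking space for a regular space."""
    return text.replace(NBSP, " ")


generated = f"Dim x = 1\nDim{NBSP}y = 2\n"
print(find_nbsp(generated))
print(replace_nbsp(generated) == "Dim x = 1\nDim y = 2\n")
```

Listing the positions first makes it easy to confirm that only unintended characters are being replaced.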
