[html] Uses for the '"' entity in HTML

I am revising some XHTML files authored by another party. As part of this effort, I am doing some bulk editing via Linq to XML.

I've just noticed that some of the original source XHTML files contain the " HTML entity in text nodes within those files. For instance:

<p>Greeting: &quot;Hello, World!&quot;</p>

And that when recovering the XHTML text via XElement.ToString(), the &quot; entities are being replaced by plain double-quotes:

<p>Greeting: "Hello, World!"</p>

Question: Can anyone tell me what the motivation might have been for the original author to use the &quot; entities instead of plain double-quotes? Did those entities serve a purpose which I don't fully appreciate? Or, were they truly unnecessary as I suspect?

I do understand that &quot; would be necessary in certain contexts, such as when there is a need to place a double-quote within an HTML attribute. For instance:

<a href="/images/hello_world.jpg" alt="Greeting: &quot;Hello, World!&quot;">
  Greeting</a>

This question is related to html xhtml escaping linq-to-xml html-entities

The answer is


As other answers pointed out, it is most likely generated by some tool.

But if I were the original author of the file, my answer would be: Consistency.

If I am not allowed to put double quotes in my attributes, why put them in the element's content ? Why do these specs always have these exceptional cases .. If I had to write the HTML spec, I would say All double quotes need to be encoded. Done.

Today it is like In attribute values we need to encode double quotes, except when the attribute value itself is defined by single quotes. In the content of elements, double quotes can be, but are not required to be, encoded. (And I am surely forgetting some cases here).

Double quotes are a keyword of the spec, encode them. Lesser/greater than are a keyword of the spec, encode them. etc..


It is likely because they used a single function for escaping attributes and text nodes. &amp; doesn't do any harm so why complicate your code and make it more error-prone by having two escaping functions and having to pick between them?


In my experience it may be the result of auto-generation by a string-based tools, where the author did not understand the rules of HTML.

When some developers generate HTML without the use of special XML-oriented tools, they may try to be sure the resulting HTML is valid by taking the approach that everything must be escaped.

Referring to your example, the reason why every occurrence of " is represented by &quot; could be because using that approach, you can safely use such "special" characters in both attributes and values.

Another motivation I've seen is where people believe, "We must explicitly show that our symbols are not part of the syntax." Whereas, valid HTML can be created by using the proper string-manipulation tools, see the previous paragraph again.

Here is some pseudo-code loosely based on C#, although it is preferred to use valid methods and tools:

public class HtmlAndXmlWriter
{
    private string Escape(string badString)
    {
        return badString.Replace("&", "&amp;").Replace("\"", "&quot;").Replace("'", "&apos;").Replace(">", "&gt;").Replace("<", "&lt;");

    }

    public string GetHtmlFromOutObject(Object obj)
    {
        return "<div class='type_" + Escape(obj.Type) + "'>" + Escape(obj.Value) + "</div>";    

    }

}

It's really very common to see such approaches taken to generate HTML.


Reason #1

There was a point where buggy/lazy implementations of HTML/XHTML renderers were more common than those that got it right. Many years ago, I regularly encountered rendering problems in mainstream browsers resulting from the use of unencoded quote chars in regular text content of HTML/XHTML documents. Though the HTML spec has never disallowed use of these chars in text content, it became fairly standard practice to encode them anyway, so that non-spec-compliant browsers and other processors would handle them more gracefully. As a result, many "old-timers" may still do this reflexively. It is not incorrect, though it is now probably unnecessary, unless you're targeting some very archaic platforms.

Reason #2

When HTML content is generated dynamically, for example, by populating an HTML template with simple string values from a database, it's necessary to encode each value before embedding it in the generated content. Some common server-side languages provided a single function for this purpose, which simply encoded all chars that might be invalid in some context within an HTML document. Notably, PHP's htmlspecialchars() function is one such example. Though there are optional arguments to htmlspecialchars() that will cause it to ignore quotes, those arguments were (and are) rarely used by authors of basic template-driven systems. The result is that all "special chars" are encoded everywhere they occur in the generated HTML, without regard for the context in which they occur. Again, this is not incorrect, it's simply unnecessary.


Examples related to html

Embed ruby within URL : Middleman Blog Please help me convert this script to a simple image slider Generating a list of pages (not posts) without the index file Why there is this "clear" class before footer? Is it possible to change the content HTML5 alert messages? Getting all files in directory with ajax DevTools failed to load SourceMap: Could not load content for chrome-extension How to set width of mat-table column in angular? How to open a link in new tab using angular? ERROR Error: Uncaught (in promise), Cannot match any routes. URL Segment

Examples related to xhtml

How to refresh table contents in div using jquery/ajax Uses for the '&quot;' entity in HTML The entity name must immediately follow the '&' in the entity reference Can I save input from form to .txt in HTML, using JAVASCRIPT/jQuery, and then use it? How to vertically align a html radio button to it's label? Image height and width not working? Removing border from table cells Enable & Disable a Div and its elements in Javascript how to remove the bold from a headline? How to make div occupy remaining height?

Examples related to escaping

Uses for the '&quot;' entity in HTML Javascript - How to show escape characters in a string? How to print a single backslash? How to escape special characters of a string with single backslashes Saving utf-8 texts with json.dumps as UTF8, not as \u escape sequence Properly escape a double quote in CSV How to Git stash pop specific stash in 1.8.3? In Java, should I escape a single quotation mark (') in String (double quoted)? How do I escape a single quote ( ' ) in JavaScript? Which characters need to be escaped when using Bash?

Examples related to linq-to-xml

Uses for the '&quot;' entity in HTML Finding element in XDocument? how to use XPath with XDocument? How to get a json string from url? How to put attributes via XElement XDocument or XmlDocument Converting XDocument to XmlDocument and vice versa How to Get XML Node from XDocument Populate XDocument from String LINQ to read XML

Examples related to html-entities

How to create string with multiple spaces in JavaScript Uses for the '&quot;' entity in HTML How to Code Double Quotes via HTML Codes Is there Unicode glyph Symbol to represent "Search" What's the right way to decode a string that has special HTML entities in it? Which characters need to be escaped in HTML? HTML entity for the middle dot HTML character codes for this ? or this ? What do &lt; and &gt; stand for? Transmitting newline character "\n"