[php] How do you make strings "XML safe"?

I am responding to an AJAX call by sending it an XML document through PHP echos. In order to form this XML document, I loop through the records of a database. The problem is that the database includes records that have '<' symbols in them. So naturally, the browser throws an error at that particular spot. How can this be fixed?

This question is related to php xml cakephp

The answer is


Since PHP 5.4 you can use:

htmlspecialchars($string, ENT_XML1);

You should specify the encoding, such as:

htmlspecialchars($string, ENT_XML1, 'UTF-8');

Update

Note that the above will only convert:

  • & to &amp;
  • < to &lt;
  • > to &gt;

If you want to escape text for use in an attribute enclosed in double quotes:

htmlspecialchars($string, ENT_XML1 | ENT_COMPAT, 'UTF-8');

will convert " to &quot; in addition to &, < and >.


And if your attributes are enclosed in single quotes:

htmlspecialchars($string, ENT_XML1 | ENT_QUOTES, 'UTF-8');

will convert ' to &apos; in addition to &, <, > and ".

(Of course you can use this even outside of attributes).


See the manual entry for htmlspecialchars.


I prefer the way Golang does quote escaping for XML (and a few extras like newline escaping, and escaping some other characters), so I have ported its XML escape function to PHP below

function isInCharacterRange(int $r): bool {
    return $r == 0x09 ||
            $r == 0x0A ||
            $r == 0x0D ||
            $r >= 0x20 && $r <= 0xDF77 ||
            $r >= 0xE000 && $r <= 0xFFFD ||
            $r >= 0x10000 && $r <= 0x10FFFF;
}

function xml(string $s, bool $escapeNewline = true): string {
    $w = '';

    $Last = 0;
    $l = strlen($s);
    $i = 0;

    while ($i < $l) {
        $r = mb_substr(substr($s, $i), 0, 1);
        $Width = strlen($r);
        $i += $Width;
        switch ($r) {
            case '"':
                $esc = '&#34;';
                break;
            case "'":
                $esc = '&#39;';
                break;
            case '&':
                $esc = '&amp;';
                break;
            case '<':
                $esc = '&lt;';
                break;
            case '>':
                $esc = '&gt;';
                break;
            case "\t":
                $esc = '&#x9;';
                break;
            case "\n":
                if (!$escapeNewline) {
                    continue 2;
                }
                $esc = '&#xA;';
                break;
            case "\r":
                $esc = '&#xD;';
                break;
            default:
                if (!isInCharacterRange(mb_ord($r)) || (mb_ord($r) === 0xFFFD && $Width === 1)) {
                    $esc = "\u{FFFD}";
                    break;
                }

                continue 2;
        }
        $w .= substr($s, $Last, $i - $Last - $Width) . $esc;
        $Last = $i;
    }
    $w .= substr($s, $Last);
    return $w;
}

Note you'll need at least PHP7.2 because of the mb_ord usage, or you'll have to swap it out for another polyfill, but these functions are working great for us!

For anyone curious, here is the relevant Go source https://golang.org/src/encoding/xml/xml.go?s=44219:44263#L1887


1) You can wrap your text as CDATA like this:

<mytag>
    <![CDATA[Your text goes here. Btw: 5<6 and 6>5]]>
</mytag>

see http://www.w3schools.com/xml/xml_cdata.asp

2) As already someone said: Escape those chars. E.g. like so:

5&lt;6 and 6&gt;5

Try this:

$str = htmlentities($str,ENT_QUOTES,'UTF-8');

So, after filtering your data using htmlentities() function, you can use the data in XML tag like:

<mytag>$str</mytag>

If at all possible, its always a good idea to create your XML using the XML classes rather than string manipulation - one of the benefits being that the classes will automatically escape characters as needed.


Adding this in case it helps someone.

As I am working with Japanese characters, encoding has also been set appropriately. However, from time to time, I find that htmlentities and htmlspecialchars are not sufficient.

Some user inputs contain special characters that are not stripped by the above functions. In those cases I have to do this:

preg_replace('/[\x00-\x1f]/','',htmlspecialchars($string))

This will also remove certain xml-unsafe control characters like Null character or EOT. You can use this table to determine which characters you wish to omit.


Examples related to php

I am receiving warning in Facebook Application using PHP SDK Pass PDO prepared statement to variables Parse error: syntax error, unexpected [ Preg_match backtrack error Removing "http://" from a string How do I hide the PHP explode delimiter from submitted form results? Problems with installation of Google App Engine SDK for php in OS X Laravel 4 with Sentry 2 add user to a group on Registration php & mysql query not echoing in html with tags? How do I show a message in the foreach loop?

Examples related to xml

strange error in my Animation Drawable How do I POST XML data to a webservice with Postman? PHP XML Extension: Not installed How to add a Hint in spinner in XML Generating Request/Response XML from a WSDL Manifest Merger failed with multiple errors in Android Studio How to set menu to Toolbar in Android How to add colored border on cardview? Android: ScrollView vs NestedScrollView WARNING: Exception encountered during context initialization - cancelling refresh attempt

Examples related to cakephp

SQLSTATE[HY000] [1045] Access denied for user 'username'@'localhost' using CakePHP CakePHP 3.0 installation: intl extension missing from system How to check not in array element 404 Not Found The requested URL was not found on this server What is /var/www/html? Request exceeded the limit of 10 internal redirects due to probable configuration error how we add or remove readonly attribute from textbox on clicking radion button in cakephp using jquery? PHP Fatal error: Class 'PDO' not found Undefined variable: $_SESSION CentOS: Enabling GD Support in PHP Installation