[php] XSS filtering function in PHP

Does anyone know of a good function out there for filtering generic input from forms? Zend_Filter_input seems to require prior knowledge of the contents of the input and I'm concerned that using something like HTML Purifier will have a big performance impact.

What about something like : http://snipplr.com/view/1848/php--sacar-xss/

Many thanks for any input.

This question is related to php filter xss

The answer is


I'm was collect most of issues by the web and combine stepping filter for all of them.

After some testing seems it works perfect:


/*
* Total XSS preventer class by Full-R
*
*/

final class xCleaner {

    public static function clean( string $html ): string {

        return self::cleanXSS(

            preg_replace(

                [

                    '/\s?<iframe[^>]*?>.*?<\/iframe>\s?/si',
                    '/\s?<style[^>]*?>.*?<\/style>\s?/si',
                    '/\s?<script[^>]*?>.*?<\/script>\s?/si',
                    '#\son\w*="[^"]+"#',

                ],

                [
                    '',
                    '',
                    ''
                ],

                $html

            )

        );

    }

    protected static function hexToSymbols( string $s ): string {

        return html_entity_decode($s, ENT_XML1, 'UTF-8');

    }

    protected static function escape( string $s, string $m = 'attr' ): string {

        preg_match_all('/data:\w+\/([a-zA-Z]*);base64,(?!_#_#_)([^)\'"]*)/mi', $s, $b64, PREG_OFFSET_CAPTURE);

        if( count( array_filter( $b64 ) ) > 0 ) {

            switch( $m ) {

                case 'attr':

                    $xclean = self::cleanXSS(

                                        urldecode(

                                            base64_decode(

                                                $b64[ 2 ][ 0 ][ 0 ]

                                            )

                                        )

                                );

                    break;

                case 'tag':

                    $xclean = self::cleanTagInnerXSS(

                                        urldecode(

                                            base64_decode(

                                                $b64[ 2 ][ 0 ][ 0 ]

                                            )

                                        )

                                );

                    break;

            }

            return substr_replace(

                $s,

                '_#_#_'. base64_encode( $xclean ),

                $b64[ 2 ][ 0 ][ 1 ],

                strlen( $b64[ 2 ][ 0 ][ 0 ] )

            );

        }
        else {

            return $s;

        }

    }

    protected static function cleanXSS( string $s ): string {

        // base64 injection prevention
        $st = self::escape( $s, 'attr' );

        return preg_replace([

                // JSON unicode
                '/\\\\u?{?([a-f0-9]{4,}?)}?/mi',                                                                    // [1] unicode JSON clean

                // Data b64 safe
                '/\*\w*\*/mi',                                                                                            // [2] unicode simple clean

                // Malware payloads
                '/:?e[\s]*x[\s]*p[\s]*r[\s]*e[\s]*s[\s]*s[\s]*i[\s]*o[\s]*n[\s]*(:|;|,)?\w*/mi',    // [3]  (:expression) evalution
                '/l[\s]*i[\s]*v[\s]*e[\s]*s[\s]*c[\s]*r[\s]*i[\s]*p[\s]*t[\s]*(:|;|,)?\w*/mi',         // [4]  (livescript:) evalution
                '/j[\s]*s[\s]*c[\s]*r[\s]*i[\s]*p[\s]*t[\s]*(:|;|,)?\w*/mi',                                 // [5]  (jscript:) evalution
                '/j[\s]*a[\s]*v[\s]*a[\s]*s[\s]*c[\s]*r[\s]*i[\s]*p[\s]*t[\s]*(:|;|,)?\w*/mi',       // [6]  (javascript:) evalution
                '/b[\s]*e[\s]*h[\s]*a[\s]*v[\s]*i[\s]*o[\s]*r[\s]*(:|;|,)?\w*/mi',                     // [7]  (behavior:) evalution
                '/v[\s]*b[\s]*s[\s]*c[\s]*r[\s]*i[\s]*p[\s]*t[\s]*(:|;|,)?\w*/mi',                      // [8]  (vsbscript:) evalution
                '/v[\s]*b[\s]*s[\s]*(:|;|,)?\w*/mi',                                                              // [9]  (vbs:) evalution
                '/e[\s]*c[\s]*m[\s]*a[\s]*s[\s]*c[\s]*r[\s]*i[\s]*p[\s]*t*(:|;|,)?\w*/mi',        // [10] (ecmascript:) possible ES evalution
                '/b[\s]*i[\s]*n[\s]*d[\s]*i[\s]*n[\s]*g*(:|;|,)?\w*/mi',                                 // [11] (-binding) payload
                '/\+\/v(8|9|\+|\/)?/mi',                                                                          // [12] (UTF-7 mutation)

                // Some entities
                '/&{\w*}\w*/mi',                                                                                   // [13] html entites clenup
                '/&#\d+;?/m',                                                                                      // [14] html entites clenup

                // Script tag encoding mutation issue
                '/\¼\/?\w*\¾\w*/mi',                                                                         // [21] mutation KOI-8
                '/\+ADw-\/?\w*\+AD4-\w*/mi',                                                         // [22] mutation old encodings

                '/\/*?%00*?\//m',

                // base64 escaped
                '/_#_#_/mi',                                                                                       // [23] base64 escaped marker cleanup
             
            ],

            // Replacements steps :: 23
            ['&#x$1;', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ''],

            str_ireplace(

                ['\u0', '&colon;', '&tab;', '&newline;'],
                ['\0', ':', '', ''],

            // U-HEX prepare step
            self::hexToSymbols( $st ))

        );

    }

}

Also you can add Tidy markup correction to make HTML valid.


htmlspecialchars() is perfectly adequate for filtering user input that is displayed in html forms.


I have a similar problem. I need users to submit html content to a profile page with a great WYSIWYG editor (Redactorjs!), i wrote the following function to clean the submitted html:

    <?php function filterxss($str) {
//Initialize DOM:
$dom = new DOMDocument();
//Load content and add UTF8 hint:
$dom->loadHTML('<meta http-equiv="content-type" content="text/html; charset=utf-8">'.$str);
//Array holds allowed attributes and validation rules:
$check = array('src'=>'#(http://[^\s]+(?=\.(jpe?g|png|gif)))#i','href'=>'|^http(s)?://[a-z0-9-]+(.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i');
//Loop all elements:
foreach($dom->getElementsByTagName('*') as $node){
    for($i = $node->attributes->length -1; $i >= 0; $i--){
        //Get the attribute:
        $attribute = $node->attributes->item($i);
        //Check if attribute is allowed:
        if( in_array($attribute->name,array_keys($check))) {
            //Validate by regex:    
            if(!preg_match($check[$attribute->name],$attribute->value)) { 
                //No match? Remove the attribute
                $node->removeAttributeNode($attribute); 
            }
        }else{
            //Not allowed? Remove the attribute:
            $node->removeAttributeNode($attribute);
        }
    }
}
var_dump($dom->saveHTML()); } ?>

The $check array holds all the allowed attributes and validation rules. Maybe this is useful for some of you. I haven't tested is yet, so tips are welcome


There are a number of ways hackers put to use for XSS attacks, PHP's built-in functions do not respond to all sorts of XSS attacks. Hence, functions such as strip_tags, filter_var, mysql_real_escape_string, htmlentities, htmlspecialchars, etc do not protect us 100%. You need a better mechanism, here is what is solution:

function xss_clean($data)
{
// Fix &entity\n;
$data = str_replace(array('&amp;','&lt;','&gt;'), array('&amp;amp;','&amp;lt;','&amp;gt;'), $data);
$data = preg_replace('/(&#*\w+)[\x00-\x20]+;/u', '$1;', $data);
$data = preg_replace('/(&#x*[0-9A-F]+);*/iu', '$1;', $data);
$data = html_entity_decode($data, ENT_COMPAT, 'UTF-8');

// Remove any attribute starting with "on" or xmlns
$data = preg_replace('#(<[^>]+?[\x00-\x20"\'])(?:on|xmlns)[^>]*+>#iu', '$1>', $data);

// Remove javascript: and vbscript: protocols
$data = preg_replace('#([a-z]*)[\x00-\x20]*=[\x00-\x20]*([`\'"]*)[\x00-\x20]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2nojavascript...', $data);
$data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*v[\x00-\x20]*b[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2novbscript...', $data);
$data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*-moz-binding[\x00-\x20]*:#u', '$1=$2nomozbinding...', $data);

// Only works in IE: <span style="width: expression(alert('Ping!'));"></span>
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?expression[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?behaviour[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:*[^>]*+>#iu', '$1>', $data);

// Remove namespaced elements (we do not need them)
$data = preg_replace('#</*\w+:\w[^>]*+>#i', '', $data);

do
{
    // Remove really unwanted tags
    $old_data = $data;
    $data = preg_replace('#</*(?:applet|b(?:ase|gsound|link)|embed|frame(?:set)?|i(?:frame|layer)|l(?:ayer|ink)|meta|object|s(?:cript|tyle)|title|xml)[^>]*+>#i', '', $data);
}
while ($old_data !== $data);

// we are done...
return $data;
}

Try using for Clean XSS

xss_clean($data): "><script>alert(String.fromCharCode(74,111,104,116,111,32,82,111,98,98,105,101))</script>

All above methods don't allow to preserve some tags like <a>, <table> etc. There is an ultimate solution http://sourceforge.net/projects/kses/ Drupal uses it


the best and the secure way is to use HTML Purifier. Follow this link for some hints on using it with Zend Framework.

HTML Purifier with Zend Framework


According to www.mcafeesecure.com General Solution for vulnerable to cross-site scripting (XSS) filter function can be:

function xss_cleaner($input_str) {
    $return_str = str_replace( array('<','>',"'",'"',')','('), array('&lt;','&gt;','&apos;','&#x22;','&#x29;','&#x28;'), $input_str );
    $return_str = str_ireplace( '%3Cscript', '', $return_str );
    return $return_str;
}

function clean($data){
    $data = rawurldecode($data);
    return filter_var($data, FILTER_SANITIZE_SPEC_CHARS);
}

I found a solution for my problem with the posts with german umlaut. To provide from totally cleaning (killing) the posts, i encode the incoming data:

    *$data = utf8_encode($data);
    ... function ...*

And at last i decode the output to get correct signs:

    *$data = utf8_decode($data);*

Now the post go through the filter function and i get a correct result...


Examples related to php

I am receiving warning in Facebook Application using PHP SDK Pass PDO prepared statement to variables Parse error: syntax error, unexpected [ Preg_match backtrack error Removing "http://" from a string How do I hide the PHP explode delimiter from submitted form results? Problems with installation of Google App Engine SDK for php in OS X Laravel 4 with Sentry 2 add user to a group on Registration php & mysql query not echoing in html with tags? How do I show a message in the foreach loop?

Examples related to filter

Monitoring the Full Disclosure mailinglist Pyspark: Filter dataframe based on multiple conditions How Spring Security Filter Chain works Copy filtered data to another sheet using VBA Filter object properties by key in ES6 How do I filter date range in DataTables? How do I filter an array with TypeScript in Angular 2? Filtering array of objects with lodash based on property value How to filter an array from all elements of another array How to specify "does not contain" in dplyr filter

Examples related to xss

WARNING: sanitizing unsafe style value url What is the http-header "X-XSS-Protection"? How to pass parameters to a Script tag? How do you use window.postMessage across domains? Sanitizing user input before adding it to the DOM in Javascript XSS prevention in JSP/Servlet web application How do I prevent people from doing XSS in Spring MVC? How to prevent XSS with HTML/PHP? XSS filtering function in PHP Java Best Practices to Prevent Cross Site Scripting