[php] Test if string is URL encoded in PHP

How can I test if a string is URL encoded?

Which of the following approaches is better?

  • Search the string for characters which would be encoded, which aren't, and if any exist then its not encoded, or
  • Use something like this which I've made:

function is_urlEncoded($string){
 $test_string = $string;
 while(urldecode($test_string) != $test_string){
  $test_string = urldecode($test_string);
 }
 return (urlencode($test_string) == $string)?True:False; 
}

$t = "Hello World > how are you?";
if(is_urlEncoded($sreq)){
 print "Was Encoded.\n";
}else{
 print "Not Encoded.\n";
 print "Should be ".urlencode($sreq)."\n";
}

The above code works, but not in instances where the string has been doubly encoded, as in these examples:

  • $t = "Hello%2BWorld%2B%253E%2Bhow%2Bare%2Byou%253F";
  • $t = "Hello+World%2B%253E%2Bhow%2Bare%2Byou%253F";

This question is related to php testing url-encoding

The answer is


@user187291 code works and only fails when + is not encoded.

I know this is very old post. But this worked to me.

$is_encoded = preg_match('~%[0-9A-F]{2}~i', $string);
if($is_encoded) {
 $string  = urlencode(urldecode(str_replace(['+','='], ['%2B','%3D'], $string)));
} else {
  $string = urlencode($string);
}

i have one trick :

you can do this to prevent doubly encode. Every time first decode then again encode;

$string = urldecode($string);

Then do again

$string = urlencode($string);

Performing this way we can avoid double encode :)


What about:

if (urldecode(trim($url)) == trim($url)) { $url_form = 'decoded'; }
  else { $url_form = 'encoded'; }

Will not work with double encoding but this is out of scope anyway I suppose?


private static boolean isEncodedText(String val, String... encoding) throws UnsupportedEncodingException { String decodedText = URLDecoder.decode(val, TransformFetchConstants.DEFAULT_CHARSET);

    if(encoding != null && encoding.length > 0){
        decodedText = URLDecoder.decode(val, encoding[0]);
    }

    String encodedText =  URLEncoder.encode(decodedText);

    return encodedText.equalsIgnoreCase(val) || !decodedText.equalsIgnoreCase(val);

}

I found.
The url is For Exapmle: https://example.com/xD?foo=bar&uri=https%3A%2F%2Fexample.com%2FxD
You need Found $_GET['uri'] is encoded or not:

preg_match("/.*uri=(.*)&?.*/", $_SERVER['REQUEST_URI'], $r);
if (isset($_GET['uri']) && urldecode($r['1']) === $r['1']) {
  // Code Here if url is not encoded
}

send a variable that flags the decode when you already getting data from an url.

?path=folder/new%20file.txt&decode=1

well, the term "url encoded" is a bit vague, perhaps simple regex check will do the trick

$is_encoded = preg_match('~%[0-9A-F]{2}~i', $string);

I am using the following test to see if strings have been urlencoded:

if(urlencode($str) != str_replace(['%','+'], ['%25','%2B'], $str))

If a string has already been urlencoded, the only characters that will changed by double encoding are % (which starts all encoded character strings) and + (which replaces spaces.) Change them back and you should have the original string.

Let me know if this works for you.


In my case I wanted to check if a complete URL is encoded, so I already knew that the URL must contain the string https://, and what I did was to check if the string had the encoded version of https:// in it (https%3A%2F%2F) and if it didn't, then I knew it was not encoded:

//make sure $completeUrl is encoded
if (strpos($completeUrl, urlencode('https://')) === false) {
    // not encoded, need to encode it
    $completeUrl = urlencode($completeUrl);
}

in theory this solution can be used with any string other than complete URLs, as long as you know part of the string (https:// in this example) will always exists in what you are trying to check.


I think there's no foolproof way to do it. For example, consider the following:

$t = "A+B";

Is that an URL encoded "A B" or does it need to be encoded to "A%2BB"?


There's no reliable way to do this, as there are strings which stay the same through the encoding process, i.e. is "abc" encoded or not? There's no clear answer. Also, as you've encountered, some characters have multiple encodings... But...

Your decode-check-encode-check scheme fails due to the fact that some characters may be encoded in more than one way. However, a slight modification to your function should be fairly reliable, just check if the decode modifies the string, if it does, it was encoded.

It won't be fool proof of course, as "10+20=30" will return true (+ gets converted to space), but we're actually just doing arithmetic. I suppose this is what you're scheme is attempting to counter, I'm sorry to say that I don't think there's a perfect solution.

HTH.

Edit:
As I entioned in my own comment (just reiterating here for clarity), a good compromise would probably be to check for invalid characters in your url (e.g. space), and if there are some it's not encoded. If there are none, try to decode and see if the string changes. This still won't handle the arithmetic above (which is impossible), but it'll hopefully be sufficient.


Here is something i just put together.

if ( urlencode(urldecode($data)) === $data){
    echo 'string urlencoded';
} else {
    echo 'string is NOT urlencoded';
}

Examples related to php

I am receiving warning in Facebook Application using PHP SDK Pass PDO prepared statement to variables Parse error: syntax error, unexpected [ Preg_match backtrack error Removing "http://" from a string How do I hide the PHP explode delimiter from submitted form results? Problems with installation of Google App Engine SDK for php in OS X Laravel 4 with Sentry 2 add user to a group on Registration php & mysql query not echoing in html with tags? How do I show a message in the foreach loop?

Examples related to testing

Test process.env with Jest How to configure "Shorten command line" method for whole project in IntelliJ Jest spyOn function called Simulate a button click in Jest Mockito - NullpointerException when stubbing Method toBe(true) vs toBeTruthy() vs toBeTrue() How-to turn off all SSL checks for postman for a specific site What is the difference between smoke testing and sanity testing? ReferenceError: describe is not defined NodeJs How to properly assert that an exception gets raised in pytest?

Examples related to url-encoding

A html space is showing as %2520 instead of %20 Sharing a URL with a query string on Twitter What is %2C in a URL? How to do URL decoding in Java? How to encode URL to avoid special characters in Java? urlencoded Forward slash is breaking URL Test if string is URL encoded in PHP URL encoding the space character: + or %20? In a URL, should spaces be encoded using %20 or +? urlencode vs rawurlencode?