[php] Best way to check if a URL is valid

I want to use PHP to check, if string stored in $myoutput variable contains a valid link syntax or is it just a normal text. The function or solution, that I'm looking for, should recognize all links formats including the ones with GET parameters.

A solution, suggested on many sites, to actually query string (using CURL or file_get_contents() function) is not possible in my case and I would like to avoid it.

I thought about regular expressions or another solution.

This question is related to php

The answer is


Personally I would like to use regular expression here. Bellow code perfectly worked for me.

$baseUrl     = url('/'); // for my case https://www.xrepeater.com
$posted_url  = "home";
// Test with one by one
/*$posted_url  = "/home";
$posted_url  = "xrepeater.com";
$posted_url  = "www.xrepeater.com";
$posted_url  = "http://www.xrepeater.com";
$posted_url  = "https://www.xrepeater.com";
$posted_url  = "https://xrepeater.com/services";
$posted_url  = "xrepeater.dev/home/test";
$posted_url  = "home/test";*/

$regularExpression  = "((https?|ftp)\:\/\/)?"; // SCHEME Check
$regularExpression .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?"; // User and Pass Check
$regularExpression .= "([a-z0-9-.]*)\.([a-z]{2,3})"; // Host or IP Check
$regularExpression .= "(\:[0-9]{2,5})?"; // Port Check
$regularExpression .= "(\/([a-z0-9+\$_-]\.?)+)*\/?"; // Path Check
$regularExpression .= "(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?"; // GET Query String Check
$regularExpression .= "(#[a-z_.-][a-z0-9+\$_.-]*)?"; // Anchor Check

if(preg_match("/^$regularExpression$/i", $posted_url)) { 
    if(preg_match("@^http|https://@i",$posted_url)) {
        $final_url = preg_replace("@(http://)+@i",'http://',$posted_url);
        // return "*** - ***Match : ".$final_url;
    }
    else { 
          $final_url = 'http://'.$posted_url;
          // return "*** / ***Match : ".$final_url;
         }
    }
else {
     if (substr($posted_url, 0, 1) === '/') { 
         // return "*** / ***Not Match :".$final_url."<br>".$baseUrl.$posted_url;
         $final_url = $baseUrl.$posted_url;
     }
     else { 
         // return "*** - ***Not Match :".$posted_url."<br>".$baseUrl."/".$posted_url;
         $final_url = $baseUrl."/".$final_url; }
}

Actually... filter_var($url, FILTER_VALIDATE_URL); doesn't work very well. When you type in a real url, it works but, it only checks for http:// so if you type something like "http://weirtgcyaurbatc", it will still say it's real.


Given issues with filter_var() needing http://, I use:

$is_url = filter_var($filename, FILTER_VALIDATE_URL) || array_key_exists('scheme', parse_url($filename));


Another way to check if given URL is valid is to try to access it, below function will fetch the headers from given URL, this will ensure that URL is valid AND web server is alive:

function is_url($url){
        $response = array();
        //Check if URL is empty
        if(!empty($url)) {
            $response = get_headers($url);
        }
        return (bool)in_array("HTTP/1.1 200 OK", $response, true);
/*Array
(
    [0] => HTTP/1.1 200 OK 
    [Date] => Sat, 29 May 2004 12:28:14 GMT
    [Server] => Apache/1.3.27 (Unix)  (Red-Hat/Linux)
    [Last-Modified] => Wed, 08 Jan 2003 23:11:55 GMT
    [ETag] => "3f80f-1b6-3e1cb03b"
    [Accept-Ranges] => bytes
    [Content-Length] => 438
    [Connection] => close
    [Content-Type] => text/html
)*/ 
    }   

public function testing($Url=''){
    $ch = curl_init($Url);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $data = curl_exec($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if($httpcode >= 200 && $httpcode <= 301){
        $this->output->set_header('Access-Control-Allow-Origin: *');
        $this->output->set_content_type('application/json', 'utf-8');
        $this->output->set_status_header(200);
        $this->output->set_output(json_encode('VALID URL', JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES));
        return;
    }else{
        $this->output->set_header('Access-Control-Allow-Origin: *');
        $this->output->set_content_type('application/json', 'utf-8');
        $this->output->set_status_header(200);
        $this->output->set_output(json_encode('INVALID URL', JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES));
        return;
    }
}

Using filter_var() will fail for urls with non-ascii chars, e.g. (http://pt.wikipedia.org/wiki/Guimarães). The following function encode all non-ascii chars (e.g. http://pt.wikipedia.org/wiki/Guimar%C3%A3es) before calling filter_var().

Hope this helps someone.

<?php

function validate_url($url) {
    $path = parse_url($url, PHP_URL_PATH);
    $encoded_path = array_map('urlencode', explode('/', $path));
    $url = str_replace($path, implode('/', $encoded_path), $url);

    return filter_var($url, FILTER_VALIDATE_URL) ? true : false;
}

// example
if(!validate_url("http://somedomain.com/some/path/file1.jpg")) {
    echo "NOT A URL";
}
else {
    echo "IS A URL";
}

if anyone is interested to use the cURL for validation. You can use the following code.

<?php 
public function validationUrl($Url){
        if ($Url == NULL){
            return $false;
        }
        $ch = curl_init($Url);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $data = curl_exec($ch);
        $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);
        return ($httpcode >= 200 && $httpcode < 300) ? true : false; 
    }

You can use this function, but its will return false if website offline.

  function isValidUrl($url) {
    $url = parse_url($url);
    if (!isset($url["host"])) return false;
    return !(gethostbyname($url["host"]) == $url["host"]);
}

Came across this article from 2012. It takes into account variables that may or may not be just plain URLs.

The author of the article, David Müeller, provides this function that he says, "...could be worth wile [sic]," along with some examples of filter_var and its shortcomings.

/**
 * Modified version of `filter_var`.
 *
 * @param  mixed $url Could be a URL or possibly much more.
 * @return bool
 */
function validate_url( $url ) {
    $url = trim( $url );

    return (
        ( strpos( $url, 'http://' ) === 0 || strpos( $url, 'https://' ) === 0 ) &&
        filter_var(
            $url,
            FILTER_VALIDATE_URL,
            FILTER_FLAG_SCHEME_REQUIRED || FILTER_FLAG_HOST_REQUIRED
        ) !== false
    );
}

Here is the best tutorial I found over there:

http://www.w3schools.com/php/filter_validate_url.asp

<?php
$url = "http://www.qbaki.com";

// Remove all illegal characters from a url
$url = filter_var($url, FILTER_SANITIZE_URL);

// Validate url
if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
echo("$url is a valid URL");
} else {
echo("$url is not a valid URL");
}
?>

Possible flags:

FILTER_FLAG_SCHEME_REQUIRED - URL must be RFC compliant (like http://example)
FILTER_FLAG_HOST_REQUIRED - URL must include host name (like http://www.example.com)
FILTER_FLAG_PATH_REQUIRED - URL must have a path after the domain name (like www.example.com/example1/)
FILTER_FLAG_QUERY_REQUIRED - URL must have a query string (like "example.php?name=Peter&age=37")

function is_url($uri){
    if(preg_match( '/^(http|https):\\/\\/[a-z0-9_]+([\\-\\.]{1}[a-z_0-9]+)*\\.[_a-z]{2,5}'.'((:[0-9]{1,5})?\\/.*)?$/i' ,$uri)){
      return $uri;
    }
    else{
        return false;
    }
}