[regex] What is the best regular expression to check if a string is a valid URL?

With regard to eyelidness' answer post that reads "This is based on my reading of the URI specification.": Thanks Eyelidness, yours is the perfect solution I sought, as it is based on the URI spec! Superb work. :)

I had to make two amendments. The first to get the regexp to match IP address URLs correctly in PHP (v5.2.10) with the preg_match() function.

I had to add one more set of parenthesis to the line above "IP Address" around the pipes:

)|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}(?#

Not sure why.

I have also reduced the top level domain minimum length from 3 to 2 letters to support .co.uk and similar.

Final code:

/^(https?|ftp):\/\/(?#                                      protocol
)(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+(?#         username
)(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?(?#      password
)@)?(?#                                                     auth requires @
)((([a-z0-9]\.|[a-z0-9][a-z0-9-]*[a-z0-9]\.)*(?#             domain segments AND
)[a-z][a-z0-9-]*[a-z0-9](?#                                 top level domain  OR
)|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}(?#
    )(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])(?#             IP address
))(:\d+)?(?#                                                port
))(((\/+([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)*(?# path
)(\?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)(?#      query string
)?)?)?(?#                                                   path and query string optional
)(#([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)?(?#      fragment
)$/i

This modified version was not checked against the URI specification so I can't vouch for it's compliance, it was altered to handle URLs on local network environments and two digit TLDs as well as other kinds of Web URL, and to work better in the PHP setup I use.

As PHP code:

define('URL_FORMAT', 
'/^(https?):\/\/'.                                         // protocol
'(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+'.         // username
'(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?'.      // password
'@)?(?#'.                                                  // auth requires @
')((([a-z0-9]\.|[a-z0-9][a-z0-9-]*[a-z0-9]\.)*'.                      // domain segments AND
'[a-z][a-z0-9-]*[a-z0-9]'.                                 // top level domain  OR
'|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}'.
'(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])'.                 // IP address
')(:\d+)?'.                                                // port
')(((\/+([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)*'. // path
'(\?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)'.      // query string
'?)?)?'.                                                   // path and query string optional
'(#([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)?'.      // fragment
'$/i');

Here is a test program in PHP which validates a variety of URLs using the regex:

<?php

define('URL_FORMAT',
'/^(https?):\/\/'.                                         // protocol
'(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+'.         // username
'(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?'.      // password
'@)?(?#'.                                                  // auth requires @
')((([a-z0-9]\.|[a-z0-9][a-z0-9-]*[a-z0-9]\.)*'.                      // domain segments AND
'[a-z][a-z0-9-]*[a-z0-9]'.                                 // top level domain  OR
'|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}'.
'(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])'.                 // IP address
')(:\d+)?'.                                                // port
')(((\/+([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)*'. // path
'(\?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)'.      // query string
'?)?)?'.                                                   // path and query string optional
'(#([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)?'.      // fragment
'$/i');

/**
 * Verify the syntax of the given URL. 
 * 
 * @access public
 * @param $url The URL to verify.
 * @return boolean
 */
function is_valid_url($url) {
  if (str_starts_with(strtolower($url), 'http://localhost')) {
    return true;
  }
  return preg_match(URL_FORMAT, $url);
}


/**
 * String starts with something
 * 
 * This function will return true only if input string starts with
 * niddle
 * 
 * @param string $string Input string
 * @param string $niddle Needle string
 * @return boolean
 */
function str_starts_with($string, $niddle) {
      return substr($string, 0, strlen($niddle)) == $niddle;
}


/**
 * Test a URL for validity and count results.
 * @param url url
 * @param expected expected result (true or false)
 */

$numtests = 0;
$passed = 0;

function test_url($url, $expected) {
  global $numtests, $passed;
  $numtests++;
  $valid = is_valid_url($url);
  echo "URL Valid?: " . ($valid?"yes":"no") . " for URL: $url. Expected: ".($expected?"yes":"no").". ";
  if($valid == $expected) {
    echo "PASS\n"; $passed++;
  } else {
    echo "FAIL\n";
  }
}

echo "URL Tests:\n\n";

test_url("http://localserver/projects/public/assets/javascript/widgets/UserBoxMenu/widget.css", true);
test_url("http://www.google.com", true);
test_url("http://www.google.co.uk/projects/my%20folder/test.php", true);
test_url("https://myserver.localdomain", true);
test_url("http://192.168.1.120/projects/index.php", true);
test_url("http://192.168.1.1/projects/index.php", true);
test_url("http://projectpier-server.localdomain/projects/public/assets/javascript/widgets/UserBoxMenu/widget.css", true);
test_url("https://2.4.168.19/project-pier?c=test&a=b", true);
test_url("https://localhost/a/b/c/test.php?c=controller&arg1=20&arg2=20", true);
test_url("http://user:password@localhost/a/b/c/test.php?c=controller&arg1=20&arg2=20", true);

echo "\n$passed out of $numtests tests passed.\n\n";

?>

Thanks again to eyelidness for the regex!

Examples related to regex

Why my regexp for hyphenated words doesn't work? grep's at sign caught as whitespace Preg_match backtrack error regex match any single character (one character only) re.sub erroring with "Expected string or bytes-like object" Only numbers. Input number in React Visual Studio Code Search and Replace with Regular Expressions Strip / trim all strings of a dataframe return string with first match Regex How to capture multiple repeated groups?

Examples related to url

What is the difference between URL parameters and query strings? Allow Access-Control-Allow-Origin header using HTML5 fetch API File URL "Not allowed to load local resource" in the Internet Browser Slack URL to open a channel from browser Getting absolute URLs using ASP.NET Core How do I load an HTTP URL with App Transport Security enabled in iOS 9? Adding form action in html in laravel React-router urls don't work when refreshing or writing manually URL for public Amazon S3 bucket How can I append a query parameter to an existing URL?

Examples related to language-agnostic

IOException: The process cannot access the file 'file path' because it is being used by another process Peak signal detection in realtime timeseries data Match linebreaks - \n or \r\n? Simple way to understand Encapsulation and Abstraction How can I pair socks from a pile efficiently? How do I determine whether my calculation of pi is accurate? What is ADT? (Abstract Data Type) How to explain callbacks in plain english? How are they different from calling one function from another function? Ukkonen's suffix tree algorithm in plain English Private vs Protected - Visibility Good-Practice Concern