[regex] What is the best regular expression to check if a string is a valid URL?

I created a similar regex (PCRE) to the one @eyelidlessness provided following RFC3987 along with other RFC documents. The major difference between @eyelidlessness and my regex are mainly readability and also URN support.

The regex below is all one piece (instead of being mixed with PHP) so it can be used in different languages very easily (so long as they support PCRE)

The easiest way to test this regex is to use regex101 and copy paste the code and test strings below with the appropriate modifiers (gmx).

To use this regex in PHP, insert the regex below into the following code:

$regex = <<<'EOD'
// Put the regex here
EOD;


You can match a link without a scheme by doing the following:
To match a link without a scheme (i.e. [email protected] or www.google.com/pathtofile.php?query), replace this section:

  (?:
    (?<scheme>
      (?<urn>urn)|
      (?&d_scheme)
    )
    :
  )?

with this:

  (?:
    (?<scheme>
      (?<urn>urn)|
      (?&d_scheme)
    )
    :
  )?

Note, however, that by replacing this, the regex does not become 100% reliable.


Regex (PCRE) with gmx modifiers for the multi-line test string below

(?(DEFINE)
  # Definitions
  (?<ALPHA>[\p{L}])
  (?<DIGIT>[0-9])
  (?<HEX>[0-9a-fA-F])
  (?<NCCHAR>
    (?&UNRESERVED)|
    (?&PCT_ENCODED)|
    (?&SUB_DELIMS)|
    @
  )
  (?<PCHAR>
    (?&UNRESERVED)|
    (?&PCT_ENCODED)|
    (?&SUB_DELIMS)|
    :|
    @|
    \/
  )
  (?<UCHAR>
    (?&UNRESERVED)|
    (?&PCT_ENCODED)|
    (?&SUB_DELIMS)|
    :
  )
  (?<RCHAR>
    (?&UNRESERVED)|
    (?&PCT_ENCODED)|
    (?&SUB_DELIMS)
  )
  (?<PCT_ENCODED>%(?&HEX){2})
  (?<UNRESERVED>
    ((?&ALPHA)|(?&DIGIT)|[-._~])
  )
  (?<RESERVED>(?&GEN_DELIMS)|(?&SUB_DELIMS))
  (?<GEN_DELIMS>[:\/?#\[\]@])
  (?<SUB_DELIMS>[!$&'()*+,;=])
  # URI Parts
  (?<d_scheme>
    (?!urn)
    (?:
      (?&ALPHA)
      ((?&ALPHA)|(?&DIGIT)|[+-.])*
      (?=:)
    )
  )
  (?<d_hier_part_slashes>
    (\/{2})?
  )
  (?<d_authority>(?&d_userinfo)?)
  (?<d_userinfo>(?&UCHAR)*)
  (?<d_ipv6>
    (?![^:]*::[^:]*::[^:]*)
    (
      (
        ((?&HEX){0,4})
        :
      ){1,7}
      ((?&d_ipv4)|:|(?&HEX){1,4})
    )
  )
  (?<d_ipv4>
    ((?&octet)\.){3}
    (?&octet)
  )
  (?<octet>
    (
      25[]0-5]|
      2[0-4](?&DIGIT)|
      1(?&DIGIT){2}|
      [1-9](?&DIGIT)|
      (?&DIGIT)
    )
  )
  (?<d_reg_name>(?&RCHAR)*)
  (?<d_urn_name>(?&UCHAR)*)
  (?<d_port>(?&DIGIT)*)
  (?<d_path>
    (
      \/
      ((?&PCHAR)*)*
      (?=\?|\#|$)
    )
  )
  (?<d_query>
    (
      ((?&PCHAR)|\/|\?)*
    )?
  )
  (?<d_fragment>
    (
      ((?&PCHAR)|\/|\?)*
    )?
  )
)
^
(?<link>
  (?:
    (?<scheme>
      (?<urn>urn)|
      (?&d_scheme)
    )
    :
  )
  (?(urn)
    (?:
      (?<namespace_identifier>[0-9a-zA-Z\-]+)
      :
      (?<namespace_specific_string>(?&d_urn_name)+)
    )
    |
    (?<hier_part>
      (?<slashes>(?&d_hier_part_slashes))
      (?<authority>
        (?:
          (?<userinfo>(?&d_authority))
          @
        )?
        (?<host>
          (?<ipv4>\[?(?&d_ipv4)\]?)|
          (?<ipv6>\[(?&d_ipv6)\])|
          (?<domain>(?&d_reg_name))
        )
        (?:
          :
          (?<port>(?&d_port))
        )?
      )
      (?<path>(?&d_path))?
    )
    (?:
      \?
      (?<query>(?&d_query))
    )?
    (?:
      \#
      (?<fragment>(?&d_fragment))
    )?
  )
)
$

Test Strings

# Valid URIs
ftp://cnn.example.com&[email protected]/top_story.htm
ftp://ftp.is.co.za/rfc/rfc1808.txt
http://www.ietf.org/rfc/rfc2396.txt
ldap://[2001:db8::7]/c=GB?objectClass?one
mailto:[email protected]
news:comp.infosystems.www.servers.unix
tel:+1-816-555-1212
telnet://192.0.2.16:80/
urn:isbn:0451450523
urn:oid:2.16.840
urn:isan:0000-0000-9E59-0000-O-0000-0000-2
urn:oasis:names:specification:docbook:dtd:xml:4.1.2
http://localhost/test/somefile.php?query=someval&variable=value#fragment
http://[2001:db8:a0b:12f0::1]/test
ftp://username:[email protected]/path/to/file/somefile.html?queryVariable=value#fragment
https://subdomain.domain.com/path/to/file.php?query=value#fragment
https://subdomain.example.com/path/to/file.php?query=value#fragment
mailto:john.smith(comment)@example.com
mailto:user@[2001:DB8::1]
mailto:user@[255:192:168:1]
mailto:[email protected]
http://localhost:4433/path/to/file?query#fragment
# Note that the example below IS a valid as it does follow RFC standards
localhost:4433/path/to/file

# These work with the optional scheme group although I'd suggest making the scheme mandatory as misinterpretations can occur
[email protected]
www.google.com/pathtofile.php?query
[192a:123::192.168.1.1]:80/path/to/file.html?query#fragment

Examples related to regex

Why my regexp for hyphenated words doesn't work? grep's at sign caught as whitespace Preg_match backtrack error regex match any single character (one character only) re.sub erroring with "Expected string or bytes-like object" Only numbers. Input number in React Visual Studio Code Search and Replace with Regular Expressions Strip / trim all strings of a dataframe return string with first match Regex How to capture multiple repeated groups?

Examples related to url

What is the difference between URL parameters and query strings? Allow Access-Control-Allow-Origin header using HTML5 fetch API File URL "Not allowed to load local resource" in the Internet Browser Slack URL to open a channel from browser Getting absolute URLs using ASP.NET Core How do I load an HTTP URL with App Transport Security enabled in iOS 9? Adding form action in html in laravel React-router urls don't work when refreshing or writing manually URL for public Amazon S3 bucket How can I append a query parameter to an existing URL?

Examples related to language-agnostic

IOException: The process cannot access the file 'file path' because it is being used by another process Peak signal detection in realtime timeseries data Match linebreaks - \n or \r\n? Simple way to understand Encapsulation and Abstraction How can I pair socks from a pile efficiently? How do I determine whether my calculation of pi is accurate? What is ADT? (Abstract Data Type) How to explain callbacks in plain english? How are they different from calling one function from another function? Ukkonen's suffix tree algorithm in plain English Private vs Protected - Visibility Good-Practice Concern