[javascript] What is a good regular expression to match a URL?

Currently I have an input box which will detect the URL and parse the data.

So right now, I am using:

var urlR = /^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;
var url = content.match(urlR);

The problem is that when I enter a URL like www.google.com, it's not working. When I enter http://www.google.com, it works.

I am not very fluent in regular expressions. Can anyone help me?


The answer is


These are the droids you're looking for. This is taken from validator.js, which is the library you should really use for this. But if you want to roll your own, who am I to stop you? If you want a pure regex, you can just take out the length check. I think it's a good idea to test the length of the URL, though, if you really want to determine compliance with the spec.

function isURL(str) {
     var urlRegex = '^(?!mailto:)(?:(?:http|https|ftp)://)(?:\\S+(?::\\S*)?@)?(?:(?:(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}(?:\\.(?:[0-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))|(?:(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)(?:\\.(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)*(?:\\.(?:[a-z\\u00a1-\\uffff]{2,})))|localhost)(?::\\d{2,5})?(?:(/|\\?|#)[^\\s]*)?$';
     var url = new RegExp(urlRegex, 'i');
     return str.length < 2083 && url.test(str);
}
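
If you would rather keep the length limit and still end up with a single expression, one option is a lookahead at the start of the pattern. The sketch below is only an illustration: `body` is a short, hypothetical stand-in for the long pattern above, and `urlWithLength` is a name made up for this example.

```javascript
// Hypothetical stand-in for the long validator.js pattern above,
// kept short so the length lookahead is easy to see.
var body = '(?:http|https|ftp)://\\S+';

// (?=.{1,2082}$) succeeds only when the whole string is 1 to 2082
// characters long (and contains no line breaks), which mirrors the
// check str.length < 2083 for ordinary single-line input.
var urlWithLength = new RegExp('^(?=.{1,2082}$)' + body + '$', 'i');

var shortUrl = 'http://example.com/path';
var longUrl = 'http://example.com/' + new Array(2100).join('a');

console.log(urlWithLength.test(shortUrl)); // true
console.log(urlWithLength.test(longUrl));  // false
```

Keeping the check as plain `str.length < 2083` is arguably easier to read; the lookahead is mainly useful when the pattern has to be handed to something that only accepts a regex.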

I was trying to put together some JavaScript to validate a domain name (e.g. google.com) and, if it validates, enable a submit button. I thought I would share my code for those looking to accomplish something similar. It expects a domain without any http:// or www. prefix. The script uses a stripped-down version of the regular expression from above for domain matching, which isn't strict about fake TLDs.

http://jsfiddle.net/nMVDS/1/

$(function () {
  $('#whitelist_add').keyup(function () {
    if ($(this).val() == '') { //Check to see if there is any text entered
        //If there is no text within the input, disable the button
        $('.whitelistCheck').attr('disabled', 'disabled');
    } else {
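        // Domain name regular expression.
        // The first group uses a single +; nesting it as (...+)+ matches the same
        // input but can backtrack catastrophically on long text with no match.
        var regex = new RegExp("^([0-9A-Za-z-\\.@:%_\+~#=]+)((\\.[a-zA-Z]{2,3})+)(/(.)*)?(\\?(.)*)?");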
        if (regex.test($(this).val())) {
            // Domain looks OK
            //alert("Successful match");
            $('.whitelistCheck').removeAttr('disabled');
        } else {
            // Domain is NOT OK
            //alert("No match");
            $('.whitelistCheck').attr('disabled', 'disabled');
        }
    }
  });
});

HTML FORM:

<form action="domain_management.php" method="get">
    <input type="text" name="whitelist_add" id="whitelist_add" placeholder="domain.com">
    <button type="submit" class="btn btn-success whitelistCheck" disabled='disabled'>Add to Whitelist</button>
</form>
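
Note that the expression above is anchored only at the start (there is no trailing `$`), so it accepts any input that merely begins with something domain-like, including paths. If the field should hold nothing but a bare host name, a stricter check could look like the sketch below. The pattern and the `isBareDomain` helper are assumptions made for this example, not part of the script above, and the pattern deliberately ignores internationalized or punycode TLDs.

```javascript
// Hypothetical stricter check for a bare host name such as "google.com".
// Each label starts and ends with a letter or digit, may contain hyphens
// in between, and the final label (the TLD) is at least two letters.
var domainOnly = /^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}$/i;

function isBareDomain(value) {
  return domainOnly.test(value);
}

['google.com', 'sub.example.co.uk', 'google', 'google.com/path', '-bad.com']
  .forEach(function (value) {
    // Logs true for the first two entries and false for the rest.
    console.log(value, isBareDomain(value));
  });
```

In the keyup handler, `isBareDomain($(this).val())` could take the place of `regex.test($(this).val())`.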

Another possible solution; the one above failed for me when parsing query string parameters.

var regex = new RegExp("^(http[s]?:\\/\\/(www\\.)?|ftp:\\/\\/(www\\.)?|www\\.){1}([0-9A-Za-z-\\.@:%_\+~#=]+)((\\.[a-zA-Z]{2,3})+)(/(.)*)?(\\?(.)*)?");

if(regex.test("http://google.com")){
  alert("Successful match");
}else{
  alert("No match");
}

In this solution, feel free to modify [-0-9A-Za-z\.@:%_\+~#=] to match the domain or subdomain name. Query string parameters are also handled.

If you use a regex literal (/.../) instead of the RegExp constructor, replace each \\ in the expression with \ and escape the forward slashes as \/.

Hope this helps.
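
The remark about \\ versus \ comes down to string escaping: inside a string passed to `new RegExp`, a backslash has to be doubled to survive as a single backslash in the pattern. As a small illustration, here is one simplified pattern written both ways. The pattern is a hypothetical, shortened one chosen for readability, not the expression from this answer.

```javascript
// The same pattern written as a constructor string and as a literal.
// In the string, each regex backslash is doubled; in the literal it is not,
// but a forward slash must be escaped so it does not end the literal.
var fromString = new RegExp("^https?:\\/\\/[a-z0-9.-]+\\.[a-z]{2,}(/.*)?$", "i");
var fromLiteral = /^https?:\/\/[a-z0-9.-]+\.[a-z]{2,}(\/.*)?$/i;

var samples = ["http://google.com", "https://google.com/search?q=regex", "google.com"];
samples.forEach(function (s) {
  // Both forms give the same answer for every sample.
  console.log(s, fromString.test(s), fromLiteral.test(s));
});
```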


Regex if you want to require an http:// or https:// prefix on the URL:

https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)

If you do not require the HTTP protocol prefix:

[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)

To try this out, see http://regexr.com?37i6s, or, for a less restrictive version, http://regexr.com/3e6m0.
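
Since these expressions are not anchored, they can also be used to pull URLs out of a longer piece of text, which is close to what the question describes. A minimal sketch using the first expression with the `g` flag follows. The sample text and the variable names are assumptions made for this example.

```javascript
var urlPattern = /https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)/g;

// Hypothetical input text.
var text = 'Docs at https://example.com/docs and a mirror at http://www.example.org today';

// With the g flag, match() returns every full match, or null if there are none.
var found = text.match(urlPattern) || [];

console.log(found); // [ 'https://example.com/docs', 'http://www.example.org' ]
```

One thing to watch for: the trailing character class includes `.`, so a sentence-ending period directly after a URL becomes part of the match.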

Example JavaScript implementation:

var expression = /[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)?/gi;
var regex = new RegExp(expression);
var t = 'www.google.com';

if (t.match(regex)) {
  alert("Successful match");
} else {
  alert("No match");
}
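
Because the expression has no `^` or `$`, it reports a match whenever a URL appears anywhere in the string. To validate that an entire input is a URL, one approach is to wrap the same source in anchors, as sketched below. The names `loose` and `whole` are made up for this example.

```javascript
var loose = /[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)?/i;

// Wrap the same source in ^(?: ... )$ so the whole input must be a URL.
var whole = new RegExp('^(?:' + loose.source + ')$', 'i');

console.log(loose.test('please visit www.google.com soon')); // true
console.log(whole.test('please visit www.google.com soon')); // false
console.log(whole.test('www.google.com'));                   // true
```

The `g` flag is left off here on purpose: with `g`, `test()` remembers its position in `lastIndex` between calls, which can make repeated validation of the same string alternate between true and false.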


(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})

Will match the following cases

  • http://www.foufos.gr
  • https://www.foufos.gr
  • http://foufos.gr
  • http://www.foufos.gr/kino
  • http://werer.gr
  • www.foufos.gr
  • www.mp3.com
  • www.t.co
  • http://t.co
  • http://www.t.co
  • https://www.t.co
  • www.aa.com
  • http://aa.com
  • http://www.aa.com
  • https://www.aa.com

Will NOT match the following

  • www.foufos
  • www.foufos-.gr
  • www.-foufos.gr
  • foufos.gr
  • http://www.foufos
  • http://foufos
  • www.mp3#.com

var expression = /(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})/gi;
var regex = new RegExp(expression);

var check = [
  'http://www.foufos.gr',
  'https://www.foufos.gr',
  'http://foufos.gr',
  'http://www.foufos.gr/kino',
  'http://werer.gr',
  'www.foufos.gr',
  'www.mp3.com',
  'www.t.co',
  'http://t.co',
  'http://www.t.co',
  'https://www.t.co',
  'www.aa.com',
  'http://aa.com',
  'http://www.aa.com',
  'https://www.aa.com',
  'www.foufos',
  'www.foufos-.gr',
  'www.-foufos.gr',
  'foufos.gr',
  'http://www.foufos',
  'http://foufos',
  'www.mp3#.com'
];

check.forEach(function(entry) {
  if (entry.match(regex)) {
    $("#output").append( "<div>Success: " + entry + "</div>" );
  } else {
    $("#output").append( "<div>Fail: " + entry + "</div>" );
  }
});

<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div id="output"></div>

Check it in rubular - NEW version

Check it in rubular - old version