[javascript] How to extract base URL from a string in JavaScript?

I'm trying to find a relatively easy and reliable method to extract the base URL from a string variable using JavaScript (or jQuery).

For example, given something like:

http://www.sitename.com/article/2009/09/14/this-is-an-article/

I'd like to get:

http://www.sitename.com/

Is a regular expression the best bet? If so, what statement could I use to assign the base URL extracted from a given string to a new variable?

I've done some searching on this, but everything I find in the JavaScript world seems to revolve around gathering this information from the actual document URL using location.host or similar.

This question is related to javascript regex string url

The answer is


I use a simple regex that extracts the host form the url:

function get_host(url){
    return url.replace(/^((\w+:)?\/\/[^\/]+\/?).*$/,'$1');
}

and use it like this

var url = 'http://www.sitename.com/article/2009/09/14/this-is-an-article/'
var host = get_host(url);

Note, if the url does not end with a / the host will not end in a /.

Here are some tests:

describe('get_host', function(){
    it('should return the host', function(){
        var url = 'http://www.sitename.com/article/2009/09/14/this-is-an-article/';
        assert.equal(get_host(url),'http://www.sitename.com/');
    });
    it('should not have a / if the url has no /', function(){
        var url = 'http://www.sitename.com';
        assert.equal(get_host(url),'http://www.sitename.com');
    });
    it('should deal with https', function(){
        var url = 'https://www.sitename.com/article/2009/09/14/this-is-an-article/';
        assert.equal(get_host(url),'https://www.sitename.com/');
    });
    it('should deal with no protocol urls', function(){
        var url = '//www.sitename.com/article/2009/09/14/this-is-an-article/';
        assert.equal(get_host(url),'//www.sitename.com/');
    });
    it('should deal with ports', function(){
        var url = 'http://www.sitename.com:8080/article/2009/09/14/this-is-an-article/';
        assert.equal(get_host(url),'http://www.sitename.com:8080/');
    });
    it('should deal with localhost', function(){
        var url = 'http://localhost/article/2009/09/14/this-is-an-article/';
        assert.equal(get_host(url),'http://localhost/');
    });
    it('should deal with numeric ip', function(){
        var url = 'http://192.168.18.1/article/2009/09/14/this-is-an-article/';
        assert.equal(get_host(url),'http://192.168.18.1/');
    });
});

WebKit-based browsers, Firefox as of version 21 and current versions of Internet Explorer (IE 10 and 11) implement location.origin.

location.origin includes the protocol, the domain and optionally the port of the URL.

For example, location.origin of the URL http://www.sitename.com/article/2009/09/14/this-is-an-article/ is http://www.sitename.com.

To target browsers without support for location.origin use the following concise polyfill:

if (typeof location.origin === 'undefined')
    location.origin = location.protocol + '//' + location.host;

A lightway but complete approach to getting basic values from a string representation of an URL is Douglas Crockford's regexp rule:

var yourUrl = "http://www.sitename.com/article/2009/09/14/this-is-an-article/";
var parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;
var parts = parse_url.exec( yourUrl );
var result = parts[1]+':'+parts[2]+parts[3]+'/' ;

If you are looking for a more powerful URL manipulation toolkit try URI.js It supports getters, setter, url normalization etc. all with a nice chainable api.

If you are looking for a jQuery Plugin, then jquery.url.js should help you

A simpler way to do it is by using an anchor element, as @epascarello suggested. This has the disadvantage that you have to create a DOM Element. However this can be cached in a closure and reused for multiple urls:

var parseUrl = (function () {
  var a = document.createElement('a');
  return function (url) {
    a.href = url;
    return {
      host: a.host,
      hostname: a.hostname,
      pathname: a.pathname,
      port: a.port,
      protocol: a.protocol,
      search: a.search,
      hash: a.hash
    };
  }
})();

Use it like so:

paserUrl('http://google.com');

To get the origin of any url, including paths within a website (/my/path) or schemaless (//example.com/my/path), or full (http://example.com/my/path) I put together a quick function.

In the snippet below, all three calls should log https://stacksnippets.net.

_x000D_
_x000D_
function getOrigin(url)_x000D_
{_x000D_
  if(/^\/\//.test(url))_x000D_
  { // no scheme, use current scheme, extract domain_x000D_
    url = window.location.protocol + url;_x000D_
  }_x000D_
  else if(/^\//.test(url))_x000D_
  { // just path, use whole origin_x000D_
    url = window.location.origin + url;_x000D_
  }_x000D_
  return url.match(/^([^/]+\/\/[^/]+)/)[0];_x000D_
}_x000D_
_x000D_
console.log(getOrigin('https://stacksnippets.net/my/path'));_x000D_
console.log(getOrigin('//stacksnippets.net/my/path'));_x000D_
console.log(getOrigin('/my/path'));
_x000D_
_x000D_
_x000D_


You can use below codes for get different parameters of Current URL

alert("document.URL : "+document.URL);
alert("document.location.href : "+document.location.href);
alert("document.location.origin : "+document.location.origin);
alert("document.location.hostname : "+document.location.hostname);
alert("document.location.host : "+document.location.host);
alert("document.location.pathname : "+document.location.pathname);

Instead of having to account for window.location.protocol and window.location.origin, and possibly missing a specified port number, etc., just grab everything up to the 3rd "/":

// get nth occurrence of a character c in the calling string
String.prototype.nthIndex = function (n, c) {
    var index = -1;
    while (n-- > 0) {
        index++;
        if (this.substring(index) == "") return -1; // don't run off the end
        index += this.substring(index).indexOf(c);
    }
    return index;
}

// get the base URL of the current page by taking everything up to the third "/" in the URL
function getBaseURL() {
    return document.URL.substring(0, document.URL.nthIndex(3,"/") + 1);
}

A good way is to use JavaScript native api URL object. This provides many usefull url parts.

For example:

const url = 'https://stackoverflow.com/questions/1420881/how-to-extract-base-url-from-a-string-in-javascript'

const urlObject = new URL(url);

console.log(urlObject);


// RESULT: 
//________________________________
hash: "",
host: "stackoverflow.com",
hostname: "stackoverflow.com",
href: "https://stackoverflow.com/questions/1420881/how-to-extract-base-url-from-a-string-in-javascript",
origin: "https://stackoverflow.com",
password: "",
pathname: "/questions/1420881/how-to-extract-base-url-from-a-string-in-javaript",
port: "",
protocol: "https:",
search: "",
searchParams: [object URLSearchParams]
... + some other methods

As you can see here you can just access whatever you need.

For example: console.log(urlObject.host); // "stackoverflow.com"

doc for URL


var tilllastbackslashregex = new RegExp(/^.*\//);
baseUrl = tilllastbackslashregex.exec(window.location.href);

window.location.href gives the current url address from browser address bar

it can be any thing like https://stackoverflow.com/abc/xyz or https://www.google.com/search?q=abc tilllastbackslashregex.exec() run regex and retun the matched string till last backslash ie https://stackoverflow.com/abc/ or https://www.google.com/ respectively


This, works for me:

_x000D_
_x000D_
var getBaseUrl = function (url) {_x000D_
  if (url) {_x000D_
    var parts = url.split('://');_x000D_
    _x000D_
    if (parts.length > 1) {_x000D_
      return parts[0] + '://' + parts[1].split('/')[0] + '/';_x000D_
    } else {_x000D_
      return parts[0].split('/')[0] + '/';_x000D_
    }_x000D_
  }_x000D_
};
_x000D_
_x000D_
_x000D_


String.prototype.url = function() {
  const a = $('<a />').attr('href', this)[0];
  // or if you are not using jQuery 
  // const a = document.createElement('a'); a.setAttribute('href', this);
  let origin = a.protocol + '//' + a.hostname;
  if (a.port.length > 0) {
    origin = `${origin}:${a.port}`;
  }
  const {host, hostname, pathname, port, protocol, search, hash} = a;
  return {origin, host, hostname, pathname, port, protocol, search, hash};

}

Then :

'http://mysite:5050/pke45#23'.url()
 //OUTPUT : {host: "mysite:5050", hostname: "mysite", pathname: "/pke45", port: "5050", protocol: "http:",hash:"#23",origin:"http://mysite:5050"}

For your request, you need :

 'http://mysite:5050/pke45#23'.url().origin

Review 07-2017 : It can be also more elegant & has more features

const parseUrl = (string, prop) =>  {
  const a = document.createElement('a'); 
  a.setAttribute('href', string);
  const {host, hostname, pathname, port, protocol, search, hash} = a;
  const origin = `${protocol}//${hostname}${port.length ? `:${port}`:''}`;
  return prop ? eval(prop) : {origin, host, hostname, pathname, port, protocol, search, hash}
}

Then

parseUrl('http://mysite:5050/pke45#23')
// {origin: "http://mysite:5050", host: "mysite:5050", hostname: "mysite", pathname: "/pke45", port: "5050"…}


parseUrl('http://mysite:5050/pke45#23', 'origin')
// "http://mysite:5050"

Cool!


If you're using jQuery, this is a kinda cool way to manipulate elements in javascript without adding them to the DOM:

var myAnchor = $("<a />");

//set href    
myAnchor.attr('href', 'http://example.com/path/to/myfile')

//your link's features
var hostname = myAnchor.attr('hostname'); // http://example.com
var pathname = myAnchor.attr('pathname'); // /path/to/my/file
//...etc

You can do it using a regex :

/(http:\/\/)?(www)[^\/]+\//i

does it fit ?


Well, URL API object avoids splitting and constructing the url's manually.

 let url = new URL('https://stackoverflow.com/questions/1420881');
 alert(url.origin);

Don't need to use jQuery, just use

location.hostname

This works:

location.href.split(location.pathname)[0];

function getBaseURL() {
    var url = location.href;  // entire url including querystring - also: window.location.href;
    var baseURL = url.substring(0, url.indexOf('/', 14));


    if (baseURL.indexOf('http://localhost') != -1) {
        // Base Url for localhost
        var url = location.href;  // window.location.href;
        var pathname = location.pathname;  // window.location.pathname;
        var index1 = url.indexOf(pathname);
        var index2 = url.indexOf("/", index1 + 1);
        var baseLocalUrl = url.substr(0, index2);

        return baseLocalUrl + "/";
    }
    else {
        // Root Url for domain name
        return baseURL + "/";
    }

}

You then can use it like this...

var str = 'http://en.wikipedia.org/wiki/Knopf?q=1&t=2';
var url = str.toUrl();

The value of url will be...

{
"original":"http://en.wikipedia.org/wiki/Knopf?q=1&t=2",<br/>"protocol":"http:",
"domain":"wikipedia.org",<br/>"host":"en.wikipedia.org",<br/>"relativePath":"wiki"
}

The "var url" also contains two methods.

var paramQ = url.getParameter('q');

In this case the value of paramQ will be 1.

var allParameters = url.getParameters();

The value of allParameters will be the parameter names only.

["q","t"]

Tested on IE,chrome and firefox.


var host = location.protocol + '//' + location.host + '/';

If you are extracting information from window.location.href (the address bar), then use this code to get http://www.sitename.com/:

var loc = location;
var url = loc.protocol + "//" + loc.host + "/";

If you have a string, str, that is an arbitrary URL (not window.location.href), then use regular expressions:

var url = str.match(/^(([a-z]+:)?(\/\/)?[^\/]+\/).*$/)[1];

I, like everyone in the Universe, hate reading regular expressions, so I'll break it down in English:

  • Find zero or more alpha characters followed by a colon (the protocol, which can be omitted)
  • Followed by // (can also be omitted)
  • Followed by any characters except / (the hostname and port)
  • Followed by /
  • Followed by whatever (the path, less the beginning /).

No need to create DOM elements or do anything crazy.


There is no reason to do splits to get the path, hostname, etc from a string that is a link. You just need to use a link

//create a new element link with your link
var a = document.createElement("a");
a.href="http://www.sitename.com/article/2009/09/14/this-is-an-article/";

//hide it from view when it is added
a.style.display="none";

//add it
document.body.appendChild(a);

//read the links "features"
alert(a.protocol);
alert(a.hostname)
alert(a.pathname)
alert(a.port);
alert(a.hash);

//remove it
document.body.removeChild(a);

You can easily do it with jQuery appending the element and reading its attr.

Update: There is now new URL() which simplifies it

_x000D_
_x000D_
const myUrl = new URL("https://www.example.com:3000/article/2009/09/14/this-is-an-article/#m123")

const parts = ['protocol', 'hostname', 'pathname', 'port', 'hash'];

parts.forEach(key => console.log(key, myUrl[key]))
_x000D_
_x000D_
_x000D_


Examples related to javascript

need to add a class to an element How to make a variable accessible outside a function? Hide Signs that Meteor.js was Used How to create a showdown.js markdown extension Please help me convert this script to a simple image slider Highlight Anchor Links when user manually scrolls? Summing radio input values How to execute an action before close metro app WinJS javascript, for loop defines a dynamic variable name Getting all files in directory with ajax

Examples related to regex

Why my regexp for hyphenated words doesn't work? grep's at sign caught as whitespace Preg_match backtrack error regex match any single character (one character only) re.sub erroring with "Expected string or bytes-like object" Only numbers. Input number in React Visual Studio Code Search and Replace with Regular Expressions Strip / trim all strings of a dataframe return string with first match Regex How to capture multiple repeated groups?

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript

Examples related to url

What is the difference between URL parameters and query strings? Allow Access-Control-Allow-Origin header using HTML5 fetch API File URL "Not allowed to load local resource" in the Internet Browser Slack URL to open a channel from browser Getting absolute URLs using ASP.NET Core How do I load an HTTP URL with App Transport Security enabled in iOS 9? Adding form action in html in laravel React-router urls don't work when refreshing or writing manually URL for public Amazon S3 bucket How can I append a query parameter to an existing URL?