how to get domain name from URL

Question

How can I fetch a domain name from a URL string?

Examples:

    ---------------------- ------------
    input                  output
    ---------------------- ------------
    www.google.com         google
    www.mail.yahoo.com     mail.yahoo
    www.mail.yahoo.co.in   mail.yahoo
    www.abc.au.uk          abc
    ---------------------- ------------

Related: Matching a web address through regex

User · Answer

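A JavaScript regex with the `gim` flags, built from an optional `w{3}` (www) prefix, labels matched by `[a-zA-Z0-9]` and `[a-zA-Z0-9\-]{0,65}`, and a final `[a-zA-Z]{2,6}` TLD. It ignores `www` and the following dot while retaining the domain intact. It also properly matches hosts without `www` and ccTLDs.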
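A rough Python approximation of the same idea, assuming an optional `www.` prefix, dot-separated labels of letters, digits and hyphens, and a 2-6 letter TLD (the exact original pattern may differ). The name `strip_www` is hypothetical, the label limit uses `{0,61}` (DNS labels are at most 63 characters) instead of `{0,65}`, and a single anchored `match` with `re.IGNORECASE` stands in for the JavaScript `gim` flags.

```python
import re

# Approximation of the answer's idea; not the original pattern.
HOSTNAME_RE = re.compile(
    r"^(?:w{3}\.)?"                                   # optional "www." prefix, not captured
    r"((?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+"   # one or more labels, each followed by "."
    r"[a-z]{2,6})$",                                  # alphabetic TLD of 2-6 letters
    re.IGNORECASE,
)


def strip_www(host: str):
    """Return the host without a leading 'www.', or None if it doesn't look like a hostname."""
    m = HOSTNAME_RE.match(host)
    return m.group(1) if m else None


print(strip_www("www.mail.yahoo.co.in"))  # mail.yahoo.co.in
print(strip_www("google.com"))            # google.com
```

Like the answer, this keeps the ccTLD part (`co.in`, `au.uk`); it only drops `www.` and rejects strings that aren't hostnames.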

User · Answer

I don't know of any libraries, but the string manipulation of domain names is easy enough. The hard part is knowing whether the name is at the second or third level. For this you will need a data file you maintain: for example, under .uk the name is not always at the third level, since some organisations (e.g. bl.uk, jet.uk) exist at the second level. The source of Firefox from Mozilla has such a data file; check the Mozilla licensing to see if you could reuse that.
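A minimal sketch of that lookup in Python, assuming a tiny hand-written set `KNOWN_SUFFIXES` stands in for the maintained data file (a real list has thousands of entries); `registrable_domain` is a hypothetical name. The longest listed suffix wins, and one more label is kept in front of it.

```python
# Hypothetical stand-in for the maintained data file described above.
KNOWN_SUFFIXES = {"com", "org", "uk", "co.uk", "org.uk", "ac.uk", "in", "co.in"}


def registrable_domain(host: str):
    """Return the longest listed suffix plus one label, or None if nothing fits."""
    labels = host.lower().strip(".").split(".")
    # Candidates go from longest to shortest, so "co.uk" is found before "uk".
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in KNOWN_SUFFIXES:
            # i == 0 means the whole host is a suffix (e.g. "co.uk"): no domain.
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None
```

`registrable_domain("bl.uk")` returns `bl.uk`, the second-level case mentioned above, while `www.bbc.co.uk` gives `bbc.co.uk`.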

User · Answer

So if you just have a string and not a `window.location`, you could use:

    String.prototype.toUrl = function() {
        if (!this || this.length === 0) {
            return undefined;
        }
        var original = this.toString();
        var s = original;
        // assume http:// when no scheme is present
        if (!original.toLowerCase().startsWith('http')) {
            s = 'http://' + original;
        }
        s = s.split('/');                 // ["http:", "", host, path parts...]
        var protocol = s[0];
        var host = s[2];
        var relativePath = '';
        if (s.length > 3) {
            for (var i = 3; i < s.length; i++) {
                relativePath += '/' + s[i];
            }
        }
        s = host.split('.');
        var domain = s[s.length - 2] + '.' + s[s.length - 1];   // last two labels only
        return {
            original: original,
            protocol: protocol,
            domain: domain,
            host: host,
            relativePath: relativePath,
            getParameter: function(param) {
                return this.getParameters()[param];
            },
            getParameters: function() {
                var vars = [], hash;
                var hashes = this.original.slice(this.original.indexOf('?') + 1).split('&');
                for (var i = 0; i < hashes.length; i++) {
                    hash = hashes[i].split('=');
                    vars.push(hash[0]);
                    vars[hash[0]] = hash[1];
                }
                return vars;
            }
        };
    };

How to use:

    var str = 'http://en.wikipedia.org/wiki/Knopf?q=1&t=2';
    var url = str.toUrl();
    var host = url.host;
    var domain = url.domain;
    var original = url.original;
    var relativePath = url.relativePath;
    var paramQ = url.getParameter('q');
    var paramT = url.getParameter('t');
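For comparison, a Python sketch that splits a URL string into the same pieces using the standard library's `urllib.parse` instead of manual splitting; `to_url` and the dictionary keys are hypothetical names chosen to mirror the JavaScript object. It keeps the answer's "last two labels" domain rule, so it shares the same weakness with hosts like `bbc.co.uk`.

```python
from urllib.parse import parse_qs, urlsplit


def to_url(original: str) -> dict:
    """Split a URL string into pieces similar to the toUrl() object."""
    # Like the answer, assume http:// when no scheme is present.
    s = original if original.lower().startswith(("http://", "https://")) else "http://" + original
    parts = urlsplit(s)
    host = parts.hostname or ""  # lower-cased, without port or credentials
    labels = host.split(".")
    return {
        "original": original,
        "protocol": parts.scheme,        # "http", without the trailing colon
        "host": host,
        # Same "last two labels" rule as the answer: wrong for hosts like bbc.co.uk.
        "domain": ".".join(labels[-2:]),
        "relativePath": parts.path,      # unlike the answer, excludes the "?query" part
        # parse_qs maps each key to a list of values; keep one value per key.
        "params": {k: v[0] for k, v in parse_qs(parts.query).items()},
    }
```

Using `urlsplit` avoids hand-parsing ports, credentials and query strings, which is where manual `split('/')` code tends to break.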

User · Answer

A little late to the party, but:

    const urls = [
      'www.abc.au.uk',
      'https://github.com',
      'http://github.ca',
      'https://www.google.ru',
      'http://www.google.co.uk',
      'www.yandex.com',
      'yandex.ru',
      'yandex'
    ];

    urls.forEach(url => console.log(url.replace(/.+\/\/|www.|\..+/g, '')));
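The same one-liner ported to Python, assuming the pattern `.+//|www.|\..+` applied globally; `short_name` is a hypothetical name. `re.sub` replaces every match, like the JavaScript `g` flag.

```python
import re

# The three alternatives, each removed wherever it matches:
#   .+//   everything up to the "//" after the scheme
#   www.   "www" plus the character after it
#   \..+   the first remaining dot and everything after it
STRIP_RE = re.compile(r".+//|www.|\..+")


def short_name(url: str) -> str:
    return STRIP_RE.sub("", url)


for url in ["www.abc.au.uk", "https://github.com", "http://www.google.co.uk", "yandex"]:
    print(short_name(url))  # abc, github, google, yandex
```

Because everything from the first remaining dot is dropped, only one label survives: `www.mail.yahoo.com` becomes `mail`, not the question's `mail.yahoo`.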

User · Answer

For a certain purpose I wrote this quick Python function yesterday. It returns the domain from a URL. It's quick and doesn't need any input file listing stuff. I don't pretend it works in all cases, but it really does the job I needed for a simple text mining script.

Output looks like this:

    http://www.google.co.uk                                         google.co.uk
    http://24.media.tumblr.com/tumblr_m04s34rqh567ij78k_250.gif     tumblr.com

The function:

    import re

    def getDomain(url):
        parts = re.split(r"/", url)           # parts[2] is the host of "scheme://host/..."
        # last label plus a 2-6 character extension, e.g. "tumblr.com"
        match = re.match(r"([\w\-]+\.)*([\w\-]+\.\w{2,6}$)", parts[2])
        if match is not None:
            if re.search(r"\.uk", parts[2]):
                # .uk names usually have three parts, e.g. "google.co.uk"
                uk_match = re.match(r"([\w\-]+\.)*([\w\-]+\.[\w\-]+\.\w{2,6}$)", parts[2])
                if uk_match is not None:
                    match = uk_match
            return match.group(2)
        else:
            return ""

Seems to work pretty well. However, it has to be modified to remove domain extensions on output, as you wished.

User · Answer

Basically, what you want is:

    google.com        -> google.com    -> google
    www.google.com    -> google.com    -> google
    google.co.uk      -> google.co.uk  -> google
    www.google.co.uk  -> google.co.uk  -> google
    www.google.org    -> google.org    -> google
    www.google.org.uk -> google.org.uk -> google

Optional:

    www.google.com     -> google.com    -> www.google
    images.google.com  -> google.com    -> images.google
    mail.yahoo.co.uk   -> yahoo.co.uk   -> mail.yahoo
    mail.yahoo.com     -> yahoo.com     -> mail.yahoo
    www.mail.yahoo.com -> yahoo.com     -> mail.yahoo

You don't need to construct an ever-changing regex, as 99% of domains will be matched properly if you simply look at the second-last part of the name:

    (co|com|gov|net|org)

If it is one of these, then you need to match 3 dots, else 2. Simple. Now, my regex wizardry is no match for that of some other SO'ers, so the best way I've found to achieve this is with some code, assuming you've already stripped off the path:

    my @d = split /\./, $domain;                    # split the domain part into an array
    my $c = @d;                                     # count how many parts
    my $dest = $d[$c-2] . '.' . $d[$c-1];           # use the last 2 parts
    if ($d[$c-2] =~ m/^(co|com|gov|net|org)$/) {    # is the second-last part one of these?
        $dest = $d[$c-3] . '.' . $dest;             # if so, add a third part
    }
    print $dest;                                    # show it

To just get the name, as per your question:

    my @d = split /\./, $domain;                    # split the domain part into an array
    my $c = @d;                                     # count how many parts
    my $dest;
    if ($d[$c-2] =~ m/^(co|com|gov|net|org)$/) {    # is the second-last part one of these?
        $dest = $d[$c-3];                           # if so, give the third last
        $dest = $d[$c-4] . '.' . $dest if ($c > 3); # optional bit
    } else {
        $dest = $d[$c-2];                           # else the second last
        $dest = $d[$c-3] . '.' . $dest if ($c > 2); # optional bit
    }
    print $dest;                                    # show it

The `^...$` anchors make the test an exact match, so a name like "facebook" is not mistaken for "co".

I like this approach because it's maintenance-free, unless you want to validate that it's actually a legitimate domain; but that's kind of pointless, because you're most likely only using this to process log files, and an invalid domain wouldn't find its way in there in the first place.

If you'd like to match "unofficial" subdomains such as bozo.za.net, bozo.au.uk or bozo.msf.ru, just add (za|au|msf) to the regex.

I'd love to see someone do all of this using just a regex; I'm sure it's possible.
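A Python port of the "just get the name" heuristic, assuming the host has already been stripped of scheme and path. `name_part` and its `keep_subdomain` flag (for the "Optional" column) are hypothetical names; the second-last label is checked with an exact set lookup, and a `c >= 3` guard avoids indexing past the start of short hosts.

```python
# Second-last labels that signal a three-part ending such as "co.uk" or "org.uk".
# An exact set lookup, so "facebook" is not mistaken for "co".
SECOND_LEVEL = {"co", "com", "gov", "net", "org"}


def name_part(host: str, keep_subdomain: bool = False) -> str:
    d = host.split(".")
    c = len(d)
    if c >= 3 and d[c - 2] in SECOND_LEVEL:
        dest = d[c - 3]                        # third-last label, e.g. "google" in google.co.uk
        if keep_subdomain and c > 3:
            dest = d[c - 4] + "." + dest       # optional bit
    else:
        dest = d[c - 2] if c >= 2 else d[0]    # second-last label
        if keep_subdomain and c > 2:
            dest = d[c - 3] + "." + dest       # optional bit
    return dest
```

With `keep_subdomain=True`, `www.mail.yahoo.com` and `mail.yahoo.co.uk` both give `mail.yahoo`, matching the "Optional" table.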

User · Answer

I know the question is seeking a regex solution, but no single attempt will cover everything. I decided to write this method in Python, which only works with URLs that have exactly one subdomain (i.e. www.mydomain.co.uk), not multiple-level subdomains like www.mail.yahoo.com.

    def urlextract(url):
        url_split = url.split(".")
        if len(url_split) <= 2:
            raise Exception("Full url required with subdomain:", url)
        return {'subdomain': url_split[0], 'domain': url_split[1], 'suffix': ".".join(url_split[2:])}

User · Answer

You need a list of which domain prefixes and suffixes can be removed. For example:

Prefixes:

    www.

Suffixes:

    .com
    .co.in
    .au.uk
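A small Python sketch of that list-based approach, assuming the example prefix and suffix lists above; `strip_affixes` is a hypothetical name. Suffixes are tried longest first, so once a list holds both a short and a long ending (say `.in` and `.co.in`), the longer one wins.

```python
# The example lists from the answer; extend them for your own data.
PREFIXES = ["www."]
SUFFIXES = [".com", ".co.in", ".au.uk"]


def strip_affixes(host: str) -> str:
    for prefix in PREFIXES:
        if host.startswith(prefix):
            host = host[len(prefix):]
            break
    # Longest suffix first, so a longer listed ending beats a shorter one.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if host.endswith(suffix):
            host = host[: -len(suffix)]
            break
    return host


print(strip_affixes("www.mail.yahoo.co.in"))  # mail.yahoo
```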

User · Answer

Use this: … then just extract the leading and end points. Easy, right?

User · Answer

Extracting the domain name accurately can be quite tricky, mainly because the domain extension can contain 2 parts (like .com.au or .co.uk) and the subdomain (the prefix) may or may not be there. Listing all domain extensions is not an option because there are hundreds of them; EuroDNS.com, for example, lists over 800 domain name extensions.

I therefore wrote a short PHP function that uses parse_url() and some observations about domain extensions to accurately extract the URL components AND the domain name. The function is as follows:

    function parse_url_all($url) {
        // parse_url() only fills in 'host' when a scheme is present
        $url = substr($url, 0, 4) == 'http' ? $url : 'http://' . $url;
        $d = parse_url($url);
        $tmp = explode('.', $d['host']);
        $n = count($tmp);
        if ($n >= 2) {
            // 4 labels, or 3 labels with a short middle one (co, com, org...):
            // treat the last two labels as the extension
            if ($n == 4 || ($n == 3 && strlen($tmp[($n-2)]) <= 3)) {
                $d['domain'] = $tmp[($n-3)] . "." . $tmp[($n-2)] . "." . $tmp[($n-1)];
                $d['domainX'] = $tmp[($n-3)];
            } else {
                $d['domain'] = $tmp[($n-2)] . "." . $tmp[($n-1)];
                $d['domainX'] = $tmp[($n-2)];
            }
        }
        return $d;
    }

This simple function will work in almost every case. There are a few exceptions, but these are very rare.

To demonstrate and test this function, you can use the following:

    $urls = array('www.test.com', 'test.com', 'cp.test.com');
    echo "<div style='overflow-x:auto;'>";
    echo "<table>";
    echo "<tr><th>URL</th><th>Host</th><th>Domain</th><th>Domain X</th></tr>";
    foreach ($urls as $url) {
        $info = parse_url_all($url);
        echo "<tr><td>" . $url . "</td><td>" . $info['host'] . "</td><td>" . $info['domain'] . "</td><td>" . $info['domainX'] . "</td></tr>";
    }
    echo "</table></div>";

The output will be as follows for the URLs listed:

    [Output table: URL | Host | Domain | Domain X]

As you can see, the domain name and the domain name without the extension are consistently extracted, whatever URL is presented to the function. I hope that this helps.
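A Python port of the same heuristic, assuming `urllib.parse.urlsplit` as a stand-in for PHP's `parse_url()`; the rules and their exceptions are unchanged. One difference: `urlsplit(...).hostname` lower-cases the host, whereas `parse_url()` does not.

```python
from urllib.parse import urlsplit


def parse_url_all(url: str) -> dict:
    """Python port of the PHP parse_url_all() heuristic."""
    if not url.startswith("http"):
        url = "http://" + url              # so the host is parsed as the netloc
    host = urlsplit(url).hostname or ""
    tmp = host.split(".")
    n = len(tmp)
    d = {"host": host}
    if n >= 2:
        # Three-part name when there are 4 labels, or 3 labels whose middle one
        # is at most 3 characters long (co, com, org, ...).
        if n == 4 or (n == 3 and len(tmp[n - 2]) <= 3):
            d["domain"] = ".".join(tmp[n - 3:])
            d["domainX"] = tmp[n - 3]
        else:
            d["domain"] = ".".join(tmp[n - 2:])
            d["domainX"] = tmp[n - 2]
    return d
```

The length test is also where the exceptions come from: a three-letter name looks like `com`, so `www.abc.com` yields `www` as `domainX`.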

User · Answer

    /^(?:www\.)?(.*?)\.(?:com|au\.uk|co\.in)$/
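A Python check of a regex of this shape against the question's examples, assuming the pattern `^(?:www\.)?(.*?)\.(?:com|au\.uk|co\.in)$`; `name_from_host` is a hypothetical name. The lazy `(.*?)` stops at the first position where a listed suffix runs to the end of the string.

```python
import re

# Optional "www.", a lazily captured name, then one of the listed suffixes.
NAME_RE = re.compile(r"^(?:www\.)?(.*?)\.(?:com|au\.uk|co\.in)$")


def name_from_host(host: str):
    m = NAME_RE.match(host)
    return m.group(1) if m else None  # None when the suffix isn't in the list
```

Hosts with an unlisted ending, such as `www.google.co.uk`, return `None`, so the suffix alternation has to grow with your data.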

User · Answer

    from urllib.parse import urlparse

    GENERIC_TLDS = [
        'aero', 'asia', 'biz', 'com', 'coop', 'edu', 'gov', 'info', 'int', 'jobs',
        'mil', 'mobi', 'museum', 'name', 'net', 'org', 'pro', 'tel', 'travel', 'cat'
    ]

    def get_domain(url):
        hostname = urlparse(url.lower()).netloc
        if hostname == '':
            # Force the recognition as a full URL
            hostname = urlparse('http://' + url.lower()).netloc

        # Remove the 'user:passw', ':port' and 'www.' parts
        hostname = hostname.split('@')[-1].split(':')[0]
        if hostname.startswith('www.'):
            hostname = hostname[len('www.'):]
        hostname = hostname.split('.')

        num_parts = len(hostname)
        if (num_parts < 3) or (len(hostname[-1]) > 2):
            return '.'.join(hostname[:-1])
        if len(hostname[-2]) > 2 and hostname[-2] not in GENERIC_TLDS:
            return '.'.join(hostname[:-1])
        return '.'.join(hostname[:-2])

This code isn't guaranteed to work with all URLs, and it doesn't filter out names that are grammatically correct but invalid, like 'example.uk'. However, it'll do the job in most cases.

User · Answer

How about a pattern made of an optional `http(s)` scheme, `[a-zA-Z0-9]` label groups and a final `[a-zA-Z0-9]{2,3}` extension? You may want to add `$` to the end of the pattern. If your goal is to remove URLs passed in as a parameter, you may add the equal sign as the first character of the pattern and replace matches with "". The goal of this example is to get rid of any domain name regardless of the form it appears in, i.e. to ensure URL parameters don't include domain names, to avoid XSS attacks.

User · Answer

    #!/usr/bin/perl -w
    use strict;

    my $url = $ARGV[0];
    if ($url =~ /…/g) {
        print $3;
    }

User · Answer

    /^(?:https?:\/\/)?(?:www\.)?([^\/]+)/i
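The same kind of pattern in Python, assuming `^(?:https?://)?(?:www\.)?([^/]+)` with case-insensitive matching; `SCHEME_WWW_RE` and `host_without_www` are hypothetical names. Group 1 is everything after the optional scheme and `www.` up to the first `/`.

```python
import re

# Optional "http://" or "https://", optional "www.", then everything up to the first "/".
SCHEME_WWW_RE = re.compile(r"^(?:https?://)?(?:www\.)?([^/]+)", re.IGNORECASE)


def host_without_www(url: str):
    m = SCHEME_WWW_RE.match(url)
    return m.group(1) if m else None
```

Since only `/` ends the capture, a port stays attached (`example.com:8080`), and a query string directly after the host (`example.com?q=1`) would be captured too.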

User · Answer

There are two ways.

Using split, then just parse that string:

    var domain;
    // find & remove protocol (http, ftp, etc.) and get domain
    if (url.indexOf("://") > -1) {
        domain = url.split('/')[2];
    } else if (url.indexOf("//") === 0) {
        domain = url.split('/')[2];
    } else {
        domain = url.split('/')[0];
    }

    // find & remove port number
    domain = domain.split(':')[0];

Using regex:

    var r = /:\/\/(.[^/]+)/;
    "http://stackoverflow.com/questions/5343288/get-url".match(r)[1]
    // => stackoverflow.com

Hope this helps.
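A Python port of the split-based approach, assuming the same three cases (scheme present, protocol-relative `//host`, bare host); `domain_by_split` is a hypothetical name.

```python
def domain_by_split(url: str) -> str:
    """Return the host part of a URL string by splitting on '/' and ':'."""
    if "://" in url or url.startswith("//"):
        # "http://host/path".split("/") gives ["http:", "", "host", "path"]
        domain = url.split("/")[2]
    else:
        domain = url.split("/")[0]
    # Remove a ":port" suffix
    return domain.split(":")[0]
```

Like the JavaScript version, this doesn't handle credentials: `http://user:pw@host.com/` yields `user`.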

User · Answer

I once had to write such a regex for a company I worked for. The solution was this:

- Get a list of every ccTLD and gTLD available. Your first stop should be IANA. The list from Mozilla looks great at first sight, but lacks ac.uk, for example, so for this it is not really usable.
- Join the list like the example below. A warning: ordering is important! If org.uk appeared after uk, then example.org.uk would match org instead of example.

Example regex:

    ([^\.]+)\.(com|net|org|info|coop|int|co\.uk|org\.uk|ac\.uk|uk|and so on)$

This worked really well and also matched weird, unofficial top levels like de.com and friends.

The upside:

- Very fast if the regex is optimally ordered.

The downside of this solution is, of course:

- A handwritten regex which has to be updated manually if ccTLDs change or get added. Tedious job!
- A very large regex, so not very readable.
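A sketch of generating such a pattern from a list rather than writing it by hand, assuming a small sample list; `build_pattern` and `name_and_suffix` are hypothetical names. Multi-label suffixes are placed first in the alternation, following the answer's ordering advice, and `re.escape` takes care of the dots.

```python
import re

# A small sample; the answer recommends starting from IANA's full list.
SUFFIXES = ["com", "net", "org", "info", "coop", "int", "uk", "co.uk", "org.uk", "ac.uk", "de.com"]


def build_pattern(suffixes):
    # Multi-label suffixes first, so "org.uk" is tried before "org" or "uk".
    ordered = sorted(suffixes, key=lambda s: s.count("."), reverse=True)
    alternation = "|".join(re.escape(s) for s in ordered)
    # Group 1: the label just before the suffix; group 2: the suffix itself.
    return re.compile(r"([^.]+)\.(" + alternation + r")$")


PATTERN = build_pattern(SUFFIXES)


def name_and_suffix(host: str):
    m = PATTERN.search(host)
    return (m.group(1), m.group(2)) if m else None
```

Regenerating the pattern whenever the list changes removes the manual-update chore, though the resulting regex is still large.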

User · Answer

Just for knowledge: a single `replace()` call, with a regex built from `[a-z]{3}`, `[0-9]`, `\w` and `[a-zA-Z]{2,3}` pieces, turns `'http://api.livreto.co/books'` into `livreto.co`.

User · Answer

It is not possible without a TLD list to compare against, as there are many cases like http://www.db.de/ or http://bbc.co.uk/ that a regex will interpret as the domains db.de (correct) and co.uk (wrong).

But even with such a list you won't succeed unless it contains SLDs, too. URLs like http://big.uk.com/ and http://www.uk.com/ would both be interpreted as uk.com (the first domain is big.uk.com).

Because of that, all browsers use Mozilla's Public Suffix List:

https://en.wikipedia.org/wiki/Public_Suffix_List

You can use it in your code by importing it through this URL:

http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1

Feel free to extend my function to extract the domain name only. It doesn't use regex and it is fast:

http://www.programmierer-forum.de/domainnamen-ermitteln-t244185.htm#3471878
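A minimal sketch of the Public Suffix List matching rules in Python (normal rules, `*.` wildcards, `!` exceptions and the implicit `*` default), not the linked function. `SAMPLE_RULES` is a hand-written sample in the list's file format, and `parse_rules`/`registrable_domain` are hypothetical names; real code would load the full `effective_tld_names.dat`.

```python
# Hand-written sample in the Public Suffix List file format.
SAMPLE_RULES = """
// comments start with two slashes
com
uk
co.uk
uk.com
*.ck
!www.ck
"""


def parse_rules(text: str) -> set:
    rules = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("//"):
            rules.add(line.split()[0].lower())  # a rule ends at the first whitespace
    return rules


def registrable_domain(host: str, rules: set):
    labels = host.lower().strip(".").split(".")
    suffix_len = 1  # default rule "*": an unlisted TLD is a one-label suffix
    for i in range(len(labels)):  # longest candidate first
        candidate = labels[i:]
        name = ".".join(candidate)
        if "!" + name in rules:
            # Exception rules win outright; the suffix is the rule minus its first label.
            suffix_len = len(candidate) - 1
            break
        wildcard = ".".join(["*"] + candidate[1:])
        if name in rules or wildcard in rules:
            suffix_len = max(suffix_len, len(candidate))
    if suffix_len >= len(labels):
        return None  # the host is itself a public suffix
    return ".".join(labels[-(suffix_len + 1):])


rules = parse_rules(SAMPLE_RULES)
print(registrable_domain("big.uk.com", rules))  # big.uk.com
```

Because `uk.com` is in the sample, `big.uk.com` is kept whole, the SLD case described above; without that entry the result would be `uk.com`.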
