Get domain name from given url

Question

Given a URL, I want to extract the domain name (it should not include the 'www' part). The URL can contain http or https. Here is the Java code that I wrote. Though it seems to work fine, is there any better approach, or are there some edge cases that could fail?

    public static String getDomainName(String url) throws MalformedURLException {
        if (!url.startsWith("http") && !url.startsWith("https")) {
            url = "http://" + url;
        }
        URL netUrl = new URL(url);
        String host = netUrl.getHost();
        if (host.startsWith("www")) {
            host = host.substring("www".length() + 1);
        }
        return host;
    }

Input: http://google.com/blah

Output: google.com

User · Answer

    private static final String hostExtractorRegexString =
        "(?:https?://)?(?:www\\.)?(.+\\.)(com|au\\.uk|co\\.in|be|in|uk|org\\.in|org|net|edu|gov|mil)";

    private static final Pattern hostExtractorRegexPattern = Pattern.compile(hostExtractorRegexString);

    public static String getDomainName(String url) {
        if (url == null) return null;
        url = url.trim();
        Matcher m = hostExtractorRegexPattern.matcher(url);
        if (m.find() && m.groupCount() == 2) {
            return m.group(1) + m.group(2);
        }
        return null;
    }

Explanation: The regex has 4 groups. The first two are non-matching groups and the next two are matching groups.

The first non-matching group is "http" or "https" or "".

The second non-matching group is "www." or "".

The second matching group is the top-level domain.

The first matching group is anything after the non-matching groups and anything before the top-level domain.

The concatenation of the two matching groups gives us the domain/host name.

PS: Note that you can add any number of supported domains to the regex.
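To see how the two capturing groups behave in practice, here is a minimal sketch that runs the pattern against a few inputs. The class name RegexDomainDemo is made up for illustration. Because `(.+\.)` is greedy, a path that happens to end in one of the listed suffixes gets swept into the result, so this approach is probably best kept for inputs that are known to be bare hosts or simple URLs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDomainDemo {
    // Same pattern as in the answer: two non-capturing groups, then two capturing groups.
    private static final Pattern HOST_PATTERN = Pattern.compile(
        "(?:https?://)?(?:www\\.)?(.+\\.)(com|au\\.uk|co\\.in|be|in|uk|org\\.in|org|net|edu|gov|mil)");

    public static String getDomainName(String url) {
        if (url == null) return null;
        Matcher m = HOST_PATTERN.matcher(url.trim());
        // group(1) is everything up to and including the last usable dot, group(2) the suffix.
        return m.find() ? m.group(1) + m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(getDomainName("http://www.google.com/blah"));      // google.com
        System.out.println(getDomainName("https://example.co.in/path"));      // example.co.in
        System.out.println(getDomainName("http://localhost/"));               // null (no dot, no suffix)
        // Greedy matching pulls the path in when it ends with a listed suffix:
        System.out.println(getDomainName("http://www.google.com/index.net")); // google.com/index.net
    }
}
```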

User · Answer

In my case I only needed the main domain and not the subdomain (no "www" or whatever the subdomain is):

    public static String getUrlDomain(String url) throws URISyntaxException {
        URI uri = new URI(url);
        String domain = uri.getHost();
        String[] domainArray = domain.split("\\.");
        if (domainArray.length == 1) {
            return domainArray[0];
        }
        return domainArray[domainArray.length - 2] + "." + domainArray[domainArray.length - 1];
    }

With this method the URL "https://rest.webtoapp.io/llSlider?lg=en&t=8" will have "webtoapp.io" as its domain.

User · Answer

There is a similar question: Extract main domain name from a given url. If you take a look at this answer, you will see that it is very easy. You just need to use java.net.URL and the String utility split.

User · Answer

To get the actual domain name, without the subdomain, I use:

    private String getDomainName(String url) throws URISyntaxException {
        String hostName = new URI(url).getHost();
        if (!hostName.contains(".")) {
            return hostName;
        }
        String[] host = hostName.split("\\.");
        return host[host.length - 2];
    }

Note that this won't work with second-level domains (like .co.uk).
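The .co.uk limitation can be worked around, roughly, by checking the last two labels against a list of known multi-part suffixes. The sketch below assumes a tiny hand-written set (TWO_LABEL_SUFFIXES, a name invented here, and deliberately incomplete); a real application would more likely rely on the Public Suffix List, for example through Guava's InternetDomainName as another answer here suggests.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SuffixAwareDomain {
    // Hypothetical, incomplete list of two-label public suffixes (illustration only).
    private static final Set<String> TWO_LABEL_SUFFIXES =
        new HashSet<>(Arrays.asList("co.uk", "org.uk", "com.au", "co.in"));

    public static String getRegistrableDomain(String url) throws URISyntaxException {
        String host = new URI(url).getHost();
        if (host == null) return null;            // no authority, e.g. the scheme is missing
        String[] labels = host.split("\\.");
        if (labels.length <= 2) return host;      // "localhost" or "example.com"
        String lastTwo = labels[labels.length - 2] + "." + labels[labels.length - 1];
        if (TWO_LABEL_SUFFIXES.contains(lastTwo)) {
            // The suffix spans two labels, so keep one more label in front of it.
            return labels[labels.length - 3] + "." + lastTwo;
        }
        return lastTwo;
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(getRegistrableDomain("http://www.bbc.co.uk/news"));  // bbc.co.uk
        System.out.println(getRegistrableDomain("https://rest.webtoapp.io/x")); // webtoapp.io
        System.out.println(getRegistrableDomain("http://localhost:8080/"));     // localhost
    }
}
```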

User · Answer

All the above are good. This one seems really simple to me and easy to understand. Excuse the quotes; I wrote it for Groovy inside a class called DataCenter.

    static String extractDomainName(String url) {
        int start = url.indexOf('://')
        if (start < 0) {
            start = 0
        } else {
            start += 3
        }
        int end = url.indexOf('/', start)
        if (end < 0) {
            end = url.length()
        }
        String domainName = url.substring(start, end)

        int port = domainName.indexOf(':')
        if (port >= 0) {
            domainName = domainName.substring(0, port)
        }
        domainName
    }

And here are some JUnit 4 tests:

    @Test
    void shouldFindDomainName() {
        assert DataCenter.extractDomainName('http://example.com/path/') == 'example.com'
        assert DataCenter.extractDomainName('http://subpart.example.com/path/') == 'subpart.example.com'
        assert DataCenter.extractDomainName('http://example.com') == 'example.com'
        assert DataCenter.extractDomainName('http://example.com:18445/path/') == 'example.com'
        assert DataCenter.extractDomainName('example.com/path/') == 'example.com'
        assert DataCenter.extractDomainName('example.com') == 'example.com'
    }

User · Answer

Try this one, using java.net.URL:

    JOptionPane.showMessageDialog(null, getDomainName(
        new URL("https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains")));

    public String getDomainName(URL url) {
        String strDomain;
        String[] strhost = url.getHost().split(Pattern.quote("."));
        String[] strTLD = {"com", "org", "net", "int", "edu", "gov", "mil", "arpa"};

        if (Arrays.asList(strTLD).indexOf(strhost[strhost.length - 1]) >= 0)
            strDomain = strhost[strhost.length - 2] + "." + strhost[strhost.length - 1];
        else if (strhost.length > 2)
            strDomain = strhost[strhost.length - 3] + "." + strhost[strhost.length - 2] + "." + strhost[strhost.length - 1];
        else
            strDomain = strhost[strhost.length - 2] + "." + strhost[strhost.length - 1];
        return strDomain;
    }

User · Answer

Here is a short and simple line using InternetDomainName.topPrivateDomain() in Guava:

    InternetDomainName.from(new URL(url).getHost()).topPrivateDomain().toString()

Given http://www.google.com/blah, that will give you google.com. Or, given http://www.google.co.mx, it will give you google.co.mx.

As Sa Qada commented in another answer on this post, this question has been asked earlier: Extract main domain name from a given url. The best answer to that question is from Satya, who suggests Guava's InternetDomainName.topPrivateDomain().

    public boolean isTopPrivateDomain()

    Indicates whether this domain name is composed of exactly one
    subdomain component followed by a public suffix. For example, returns
    true for google.com and foo.co.uk, but not for www.google.com or
    co.uk.

    Warning: A true result from this method does not imply that the
    domain is at the highest level which is addressable as a host, as many
    public suffixes are also addressable hosts. For example, the domain
    bar.uk.com has a public suffix of uk.com, so it would return true from
    this method. But uk.com is itself an addressable host.

    This method can be used to determine whether a domain is probably the
    highest level for which cookies may be set, though even that depends
    on individual browsers' implementations of cookie controls. See RFC
    2109 for details.

Putting that together with URL.getHost(), which the original post already contains, gives you:

    import com.google.common.net.InternetDomainName;

    import java.net.URL;

    public class DomainNameMain {

        public static void main(final String... args) throws Exception {
            final String urlString = "http://www.google.com/blah";
            final URL url = new URL(urlString);
            final String host = url.getHost();
            final InternetDomainName name = InternetDomainName.from(host).topPrivateDomain();
            System.out.println(urlString);
            System.out.println(host);
            System.out.println(name);
        }
    }

User · Answer

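In Groovy:

    // Declared with def so that hostname stays a callable closure.
    def hostname = { url -> url[(url.indexOf('://') + 3)..-1].split('/')[0] }

    hostname('http://hello.world.com/something') // returns 'hello.world.com'
    hostname('docker://quay.io/skopeo/stable')   // returns 'quay.io'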

User · Answer

If you want to parse a URL, use java.net.URI. java.net.URL has a bunch of problems -- its equals method does a DNS lookup, which means code using it can be vulnerable to denial of service attacks when used with untrusted inputs.

"Mr. Gosling -- why did you make url equals suck?" explains one such problem. Just get in the habit of using java.net.URI instead.

    public static String getDomainName(String url) throws URISyntaxException {
        URI uri = new URI(url);
        String domain = uri.getHost();
        return domain.startsWith("www.") ? domain.substring(4) : domain;
    }

should do what you want.

    Though It seems to work fine, is there any better approach or are there some edge cases, that could fail.

Your code as written fails for the valid URLs:

- httpfoo/bar -- relative URL with a path component that starts with http.
- HTTP://example.com/ -- protocol is case-insensitive.
- //example.com/ -- protocol-relative URL with a host.
- www/foo -- a relative URL with a path component that starts with www.
- wwwexample.com -- domain name that does not start with www. but starts with www.

Hierarchical URLs have a complex grammar. If you try to roll your own parser without carefully reading RFC 3986, you will probably get it wrong. Just use the one that's built into the core libraries.

If you really need to deal with messy inputs that java.net.URI rejects, see RFC 3986 Appendix B:

    Appendix B. Parsing a URI Reference with a Regular Expression

    As the "first-match-wins" algorithm is identical to the "greedy"
    disambiguation method used by POSIX regular expressions, it is
    natural and commonplace to use a regular expression for parsing the
    potential five components of a URI reference.

    The following line is the regular expression for breaking-down a
    well-formed URI reference into its components.

      ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
       12            3  4          5       6  7        8 9

    The numbers in the second line above are only to assist readability;
    they indicate the reference points for each subexpression (i.e., each
    paired parenthesis).
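As a rough illustration of how the Appendix B expression could be used from Java, the sketch below compiles it and reads subexpression 4, the authority component. The class and method names are invented for this example. Note that the authority may still carry user info and a port, so it is not yet a bare host name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Rfc3986Split {
    // The regular expression from RFC 3986, Appendix B, escaped for a Java string.
    private static final Pattern URI_REFERENCE = Pattern.compile(
        "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    /** Returns the authority component (group 4), or null if the reference has none. */
    public static String authorityOf(String uriReference) {
        Matcher m = URI_REFERENCE.matcher(uriReference);
        return m.matches() ? m.group(4) : null;
    }

    public static void main(String[] args) {
        System.out.println(authorityOf("http://www.ics.uci.edu/pub/ietf/uri/#Related")); // www.ics.uci.edu
        System.out.println(authorityOf("//example.com/"));            // example.com
        System.out.println(authorityOf("http://example.com:8080/a")); // example.com:8080
        System.out.println(authorityOf("httpfoo/bar"));               // null (a relative path, no authority)
    }
}
```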

User · Answer

    import java.net.*;
    import java.io.*;

    public class ParseURL {
        public static void main(String[] args) throws Exception {

            URL aURL = new URL("http://example.com:80/docs/books/tutorial"
                               + "/index.html?name=networking#DOWNLOADING");

            System.out.println("protocol = " + aURL.getProtocol());   // http
            System.out.println("authority = " + aURL.getAuthority()); // example.com:80
            System.out.println("host = " + aURL.getHost());           // example.com
            System.out.println("port = " + aURL.getPort());           // 80
            System.out.println("path = " + aURL.getPath());           // /docs/books/tutorial/index.html
            System.out.println("query = " + aURL.getQuery());         // name=networking
            System.out.println("filename = " + aURL.getFile());       // /docs/books/tutorial/index.html?name=networking
            System.out.println("ref = " + aURL.getRef());             // DOWNLOADING
        }
    }

User · Answer

One approach that worked for all of my cases is to use the Guava library and a regex in combination:

    public static String getDomainNameWithGuava(String url) throws MalformedURLException,
            URISyntaxException {
        String host = new URL(url).getHost();
        String domainName = "";
        try {
            domainName = InternetDomainName.from(host).topPrivateDomain().toString();
        } catch (IllegalStateException | IllegalArgumentException e) {
            domainName = getDomain(url, true);
        }
        return domainName;
    }

getDomain() can be any common method with regex.

User · Answer

I wrote a method (see below) which extracts a URL's domain name and which uses simple String matching. What it actually does is extract the bit between the first "://" (or index 0 if there's no "://" contained) and the first subsequent "/" (or index String.length() if there's no subsequent "/"). The remaining, preceding "www(_)*." bit is chopped off. I'm sure there'll be cases where this won't be good enough, but it should be good enough in most cases!

Mike Samuel's post above says that the java.net.URI class could do this (and was preferred to the java.net.URL class), but I encountered problems with the URI class. Notably, URI.getHost() gives a null value if the URL does not include the scheme, i.e. the "http(s)" bit.

    /**
     * Extracts the domain name from {@code url}
     * by means of String manipulation
     * rather than using the {@link URI} or {@link URL} class.
     *
     * @param url is non-null.
     * @return the domain name within {@code url}.
     */
    public String getUrlDomainName(String url) {
        String domainName = new String(url);

        int index = domainName.indexOf("://");

        if (index != -1) {
            // keep everything after the "://"
            domainName = domainName.substring(index + 3);
        }

        index = domainName.indexOf('/');

        if (index != -1) {
            // keep everything before the '/'
            domainName = domainName.substring(0, index);
        }

        // check for and remove a preceding 'www'
        // followed by any sequence of characters (non-greedy)
        // followed by a '.'
        // from the beginning of the string
        domainName = domainName.replaceFirst("^www.*?\\.", "");

        return domainName;
    }
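The null result from URI.getHost() described in this answer can also be handled without giving up java.net.URI. The following is only a sketch, under the assumption that a scheme-less input such as example.com/path should be treated as an http URL: it retries with "http://" prepended when no host is found. The class and method names are invented for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class LenientHost {
    public static String hostOf(String input) {
        try {
            String host = new URI(input).getHost();
            if (host == null && !input.contains("://")) {
                // Assumption: a missing scheme means the user meant an http URL.
                host = new URI("http://" + input).getHost();
            }
            if (host == null) return null;
            // Strip only a literal "www." label, not any host that merely starts with "www".
            return host.startsWith("www.") ? host.substring(4) : host;
        } catch (URISyntaxException e) {
            return null; // input that java.net.URI rejects outright
        }
    }

    public static void main(String[] args) {
        System.out.println(hostOf("http://www.example.com/path")); // example.com
        System.out.println(hostOf("example.com/path"));            // example.com
        System.out.println(hostOf("www.example.com"));             // example.com
        System.out.println(hostOf("wwwexample.com"));              // wwwexample.com
    }
}
```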

User · Answer

If the input URL is user input, this method gives the most appropriate host name. If none is found, it gives back the input URL.

    private String getHostName(String urlInput) {
        urlInput = urlInput.toLowerCase();
        String hostName = urlInput;
        if (!urlInput.equals("")) {
            if (urlInput.startsWith("http") || urlInput.startsWith("https")) {
                try {
                    URL netUrl = new URL(urlInput);
                    String host = netUrl.getHost();
                    if (host.startsWith("www")) {
                        hostName = host.substring("www".length() + 1);
                    } else {
                        hostName = host;
                    }
                } catch (MalformedURLException e) {
                    hostName = urlInput;
                }
            } else if (urlInput.startsWith("www")) {
                hostName = urlInput.substring("www".length() + 1);
            }
            return hostName;
        } else {
            return "";
        }
    }

User · Answer

    val host = url.split("/")[2]
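A Java rendering of the same idea shows what the split actually produces. It is only a sketch (the class and method names are made up): the element at index 2 is the whole authority, so a port stays attached, and an input without a scheme has no third element at all.

```java
public class SplitHostDemo {
    /** Returns the third "/"-separated piece, or null if the input has fewer than three. */
    public static String thirdPiece(String url) {
        String[] parts = url.split("/");
        return parts.length > 2 ? parts[2] : null;
    }

    public static void main(String[] args) {
        // "http://example.com/path" splits into ["http:", "", "example.com", "path"]
        System.out.println(thirdPiece("http://example.com/path"));      // example.com
        System.out.println(thirdPiece("http://example.com:8080/path")); // example.com:8080
        System.out.println(thirdPiece("example.com/path"));             // null
    }
}
```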

User · Answer

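I added a small preprocessing step before creating the URI object:

    if (url.startsWith("http:/")) {
        if (!url.contains("http://")) {
            // repair a single-slash scheme separator
            url = url.replaceAll("http:/", "http://");
        }
    } else {
        url = "http://" + url;
    }
    URI uri = new URI(url);
    String domain = uri.getHost();
    return domain.startsWith("www.") ? domain.substring(4) : domain;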
