Regular expression to find URLs within a string

Question

Does anyone know of a regular expression I could use to find URLs within a string? I've found a lot of regular expressions on Google for determining if an entire string is a URL, but I need to be able to search an entire string for URLs.

For example, I would like to be able to find www.google.com and http://yahoo.com in the following string: "Hello www.google.com World http://yahoo.com"

I am not looking for specific URLs in the string. I am looking for ALL of the URLs in the string, which is why I need a regular expression.

User · Answer

It is simple. Use this pattern:

b  ftp https          w-       com net org gov mil int edu info me    d    d    d    d      d        w-        w    w     amp  w-         w-

It matches any link that contains:

Allowed protocols: http, https and ftp
Allowed domains: .com, .net, .org, .gov, .mil, .int, .edu, .info and .me, OR an IP address
Allowed ports: true
Allowed parameters: true
Allowed hashes: true
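The component list above (optional protocol, a domain ending in one of a fixed set of TLDs or an IPv4 address, then optional port, path, query string and hash) can be sketched as a commented pattern. This is only an illustration built from that list, not the answer's own pattern; `URL_PATTERN` and `find_urls` are hypothetical names, and the character sets allowed in the path and query are assumptions.

```python
import re

# Hypothetical sketch: same building blocks as the list above, not the original pattern.
URL_PATTERN = re.compile(
    r"""
    \b
    (?:(?:https?|ftp)://)?                                      # optional protocol
    (?:
        (?:[\w-]+\.)+(?:com|net|org|gov|mil|int|edu|info|me)\b  # domain with an allowed TLD
        |
        \d{1,3}(?:\.\d{1,3}){3}                                 # or a dotted IPv4 address
    )
    (?::\d+)?                                                   # optional port
    (?:/[\w\-./%]*)?                                            # optional path (assumed charset)
    (?:\?[\w\-=&%.]*)?                                          # optional query string
    (?:\#[\w-]*)?                                               # optional hash
    """,
    re.VERBOSE | re.IGNORECASE,
)


def find_urls(text):
    """Return every substring of text that matches URL_PATTERN."""
    return [m.group(0) for m in URL_PATTERN.finditer(text)]


print(find_urls("Hello www.google.com World http://yahoo.com"))
```

Restricting the TLD list keeps false positives down, at the cost of missing every domain whose ending is not in the list.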

User · Answer

I used the regular expression below to find URLs in a string:

/(http|https)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/

User · Answer

The regex provided by @JustinLevene did not have the proper escape sequences on the backslashes. It is updated to now be correct, with an added condition to match the FTP protocol as well. It will match all URLs with or without protocols, and with or without "www."

Code:     http ftp https           w -            w -         w        amp       -    w      amp      -

Example: https://regex101.com/r/uQ9aL4/65

User · Answer

I used this    https          a-zA-z0-9        a-zA-z0-9        a-zA-z0-9        -

User · Answer

import re

text = """The link of this question: https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string
Also there are some urls: www.google.com, facebook.com, http://test.com/method?param=wasd, http://test.com/method?param=wasd&params2=kjhdkjshd
The code below catches all urls in text and returns urls in list."""

urls = re.findall(r'(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-&?=%.]+', text)
print(urls)

Output:

['https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string', 'www.google.com', 'facebook.com', 'http://test.com/method?param=wasd', 'http://test.com/method?param=wasd&params2=kjhdkjshd']
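One detail worth knowing when using `re.findall` this way: if the pattern contains a capturing group, `findall` returns the group contents rather than the whole match. The sketch below shows the difference with a deliberately simplified pattern (an assumption for illustration, not one taken from this thread).

```python
import re

text = "Hello www.google.com World http://yahoo.com"

# Assumed, simplified pattern with ONE capturing group around the scheme.
grouped = r"(https?://)?[\w-]+(?:\.[\w-]+)+"

# findall returns what the group captured, not the URLs themselves.
groups_only = re.findall(grouped, text)
print(groups_only)

# finditer plus group(0) always yields the full match.
urls = [m.group(0) for m in re.finditer(grouped, text)]
print(urls)
```

Writing the group as non-capturing, `(?:...)`, is the other way to get full matches back from `findall`.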

User · Answer

Wrote one up myself: let regex =      w              w d-        w-         w           amp         w-        gm

It works on ALL of the following domains: https   www facebook com https   app-1 number123 com http   facebook com ftp   facebook com http   localhost 3000 localhost 3000  unitedkingdomurl co uk this is a url com its still going wow shop facebook org app number123 com app1 number123 com app-1 numbEr123 com app dashes-dash com www facebook com facebook com fb com hello 123 fb com hel-lo fb com hello goodbye fb com hello goodbye okay fb com hello goodbye okay alright Hello www google com World http   yahoo com https   www google com tr admin subPage qs1 sss1 amp qs2 sss2 amp qs3 sss3 Services https   google com tr test subPage qs1 sss1 amp qs2 sss2 amp qs3 sss3 Services http   google com test subPage qs1 sss1 amp qs2 sss2 amp qs3 sss3 Services ftp   google com test subPage qs1 sss1 amp qs2 sss2 amp qs3 sss3 Services www google com tr test subPage qs1 sss1 amp qs2 sss2 amp qs3 sss3 Services www google com test subPage qs1 sss1 amp qs2 sss2 amp qs3 sss3 Services drive google com test subPage qs1 sss1 amp qs2 sss2 amp qs3 sss3 Services https   www example pl http   www example com www example pl example com http   blog example com http   www example com product http   www example com products id 1 amp page 2 http   www example com up http   255 255 255 255 255 255 255 255 shop facebook org derf html

You can see how it performs here on regex101 and adjust as needed.

User · Answer

I guess no regex is perfect for this use. I found a pretty solid one here:

        https  ftp file       www   ftp         -A-Z0-9  amp                      -A-Z0-9  amp                         -A-Z0-9  amp                      A-Z0-9  amp              igm

Some differences / advantages compared to the other ones posted here:

It does not match email addresses.
It does match localhost:12345.
It won't detect something like moo.com without http or www.

See here for examples.
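The three behaviours listed above follow from one design choice: the match must start with a scheme or with "www." / "ftp.". A minimal sketch of that idea is below; the character classes are an assumption chosen for illustration, and `PREFIXED_URL` is a hypothetical name.

```python
import re

# Assumed sketch: a mandatory prefix keeps bare words and e-mail addresses out.
PREFIXED_URL = re.compile(
    r"(?:(?:https?|ftp|file)://|www\.|ftp\.)"   # scheme or www./ftp. is required
    r"[-A-Z0-9+&@#/%=~_|$?!:,.]*"               # body characters
    r"[A-Z0-9+&@#/%=~_|$]",                     # last character must not be punctuation
    re.IGNORECASE,
)

sample = "Mail me@example.com, see http://localhost:12345/x. Not moo.com, but www.moo.com!"
found = PREFIXED_URL.findall(sample)
print(found)
```

The separate class for the last character is what stops the sentence-ending "." and "!" from being swallowed into the URL.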

User · Answer

This is the one I use:

(http|ftp|https)://([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?

Works for me, should work for you too.

User · Answer

This is the simplest one, which works fine for me:    http ftp https www          A-Za-z0-9-          a-z

User · Answer

This is a slight improvement on / adjustment to (depending on what you need) Rajeev's answer:

    w -             s   dot   s  A-Z -         A-Z -         amp amp          A-Z -       amp amp         2 6

See here for an example of what it does and does not match. I got rid of the check for "http" etc., as I wanted to catch URLs without this. I added slightly to the regex to catch some obfuscated URLs (i.e. where users use [dot] instead of a "."). Finally, I replaced "\w" with "A-Z", and also "{2,3}", to reduce false positives like v2.0 and "moo.0dd".

Any improvements on this are welcome.
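The two ideas in this answer, undoing "dot" obfuscation and requiring a letters-only ending, can be sketched separately from the full pattern. Everything below is an assumption made for illustration: the bracketed forms recognised by `DOT_WORD`, the 2 to 6 letter ending, and the helper names `deobfuscate` and `find_hosts`.

```python
import re

# Hypothetical: rewrite "example [dot] com" or "example (dot) com" to "example.com".
DOT_WORD = re.compile(r"\s*[\[(]\s*dot\s*[\])]\s*", re.IGNORECASE)

# Letters-only final label of 2 to 6 characters, so "v2.0" and "moo.0dd" are rejected.
HOSTLIKE = re.compile(r"\b[\w-]+(?:\.[A-Z-]+)*\.[A-Z]{2,6}\b", re.IGNORECASE)


def deobfuscate(text):
    """Replace bracketed 'dot' markers with a literal period."""
    return DOT_WORD.sub(".", text)


def find_hosts(text):
    """Find host-like tokens without requiring a scheme."""
    return HOSTLIKE.findall(deobfuscate(text))


print(find_hosts("visit example [dot] com, version v2.0 and moo.0dd"))
```

Normalising first keeps the matching pattern itself simple; the answer instead folds the obfuscated form into the pattern, which avoids a second pass over the text.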

User · Answer

IMPROVED

Detects URLs like these:

https://www.example.pl
http://www.example.com
www.example.pl
example.com
http://blog.example.com
http://www.example.com/product
http://www.example.com/products?id=1&page=2
http://www.example.com#up
http://255.255.255.255
255.255.255.255
http://www.site.com:8008

Regex:        http s           w -         w  -      w -                amp                  gm

User · Answer

I use the logic of finding text between two dots or periods. The regex below works fine with Python:    <

User · Answer

If you have the URL pattern, you should be able to search for it in your string. Just make sure that the pattern doesn't have ^ and $ marking the beginning and end of the URL string. So if P is the pattern for a URL, look for matches for P.
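A short sketch of that advice: take a pattern written to validate a whole string, drop its `^` and `$` anchors, and search with the result. The validator pattern and the helper name `unanchor` are assumptions made up for this example.

```python
import re

# Assumed whole-string validator; the pattern itself is only illustrative.
VALIDATOR = r"^https?://[^\s/]+\.[a-z]{2,}(?:/\S*)?$"


def unanchor(pattern):
    """Drop a leading ^ and a trailing (unescaped) $ so the pattern can match inside text."""
    if pattern.startswith("^"):
        pattern = pattern[1:]
    if pattern.endswith("$") and not pattern.endswith(r"\$"):
        pattern = pattern[:-1]
    return pattern


text = "Hello http://yahoo.com World"

anchored_hit = re.search(VALIDATOR, text)       # anchored: the sentence as a whole is not a URL
found = re.findall(unanchor(VALIDATOR), text)   # unanchored: finds the URL inside the sentence
print(anchored_hit, found)
```

This simple string surgery only handles anchors at the very ends of the pattern; anchors inside alternations would need to be removed by hand.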

User · Answer

Matching a URL in a text should not be so complex:

        ftp http  s        www               n

https://regex101.com/r/wewpP1/2

User · Answer

A probably too simplistic, but working, method might be:

   localhost http https ftp file       w S

I tested it on Python, and as long as the string being parsed contains a space before and after the URL and none inside the URL (which I have never seen before), it should be fine. Here is an online IDE demonstrating it.

However, here are some benefits of using it:

It recognises file: and localhost as well as IP addresses.
It will never match without them.
It does not mind unusual characters such as # or - (see the URL of this post).
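The whitespace-delimited approach can be sketched as "a known prefix followed by any run of non-whitespace". The pattern below is an assumed reading of that idea rather than the answer's exact expression, and the trailing-punctuation cleanup is an addition of this sketch.

```python
import re

# Assumed reading of the approach: known prefix, then everything up to the next space.
SIMPLE = re.compile(r"\b(?:localhost|https?|ftp|file)\S+")

text = "go to https://example.com/a-b_c, or localhost:8080 now"

raw = SIMPLE.findall(text)
# Because \S+ runs to the next space, sentence punctuation sticks to the match;
# stripping it afterwards is an extra step not described in the answer.
cleaned = [url.rstrip(".,;:!?)") for url in raw]
print(raw, cleaned)
```

The same greediness that makes the pattern tolerant of unusual characters is what makes it depend on spaces around each URL, and any word that merely begins with one of the prefixes would also match.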

User · Answer

Here is a slightly more optimized regexp:

         https  ftp file       www   ftp       w -          s   dot   s  A-Z -         A-Z -         amp amp           A-Z -       amp amp          2 6

Here is a test with data: https://regex101.com/r/sFzzpY/6

User · Answer

None of the solutions provided here solved the problems/use-cases I had. What I have provided here is the best I have found/made so far. I will update it when I find new edge-cases that it doesn't handle.

b
Word cannot begin with special characters
<        &  -
Protocols are optional, but take them with us if they are present
<protocol> w 2 10
Domains have to be of a length of 1 chars or greater
w   &    d 1 5     -
The domain ending has to be between 2 to 15 characters
a-z  2 15
If no domain ending we want a port, only if a protocol is specified
protocol       d 1 6
b
Word cannot end with @ (made to catch emails)
We accept any number of slugs, given we have a char after the slash
If we have endings like   fds include the ending
w d   -      &                 w d   -      &
The last char cannot be one of these symbols      - exclude these
<       -

User · Answer

If you have to be strict on selecting links, I would go for:

(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))

For more info, read this: An Improved Liberal, Accurate Regex Pattern for Matching URLs

User · Answer

I think this regex pattern handles precisely what you want:

/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/

And this is a snippet example to extract URLs:

// The Regular Expression filter
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";

// The Text you want to filter for urls
$text = "The text you want https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string to filter goes here.";

// Find every URL in the text; the third argument receives the matches
preg_match_all($reg_exUrl, $text, $matches);
var_dump($matches);

User · Answer

None of the above answers match Unicode characters in a URL, for example: http   google com query d c filan d   search

For the solution, this one should work:

   ftp      www   https        1  a-zA-Z0-9u00a1- uffff0-  2     a-zA-Z0-9u00a1- uffff0-  2    S

User · Answer

Short and simple. I have not tested it in JavaScript code yet, but it looks like it will work:

    http ftp https            w -        w

Code on regex101.com

User · Answer

I have utilized the C# Uri class and it works well with IP addresses and localhost:

public static bool CheckURLIsValid(string url)
{
    Uri returnURL;

    // Accept only absolute URLs whose scheme is http or https
    return (Uri.TryCreate(url, UriKind.Absolute, out returnURL)
        && (returnURL.Scheme == Uri.UriSchemeHttp || returnURL.Scheme == Uri.UriSchemeHttps));
}
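The same "let a URL parser decide" idea can be sketched in Python with the standard library's `urllib.parse.urlsplit`. This is a rough counterpart rather than an exact equivalent of `Uri.TryCreate`, since `urlsplit` is far more lenient; splitting the text on whitespace to get candidates is an assumption of this sketch, and `is_http_url` and `find_http_urls` are hypothetical names.

```python
from urllib.parse import urlsplit


def is_http_url(candidate):
    """Rough counterpart of the C# check: absolute URL with an http or https scheme."""
    try:
        parts = urlsplit(candidate)
    except ValueError:  # raised for some malformed inputs, e.g. a broken IPv6 host
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def find_http_urls(text):
    """Treat each whitespace-separated token as a candidate and keep the valid ones."""
    return [token for token in text.split() if is_http_url(token)]


print(find_http_urls("Hello www.google.com World http://yahoo.com http://localhost:3000"))
```

Like the C# version, this only accepts candidates that carry a scheme, so "www.google.com" is rejected.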

User · Answer

I liked Stefan Henze's solution, but it would pick up 34.56. It's too general, and I have unparsed HTML. There are 4 anchors for a URL: "www.", "http://" (and co), "." followed by letters and then "/", or letters, "." and one of these: https://ftp.isc.org/www/survey/reports/current/bynum.txt

I used lots of info from this thread. Thank you all.

"     http ftp https gopher telnet file localhost           www      xn--   1     w -              w -          w        amp        -     w      amp       -          w -  2 200             w -              w -         w        amp        -     w      amp       -           org com net edu gov mil int arpa biz info unknown one ninja network host coop tech   jp br it cn mx ar nl pl ru tr tw za be uk eg es fi pt th nz cz hu gr dk il sg uy lt ua ie ir ve kz ec rs sk py bg hk eu ee md is my lv gt pk ni by ae kr su vn cy am ke            ttp tp ttps           ww      n--    "

The above solves just about everything except a string like "eurls www google com facebook com http   test com ", which it returns as a single string. Tbh idk why I added gopher etc.

Proof R code:

if T
wierdurl <-vector
wierdurl 1  <- "https   JP     jp dir1     "
wierdurl 2  <- "xn--jp-cd2fp15c xn--fsq jp  "
wierdurl 3  <- "http   52 221 161 242 2018 11 23 biofourmis-collab"
wierdurl 4  <- "https   12000 org   "
wierdurl 5  <- "  https   vg-1 com  page id 1002  "
wierdurl 6  <- "https   3dnews ru 822878"
wierdurl 7  <- "The link of this question  https   stackoverflow com questions 6038061 regular-expression-to-find-urls-within-a-string   Also there are some urls  www google com  facebook com  http   test com method param wasd   The code below catches all urls in text and returns urls in list   "
wierdurl 8  <- "Thelinkofthisquestion https   stackoverflow com questions 6038061 regular-expression-to-find-urls-within-a-string   Alsotherearesomeurls www google com facebook com http   test com method param wasd   Thecodebelowcatchesallurlsintextandreturnsurlsinlist   "
wierdurl 9  <- "Thelinkofthisquestion https   stackoverflow com questions 6038061 regular-expression-to-find-urls-within-a-stringAlsotherearesomeurlsZwww google com facebook com http   test com method param wasdThecodebelowcatchesallurlsintextandreturnsurlsinlist  "
wierdurl 10  <- "1facebook com 1res"
wierdurl 11  <- "1facebook com 1res wat txt"
wierdurl 12  <- "www e  "
wierdurl 13  <- "is this the file txt i need"
wierdurl 14  <- "xn--jp-cd2fp15c xn--fsq jpinspiredby  "
wierdurl 15  <- " xn--jp-cd2fp15c xn--fsq jp inspiredby  "
wierdurl 16  <- "xnto--jpto-cd2fp15c xnto--fsq jpinspiredby  "
wierdurl 17  <- "fsety--fwdvg-gertu56 ffuoiw--ffwsx 3dinspiredby  "
wierdurl 18  <- "   3dnews ru 822878  "
wierdurl 19  <- " http   mywebsite com msn co uk  "
wierdurl 20  <- " 2 0http   www abe hip  "
wierdurl 21  <- "www abe hip"
wierdurl 22  <- "hardware software data"
regexstring <-vector
regexstring 2  <- " http ftp https        w -              w -          w        amp      -     w      amp     -    "
regexstring 3  <- "       https  ftp file         www    ftp           -A-Z0-9  amp                        -A-Z0-9  amp                           -A-Z0-9  amp                        A-Z0-9  amp               igm"
regexstring 4  <- " a-zA-Z0-9 u00A0- uD7FF uF900- uFDCF uFDF0- uFFEF   "
regexstring 5  <- "  http ftp https                w -              w -          w        amp      -     w      amp     -    "
regexstring 6  <- "  http ftp https              w -              w -          w        amp        -     w      amp       -    "
regexstring 7  <- " http ftp https              w -              w -          w        amp      -     w      amp     -    "
regexstring 8  <- "      https  ftp file         www    ftp           -A-Z0-9  amp                      -A-Z0-9  amp                         -A-Z0-9  amp                      A-Z0-9  amp            "
regexstring 10  <- "  http s   ftp                    s          w           w  -           s             w  -     "
regexstring 12  <- "http s       alnum       "
regexstring 9  <- "http s       alnum       "   in DLpages 230
regexstring 1  <- "   alnum  -        alnum              "   in link graphs 50
regexstring 13  <- "    mailto        http https ftp          S       S                1-9   d  1  d  d 2 01   d 22 0-3           1   d 1 2  2 0-4   d 25 0-5    2           0-9   d  1  d  d 2 0-4   d 25 0-4           a-z  u00a1-  uffff0-9  -    a-z  u00a1-  uffff0-9             a-z  u00a1-  uffff0-9  -    a-z  u00a1-  uffff0-9              a-z  u00a1-  uffff  2      localhost       d 2 5                   s      "
regexstring 14  <- "     http ftp https           www      xn--   1     w -              w -          w        amp        -     w      amp       -          w -              w -             org com net edu gov mil int      alpha   2                        w        amp        -     w      amp       -             ttp tp ttps           ww      n--    "
regexstring 15  <- "     http ftp https gopher telnet file localhost           www      xn--   1     w -              w -          w        amp        -     w      amp       -          w -  2 200             w -              w -         w        amp        -     w      amp       -           org com net edu gov mil int arpa biz info unknown one ninja network host coop tech   jp br it cn mx ar nl pl ru tr tw za be uk eg es fi pt th nz cz hu gr dk il sg uy lt ua ie ir ve kz ec rs sk py bg hk eu ee md is my lv gt pk ni by ae kr su vn cy am ke            ttp tp ttps           ww      n--    "
for i in wierdurl   c 7 22
for c in regexstring c 15
print paste i which regexstring  c
print str extract all i c
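The testing method used in this answer, running several candidate patterns over a set of awkward strings and inspecting what each one extracts, is easy to reproduce. The sketch below does it in Python; the patterns and test strings are stand-ins chosen for illustration, not the ones from the R script, and `compare` is a hypothetical helper.

```python
import re

# Illustrative inputs only; not the strings or patterns from the R script.
test_strings = [
    "https://3dnews.ru/822878",
    "is this the file.txt i need",
    "hardware software data",
]
patterns = {
    "scheme_required": r"(?:https?|ftp)://\S+",
    "dotted_token": r"[\w-]+(?:\.[\w-]+)+\S*",
}


def compare(patterns, strings):
    """Return {pattern_name: {string: [matches]}} for side-by-side inspection."""
    return {
        name: {s: re.findall(rx, s) for s in strings}
        for name, rx in patterns.items()
    }


results = compare(patterns, test_strings)
for name, by_string in results.items():
    for s, matches in by_string.items():
        print(name, repr(s), matches)
```

A table like this makes trade-offs visible quickly: here the looser pattern picks up "file.txt" as if it were a URL, while the stricter one needs a scheme to match anything.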

User · Answer

vnc s3 ssh scp sftp ftp http https         w           d 0 5      mailto      w        w

If you want an explanation of each part, try it on regexr.com, where you will get a great explanation of every character.

This is split by an | (or "OR") because not all usable URIs have "//", so this is where you can create a list of schemes, as OR conditions, that you would be interested in matching.
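Building the alternation from a list of schemes keeps it easy to extend. The sketch below follows the structure described above, with one branch for "scheme://host[:port]" and a separate branch for mailto, which has no "//". The host and mailbox character classes are assumptions of this sketch, and `build_uri_pattern` is a hypothetical helper.

```python
import re

# Scheme list taken from the answer; extend or trim it to the URIs you care about.
SCHEMES = ["vnc", "s3", "ssh", "scp", "sftp", "ftp", "http", "https"]


def build_uri_pattern(schemes):
    """Compile an OR of 'scheme://host[:port]' and 'mailto:user@host'."""
    with_slashes = "|".join(re.escape(s) for s in schemes)
    return re.compile(
        rf"\b(?:{with_slashes})://[\w.-]+(?::\d{{1,5}})?"   # scheme://host with optional port
        rf"|\bmailto:[\w.+-]+@[\w-]+(?:\.[\w-]+)+"           # mailto has no "//"
    )


URI_PATTERN = build_uri_pattern(SCHEMES)
found = URI_PATTERN.findall("ssh://host.example.com:22 and mailto:joe@example.com or https://a.io")
print(found)
```

All groups are non-capturing, so `findall` returns the full matched URIs rather than just the scheme.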

User · Answer

I use this regex:

      w       S     w       w  S      s     ig

It works fine for many URLs, like: http://google.com, https://dev-site.io:8080/home?val=1&count=100, www.regexr.com, localhost:8080/path

User · Answer

I found this, which covers most sample links, including subdirectory parts.

Regex is:          https  ftp        b    a-z d               s   <  >            s   <  >            s   <  >                       s   <  >              s   <  >              s                 <  >

User · Answer

This is the best one   NSString  urlRegex   http ftp https www gopher telnet file            w -              w -            w        amp      -     w      amp     -
