Detect URLs in text with JavaScript

Question

Does anyone have suggestions for detecting URLs in a set of strings?

    arrayOfStrings.forEach(function(string){
      // detect URLs in strings and do something swell,
      // like creating elements with links.
    });

Update: I wound up using this regex for link detection (apparently several years later).

    kLINK_DETECTION_REGEX = /(([a-z]+:\/\/)?(([a-z0-9\-]+\.)+([a-z]{2}|aero|arpa|biz|com|coop|edu|gov|info|int|jobs|mil|museum|name|nato|net|org|pro|travel|local|internal))(:[0-9]{1,5})?(\/[a-z0-9_\-\.~]+)*(\/([a-z0-9_\-\.]*)(\?[a-z0-9+_\-\.%=&amp;]*)?)?(#[a-zA-Z0-9!$&'()*+.=-_~:@/?]*)?)(\s+|$)/gi

The full helper (with optional Handlebars support) is at gist #1654670.

User · Answer

Generic Object Oriented Solution

For people like me who use frameworks like Angular that don't allow manipulating the DOM directly, I created a function that takes a string and returns an array of url/plainText objects that can be used to create any UI representation you want.

URL regex

For URL matching I used a (slightly adapted) version of h0mayun's regex: /(?:(?:https?:\/\/)|(?:www\.))[^\s]+/g

My function also drops punctuation characters from the end of a URL, like . and , which I believe will more often be actual punctuation than a legitimate URL ending (but it could be! This is not rigorous science, as other answers explain well). For that I apply the following regex to matched URLs: /^(.+?)([.,?!'"]*)$/

Typescript code

    export function urlMatcherInText(inputString: string): UrlMatcherResult[] {
        if (!inputString) return [];

        const results: UrlMatcherResult[] = [];

        function addText(text: string) {
            if (!text) return;
            const result = new UrlMatcherResult();
            result.type = 'text';
            result.value = text;
            results.push(result);
        }

        function addUrl(url: string) {
            if (!url) return;
            const result = new UrlMatcherResult();
            result.type = 'url';
            result.value = url;
            results.push(result);
        }

        const findUrlRegex = /(?:(?:https?:\/\/)|(?:www\.))[^\s]+/g;
        const cleanUrlRegex = /^(.+?)([.,?!'"]*)$/;

        // exec() returns null once there are no more matches
        let match: RegExpExecArray | null;
        let indexOfStartOfString = 0;

        do {
            match = findUrlRegex.exec(inputString);

            if (match) {
                // plain text between the previous URL and this one
                const text = inputString.substr(indexOfStartOfString, match.index - indexOfStartOfString);
                addText(text);

                // split the raw match into the URL and any trailing punctuation
                const dirtyUrl = match[0];
                const urlDirtyMatch = cleanUrlRegex.exec(dirtyUrl);
                addUrl(urlDirtyMatch[1]);
                addText(urlDirtyMatch[2]);

                indexOfStartOfString = match.index + dirtyUrl.length;
            }
        }
        while (match);

        const remainingText = inputString.substr(indexOfStartOfString, inputString.length - indexOfStartOfString);
        addText(remainingText);

        return results;
    }

    export class UrlMatcherResult {
        public type: 'url' | 'text'
        public value: string
    }
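For anyone who wants the same idea in plain JavaScript, here is a minimal sketch of the tokenizing approach described above. It uses the same two regexes, but `tokenizeUrls` and its plain `{ type, value }` objects are names made up for this illustration, not part of the answer's code, and it relies on `String.prototype.matchAll` instead of a manual `exec` loop.

```javascript
// Sketch only: tokenizeUrls is a hypothetical name, not from the answer above.
const FIND_URL = /(?:https?:\/\/|www\.)[^\s]+/g;
const TRAILING_PUNCTUATION = /^(.+?)([.,?!'"]*)$/;

function tokenizeUrls(input) {
  const tokens = [];
  // Skip empty strings so the result contains only meaningful pieces.
  const push = (type, value) => {
    if (value) tokens.push({ type, value });
  };

  let cursor = 0;
  for (const match of input.matchAll(FIND_URL)) {
    // Plain text between the previous URL and this one.
    push('text', input.slice(cursor, match.index));

    // Separate the URL from punctuation stuck to its end.
    const [, url, punctuation] = TRAILING_PUNCTUATION.exec(match[0]);
    push('url', url);
    push('text', punctuation);

    cursor = match.index + match[0].length;
  }
  // Whatever is left after the last URL.
  push('text', input.slice(cursor));
  return tokens;
}
```

Each token can then be mapped to whatever the UI framework needs (a link component for `url`, a text node for `text`), which keeps the detection logic independent of the DOM.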

User · Answer

    let str = 'https://example.com is a great site'
    str.replace(/(https?:\/\/[^\s]+)/g, "<a href='$1' target='_blank'>$1</a>")

Short Code Big Work!

Result:

    <a href="https://example.com" target="_blank">https://example.com</a>

User · Answer

The function can be further improved to render images as well:

    function renderHTML(text) {
        var rawText = strip(text)
        var urlRegex = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;

        return rawText.replace(urlRegex, function(url) {
            if ( ( url.indexOf(".jpg") > 0 ) || ( url.indexOf(".png") > 0 ) || ( url.indexOf(".gif") > 0 ) ) {
                return '<img src="' + url + '">' + '<br/>'
            } else {
                return '<a href="' + url + '">' + url + '</a>' + '<br/>'
            }
        })
    }

or, for a thumbnail image that links to the full-size image:

    return '<a href="' + url + '"><img style="width: 100px; border: 0px; -moz-border-radius: 5px; border-radius: 5px;" src="' + url + '">' + '</a>' + '<br/>'

And here is the strip() function that pre-processes the text string for uniformity by removing any existing HTML:

    function strip(html) {
        var tmp = document.createElement("DIV");
        tmp.innerHTML = html;
        var urlRegex = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
        return tmp.innerText.replace(urlRegex, function(url) {
            return '\n' + url
        })
    }
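One thing to watch with the `indexOf` checks above: they match ".jpg", ".png" or ".gif" anywhere in the URL, so something like `photo.png.html` would be rendered as an image. A possible tightening, shown here only as a sketch, is to test for the extension at the end of the URL (allowing a query string or fragment after it). `renderLink` and `IMAGE_EXTENSION` are names invented for this example.

```javascript
// Sketch only: renderLink and IMAGE_EXTENSION are hypothetical names.
// Matches .jpg/.jpeg/.png/.gif at the end of the URL,
// optionally followed by a ?query or #fragment.
const IMAGE_EXTENSION = /\.(?:jpe?g|png|gif)(?:[?#].*)?$/i;

function renderLink(url) {
  return IMAGE_EXTENSION.test(url)
    ? '<img src="' + url + '">'
    : '<a href="' + url + '">' + url + '</a>';
}
```

This is still a guess based on the URL text alone; a URL without an extension can serve an image, and one with an extension can serve something else.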

User · Answer

There is an existing npm package, url-regex. Just install it with yarn add url-regex or npm install url-regex and use it as follows:

    const urlRegex = require('url-regex');

    const replaced = 'Find me at http://www.example.com and also at http://stackoverflow.com or at google.com'
      .replace(urlRegex({strict: false}), function(url) {
         return '<a href="' + url + '">' + url + '</a>';
      });

User · Answer

You can use a regex like this to extract normal URL patterns:

    (https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})

If you need more sophisticated patterns, use a library like this:

https://www.npmjs.com/package/pattern-dreamer
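To show how a pattern like the one above might be used, here is a small sketch that wraps it in a helper and pulls every match out of a string. `extractUrls` and `URL_PATTERN` are names chosen for this example; the `g` flag is added so that `match` returns all occurrences rather than just the first.

```javascript
// Sketch only: extractUrls is a hypothetical helper around the pattern above.
const URL_PATTERN = /(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})/g;

function extractUrls(text) {
  // String.prototype.match returns null when nothing matches,
  // so fall back to an empty array.
  return text.match(URL_PATTERN) || [];
}

console.log(extractUrls('Docs at https://example.com/path and www.example.org today'));
```

Note that the pattern requires either a scheme or a leading www., so a bare host such as example.com is not picked up.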

User · Answer

I googled this problem for quite a while  then it occurred to me that there is an Android method  android text util Linkify  that utilizes some pretty robust regexes to accomplish this  Luckily  Android is open source   They use a few different patterns for matching different types of urls  You can find them all here  http   grepcode com file repository grepcode com java ext com google android android 2 0 r1 android text util Regex java Regex 0WEB URL PATTERN  If you re just concerned about url s that match the WEB URL PATTERN  that is  urls that conform to the RFC 1738 spec  you can use this         http https Http Https rtsp Rtsp             a-zA-Z0-9   -                        amp           a-fA-F0-9  2    1 64          a-zA-Z0-9   -                        amp           a-fA-F0-9  2    1 25                 a-zA-Z0-9  a-zA-Z0-9 -  0 64           aero arpa asia a cdefgilmnoqrstuwxz      biz b abdefghijmnorstvwyz      cat com coop c acdfghiklmnoruvxyz   d ejkmoz     edu e cegrstu   f ijkmor     gov g abdefghilmnpqrstuwy   h kmnrtu     info int i delmnoqrst      jobs j emop   k eghimnrwyz  l abcikrstuvy     mil mobi museum m acdghklmnopqrstuvwxyz      name net n acefgilopruz      org om     pro p aefghklmnrstwy   qa r eouw  s abcdeghijklmnortuvyz     tel travel t cdfghjklmnoprtvwz   u agkmsyz  v aceginu  w fs  y etu  z amw          25 0-5  2 0-4  0-9   0-1  0-9  2   1-9  0-9   1-9       25 0-5  2 0-4  0-9   0-1  0-9  2   1-9  0-9   1-9  0      25 0-5  2 0-4  0-9   0-1  0-9  2   1-9  0-9   1-9  0      25 0-5  2 0-4  0-9   0-1  0-9  2   1-9  0-9   0-9          d 1 5              a-zA-Z0-9            amp        -                           a-fA-F0-9  2          b    gi    Here is the full text of the source         http https Http Https rtsp Rtsp               a-zA-Z0-9     -                                         amp             a-fA-F0-9  2    1 64           a-zA-Z0-9     -                                         amp             a-fA-F0-9  2    1 25                   
    a-zA-Z0-9  a-zA-Z0-9  -  0 64             named host              plus top level domain       aero arpa asia a cdefgilmnoqrstuwxz           biz b abdefghijmnorstvwyz           cat com coop c acdfghiklmnoruvxyz        d ejkmoz          edu e cegrstu        f ijkmor          gov g abdefghilmnpqrstuwy        h kmnrtu          info int i delmnoqrst           jobs j emop        k eghimnrwyz       l abcikrstuvy          mil mobi museum m acdghklmnopqrstuvwxyz           name net n acefgilopruz           org om          pro p aefghklmnrstwy        qa      r eouw       s abcdeghijklmnortuvyz          tel travel t cdfghjklmnoprtvwz        u agkmsyz       v aceginu       w fs       y etu       z amw               25 0-5  2 0-4      or ip address     0-9   0-1  0-9  2   1-9  0-9   1-9        25 0-5  2 0-4  0-9        0-1  0-9  2   1-9  0-9   1-9  0       25 0-5  2 0-4  0-9   0-1       0-9  2   1-9  0-9   1-9  0       25 0-5  2 0-4  0-9   0-1  0-9  2        1-9  0-9   0-9                 d 1 5         plus option port number               a-zA-Z0-9                  amp                plus option query params      -                                     a-fA-F0-9  2                b        If you want to be really fancy  you can test for email addresses as well  The regex for email addresses is     a-zA-Z0-9              -  1 256     a-zA-Z0-9  a-zA-Z0-9  -  0 64      a-zA-Z0-9  a-zA-Z0-9  -  0 25    gi   PS  The top level domains supported by above regex are current as of June 2007  For an up to date list you ll need to check https   data iana org TLD tlds-alpha-by-domain txt

User · Answer

First you need a good regex that matches URLs. This is hard to do. See here, here and here:

    ...almost anything is a valid URL. There are some punctuation rules for splitting it up. Absent any punctuation, you still have a valid URL.

    Check the RFC carefully and see if you can construct an "invalid" URL. The rules are very flexible.

    For example ::::: is a valid URL. The path is ":::::". A pretty stupid filename, but a valid filename.

    Also, ///// is a valid URL. The netloc ("hostname") is "". The path is "///". Again, stupid. Also valid. This URL normalizes to "///" which is the equivalent.

    Something like "bad://///worse/////" is perfectly valid. Dumb but valid.

Anyway, this answer is not meant to give you the best regex but rather a proof of how to do the string wrapping inside the text, with JavaScript.

OK, so let's just use this one: /(https?:\/\/[^\s]+)/g

Again, this is a bad regex. It will have many false positives. However, it's good enough for this example.

    function urlify(text) {
      var urlRegex = /(https?:\/\/[^\s]+)/g;
      return text.replace(urlRegex, function(url) {
        return '<a href="' + url + '">' + url + '</a>';
      })
      // or alternatively
      // return text.replace(urlRegex, '<a href="$1">$1</a>')
    }

    var text = 'Find me at http://www.example.com and also at http://stackoverflow.com';
    var html = urlify(text);

    console.log(html)

    // html now looks like:
    // "Find me at <a href="http://www.example.com">http://www.example.com</a> and also at <a href="http://stackoverflow.com">http://stackoverflow.com</a>"

So in sum try:

    $$('#pad dl dd').each(function(element) {
        element.innerHTML = urlify(element.innerHTML);
    });
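If the text comes from users and is not already HTML, it is probably worth escaping it before wrapping the links, otherwise any markup in the input ends up in the page as-is. The following is a minimal sketch under that assumption; `escapeHtml` and `urlifySafely` are helper names made up for this example, and the regex is the same deliberately simple one used above.

```javascript
// Sketch only: escapeHtml and urlifySafely are hypothetical helpers.
function escapeHtml(text) {
  // Escape & first so the entities produced below are not escaped again.
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function urlifySafely(text) {
  var urlRegex = /(https?:\/\/[^\s]+)/g;
  // Escape the whole string, then wrap whatever still looks like a URL.
  return escapeHtml(text).replace(urlRegex, '<a href="$1">$1</a>');
}

console.log(urlifySafely('<b>Hi</b> see http://example.com/?a=1&b=2'));
```

The escaped ampersand inside the href is fine: the browser decodes `&amp;` back to `&` when it reads the attribute.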

User · Answer

try this   function isUrl s        if   isUrl rx url               taken from https   gist github com dperini 729294         isUrl rx url         https  ftp            S      S               10 127       d 1 3   3        169  254 192  168       d 1 3   2     172     1 6-9  2 d 3 0-1        d 1 3   2      1-9  d  1 d d 2 01  d 22 0-3          1  d 1 2  2 0-4  d 25 0-5    2          1-9  d  1 d d 2 0-4  d 25 0-4           a-z u00a1- uffff0-9 -    a-z u00a1- uffff0-9            a-z u00a1- uffff0-9 -    a-z u00a1- uffff0-9             a-z u00a1- uffff  2             d 2 5            S     i             valid prefixes         isUrl prefixes   http         https         ftp         www                taken from https   w3techs com technologies overview top level domain all         isUrl domains   com   ru   net   org   de   jp   uk   br   pl   in   it   fr   au   info   nl   ir   cn   es   cz   kr   ua   ca   eu   biz   za   gr   co   ro   se   tw   mx   vn   tr   ch   hu   at   be   dk   tv   me   ar   no   us   sk   xyz   fi   id   cl   by   nz   il   ie   pt   kz   io   my   lt   hk   cc   sg   edu   pk   su   bg   th   top   lv   hr   pe   club   rs   ae   az   si   ph   pro   ng   tk   ee   asia   mobi               if   isUrl rx url test s   return false      for  let i 0  i lt isUrl prefixes length  i    if  s startsWith isUrl prefixes i    return true      for  let i 0  i lt isUrl domains length  i    if  s endsWith     isUrl domains i      s includes     isUrl domains i          s includes     isUrl domains i        return true      return false     function isEmail s        if   isEmail rx email               taken from http   stackoverflow com a 16016476 460084         var sQtext        x0d  x22  x5c  x80-  xff            var sDtext        x0d  x5b-  x5d  x80-  xff            var sAtom        x00-  x20  x22  x28  x29  x2c  x2e  x3a-  x3c  x3e  x40  x5b-  x5d  x7f-  xff             var sQuotedPair      x5c   x00-  x7f            var sDomainLiteral      x5b     
sDtext         sQuotedPair        x5d           var sQuotedString      x22     sQtext         sQuotedPair        x22           var sDomain ref   sAtom          var sSubDomain         sDomain ref         sDomainLiteral                var sWord         sAtom         sQuotedString                var sDomain   sSubDomain       x2e    sSubDomain                 var sLocalPart   sWord       x2e    sWord                 var sAddrSpec   sLocalPart      x40    sDomain     complete RFC822 email address spec         var sValidEmail         sAddrSpec           as whole string          isEmail rx email   new RegExp sValidEmail              return isEmail rx email test s       will also recognize urls such as  google com   http   www google bla    http   google bla   www google bla but not google bla

User · Answer

If you want to detect links with http:// OR without http:// OR ftp OR other possible cases, like removing trailing punctuation at the end, take a look at this code:

https://jsfiddle.net/AndrewKang/xtfjn8g3/

A simple way to use it is via NPM:

    npm install --save url-knife

User · Answer

Based on Crescent Fresh's answer: if you want to detect links with http:// OR without http:// and starting with www., you can use the following:

    function urlify(text) {
        var urlRegex = /(((https?:\/\/)|(www\.))[^\s]+)/g;
        //var urlRegex = /(https?:\/\/[^\s]+)/g;
        return text.replace(urlRegex, function(url, b, c) {
            var url2 = (c == 'www.') ? 'http://' + url : url;
            return '<a href="' + url2 + '" target="_blank">' + url + '</a>';
        })
    }

User · Answer

Here is what I ended up using as my regex:

    var urlRegex = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;

This doesn't include trailing punctuation in the URL. Crescent's function works like a charm :) so:

    function linkify(text) {
        var urlRegex = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
        return text.replace(urlRegex, function(url) {
            return '<a href="' + url + '">' + url + '</a>';
        });
    }
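To see the trailing-punctuation behaviour in isolation, here is a short sketch that only extracts the matches instead of building links. `findLinks` is a name invented for this example. The point is that the second character class (the one a match has to end on) leaves out `.`, `,`, `;`, `:`, `!` and `?`, so a sentence-ending period or comma is not swallowed into the URL.

```javascript
// Sketch only: findLinks is a hypothetical helper using the regex above.
var urlRegex = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;

function findLinks(text) {
  // match() with a global regex returns every match, or null if there are none.
  return text.match(urlRegex) || [];
}

console.log(findLinks('Visit http://example.com/page. Or ftp://files.example.com, maybe.'));
```

A URL that really does end in one of those characters will lose it, which is the trade-off this regex makes.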

User · Answer

tmp.innerText is undefined. You should use tmp.innerHTML:

    function strip(html) {
        var tmp = document.createElement("DIV");
        tmp.innerHTML = html;
        var urlRegex = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
        return tmp.innerHTML.replace(urlRegex, function(url) {
            return '\n' + url
        })
    }

User · Answer

This library on NPM looks like it is pretty comprehensive: https://www.npmjs.com/package/linkifyjs

    Linkify is a small yet comprehensive JavaScript plugin for finding URLs in plain-text and converting them to HTML links. It works with all valid URLs and email addresses.
