Getting title and meta tags from external website

Question

I want to figure out how to get the following:

    <title>A common title</title>
    <meta name="keywords" content="Keywords blabla" />
    <meta name="description" content="This is the description" />

It should work even if the tags are arranged in any order. I've heard of the PHP Simple HTML DOM Parser, but I don't really want to use it. Is there a solution that does not use the PHP Simple HTML DOM Parser?

Will preg_match be unable to do it if the HTML is invalid?

Can cURL do something like this with preg_match?

Facebook does something like this, but it relies on tags of this form:

    <meta property="og:description" content="Description blabla" />

I want something like this so that when someone posts a link, it retrieves the title and the meta tags. If there are no meta tags, then they are ignored or the user can set them themselves (but I'll do that later on myself).

User · Answer

PHP's native function: get_meta_tags()

http://php.net/manual/en/function.get-meta-tags.php

User · Answer

As was already said, this can handle the problem:

    $url = 'http://stackoverflow.com/questions/3711357/get-title-and-meta-tags-of-external-site/4640613';
    $meta = get_meta_tags($url);
    echo $title = $meta['title'];

    // php - Get Title and Meta Tags of External site - Stack Overflow

User · Answer

This is the way it should be:

    function file_get_contents_curl($url)
    {
        $ch = curl_init();

        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

        $data = curl_exec($ch);
        curl_close($ch);

        return $data;
    }

    $html = file_get_contents_curl("http://example.com/");

    // parsing begins here:
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // @ silences warnings about invalid markup
    $nodes = $doc->getElementsByTagName('title');

    // get and display what you need:
    $title = $nodes->item(0)->nodeValue;

    // defaults, so the echo lines below work when a tag is missing
    $description = '';
    $keywords = '';

    $metas = $doc->getElementsByTagName('meta');

    for ($i = 0; $i < $metas->length; $i++)
    {
        $meta = $metas->item($i);
        if ($meta->getAttribute('name') == 'description')
            $description = $meta->getAttribute('content');
        if ($meta->getAttribute('name') == 'keywords')
            $keywords = $meta->getAttribute('content');
    }

    echo "Title: $title" . '<br/><br/>';
    echo "Description: $description" . '<br/><br/>';
    echo "Keywords: $keywords";

User · Answer

    <?php
    // Assuming the above tags are at www.example.com
    $tags = get_meta_tags('http://www.example.com/');

    // Notice how the keys are all lowercase now, and
    // how . was replaced by _ in the key.
    echo $tags['author'];       // name
    echo $tags['keywords'];     // php documentation
    echo $tags['description'];  // a php manual
    echo $tags['geo_position']; // 49.33;-86.59
    ?>

User · Answer

Easy, and it is PHP's built-in function:

http://php.net/manual/en/function.get-meta-tags.php

User · Answer

get_meta_tags() did not work with the title.

Only meta tags with name attributes, like

    <meta name="description" content="the description">

will be parsed.
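To make that limitation concrete, here is a minimal sketch that runs get_meta_tags() against a local file instead of a URL, so it needs no network access. The sample markup and the temporary file are assumptions made up for illustration; only get_meta_tags() itself comes from the thread.

```php
<?php
// Hypothetical sample page, written to a temp file so no network is needed.
$sample = <<<HTML
<html><head>
<title>A common title</title>
<meta name="description" content="This is the description">
<meta property="og:description" content="Description blabla">
</head><body></body></html>
HTML;

$path = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($path, $sample);

// get_meta_tags() accepts a filename as well as a URL.
$tags = get_meta_tags($path);
unlink($path);

// Only the name="..." tag is returned; the <title> and the
// property="og:description" tag are not part of the result.
print_r($tags);
```

The title and the og:* properties have to be read some other way, for example with DOMDocument as in the other answers.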

User · Answer

An improved version of the answer from @shamittomar above, to get all meta tags (or one specified tag) from the HTML source.

It can be improved further. The difference from PHP's default get_meta_tags() is that it works even when there is a Unicode string.

    function getMetaTags($html, $name = null)
    {
        $doc = new DOMDocument();
        try {
            @$doc->loadHTML($html);
        } catch (Exception $e) {

        }

        $metas = $doc->getElementsByTagName('meta');

        $data = [];
        for ($i = 0; $i < $metas->length; $i++)
        {
            $meta = $metas->item($i);

            if (!empty($meta->getAttribute('name'))) {
                // repeated meta tags are not kept separately:
                // a later tag with the same name overwrites the earlier one
                $data[$meta->getAttribute('name')] = $meta->getAttribute('content');
            }
        }

        if (!empty($name)) {
            return !empty($data[$name]) ? $data[$name] : false;
        }

        return $data;
    }

User · Answer

get_meta_tags() will help you with everything but the title. To get the title, just use a regex.

    $url = 'http://some.url.com/';
    preg_match("/<title>(.+)<\/title>/siU", file_get_contents($url), $matches);
    $title = $matches[1];

Hope that helps.
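The regex approach can be wrapped so that a page without a title does not cause an undefined-index notice. This is only a sketch: the helper name titleFromHtml and the sample strings are hypothetical, and the pattern is a slightly more tolerant variant of the one above, not the answer's exact regex.

```php
<?php
// Hypothetical helper: returns the <title> text, or null when there is none.
function titleFromHtml(string $html): ?string
{
    // [^>]* tolerates attributes on the tag, the i flag ignores case,
    // and the s flag lets . match newlines inside the title.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/si', $html, $matches)) {
        return trim(html_entity_decode($matches[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
    }
    return null;
}

$a = titleFromHtml("<html><head><TITLE>\n  Tom &amp; Jerry\n</TITLE></head></html>");
$b = titleFromHtml('<html><head></head></html>');

var_dump($a, $b);
```

In real use the input would come from file_get_contents($url) or cURL, and the caller would check for null before displaying anything.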

User · Answer

    <?php

    // ------------------------------------------------------

    function curl_get_contents($url) {

        $timeout = 5;
        $useragent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0';

        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        $data = curl_exec($ch);
        curl_close($ch);

        return $data;
    }

    // ------------------------------------------------------

    function fetch_meta_tags($url) {

        $html = curl_get_contents($url);
        $mdata = array();

        $doc = new DOMDocument();
        $doc->loadHTML($html);

        $titlenode = $doc->getElementsByTagName('title');
        $title = $titlenode->item(0)->nodeValue;

        $metanodes = $doc->getElementsByTagName('meta');
        foreach ($metanodes as $node) {
            $key = $node->getAttribute('name');
            $val = $node->getAttribute('content');
            if (!empty($key)) { $mdata[$key] = $val; }
        }

        $res = array($url, $title, $mdata);

        return $res;
    }

    // ------------------------------------------------------

    ?>

User · Answer

Unfortunately, the built-in PHP function get_meta_tags() requires the name attribute, and certain sites, such as Twitter, leave that off in favor of the property attribute. This function, using a mix of regex and DOMDocument, will return a keyed array of meta tags from a webpage. It checks for the name attribute, then the property attribute. This has been tested on Instagram, Pinterest and Twitter.

    /**
     * Extract metatags from a webpage
     */
    function extract_tags_from_url($url) {
      $tags = array();

      $ch = curl_init();
      curl_setopt($ch, CURLOPT_HEADER, 0);
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
      curl_setopt($ch, CURLOPT_URL, $url);
      curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

      $contents = curl_exec($ch);
      curl_close($ch);

      if (empty($contents)) {
        return $tags;
      }

      if (preg_match_all('/<meta([^>]+)content="([^>]+)>/', $contents, $matches)) {
        $doc = new DOMDocument();
        $doc->loadHTML('<?xml encoding="utf-8" ?>' . implode($matches[0]));
        $tags = array();
        foreach ($doc->getElementsByTagName('meta') as $metaTag) {
          if ($metaTag->getAttribute('name') != "") {
            $tags[$metaTag->getAttribute('name')] = $metaTag->getAttribute('content');
          }
          elseif ($metaTag->getAttribute('property') != "") {
            $tags[$metaTag->getAttribute('property')] = $metaTag->getAttribute('content');
          }
        }
      }

      return $tags;
    }

User · Answer

A simple function to understand how to retrieve og:tags, title and description. Adapt this for yourself.

    function read_og_tags_as_json($url){

        $ch = curl_init();

        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

        $HTML_DOCUMENT = curl_exec($ch);
        curl_close($ch);

        $doc = new DOMDocument();
        $doc->loadHTML($HTML_DOCUMENT);

        $res = array();

        // fetch <title>
        $res['title'] = $doc->getElementsByTagName('title')->item(0)->nodeValue;

        // fetch og:tags
        foreach ($doc->getElementsByTagName('meta') as $m) {

            // if it has a property attribute
            if ($m->getAttribute('property')) {

                $prop = $m->getAttribute('property');

                // here search only og:tags
                if (preg_match("/og:/i", $prop)) {

                    // get results on an array -> nice for templating
                    $res['og_tags'][] =
                        array('property' => $m->getAttribute('property'),
                              'content'  => $m->getAttribute('content'));
                }

            }
            // end if it has a property attribute

            // fetch <meta name="description" ... >
            if ($m->getAttribute('name') == 'description') {

                $res['description'] = $m->getAttribute('content');

            }

        }
        // end foreach

        // render JSON
        echo json_encode($res, JSON_PRETTY_PRINT |
            JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);

    }

Return for this page (may have more info):

    {
        "title": "php - Getting title and meta tags from external website - Stack Overflow",
        "og_tags": [
            {
                "property": "og:type",
                "content": "website"
            },
            {
                "property": "og:url",
                "content": "https://stackoverflow.com/questions/3711357/getting-title-and-meta-tags-from-external-website"
            },
            {
                "property": "og:site_name",
                "content": "Stack Overflow"
            },
            {
                "property": "og:image",
                "content": "https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-icon@2.png?v=73d79a89bded"
            },
            {
                "property": "og:title",
                "content": "Getting title and meta tags from external website"
            },
            {
                "property": "og:description",
                "content": "I want to try figure out how to get the\n\n&lt;title&gt;A common title&lt;/title&gt;\n&lt;meta name=\"keywords\" content=\"Keywords blabla\" /&gt;\n&lt;meta name=\"description\" content=\"This is the descript

User · Answer

We use Apache Tika via PHP (command line utility) with -j for JSON:

http://tika.apache.org/

    <?php
        shell_exec('java -jar tika-app-1.4.jar -j http://www.guardian.co.uk/politics/2013/jul/21/tory-strategist-lynton-crosby-lobbying');
    ?>

This is a sample output from a random Guardian article:

    {
       "Content-Encoding":"UTF-8",
       "Content-Length":205599,
       "Content-Type":"text/html; charset\u003dUTF-8",
       "DC.date.issued":"2013-07-21",
       "X-UA-Compatible":"IE\u003dEdge,chrome\u003d1",
       "application-name":"The Guardian",
       "article:author":"http://www.guardian.co.uk/profile/nicholaswatt",
       "article:modified_time":"2013-07-21T22:42:21+01:00",
       "article:published_time":"2013-07-21T22:00:03+01:00",
       "article:section":"Politics",
       "article:tag":[
          "Lynton Crosby",
          "Health policy",
          "NHS",
          "Health",
          "Healthcare industry",
          "Society",
          "Public services policy",
          "Lobbying",
          "Conservatives",
          "David Cameron",
          "Politics",
          "UK news",
          "Business"
       ],
       "content-id":"/politics/2013/jul/21/tory-strategist-lynton-crosby-lobbying",
       "dc:title":"Tory strategist Lynton Crosby in new lobbying row | Politics | The Guardian",
       "description":"Exclusive: Firm he founded, Crosby Textor, advised private healthcare providers how to exploit NHS \u0027failings\u0027",
       "fb:app_id":180444840287,
       "keywords":"Lynton Crosby,Health policy,NHS,Health,Healthcare industry,Society,Public services policy,Lobbying,Conservatives,David Cameron,Politics,UK news,Business,Politics",
       "msapplication-TileColor":"#004983",
       "msapplication-TileImage":"http://static.guim.co.uk/static/a314d63c616d4a06f5ec28ab4fa878a11a692a2a/common/images/favicons/windows_tile_144_b.png",
       "news_keywords":"Lynton Crosby,Health policy,NHS,Health,Healthcare industry,Society,Public services policy,Lobbying,Conservatives,David Cameron,Politics,UK news,Business,Politics",
       "og:description":"Exclusive: Firm he founded, Crosby Textor, advised private healthcare providers how to exploit NHS \u0027failings\u0027",
       "og:image":"https://static-secure.guim.co.uk/sys-images/Guardian/Pix/pixies/2013/7/21/1374433351329/Lynton-Crosby-008.jpg",
       "og:site_name":"the Guardian",
       "og:title":"Tory strategist Lynton Crosby in new lobbying row",
       "og:type":"article",
       "og:url":"http://www.guardian.co.uk/politics/2013/jul/21/tory-strategist-lynton-crosby-lobbying",
       "resourceName":"tory-strategist-lynton-crosby-lobbying",
       "title":"Tory strategist Lynton Crosby in new lobbying row | Politics | The Guardian",
       "twitter:app:id:googleplay":"com.guardian",
       "twitter:app:id:iphone":409128287,
       "twitter:app:name:googleplay":"The Guardian",
       "twitter:app:name:iphone":"The Guardian",
       "twitter:app:url:googleplay":"guardian://www.guardian.co.uk/politics/2013/jul/21/tory-strategist-lynton-crosby-lobbying",
       "twitter:card":"summary_large_image",
       "twitter:site":"@guardian"
    }
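The JSON that Tika prints still has to be turned into a PHP array before it is useful. The sketch below assumes the output looks like the sample above, with Open Graph keys carrying the og: prefix; the short $json string is a hand-copied stand-in for the real shell_exec() result, not something Tika was run to produce here.

```php
<?php
// Stand-in for the string returned by shell_exec(); three fields copied
// from the sample output, assuming og:-prefixed keys.
$json = '{"og:title":"Tory strategist Lynton Crosby in new lobbying row",'
      . '"og:type":"article","twitter:site":"@guardian"}';

// true => decode JSON objects into associative arrays.
$meta = json_decode($json, true);

// Prefer the Open Graph title, fall back to the plain title if present.
$title = $meta['og:title'] ?? ($meta['title'] ?? null);

echo $title, "\n";
```

With the real command, json_decode() can return null if Java or the jar is missing, so the result should be checked before it is used.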

User · Answer

Your best bet is to bite the bullet and use the DOM parser. It's the "right way" to do it. In the long run it'll save you more time than it takes to learn how. Parsing HTML with regex is known to be unreliable and intolerant of special cases.
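A small sketch of the kind of special case meant here: the same meta tag written with its attributes in a different order and with single quotes. The sample markup is made up for illustration, and the pattern is a typical hand-written one that expects name before content, not a quote from any particular library.

```php
<?php
// Hypothetical markup: attributes reordered and single-quoted, still valid HTML.
$html = "<html><head><title>T</title>"
      . "<meta content='This is the description' name='description'>"
      . "</head><body></body></html>";

// This pattern expects name before content and double quotes,
// so it finds nothing in the markup above.
$found = preg_match('/<meta name="description" content="(.*)"\s*\/?>/i', $html, $m);

// DOMDocument reads the attributes regardless of order or quoting.
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$description = null;
foreach ($doc->getElementsByTagName('meta') as $meta) {
    if (strtolower($meta->getAttribute('name')) === 'description') {
        $description = $meta->getAttribute('content');
    }
}

var_dump($found, $description);
```

The regex could be extended to cover this case, but each new variation (extra attributes, line breaks, unquoted values) needs another extension, which is the maintenance cost the answer warns about.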

User · Answer

I made this small Composer package based on the top answer: https://github.com/diversen/get-meta-tags

    composer require diversen/get-meta-tags

And then:

    use diversen\meta;

    $m = new meta();

    // Simple usage: gets title, description, and keywords by default
    $ary = $m->getMeta('https://github.com/diversen/get-meta-tags');
    print_r($ary);

    // With more params
    $ary = $m->getMeta('https://github.com/diversen/get-meta-tags', array('description', 'keywords'), $timeout = 10);
    print_r($ary);

It requires cURL and DOMDocument, like the top answer, and is built in the same way, but has an option for setting the cURL timeout (and for getting all kinds of meta tags).

User · Answer

Here is a short piece of code using the PHP Simple HTML DOM class to get a page's meta details.

    $html = file_get_html($link);
    $meta_description = $html->find('head meta[name=description]', 0)->content;
    $meta_keywords = $html->find('head meta[name=keywords]', 0)->content;

User · Answer

I've got this working a different way and thought I'd share it. It is less code than the others and is based on a snippet I found. I've added a few things to make it load the meta of the page you are on instead of a certain page. I wanted this to copy the default page title and description into the og tags automatically.

For some reason, though, whichever way (different scripts) I tried, the page loads super slowly online but instantly on WAMP. I'm not sure why, so I'm probably going with a switch case since the site is not huge.

    <?php
    $url = 'http://sitename.com' . $_SERVER['REQUEST_URI'];
    $fp = fopen($url, 'r');

    $content = "";

    while (!feof($fp)) {
        $buffer = trim(fgets($fp, 4096));
        $content .= $buffer;
    }

    $start = '<title>';
    $end = '<\/title>';

    preg_match("/$start(.*)$end/s", $content, $match);
    $title = $match[1];

    $metatagarray = get_meta_tags($url);
    $description = $metatagarray["description"];

    echo "<div><strong>Title:</strong> $title</div>";
    echo "<div><strong>Description:</strong> $description</div>";
    ?>

and in the HTML header:

    <meta property="og:title" content="<?php echo $title; ?>" />
    <meta property="og:description" content="<?php echo $description; ?>" />

User · Answer

My solution (adapted from parts of cronoklee's & shamittomar's posts), so I can call it from anywhere and get a JSON return. It can be easily parsed into any content.

    <?php
    header('Content-type: application/json; charset=UTF-8');

    if (!empty($_GET['url']))
    {
        file_get_contents_curl($_GET['url']);
    }
    else
    {
        echo "No Valid URL Provided.";
    }

    function file_get_contents_curl($url)
    {
        $ch = curl_init();

        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

        $data = curl_exec($ch);
        curl_close($ch);

        echo json_encode(getSiteOG($data), JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);
    }

    function getSiteOG($OGdata)
    {
        $doc = new DOMDocument();
        @$doc->loadHTML($OGdata);
        $res['title'] = $doc->getElementsByTagName('title')->item(0)->nodeValue;

        foreach ($doc->getElementsByTagName('meta') as $m) {
            $tag = $m->getAttribute('name') ?: $m->getAttribute('property');
            if (in_array($tag, ['description', 'keywords']) || strpos($tag, 'og:') === 0) {
                $res[str_replace('og:', '', $tag)] = utf8_decode($m->getAttribute('content'));
            }
        }

        return $res;
    }
    ?>

User · Answer

Shouldn't we be using OG?

The chosen answer is good but doesn't work when a site is redirected (very common!), and doesn't return OG tags, which are the new industry standard. Here's a little function which is a bit more usable in 2018. It tries to get OG tags and falls back to meta tags if it can't find them:

    function getSiteOG($url, $specificTags = 0)
    {
        $doc = new DOMDocument();
        @$doc->loadHTML(file_get_contents($url));
        $res['title'] = $doc->getElementsByTagName('title')->item(0)->nodeValue;

        foreach ($doc->getElementsByTagName('meta') as $m) {
            $tag = $m->getAttribute('name') ?: $m->getAttribute('property');
            if (in_array($tag, ['description', 'keywords']) || strpos($tag, 'og:') === 0) {
                $res[str_replace('og:', '', $tag)] = $m->getAttribute('content');
            }
        }
        return $specificTags ? array_intersect_key($res, array_flip($specificTags)) : $res;
    }

How to use it:

    /////////////
    // SAMPLE USAGE:
    print_r(getSiteOG("http://www.stackoverflow.com")); // note the incorrect url

    /////////////
    // OUTPUT:
    Array
    (
        [title] => Stack Overflow - Where Developers Learn, Share, & Build Careers
        [description] => Stack Overflow is the largest, most trusted online community for developers to learn, share their programming knowledge, and build their careers.
        [type] => website
        [url] => https://stackoverflow.com/
        [site_name] => Stack Overflow
        [image] => https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon@2.png?v=73d79a89bded
    )
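The "OG first, plain meta as fallback" idea can also be written so that the two sources are kept apart and the fallback is explicit. This is a sketch under assumptions: the helper name extractPreview and the sample markup are hypothetical, and it parses an HTML string directly so it can be tried without fetching anything.

```php
<?php
/**
 * Hypothetical helper: prefer og:* values, fall back to <title> and
 * <meta name="description"> when the Open Graph tag is missing.
 */
function extractPreview(string $html): array
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $og = [];
    $named = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $content  = $meta->getAttribute('content');
        $property = $meta->getAttribute('property');
        $name     = strtolower($meta->getAttribute('name'));

        if (strpos($property, 'og:') === 0) {
            $og[substr($property, 3)] = $content;   // "og:title" => "title"
        } elseif ($name !== '') {
            $named[$name] = $content;
        }
    }

    $titleNode = $doc->getElementsByTagName('title')->item(0);
    $title = $titleNode ? trim($titleNode->textContent) : null;

    return [
        'title'       => $og['title'] ?? $title,
        'description' => $og['description'] ?? ($named['description'] ?? null),
    ];
}

// Sample page with an og:title but only a plain description.
$withOg = '<html><head><title>Plain title</title>'
        . '<meta property="og:title" content="OG title">'
        . '<meta name="description" content="Plain description">'
        . '</head><body></body></html>';

$preview = extractPreview($withOg);
print_r($preview);
```

Keeping $og and $named separate avoids the case where a plain description tag that appears later in the document silently overwrites og:description.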

User · Answer

Nowadays, most sites add meta tags that provide information about the site or about a particular article page, as news and blog sites do.

I have created a Meta API which gives you the required metadata, such as OpenGraph, Schema.Org, etc.

Check it out: https://api.sakiv.com/docs

User · Answer

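Get meta tags from a URL, PHP function example:

    // Renamed from get_meta_tags: PHP cannot redeclare the built-in function
    // of that name. load_content() is not defined in this answer; it is
    // assumed to return an array with 'content' and 'msg' keys.
    function get_meta_tags_regex($url) {
        $html = load_content($url, false, "");
        print_r($html);
        preg_match_all('/<title>(.*)<\/title>/', $html['content'], $title);
        preg_match_all('/<meta name="description" content="(.*)"\/>/i', $html['content'], $description);
        preg_match_all('/<meta name="keywords" content="(.*)"\/>/i', $html['content'], $keywords);
        // @ hides the notices raised when a pattern did not match
        $res["content"] = @array("title" => $title[1][0], "description" => $description[1][0], "keywords" => $keywords[1][0]);
        $res["msg"] = $html['msg'];
        return $res;
    }

Example:

    print_r(get_meta_tags_regex("bing.com"));

Get Meta Tags php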

User · Answer

If you're working with PHP, check out the PEAR packages at pear.php.net and see if you find anything useful. I've used the RSS packages effectively, and they save a lot of time, provided you can follow how they implement their code via their examples.

Specifically, take a look at Sax 3 and see if it will work for your needs. Sax 3 is no longer updated, but it might be sufficient.
