[php] How to get page content using cURL?

I would like to scrape the content of this Google search result page using cURL. I've tried setting different user agents and other options, but I just can't seem to get the content of that page: I either get redirected or I get a "page moved" error.

I believe it has something to do with the fact that the query string gets encoded somewhere but I'm really not sure how to get around that.

    //$url is the same as the link above
    $ch = curl_init();
    $user_agent = 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
    curl_setopt($ch, CURLOPT_TIMEOUT, 120);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
    curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
    curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
    echo curl_exec($ch);
    curl_close($ch);

What do I need to do to get my PHP code to show the exact content of the page as I would see it in my browser? What am I missing? Can anyone point me in the right direction?

I've seen similar questions on SO, but none with an answer that could help me.

EDIT:

I tried to just open the link using the Selenium WebDriver, and that gives the same results as cURL. I still think this has to do with the special characters in the query string getting messed up somewhere in the process.
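
For reference, this is roughly how I would expect the query string to be built so that special characters survive (a minimal sketch using http_build_query(); $searchTerm is a placeholder for my actual search term):

    // Build the query string so special characters are percent-encoded exactly once
    $params = array(
        'q'  => $searchTerm, // placeholder search term
        'hl' => 'en',
    );
    $url = 'https://www.google.com/search?' . http_build_query($params);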

This question is related to: php, curl

The answer is:

This is how:

    /**
     * Get a web file (HTML, XHTML, XML, image, etc.) from a URL.  Return an
     * array containing the HTTP server response header fields and content.
     */
    function get_web_page( $url )
    {
        $user_agent='Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';

        $options = array(

            CURLOPT_CUSTOMREQUEST  => "GET",        // set request type (GET here)
            CURLOPT_POST           => false,        // this is not a POST request
            CURLOPT_USERAGENT      => $user_agent,  // set user agent
            CURLOPT_COOKIEFILE     => "cookie.txt", // read cookies from this file
            CURLOPT_COOKIEJAR      => "cookie.txt", // write cookies to this file
            CURLOPT_RETURNTRANSFER => true,     // return web page
            CURLOPT_HEADER         => false,    // don't return headers
            CURLOPT_FOLLOWLOCATION => true,     // follow redirects
            CURLOPT_ENCODING       => "",       // handle all encodings
            CURLOPT_AUTOREFERER    => true,     // set referer on redirect
            CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect
            CURLOPT_TIMEOUT        => 120,      // timeout on response
            CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
        );

        $ch      = curl_init( $url );
        curl_setopt_array( $ch, $options );
        $content = curl_exec( $ch );
        $err     = curl_errno( $ch );
        $errmsg  = curl_error( $ch );
        $header  = curl_getinfo( $ch );
        curl_close( $ch );

        $header['errno']   = $err;
        $header['errmsg']  = $errmsg;
        $header['content'] = $content;
        return $header;
    }

Example

    // Read a web page and check for errors:

    $result = get_web_page( $url );

    if ( $result['errno'] != 0 )
        ... error: bad url, timeout, redirect loop ...

    if ( $result['http_code'] != 200 )
        ... error: no page, no permissions, no service ...

    $page = $result['content'];

I suppose you have noticed that your link is actually an HTTPS link... It seems that your cURL options do not include any kind of SSL handling... maybe this could be your problem. Why don't you try a non-HTTPS link to see what happens (e.g. a Google Custom Search Engine)?
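
If HTTPS is indeed the problem, here is a minimal sketch of the SSL-related cURL options (the CA bundle path is only an example; disabling verification is insecure and should be reserved for debugging):

    // Assumes $ch is an existing cURL handle for an https:// URL
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);          // verify the peer's certificate
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);             // check the certificate's host name
    curl_setopt($ch, CURLOPT_CAINFO, '/path/to/cacert.pem'); // example CA bundle location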


Try this:

$url = "http://www.google.com/search?q=".$strSearch."&hl=en&start=0&sa=N";
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_HEADER, 0);
  curl_setopt($ch, CURLOPT_VERBOSE, 0);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible;)");
  curl_setopt($ch, CURLOPT_URL, urlencode($url));
  $response = curl_exec($ch);
  curl_close($ch);
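
Note that urlencode() is applied only to the search term while the URL is being built; running the entire URL through urlencode() would also encode the scheme and separators ("://", "?", "&") and break the request.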

Get content with cURL in PHP

The server must support the cURL extension; if it is not already available, enable it in your PHP configuration (for example, extension=curl in php.ini) and restart the web server.
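
A quick way to verify that the extension is actually loaded (a minimal sketch):

    // Fail early if the cURL extension is missing
    if (!function_exists('curl_init')) {
        die('The cURL extension is not enabled; enable it in php.ini and restart the web server.');
    }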


    function UrlOpener($url)
    {
        global $output;                              // keep the content available outside the function
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response instead of printing it directly
        $output = curl_exec($ch);
        curl_close($ch);
        echo $output;
    }

If you want to get the content through the Google cache with cURL, you can use this URL pattern: http://webcache.googleusercontent.com/search?q=cache: followed by your URL. Sample: http://urlopener.mixaz.net/
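
For example, a minimal sketch using the UrlOpener() function above (the target URL is just a placeholder):

    // Fetch Google's cached copy of a (placeholder) page
    $target   = "http://www.example.com/";
    $cacheUrl = "http://webcache.googleusercontent.com/search?q=cache:" . urlencode($target);
    UrlOpener($cacheUrl);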


For a realistic approach that emulates human behavior as closely as possible, you may want to add a referer to your cURL options. You may also want to enable CURLOPT_FOLLOWLOCATION. Trust me, whoever said that cURLing Google results is impossible is a complete dolt and should throw his/her computer against the wall in hopes of never returning to the internetz again. Everything you can do "IRL" with your own browser can be emulated using PHP cURL or libcurl in Python. You just need to do more cURLs to get buff. Then you will see what I mean. :)

  $url = "http://www.google.com/search?q=".$strSearch."&hl=en&start=0&sa=N";
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_REFERER, 'http://www.example.com/1');
  curl_setopt($ch, CURLOPT_HEADER, 0);
  curl_setopt($ch, CURLOPT_VERBOSE, 0);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible;)");
  curl_setopt($ch, CURLOPT_URL, urlencode($url));
  $response = curl_exec($ch);
  curl_close($ch);
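
To push the browser emulation further, you could also send a few typical browser headers before the curl_exec() call (these header values are plausible examples, not requirements):

    // Assumes $ch is the handle from the snippet above, set before curl_exec()
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.5',
    ));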