Is it ok to scrape data from Google results?

Question

I'd like to fetch results from Google using curl to detect potential duplicate content. Is there a high risk of being banned by Google?

User · Answer

Google thrives on scraping the websites of the world, so if it were "so illegal", even Google wouldn't survive. Of course, other answers mention ways of mitigating IP blocks by Google. One more way to explore avoiding captchas could be scraping at random times (I didn't try it).

Moreover, I have a feeling that if we provide novelty or some significant processing of the data, then it sounds fine, at least to me. If we are simply copying a website, or hampering its business or brand in some way, then it is bad and should be avoided.

On top of it all, if you are a startup then no one will fight you, as there is no benefit. But if your entire premise is scraping, even when you are funded, then you should eventually think of more sophisticated ways (alternative APIs). Also, Google keeps releasing (or deprecating) fields for its API, so what you want to scrape now may be on the roadmap of new Google API releases.

User · Answer

Google will eventually block your IP when you exceed a certain number of requests.
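Since the block is triggered by request volume, one way to stay under a budget is to space requests out. The following is only a minimal sketch: the `Throttle` class and its `max_per_hour` parameter are hypothetical names made up for illustration, and the actual threshold is not published, so any number you pass in is an assumption.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive requests.

    Hypothetical helper for illustration; the real blocking threshold is
    not published, so max_per_hour is a guess the caller must supply.
    """

    def __init__(self, max_per_hour, clock=time.monotonic, sleep=time.sleep):
        if max_per_hour <= 0:
            raise ValueError("max_per_hour must be positive")
        # Seconds that must pass between two consecutive requests.
        self.min_interval = 3600.0 / max_per_hour
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time the previous request was allowed, if any

    def wait(self):
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

Usage would be to create one `Throttle` and call `wait()` before every request. The clock and sleep functions are injectable only so the pacing logic can be checked without actually waiting.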

User · Answer

Google disallows automated access in their TOS, so if you accept their terms you would break them.

That said, I know of no lawsuit from Google against a scraper. Even Microsoft scraped Google; they powered their search engine Bing with it. They got caught red-handed in 2011.

There are two options to scrape Google results:

1) Use their API

UPDATE 2020: Google has deprecated previous APIs (again) and has new prices and new limits. Now (https://developers.google.com/custom-search/v1/overview) you can query up to 10k results per day at 1,500 USD per month. More than that is not permitted, and the results are not what they display in normal searches.

You can issue around 40 requests per hour. You are limited to what they give you; it's not really useful if you want to track ranking positions or what a real user would see. That's something you are not allowed to gather.

If you want a higher amount of API requests you need to pay. 60 requests per hour cost 2000 USD per year; more queries require a custom deal.

2) Scrape the normal result pages

Here comes the tricky part. It is possible to scrape the normal result pages, but Google does not allow it. If you scrape at a rate higher than 8 (updated from 15) keyword requests per hour you risk detection; higher than 10/h (updated from 20) will get you blocked, from my experience. By using multiple IPs you can up the rate, so with 100 IP addresses you can scrape up to 1000 requests per hour (24k a day). (updated)

There is an open source search engine scraper written in PHP at http://scraping.compunect.com. It allows you to reliably scrape Google, parses the results properly, and manages IP addresses, delays, etc. So if you can use PHP it's a nice kickstart; otherwise the code will still be useful to learn how it is done.

3) Alternatively use a scraping service (updated)

Recently a customer of mine had a huge search engine scraping requirement, but it was not "ongoing"; it's more like one huge refresh per month. In this case I could not find a self-made solution that's "economic". I used the service at http://scraping.services instead. They also provide open source code, and so far it's running well (several thousand result pages per hour during the refreshes).

The downside is that such a service means that your solution is "bound" to one professional supplier. The upside is that it was a lot cheaper than the other options I evaluated, and faster in our case. One option to reduce the dependency on one company is to use two approaches at the same time: the scraping service as the primary source of data, falling back to a proxy-based solution like the one described in 2) when required.
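For option 1 above, a request to the Custom Search JSON API can be sketched roughly as follows. This is not taken from the answer itself: the function names are made up for illustration, and the API key and search engine ID are placeholders you would have to create in your own Google account. The sketch assumes the documented endpoint with its `key`, `cx`, `q` and `start` parameters, and a JSON response whose `items` entries carry a `link` field. Quoting a sentence as an exact phrase is one plausible way to look for duplicate content.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint of the Custom Search JSON API.
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def exact_phrase(sentence):
    """Wrap a sentence in double quotes so it is searched as an exact phrase."""
    return '"' + sentence.strip() + '"'


def build_search_url(api_key, engine_id, query, start=1):
    """Build the request URL; api_key and engine_id are your own credentials."""
    params = {"key": api_key, "cx": engine_id, "q": query, "start": start}
    return API_ENDPOINT + "?" + urlencode(params)


def extract_links(response):
    """Return result URLs from a parsed response; empty if there are no items."""
    return [item["link"] for item in response.get("items", [])]


def search(api_key, engine_id, query):
    """Perform one API request and return the result URLs (needs network)."""
    with urlopen(build_search_url(api_key, engine_id, query)) as resp:
        return extract_links(json.load(resp))
```

A call such as `search(MY_KEY, MY_ENGINE_ID, exact_phrase("a sentence from my page"))` would then return the URLs the API reports for that phrase, where `MY_KEY` and `MY_ENGINE_ID` stand for your own values. Each call counts against the quota described above.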
