Python check if website exists

Question

I wanted to check if a certain website exists. This is what I'm doing:

user_agent = 'Mozilla/20.0.1 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}
link = "http://www.abc.com"
req = urllib2.Request(link, headers=headers)
page = urllib2.urlopen(req).read()  # ERROR 402 generated here!

If the page doesn't exist (error 402, or whatever other errors), what can I do in the page = ... line to make sure that the page I'm reading does exist?

User · Answer

There is an excellent answer by @Adem Öztas covering httplib and urllib2. For requests, if the question is strictly about whether a resource exists, that answer can be improved for the case where the resource is large.

The previous answer for requests suggested something like the following:

import requests

def uri_exists_get(uri: str) -> bool:
    try:
        response = requests.get(uri)
        try:
            response.raise_for_status()
            return True
        except requests.exceptions.HTTPError:
            return False
    except requests.exceptions.ConnectionError:
        return False

requests.get attempts to pull the entire resource at once, so for large media files, the above snippet would attempt to pull the entire media into memory. To solve this, we can stream the response.

def uri_exists_stream(uri: str) -> bool:
    try:
        with requests.get(uri, stream=True) as response:
            try:
                response.raise_for_status()
                return True
            except requests.exceptions.HTTPError:
                return False
    except requests.exceptions.ConnectionError:
        return False

I ran the above snippets with timers attached against two web resources:

1) http://bbb3d.renderfarming.net/download.html, a very light html page

2) http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_1080p_30fps_normal.mp4, a decently sized video file

Timing results below:

uri_exists_get("http://bbb3d.renderfarming.net/download.html")
# Completed in: 0:00:00.611239

uri_exists_stream("http://bbb3d.renderfarming.net/download.html")
# Completed in: 0:00:00.000007

uri_exists_get("http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_1080p_30fps_normal.mp4")
# Completed in: 0:01:12.813224

uri_exists_stream("http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_1080p_30fps_normal.mp4")
# Completed in: 0:00:00.000007
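
The timer code behind these measurements is not shown. As a rough sketch, assuming a plain wall-clock wrapper (the decorator name `timed` is hypothetical and not part of the answer), output in the same `H:MM:SS.ffffff` style could be produced like this:

```python
import time
from datetime import timedelta
from functools import wraps


def timed(func):
    """Print how long each call to func takes, then return its result."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether func returned normally or raised.
            elapsed = timedelta(seconds=time.perf_counter() - start)
            print(f"# Completed in: {elapsed}")
    return wrapper
```

Decorating `uri_exists_get` and `uri_exists_stream` with `@timed` would measure each whole call, including connection setup, so the timer has to wrap the request itself to give a meaningful number.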

As a last note: this function also works in the case that the resource host doesn't exist. For example "http://abcdefghblahblah.com/test.mp4" will return False.

User · Answer

You can use a HEAD request instead of GET. It will only download the headers, but not the content. Then you can check the response status from the headers.

For Python 2.7.x, you can use httplib:

import httplib
c = httplib.HTTPConnection('www.example.com')
c.request("HEAD", '/')
if c.getresponse().status == 200:
    print('web site exists')

or urllib2:

import urllib2
try:
    urllib2.urlopen('http://www.example.com/some_page')
except urllib2.HTTPError as e:
    print(e.code)
except urllib2.URLError as e:
    print(e.args)

or, for 2.7 and 3.x, you can install requests:

import requests
response = requests.get('http://www.example.com')
if response.status_code == 200:
    print('Web site exists')
else:
    print('Web site does not exist')
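
For Python 3, where httplib and urllib2 are no longer available under those names, the same HEAD idea can be sketched with the standard library's urllib.request. This is a minimal sketch, not a definitive implementation: the function name `url_exists_head`, the 5-second default timeout, and the "status below 400" rule are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def url_exists_head(url: str, timeout: float = 5.0) -> bool:
    """Return True if a HEAD request to url yields a status below 400."""
    try:
        # Building the Request inside the try also catches malformed URLs.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except urllib.error.HTTPError:
        # The server answered, but with a 4xx or 5xx status.
        return False
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return False
```

Calling `url_exists_head("http://www.example.com")` issues a single HEAD request. Some servers reject HEAD (for example with 405 Method Not Allowed), so a False result here may call for a follow-up GET before concluding that the page is missing.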

User · Answer

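import urllib

a = "http://www.example.com"
try:
    print(urllib.urlopen(a))
except IOError:
    # Python 2: urllib.urlopen raises IOError when the connection cannot be made.
    print(a + "  site does not exist")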

User · Answer

You can simply use the stream option to avoid downloading the full file. Since the latest Python 3 no longer ships urllib2, it's best to use the proven requests library. This simple function will solve your problem:

import requests

def uri_exists(uri):
    r = requests.get(uri, stream=True)
    if r.status_code == 200:
        return True
    else:
        return False

User · Answer

Try this one:

import urllib2

website = 'https://www.allyourmusic.com'
try:
    response = urllib2.urlopen(website)
    if response.code == 200:
        print("site exists!")
    else:
        print("site doesn't exist!")
except urllib2.HTTPError as e:
    print(e.code)
except urllib2.URLError as e:
    print(e.args)

User · Answer

import urllib.request
from urllib.error import HTTPError, URLError

def isok(mypath):
    try:
        thepage = urllib.request.urlopen(mypath)
    except HTTPError as e:
        return 0
    except URLError as e:
        return 0
    else:
        return 1

User · Answer

It's better to check that the status code is < 400, as was done here. Here is what the status codes mean (taken from Wikipedia):

1xx - informational
2xx - success
3xx - redirection
4xx - client error
5xx - server error

If you want to check whether the page exists and don't want to download the whole page, you should use a HEAD request:

import httplib2
h = httplib2.Http()
resp = h.request("http://www.google.com", 'HEAD')
assert int(resp[0]['status']) < 400

(taken from this answer)

If you want to download the whole page, just make a normal request and check the status code. Example using requests:

import requests

response = requests.get('http://google.com')
assert response.status_code < 400

See also similar topics:

Python script to see if a web page exists without downloading the whole page?
Checking whether a link is dead or not using Python without downloading the webpage
How do you send a HEAD HTTP request in Python 2?
Making HTTP HEAD request with urllib2 from Python 2

Hope that helps.
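
The status-code classes and the "< 400" rule from this answer can be written as two small helpers. This is only an illustrative sketch: the names `status_category` and `page_exists` are hypothetical, and treating every 1xx to 3xx code as "exists" is this answer's rule of thumb rather than a guarantee.

```python
def status_category(status_code: int) -> str:
    """Map an HTTP status code to its class name (1xx to 5xx)."""
    categories = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    # Integer division by 100 keeps only the leading digit of the code.
    return categories.get(status_code // 100, "unknown")


def page_exists(status_code: int) -> bool:
    """Treat any 1xx, 2xx or 3xx status as 'the page exists'."""
    return 100 <= status_code < 400
```

With requests, this could be used as `page_exists(requests.get(url).status_code)`, keeping the status policy in one place so that it is easy to tighten later, for instance to accept only 200.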

User · Answer

from urllib2 import Request, urlopen, HTTPError, URLError

user_agent = 'Mozilla/20.0.1 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}
link = "http://www.abc.com/"
req = Request(link, headers=headers)
try:
    page_open = urlopen(req)
except HTTPError as e:
    print(e.code)
except URLError as e:
    print(e.reason)
else:
    print('ok')

To answer the comment of unutbu:

Because the default handlers handle redirects (codes in the 300 range), and codes in the 100-299 range indicate success, you will usually only see error codes in the 400-599 range. (Source)
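
For Python 3, where urllib2 has been split into urllib.request and urllib.error, the same pattern might look like the sketch below. The function name `final_status`, the default User-Agent string, and the timeout are assumptions made for illustration. Because the default handlers follow redirects, the returned code is the one from the final response, which matches the quoted remark that only 400-599 codes normally surface as errors.

```python
import urllib.error
import urllib.request
from typing import Optional


def final_status(url: str, user_agent: str = "Mozilla/5.0",
                 timeout: float = 5.0) -> Optional[int]:
    """Return the status code after redirects, or None if no response arrived."""
    try:
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server replied with a 4xx or 5xx status; report that code.
        return error.code
    except (urllib.error.URLError, OSError, ValueError):
        # No HTTP response at all: DNS failure, refused connection, bad URL.
        return None
```

Returning None for "no response" keeps that case distinct from a page that answered with an error status, which the original snippet separates through its two except branches.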
