How to read HTML from a URL in Python 3

Question

I looked at previous similar questions and only got more confused. In Python 3.4, I want to read an HTML page as a string, given the URL. In Perl I do this with LWP::Simple, using get().

A matplotlib 1.3.1 example says: import urllib; u1 = urllib.urlretrieve(url). Python 3 can't find urlretrieve.

I tried u1 = urllib.request.urlopen(url), which appears to get an HTTPResponse object, but I can't print it or get a length on it or index it. u1.body doesn't exist. I can't find a description of HTTPResponse in Python 3. Is there an attribute in the HTTPResponse object which will give me the raw bytes of the HTML page?

(Irrelevant stuff from other questions includes urllib2, which doesn't exist in my Python, csv parsers, etc.)

Edit: I found something in a prior question which partially (mostly) does the job:

    u2 = urllib.request.urlopen('http://finance.yahoo.com/q?s=aapl&ql=1')
    for lines in u2.readlines():
        print(lines)

I say "partially" because I don't want to read separate lines, but just one big string. I could just concatenate the lines, but every line printed has a character 'b' prepended to it. Where does that come from? Again, I suppose I could delete the first character before concatenating, but that does get to be a kludge.

User · Answer

Try the requests module; it's much simpler. Install it with pip install requests, then:

    import requests

    url = 'https://www.google.com'
    r = requests.get(url)
    r.text

More info here: http://docs.python-requests.org/en/master/

User · Answer

urllib.request.urlopen(url).read() returns the raw HTML page as a bytes object. Call .decode() on the result, for example .decode('utf-8'), to get a string.
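
The 'b' that the question sees in front of every printed line is not part of the data. It is how Python 3 displays a bytes object. The sketch below illustrates this without touching the network: raw_lines is a made-up stand-in for what readlines() on the response would return, and the page content is invented for illustration.

```python
# Stand-in for what readlines() on the response returns: a list of bytes objects.
raw_lines = [b"<html>\n", b"<body>hello</body>\n", b"</html>\n"]

# Printing a bytes object shows its repr, which is where the leading b comes from.
first_line_repr = repr(raw_lines[0])
print(first_line_repr)  # prints b'<html>\n'

# Join the raw bytes first, then decode once to get a single str.
page = b"".join(raw_lines).decode("utf-8")
print(page)
```

Decoding the joined bytes avoids the kludge of deleting characters by hand, since nothing is actually prepended to the data.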

User · Answer

Note that Python 3 does not read the HTML as a string but as a bytes object, so you need to convert it to a string with decode():

    import urllib.request

    fp = urllib.request.urlopen("http://www.python.org")
    mybytes = fp.read()
    mystr = mybytes.decode("utf8")
    fp.close()

    print(mystr)
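
Not every page is UTF-8, so hard-coding the encoding can fail. One possible refinement, sketched below, is to ask the response headers for the declared charset and fall back to UTF-8 only when none is given. The function name fetch_html and the default_charset parameter are hypothetical names chosen for this illustration; the UTF-8 fallback is an assumption, not a rule.

```python
import urllib.request


def fetch_html(url, default_charset="utf-8"):
    """Return the body at url as a str (hypothetical helper for illustration)."""
    # The with block closes the response even if read() raises.
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes, not str
        # Charset from the Content-Type header, or None if the server sent none.
        charset = response.headers.get_content_charset()
    # Assumption: fall back to UTF-8 when no charset is declared.
    return raw.decode(charset or default_charset)


# Example call (performs a real network request):
#   html = fetch_html("http://www.python.org")
```

A charset declared only inside the HTML itself, in a meta tag, is not detected by this sketch.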

User · Answer

For Python 2:

    import urllib

    some_url = 'https://docs.python.org/2/library/urllib.html'
    filehandle = urllib.urlopen(some_url)
    print filehandle.read()
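
The Python 2 names do not exist in Python 3, which is why the question's urlretrieve call fails: urlopen and urlretrieve both moved into urllib.request. If one script has to run on both versions, a common pattern is an import fallback. This is only a sketch of that pattern, not something taken from the answers here.

```python
try:
    # Python 3: urlopen lives in urllib.request.
    from urllib.request import urlopen
except ImportError:
    # Python 2: the closest equivalent is urllib2.urlopen.
    from urllib2 import urlopen

# Either way, urlopen(url).read() gives the raw body:
# a bytes object on Python 3, a byte string (str) on Python 2.
```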

User · Answer

Reading an HTML page with urllib is fairly simple. Since you want to read it as a single string, I will show you how.

Import urllib.request and urllib.error:

    #!/usr/bin/python3.5
    import urllib.error
    import urllib.request

Prepare our request:

    request = urllib.request.Request('http://www.w3schools.com')

Always use a try/except when requesting a web page, as things can easily go wrong. urlopen() requests the page. Catch URLError rather than using a bare except, and re-raise so the rest of the script does not run with response undefined:

    try:
        response = urllib.request.urlopen(request)
    except urllib.error.URLError as err:
        print("something wrong:", err)
        raise

type() is a great function that will tell us what type a variable is. Here, response is an HTTP response object:

    print(type(response))

The read() function of our response object will store the HTML as bytes in our variable. Again, type() will verify this:

    htmlBytes = response.read()
    print(type(htmlBytes))

Now we use the decode() function on our bytes variable to get a single string:

    htmlStr = htmlBytes.decode("utf8")
    print(type(htmlStr))

If you do want to split this string into separate lines, you can do so with the split() function. In this form we can easily iterate through the lines to print out the entire page or do any other processing:

    htmlSplit = htmlStr.split('\n')
    print(type(htmlSplit))

    for line in htmlSplit:
        print(line)

Hopefully this provides a somewhat more detailed answer. The Python documentation and tutorials are great. I would use them as a reference because they will answer most questions you might have.
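
The steps above can also be folded into one small function. The sketch below is one possible arrangement, not the only one: the name read_page_lines, the 10-second timeout, and the choice to return None on failure are all assumptions made for illustration.

```python
import urllib.error
import urllib.request


def read_page_lines(url, encoding="utf-8", timeout=10):
    """Return the page at url as a list of lines, or None on failure.

    Hypothetical helper: the name, timeout, and None-on-error behaviour
    are illustrative choices.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            html_bytes = response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        print("server returned an error status:", err.code)
        return None
    except urllib.error.URLError as err:
        # No usable answer at all (DNS failure, refused connection, bad scheme).
        print("could not open the URL:", err.reason)
        return None
    # errors="replace" swaps undecodable bytes for U+FFFD instead of raising.
    return html_bytes.decode(encoding, errors="replace").splitlines()


# Example call (performs a real network request):
#   lines = read_page_lines("http://www.w3schools.com")
```

HTTPError is caught before URLError because it is a subclass of URLError. splitlines() is used instead of split('\n') so that Windows-style line endings do not leave a trailing carriage return on each line.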

User · Answer

    import requests

    url = requests.get("http://yahoo.com")
    htmltext = url.text
    print(htmltext)

This works similarly to urllib.request.urlopen, except that .text is already a decoded string.
