How to open an HTML file?

Question

I have an HTML file called test.html that contains one word. I open test.html and print its content using this block of code:

    file = open("test.html", "r")
    print file.read()

but the word is not printed correctly. Why does this happen, and how can I fix it?

BTW, when I open a plain text file it works fine.

Edit: I also tried this:

    >>> import codecs
    >>> f = codecs.open("test.html", 'r')
    >>> print f.read()

User · Answer

Use codecs.open with the encoding parameter:

    import codecs
    f = codecs.open("test.html", 'r', 'utf-8')

User · Answer

I encountered this problem today as well. I am using Windows, and the default system language is Chinese, so others may run into a similar Unicode error. Simply add encoding='utf-8':

    with open("test.html", "r", encoding='utf-8') as f:
        text = f.read()
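To see the effect of the `encoding` argument in isolation, here is a minimal Python 3 sketch. The file name `demo.html`, the temporary directory, and the sample word are illustrative assumptions, not taken from the question.

```python
import tempfile
from pathlib import Path


def read_html(path, encoding="utf-8"):
    """Return the file's text, decoded with an explicit encoding."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


# Hypothetical sample: a tiny HTML file holding one non-ASCII word.
sample = "<p>h\u00e9llo</p>"  # \u00e9 is the letter e with an acute accent
demo_path = Path(tempfile.mkdtemp()) / "demo.html"
demo_path.write_text(sample, encoding="utf-8")

# Decoding with the encoding the file was written in round-trips the text.
print(read_html(demo_path) == sample)

# Decoding the same bytes with a different codec yields different text.
print(read_html(demo_path, encoding="latin-1") == sample)
```

Passing the encoding explicitly avoids depending on the platform default, which is what differs between machines.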

User · Answer

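Code:

    import codecs

    path = "D:\\Users\\html\\abc.html"
    file = codecs.open(path, "rb")   # open the file in binary mode
    file1 = file.read()              # raw bytes of the file
    file1 = str(file1)               # convert the result to str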
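In Python 3, calling `str()` on bytes returns their `b'...'` representation rather than the text, so the bytes have to be decoded explicitly. A minimal sketch of that step, assuming the file is UTF-8 (the answer does not say which encoding it uses) and using a made-up file name:

```python
import tempfile
from pathlib import Path


def read_html_bytes_then_decode(path, encoding="utf-8"):
    """Read raw bytes, then decode them; the encoding is an assumption."""
    with open(path, "rb") as f:
        raw = f.read()
    return raw.decode(encoding)


# Hypothetical sample file written as UTF-8 bytes.
expected = "<p>h\u00e9llo</p>"
demo_path = Path(tempfile.mkdtemp()) / "abc_demo.html"
demo_path.write_bytes(expected.encode("utf-8"))

raw = demo_path.read_bytes()
print(str(raw).startswith("b'"))  # str() on bytes gives the repr, not the text
print(read_html_bytes_then_decode(demo_path) == expected)
```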

User · Answer

Try something like this:

    import codecs
    f = codecs.open("test.html", 'r')
    print f.read()

User · Answer

You can read an HTML page using urllib:

    # Python 2.x
    import urllib

    page = urllib.urlopen("your path").read()
    print page

User · Answer

You can make use of the following code:

    from __future__ import division, unicode_literals
    import codecs
    from bs4 import BeautifulSoup

    f = codecs.open("test.html", 'r', 'utf-8')
    document = BeautifulSoup(f.read()).get_text()
    print document

If you want to delete all the blank lines in between and get all the words as a single string (also avoiding special characters and numbers), then also include:

    import re
    import nltk
    from nltk.tokenize import word_tokenize

    st = ""  # define st as a string initially
    docwords = word_tokenize(document)
    for line in docwords:
        line = line.rstrip()
        if line:
            # keep purely alphabetic tokens only
            if re.match("^[A-Za-z]*$", line):
                # `stop` is not defined in this answer; it must exist beforehand
                if line not in stop and len(line) > 1:
                    st = st + " " + line
    print st
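The filtering step in this answer relies on `stop`, which is never defined, and on third-party packages. As a rough illustration of the same idea using only the standard library, the sketch below swaps in `html.parser` for BeautifulSoup and plain whitespace splitting for NLTK's tokenizer, so it is cruder than the original. The `STOP_WORDS` set, the helper names, and the sample markup are hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical stop-word list; the answer's own `stop` is not shown.
STOP_WORDS = {"the", "is", "and", "of"}


class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_words(html):
    """Return purely alphabetic, non-stop words longer than one character."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Whitespace splitting also discards blank lines between elements.
    tokens = " ".join(parser.chunks).split()
    kept = [
        w for w in tokens
        if re.fullmatch(r"[A-Za-z]+", w)      # drop numbers and special characters
        and w.lower() not in STOP_WORDS       # case-insensitive stop-word check
        and len(w) > 1
    ]
    return " ".join(kept)


sample = "<html><body><h1>The title</h1>\n\n<p>Python 3 is fun and a joy</p></body></html>"
print(html_to_words(sample))  # title Python fun joy
```

Unlike a real tokenizer, whitespace splitting leaves punctuation attached to words, so a token such as `fun.` would be dropped by the alphabetic check.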

User · Answer

You can use urllib in Python 3 in the same way as https://stackoverflow.com/a/27243244/4815313, with a few changes:

    # Python 3
    import urllib.request

    page = urllib.request.urlopen("/path/").read()
    print(page)
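`urlopen` expects a URL with a scheme, so a bare filesystem path will not work for a local file. One way to handle that, sketched here with a made-up temporary file, is to turn the path into a `file://` URL with `pathlib` and decode the bytes that come back. UTF-8 is an assumption about the file's encoding, and `read_local_html` is a hypothetical helper name.

```python
import tempfile
import urllib.request
from pathlib import Path


def read_local_html(path, encoding="utf-8"):
    """Fetch a local file through urllib and decode the bytes it returns."""
    url = Path(path).resolve().as_uri()  # absolute path -> file:// URL
    with urllib.request.urlopen(url) as response:
        return response.read().decode(encoding)


# Hypothetical sample file for the demonstration.
expected = "<p>h\u00e9llo</p>"
demo_path = Path(tempfile.mkdtemp()) / "page.html"
demo_path.write_text(expected, encoding="utf-8")

page = read_local_html(demo_path)
print(page == expected)
```

Without the `decode` call, `read()` returns bytes, and printing them shows escape sequences instead of the non-ASCII characters.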
