[python] What should I use to open a url instead of urlopen in urllib3

I wanted to write a piece of code like the following:

from bs4 import BeautifulSoup
import urllib2

url = 'http://www.thefamouspeople.com/singers.php'
html = urllib2.urlopen(url)
soup = BeautifulSoup(html)

But I found that I now have to install the urllib3 package.

Moreover, I couldn't find any tutorial or example showing how to rewrite the code above. For example, urllib3 does not have urlopen.

Any explanation or example, please?!

P.S. I'm using Python 3.4.

This question is related to python web-scraping beautifulsoup urllib3

The answer is


The urllib3 library has good documentation.
To get the result you want, do the following:

import urllib3
from bs4 import BeautifulSoup

url = 'http://www.thefamouspeople.com/singers.php'

http = urllib3.PoolManager()
response = http.request('GET', url)
soup = BeautifulSoup(response.data.decode('utf-8'), 'html.parser')

The decode('utf-8') part is optional, since BeautifulSoup also accepts raw bytes. It worked without it when I tried, but I posted the option anyway.
Source: User Guide
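
If you do decode by hand, hard-coding UTF-8 can garble pages served in another encoding. Here is a minimal sketch that reads the charset from the Content-Type header instead. The helper name decode_body and the UTF-8 fallback are my own assumptions, not part of urllib3:

```python
from email.message import Message


def decode_body(data, content_type, default="utf-8"):
    """Decode response bytes using the charset named in a Content-Type header.

    decode_body is a hypothetical helper; it falls back to `default`
    when the header is missing or names an unknown charset.
    """
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter or None.
    charset = msg.get_content_charset() or default
    try:
        return data.decode(charset, errors="replace")
    except LookupError:
        # The server named a charset Python does not know about.
        return data.decode(default, errors="replace")


# Usage with the urllib3 response from the answer above:
# text = decode_body(response.data, response.headers.get("Content-Type"))
```

Passing the raw bytes to BeautifulSoup and letting it detect the encoding is usually simpler; this is only useful if you need the text yourself.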


With gazpacho you could pipeline the page straight into a parse-able soup object:

from gazpacho import Soup
url = "http://www.thefamouspeople.com/singers.php"
soup = Soup.get(url)

And run finds on top of it:

soup.find("div")

You should use urllib.request, not urllib3. In Python 3, the urlopen function from urllib2 lives in urllib.request.

import urllib.request   # not urllib - important!
urllib.request.urlopen('https://...')
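
To tie this back to the question, here is a minimal sketch of a standard-library fetch whose result can be handed to BeautifulSoup. The function name read_url and the 10-second timeout are my own choices for illustration:

```python
import urllib.request


def read_url(url, timeout=10.0):
    """Return the raw response body as bytes (read_url is a hypothetical helper)."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses and
    # urllib.error.URLError when the server cannot be reached.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


# Usage (needs network access and the beautifulsoup4 package):
# from bs4 import BeautifulSoup
# html = read_url('http://www.thefamouspeople.com/singers.php')
# soup = BeautifulSoup(html, 'html.parser')
```

The with block closes the connection once the body has been read.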

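urllib3 has no module-level urlopen function. Instead, try the requests library:

import requests

url = 'http://www.thefamouspeople.com/singers.php'
response = requests.get(url)
html = response.text   # decoded body, ready to pass to BeautifulSoup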

urllib3 is a different library from urllib and urllib2. It offers many features beyond what the standard-library urllib modules provide, such as connection re-use, should you need them. The documentation is here: https://urllib3.readthedocs.org/

If you'd like to use urllib3, you'll need to pip install urllib3. A basic example looks like this:

from bs4 import BeautifulSoup
import urllib3

http = urllib3.PoolManager()

url = 'http://www.thefamouspeople.com/singers.php'
response = http.request('GET', url)
soup = BeautifulSoup(response.data, 'html.parser')
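
One thing to watch for: with the defaults used above, an error status such as 404 comes back as a normal response rather than an exception, so you may end up parsing an error page. A minimal sketch that checks the status first follows. The name fetch_page, the timeout value and the choice of RuntimeError are my assumptions:

```python
import urllib3


def fetch_page(url, timeout=10.0):
    """Return the body as bytes, or raise if the status is not 200.

    fetch_page is a hypothetical helper, not part of urllib3.
    """
    http = urllib3.PoolManager()
    response = http.request("GET", url, timeout=timeout)
    # Check the status explicitly before handing the body to a parser.
    if response.status != 200:
        raise RuntimeError("GET %s returned HTTP %d" % (url, response.status))
    return response.data


# Usage (needs network access and the beautifulsoup4 package):
# from bs4 import BeautifulSoup
# soup = BeautifulSoup(fetch_page('http://www.thefamouspeople.com/singers.php'), 'html.parser')
```

In a longer script you would create the PoolManager once and re-use it, since that is what lets urllib3 re-use connections.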

You do not have to install urllib3. You can choose any HTTP library that fits your needs and feed the response to BeautifulSoup. The usual choice is requests, because of its rich feature set and convenient API. You can install it by running pip install requests on the command line. Here is a basic example:

from bs4 import BeautifulSoup
import requests

url = "url"
response = requests.get(url)

soup = BeautifulSoup(response.content, "html.parser")
