I'm trying to read a csv-file from given URL, using Python 3.x:
import pandas as pd
import requests
url = "https://github.com/cs109/2014_data/blob/master/countries.csv"
s = requests.get(url).content
c = pd.read_csv(s)
I have the following error
"Expected file path name or file-like object, got <class 'bytes'> type"
How can I fix this? I'm using Python 3.4
In the latest version of pandas (0.19.2
) you can directly pass the url
import pandas as pd
url="https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
c=pd.read_csv(url)
As I commented you need to use a StringIO object and decode i.e c=pd.read_csv(io.StringIO(s.decode("utf-8")))
if using requests, you need to decode as .content returns bytes if you used .text you would just need to pass s as is s = requests.get(url).text
c = pd.read_csv(StringIO(s))
.
A simpler approach is to pass the correct url of the raw data directly to read_csv
, you don't have to pass a file like object, you can pass a url so you don't need requests at all:
c = pd.read_csv("https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv")
print(c)
Output:
Country Region
0 Algeria AFRICA
1 Angola AFRICA
2 Benin AFRICA
3 Botswana AFRICA
4 Burkina AFRICA
5 Burundi AFRICA
6 Cameroon AFRICA
..................................
From the docs:
filepath_or_buffer :
string or file handle / StringIO The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For instance, a local file could be file ://localhost/path/to/table.csv
url = "https://github.com/cs109/2014_data/blob/master/countries.csv"
c = pd.read_csv(url, sep = "\t")
The problem you're having is that the output you get into the variable 's' is not a csv, but a html file. In order to get the raw csv, you have to modify the url to:
'https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv'
Your second problem is that read_csv expects a file name, we can solve this by using StringIO from io module. Third problem is that request.get(url).content delivers a byte stream, we can solve this using the request.get(url).text instead.
End result is this code:
from io import StringIO
import pandas as pd
import requests
url='https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv'
s=requests.get(url).text
c=pd.read_csv(StringIO(s))
output:
>>> c.head()
Country Region
0 Algeria AFRICA
1 Angola AFRICA
2 Benin AFRICA
3 Botswana AFRICA
4 Burkina AFRICA
To Import Data through URL in pandas just apply the simple below code it works actually better.
import pandas as pd
train = pd.read_table("https://urlandfile.com/dataset.csv")
train.head()
If you are having issues with a raw data then just put 'r' before URL
import pandas as pd
train = pd.read_table(r"https://urlandfile.com/dataset.csv")
train.head()
Source: Stackoverflow.com