[r] Download a file from HTTPS using download.file()

I would like to read online data to R using download.file() as shown below.

URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
download.file(URL, destfile = "./data/data.csv", method="curl")

Someone suggested to me that I add the line setInternet2(TRUE), but it still doesn't work.

The error I get is:

Warning messages:
1: running command 'curl  "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"  -o "./data/data.csv"' had status 127 
2: In download.file(URL, destfile = "./data/data.csv", method = "curl",  :
  download had nonzero exit status

Appreciate your help.

This question is related to r csv https

The answer is


Try following with heavy files

library(data.table)
URL <- "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- fread(URL)

You can set global options and try-

options('download.file.method'='curl')
download.file(URL, destfile = "./data/data.csv", method="auto")

For issue refer to link- https://stat.ethz.ch/pipermail/bioconductor/2011-February/037723.html


Offering the curl package as an alternative that I found to be reliable when extracting large files from an online database. In a recent project, I had to download 120 files from an online database and found it to half the transfer times and to be much more reliable than download.file.

#install.packages("curl")
library(curl)
#install.packages("RCurl")
library(RCurl)

ptm <- proc.time()
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- getURL(URL)
proc.time() - ptm
ptm

ptm1 <- proc.time()
curl_download(url =URL ,destfile="TEST.CSV",quiet=FALSE, mode="wb")
proc.time() - ptm1
ptm1

ptm2 <- proc.time()
y = download.file(URL, destfile = "./data/data.csv", method="curl")
proc.time() - ptm2
ptm2

In this case, rough timing on your URL showed no consistent difference in transfer times. In my application, using curl_download in a script to select and download 120 files from a website decreased my transfer times from 2000 seconds per file to 1000 seconds and increased the reliability from 50% to 2 failures in 120 files. The script is posted in my answer to a question I asked earlier, see .


I've succeed with the following code:

url = "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x = read.csv(file=url)

Note that I've changed the protocol from https to http, since the first one doesn't seem to be supported in R.


127 means command not found

In your case, curl command was not found. Therefore it means, curl was not found.

You need to install/reinstall CURL. That's all. Get latest version for your OS from http://curl.haxx.se/download.html

Close RStudio before installation.


If using RCurl you get an SSL error on the GetURL() function then set these options before GetURL(). This will set the CurlSSL settings globally.

The extended code:

install.packages("RCurl")
library(RCurl)
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))   
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- getURL(URL)

Worked for me on Windows 7 64-bit using R3.1.0!


Had exactly the same problem as UseR (original question), I'm also using windows 7. I tried all proposed solutions and they didn't work.

I resolved the problem doing as follows:

  1. Using RStudio instead of R console.

  2. Actualising the version of R (from 3.1.0 to 3.1.1) so that the library RCurl runs OK on it. (I'm using now R3.1.1 32bit although my system is 64bit).

  3. I typed the URL address as https (secure connection) and with / instead of backslashes \\.

  4. Setting method = "auto".

It works for me now. You should see the message:

Content type 'text/csv; charset=utf-8' length 9294 bytes
opened URL
downloaded 9294 by

Here's an update as of Nov 2014. I find that setting method='curl' did the trick for me (while method='auto', does not).

For example:

# does not work
download.file(url='https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
              destfile='localfile.zip')

# does not work. this appears to be the default anyway
download.file(url='https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
              destfile='localfile.zip', method='auto')

# works!
download.file(url='https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
              destfile='localfile.zip', method='curl')

Examples related to r

How to get AIC from Conway–Maxwell-Poisson regression via COM-poisson package in R? R : how to simply repeat a command? session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium How to show code but hide output in RMarkdown? remove kernel on jupyter notebook Function to calculate R2 (R-squared) in R Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph R multiple conditions in if statement What does "The following object is masked from 'package:xxx'" mean?

Examples related to csv

Pandas: ValueError: cannot convert float NaN to integer Export result set on Dbeaver to CSV Convert txt to csv python script How to import an Excel file into SQL Server? "CSV file does not exist" for a filename with embedded quotes Save Dataframe to csv directly to s3 Python Data-frame Object has no Attribute (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape How to write to a CSV line by line? How to check encoding of a CSV file

Examples related to https

What's the net::ERR_HTTP2_PROTOCOL_ERROR about? Requests (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available.") Error in PyCharm requesting website Android 8: Cleartext HTTP traffic not permitted ssl.SSLError: tlsv1 alert protocol version Invalid self signed SSL cert - "Subject Alternative Name Missing" How do I make a https post in Node Js without any third party module? Page loaded over HTTPS but requested an insecure XMLHttpRequest endpoint How to force Laravel Project to use HTTPS for all routes? Could not create SSL/TLS secure channel, despite setting ServerCertificateValidationCallback Use .htaccess to redirect HTTP to HTTPs