[html] How to download HTTP directory with all files and sub-directories as they appear on the online files/folders list?

There is an online HTTP directory that I have access to. I have tried to download all sub-directories and files via wget. The problem is that when wget downloads a sub-directory, it downloads the index.html file that lists the files in that directory without downloading the files themselves.

Is there a way to download the sub-directories and files without a depth limit (as if the directory I want to download were just a folder I want to copy to my computer)?


This question is related to html http get download wget

The answer is


You can use lftp, the Swiss army knife of downloading. If you have bigger files, you can add --use-pget-n=10 to the command:

lftp -c 'mirror --parallel=100 https://example.com/files/ ;exit'
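For bigger files, both options can be combined in a single invocation (example.com stands in for your own server):

lftp -c 'mirror --use-pget-n=10 --parallel=100 https://example.com/files/ ;exit'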

wget generally works this way, but some sites cause problems and it may create many unnecessary HTML files. To make this job easier and to prevent unnecessary file creation, I am sharing my getwebfolder script, which is the first Linux script I wrote for myself. This script downloads all content of a web folder passed as a parameter.

When you try to download an open web folder with wget that contains more than one file, wget downloads a file named index.html. This file contains the file list of the web folder. My script converts the file names written in the index.html file into web addresses and downloads them cleanly with wget.
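The core idea can be sketched in a few lines of shell. This is a minimal illustration of the approach, not the actual getwebfolder script:

#!/bin/bash
# Sketch: fetch the folder's index.html, extract each href,
# and download the listed files directly with wget.
url="$1"
wget -q -O - "$url" | grep -o 'href="[^"]*"' | sed 's/^href="//;s/"$//' |
while read -r name; do
  case "$name" in
    ../|/*|\?*|http://*|https://*) continue ;;  # skip parent, absolute and sort links
  esac
  # (subdirectory links would need a recursive call; omitted for brevity)
  wget "${url}${name}"
done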

Tested on Ubuntu 18.04 and Kali Linux; it may work on other distros as well.

Usage:

  • extract getwebfolder file from zip file provided below

  • chmod +x getwebfolder (only for first time)

  • ./getwebfolder webfolder_URL

such as ./getwebfolder http://example.com/example_folder/

Download Link

Details on blog


You can use this Firefox addon to download all files in an HTTP directory:

https://addons.mozilla.org/en-US/firefox/addon/http-directory-downloader/


wget is an invaluable tool and something I use myself. However, sometimes there are characters in the address that wget treats as syntax errors. I'm sure there is a fix for that, but as this question did not ask specifically about wget, I thought I would offer an alternative for those people who will undoubtedly stumble upon this page looking for a quick fix with no learning curve.
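One common culprit is the shell rather than wget itself: characters such as & and ? are shell metacharacters, so the URL needs to be quoted. A hypothetical example:

wget -r -np "http://example.com/download?file=a&type=pdf"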

There are a few browser extensions that can do this, but most require installing download managers, which aren't always free, tend to be an eyesore, and use a lot of resources. Here's one that has none of these drawbacks:

"Download Master" is an extension for Google Chrome that works great for downloading from directories. You can choose to filter which file-types to download, or download the entire directory.

https://chrome.google.com/webstore/detail/download-master/dljdacfojgikogldjffnkdcielnklkce

For an up-to-date feature list and other information, visit the project page on the developer's blog:

http://monadownloadmaster.blogspot.com/


wget -r -np -nH --cut-dirs=3 -R index.html http://hostname/aaa/bbb/ccc/ddd/

From man wget

‘-r’ ‘--recursive’ Turn on recursive retrieving. See Recursive Download, for more details. The default maximum depth is 5.

‘-np’ ‘--no-parent’ Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded. See Directory-Based Limits, for more details.

‘-nH’ ‘--no-host-directories’ Disable generation of host-prefixed directories. By default, invoking Wget with ‘-r http://fly.srk.fer.hr/’ will create a structure of directories beginning with fly.srk.fer.hr/. This option disables such behavior.

‘--cut-dirs=number’ Ignore number directory components. This is useful for getting a fine-grained control over the directory where recursive retrieval will be saved.

Take, for example, the directory at ‘ftp://ftp.xemacs.org/pub/xemacs/’. If you retrieve it with ‘-r’, it will be saved locally under ftp.xemacs.org/pub/xemacs/. While the ‘-nH’ option can remove the ftp.xemacs.org/ part, you are still stuck with pub/xemacs. This is where ‘--cut-dirs’ comes in handy; it makes Wget not “see” number remote directory components. Here are several examples of how ‘--cut-dirs’ option works.

No options       -> ftp.xemacs.org/pub/xemacs/
-nH              -> pub/xemacs/
-nH --cut-dirs=1 -> xemacs/
-nH --cut-dirs=2 -> .

--cut-dirs=1     -> ftp.xemacs.org/xemacs/
...

If you just want to get rid of the directory structure, this option is similar to a combination of ‘-nd’ and ‘-P’. However, unlike ‘-nd’, ‘--cut-dirs’ does not lose with subdirectories; for instance, with ‘-nH --cut-dirs=1’, a beta/ subdirectory will be placed to xemacs/beta, as one would expect.
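Applied to the command at the top of this answer, -nH drops the hostname/ prefix and --cut-dirs=3 drops aaa/bbb/ccc/, so downloaded files land directly under ddd/, while -R index.html rejects the generated listing pages:

http://hostname/aaa/bbb/ccc/ddd/file.txt -> ddd/file.txt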


I was able to get this to work thanks to this post utilizing VisualWGet. It worked great for me. The important part seems to be to check the -recursive flag.

Also found that the -no-parent flag is important; otherwise it will try to download everything.
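For reference, VisualWGet is a front-end for wget, so those two checkboxes correspond to plain wget flags; the equivalent command line would be something like:

wget -r -np http://example.com/folder/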



No Software or Plugin required!

(only usable if you don't need recursive depth)

Use a bookmarklet. Drag this link into your bookmarks, then edit it and paste in this code:

(function(){ var l=document.links; var ext=prompt("Select extension to download (all links containing it will be downloaded).", ".mp3"); for(var i=0; i<l.length; i++) { if(l[i].href.indexOf(ext) !== -1){ l[i].setAttribute("download",l[i].text); l[i].click(); } } })();

Then go to the page you want to download files from and click the bookmarklet.

