[regex] How to download an entire directory and subdirectories using wget?

I am trying to download the files for a project using wget, as the SVN server for that project isn't running anymore and I am only able to access the files through a browser. The base URLs for all the files is the same like

http://abc.tamu.edu/projects/tzivi/repository/revisions/2/raw/tzivi/*

How can I use wget (or any other similar tool) to download all the files in this repository, where the "tzivi" folder is the root folder and there are several files and sub-folders (upto 2 or 3 levels) under it?

This question is related to regex linux bash wget

The answer is


You can use this in a shell:

wget -r -nH --cut-dirs=7 --reject="index.html*" \
      http://abc.tamu.edu/projects/tzivi/repository/revisions/2/raw/tzivi/

The Parameters are:

-r recursively download

-nH (--no-host-directories) cuts out hostname 

--cut-dirs=X (cuts out X directories)

This works:

wget -m -np -c --no-check-certificate -R "index.html*" "https://the-eye.eu/public/AudioBooks/Edgar%20Allan%20Poe%20-%2"

use the command

wget -m www.ilanni.com/nexus/content/

This will help

wget -m -np -c --level 0 --no-check-certificate -R"index.html*"http://www.your-websitepage.com/dir


you can also use this command :

wget --mirror -pc --convert-links -P ./your-local-dir/ http://www.your-website.com

so that you get the exact mirror of the website you want to download


This link just gave me the best answer:

$ wget --no-clobber --convert-links --random-wait -r -p --level 1 -E -e robots=off -U mozilla http://base.site/dir/

Worked like a charm.


wget -r --no-parent URL --user=username --password=password

the last two options are optional if you have the username and password for downloading, otherwise no need to use them.

You can also see more options in the link https://www.howtogeek.com/281663/how-to-use-wget-the-ultimate-command-line-downloading-tool/


Examples related to regex

Why my regexp for hyphenated words doesn't work? grep's at sign caught as whitespace Preg_match backtrack error regex match any single character (one character only) re.sub erroring with "Expected string or bytes-like object" Only numbers. Input number in React Visual Studio Code Search and Replace with Regular Expressions Strip / trim all strings of a dataframe return string with first match Regex How to capture multiple repeated groups?

Examples related to linux

grep's at sign caught as whitespace How to prevent Google Colab from disconnecting? "E: Unable to locate package python-pip" on Ubuntu 18.04 How to upgrade Python version to 3.7? Install Qt on Ubuntu Get first line of a shell command's output Cannot connect to the Docker daemon at unix:/var/run/docker.sock. Is the docker daemon running? Run bash command on jenkins pipeline How to uninstall an older PHP version from centOS7 How to update-alternatives to Python 3 without breaking apt?

Examples related to bash

Comparing a variable with a string python not working when redirecting from bash script Zipping a file in bash fails How do I prevent Conda from activating the base environment by default? Get first line of a shell command's output Fixing a systemd service 203/EXEC failure (no such file or directory) /bin/sh: apt-get: not found VSCode Change Default Terminal Run bash command on jenkins pipeline How to check if the docker engine and a docker container are running? How to switch Python versions in Terminal?

Examples related to wget

How to `wget` a list of URLs in a text file? How to install wget in macOS? wget ssl alert handshake failure How to run wget inside Ubuntu Docker image? Unable to establish SSL connection upon wget on Ubuntu 14.04 LTS How to use Python requests to fake a browser visit a.k.a and generate User Agent? wget/curl large file from google drive wget: unable to resolve host address `http' Python equivalent of a given wget command How to download HTTP directory with all files and sub-directories as they appear on the online files/folders list?