I was trying to download zip files linked from Omeka's themes page - pretty similar task. This worked for me:
wget -A zip -r -l 1 -nd http://omeka.org/add-ons/themes/
-A zip: only accept zip files
-r: recurse
-l 1: one level deep (i.e., only files directly linked from this page)
-nd: don't create a directory structure, just download all the files into this directory

All the answers with -k, -K, -E etc. options probably haven't really understood the question, as those are for rewriting HTML pages to make a local structure, renaming .php files and so on. Not relevant.
To literally get all files except .html etc.:
wget -R html,htm,php,asp,jsp,js,py,css -r -l 1 -nd http://yoursite.com
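The accept and reject commands above differ only in the filter flag, so they can be sketched as one small helper. The function name build_wget_cmd is made up for illustration; it only prints the command so you can check it before running it, and it does no shell quoting of the URL.

```shell
# Hypothetical helper: prints (does not run) a one-level wget command.
# $1 = "accept" or "reject", $2 = comma-separated extensions, $3 = URL.
build_wget_cmd() {
    case "$1" in
        accept) filter="-A" ;;   # keep only the listed extensions
        reject) filter="-R" ;;   # keep everything except the listed extensions
        *) echo "usage: build_wget_cmd accept|reject EXTS URL" >&2; return 1 ;;
    esac
    # -r -l 1: recurse one level deep; -nd: no directory structure
    printf 'wget %s %s -r -l 1 -nd %s\n' "$filter" "$2" "$3"
}
```

For example, build_wget_cmd accept zip http://omeka.org/add-ons/themes/ prints the first command in this answer.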
You may try:
wget --user-agent=Mozilla --content-disposition --mirror --convert-links -E -K -p http://example.com/
Also you can add:
-A pdf,ps,djvu,tex,doc,docx,xls,xlsx,gz,ppt,mp4,avi,zip,rar
to accept the specific extensions, or to reject only specific extensions:
-R html,htm,asp,php
or to exclude specific directories:
-X "search*,forum*"
If the site's robots.txt hides the files from robots (e.g. search engines), you also have to add: -e robots=off
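As a rough sketch, the optional filters above can be combined into a single call. The function name mirror_docs is made up for illustration, and the extension and directory lists are just samples taken from the lists above; adjust them to the site. The page-rewriting options are left out here on the assumption that only the matching files are wanted, not a browsable local copy.

```shell
# Hypothetical wrapper: crawl a site but keep only document-like files.
# $1 = start URL. The extension and directory lists are illustrative.
# -A: accept these extensions; -X: skip these directories;
# -e robots=off: do not honor robots.txt exclusions.
mirror_docs() {
    wget --user-agent=Mozilla --content-disposition \
         --mirror \
         -A pdf,ps,djvu,doc,docx,zip \
         -X "search*,forum*" \
         -e robots=off \
         "$1"
}
```

Usage would be something like: mirror_docs http://example.com/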
This downloaded the entire website for me:
wget --no-clobber --convert-links --random-wait -r -p -E -e robots=off -U mozilla http://site/path/
wget -m -p -E -k -K -np http://site/path/
The man page will tell you what those options do.
wget will only follow links: if there is no link to a file from the index page, wget will not know about its existence and hence will not download it. That is, it helps if all files are linked to in web pages or in directory indexes.
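If you already know the addresses of files that nothing links to, one workaround is to hand them to wget directly instead of relying on recursion. A minimal sketch, assuming the URLs are saved one per line in a text file (fetch_listed and urls.txt are made-up names):

```shell
# Hypothetical helper: download every URL listed in a text file,
# one URL per line, into the current directory.
# $1 = path to the URL list.
fetch_listed() {
    # -i FILE: read URLs from FILE; -nd: no directory structure
    wget -nd -i "$1"
}
```

Usage would be something like: fetch_listed urls.txt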
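wget -m -A '*' -pk -e robots=off www.mysite.com/
This will download all types of files locally, point to them from the HTML files, and ignore the robots file. The * is quoted so that the shell passes it to wget instead of expanding it to the file names in the current directory.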
Try this. It always works for me
wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL
Source: Stackoverflow.com