Monday, February 25, 2008

batch web downloading

Under Linux there are several kinds of web-site mirrorers and downloaders.

Some have a GUI, but I prefer the command line: it is actually more visual, since you can see at a glance every option you passed.

httrack is intended primarily for mirroring sites, so it needs very few options on the command line.

$ httrack --user-agent="" http://rgritsulyak.googlepages.com


It will recursively download the whole site within that domain.
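By default httrack mirrors quite deep, so it can help to cap the depth and pick an output directory. A hedged sketch (the `-r` depth limit and `-O` output path are standard httrack options; the `~/mirrors/rgritsulyak` path is just an example of mine, not from the tool):

```shell
# Mirror only 2 levels deep, storing the mirror under ~/mirrors/rgritsulyak
# -r2 : set mirror depth to 2
# -O  : path for the mirror, log files and cache
httrack --user-agent="" -r2 -O ~/mirrors/rgritsulyak http://rgritsulyak.googlepages.com
```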

Another, more traditional web-site copier is

wget

I mostly use it with the following command line:
$ wget -r -l3 --user-agent="" http://rgritsulyak.googlepages.com


options:
--user-agent
a string identifying the user agent (i.e. the browser name). Same for wget and httrack.
-r
downloads recursively, i.e. more than one page, following links. A wget option.
-l3
downloads 3 levels deep. A wget option.
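For an offline-readable copy I would sketch a slightly fuller wget line; the extra flags below are standard wget options, not something the basic command above uses:

```shell
# -r -l3           : recurse, at most 3 levels deep
# --no-parent      : never climb above the starting directory
# --convert-links  : rewrite links so the copy works offline
# --page-requisites: also fetch images/CSS needed to render each page
# --wait=1         : pause 1 second between requests, to be polite to the server
wget -r -l3 --no-parent --convert-links --page-requisites --wait=1 \
     --user-agent="" http://rgritsulyak.googlepages.com
```

The --wait flag matters on bigger sites: without it a recursive fetch can hammer the server.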

Of course, you can learn much more from the [man wget] and [man httrack] commands.
