Tuesday, December 1, 2009

Some wget tricks

wget is a command-line utility that can download files from web servers and FTP servers.
For example, you can download the DVD image of Karmic Koala using the following command.

$ wget http://cdimage.ubuntu.com/releases/9.10/release/ubuntu-9.10-dvd-i386.iso
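
A DVD image is a large download, so it is worth knowing that the -c (--continue) option resumes a partially downloaded file instead of starting over. For example:

$ wget -c http://cdimage.ubuntu.com/releases/9.10/release/ubuntu-9.10-dvd-i386.iso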

If an FTP server requires a login and password, you can enter that information on
the wget command line in the following form.

$ wget ftp://user:password@ftp.example.com/path/to/file
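
Embedding a password in the URL leaves it visible in your shell history. If you prefer, wget also accepts FTP credentials as separate options (user and password here are placeholders, as above):

$ wget --ftp-user=user --ftp-password=password ftp://ftp.example.com/path/to/file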

You can use wget to download a single web page as follows:

$ wget http://unixlab.blogspot.com

A file named index.html will be created in your current directory.
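
If you would rather save the page under a different name, the -O option overrides the default (the name unixlab.html below is just an example):

$ wget -O unixlab.html http://unixlab.blogspot.com

The examples below stick with the default index.html.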

If you open this index.html in a web browser, you will find that some of the links are broken, especially the images. To download all the images and other elements required to render the page properly, use the -p option.

$ wget -p http://unixlab.blogspot.com

This will create a folder named unixlab.blogspot.com containing index.html along with the images and other page requisites.
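
Note that -p only retrieves requisites hosted on the same domain as the page itself. Blogs often serve their images from a separate host, so if images are still missing you may also need -H (--span-hosts), which lets wget fetch requisites from other domains as well:

$ wget -p -H http://unixlab.blogspot.com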

But if you open the resulting index.html in your browser, chances are you will still
see broken links even though all the images were downloaded. That's because the links
in the HTML still point to the original web addresses and need to be translated to
point to your local files. So instead, do this:

$ wget -pk http://unixlab.blogspot.com
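
In long form, -p is --page-requisites and -k is --convert-links, so the same command can also be written as:

$ wget --page-requisites --convert-links http://unixlab.blogspot.com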

Sometimes an HTML file you download does not have an .html extension, but ends
in .php or .cgi instead. If you wget files from such a site, your browser may not render the saved file as a web page when you try to open it. To solve the problem, you can ask wget to append .html to such files using the -E option:

$ wget -E http://unixlab.blogspot.com

I use the following command line to keep a usable copy of the website on my hard disk.

$ wget -mEkK http://unixlab.blogspot.com
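
Spelled out, -m is --mirror, shorthand for recursive retrieval with time-stamping (it implies -r -N -l inf --no-remove-listing), and -K is --backup-converted, which saves each file as file.orig before -k rewrites its links, so later mirror runs can compare against the unmodified originals. The same command with long options, assuming the same URL:

$ wget --mirror --html-extension --convert-links --backup-converted http://unixlab.blogspot.com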
