Keep online documentation offline with wget


Most programmers, and anyone else who relies on online documentation, need an Internet connection to reach it. But what if you need that documentation offline? There is a utility called wget that is built specifically for this kind of task.

Wget is an open-source command-line utility from the GNU project, and most Linux distributions ship with it. It can download selected pages or mirror an entire website, and it copes with poor network connections by automatically retrying a download until it gets the file.
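That retry behaviour can also be tuned explicitly. A minimal sketch using two standard wget options, `--tries` and `-c`; the URL is just the example site from this article:

```shell
# Retry a flaky download up to 10 times instead of wget's default of 20,
# and resume a partially downloaded file rather than starting over
# (-c / --continue).
wget -c --tries=10 http://docs.opencv.org/index.html
```

Note that `-c` resumes from where a previous attempt stopped, which matters most for large files on unreliable lines.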

The wget manual page will certainly help you mirror a site, but it can be overwhelming for beginners. So here is a command you can simply copy and paste into the terminal, specifying the site to be mirrored.

I needed the OpenCV documentation available locally on my Ubuntu machine, so I mirrored the site docs.opencv.org using wget as –

wget -r -l inf -k -np -p -nc http://docs.opencv.org

Replace the URL with the site you want to mirror.

Let’s see what this command does –

-r
Turn on recursive retrieval, so that subdirectories are downloaded as well, instead of only the files in the specified directory.
-l inf
The level/depth to which recursive retrieval descends. inf means infinite, which takes the entire directory structure.
-k
Convert links in downloaded pages so that they work when browsed locally; otherwise the downloaded copy is of little use, since you would have to hunt down the local pages yourself while the links send you back to the Internet.
-np
No parent. Recursive retrieval would otherwise ascend into parent directories, which we do not want here. You can drop this option if you need to download the entire site, but there is a better option, -m, which does the whole mirroring job without specifying -r or -l. The wget manual contains more information on that.
-p
Download page requisites: images, CSS files, and anything else required to display the pages properly.
-nc
Download each file only once. Normally, if a file would be downloaded twice, the original is kept and the newer copy is saved under a different name, which is not needed here, at least in this case.
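As mentioned under -np, the -m (--mirror) shortcut can replace the recursion options. A sketch of the equivalent command for the same example site; per the wget manual, -m is shorthand for -r -N -l inf --no-remove-listing:

```shell
# -m (--mirror) implies -r -N -l inf --no-remove-listing, so only
# link conversion (-k), no-parent (-np), and page requisites (-p)
# remain explicit.
wget -m -k -np -p http://docs.opencv.org
```

The -nc option is dropped here because -m already implies timestamping (-N), and wget refuses to combine -N with -nc.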

This operation typically takes a while to finish, so do something else in the meantime. Wget is best suited to static pages, and online documentation generally fits that category well. See the manual page at http://linux.die.net/man/1/wget for more information. So, happy downloading!

Author

Vivek Prajapati

A moderate-level programmer interested in administration and Arduino. Familiar with C++, Java, PHP, and C#, with C++ being my favourite. Just finished my bachelor's degree in IT.

