Re: wget - the question for advanced users



On Fri, 29 Dec 2006 16:11:31 +0100, Piotr wrote this:

Hmmm, sorry, I'm not following you too well. You want to follow the links
for subdirectory HTML and photos but exclude parent directories? If
you've run wget and have everything, then you can delete what you don't
need, right?

No, I can't. When wget follows the links from the documents in the main
directory, it would download the contents of the whole server, which means
it would download terribly, terribly huge amounts of data.

Also, a link on wget for following links if you haven't read it yet:

http://www.gnu.org/software/wget/manual/wget.html#Following-Links

Yep, I've read the manual many times. But I still can't figure out how to
solve my problem.


---------------------------------------------------

http://www.pbase.com/piotrstankiewicz


Try the no-parent option; otherwise you'll have to resolve the link problem
using multiple wget runs or a rewrite of your index.html.

-np
--no-parent
no_parent = on
The simplest, and often very useful, way of limiting directories is disallowing retrieval of the links that refer to the hierarchy above the beginning directory, i.e. disallowing ascent to the parent directory/directories.

The --no-parent option (short -np) is useful in this case. Using it guarantees that you will never leave the existing hierarchy. Supposing you issue Wget with:

wget -r --no-parent http://somehost/~luzer/my-archive/


You may rest assured that none of the references to
/~his-girls-homepage/ or /~luzer/all-my-mpegs/ will be followed. Only
the archive you are interested in will be downloaded. Essentially,
--no-parent is similar to -I/~luzer/my-archive, only it handles
redirections in a more intelligent fashion.
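
To make the rule concrete, here is a rough shell sketch of the idea behind --no-parent: a link is followed only if its path starts with the directory you began in. This is a simplified model for illustration, not how Wget is actually implemented, and the function name is_within_start_dir and the sample photo path are made up for the example.

```shell
#!/bin/sh
# Simplified model of the --no-parent rule (an assumption for illustration,
# not Wget's real code): follow a link only if its path lies at or below
# the directory the download started from.
start_dir='/~luzer/my-archive/'

# Hypothetical helper: succeeds when the given path begins with start_dir.
is_within_start_dir() {
    case "$1" in
        "$start_dir"*) return 0 ;;
        *) return 1 ;;
    esac
}

# The first path is a made-up example; the other two come from the manual text.
for path in '/~luzer/my-archive/photos/a.jpg' \
            '/~luzer/all-my-mpegs/' \
            '/~his-girls-homepage/'; do
    if is_within_start_dir "$path"; then
        echo "follow: $path"
    else
        echo "skip:   $path"
    fi
done
```

If the download is still too large with -r --no-parent alone, the standard -l option (maximum recursion depth) and -A option (accept list of file suffixes) can narrow it further. Whether they fit your site layout is something you'd have to try.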


