Re: how to download a web page (a LOT harder than it sounds)



On 2008-02-07, dementrio <dementrio@xxxxxxxxx> wrote:


Hi,

I've spent a couple of hours trying to do something which I
thought should've taken AT MOST maybe 2 minutes.

Basically I want to print a collection of Wikipedia articles. To do
that I'd like to first download them and convert them to latex.

The problem is, I can't get past step 1. Here I'll describe the
attempts I've made, and I'm begging you for more ideas.

- For some reason, html2latex can't reach the web (URI: Unable to
access http://en.wikipedia.org/whatever), but it seems to work on local
files. So I'm going to fetch the pages for it.

- wget has an option (-p) to download "page requisites" (I'm
interested in images, mostly). But all images on en.wikipedia reside
on upload.wikimedia. wget doesn't follow links to other hosts by
default, so -p alone doesn't fetch them.

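wget also has options -r (recursive retrieval), -l (maximum recursion
depth) and --span-hosts (short form -H), which lets it follow links to
other hosts. Combining -p with --span-hosts should let it pick up the
images on upload.wikimedia as well.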
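
A minimal sketch of how those options might be combined for a single
article. The function name fetch_article is made up for illustration, the
full hostname upload.wikimedia.org is an assumption based on the
"upload.wikimedia" mentioned above, and I haven't tested this against
Wikipedia itself:

```shell
# Hypothetical helper: fetch one page plus its images, even when the
# images are served from a different host than the page.
#   -p  download page requisites (images, stylesheets)
#   -H  same as --span-hosts: allow fetching from other hosts
#   -k  convert links in the saved page to point at the local copies
#   -E  save HTML documents with an .html suffix
#   -D  limit host spanning to the listed domains (assumed hostnames)
fetch_article() {
    wget -p -H -k -E -D en.wikipedia.org,upload.wikimedia.org "$1"
}
```

Calling fetch_article with an article URL should leave a local copy that
html2latex can read as a file. The -D list is there so that --span-hosts
doesn't wander off to every host the page happens to reference.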
