Re: htaccess file



David Bolt wrote:
On Sun, 30 Dec 2007, Barely Audible wrote:-

<big snip>

Something I missed, the first time I read the list:

RewriteCond %{HTTP_USER_AGENT} ^.*Ants.*[NC,OR]
                                       ^^
There should be a space between the * and the [.

RewriteCond %{HTTP_USER_AGENT} ^.*[Ww]eb[Bb]andit.*[NC,OR]
                                  ^^^^  ^^^^
The [Ww] and [Bb] character classes are also unnecessary. The NC flag
tells Apache to ignore case, so you could just use webbandit and it
would match.
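
For illustration, the two conditions above might look like this once
corrected. This is only a sketch: the rest of the list was snipped, so
the RewriteEngine line, the closing RewriteRule and its [F] flag are
assumptions rather than a copy of the actual file, and the last
condition drops OR only because it ends this shortened list.

```apache
# Assumed: rewriting has to be switched on before any conditions are evaluated.
RewriteEngine On

# Space added between the pattern and the flags.
RewriteCond %{HTTP_USER_AGENT} ^.*Ants.* [NC,OR]
# NC already ignores case, so plain lowercase is enough.
# No OR here because this is the last condition in this shortened sketch.
RewriteCond %{HTTP_USER_AGENT} ^.*webbandit.* [NC]

# Assumed rule: refuse matching requests with 403 Forbidden.
RewriteRule ^ - [F]
```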

Another thing you'll probably find is that including wget in the list
isn't going to help very much when someone does use it to grab your
site. It is very easy to disguise wget by telling it to supply another
user agent header, and anyone using it will most likely have looked up
the options in the man page, where they would also have seen this:

--random-wait
Some web sites may perform log analysis to identify retrieval programs such
as Wget by looking for statistically significant similarities in the time
between requests. This option causes the time between requests to vary
between 0 and 2 * wait seconds, where wait was specified using the --wait
option, in order to mask Wget's presence from such analysis.

A recent article in a publication devoted to development on a popular
consumer platform provided code to perform this analysis on the fly. Its
author suggested blocking at the class C address level to ensure automated
retrieval programs were blocked despite changing DHCP-supplied addresses.

The --random-wait option was inspired by this ill-advised recommendation to
block many unrelated users from a web site due to the actions of one.

If they use the options:

--wait=30 --random-wait --user-agent="$something_resembling_a_browser_user_agent"

along with any others required, you probably won't even know they're
doing it.

Thanks Dave - I have learnt a lot!

I'm still getting the 500 error though :-(