Re: When is the FTP version available?

From: Arthur Hagen (art_at_broomstick.com)
Date: 06/17/05


Date: Thu, 16 Jun 2005 22:20:43 -0400

houghi <houghi@houghi.org.invalid> wrote:
> Arthur Hagen wrote:
>> houghi <houghi@houghi.org.invalid> wrote:
>>>
>>> I came up with (in this case /var/spool/news/alt/os/linux/suse)
>>> grep -i -E '^(User-Agent|X-(mailer|Newsreader))' * |
>>> awk -F: '{print $3}' | sort | uniq -c | sort -nr
>>
>> If going for speed, sed is probably faster than awk:
>>
>> sed </var/spool/news/alt/os/linux/suse/* \
>> 's/^\(User-Agent\|X-\(Mailer\|Newsreader\)\): //i;tt;d;:t;s/.*: //' |\
>> sort | uniq -c | sort -rn
>>
>> Of course, even better would be to change the algorithm so it stops
>> reading the file after a match, or when there's no more headers.
>
> I get:
> bash: /var/spool/news/alt/os/linux/suse/*: ambiguous redirect

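Ah, my bad (I tested it with a single file, and a redirect can't take a
glob that expands to several files). Pass the files as arguments instead: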

sed 's/^\(User-Agent\|X-\(Mailer\|Newsreader\)\): //i;tt;d;:t;s/.*: //' \
  /var/spool/news/alt/os/linux/suse/* |
  sort | uniq -c | sort -rn
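
To see what the sed script above does without touching a real spool, here
is a small sketch that runs the same pipeline on three made-up articles in
a scratch directory. The file contents and addresses are invented for
illustration, and it assumes GNU sed (for \| and the i flag):

```shell
#!/bin/sh
# Hypothetical sample articles in a scratch directory (not a real spool).
dir=$(mktemp -d)
printf 'From: a@example.invalid\nUser-Agent: slrn/0.9.8.1 (Linux)\n\nbody text\n' > "$dir/1"
printf 'From: b@example.invalid\nX-Newsreader: Forte Agent 1.93\n\nbody text\n' > "$dir/2"
printf 'From: c@example.invalid\nuser-agent: slrn/0.9.8.1 (Linux)\n\nbody text\n' > "$dir/3"

# Same sed script as above: strip a matching header name and keep the line
# (tt jumps to :t), delete every line that did not match (d).
out=$(sed 's/^\(User-Agent\|X-\(Mailer\|Newsreader\)\): //i;tt;d;:t;s/.*: //' "$dir"/* |
      sort | uniq -c | sort -rn)
printf '%s\n' "$out"
rm -rf "$dir"
```

This should list slrn with a count of 2 ahead of Forte Agent with a count
of 1; the lowercase "user-agent:" header is counted because of the i flag.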

Another improvement is to avoid the "Argument list too long" error you
get when the directory holds so many files that the shell can't pass them
all on one command line. The whole thing can also easily be made into a
script:

-- cut here -- clientstats --
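#!/bin/sh
# Print newsreader statistics for each group named on the command line.
for group in "$@"; do
  # alt.os.linux.suse -> /var/spool/news/alt/os/linux/suse
  GPATH="/var/spool/news/$(echo "$group" | sed 's|\.|/|g')"
  echo "Statistics for $group:"
  # find | xargs keeps each sed command line within the argument limit
  find "$GPATH" -maxdepth 1 -type f | xargs \
    sed 's/^\(User-Agent\|X-\(Mailer\|Newsreader\)\): //i;tt;d;:t;s/.*: //' |
    sort | uniq -c | sort -k1,1nr -k2f
done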
-- cut here -- clientstats --

Call with one or more group names as parameters, e.g.:

./clientstats alt.os.linux.suse

(I also took the liberty of "improving" the last sort, so agents with
the same count are sorted alphabetically and case-insensitively, rather
than reverse alphabetically and case-sensitively.)
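
A sketch of that ordering written with two explicit sort keys, tried on
made-up counts in the format uniq -c produces. The agent names and numbers
are invented; LC_ALL=C is only there to make the result reproducible:

```shell
#!/bin/sh
# Key 1: the count, numeric, descending. Key 2: the rest of the line,
# ascending, with case folded so "knode" sorts before "Pan".
sorted=$(printf '%s\n' \
  '      3 slrn/0.9.8.1' \
  '     12 Xnews/5.04.25' \
  '      3 Pan/0.14.2' \
  '      3 knode/0.8.2' |
  LC_ALL=C sort -k1,1nr -k2f)
printf '%s\n' "$sorted"
```

Xnews comes out first with 12, and the three agents tied at 3 follow as
knode, Pan, slrn.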

Have I piqued your interest in sed yet? :-)
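
As for the earlier idea of stopping at the first match or at the end of
the headers, here is one possible sketch, assuming GNU sed. The function
name agent_of and the sample article are made up for illustration. It
starts one sed per article, so it is not necessarily faster on a real
spool; I have not timed it.

```shell
#!/bin/sh
# agent_of is a hypothetical helper name, not part of the script above.
# It prints the first client header of one article, then quits; it also
# quits at the blank line that ends the headers, so body lines are never
# matched.
agent_of() {
    sed -n '/^$/q;s/^\(User-Agent\|X-\(Mailer\|Newsreader\)\): //i;tt;b;:t;p;q' "$1"
}

# Made-up article whose body happens to contain a header-like line.
art=$(mktemp)
printf 'From: a@example.invalid\nUser-Agent: slrn/0.9.8.1\n\nUser-Agent: Pan/0.14.2\n' > "$art"
agent=$(agent_of "$art")
echo "$agent"
rm -f "$art"
```

Only the real header (slrn) is reported; the look-alike line in the body
is ignored. In a loop over a group you would call agent_of once per file
and feed the results to the same sort | uniq -c pipeline.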

Regards,

-- 
*Art

