Re: spamassassin setup with Evo - Edgy



On Thu, 2007-03-29 at 21:29 -0400, Jeffrey F. Bloss wrote:
> John Dangler wrote:
>
>> I have been putting spam mails into a folder called 'Possible_Spam' for
>> the last hour or so (it has 65 messages in it now), and have run
>> sa-learn --spam --mbox ./Possible_Spam against it... once the messages
>> in the folder have been run through sa-learn, do I need to leave them
>> there, or is it okay to delete them? (I'm not sure I can see the reason
>> for running sa-learn multiple times against the same messages) ...
>
> You can move them or whatever, but it's always nice to have a set of
> known spam messages handy in case you need to re-train, so I wouldn't
> blow them up. Unless you have an unlimited supply...

According to the sa-learn man page:
--ham
Learn the input message(s) as ham. If you have previously learnt any
of the messages as spam, SpamAssassin will forget them first, then
re-learn them as ham. Alternatively, if you have previously learnt them
as ham, it’ll skip them this time around. If the messages have already
been filtered through SpamAssassin, the learner will ignore any
modifications SpamAssassin may have made.

--spam
Learn the input message(s) as spam. If you have previously learnt any
of the messages as ham, SpamAssassin will forget them first, then
re-learn them as spam. Alternatively, if you have previously learnt
them as spam, it’ll skip them this time around. If the messages have
already been filtered through SpamAssassin, the learner will ignore any
modifications SpamAssassin may have made.

So, re-running messages previously identified as one or the other is ok,
since the utility will know which classification the message has been
set to.
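To make the quoted man-page rules concrete, here is a minimal Python sketch that models only the documented decisions: skip a message already learnt with the same class, and forget then re-learn one learnt with the opposite class. It is a toy under stated assumptions, not SpamAssassin's code. The class `ToyLearner`, its in-memory `seen` dictionary keyed by a message identifier, and the returned strings are all hypothetical.

```python
from typing import Dict


class ToyLearner:
    """Toy model of the --spam/--ham rules quoted from the sa-learn man page.

    Hypothetical: real SpamAssassin keeps its own record of learnt messages
    and a token database; this only mimics the documented decisions.
    """

    def __init__(self) -> None:
        # message identifier -> "spam" or "ham"
        self.seen: Dict[str, str] = {}

    def learn(self, msg_id: str, label: str) -> str:
        if label not in ("spam", "ham"):
            raise ValueError("label must be 'spam' or 'ham'")
        previous = self.seen.get(msg_id)
        if previous == label:
            # Already learnt as this class: skipped this time around.
            return "skipped"
        if previous is not None:
            # Learnt as the opposite class: forget first, then re-learn.
            del self.seen[msg_id]
            self.seen[msg_id] = label
            return "forgot and relearned"
        # Never seen before: learn it.
        self.seen[msg_id] = label
        return "learned"


if __name__ == "__main__":
    learner = ToyLearner()
    print(learner.learn("<a@example>", "spam"))  # learned
    print(learner.learn("<a@example>", "spam"))  # skipped
    print(learner.learn("<a@example>", "ham"))   # forgot and relearned
```

In this model, re-running the same folder only sends already-learnt messages down the `skipped` branch, which matches the reading of the man page given above.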


> It's not a good idea to re-run sa-learn across the same set of
> messages because, I think, it muddies up SA's database of tokens; or
> rather, there's a command line switch that needs to be thrown to make
> 'sa-learn --forget' about a message before re-learning it as spam or
> ham. ;)

I'm not sure this is necessary, given the man page text above...
although there is also --clear, which will wipe the learning database so
you can start over...
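Keeping a folder of known spam around, as suggested earlier in the thread, pays off when combining it with --clear to retrain from scratch. A minimal Python sketch of that sequence follows. It only builds the `sa-learn` command lines from the real `--clear`, `--spam`, `--ham` and `--mbox` options. The helper names `retrain_commands` and `run_all` are hypothetical, and `./Possible_Spam` and `./Inbox` are simply the folders named in this thread; substitute your own known-spam and known-ham mbox files.

```python
import subprocess
from typing import List, Sequence


def retrain_commands(spam_mboxes: Sequence[str],
                     ham_mboxes: Sequence[str]) -> List[List[str]]:
    """Build argv lists that wipe the Bayes database and retrain it."""
    commands: List[List[str]] = [["sa-learn", "--clear"]]
    for path in spam_mboxes:
        commands.append(["sa-learn", "--spam", "--mbox", path])
    for path in ham_mboxes:
        commands.append(["sa-learn", "--ham", "--mbox", path])
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Execute the commands in order (needs sa-learn installed)."""
    for argv in commands:
        # check=True raises CalledProcessError on the first failing step,
        # so later steps do not run after an error.
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run: print the plan instead of executing it.
    for argv in retrain_commands(["./Possible_Spam"], ["./Inbox"]):
        print(" ".join(argv))
```

Printing first and calling `run_all` only once the plan looks right is a deliberate choice: `--clear` throws away everything learnt so far.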

> And don't neglect to run 'sa-learn --ham' on a goodly amount of known
> ham messages; it's equally important for initial setup.

Yeah - about this... my email tree is set up so that all of my folders
are at the same level as Inbox. Yet when I run sa-learn against my
Inbox folder, which currently has 746 messages in it, sa-learn reports:
root@croatus: sa-learn --ham --mbox ./Inbox
Learned tokens from 236 message(s) (2617 message(s) examined)

I'm not sure how sa-learn can examine 2617 messages in a folder that
holds 746. But if it's looking at every folder on this pass, then it's
taking my Spam folder and re-learning it as ham every time...
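One way to check how many messages the Inbox file itself holds, independent of what the mail client displays, is a short Python sketch using the standard-library `mailbox` module. It assumes `./Inbox` is a plain mbox file, which is what `sa-learn --mbox` was pointed at. Python's "From "-line splitting may not match sa-learn's exactly, so treat the count as approximate. As a further hypothetical check, it also tallies distinct Message-ID headers in case duplicates are part of the story. The function name `count_mbox` is illustrative.

```python
import mailbox
import sys
from typing import Set, Tuple


def count_mbox(path: str) -> Tuple[int, int]:
    """Return (total messages, distinct Message-ID values) for an mbox file."""
    box = mailbox.mbox(path, create=False)  # raises if the file is missing
    try:
        total = 0
        ids: Set[str] = set()
        for msg in box:
            total += 1
            mid = msg.get("Message-ID")
            if mid is not None:
                ids.add(mid.strip())
        return total, len(ids)
    finally:
        box.close()


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "./Inbox"
    total, distinct = count_mbox(path)
    print(f"{path}: {total} messages, {distinct} distinct Message-IDs")
```

If the total comes out near 2617 rather than 746, the extra messages are in that one file, and sa-learn is not reading the other folders.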



--
ubuntu-users mailing list
ubuntu-users@xxxxxxxxxxxxxxxx
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-users

