Re: spamassassin doesn't seem to be using bayes

From: jdow (jdow_at_earthlink.net)
Date: 10/21/05

  • Next message: Richard E Miles: "Re: error inserting acpi-cpufreq no such device"
    To: <akonstam@trinity.edu>, "For users of Fedora Core releases" <fedora-list@redhat.com>
    Date: Fri, 21 Oct 2005 13:55:10 -0700
    
    

    From: <akonstam@trinity.edu>
    > On Fri, Oct 21, 2005 at 10:50:27AM -0700, jdow wrote:
    >> From: "Alexander Dalloz" <ad+lists@uni-x.org>
    >>
    >> >Did you set in local.cf something like following?
    >>
    >> >use_bayes 1
    >> ^^^ good
    >> >auto_learn 1
    >> ^^^ IMAO that is poison unless you also change the threshold
    >> scores for bayes to 'way out there'. These lines will do that:
    >> ---8<---
    >> bayes_auto_learn_threshold_spam 20.0
    >> bayes_auto_learn_threshold_nonspam 0.1
    >>
    >> ---8<---
    > The above is interesting since I would think that the default value 12
    > is too high. Your line says that auto_learning should not happen
    > unless the score is greater than 20. Why do you think that is good?
    > -------------------------------------------

    It avoids false white-listing and false training. The "cost" of repair
    for false training and white-listing is disproportionately high compared
    to normally expected levels of spamassassin maintenance. Once you have
    operated for a significant period of time you should be able to reduce
    the scores to "stock" levels safely. If you watch spam scores and note
    levels that are questionable you may be able to set scores even tighter
    than stock. However, it is not uncommon to see hams from SOME sources
    that score into the 20s. In my case that happens with LKML from time
    to time. Rules that generally work on normal mail go crazy with patches
    and kernel debug reports.
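
    A minimal sketch of the trade-off above, in Python. It is a
    simplified model, not SpamAssassin's actual code: the function name
    autolearn_decision is hypothetical, the boundary handling (>= and <)
    is an assumption, and the real auto-learner applies further
    conditions that are left out here. The defaults are the two
    threshold values quoted earlier in the thread.

```python
def autolearn_decision(score, spam_threshold=20.0, nonspam_threshold=0.1):
    """Return "spam", "ham", or "no" for a message's rule score.

    Hypothetical helper: models only the two thresholds, nothing else.
    """
    if score >= spam_threshold:
        return "spam"   # would be auto-learned as spam
    if score < nonspam_threshold:
        return "ham"    # would be auto-learned as ham
    return "no"         # left alone, nothing is learned


# A legitimate message that happens to score 14 (say, a kernel patch):
stock = autolearn_decision(14.0, spam_threshold=12.0)   # stock threshold
raised = autolearn_decision(14.0, spam_threshold=20.0)  # raised threshold
```

    With the stock spam threshold of 12 that message would be trained
    as spam; with the threshold raised to 20 it is simply not learned.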

    I note that I am not the only person suggesting this on the spamassassin
    users mailing list. The maintainers are mum on the issue for the most
    part.

    I also note that retraining Bayes on messages that already have high
    Bayes scores seems to be pointless based on my own results. I train
    only on messages that score low numbers of points and have Bayes
    scores below 99. I also grab periodic bundles of ham to feed my Bayes
    system when it starts getting imbalanced between ham and spam. At the
    moment I have trained with about 10% of the numbers D. D.'s -D results
    indicated he had. And I've never had to go find the wiki page to learn
    how to correct an auto-whitelist (I don't use it at all) or Bayes
    screwup. This makes life easier for me. (I've never had an expire
    go awry, either. Um, I've never RUN an expire. {^_-})
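
    The selection rule in the paragraph above could be sketched roughly
    as follows, in Python. Everything here is an assumption for
    illustration: the helper names are hypothetical, "Bayes score below
    99" is read as "the BAYES_99 rule did not hit", the cutoff of 10.0
    points for "low" is made up, and the X-Spam-Status layout
    (score=... tests=...) should be checked against your own headers.

```python
import re


def parse_spam_status(header_value):
    """Extract (score, set of rule names) from an X-Spam-Status value."""
    # Undo header folding so the tests= list sits on one line.
    flat = " ".join(header_value.split())
    score_match = re.search(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)", flat)
    tests_match = re.search(r"\btests=(.*?)(?:\s+\w+=|$)", flat)
    score = float(score_match.group(1)) if score_match else None
    tests = set()
    if tests_match:
        tests = {t.strip() for t in tests_match.group(1).split(",") if t.strip()}
        tests.discard("none")
    return score, tests


def worth_training(header_value, max_score=10.0):
    """True if the message scored low and Bayes was not already sure."""
    score, tests = parse_spam_status(header_value)
    if score is None:
        return False
    return score <= max_score and "BAYES_99" not in tests
```

    A spam that scored 7.2 with BAYES_50 would qualify for manual
    training; one that already hit BAYES_99 would be skipped.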

    Let's see, the general wisdom on the users list is that the nonspam
    threshold should be at least slightly negative if you have any rules
    that hit ham preferentially. But 0.1 is probably OK. I solve the
    threshold problem by using meatware. Once a day I sort the spam folder
    by score and check out the lowest few scores and make a quick scan
    for keywords that might indicate an interesting LKML message that was
    mismarked. (Although all I usually do with that list is scan subjects
    for the current "buzz", like real time precision clocks appearing to
    run backwards because 2 seconds is long enough to wrap a 32 bit counter.)
    The scanning is of modest interest. Sometimes I go through the email to
    see what new rules might be called for or admire the new all time high
    score for the machine from one of Leo's postings. (Drug spams with
    base64 encoded bodies coupled with odd DNS entries are among his
    signatures. He's headed for number one on ROKSO it appears. Even though
    Ralsky was busted and (temporarily) shut down by the FBI, Leo is still
    number two on the top 10 list. He's sort of cute and deadly smart with
    DNS tricks. He seems to be into drugs and kinky sex. He's managed a
    message that hit 72 rules, 8 of which are zero score rules used in
    meta rules, that ran to over 105 points. This was with a remarkably
    short base64 encoded drug spam. I didn't feed it to bayes. My bayes
    already knows all about the V drug and E Dysfunct issues.)
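
    The daily check described above (sort the spam folder by score,
    look at the lowest few, scan for interesting keywords) is done by
    hand in the original. A rough Python equivalent might look like
    this. The function names, the keyword list, and the SAMPLE messages
    are all made up for illustration, and the score=/hits= pattern is
    an assumption about the X-Spam-Status format.

```python
import re
from email import message_from_string

SCORE_RE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")


def spam_score(msg):
    """Return the score from X-Spam-Status, or None if it is missing."""
    match = SCORE_RE.search(str(msg.get("X-Spam-Status", "")))
    return float(match.group(1)) if match else None


def lowest_scoring(messages, count=5, keywords=()):
    """Return up to `count` (score, subject, flagged) tuples, lowest first.

    `flagged` is True when the subject contains one of the keywords.
    """
    scored = []
    for msg in messages:
        score = spam_score(msg)
        if score is None:
            continue  # not scanned, nothing to sort on
        subject = str(msg.get("Subject", ""))
        flagged = any(k.lower() in subject.lower() for k in keywords)
        scored.append((score, subject, flagged))
    scored.sort(key=lambda item: item[0])
    return scored[:count]


# Made-up sample messages standing in for a spam folder.
SAMPLE = [message_from_string(text) for text in (
    "Subject: cheap pills\nX-Spam-Status: Yes, score=105.3 required=5.0\n\nbody\n",
    "Subject: [PATCH] fix clock wrap\nX-Spam-Status: Yes, score=5.1 required=5.0\n\nbody\n",
    "Subject: hello\nX-Spam-Status: Yes, score=8.0 required=5.0\n\nbody\n",
)]
```

    For a real folder, the standard library's mailbox.mbox object can
    be passed in place of SAMPLE, since iterating over it yields
    message objects.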

    I also don't use tools such as amavis or milters. Plain old procmail
    is remarkably transparent about what it does. If I ask spamassassin for
    a specific markup I get that markup and not what some other futility
    dictates I'll get. (I can also do such perverted things as playing a
    tune after procmail processes emails from customers. {^_-})

    {^_^} Joanne

    -- 
    fedora-list mailing list
    fedora-list@redhat.com
    To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list
    
