Re: [SLE] OT - SA defeating thesauri

From: Dylan (dylan_at_dylan.me.uk)
Date: 12/31/03

  • Next message: Stephen P. Molnar, Ph.D.: "Fwd: Re: [SLE] Configuring a Router"
    To: suse-linux-e@suse.com
    Date: Wed, 31 Dec 2003 16:32:26 +0000
    
    

    On Wednesday 31 December 2003 14:51 pm, Nick Selby wrote:
    <SNIP>
    > I guess that they're doing this to increase the message size with
    > un-spamlike words to decrease the ratio of spam-like words to
    > non-spam-like words? Does this sound right?

    Yes, that sounds quite plausible. What they are also doing is skewing
    the ratio of content to function words, in the grammatical sense. The
    ratio is relatively constant for a given language (for English, approx
    25-35% function words, like it, is, that, ...), so a list of purely
    content words would likely be easy to identify, since it would score 0%
    function words.

    > Anyone heard of this
    > and/or have a sense of how to defeat this strategy with current
    > available configurable settings on SA? Spastic? Anything?

    Not sure how to implement it with current setups, but you would only
    need to count the occurrences of about 50 specified words and compare
    that to the total word count.
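
A rough sketch of that counting idea in Python, offered only as an illustration. The word list, the function names, the 10% cut-off, and the 20-word minimum are all assumptions made up for this example; none of it is an existing SA setting or rule.

```python
import re

# Hypothetical list of roughly 50 common English function words.
# Assumption: any similar short list would serve; this one is illustrative.
FUNCTION_WORDS = frozenset("""
a an the and or but if of to in on at by for with from as
is are was were be been am do does did have has had
it its this that these those he she they we you i
not no so than then there which who what will would can
""".split())


def tokenize(text):
    """Split text into lowercase words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def function_word_ratio(text):
    """Return the fraction of words in text that are function words."""
    words = tokenize(text)
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in FUNCTION_WORDS)
    return hits / len(words)


def looks_like_word_salad(text, threshold=0.10, min_words=20):
    """Flag text whose function-word share falls below the threshold.

    Both threshold and min_words are guesses, not tuned values. Very
    short texts are never flagged, because the ratio is too noisy there.
    """
    if len(tokenize(text)) < min_words:
        return False
    return function_word_ratio(text) < threshold
```

The cut-off sits well below the 25-35% range quoted above so that ordinary terse prose is not flagged. A block of thesaurus words mixed into a normal message would pull the overall ratio down without taking it to zero, so in practice the check would probably need to run per paragraph rather than over the whole body.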

    Dylan

    >
    > TIA

    -- 
    Sweet moderation
    Heart of this nation
    Desert us not
    We are between the wars
    - Billy Bragg
    
