Un-Bayesing SpamAssassin

It was getting crazy last week. I was getting more and more and more spam. I would go to bed, wake up 8 hours later and have 50+ messages waiting for me. My SpamAssassin was just completely falling down on the job.

At first I lowered the hit level, and that didn't seem to help. Then I went through my .spamassassin/user_prefs and adjusted the scores for the following rules:

  score DISGUISE_PORN 3.0
  score PORN_16 4.0
  score PORN_6  4.0
  score PORN_PASSWORD 4.0
  score MUST_BE_18 4.0
  score ADULT_SITE 4.0
  score BEST_PORN 4.0
  score ITS_LEGAL 4.0
  score X_OSIRU_DUL 0.0
  score X_OSIRU_DUL_FH 0.0
  score X_OSIRU_OPEN_RELAY 0.0
  score X_OSIRU_SPAM_SRC 0.0
  score HTML_WEB_BUGS 4.0
  score HTML_IMAGE_ONLY_02 4.0
  score HTML_IMAGE_ONLY_04 4.0
  score HTML_IMAGE_ONLY_06 4.0

Nothing seemed to be helping. I spent all day, while I was working, with a tail -f of .procmail.log in a window, trying to monitor what was happening and comparing the spam headers I got to what was supposed to happen, but I couldn't figure it out. *Then* I noticed that many of the headers had a BAYES_0 in them. What that means is that the Bayesian filter had determined there was a 0% chance that the email was spam. Contrary to what I first thought, a BAYES_0 hit doesn't just leave the score alone. It carries a negative score, so it actually *subtracted points* from the hit count, thus putting the message under my spam limit.
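To make the arithmetic concrete, here is a rough sketch of how a negative Bayes score can cancel out other hits. The 4.0 values are the ones from my user_prefs above; the threshold of 5.0 and the BAYES_0 score of -4.9 are assumptions picked for illustration, not values read from my actual config. The total_score function is a hypothetical helper, not part of SpamAssassin.

```shell
# total_score THRESHOLD SCORE...
# Hypothetical helper: sums the rule scores and reports whether the
# total reaches the spam threshold.
total_score() {
    threshold=$1
    shift
    printf '%s\n' "$@" | awk -v t="$threshold" '
        { sum += $1 }
        END { printf "%.1f %s\n", sum, (sum >= t ? "spam" : "not-spam") }'
}

# Two rules hit at 4.0 each (e.g. HTML_WEB_BUGS and PORN_16 as scored above).
total_score 5.0 4.0 4.0          # prints "8.0 spam"

# Same two hits plus an assumed BAYES_0 score of -4.9.
total_score 5.0 4.0 4.0 -4.9     # prints "3.1 not-spam"
```

So bumping individual rules to 4.0 does nothing for a message when the Bayes rule takes nearly 5 points back off.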

Ahh. So first I modified the scores for the BAYES_xx rules, but then I started getting false positives, which is bad. I was stumped for a bit until my coworker Vineet told me that what had probably happened was that the Bayes filtering had "learned badly". Ahhhh. *That* made sense.

So instead of trying to untrain it, or whatever, I whacked the bayes_seen and bayes_toks files. I'm *sure* there are more elegant ways of doing it, but I figured I'd start from scratch and see if that helped. It definitely helped. I'm still getting *a ton* of spam, but Thunderbird is also helping.
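For anyone who wants to do the same thing a little more carefully, here is a minimal sketch that moves the Bayes files aside instead of deleting them. It assumes the per-user files live in ~/.spamassassin and that the token file is named bayes_toks, which is the usual layout but may differ on your setup. The reset_bayes function and the SA_DIR variable are names I made up for this example.

```shell
# Assumed default location of the per-user Bayes files; override SA_DIR if yours differs.
SA_DIR="${SA_DIR:-$HOME/.spamassassin}"

# reset_bayes: hypothetical helper that moves the Bayes database files into a
# timestamped backup directory so the filter starts learning from scratch.
reset_bayes() {
    backup="$SA_DIR/bayes-backup.$(date +%Y%m%d%H%M%S)"
    mkdir -p "$backup"
    for f in bayes_seen bayes_toks; do
        # Only move files that actually exist.
        [ -e "$SA_DIR/$f" ] && mv "$SA_DIR/$f" "$backup/"
    done
    return 0
}

# Usage: run `reset_bayes` once, then let the filter retrain on new mail.
```

Newer SpamAssassin releases also have an sa-learn --clear option that I believe does the wipe for you, so check your version's sa-learn man page before rolling your own.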

Does anyone know the right score modifier for attachments in general? I'm *sooo* fucking sick of that virus or whatever it is with insanely stupid text and a .zip file attachment. "I hate cleartext. Password is 21341254". AAAAhhh.

Anyways, that's my suggestion. I cannot wait until there are some solid solutions for spam. It's just gotten to a crazy level lately.

