6.2. SpamAssassin

Both SpamAssassin and amavisd-new are Perl programs, and amavisd-new loads SpamAssassin's libraries directly, so it doesn't need the SpamAssassin daemon running on the server. We are going to stop the daemon and prevent it from starting up during boot.

/etc/init.d/spamassassin stop
update-rc.d -f spamassassin remove
etckeeper commit "Removed spamassassin from rcX.d"

To enable DKIM checking of received emails in SpamAssassin, you have to install the Mail::DKIM Perl library.

apt-get install libmail-dkim-perl

Edit /etc/spamassassin/v312.pre and check that this line is uncommented:

loadplugin Mail::SpamAssassin::Plugin::DKIM
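
Instead of opening the file in an editor, the check can be scripted. The following is a minimal sketch; plugin_enabled is a hypothetical helper name, not part of SpamAssassin, and it only looks for an uncommented loadplugin line in the given file.

```shell
# Hypothetical helper: succeeds if FILE contains an uncommented
# "loadplugin PLUGIN" line (leading whitespace is allowed).
plugin_enabled() {
    file=$1
    plugin=$2
    grep -Eq "^[[:space:]]*loadplugin[[:space:]]+${plugin}([[:space:]]|\$)" "$file"
}

# Only run the check if the file from this guide exists on this machine.
pre=/etc/spamassassin/v312.pre
if [ -r "$pre" ]; then
    if plugin_enabled "$pre" Mail::SpamAssassin::Plugin::DKIM; then
        echo "DKIM plugin is enabled"
    else
        echo "DKIM plugin line is missing or commented out"
    fi
fi
```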

We are also going to install Pyzor and Razor for additional checks.

apt-get install pyzor razor

After that you can try running SpamAssassin manually:

spamassassin -D -t < /usr/share/doc/spamassassin/examples/sample-spam.txt 2>&1 | tee sa.out

You should see DKIM mentioned in the sa.out file, and the end of the output should look something like this:

Content analysis details:   (1004.5 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-0.0 NO_RELAYS              Informational: message was not relayed via SMTP
1000 GTUBE                  BODY: Generic Test for Unsolicited Bulk Email
 0.4 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
                            [cf: 100]
 0.5 RAZOR2_CF_RANGE_E4_51_100 Razor2 gives engine 4 confidence level
                            above 50%
                            [cf: 100]
 1.7 RAZOR2_CHECK           Listed in Razor2 (http://razor.sf.net/)
 2.0 PYZOR_CHECK            Listed in Pyzor (http://pyzor.sf.net/)
 0.0 DIGEST_MULTIPLE        Message hits more than one network digest check
-0.0 NO_RECEIVED            Informational: message has no Received headers
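
The two things to look for in sa.out can also be checked from the shell. This is a rough sketch that assumes the report contains a "Content analysis details" line formatted like the one above; sa_score and mentions_dkim are hypothetical helper names.

```shell
# Print the total score from the "Content analysis details" line of FILE.
sa_score() {
    sed -n 's/^Content analysis details:[[:space:]]*(\([-0-9.]*\) points.*/\1/p' "$1"
}

# Succeed if DKIM is mentioned anywhere in FILE.
mentions_dkim() {
    grep -q 'DKIM' "$1"
}

# Only inspect sa.out if the previous command has been run in this directory.
if [ -r sa.out ]; then
    echo "total score: $(sa_score sa.out)"
    mentions_dkim sa.out || echo "DKIM is not mentioned in sa.out"
fi
```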

To enable the auto-whitelist (AWL), edit /etc/spamassassin/v310.pre and uncomment:

loadplugin Mail::SpamAssassin::Plugin::AWL

Edit /etc/spamassassin/local.cf and add:

use_auto_whitelist 1
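
If you prefer to script the uncommenting step, something along these lines should work. It is only a sketch: enable_plugin is a hypothetical helper name, and it assumes the plugin line is present in the file and merely commented out with a leading "#". The file is rewritten in place so its owner and permissions are kept.

```shell
# Hypothetical helper: remove the leading "#" from a commented-out
# "loadplugin PLUGIN" line in FILE. Other lines are left untouched.
enable_plugin() {
    file=$1
    plugin=$2
    tmp=$(mktemp) || return 1
    sed "s/^[[:space:]]*#[[:space:]]*\(loadplugin[[:space:]][[:space:]]*${plugin}\)/\1/" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example call (run as root, not executed here):
# enable_plugin /etc/spamassassin/v310.pre Mail::SpamAssassin::Plugin::AWL
```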

6.2.1. Training SpamAssassin

Bayes filtering is a strong weapon for fighting spam. It works by learning what is spam to you and what isn't. Before SpamAssassin can start using Bayes filtering you have to train it. Training your Bayes filters is something that you should do on a regular basis: the more emails it processes, the smarter it gets.

To teach SpamAssassin what is spam, you have to use the sa-learn utility on a folder where your spam messages are stored (in my case the folder is called Junk).

Because SpamAssassin is run by amavisd-new you have to run the sa-learn utility as the amavis user.

su amavis -c 'sa-learn --no-sync --spam /home/vmail/example.com/demo/.Junk/cur'

To teach it what is not spam, run sa-learn on a folder that contains only non-spam mail (in this case, sa-learn examines the Inbox folder).

su amavis -c 'sa-learn --no-sync --ham /home/vmail/example.com/demo/cur'

Bayes filtering is only used once SpamAssassin has been trained on at least 200 spam and 200 ham messages.
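
To get a rough idea of whether your folders hold enough mail to reach that threshold, you can count the message files. This is only an estimate, since sa-learn may skip messages it has already learned; count_messages and enough_mail are hypothetical helper names, and the paths are the example paths used above.

```shell
# Print the number of regular files (one per message in a Maildir) under DIR.
count_messages() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Succeed if both SPAM_DIR and HAM_DIR hold at least MIN messages (default 200).
enough_mail() {
    min=${3:-200}
    spam=$(count_messages "$1")
    ham=$(count_messages "$2")
    [ "$spam" -ge "$min" ] && [ "$ham" -ge "$min" ]
}

# Example with the folders used in this guide (skipped if they do not exist).
junk=/home/vmail/example.com/demo/.Junk/cur
inbox=/home/vmail/example.com/demo/cur
if [ -d "$junk" ] && [ -d "$inbox" ]; then
    if enough_mail "$junk" "$inbox"; then
        echo "enough messages to train Bayes"
    else
        echo "fewer than 200 spam or ham messages available"
    fi
fi
```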

To update SpamAssassin you can run:

sa-update -D

The -D option turns on debug output.
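
When sa-update is run from a script or cron job, its exit status tells you what happened. The sketch below maps the exit codes documented in the sa-update manual page to short messages; describe_sa_update_status is a hypothetical helper name, and the restart in the commented example is an assumption about how amavisd-new picks up new rules.

```shell
# Hypothetical helper: translate an sa-update exit code into a message.
describe_sa_update_status() {
    case "$1" in
        0) echo "updates were downloaded and installed" ;;
        1) echo "no new updates were available" ;;
        2) echo "an update is available but the lint check failed" ;;
        3) echo "some channels were updated, others failed" ;;
        *) echo "an error occurred while downloading or extracting updates" ;;
    esac
}

# Example use (not executed here):
# sa-update
# status=$?
# describe_sa_update_status "$status"
# [ "$status" -eq 0 ] && /etc/init.d/amavis restart
```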

6.2.2. Move Bayes and AWL data to MySQL

Enter mysql and create a database and a user for SpamAssassin.

CREATE DATABASE mail_spamassassin;
CREATE USER 'spamassassin'@'localhost' IDENTIFIED BY 'new_password';
GRANT ALL PRIVILEGES ON `mail_spamassassin`.* TO 'spamassassin'@'localhost';
FLUSH PRIVILEGES;

Use wget to download the conversion script that we will need from http://spamassassin.apache.org/full/3.0.x/dist/tools/.

cd /root
wget http://spamassassin.apache.org/full/3.0.x/dist/tools/convert_awl_dbm_to_sql

To create the tables in MySQL run:

mysql -u root -p mail_spamassassin < /usr/share/doc/spamassassin/sql/bayes_mysql.sql
mysql -u root -p mail_spamassassin < /usr/share/doc/spamassassin/sql/awl_mysql.sql

Edit /etc/spamassassin/local.cf and add this at the end:

bayes_store_module              Mail::SpamAssassin::BayesStore::MySQL   
 
bayes_sql_dsn                   DBI:mysql:mail_spamassassin:localhost
bayes_sql_username              spamassassin
bayes_sql_password              new_password
bayes_sql_override_username     amavis

auto_whitelist_factory          Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsn                    DBI:mysql:mail_spamassassin:localhost
user_awl_sql_username           spamassassin
user_awl_sql_password           new_password
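
A typo in one of the two DSN lines is easy to miss, so it can be worth comparing them after editing. This is a sketch only: sa_setting and same_dsn are hypothetical helper names, and using the same DSN for Bayes and AWL is what this guide does, not something SpamAssassin requires.

```shell
# Hypothetical helper: print the value of KEY from a SpamAssassin config FILE.
# If the key appears more than once, the last occurrence is printed.
sa_setting() {
    awk -v key="$2" '$1 == key { value = $2 } END { if (value != "") print value }' "$1"
}

# Succeed only if both DSN settings are present and identical.
same_dsn() {
    bayes_dsn=$(sa_setting "$1" bayes_sql_dsn)
    awl_dsn=$(sa_setting "$1" user_awl_dsn)
    [ -n "$bayes_dsn" ] && [ "$bayes_dsn" = "$awl_dsn" ]
}

# Only run the check if local.cf is readable on this machine.
cf=/etc/spamassassin/local.cf
if [ -r "$cf" ]; then
    same_dsn "$cf" || echo "bayes_sql_dsn and user_awl_dsn differ or are missing in $cf"
fi
```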

Now we need to initialise the database table:

su amavis -c 'sa-learn --spam /usr/share/doc/spamassassin/examples/sample-spam.txt'

If you are starting clean and do not have existing Bayes data, you can skip importing existing data into the database.

Move to the amavis user's home folder and dump the existing data:

cd /var/lib/amavis/.spamassassin
su amavis -c 'sa-learn --sync --force-expire'
su amavis -c 'sa-learn --backup' > /root/backup.txt

To convert the AWL data:

cd /root
chmod +x convert_awl_dbm_to_sql
./convert_awl_dbm_to_sql 

To run the actual conversion, copy this line and replace the password and any other values you have modified:

./convert_awl_dbm_to_sql --username amavis --dsn DBI:mysql:mail_spamassassin:localhost --dbautowhitelist /var/lib/amavis/.spamassassin/auto-whitelist --sqlusername spamassassin --sqlpassword new_password --ok 

To insert the Bayes data run:

su amavis -c 'sa-learn --restore backup.txt'

The existing data should be inserted into MySQL.

Restart amavisd-new and commit the changes made.

/etc/init.d/amavis restart
etckeeper commit "Moved Bayes and AWL data to MySQL"

If you have enough Bayes and AWL data you can test SpamAssassin like this:

atlantis:~# su amavis -c "spamassassin -D -t < /usr/share/doc/spamassassin/examples/sample-spam.txt 2>&1 | egrep '(bayes:|whitelist:|AWL)'" 
[26387] dbg: plugin: loading Mail::SpamAssassin::Plugin::AWL from @INC
[26387] dbg: bayes: using username: amavis
[26387] dbg: bayes: database connection established
[26387] dbg: bayes: found bayes db version 3
[26387] dbg: bayes: Using userid: 1
[26387] dbg: bayes: corpus size: nspam = 27443, nham = 7248
[26387] dbg: bayes: tok_get_all: token count: 65
[26387] dbg: bayes: score = 0.25382662007679
[26387] dbg: bayes: DB expiry: tokens in DB: 116828, Expiry max size: 150000, Oldest atime: 1287200200, Newest atime: 1298305866, Last expire: 1298265907, Current time: 1298310696
[26387] dbg: auto-whitelist: sql-based connected to DBI:mysql:mail_spamassassin:localhost
[26387] dbg: auto-whitelist: sql-based using username: amavis
[26387] dbg: auto-whitelist: sql-based get_addr_entry: no entry found for sender@example.net|ip=none
[26387] dbg: auto-whitelist: sql-based sender@example.net|ip=none scores 0/0
[26387] dbg: auto-whitelist: AWL active, pre-score: 1006.014, autolearn score: 6.014, mean: undef, IP: undef
[26387] dbg: auto-whitelist: sql-based add_score: created new entry for sender@example.net|ip=none with totscore: 6.014
[26387] dbg: auto-whitelist: sql-based finish: disconnected from DBI:mysql:spamassassin:localhost
[26387] dbg: auto-whitelist: post auto-whitelist score: 1006.014
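
The "corpus size" line in this output shows how many spam and ham messages Bayes has learned. As a sketch, the numbers can be pulled out of the debug output and compared with the 200-message threshold mentioned earlier; corpus_size and bayes_trained are hypothetical helper names, and the pattern assumes the debug line is formatted exactly as shown above.

```shell
# Read spamassassin -D output on stdin and print "NSPAM NHAM".
corpus_size() {
    sed -n 's/.*bayes: corpus size: nspam = \([0-9][0-9]*\), nham = \([0-9][0-9]*\).*/\1 \2/p'
}

# Read the same output on stdin; succeed if both counts are at least 200.
bayes_trained() {
    corpus_size | {
        read -r nspam nham || true
        [ "${nspam:-0}" -ge 200 ] && [ "${nham:-0}" -ge 200 ]
    }
}

# Example use (not executed here):
# su amavis -c "spamassassin -D -t < /usr/share/doc/spamassassin/examples/sample-spam.txt 2>&1" | corpus_size
```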
Warning

Some of the data in the database should be pruned regularly. For more information on how to create scripts that automatically prune stale records from your database, take a look here.