http://qmail.jms1.net/dspam/index.shtml

Using DSPAM with qmail

DSPAM is an open-source content-based email filtering system. The idea is similar to SpamAssassin, in that it "scores" each message passing through the filtering engine. It adds headers to the message which can be used by other scripts (such as maildrop) to control the delivery of the message (i.e. send spam to a dedicated spam folder, or delete spam if the score is high.) These headers can also be used by MUAs (Mail User Agents, programs which are used to read and write email messages, such as Thunderbird) to trigger client-side rules which might do things like showing spam messages in red, or moving spam messages to a different folder (if the server doesn't do this automatically.)

I have a client who wants to build a new machine to replace his existing system (which has been running for about five years.) He has a typical small-company server: qmail with the combined patch, vpopmail, qmailadmin, vqadmin, simscan, and dovecot. He also only has a single domain on the machine, with a couple hundred mailboxes. On the new server he wants to use DSPAM to filter the incoming mail.

Until yesterday, I was vaguely aware of DSPAM, but I had never actually set it up or administered it. I'm writing this page to document how I built and deployed it on my own server. I will probably update this page as time goes by and I learn more about it.


Pre-Requisites

DSPAM can be built to use several different back-end storage mechanisms to physically store the data about the messages it has seen and which tokens (i.e. words, sequences of words) should serve to indicate that a given message is spam, or is "ham" (i.e. not spam.) Some of the storage back-ends require other software to be present and running on the machine.

As of the current version (which is dspam-3.10.2 as I write this) the following storage back-ends are available:

Before you start, you should choose which storage back-end you will be using. If it requires a database server (i.e. MySQL or PostgreSQL) you should make sure that the server is installed and running, and that you have "root" access to the database server, because part of the setup process will be to create a database and assign permissions to a userid which will be dedicated to DSPAM.

The directions below will show the actual commands needed to set up the MySQL database. If you are using PostgreSQL or SQLite, the documentation which comes in the DSPAM distribution will explain how to set things up.


Collect some messages

In your own mailbox (or a mailbox which you can use for testing, without affecting your users) create two IMAP folders. Copy 10-20 typical non-spam messages into one folder. Copy 10-20 spam messages into the other. We will be using these messages for testing the system later on.

When we use these messages later on, I'm going to assume that the folders are called "test-ham" and "test-spam", and that they are direct children of the INBOX folder. If you choose to call them something else, be prepared to adjust the names when we start testing things (below.)
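Before moving on, it may be worth confirming that both folders really contain the messages you copied. The following is a minimal sketch, assuming the usual Maildir layout where each message is one file under "cur" or "new"; the function name count_msgs is made up for this example.

```shell
#!/bin/bash
# Count the message files in one Maildir folder (both "cur" and "new").
# count_msgs is a hypothetical helper name, not part of DSPAM or vpopmail.
# Usage: count_msgs /path/to/Maildir/.test-ham
count_msgs() {
    local folder="$1"
    # Each message is a single regular file; errors for a missing
    # directory are discarded so the result is simply 0.
    find "$folder/cur" "$folder/new" -maxdepth 1 -type f 2>/dev/null \
        | wc -l | tr -d ' '
}
```

With the folder names assumed above, you would call it once for ".test-ham" and once for ".test-spam" under the mailbox's Maildir directory, and expect a number between 10 and 20 for each.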


Download DSPAM

Visit the SourceForge download link in your browser to download the latest DSPAM package. Then upload the file to the server.

On my own server I did this: instead of downloading the file to my desktop and then uploading it to the server, I visited the page above, but immediately hit ESC so that the automatic download didn't start. I then right-clicked on the "direct link" portion of the "Problems with the download? Please use this direct link, or try another mirror." message, and chose "Copy link location". Then, on the server, I typed a wget command, with single quotes around the URL, like so:

$ wget '(Do a PASTE here. The URL fills in.)'

However you end up downloading the file, it needs to be on the server, in the home directory of the non-root user you will be using to configure and compile the software.


Configure and compile the software

Before you configure the software, you need to create a userid to run the software, and the "dspam home" directory where it keeps its log files. Depending on which storage back-end you're using, it may also keep the per-user preferences there, or the spam/ham database files.

I created a dedicated "dspam" user for this, and set the user's home directory to "/var/dspam", which is the "dspam home" directory. The commands looked like this:

# useradd -s /sbin/nologin -d /var/dspam -M dspam
# mkdir -m 0700 /var/dspam
# chown dspam:dspam /var/dspam

Once the user and the dspam home directory existed, I expanded the software. I then wrote a "go" script which contains the configure command, because (1) it's easier to check the command before you run it, (2) unless you delete the script, you can refer back to it later to see how you built the software, and (3) when you upgrade to a new version, you can copy the "go" script from the old version and (if necessary) edit the file in order to build the new version.

You may need to change a few things in the configure command line to match your system. The command line shown below is the one I used on my own server (running CentOS 5.8.)

Here's what the procedure looked like for me:

$ cd
$ tar xzf dspam-3.10.2.tar.gz
$ cd dspam-3.10.2
$ nano go (I prefer nano, feel free to use whatever editor you like.)
#!/bin/bash
./configure \
    --enable-daemon \
    --enable-debug \
    --enable-clamav \
    --enable-domain-scale \
    --enable-long-usernames \
    --with-dspam-home=/var/dspam \
    --with-dspam-home-owner=dspam \
    --with-dspam-home-group=dspam \
    --with-dspam-owner=dspam \
    --with-dspam-group=dspam \
    --with-storage-driver=mysql_drv \
    --with-mysql-includes=/usr/include/mysql \
    --with-mysql-libraries=/usr/lib/mysql \
    --enable-preferences-extension \
    --enable-virtual-users
$ chmod 0755 go
$ ./go
...
$ make
...
$ sudo make install
[sudo] password for jms1:
Making install in .
...
$ sudo ldconfig

The last command, "ldconfig", rebuilds the cache used by the run-time linker (ld.so) to find shared libraries. Normally if a Makefile ends up installing any shared libraries it will run this command, but I've seen cases where the developer forgets to add it in there, and it doesn't really hurt anything to run it again, so I've gotten into the habit of always running this command whenever installing something which includes shared libraries.


Set up the database

Before we can actually run the software, we will need to create a database with the tables used by DSPAM. My server is using the mysql_drv storage back-end, so I had to create a database to hold the data, and a mysql user to access the data.

$ cd ~/dspam-3.10.2/src/tools.mysql_drv
$ mysql -u root -p
Enter password: (Enter your mysql root password.)
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 11
Server version: 5.0.95 Source distribution
Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> CREATE DATABASE dspam ;
Query OK, 1 row affected (0.00 sec)
mysql> \. mysql_objects-4.1.sql
Query OK, 0 rows affected (0.04 sec)
Query OK, 0 rows affected (0.06 sec)
Records: 0 Duplicates: 0 Warnings: 0
Query OK, 0 rows affected (0.06 sec)
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
Query OK, 0 rows affected (0.05 sec)
Records: 0 Duplicates: 0 Warnings: 0
Query OK, 0 rows affected (0.06 sec)
Query OK, 0 rows affected (0.04 sec)
Query OK, 0 rows affected (0.03 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> \. virtual_users.sql
Query OK, 0 rows affected (0.05 sec)
Query OK, 0 rows affected (0.04 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> GRANT ALL ON dspam.* TO dspam@localhost IDENTIFIED BY 'p4ssw3rd' ;
Query OK, 0 rows affected (0.00 sec)
mysql> \q

2012-08-02 I got an email from David Wadson on the qmail-patch mailing list, reminding me to convert the tables from MyISAM to InnoDB. I vaguely remember reading something about that while I was figuring out how to set up DSPAM, but it didn't "click" because I don't normally use MySQL. I figured I would just get it running, write the web page, and then come back to it later.

I didn't realize just how quickly the tables would grow - in less than a week, my dspam_token_data table has grown to over a million records. As you can see, it took quite a while to convert - and during that time, I had to shut down qmail entirely, so that incoming messages wouldn't try to use DSPAM and have to wait for the conversion to finish before they could be processed and delivered.

Do yourself a favour - DON'T WAIT to do this conversion. The more data that builds up, the longer the conversion will take, and during the time that the conversion is happening, DSPAM will not work because it won't be able to access whichever table happens to be in the middle of being converted at the time.

In addition, creating a few indexes on the dspam_token_data table can, with a few minor changes to the purge-4.1.sql script, greatly increase the speed of the purge process. It's easier and faster to create these indexes before the database has any data. The "ALTER TABLE ... ADD INDEX" queries shown below will do this.

$ mysql -u dspam -p dspam
Enter password: (Enter the "dspam" mysql user's password.)
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 245
Server version: 5.0.95 Source distribution
Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> ALTER TABLE dspam_preferences ENGINE=innodb ;
Query OK, 1 row affected (0.50 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> ALTER TABLE dspam_signature_data ENGINE=innodb ;
Query OK, 475 rows affected (2.04 sec)
Records: 475 Duplicates: 0 Warnings: 0
mysql> ALTER TABLE dspam_stats ENGINE=innodb ;
Query OK, 15 rows affected (0.31 sec)
Records: 15 Duplicates: 0 Warnings: 0
mysql> ALTER TABLE dspam_token_data ENGINE=innodb ;
Query OK, 1175704 rows affected (1 hour 16 min 35.29 sec)
Records: 1175704 Duplicates: 0 Warnings: 0
mysql> ALTER TABLE dspam_virtual_uids ENGINE=innodb ;
Query OK, 15 rows affected (1.13 sec)
Records: 15 Duplicates: 0 Warnings: 0
mysql> ALTER TABLE dspam_token_data ADD INDEX( spam_hits ) ;
Query OK, 1175704 rows affected (52.53 sec)
Records: 1175704 Duplicates: 0 Warnings: 0
mysql> ALTER TABLE dspam_token_data ADD INDEX( innocent_hits ) ;
Query OK, 1175704 rows affected (1 min 10.16 sec)
Records: 1175704 Duplicates: 0 Warnings: 0
mysql> ALTER TABLE dspam_token_data ADD INDEX( last_hit ) ;
Query OK, 1175704 rows affected (1 min 36.51 sec)
Records: 1175704 Duplicates: 0 Warnings: 0
mysql> \q
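If you would rather not type the eight statements by hand on a fresh install, they can be generated and piped into the mysql client. This is only a sketch: the table and column names are copied from the session above, and the function name dspam_innodb_sql is invented for this example.

```shell
#!/bin/bash
# Print the engine-conversion and index statements for a DSPAM database.
# Table and column names are the ones used in the session above;
# dspam_innodb_sql is a hypothetical helper name.
dspam_innodb_sql() {
    local t c
    for t in dspam_preferences dspam_signature_data dspam_stats \
             dspam_token_data dspam_virtual_uids
    do
        echo "ALTER TABLE $t ENGINE=innodb ;"
    done
    for c in spam_hits innocent_hits last_hit
    do
        echo "ALTER TABLE dspam_token_data ADD INDEX( $c ) ;"
    done
}
```

Running "dspam_innodb_sql | mysql -u dspam -p dspam" should apply the statements in the same order as the manual session. Review the generated output first if your schema differs from the one created by mysql_objects-4.1.sql.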

David has also written his own page about using DSPAM with qmail, which included the queries shown above.

He also included a link to a page showing how to optimize the DSPAM purge process by creating these indexes and modifying the queries in the purge-4.1.sql script to take advantage of them. However, in dspam-3.10.2.tar.gz it looks like the queries in the purge-4.1.sql script have already been re-structured to take advantage of these indexes (i.e. the WHERE clauses have been re-written so that the indexed fields are not within a function call) even though the mysql_objects-4.1.sql file doesn't create the indexes. Strange but true.

I will be setting up the purge script as a cron job on my own server within the next day or two, and will be updating this web page with that information.

2012-08-23 While doing an install on another client's machine, I discovered (the hard way) that the GRANT ALL query needs to be run AFTER the scripts which create the tables. I have adjusted the order of the queries in the example above.


Configure DSPAM

The next step is to create the dspam.conf file which configures the software. When you did the "make install" above, it created this file for you, however we need to change several things.

The first step is finding the file. If your configure command line did not include a "--sysconfdir" option, you should find this file in the /usr/local/etc directory.

When you find the file, first make a backup copy of the original file so you have something to refer back to:

# cp -a dspam.conf dspam.conf.dist

Then edit the file, using a text editor. I prefer nano, but feel free to use whatever editor you like.

# nano dspam.conf

You should probably go through the entire file and try to become familiar with what's there. Here's a list of the things I ended up changing:


Test DSPAM by hand

Remember at the top of the page, when I asked you to put a few spam and ham messages aside? This is where we're going to use some (but not all) of them. We're going to pick a few spam messages, and a few ham messages, and feed them into DSPAM for classification.

Again, these directions are going to assume that the folders are direct children of INBOX, and that they are called "test-ham" and "test-spam".

Start by cd'ing to the physical directory where the spam messages are stored, and listing the files.

# cd ~vpopmail/domains/domain.xyz/userid/Maildir/.test-spam/cur
# ls -1
1343491113.M126703P3740V000000000000CA01I005900A3_0.server.domain.xyz,S=1511:2,
1343500579.M570776P6229V000000000000CA01I005900A2_0.server.domain.xyz,S=6965:2,
1343506628.M396153P7391V000000000000CA01I005900A4_0.server.domain.xyz,S=8393:2,
1343514125.M510854P9684V000000000000CA01I005900A5_0.server.domain.xyz,S=3953:2,
1343522832.M87742P11325V000000000000CA01I005900A6_0.server.domain.xyz,S=39954:2,
1343523432.M594299P11417V000000000000CA01I005900A7_0.server.domain.xyz,S=4002:2,
1343536180.M515867P14022V000000000000CA01I005900A8_0.server.domain.xyz,S=4127:2,
1343544922.M460636P15863V000000000000CA01I005900A9_0.server.domain.xyz,S=3253:2,
1343560401.M161930P12625V000000000000CA01I0061C048_0.server.domain.xyz,S=4790:2,
1343574392.M805375P17565V000000000000CA01I006180C5_0.server.domain.xyz,S=17318:2,
1343620848.M379166P27347V000000000000CA01I006180D5_0.server.domain.xyz,S=13946:2,
1343634463.M188220P30153V000000000000CA01I0023006E_0.server.domain.xyz,S=3033:2,
1343643453.M680065P4748V000000000000CA01I0023004F_0.server.domain.xyz,S=2038:2,
1343646941.M689919P5313V000000000000CA01I00230083_0.server.domain.xyz,S=10693:2,
1343650420.M122150P6013V000000000000CA01I005900AA_0.server.domain.xyz,S=3818:2,
1343654992.M514765P7204V000000000000CA01I005900AB_0.server.domain.xyz,S=4608:2,
1343661333.M803929P9153V000000000000CA01I005900AC_0.server.domain.xyz,S=2634:2,
1343670657.M593803P12608V000000000000CA01I005900AD_0.server.domain.xyz,S=3865:2,
1343673448.M713678P13671V000000000000CA01I0061C04A_0.server.domain.xyz,S=12435:2,

Highlight the first filename and do a COPY. Then ask DSPAM to process it, using a command like this:

# cat '(do a PASTE, the filename should appear)' | dspam --user userid@domain.xyz --deliver=summary
X-DSPAM-Result: user@domain.xyz; result="Innocent"; class="Innocent"; probability=0.0000; confidence=0.80; signature=N/A

DSPAM has a "--classify" option which will examine a message and decide whether it's ham or spam, without modifying the message or using it to further train the user. However, this will not work if DSPAM has never processed at least one message for that user. Therefore, the first time you run a dspam or dspamc command for a particular user, it cannot be a "--classify" command.

If you see output which looks like this, then DSPAM is working correctly. It may not be accurate yet, but that's because it doesn't have any data on what kinds of messages are spam or ham. The accuracy will improve in time - in my own case, after feeding it a corpus (a collection of messages whose spam/ham status is known) of about a thousand spams and about 350 hams, it has become VERY accurate.

This particular message was spam, but it was incorrectly classified as "Innocent" because the user had no training data, and the message itself was one which was written to look like a legitimate message (i.e. it was advertising for a software company, but it was one I had never heard of, trying to sell custom vertical market programs.) We should probably tell DSPAM that this particular message is indeed spam. (This is known as "training".)

# cat '(PASTE)' | dspam --user userid@domain.xyz --class=spam --source=corpus --deliver=summary
X-DSPAM-Result: user@domain.xyz; result="Spam"; class="Spam"; probability=1.0000; confidence=1.00; signature=N/A

When training a message that you know is not spam, you will use the same command, but instead of using "--class=spam" you will use "--class=innocent".
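Training a larger corpus one message at a time gets tedious, so a loop can help. The sketch below feeds every file in a folder to dspam with the same options used in the training command above. The function name train_folder and the DSPAM_CMD variable are assumptions made for this example; DSPAM_CMD only exists so the loop can be tried against a stub before pointing it at the real binary.

```shell
#!/bin/bash
# Train every message in one Maildir folder as the given class.
# train_folder and DSPAM_CMD are hypothetical names for this sketch;
# DSPAM_CMD defaults to the real "dspam" command.
# Usage: train_folder spam userid@domain.xyz /path/to/Maildir/.test-spam/cur
train_folder() {
    local class="$1" user="$2" dir="$3" msg
    for msg in "$dir"/*
    do
        # Skip anything that is not a regular file (including an empty glob).
        [ -f "$msg" ] || continue
        "${DSPAM_CMD:-dspam}" --user "$user" --class="$class" \
            --source=corpus --deliver=summary < "$msg"
    done
}
```

Use "spam" or "innocent" as the first argument. The loop does not delete anything, so move or remove the trained messages afterwards if there is any chance the same folder will be trained again.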

If you ask DSPAM to classify this message again, it will probably still consider it "Innocent", but now it won't be so sure. Compare the "confidence" number to the output from before training the message:

# cat '(PASTE)' | dspam --user userid@domain.xyz --classify
X-DSPAM-Result: userid@domain.xyz; result="Innocent"; class="Innocent"; probability=0.7127; confidence=0.75; signature=N/A

Also, now that we have trained this particular message, we should probably delete it, so that it doesn't accidentally get trained again.

# rm '(PASTE)'
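When comparing results before and after training, it can be handy to pull a single field out of the X-DSPAM-Result summary line. This is a rough sketch which assumes the line has the "name=value;" layout shown in the examples above; the function name dspam_field is invented here.

```shell
#!/bin/bash
# Pull one field (result, class, probability, confidence) out of an
# X-DSPAM-Result summary line read from stdin.
# dspam_field is a hypothetical helper name.
dspam_field() {
    local name="$1"
    # Fields look like name="value"; or name=value; so the quotes are
    # treated as optional on both sides of the value.
    sed -n "s/.*[; ]$name=\"\{0,1\}\([^\";]*\)\"\{0,1\}.*/\1/p"
}
```

For example, piping the output of a "--classify" command into "dspam_field confidence" prints just the confidence number, which makes before-and-after comparisons easier to script.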


Set up the dspam daemontools service

Setting up the dspam service is just like setting up any other daemontools service. The only tricky part is usually writing the "run" scripts, and in this case the scripts were very simple.

We are doing one slightly unusual thing here... We are creating a "var" directory, which will be owned by the dspam user. This will give the dspam server process somewhere to write out the dspam.pid file, without having to open up permissions on the /var/run directory. (If you used a different value for your ServerPID entry in your dspam.conf file, you may not need to do this here.)

# mkdir -m 1755 /var/service/dspam
# cd /var/service/dspam
# mkdir -m 0750 log var
# wget http://qmail.jms1.net/dspam/service-dspam-run
...
# mv service-dspam-run run
# wget http://qmail.jms1.net/dspam/service-dspam-log-run
...
# mv service-dspam-log-run log/run
# chmod 0750 run log/run
# chown dspam:dspam log var
# chown root:dspam run log/run

Here are the download links, sizes, and checksums for the two files:

File: service-dspam-run
Size: 123 bytes
Date: 2012-07-30 23:30:41 +0000
MD5: 6dba87b8bc6c0722aa198559c79c0a0e
SHA-1: d239ebb7014ecbfed11b056f8d1d3b2da1002622
RIPEMD-160: 26bf3670ef1d5335966a5cb17e7a112195632a20
PGP Signature: service-dspam-run.asc

File: service-dspam-log-run
Size: 99 bytes
Date: 2012-07-30 23:30:41 +0000
MD5: 4ead3cb79badbb1baf7c969d9f9a3d3c
SHA-1: 24c14a92e2910bf98d5f62dbf51ba0c058545d7a
RIPEMD-160: f32eb4a1c25062f529c4a3ebba8dccd93e3d455b
PGP Signature: service-dspam-log-run.asc
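The SHA-1 values listed above can be checked with a few lines of shell. This sketch assumes the sha1sum program from GNU coreutils is available; the function name check_sha1 is made up for this example.

```shell
#!/bin/bash
# Compare a file's SHA-1 against an expected hex digest.
# Returns 0 on a match and non-zero otherwise.
# check_sha1 is a hypothetical helper name; sha1sum is assumed to exist.
# Usage: check_sha1 run d239ebb7014ecbfed11b056f8d1d3b2da1002622
check_sha1() {
    local file="$1" expected="$2" actual
    actual=$(sha1sum "$file" | awk '{ print $1 }') || return 1
    [ "$actual" = "$expected" ]
}
```

A non-zero return means the file is missing or does not match the published digest, in which case it should not be installed as a run script.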

Before you start the script, check /var/service/dspam/run. The last line is the actual dspam command which will run the daemon process. If it's not already there, add the "--debug" option to the end of this command line in order to see what the process is doing, for every message it processes. (Without this option, you will only see messages when the service starts and stops - there won't be any output showing that messages are being processed.)

Once the directory is set up and the permissions are set, you should be able to test the service by manually running the "run" script. This is pretty much the same thing that daemontools will be doing, but without daemontools being involved.

# cd /var/service/dspam
# ./run
9561: [07/30/2012 17:17:56] dspam_init_driver: initializing lock 0
9561: [07/30/2012 17:17:56] dspam_init_driver: initializing lock 1
9561: [07/30/2012 17:17:56] dspam_init_driver: initializing lock 2
9561: [07/30/2012 17:17:56] dspam_init_driver: initializing lock 3
9561: [07/30/2012 17:17:56] dspam_init_driver: initializing lock 4
9561: [07/30/2012 17:17:56] dspam_init_driver: initializing lock 5
9561: [07/30/2012 17:17:56] dspam_init_driver: initializing lock 6
9561: [07/30/2012 17:17:56] dspam_init_driver: initializing lock 7
9561: [07/30/2012 17:17:56] dspam_init_driver: initializing lock 8
9561: [07/30/2012 17:17:56] dspam_init_driver: initializing lock 9
9561: [07/30/2012 17:17:56] Spawning daemon listener
9561: [07/30/2012 17:17:56] Creating local domain socket /tmp/dspam.sock

You should see output similar to this, and then nothing... there will be no command prompt returned yet. This is because the dspam server process is running in the foreground. The output you see in this window is what it would normally send to the daemontools service's log file.

We can try using the service by using the same commands shown above, however instead of using the dspam command, we'll be using the dspamc command.

In another window, cd to the IMAP mailbox and use dspamc to classify, and possibly train, a message:

# cd ~vpopmail/domains/domain.xyz/userid/Maildir/.test-spam/cur
# ls -1
... (Choose a file, highlight it, and do a COPY.)
# cat '(do a PASTE, the filename should appear)' | dspamc --user userid@domain.xyz --classify
X-DSPAM-Result: user@domain.xyz; result="Innocent"; class="Innocent"; probability=0.0000; confidence=0.80; signature=N/A
# cat '(PASTE)' | dspamc --user userid@domain.xyz --class=spam --source=corpus --deliver=summary
X-DSPAM-Result: user@domain.xyz; result="Spam"; class="Spam"; probability=1.0000; confidence=1.00; signature=N/A
# cat '(PASTE)' | dspamc --user userid@domain.xyz --classify
X-DSPAM-Result: userid@domain.xyz; result="Innocent"; class="Innocent"; probability=0.7992; confidence=0.77; signature=N/A
# rm '(PASTE)' (make sure we don't end up re-training the same message again in the future.)

As you run each dspamc command, you should see a burst of log messages scrolling by in the window where the server process is running.

When you are satisfied that dspamc is talking to the server, you can stop the server process (click back into the window where it's running and hit CONTROL-C) and finish activating the service.

You may or may not want to remove the "--debug" option from the dspam command line within the run script. Personally, I left mine in there, since multilog automatically trims the log files within the log/main directory.

9561: [07/30/2012 17:46:26] Burton-Bayesian Probability: 0.000145 Samples: 27
9561: [07/30/2012 17:46:26] using Graham factors
9561: [07/30/2012 17:46:26] Result Confidence: 0.48
9561: [07/30/2012 17:46:26] total processing time: 0.08668s
9561: [07/30/2012 17:46:26] libdspam returned probability of 0.999989
9561: [07/30/2012 17:46:26] message result: SPAM
9561: [07/30/2012 17:46:26] checking trusted user list for dspam(526)
^C
9561: [07/30/2012 17:47:09] dspam_shutdown_driver: destroying lock 0
9561: [07/30/2012 17:47:09] dspam_shutdown_driver: destroying lock 1
9561: [07/30/2012 17:47:09] dspam_shutdown_driver: destroying lock 2
9561: [07/30/2012 17:47:09] dspam_shutdown_driver: destroying lock 3
9561: [07/30/2012 17:47:09] dspam_shutdown_driver: destroying lock 4
9561: [07/30/2012 17:47:09] dspam_shutdown_driver: destroying lock 5
9561: [07/30/2012 17:47:09] dspam_shutdown_driver: destroying lock 6
9561: [07/30/2012 17:47:09] dspam_shutdown_driver: destroying lock 7
9561: [07/30/2012 17:47:09] dspam_shutdown_driver: destroying lock 8
9561: [07/30/2012 17:47:09] dspam_shutdown_driver: destroying lock 9
# ln -s /var/service/dspam /service/
# (Wait about ten seconds.)
# svstat /service/dspam /service/dspam/log
/service/dspam: up (pid 17449) 8 seconds
/service/dspam/log: up (pid 17451) 8 seconds
# tail /service/dspam/log/main/current
@40000000501701523584b4dc 17449: [07/30/2012 17:48:56] dspam_init_driver: initializing lock 2
@40000000501701523584bcac 17449: [07/30/2012 17:48:56] dspam_init_driver: initializing lock 3
@40000000501701523584c094 17449: [07/30/2012 17:48:56] dspam_init_driver: initializing lock 4
@40000000501701523584c864 17449: [07/30/2012 17:48:56] dspam_init_driver: initializing lock 5
@40000000501701523584cc4c 17449: [07/30/2012 17:48:56] dspam_init_driver: initializing lock 6
@40000000501701523584ff14 17449: [07/30/2012 17:48:56] dspam_init_driver: initializing lock 7
@4000000050170152358502fc 17449: [07/30/2012 17:48:56] dspam_init_driver: initializing lock 8
@400000005017015235850acc 17449: [07/30/2012 17:48:56] dspam_init_driver: initializing lock 9
@400000005017015236bea7f4 17449: [07/30/2012 17:48:56] Spawning daemon listener
@400000005017015236beb3ac 17449: [07/30/2012 17:48:56] Creating local domain socket /tmp/dspam.sock

Congratulations, your dspam daemontools service is up and running.
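If you want a script to confirm that the service and its logger are both running, the svstat output can be checked mechanically. This is a sketch which assumes svstat prints one line per service in the "path: up (pid N) S seconds" form shown above; the function name all_services_up is invented for this example.

```shell
#!/bin/bash
# Read "svstat" output on stdin; succeed only if at least one service is
# listed and every listed service reports "up".
# all_services_up is a hypothetical helper name.
all_services_up() {
    local line seen=0
    while IFS= read -r line
    do
        seen=1
        case "$line" in
            *": up "*) ;;      # e.g. "/service/dspam: up (pid 17449) 8 seconds"
            *) return 1 ;;
        esac
    done
    [ "$seen" -eq 1 ]
}
```

It would be used as "svstat /service/dspam /service/dspam/log | all_services_up". A service which keeps restarting also reports "up" with a very small seconds count, so it is still worth looking at the number by eye.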


Test the DSPAM service

You can test the daemontools service by running the same dspamc commands you used when the service was running without daemontools. (This example shows training a ham (i.e. non-spam) message.)

# cd ~vpopmail/domains/domain.xyz/userid/Maildir/.test-ham/cur
# ls -1
... (Choose a file, highlight it, and do a COPY.)
# cat '(do a PASTE, the filename should appear)' | dspamc --user userid@domain.xyz --classify
X-DSPAM-Result: user@domain.xyz; result="Innocent"; class="Innocent"; probability=0.0000; confidence=0.80; signature=N/A
# cat '(PASTE)' | dspamc --user userid@domain.xyz --class=innocent --source=corpus --deliver=summary
X-DSPAM-Result: user@domain.xyz; result="Innocent"; class="Innocent"; probability=1.0000; confidence=1.00; signature=N/A
# cat '(PASTE)' | dspamc --user userid@domain.xyz --classify
X-DSPAM-Result: user@domain.xyz; result="Innocent"; class="Innocent"; probability=0.0000; confidence=0.89; signature=N/A
# rm '(PASTE)' (don't re-train the same message again in the future.)


Integrate DSPAM with vpopmail

This is probably going to be the most complicated part of the whole process, because we're going to be manually changing the commands in the domains' .qmail-default files (and other .qmail-* files, if they exist.) The changes we make will depend on what kind of processing is happening in each file.

Change permissions on dspam.conf

When qmail-local processes a local delivery for a vpopmail domain, it needs to read the dspam.conf file to get the socket filename and authentication information necessary to reach the dspam service. It executes as userid vpopmail and group vchkpw, but it does not inherit any supplementary groups that the vpopmail user may have been added to. Therefore, we need to change the permissions on the dspam.conf file so that it's readable to this userid or this group ID.

There are two ways to do this. On my own server, I made the file owned by the dspam user and the vchkpw group. You could also just make the file world-readable, however this is not recommended unless normal (i.e. non-administrative) users have no access to the machine at all.

# chown dspam:vchkpw /usr/local/etc/dspam.conf

2012-08-23 I forgot to add this to my notes when I originally got DSPAM working on my own machine, so I forgot to add it to this page until just now. I've been working on a server for a new client, and forgetting this step has caused me to spend about four extra hours trying to figure out the problem again. I knew I had seen it before, but I couldn't remember the exact details of how I fixed it on my own server. Eventually I went back to my own server and checked the permissions and ownership on every related file, and when I saw the "dspam:vchkpw" ownership, it all came back to me. Adding the vpopmail user to the dspam group does not work in this case. Derp.
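Since this step is easy to forget, a quick check of the file's ownership and mode may save some debugging time. The sketch below assumes GNU stat (the "-c" format option); the function names show_perms and group_can_read are invented for this example.

```shell
#!/bin/bash
# Print "owner:group mode" for a file, using GNU stat's -c format option.
# show_perms and group_can_read are hypothetical helper names.
# Usage: show_perms /usr/local/etc/dspam.conf
show_perms() {
    stat -c '%U:%G %a' "$1"
}

# Succeed if the file's group permission bits include read.
group_can_read() {
    local mode
    mode=$(stat -c '%a' "$1") || return 1
    # The group digit is the second-to-last octal digit of the mode,
    # and the read bit is set when that digit is 4, 5, 6 or 7.
    case "$mode" in
        *[4567]?) return 0 ;;
        *) return 1 ;;
    esac
}
```

On a server set up as described above, show_perms should report "dspam:vchkpw" as the owner and group, and group_can_read should succeed. If it fails, the group ownership alone will not be enough for deliveries running as group vchkpw to read the file.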

Changing a domain's .qmail-default file

For each domain, you will need to go to that domain's directory. This will almost always be "~vpopmail/domains/domain.xyz", but to be sure you can use the vdominfo command:

# ~vpopmail/bin/vdominfo -d domain.xyz
/home/vpopmail/domains/domain.xyz
# cd /home/vpopmail/domains/domain.xyz
# ls -1 .qmail*
.qmail-default

As you probably know already, the ".qmail-default" file handles incoming mail for any recipient address where a specific ".qmail-user" file doesn't exist. In many cases, this will be the only .qmail-* file in the domain.

Start by looking at the current contents of each file. We will be adding a "dspamc ... --deliver=stdout" command to the beginning of each line (or to certain lines, in case the file has multiple lines) however the structure of that addition depends on what the line currently looks like.

As explained in the dot-qmail man page, there are five valid types of lines in a .qmail file. If the line...

Note that some .qmail-* files may contain commands intended to be run in a specific sequence. Be sure you understand exactly what every command in a .qmail-* is doing before you change it. Also, if it isn't obvious, be sure to save backups of the files so that you can quickly restore the previous contents in case of problems.
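One way to take those backups is to copy every .qmail-* file into a separate directory before editing anything. This is a minimal sketch; the function name backup_dotqmail and the idea of using a separate backup directory are choices made for this example, not something vpopmail requires.

```shell
#!/bin/bash
# Copy every .qmail-* file from a domain directory into a backup directory.
# backup_dotqmail is a hypothetical helper name.
# Usage: backup_dotqmail /home/vpopmail/domains/domain.xyz /root/dotqmail-backup
backup_dotqmail() {
    local dir="$1" dest="$2" f
    mkdir -p "$dest" || return 1
    for f in "$dir"/.qmail-*
    do
        # Skip anything that is not a regular file (including an empty glob).
        [ -f "$f" ] || continue
        # -p keeps the original mode and timestamps on the copy.
        cp -p "$f" "$dest/" || return 1
    done
}
```

Keeping the copies outside the domain directory means qmail never sees them, and restoring is a matter of copying the files back.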


Test DSPAM with vpopmail

This is actually fairly simple - send yourself an email, let it be processed by vpopmail and DSPAM, and check the received messages for the headers that DSPAM adds. They will look something like this:

X-DSPAM-Result: Innocent
X-DSPAM-Processed: Mon Jul 30 12:11:12 2012
X-DSPAM-Confidence: 0.9899
X-DSPAM-Improbability: 1 in 9809 chance of being spam
X-DSPAM-Probability: 0.0000
X-DSPAM-Signature: 1,5016b220131959020616053

In addition, if you forgot to set "Preference "signatureLocation=headers"" in your dspam.conf file, you may see a line at the bottom of the message body which looks like this:

!DSPAM:1,5016b220131959020616053!

The signature code in this line should match the X-DSPAM-Signature: header in the same message.
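That comparison can be scripted for a delivered message file. The sketch below assumes the body marker sits on a line of its own, exactly as shown above; the function name check_signature is invented for this example.

```shell
#!/bin/bash
# Read one message on stdin and compare the X-DSPAM-Signature header with
# the "!DSPAM:...!" marker, if the message has one.
# Prints "match", "mismatch", or "no-body-marker".
# check_signature is a hypothetical helper name.
check_signature() {
    local msg header marker
    msg=$(cat)
    header=$(printf '%s\n' "$msg" | sed -n 's/^X-DSPAM-Signature: *//p' | head -n 1)
    marker=$(printf '%s\n' "$msg" | sed -n 's/^!DSPAM:\(.*\)!$/\1/p' | head -n 1)
    if [ -z "$marker" ]; then
        echo "no-body-marker"
    elif [ "$header" = "$marker" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}
```

Feed it a message file from the mailbox's "new" or "cur" directory on stdin. A result of "no-body-marker" is what you would expect when the signature is only being written to the headers.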


Maintaining DSPAM

As DSPAM runs, it builds up a lot of data about the tokens (words, and/or sequences of words) it sees in the messages it processes for each user. In order to keep the size of the database down to a manageable size, you need to clean out the database periodically.

There are two different procedures involved in cleaning up the database. One is the "dspam_clean -u" command, and the other is (for MySQL) the purge-4.1.sql script.

The purge-4.1.sql script is not installed as part of the normal "make install" process. You will need to manually copy it somewhere that it will be available. I've installed mine in the dspam daemontools service directory:

# cd ~jms1/dspam-3.10.2/src/tools.mysql_drv
# install -m 0644 purge-4.1.sql /var/service/dspam/

Once this is done, you can write a simple script which can be executed via cron which runs both commands. However, providing the dspam mysql user's password without using the command line is a bit tricky. I got around this by creating /var/dspam/.my.cnf like so:

# cd /var/dspam
# nano .my.cnf (Feel free to use whatever editor you like)
[client]
user=dspam
password=p4ssw3rd
# chown dspam:dspam .my.cnf
# chmod 0600 .my.cnf

Then, I wrote /var/service/dspam/cron.cleanup with the following contents:

#!/bin/bash
PATH="/usr/local/bin:/usr/bin:/bin"
cd /var/service/dspam
for n in debug messages
do
    if [ -f /var/dspam/log/dspam.$n ]
    then
        echo "===== Trimming /var/dspam/log/dspam.$n ====="
        mv /var/dspam/log/dspam.$n /var/dspam/log/dspam.$n.$( date +%s )
        /usr/local/bin/delbut -3 /var/dspam/log/dspam.$n*
    fi
done
echo "===== Running \"dspam_clean -u\" ====="
dspam_clean -u
echo "===== Running purge-4.1.sql ====="
cat purge-4.1.sql | ( HOME=~dspam setuidgid dspam mysql dspam )

The mysql program looks for "$HOME/.my.cnf" for its configuration. Because "setuidgid" doesn't set the HOME variable to the userid it's changing to, we need to explicitly set the variable in the script, as shown here.

The last piece is to install the script so that cron will run it. On my system I created /etc/cron.d/clean-dspam with the following contents:

MAILTO="badguy1@jms1.net"
38 1 * * 0 root /var/service/dspam/cron.cleanup

Obviously the MAILTO address shown here is not the one I'm actually using. The "badguy" addresses are used on one of the other pages on this site (I believe it was the example of the validrcptto mechanism disconnecting a spammer after ten invalid RCPT commands.) A few days after I created that page, I found that several spammers had harvested the addresses on that page and were sending spam to them. So I added them to my server, as honeypots, so that these spammers would add themselves to my private blacklist. It's been over a year since I did this, and the spammers are STILL using them, and in the process adding their compromised machines to my blacklist.

This runs the script once a week, at 01:38 local time every Sunday.

2012-09-30 A few weeks ago, I got a panicked phone call from one of my users, saying that all of their incoming messages were arriving as empty messages, with no sender, subject, body, or anything. When I looked at the files in their mailbox, each one only had the first two lines of headers, and then nothing. Because I was in the middle of packing everything up to move, I didn't really have time to dig into the problem, so I pulled dspam out of the processing chain and the messages started arriving, albeit without having been processed by dspam.

After I was finished with the move and had time to look at it, I discovered that any messages processed through the dspam daemontools service were being truncated. I spent several hours trying to debug the problem, and ended up throwing my hands in the air in disgust, because dspam doesn't offer a whole lot in the way of useful diagnostic logging.

This morning, after only one cup of coffee, I decided to have another look at it. I started by wiping out and re-loading the mysql database, but that didn't help much... I was able to do two successful "dspam ... --deliver=summary" commands on the command line, but after that it stopped working. Then I tried adding a "--debug" option to the same command, and got this:

# cat 1344346871.M196488P31171.phineas.jms1.net,S=14126:2,Sabe | dspam --debug --user badguy7@jms1.net --class=spam --source=corpus --deliver=summary
Filesize limit exceeded
Exit 153

I did a search through the source code, and it turns out the text "Filesize limit exceeded" does not appear at all. Then I did some google searching, and found that this is an error message from glibc, which means one of two things: either it's trying to write a file which is too big to be supported by the underlying filesystem (I'm using ext3, so that shouldn't be an issue) or the filesystem is out of space. One quick "df" command later, and it turned out that the filesystem where "/var/dspam" was stored, was 95% full. Most filesystems reserve a certain amount of space for the root user, so that normal users can't fill up a filesystem... and this is what happened to me. Because dspam runs as non-root, it wasn't able to extend one of these log files any further, so it crashed with this "Filesize limit exceeded" error, but didn't tell which file it had been trying to write to.

When I wrote the "cron.cleanup" script, I forgot about the log files that dspam writes under "/var/dspam/log", and they had filled up the filesystem. I have since added a block which will cut the files each time the script runs, and then use my delbut script to delete all but the three newest files.
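A simple disk-usage check could give some warning before the filesystem fills up again. This is a sketch which assumes a POSIX-style "df -P" and a mount point without spaces in its device name; the function names fs_use_pct and fs_nearly_full are invented for this example.

```shell
#!/bin/bash
# Print the "use%" figure (digits only) for the filesystem holding a path.
# fs_use_pct and fs_nearly_full are hypothetical helper names.
# Usage: fs_use_pct /var/dspam
fs_use_pct() {
    # -P forces one line per filesystem, so the figure is field 5 of line 2.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed if the filesystem is at or above the given percentage.
# Usage: fs_nearly_full /var/dspam 90
fs_nearly_full() {
    local pct
    pct=$(fs_use_pct "$1")
    [ -n "$pct" ] && [ "$pct" -ge "$2" ]
}
```

Something like "fs_nearly_full /var/dspam 90 && echo 'WARNING: /var/dspam is nearly full'" could be added to the cleanup script, so the warning arrives in the cron email. The 90 is an arbitrary threshold; pick one which sits below the point where the space reserved for root starts to matter on your filesystem.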

Aside from the fact that I wiped my database when I didn't need to, everything seems to be working normally now, messages are being processed by dspam, and my client-side filters are handling the messages as expected.