http://qmail.jms1.net/honeypot.shtml

Setting up a Honeypot

A honeypot is a network, computer, or other system designed as a trap, to identify "bad people" and watch what they do, in order to learn about how they do it, so that defenses can be built against the attacks which are seen.

For a mail server, a honeypot is an email address which is a "trap" of sorts- it's an address which is otherwise not valid (and has never been valid.) Because it is not a valid address, you know that any messages which arrive for that address are not valid- usually they are spam. Depending on who you ask, they may call this a "spamtrap" rather than a "honeypot", but whatever you call it, it's a way to identify the IP addresses from which spam is sent.

Why would somebody want to set up a honeypot, you may ask? Speaking for myself, I'm doing it because I hate spam. I have several honeypot addresses set up, and whenever somebody (usually a spammer) sends something to one of these addresses, I know that the IP address which handed me the message is controlled by a spammer, and I can blacklist that IP address. I also forward the message itself to SpamCop, so that they can use the information to build and maintain their own blacklist (which is used by thousands, if not millions, of mail servers around the world.)

Background - how and why I started this

I used to host a web site for a friend called "Control Alt Delete" (not the one relating to comics and video games; this one was the online fan club for a synth-pop band called Information Society.) Many years ago a disk crash on my server, coupled with a bad backup tape, conspired to destroy her site. She eventually built her own web site and gave up on the "Control Alt Delete" idea, and since I was the one who originally purchased the "delete.net" domain name for her, I ended up owning it.

One thing I noticed almost immediately was how much spam it received, and how many messages it received from people who thought they would be removed from the spammers' lists. Then I started looking at some of the spams, and apparently a lot of spammers thought it was cool to include something like "Send a message to delete@delete.net to be removed" at the bottom of their messages... hence the people emailing me, asking to be removed.

So I put up a "delete.net" web site with information about how they had been lied to, and hints about what they shouldn't be doing (i.e. replying to spam and confirming that their email address is valid.) When I did this, I noticed a very small decrease in the number of "remove me" messages, but the spam volume almost doubled- and looking at the logs, I noticed that over a third of the spam I was receiving came from one specific class-C block (which I guessed was under the control of the spammer.)

Of course I blocked that IP range, not only from my server but from the ISP where I worked at the time- and not only did my own spam volume go down, but we actually got a few calls from users, commenting on how they weren't getting as much spam as they used to.

So I figured, if I can make that much of a dent in the spam by blocking one IP range, what other IP ranges could I block?

After a few months of this, I found I was spending almost an hour every day going through the mail servers' logs, looking for spammers' IP addresses, and decided that there had to be a way to automate the process... and thus was born the delete.net honeypot.

2008-01-18 I have sold the "delete.net" domain, and now use the dont-spam.us domain for the same purpose. It probably won't be as effective, but I *do* still have all of the old IPs, and the list is still active.


How it works

My blacklisting system consists of several items. I found it easier to develop the database and the "output" side of the system (i.e. generating and installing the DNS files) before developing the "input" side of the system (i.e. processing incoming messages.)

This is a list of the basic pieces:

The Database

The Output side

The Input side

This page describes the ideas behind how my own server is set up, and includes sample scripts which reflect how I'm doing things on my own server. However, you should not just blindly follow this page. Read through the information and scripts here, and make sure you understand how they work before you change anything on your server.

You need to understand how it all works, because if it breaks, YOU are the one who has to fix it, and YOU are the one who will be in trouble if it doesn't get fixed right away.


SQL Database

I use PostgreSQL as the back-end data storage mechanism for my blacklist. There are three tables- the blacklist itself, a whitelist of IP addresses which should never be added (either because you own them, or because they belong to somebody whom you trust to not be a source of spam), and a "control" table which holds the last time the data was updated.

The tables look like this:

CREATE TABLE rbl
(
    block       inet NOT NULL ,
    added       timestamp with time zone
                    DEFAULT ('now'::text)::timestamp with time zone ,
    comments    character varying(255) ,
    filename    character varying(255) ,
    primary key ( block )
) ;

CREATE TABLE whitelist
(
    block       inet NOT NULL ,
    added       timestamp with time zone
                    DEFAULT ('now'::text)::timestamp with time zone ,
    comments    character varying(255) ,
    primary key ( block )
) ;

CREATE TABLE control
(
    key         character varying(40) NOT NULL ,
    val_d       timestamp with time zone ,
    primary key ( key )
) ;

INSERT INTO control ( key , val_d )
    VALUES ( 'last updated' , CURRENT_TIMESTAMP ) ;
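
The whitelist only helps if every candidate address is checked against it before it goes into the blacklist. PostgreSQL's inet type can do that containment test inside the database; the sketch below just shows the underlying idea in plain shell. The function names "ip_to_int" and "in_block" are made up for this illustration and are not part of any script on this page, and it assumes IPv4 blocks written as address/bits.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, IPv4 only.

# Convert a dotted quad such as 192.0.2.45 into a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_block IP BLOCK/BITS: succeeds if IP falls inside the block.
in_block() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Example: 192.0.2.0/24 stands in for a whitelisted block.
if in_block 192.0.2.45 192.0.2.0/24; then inside=yes; else inside=no; fi
if in_block 192.0.3.1 192.0.2.0/24; then outside=yes; else outside=no; fi
echo "192.0.2.45 whitelisted: $inside, 192.0.3.1 whitelisted: $outside"
```

An address that tests as "yes" against any whitelist entry should simply be skipped rather than added to the rbl table.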

You will also need to execute the appropriate GRANT statements in order to give the appropriate permissions to the various programs which will be interacting with the data. Because these will vary from system to system, and because you need to understand how it works, I will leave this as an exercise for the reader.
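
As a starting point only, here is a sketch which prints one plausible set of GRANT statements. The role names "rbl_output" and "report_spam" are hypothetical, and so is the guess about which tables each program needs to touch. Adjust both to match your own system before feeding the result to psql.

```shell
#!/bin/sh
# Sketch only. The role names and the privilege split are assumptions,
# not something taken from the author's server.
emit_grants() {
    reader=$1    # hypothetical role used by the output side
    writer=$2    # hypothetical role used by the input side
    cat <<EOF
GRANT SELECT ON rbl, whitelist, control TO $reader;
GRANT SELECT, INSERT ON rbl TO $writer;
GRANT SELECT ON whitelist TO $writer;
GRANT SELECT, UPDATE ON control TO $writer;
EOF
}

grants=$(emit_grants rbl_output report_spam)
printf '%s\n' "$grants"
```

Printing the statements first, rather than running them directly, gives you a chance to read them before they touch the database.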


Blacklist Script

This is the script which reads the list of IP blocks from the database and generates output in the format suitable for whatever scheme you will use to actually reject the connections.
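
To give a feel for what "generating output" means, here is a small sketch which turns a list of blocks into rules for tcprules, one of the formats used later on this page. It is not rbl-output and the real script's output may well differ. The function name "blocks_to_tcprules" is made up, the input is assumed to be one IPv4 address or address/bits block per line, and only blocks which end on an octet boundary are handled, since tcprules matches on dotted prefixes.

```shell
#!/bin/sh
# Sketch only: NOT rbl-output. Reads blocks on stdin, one per line,
# and prints a tcprules "deny" rule for each.
blocks_to_tcprules() {
    while IFS=/ read -r addr bits; do
        case "${bits:-32}" in
            32) echo "$addr:deny" ;;             # single address
            24) echo "${addr%.*}.:deny" ;;       # a.b.c.
            16) echo "${addr%.*.*}.:deny" ;;     # a.b.
            8)  echo "${addr%%.*}.:deny" ;;      # a.
            *)  echo "skipping $addr/$bits: not on an octet boundary" >&2 ;;
        esac
    done
}

rules=$(printf '%s\n' 192.0.2.45 198.51.100.0/24 203.0.0.0/16 | blocks_to_tcprules)
printf '%s\n' "$rules"
```

In a real setup the list of blocks would come from a query against the rbl table instead of the printf shown here.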

Before I started writing this part of the page, I had written four different scripts to extract the data and print it in different formats. When I started cleaning up the first one to make it suitable for release on the web page, I realized that they all did basically the same thing, and differed only in their output format- so I combined all four of them into one script, added the tinydns output format, and this web page is its documentation. I call it "rbl-output".

The script requires the IPaddr.pm module, available from my normal (i.e. non-qmail) web site. The other modules it needs are either standard modules which come with Perl, or are available through CPAN, the Comprehensive Perl Archive Network.

The script itself can be called as a command line program, or as a CGI program from a web site. When you run it from a command line, you can configure it using command line options. However, to use it as a CGI, you need to edit a few variables within the script itself. The same variables are the default values if you run the script from the command line.

The following options select the output format. They are mutually exclusive- do not use more than one of them on any command line. The descriptions also explain how to make each one the default by changing the script.

The following options modify the output:

This option changes what the script does:

File: rbl-output
Size: 10,909 bytes
Date: 2007-11-12 02:19:14 +0000
MD5: 6e5aeddab7760ead7285a980f2e84706
SHA-1: 19b224a71839af4f3f7fef1b85f25f3da88b57c7
RIPEMD-160: be0b8d2cc833386219163712b90acc72ea89636a
PGP Signature: rbl-output.asc

RBL Service

This is a daemontools service which watches a named pipe for any changes, and runs a script when it receives data (or just a "signal") through that pipe. It's based around the same pipe-watcher script that qmail-updater uses, although it obviously uses a different name for the pipe, and it runs a different script.

If you aren't familiar with this method, think of it as a safer alternative to a setuid binary. Using pipe-watcher and putting the pipe in a world-writable location allows any userid on the system to trigger an update, but not necessarily change how it happens.
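
The idea can be tried out in isolation with a throwaway pipe. This sketch is not pipe-watcher itself: the background reader is a stand-in which handles exactly one trigger, and the temporary directory and log file are assumptions made so the example does not touch a real system.

```shell
#!/bin/sh
# Sketch only: a throwaway FIFO standing in for /tmp/update-rbl.
dir=$(mktemp -d)
pipe="$dir/update-rbl"
mkfifo "$pipe"

# Stand-in for pipe-watcher: block until something is written to the
# pipe, then do the "privileged" work (here, just write a log line).
(
    read -r line < "$pipe"
    echo "update triggered" > "$dir/log"
) &
watcher=$!

# This is all an ordinary user has to do to request an update.
echo go > "$pipe"

wait "$watcher"
result=$(cat "$dir/log")
echo "$result"
rm -rf "$dir"
```

The writer never chooses which command runs or with what arguments. It can only say "something changed", which is what makes this safer than handing out a setuid binary.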

Setting it up

Start by creating a normal daemontools service. I use the name "rbl-updater" on my server.

# mkdir -m 755 /var/service/rbl-updater
# cd /var/service/rbl-updater
# mkdir -m 755 log
# wget -O log/run http://qmail.jms1.net/scripts/service-any-log-run
...
# wget -O run http://qmail.jms1.net/scripts/service-pipe-watcher-run
...
# wget http://qmail.jms1.net/scripts/pipe-watcher
...
# wget http://qmail.jms1.net/scripts/rbl-output
...
# chmod 755 log/run run pipe-watcher rbl-output

For the wget commands, make sure to use an uppercase letter "O", rather than a zero or a lowercase "o".

The next step is to configure pipe-watcher with the name of the pipe you want it to watch, and the script you want it to run. The modifications to pipe-watcher are, for example:

# the name of the pipe to watch
my $pfile = "/tmp/update-rbl" ;

# the script (or program) we run whenever activity is seen
my $cmd = "/service/rbl-updater/go" ;
my $cmd_needs_data = 0 ;

# delay (in seconds) to keep the program from being flooded
my $min_delay = 5 ;

I am not providing a downloadable script for pipe-watcher to call when it detects a change, because everybody's systems are different, which means everybody's scripts will be different. The script itself should be fairly simple to write- basically, run rbl-output to create whatever kind of output you need, and write it to whatever file you need it in. It should then run whatever other commands are needed in order to make the new data "active".

Here's a simple example of what such a "go" script might look like:

#!/bin/sh

/usr/local/bin/rbl-output -t -u http://www.dont-spam.us/lookup.cgi/ \
        > /etc/tcp/smtp.rbl

cd /etc/tcp
cat smtp smtp.rbl | tcprules smtp.cdb.new smtp.cdb.tmp
chmod 644 smtp.cdb.new
mv smtp.cdb.new smtp.cdb

As you can see, it runs the rbl-output command to generate the file, uses "cat" to combine it with the "smtp" file (which presumably already exists) and send the combined output to the "tcprules" program, which builds "smtp.cdb.new". It then sets the permissions on the file and then renames it to "smtp.cdb". (This extra step, creating the file with a different name and then renaming it into place, is safer because it allows the script to set the permissions so it's world-readable before making it "live". This way there is no microscopic fraction of time during which the file is not world-readable.)
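
The build-then-rename pattern can be seen on its own in the sketch below. It works in a throwaway directory and uses plain "cat" where the real script would run tcprules, so the file contents are made-up placeholders rather than a real cdb.

```shell
#!/bin/sh
# Sketch only: same file names as the example above, but in a
# temporary directory and without tcprules.
dir=$(mktemp -d)
cd "$dir" || exit 1

printf '%s\n' '127.:allow' > smtp
printf '%s\n' '192.0.2.45:deny' > smtp.rbl

umask 077                               # new files start out private
cat smtp smtp.rbl > smtp.cdb.new        # build under a temporary name
chmod 644 smtp.cdb.new                  # fix permissions before going live
mv smtp.cdb.new smtp.cdb                # the rename makes it live in one step

perms=$(ls -l smtp.cdb | cut -c1-10)
lines=$(wc -l < smtp.cdb | tr -d ' ')
echo "$perms $lines"
cd / && rm -rf "$dir"
```

Because the rename happens within one directory, readers see either the old file or the complete new one, never a half-written file.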


Reporting Script

The reporting script is executed whenever an email is received by one of the honeypot addresses. The information available to the script (i.e. the sender, recipient, IP address, and so forth) is governed by the MTA which runs the script. My server uses qmail, so the information and the script here will assume that you're using qmail. If not, you will need to write your own script.

I use a Perl script called report-spam to handle the actual reporting. It is designed to be called from a .qmail file, and uses the RECIPIENT environment variable, which qmail-local sets when it runs a delivery command on behalf of qmail-send.
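
For readers who need to write their own script, here is a sketch of the two facts such a script has to collect: which address was hit, and which IP handed over the message. This is not report-spam and does not claim to work the way it does. It assumes the receiving SMTP server writes the client IP in parentheses on a "Received: from" line, and the function name "first_received_ip" and the sample message are made up.

```shell
#!/bin/sh
# Sketch only: NOT the author's report-spam script.
# Print the first parenthesised IPv4 address found on a
# "Received: from" line of the message read from stdin.
first_received_ip() {
    awk '/^Received: from / {
            if (match($0, /\([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\)/)) {
                print substr($0, RSTART + 1, RLENGTH - 2)
                exit
            }
         }'
}

# In a real delivery, RECIPIENT comes from the environment and the
# message arrives on stdin. Both are faked here for the demonstration.
RECIPIENT=${RECIPIENT:-honeypot@example.com}
ip=$(first_received_ip <<'EOF'
Received: (qmail 12345 invoked from network); 1 Jan 2008 00:00:00 -0000
Received: from unknown (HELO mail.example.net) (192.0.2.45)
  by mx.example.com with SMTP; 1 Jan 2008 00:00:00 -0000
Subject: test

body
EOF
)
echo "$RECIPIENT was sent mail from $ip"
```

Only trust a Received line which your own server added. Anything below it in the message was supplied by the sender and can be forged.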


Reporting Mechanisms

A "reporting mechanism" is whatever needs to happen in order to feed spam into the "report-spam" script. There are any number of ways to make this happen- these are the two ways I'm currently doing it.

Honeypot addresses and/or domains

My server uses vpopmail to manage virtual domains, and in terms of email, all of the domains are "virtual". Creating a "honeypot" under vpopmail means creating an alias which points to the "report-spam" script. Here are two examples:

To create "honeypot@jms1.net" as a honeypot (which really is a honeypot address, don't send anything to it unless you're a spammer...)

# cd `vdominfo -d jms1.net`
# chmod +t .
# echo '| /usr/local/sbin/report-spam -h' > .qmail-honeypot
# chown vpopmail:vchkpw .qmail-honeypot
# chmod -t .

To create "dont-spam.us" as a honeypot domain (again, which it really is...)

# vadddomain dont-spam.us
Please enter password for postmaster:
# cd `vdominfo -d dont-spam.us`
# chmod +t .
# echo '| /usr/local/sbin/report-spam -h' > .qmail-default
# echo './postmaster/Maildir/' > .qmail-postmaster
    (You don't want to accidentally send "postmaster" mail to the reporting script.)
# chown vpopmail:vchkpw .qmail-*
# chmod -t .

Note that by totally overwriting the .qmail-default file, you are preventing vpopmail from working normally. Doing this will cause ALL mail sent to the domain to be reported. If you have normal mailboxes in that domain, you should NOT set up a full-domain forwarding like this, unless you're also going to create a ".qmail-userid" file for each mailbox, like I did with the "postmaster" mailbox above.

Also note that RFC 2821 section 4.5.1 (and the same section of RFC 5321, which replaced it) requires that "postmaster" be a working email address. You should create a way for "postmaster" mail to be delivered, even if you are using the entire domain as a honeypot. (On my own server, it forwards to the "postmaster" account at the machine's primary domain.)

Cron job for auto-report folders

For the most part, the users on my server use IMAP to access their mailboxes, and are able to create folders on the server which are usable by any IMAP client, including the webmail interface. Several of the mailboxes on my server have folders called "0spam". If users receive spam, they move the message into that folder, and the server automatically runs the message through the report-spam script.

I did this using a very simple script called "auto-report-spam", which my server runs every 20 minutes from a cron job. The script itself looks like this:

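#!/bin/sh

PATH="/usr/bin:/bin:/usr/local/sbin"

# Find every "0spam" folder, then report and delete each message in it.
# "cur" and "new" are named explicitly because a plain /bin/sh is not
# guaranteed to expand {cur,new}.
find ~vpopmail/domains -type d -name .0spam | \
( while read -r d
  do
    find "$d/cur" "$d/new" -type f | \
    ( while read -r f
      do
        report-spam "$f" && rm "$f"
      done
    )
  done
)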

This "find | ( subshell )" construct is necessary because on a large server, there could be thousands of "0spam" folders, and any one of them could contain thousands of messages. The normal way to write a script like this would be to use backticks. However, if the output of one of the "find" commands is over a certain limit (usually about 125K), the resulting command line exceeds the kernel's limit on the size of the arguments passed to exec(), and the kernel will refuse to run the command.
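
The streaming approach can be tried safely on a scratch directory. In this sketch the directory layout, the message names, and the "report" function (a stand-in which always succeeds) are all made up for the demonstration. The loops read one path at a time from "find", so the length of the list never matters.

```shell
#!/bin/sh
# Sketch only: a fake mail tree in a temporary directory.
root=$(mktemp -d)
mkdir -p "$root/alice/Maildir/.0spam/cur" "$root/alice/Maildir/.0spam/new" \
         "$root/bob/Maildir/.0spam/cur"   "$root/bob/Maildir/.0spam/new"
: > "$root/alice/Maildir/.0spam/cur/msg1"
: > "$root/alice/Maildir/.0spam/new/msg2"
: > "$root/bob/Maildir/.0spam/cur/msg3"

report() { return 0; }    # hypothetical stand-in for report-spam

# Each path is handed to the loop as soon as find prints it, so no
# command line ever has to hold the whole list.
processed=$(
    find "$root" -type d -name .0spam |
    while read -r d; do
        find "$d/cur" "$d/new" -type f |
        while read -r f; do
            report "$f" && rm "$f" && echo "$f"
        done
    done | wc -l | tr -d ' '
)
remaining=$(find "$root" -type f | wc -l | tr -d ' ')
echo "processed=$processed remaining=$remaining"
rm -rf "$root"
```

A message is removed only when the reporting step succeeds, so a failed report leaves the message in place for the next run.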

For what it's worth, you can use any folder name you like. I chose "0spam" so that it would be at the top of each user's list of folders, so they can find it easily when they need it. The important part is to make sure that the name you choose is what's in the script.

Of course, once the script is written, you need to add it to the "cron" system, so that it runs every so often. The mechanics of this are left up to you. It is, after all, your server.
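
If you just want something to start from, the sketch below prints one possible crontab entry for the 20-minute schedule. It assumes the script was installed as /usr/local/sbin/auto-report-spam (a guess, since this page does not say where it lives), that your cron understands the "*/20" step syntax, and that the entry goes into the crontab of a user which is allowed to read and delete the messages.

```shell
#!/bin/sh
# Sketch only: the install path is an assumption.
entry='*/20 * * * * /usr/local/sbin/auto-report-spam'

# Print the entry for review. To append it to the current user's
# crontab you could then run:
#   ( crontab -l 2>/dev/null; printf '%s\n' "$entry" ) | crontab -
printf '%s\n' "$entry"
```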