A honeypot is a network, computer, or other system designed as a trap, to identify "bad people" and watch what they do, in order to learn about how they do it, so that defenses can be built against the attacks which are seen.
For a mail server, a honeypot is an email address which is a "trap" of sorts- it's an address which is otherwise not valid (and has never been valid.) Because it is not a valid address, you know that any messages which arrive for that address are not valid- usually they are spam. Depending on who you ask, they may call this a "spamtrap" rather than a "honeypot", but whatever you call it, it's a way to identify the IP addresses from which spam is sent.
Why would somebody want to set up a honeypot, you may ask? Speaking for myself, I'm doing it because I hate spam. I have several honeypot addresses set up, and whenever somebody (usually a spammer) sends something to one of these addresses, I know that the IP address which handed me the message is controlled by a spammer, and I can blacklist that IP address. I also forward the message itself to SpamCop, so that they can use the information to build and maintain their own blacklist (which is used by thousands, if not millions, of mail servers around the world.)
I used to host a web site for a friend called "Control Alt Delete" (not the one relating to comics and video games; this one was the online fan club for a synth-pop band called Information Society), and I was the one who originally purchased the "delete.net" domain name for it. Many years ago a disk crash on my server, coupled with a bad backup tape, conspired to destroy her site. She eventually built her own web site and gave up on the "Control Alt Delete" idea, so I ended up owning the domain.
One thing I noticed almost immediately was how much spam it received, and how many messages it received from people who thought they would be removed from the spammers' lists. Then I started looking at some of the spams, and apparently a lot of spammers thought it was cool to include something like "Send a message to delete@delete.net to be removed" at the bottom of their messages... hence the people emailing me, asking to be removed.
So I put up a "delete.net" web site with information about how they had been lied to, and hints about what they shouldn't be doing (i.e. replying to spam and confirming that their email address is valid.) When I did this, I noticed a very small decrease in the number of "remove me" messages, but the spam volume almost doubled- and looking at the logs, I noticed that over a third of the spam I was receiving came from one specific class-C block (which I guessed was under the control of the spammer.)
Of course I blocked that IP range, not only from my server but from the ISP where I worked at the time- and not only did my own spam volume go down, but we actually got a few calls from users, commenting on how they weren't getting as much spam as they used to.
So I figured, if I can make that much of a dent in the spam by blocking one IP range, what other IP ranges could I block?
After a few months of this, I found I was spending almost an hour every day going through the mail servers' logs, looking for spammers' IP addresses, and decided that there had to be a way to automate the process... and thus was born the delete.net honeypot.
2008-01-18 I have sold the "delete.net" domain, and now use the dont-spam.us domain for the same purpose. It probably won't be as effective, but I *do* still have all of the old IPs, and the list is still active.
My blacklisting system consists of several items. I found it easier to develop the database and the "output" side of the system (i.e. generating and installing the DNS files) before developing the "input" side of the system (i.e. processing incoming messages.)
This is a list of the basic pieces:
A SQL database to hold the IP blocks on the blacklist, along with information about when and why the IP addresses were added.
On my server, I use a PostgreSQL database which contains the IP block, the date and time it was added, a description of why it was added (I also blacklist IP's which send me viruses), and in the case of honeypot messages, the filename where the original message is stored.
I'm using PostgreSQL because it has native data types for IP blocks, as well as operators which understand what IP blocks are (i.e. "is this IP block contained within this IP block?".) This feature saves a lot of time and effort when dealing with IP addresses. (A single IP address is stored as a block with a "/32" netmask.)
Blacklist scripts which read the SQL database and build the various data files needed in order to enforce your policies.
The main script generates a "data" file for use with the rbldns program (part of the djbdns package.)
I also have a script which creates a tcpserver access control file, which can be used by tcpserver to immediately refuse connections from blacklisted IP addresses, without wasting time on DNS lookups.
An "RBL updater" service, running under daemontools, which uses the above-mentioned blacklist scripts to generate and publish the list of IPs as a DNS-based blacklist. This is normally done using rbldns, but it can also be done using tinydns or some other DNS package.
You may also choose to build a tcpserver access control file instead of, or in addition to, the DNS-based list. I do this on my own server because it results in connections being immediately blocked by tcpserver, instead of rblsmtpd running and doing a DNS query before the message is blocked.
A reporting script which scans the incoming messages to weed out any legitimate messages, and takes action on the others.
On my server, the script adds the sender's IP address to the SQL database, triggers a rebuild of the blacklist files, forwards a copy of the message to SpamCop, and writes a copy of the message to an "evidence" repository where it can be reviewed later.
Reporting mechanisms. Once everything is set up, you need a mechanism which feeds spam into the reporting script.
A "honeypot" is an email address which feeds everything it receives into the reporting script automatically. On my server, I am using qmail as the MTA (the Mail Transfer Agent, or "mail server program") with vpopmail to manage multiple domains. I can create honeypot email addresses by creating a ".qmail" file which runs the reporting script, or an entire honeypot domain by editing a domain's ".qmail-default" file to run the script.
My server also runs a cron job which looks for a "0spam" folder in each vpopmail mailbox, sends any messages it finds to the reporting script, and then deletes them from the folder. The idea is that when users receive spam, they move the message into this folder, and it is reported to SpamCop using my paid account. If the message was sent directly to my server by an IP which wasn't whitelisted, the sending IP is also added to the database automatically (and the RBL is rebuilt with the new data.)
I chose the name "0spam" for this on purpose- the "0" at the beginning makes it appear at the top of the user's IMAP folder list, which makes it very easy for them to find when they need to drag their spam somewhere.
This page describes the ideas behind how my own server is set up, and includes sample scripts which reflect how I'm doing things on my own server. However, you should not just blindly follow this page. Read through the information and scripts here, and make sure you understand how they work before you change anything on your server.
You need to understand how it all works, because if it breaks, YOU are the one who has to fix it, and YOU are the one who will be in trouble if it doesn't get fixed right away.
I use PostgreSQL as the back-end data storage mechanism for my blacklist. There are three tables- the blacklist itself, a whitelist of IP addresses which should never be added (either because you own them, or because they belong to somebody whom you trust to not be a source of spam), and a "control" table which holds the last time the data was updated.
The tables look like this:
CREATE TABLE rbl
(
    block       inet NOT NULL ,
    added       timestamp with time zone DEFAULT ('now'::text)::timestamp with time zone ,
    comments    character varying(255) ,
    filename    character varying(255) ,
    primary key ( block )
) ;

CREATE TABLE whitelist
(
    block       inet NOT NULL ,
    added       timestamp with time zone DEFAULT ('now'::text)::timestamp with time zone ,
    comments    character varying(255) ,
    primary key ( block )
) ;

CREATE TABLE control
(
    key         character varying(40) NOT NULL ,
    val_d       timestamp with time zone ,
    primary key ( key )
) ;

INSERT INTO control ( key , val_d ) VALUES ( 'last updated' , CURRENT_TIMESTAMP ) ;
You will also need to execute the appropriate GRANT statements in order to give the appropriate permissions to the various programs which will be interacting with the data. Because these will vary from system to system, and because you need to understand how it works, I will leave this as an exercise for the reader.
This is the script which reads the list of IP blocks from the database and generates output in the format suitable for whatever scheme you will use to actually reject the connections.
Before I started writing this part of the page, I had written four different scripts to extract the data and print it in different formats. When I started cleaning up the first one to make it suitable for release on the web page, I realized that they all do basically the same thing, and only differed in their output format- so I combined all four of them into one script, added the tinydns output format, and this web page is its documentation. I call it "rbl-output".
The script requires the IPaddr.pm module, available from my normal (i.e. non-qmail) web site. The other modules it needs are either standard modules which come with Perl, or are available through CPAN, the Comprehensive Perl Archive Network.
The script itself can be called as a command line program, or as a CGI program from a web site. When you run it from a command line, you can configure it using command line options. However, to use it as a CGI, you need to edit a few variables within the script itself. The same variables are the default values if you run the script from the command line.
The following options select the output format. They are mutually exclusive- do not use more than one of them on any command line. The descriptions also explain how to make each one the default by changing the script.
-t selects the output format suitable for use in a tcprules access control file, using lines which look like this:
1.2.3.4:allow,RBLSMTPD="message"
The message is either the comments and date from the database, or the URL hard-coded into the program (or specified below) followed by the IP address.
This is the default output format. If you have changed it, you can make it the default again by setting $format = "t" within the script.
-T selects the output format suitable for use in a tcprules access control file, using lines which look like this:
1.2.3.4:deny,WHY="reason"
The reason is the comments and date from the database. Note that the WHY variable is not actually used by the system- it's there for your use, so if you look at the file you can see why a particular block was added to the list.
If you want this to be the default output format, or "the" output format when the script is run as a CGI, set $format = "T" within the script.
-r selects the output format suitable for use in a "data" file for rbldns. The output starts with a line like this...
:127.0.0.2:message
... which sets the IP address returned for all A records, and the message returned in the TXT records. The message is either the URL (specified in the "$info_url" variable, or using the "-u" option) with a "$" added to the end of it, or a default "We do not accept mail from $" message. (The "$" at the end tells rblsmtpd to substitute the client's IP address.)
Then, for each entry in the database, the output will contain a line like one of these:
1.2.3.4
1.2.3.0/24
The first format is used for single-IP entries, and the second format is used for entries which are larger than one IP address.
If you want this to be the default output format, or "the" output format when the script is run as a CGI, set $format = "r" within the script.
-p selects the output format suitable for use in a "data" file for rbldns, patched with my rbldns patch. The output lines look like this:
1.2.3.4:127.0.0.2:message
1.2.3.0/24:127.0.0.2:message
The first format is used for single-IP entries, and the second format is used for entries which are larger than one IP address.
The message is either the comments and date from the database, or the URL hard-coded into the program (or specified below) followed by the IP address.
If you want this to be the default output format, or "the" output format when the script is run as a CGI, set $format = "p" within the script.
-d selects the output format suitable for use in a "data" file for tinydns. The output lines look like this:
For "/32" to "/25" entries, one or more of:
+4.3.2.1.zone:127.0.0.2
'4.3.2.1.zone:message
For "/24" to "/17" entries, one or more of:
+*.3.2.1.zone:127.0.0.2
'*.3.2.1.zone:message
For "/16" to "/9" entries, one or more of:
+*.2.1.zone:127.0.0.2
'*.2.1.zone:message
For "/8" and larger entries, one or more of:
+*.1.zone:127.0.0.2
'*.1.zone:message
Note that there are two lines generated for each entry. The first line, starting with a "+", generates an A record pointing to 127.0.0.2, and the second line, starting with a "'", generates a TXT record. Different RBL implementations look for different record types, so generating both lines causes tinydns to supply both types.
The message is either the comments and date from the database, or the URL hard-coded into the program (or specified below) followed by the IP address.
If you want this to be the default output format, or "the" output format when the script is run as a CGI, set $format = "d" and $zone = "zone" within the script.
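To make the rbldns and tinydns layouts above concrete, here is a minimal shell sketch which produces the same kinds of lines. It is not taken from rbl-output. The function names "rbldns_data" and "tinydns_entry" are made up for this example, the tinydns function only handles prefix lengths which fall on an octet boundary, and escaping colons in the TXT message as "\072" is an assumption based on the tinydns-data file format rather than anything rbl-output is known to do.

```shell
#!/bin/sh

# rbldns_data [URL]   (hypothetical helper, not part of rbl-output)
# Read blacklist blocks from stdin, one per line, and print an rbldns
# "data" file: a header line, then one line per block.
rbldns_data() {
    if [ -n "$1" ]; then
        msg="$1\$"                      # URL followed by a literal "$"
    else
        msg='We do not accept mail from $'
    fi
    printf ':127.0.0.2:%s\n' "$msg"
    # single addresses lose their "/32", larger blocks pass through as-is
    sed 's|/32$||'
}

# tinydns_entry BLOCK ZONE MESSAGE   (hypothetical helper)
# Print the "+" (A record) and "'" (TXT record) lines for one block.
# Only /8, /16, /24 and /32 are handled in this sketch.
tinydns_entry() {
    addr=${1%/*}
    case $1 in
        */*) len=${1#*/} ;;
        *)   len=32 ;;                  # no "/len" means a single address
    esac
    zone=$2
    # tinydns-data uses ":" as its field separator, so a colon inside the
    # text is written as the octal escape \072 (assumption, see above)
    msg=$(printf '%s' "$3" | sed 's/:/\\072/g')

    # split the address into its four octets: $1 $2 $3 $4
    old_ifs=$IFS
    IFS=.
    set -- $addr
    IFS=$old_ifs

    case $len in
        32) name="$4.$3.$2.$1" ;;
        24) name="*.$3.$2.$1" ;;
        16) name="*.$2.$1" ;;
        8)  name="*.$1" ;;
        *)  echo "unsupported prefix length: /$len" >&2
            return 1 ;;
    esac

    printf '+%s.%s:127.0.0.2\n' "$name" "$zone"
    printf "'%s.%s:%s\n" "$name" "$zone" "$msg"
}
```

For example, "printf '1.2.3.4/32\n1.2.3.0/24\n' | rbldns_data" prints the default header followed by the two entries, and "tinydns_entry 1.2.3.4 zone message" prints the pair of lines shown for "/32" entries above.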
The following options modify the output:
The script can detect when an IP block is "within" another block. By default it hides the smaller blocks, since they would be covered by the larger block. The "-a" option forces it to show all blocks, and the "-A" option forces it to NOT show these other blocks (if you have changed the default.)
You can change the default by changing the $skip_overlap variable within the script.
If you have a web page set up where people can look up the status of a given IP address, you can specify the URL of that page, and all of the "messages" returned to the client will consist of that URL, followed by the IP address, rather than the comments and date from the database. The "-u" option, followed by the URL, sets this URL. The "-U" option clears the URL and makes it use the information from the database (if you have changed the default.)
You can change the default by changing the $info_url variable within the script.
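The "is this block within that block?" test behind the overlap option is something PostgreSQL's inet type can answer directly (for example with its "<<=" operator, "is contained by or equals".) As a rough illustration of what that test amounts to, here is a sketch in plain shell arithmetic. It is not how rbl-output does it; the function names are invented for this example, and it assumes IPv4 addresses written without leading zeros and a shell whose arithmetic is wider than 32 bits.

```shell
#!/bin/sh

# ip_to_int ADDR   (hypothetical helper)
# Convert a dotted-quad IPv4 address into a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1                           # split into octets $1 $2 $3 $4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# prefix_len BLOCK   (hypothetical helper)
# Print the prefix length of "a.b.c.d/len", or 32 for a bare address.
prefix_len() {
    case $1 in
        */*) echo "${1#*/}" ;;
        *)   echo 32 ;;
    esac
}

# block_within INNER OUTER   (hypothetical helper)
# Succeed if INNER is equal to OUTER or entirely contained in it.
block_within() {
    in_len=$(prefix_len "$1")
    out_len=$(prefix_len "$2")

    # a block can never sit inside a smaller (longer-prefix) block
    [ "$out_len" -le "$in_len" ] || return 1

    # build the netmask of the OUTER block
    if [ "$out_len" -eq 0 ]; then
        mask=0
    else
        mask=$(( (0xFFFFFFFF << (32 - out_len)) & 0xFFFFFFFF ))
    fi

    # both network addresses must agree on every bit the mask covers
    in_net=$(ip_to_int "${1%/*}")
    out_net=$(ip_to_int "${2%/*}")
    [ $(( in_net & mask )) -eq $(( out_net & mask )) ]
}
```

With this, "block_within 1.2.3.4 1.2.3.0/24" succeeds, so a script hiding overlaps would skip the single address and keep only the "/24".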
This option changes what the script does:
The "-l" option causes the script to just print the time of the last update to the database, without retrieving every single record (which can take a while, if you have a lot of IP addresses.) This can be useful if you have clients which use HTTP to "pull" the data from you- it allows them to quickly check whether or not any changes have been made, and only pull the full list if something has changed since the copy they're currently using.
Instead of changing the script, a client can request this functionality by adding "/lup/" to the end of the URL they request. For example, if you set up "http://www.domain.xyz/rbllist.cgi" as the script, requesting "http://www.domain.xyz/rbllist.cgi/lup/" will give you just the timestamp.
This allows you to provide both services- the "last updated" time, and the full list- using the same script.
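On the client side, the "check the timestamp first" idea might look something like the following sketch. The function name "list_changed" and the file names in the commented-out usage are made up for this example, and it treats the timestamp as an opaque string, since the exact format printed by the script isn't described here.

```shell
#!/bin/sh

# list_changed STAMP STAMP_FILE   (hypothetical helper)
# Succeed if STAMP differs from the timestamp saved in STAMP_FILE,
# or if STAMP_FILE doesn't exist yet (i.e. we have never pulled the list.)
list_changed() {
    [ ! -f "$2" ] || [ "$(cat "$2")" != "$1" ]
}

# Possible use from a cron job. This part is commented out because it needs
# network access; curl and the /var/tmp paths are assumptions.
#
#   stamp=$(curl -fsS http://www.domain.xyz/rbllist.cgi/lup/) || exit 1
#   if list_changed "$stamp" /var/tmp/rbl.stamp
#   then
#       curl -fsS -o /var/tmp/rbl.new http://www.domain.xyz/rbllist.cgi \
#           && mv /var/tmp/rbl.new /var/tmp/rbl.list \
#           && printf '%s\n' "$stamp" > /var/tmp/rbl.stamp
#   fi
```

The stamp file is only rewritten after the full list has been downloaded successfully, so a failed download will simply be retried on the next run.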
File: rbl-output
Size: 10,909 bytes
Date: 2007-11-12 02:19:14 +0000
MD5: 6e5aeddab7760ead7285a980f2e84706
SHA-1: 19b224a71839af4f3f7fef1b85f25f3da88b57c7
RIPEMD-160: be0b8d2cc833386219163712b90acc72ea89636a
PGP Signature: rbl-output.asc
This is a daemontools service which watches a named pipe for any changes, and runs a script when it receives data (or just a "signal") through that pipe. It's based around the same pipe-watcher script that qmail-updater uses, although it obviously uses a different name for the pipe, and it runs a different script.
If you aren't familiar with this method, think of it as a safer alternative to a setuid binary. Using pipe-watcher and putting the pipe in a world-writable location allows any userid on the system to trigger an update, but not necessarily change how it happens.
Start by creating a normal daemontools service. I use the name "rbl-updater" on my server.
# mkdir -m 755 /var/service/rbl-updater
# cd /var/service/rbl-updater
# mkdir -m 755 log
# wget -O log/run http://qmail.jms1.net/scripts/service-any-log-run
...
# wget -O run http://qmail.jms1.net/scripts/service-pipe-watcher-run
...
# wget http://qmail.jms1.net/scripts/pipe-watcher
...
# wget http://qmail.jms1.net/scripts/rbl-output
...
# chmod 755 log/run run pipe-watcher rbl-output
For the wget commands, make sure to use an uppercase letter "O", rather than a zero or a lowercase "o".
The next step is to configure pipe-watcher with the names of the pipe you want it to watch, and the script you want it to run. The modifications to pipe-watcher are, for example:
# the name of the pipe to watch
my $pfile = "/tmp/update-rbl" ;
# the script (or program) we run whenever activity is seen
my $cmd = "/service/rbl-updater/go" ;
my $cmd_needs_data = 0 ;
# delay (in seconds) to keep the program from being flooded
my $min_delay = 5 ;
I am not providing a downloadable script for pipe-watcher to call when it detects a change, because everybody's systems are different, which means everybody's scripts will be different. The script itself should be fairly simple to write- basically, run rbl-output to create whatever kind of output you need, and write it to whatever file you need it in. It should then run whatever other commands are needed in order to make the new data "active".
Here's a simple example of what such a "go" script might look like:
#!/bin/sh
/usr/local/bin/rbl-output -t -u http://www.dont-spam.us/lookup.cgi/ \
> /etc/tcp/smtp.rbl
cd /etc/tcp
cat smtp smtp.rbl | tcprules smtp.cdb.new smtp.cdb.tmp
chmod 644 smtp.cdb.new
mv smtp.cdb.new smtp.cdb
As you can see, it runs the rbl-output command to generate the file, uses "cat" to combine it with the "smtp" file (which presumably already exists) and send the combined output to the "tcprules" program, which builds "smtp.cdb.new". It then sets the permissions on the file and then renames it to "smtp.cdb". (This extra step, creating the file with a different name and then renaming it into place, is safer because it allows the script to set the permissions so it's world-readable before making it "live". This way there is no microscopic fraction of time during which the file is not world-readable.)
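Once the service is running, anything which can write to the pipe can ask for a rebuild. The sketch below shows one way a script might do that. The function name "trigger_update" is made up, and it assumes that any write to the pipe counts as activity, which is what the "$cmd_needs_data = 0" setting above suggests but which you should confirm against your copy of pipe-watcher.

```shell
#!/bin/sh

# trigger_update [PIPE]   (hypothetical helper)
# Ask the updater service to rebuild by writing a line into its named pipe.
# PIPE defaults to the name configured in pipe-watcher above.
trigger_update() {
    pipe=${1:-/tmp/update-rbl}

    # Refuse to write unless the path really is a named pipe. Writing to a
    # missing path would create a regular file which the service never reads.
    if [ ! -p "$pipe" ]; then
        echo "trigger_update: $pipe is not a named pipe" >&2
        return 1
    fi

    # Note: opening a pipe for writing blocks until something has it open
    # for reading, so this will wait if the service is not running.
    echo update > "$pipe"
}
```

A reporting script could call "trigger_update" after adding a new IP address to the database, without needing any special privileges of its own.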
The reporting script is executed whenever an email is received by one of the honeypot addresses. The information available to the script (i.e. the sender, recipient, IP address, and so forth) is governed by the MTA which runs the script. My server uses qmail, so the information and the script here will assume that you're using qmail. If not, you will need to write your own script.
I use a perl script called report-spam to handle the actual reporting. It is designed to be called from a .qmail file, and uses the RECIPIENT environment variable, which is set by the qmail-local program when it runs a command from a .qmail file.
A "reporting mechanism" is whatever needs to happen in order to feed spam into the "report-spam" script. There are any number of ways to make this happen- these are the two ways I'm currently doing it.
My server uses vpopmail to manage virtual domains, and in terms of email, all of the domains are "virtual". Creating a "honeypot" under vpopmail means creating an alias which points to the "report-spam" script. Here are two examples:
To create "honeypot@jms1.net" as a honeypot (which really is a honeypot address, don't send anything to it unless you're a spammer...)
# cd `vdominfo -d jms1.net`
# chmod +t .
# echo '| /usr/local/sbin/report-spam -h' > .qmail-honeypot
# chown vpopmail:vchkpw .qmail-honeypot
# chmod -t .
To create "dont-spam.us" as a honeypot domain (again, which it really is...)
# vadddomain dont-spam.us
Please enter password for postmaster:
# cd `vdominfo -d dont-spam.us`
# chmod +t .
# echo '| /usr/local/sbin/report-spam -h' > .qmail-default
# echo './postmaster/Maildir/' > .qmail-postmaster
You don't want to accidentally send "postmaster" mail to the reporting script.
# chown vpopmail:vchkpw .qmail-*
# chmod -t .
Note that by totally overwriting the .qmail-default file, you are preventing vpopmail from working normally. Doing this will cause ALL mail sent to the domain to be reported. If you have normal mailboxes in that domain, you should NOT set up a full-domain forwarding like this, unless you're also going to create a ".qmail-userid" file for each mailbox, like I did with the "postmaster" mailbox above.
Also note that RFC 2821 section 4.5.1 requires that "postmaster" be a working email address. You should create a way for "postmaster" mail to be delivered, even if you are using the entire domain as a honeypot. (On my own server, it forwards to the "postmaster" account at the machine's primary domain.)
For the most part, the users on my server use IMAP to access their mailboxes, and are able to create folders on the server which are usable by any IMAP client, including the webmail interface. Several of the mailboxes on my server have folders called "0spam". If users receive spam, they move the message into that folder, and the server automatically runs the message through the report-spam script.
I did this using a very simple script called "auto-report-spam", which my server runs every 20 minutes from a cron job. The script itself looks like this:
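#!/bin/sh
PATH="/usr/bin:/bin:/usr/local/sbin"

# find every "0spam" folder under the vpopmail domains directory
find ~vpopmail/domains -type d -name .0spam | \
( while read -r d
  do
    # report each message in the folder, and delete it if the report succeeded
    find "$d/cur" "$d/new" -type f | \
    ( while read -r f
      do
        report-spam "$f" && rm "$f"
      done )
  done )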
This "find | ( subshell )" construct is necessary because on a large server, there could be thousands of "0spam" folders, and any one of them could contain thousands of messages. The usual way to write a script like this would be to use backticks to put the output of "find" on a command line. However, if that output is over a certain limit (usually about 125K), it exceeds the kernel's limit on the size of an exec() argument list, and the kernel will refuse to run the command.
For what it's worth, you can use any folder name you like. I chose "0spam" so that it would be at the top of each user's list of folders, so they can find it easily when they need it. The important part is to make sure that the name you choose is what's in the script.
Of course, once the script is written, you need to add it to the "cron" system, so that it runs every so often. The mechanics of this are left up to you. It is, after all, your server.