I have written a system which allows the users and domain owners on my server to control the filtering done by qmail-smtpd for mail which is addressed to them.
This filtering takes place during the SMTP envelope exchange, before the body of the message has been sent, so this system cannot be used to do content-based filtering. It can, and does, use the envelope sender and recipient, along with the IP address of the sending machine, to make a decision about whether to accept or reject a given message.
The system runs a processing script after each RCPT command in the SMTP conversation. This means that, if a server is sending a single message to multiple recipients on the same machine, and combines the transfer into a single SMTP conversation, it is quite possible for some of the recipients to allow the message, but others to reject it. In this case, the message will only be delivered to those recipients who allowed it.
The system consists of three parts:
A database containing a list of rules, each specifying a specific test to be executed for each RCPT command, and whether the command should be accepted or rejected when the test finds a match.
A web interface which allows users (i.e. mailbox owners, domain owners, and machine owners) to edit the rules stored in the database.
A filtering script which reads the database, runs the tests, and either accepts or rejects the message when it finds a match.
The system requires that you be running qmail with Jay Soffian's RCPTCHECK patch applied. This patch is part of my combined patch, version 7.07 and later. It may also be part of some of the other combined patches out there (I haven't looked at them in a while, so I don't know for sure which ones do and don't include it.)
The system also requires a running vpopmaild service, in order to validate logins and find each user's access level. This page explains how to set up a vpopmaild service under daemontools.
The web service's CGI scripts and the email processing script are all written in Perl. I tried to make the code as clear as I could, and have included enough comments that anybody who is at least halfway familiar with Perl should be able to read and understand them. There is also one simple shell script, used for setting the permissions on the files.
When you first log into the system, you will see the main screen.
This is a list of the rules which affect the incoming mail of whatever userid you logged in as, with command buttons allowing you to edit some of those rules. The page will also show you any domain-wide and system-wide rules which will affect this user's incoming mail.
The "Hits" column tells you how many times each rule has matched a message within the current "focus". For example, the first rule in phase 1 might have matched 38,000 messages since being created, but only 1,546 of those were sent to firstname.lastname@example.org. If you are logged in as a system administrator and set your focus to the system-wide rules, you will see the system-wide hit count for that rule. (The rules in phase 5 were added when the system was originally installed, before the domain owner added the all-messages rule in phase 4. Those counts show how this user's email was affected during that first week, before the all-messages rule was added.)
The rules are organized into "phases", which are essentially groups of rules. Within each phase, the rules have sequence numbers. When mail arrives, the processing script will evaluate each rule in order by phase and sequence. The first rule which matches the message will be followed, and the message will be accepted or rejected without processing any of the rules which follow it.
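The "first match wins" ordering described above can be sketched in a few lines. This is only an illustration in Python (the real processing script is Perl); the `Rule` class, the `matches` callback, and the `decide` function are hypothetical names, and the accept-by-default behaviour when no rule matches is an assumption, not something taken from the script.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Rule:
    """Hypothetical in-memory form of one rule row."""
    phase: int
    seq: int
    accept: bool                      # True = accept on match, False = reject
    matches: Callable[[dict], bool]   # test run against the SMTP envelope


def decide(rules: Iterable[Rule], envelope: dict, default: bool = True) -> bool:
    """Walk the rules in (phase, seq) order and follow the first match."""
    for rule in sorted(rules, key=lambda r: (r.phase, r.seq)):
        if rule.matches(envelope):
            # Nothing after the first matching rule is evaluated.
            return rule.accept
    # Assumption: a message which matches no rule at all is accepted.
    return default


example_rules = [
    # Phase 5: a catch-all reject, standing in for a blacklist rule.
    Rule(5, 1, False, lambda env: True),
    # Phase 4: a domain-level "accept all" which runs before phase 5.
    Rule(4, 1, True, lambda env: env["recipient"].endswith("@domain.xyz")),
]
```

With these two rules, mail for a mailbox in domain.xyz is accepted by the phase 4 rule before the phase 5 rule is ever looked at, while mail for any other domain falls through to phase 5 and is rejected.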
Each phase has a security level associated with it, which controls which types of users (i.e. mailbox owners, domain owners, or system owners) are able to edit the rules in that phase. The default phases, the ones I use on my own systems, are shown in the example above.
In the example, you can see that the machine owner is using several RBLs to filter their incoming mail, but any clients who send a valid AUTH command are still allowed to send mail without being subjected to the blacklist checks or greylisting. In addition, the domain administrator for this domain has created a rule which says "accept all" before the system-wide rules. This causes his domain's mail to not be filtered by RBLs, and not have any greylisting done to it, whether the client is AUTH'd or not.
There are several different types of rules available. Each rule type specifies a type of test which is done against the sender's email, the recipient's email, or the IP address which is sending the message. If the message matches the rule, the message will be accepted or rejected without any further rules being processed.
The rule types are:
Match sender email. The email address of the sender (which may easily be forged) is compared with the regular expression in the rule. This check is NOT case-sensitive. Some examples of regular expressions:
|email@example.com||Matches this email address. Note that this may also match addresses like "firstname.lastname@example.org" as well.|
|^email@example.com$||Matches only this exact email address. The "^" tells it to match the beginning of the string, and the "$" tells it to match the end of the string. If there were anything else in the string before or after the email address, the match would fail.|
|@domain.xyz$||Matches any email address in that exact domain.|
|@(.*\.)?domain.xyz$||Matches any email address in that exact domain, or in any sub-domain of that domain (such as "firstname.lastname@example.org".)|
|\.cn$||Matches any email address which ends with ".cn".|
|.*||Matches anything, including an empty string.|
|.+||Matches any string with at least one character in it.|
Match recipient email. The RECIPIENT value (the email address of the recipient) is compared with the regular expression in the rule. This check is NOT case-sensitive. (Examples of regular expressions are above.)
This type of rule can be used to match specific extension addresses. On a qmail system, the address "email@example.com" has an unlimited number of extension addresses, which look like "firstname.lastname@example.org". You can give these addresses to web sites which you suspect will be sending spam, and if they later prove to be spammers, you can "disable" the address by creating a "Match recipient" rule which rejects any message matching the pattern "^user-ext@".
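To get a feel for how these patterns behave, here is a small Python sketch of a case-insensitive, unanchored match. The `address_matches` helper is a hypothetical name and is not part of the processing script (which is written in Perl), but for simple patterns like these, Perl and Python regular expressions behave the same way.

```python
import re


def address_matches(pattern: str, address: str) -> bool:
    """Unanchored, case-insensitive regular expression match."""
    return re.search(pattern, address, re.IGNORECASE) is not None


# A few of the patterns described above, tried against sample addresses.
samples = [
    (r"@domain.xyz$", "someone@domain.xyz"),
    (r"@domain.xyz$", "someone@sub.domain.xyz"),
    (r"@(.*\.)?domain.xyz$", "someone@sub.domain.xyz"),
    (r"^user-ext@", "User-Ext@domain.xyz"),
    (r"^user-ext@", "user@domain.xyz"),
]

for pattern, address in samples:
    print(pattern, address, address_matches(pattern, address))
```

Note that an unescaped "." in a pattern matches any character, so "@domain.xyz$" would also match an address ending in "@domainQxyz". If that matters to you, write the dot as "\.".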
Match IP address. The IP address of the SMTP client (the machine which is sending the message to your server) is compared to the IP address or CIDR block (i.e. "126.96.36.199/24") contained in the rule. The rule matches if the sending IP address is the same, or in the case of a CIDR block, if the IP is within the block.
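The comparison itself is easy to picture. The following Python sketch uses the standard `ipaddress` module to show the idea; the `ip_rule_matches` function is a hypothetical name, and the real Perl script may well do the arithmetic differently.

```python
import ipaddress


def ip_rule_matches(rule_value: str, client_ip: str) -> bool:
    """True if client_ip equals the rule's address or lies inside its CIDR block."""
    # strict=False tolerates a value such as "192.168.5.7/24" with host bits set.
    # A bare address with no "/bits" suffix is treated as a single-address block.
    network = ipaddress.ip_network(rule_value, strict=False)
    return ipaddress.ip_address(client_ip) in network
```

For example, a rule containing "192.168.5.0/24" matches a client at 192.168.5.77 but not one at 192.168.6.1, while a rule containing just "192.168.5.7" matches only that one address.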
Match RBL. The IP address of the SMTP client (the machine which is sending the message to your server) is checked for membership on a DNS-based RBL (Realtime BlackList). The rule matches if the IP is listed in the RBL.
Note that the RBL mechanism can also be used to host DNS-based "whitelists", which are lists of IPs which are "trusted" (i.e. they are known to be "good guys" rather than "bad guys".) If you use one of these lists (or you have your own list), you can add an RBL rule which accepts messages rather than rejecting them.
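For anybody who hasn't seen how a DNS-based list is queried: the client's IPv4 octets are reversed, the list's zone name is appended, and the resulting name is looked up. If the name resolves, the IP is on the list. The Python sketch below shows that general mechanism only; the function names are hypothetical, it handles IPv4 only, and it is not a description of how the Perl processing script does its lookups.

```python
import socket


def rbl_query_name(client_ip: str, zone: str) -> str:
    """Build the DNS name to look up: reversed IPv4 octets followed by the zone."""
    octets = client_ip.split(".")
    if len(octets) != 4:
        raise ValueError("this sketch only handles dotted-quad IPv4 addresses")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(client_ip: str, zone: str) -> bool:
    """True if the list returns an address record for this client IP."""
    try:
        socket.gethostbyname(rbl_query_name(client_ip, zone))
    except OSError:
        # No such name means "not listed". In this sketch a DNS failure or
        # timeout is also treated as "not listed", which is a simplification.
        return False
    return True
```

So a client at 192.0.2.99 checked against the zone "zen.spamhaus.org" results in a lookup of "99.2.0.192.zen.spamhaus.org".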
AUTH client. This rule matches any message where the SMTP client has authenticated (i.e. has sent a successful AUTH command.) It can be used to allow users to bypass rules which exist later in the processing chain (i.e. so that authenticated users are able to send mail, even if their IP address happens to be on a blacklist.)
The function in the processing script which handles this rule checks for the SMTP_AUTH_USER variable in order to do the check. The code in qmail-smtpd which sets this variable is something that I wrote into my combined patch version 7 (or later). If you are not using the combined patch, your qmail-smtpd will never set this variable, and this check will never match.
I don't like the idea of a user trying to add this type of rule and having it not work because the server isn't using the combined patch. Therefore, the "Add new rule" form will hide this rule type from the user unless you set the SHOW_AUTH_RULES variable to 1 in your .htaccess file. (The default value is 0.)
All messages. This rule matches every message. It can be used to allow individual users to bypass domain- or system-wide rules which exist later in the processing chain. Rules which come after this rule will never be executed.
Greylisting. The first time a particular sender (identified by the SENDER and TCPREMOTEIP values) tries to send mail to this RECIPIENT, the server will temporarily refuse to accept the message, but will accept the message (and future messages) after a specific length of time has passed. That time limit is specified (in seconds) in the rule. These limits are typically between 120 (two minutes) and 600 (ten minutes.) Rules which come after this rule will never be executed.
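The greylisting decision comes down to remembering when each (sender, IP, recipient) combination was first seen. Here is a minimal Python sketch of that idea. It keeps its state in a dictionary in memory, which is an assumption made to keep the example short; the `Greylist` class and its method names are hypothetical, and the real script has to keep its state somewhere which survives between SMTP conversations.

```python
import time
from typing import Callable, Dict, Tuple


class Greylist:
    """Remembers the first time each (sender, IP, recipient) triple was seen."""

    def __init__(self, delay_seconds: int,
                 clock: Callable[[], float] = time.time) -> None:
        self.delay = delay_seconds
        self.clock = clock  # injectable, so the logic can be tested without waiting
        self.first_seen: Dict[Tuple[str, str, str], float] = {}

    def should_accept(self, sender: str, client_ip: str, recipient: str) -> bool:
        """False means "temporarily refuse, try again later"."""
        key = (sender.lower(), client_ip, recipient.lower())
        now = self.clock()
        # Record the first attempt. Later attempts keep the original timestamp.
        first = self.first_seen.setdefault(key, now)
        return now - first >= self.delay
```

With a delay of 305 seconds, the first attempt from a given triple is refused, a retry 100 seconds later is still refused, and a retry once 305 seconds have passed is accepted, as are that triple's later messages.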
Debug. This rule sets a flag which causes the processing script to log every rule that it processes from the database, whether it matches the message or not. I have found this to be incredibly useful in figuring out how a particular message is processed by the script. The sequence of this rule does not matter, as the script checks for the existence of a debug rule separately from querying the message processing rules, so that it can log rules which may exist before the debug rule.
The "Commands" column will contain a set of buttons you can click to do things with the rules. The following commands are available, although not every command will be available for every rule (for example, you can't move a rule "up" if it's the first rule in the phase.)
|Add a new rule below this one. When this button appears on a phase header, it will add a new rule to the phase, before any other rules. You will be shown a form prompting you for the information needed to create the rule. When you complete this form and click the "Add Rule" button, the rule will be created and you will be returned to the main screen.|
|Delete this rule. You will be shown a form with the rule's information. If you click the "Delete Rule" button, the rule will be deleted and you will be returned to the main screen.|
|Edit this rule. This will allow you to change the extra information for the rule (i.e. the IP address, email address, RBL zone name, or greylist timeout value), whether to accept or reject messages which match the rule, and the description of the rule. Note that if you change anything other than the description of a rule, the rule's hit counter will be reset to zero.|
|Move this rule "up" in the list. This button will not be shown for a rule which is the first rule within a phase.|
|Move this rule "down" in the list. This button will not be shown for a rule which is the last rule within a phase.|
Note that any changes you make will take effect immediately.
When using the system, you will only be editing the rules for one "entity" (one mailbox, a domain, or the system at large) at any given time. I refer to this as your "focus". When you first log into the system, your "focus" will be on your own per-user rules. You will be able to see, but not edit, the per-domain and system-wide rules.
If you are logged in as a user with the appropriate access, you will be able to change your focus and edit the rules pertaining to a different security level, using the "Change Focus" section which will be below the list of rules.
A normal user (i.e. a user with no administrative access) can only "focus" on their own per-user rules, and cannot change their focus at all. They will not even see the "Change Focus" section at the bottom of the page.
A domain administrator will be able to change focus between their own rules, the per-user rules of any other user within their domain, and the per-domain rules for their domain.
A system administrator will be able to change focus to their own rules, the per-user rules for any mailbox on the system, the per-domain rules for any domain on the system, or the system-wide rules which affect all incoming mail.
Note that when your focus is set to edit domain-level rules, you will not see any per-user rules. This is because if you're working with the rules for an entire domain, there may be multiple users within the domain who have rules, and showing all of them would be confusing (at least I found it to be so while writing the program.)
The same applies when editing system-wide rules: you will not see any per-domain or per-user rules.
In the .htaccess file is a line which sets the SHOW_FOCUS_LIST environment variable. If you set this to "1", you will see a list of the entities which currently have at least one rule in the database. The list entries are clickable links which will set the focus directly to that entity.
At first I wasn't sure how I wanted to handle the focus selection interface, so I tried both ways at the same time. I kinda like the list idea, but I can see that if the server has more than a few dozen entities with rules, the list itself will overwhelm the page. That is why SHOW_FOCUS_LIST is a configuration option: the machine owner can choose whether or not they want to see the list by setting the variable to a non-zero value. (For what it's worth, on my own server the list is turned on. Not all of my clients are using the interface yet, and it only shows the domains and mailboxes which have at least one rule in the database.)
The idea for this script started after I wrote a three-tuple greylisting system called jgreylist. I ran into a few problems on my own server, where I needed the greylisting to interact with other checks (i.e. certain IPs were trusted and should not invoke greylisting, one client liked the greylisting idea but didn't want to use any blacklists, one client didn't want any kind of filtering on their mail at all, one client wanted ONLY the spamhaus list and no others, one client wanted filtering for everything except his company's domain and his wife's hotmail account, etc.) I found myself having to set up multiple IP addresses on the machine, so I could run separate qmail-smtpd services for each client, each with different rules... It became a royal pain to manage.
When I thought about it, I realized if I did the RBL checks as part of a RCPTCHECK handler instead of using rblsmtpd, I would be able to selectively enable and disable certain checks, and run those checks before or after checking the sender and/or recipient addresses. I ended up writing a Perl script with functions to do the IP, RBL, and sender email address checks, along with a greylisting function, and a main() function which combined calls to these functions in order to implement the policies that each of my clients wanted.
That early script did work, and it did allow me to return to having just a single qmail-smtpd service, but it was still rather tedious to maintain, especially when two of the clients started wanting changes almost every day (i.e. whitelisting new email addresses and domains, enabling and disabling specific RBLs, bypassing all of the filtering for the president's email, etc.)
The next idea was to write a web interface, so that my clients could edit the rules on their own without my having to spend several hours every week satisfying their every whim, especially when I knew they were probably going to change their minds a few days later. This meant finding a way to store the rules in a database, in such a way that each domain and/or mailbox had their own set of rules, some of which were shared (i.e. the rules for a mailbox should include the rules for the domain, and I wanted to be able to throw a few system-wide rules in there as well.)
It also meant re-writing the processing script to pull the correct rules out of the database, depending on the recipient address it was processing. My first instinct was to use regular expressions, however PostgreSQL's support for regular expression searching was rather convoluted at the time, and even at this early stage I realized that this would be something awesome to release to the world as open source software, and I wanted it to be able to work with other database engines (because not everybody uses PostgreSQL as their database of choice.)
A little bit of thinking about it made me realize that there were only three types of recipients for whom rules would need to be created: a single mailbox, a domain, and the entire machine. My first thought was to have separate tables for the mailbox, domain, and system rules, but when I started writing the schema, the fact that the tables' structures were identical bothered me - part of a good database design is combining similar data into the same tables, so I added a "rule type" field, which eventually became the "phase" field.
At first the script was doing this really complicated query, which was selecting all system rules, all domain rules where the "owner" was whatever was in the RECIPIENT variable, and then all mailbox rules where the "owner" was an exact match of the RECIPIENT variable. Then, while trying to figure out how to simplify this query, it also occurred to me that if the "owner" field had a regular expression in it, I could have PostgreSQL do a single search and find all three rule types, for any RECIPIENT value.
However, not every database engine can do regular expression searches in a WHERE clause (or if they can, they don't do it using the same syntax that PostgreSQL uses, and I didn't want to have to write different code for different database engines unless I absolutely had to.)
So then I figured out that if I stored the system-wide rules with owner='%' and the domain-wide rules with owner='%@domain', I could use SQL's "LIKE" operator "in reverse" and get all three rule types in a single query. The LIKE operator is a standard part of the SQL language, and it's fairly safe to assume that any other commonly used SQL engine is going to support it.
Back then the field was called "owner"; I have since changed it to "recipient".
What do I mean by using the LIKE operator "in reverse"? Most people are used to writing queries like this:
SELECT * FROM tablename WHERE fieldname LIKE 'abc%'
However, I found that there was nothing in the SQL specifications which required that the argument (the pattern) had to be a literal string. A quick test showed that it works "the other way around" just as well, and that if I stored patterns in a field, I could search based on those patterns:
SELECT * FROM tablename WHERE 'abcdefg' LIKE fieldname
The result is that I can store a "LIKE" pattern as the recipient for each rule, and the processing script and web interface can both use the "LIKE" operator "in reverse" to locate the correct rules for that recipient. For example:
$ psql rules g4web
Password for user g4web:
Welcome to psql 8.1.23, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

rules=> SELECT phase , seq , recipient , type , sender , ip , rbl , target , delay , accept
rules-> FROM rules WHERE 'email@example.com' LIKE recipient ORDER BY phase , seq ;
 phase | seq |           recipient            | type |             sender             |         ip         |       rbl        |    target    | delay | accept
-------+-----+--------------------------------+------+--------------------------------+--------------------+------------------+--------------+-------+--------
     1 |   1 | %                              | I    |                                | 192.168.5.0/24     |                  |              |       | t
     2 |   1 | %@domain.xyz                   | I    |                                | 188.8.131.52       |                  |              |       | t
     3 |   1 | firstname.lastname@example.org | E    | @spammer.com                   |                    |                  |              |       | f
     3 |   2 | email@example.com              | E    | firstname.lastname@example.org |                    |                  |              |       | t
     3 |   3 | email@example.com              | T    |                                |                    |                  | ^user-alias@ |       | t
     4 |   1 | %@domain.xyz                   | A    |                                |                    |                  |              |       | t
     5 |   1 | %                              | R    |                                |                    | zen.spamhaus.org |              |       | f
     5 |   2 | %                              | R    |                                |                    | dnsbl.njabl.org  |              |       | f
     5 |   3 | %                              | R    |                                |                    | dnsbl.sorbs.net  |              |       | f
     5 |   4 | %                              | R    |                                |                    | bl.spamcop.net   |              |       | f
     5 |   5 | %                              | G    |                                |                    |                  |              |   305 | t
(11 rows)

rules=> SELECT phase , seq , recipient , type , sender , ip , rbl , target , delay , accept
rules-> FROM rules WHERE '@domain.xyz' LIKE recipient ORDER BY phase , seq ;
 phase | seq |  recipient   | type | sender |         ip         |       rbl        | target | delay | accept
-------+-----+--------------+------+--------+--------------------+------------------+--------+-------+--------
     1 |   1 | %            | I    |        | 192.168.5.0/24     |                  |        |       | t
     2 |   1 | %@domain.xyz | I    |        | 184.108.40.206     |                  |        |       | t
     4 |   1 | %@domain.xyz | A    |        |                    |                  |        |       | t
     5 |   1 | %            | R    |        |                    | zen.spamhaus.org |        |       | f
     5 |   2 | %            | R    |        |                    | dnsbl.njabl.org  |        |       | f
     5 |   3 | %            | R    |        |                    | dnsbl.sorbs.net  |        |       | f
     5 |   4 | %            | R    |        |                    | bl.spamcop.net   |        |       | f
     5 |   5 | %            | G    |        |                    |                  |        |   305 | t
(8 rows)

rules=> SELECT phase , seq , recipient , type , sender , ip , rbl , target , delay , accept
rules-> FROM rules WHERE '@' LIKE recipient ORDER BY phase , seq ;
 phase | seq | recipient  | type | sender |         ip         |       rbl        | target | delay | accept
-------+-----+------------+------+--------+--------------------+------------------+--------+-------+--------
     1 |   1 | %          | I    |        | 192.168.5.0/24     |                  |        |       | t
     5 |   1 | %          | R    |        |                    | zen.spamhaus.org |        |       | f
     5 |   2 | %          | R    |        |                    | dnsbl.njabl.org  |        |       | f
     5 |   3 | %          | R    |        |                    | dnsbl.sorbs.net  |        |       | f
     5 |   4 | %          | R    |        |                    | bl.spamcop.net   |        |       | f
     5 |   5 | %          | G    |        |                    |                  |        |   305 | t
(6 rows)

rules=> \q
As you can see, the first query picked up all of the rules for the mailbox, the domain within which the mailbox exists, and the system-wide rules, all in a single query. (The other two queries are shown as examples of how to look at just the "domain and system" rules, or just the "system" rules. This is how the index.cgi script builds the list when the focus is set to a domain or to the system.)
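If you want to try the "reverse LIKE" trick without a PostgreSQL server handy, the following Python sketch does the same thing using the sqlite3 module from the standard library. The table here is a cut-down, made-up version of the real one (four columns and six sample rows), and `rules_for` is a hypothetical helper name. SQLite is used only for convenience, since it also accepts a column as the pattern side of LIKE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rules (phase INTEGER, seq INTEGER, recipient TEXT, type TEXT)"
)
conn.executemany(
    "INSERT INTO rules VALUES (?, ?, ?, ?)",
    [
        (1, 1, "%", "I"),                 # system-wide rule
        (2, 1, "%@domain.xyz", "I"),      # domain-wide rule
        (3, 1, "user@domain.xyz", "E"),   # per-mailbox rule
        (3, 2, "other@domain.xyz", "E"),  # somebody else's mailbox rule
        (4, 1, "%@domain.xyz", "A"),
        (5, 1, "%", "R"),
    ],
)


def rules_for(focus: str) -> list:
    """Select every rule whose stored LIKE pattern matches the focus string."""
    cur = conn.execute(
        "SELECT phase, seq, recipient, type FROM rules "
        "WHERE ? LIKE recipient ORDER BY phase, seq",
        (focus,),
    )
    return cur.fetchall()


mailbox_rules = rules_for("user@domain.xyz")  # mailbox + domain + system rules
domain_rules = rules_for("@domain.xyz")       # domain + system rules only
system_rules = rules_for("@")                 # system rules only
```

One thing to keep in mind with this approach is that "_" and "%" are both wildcards in a LIKE pattern, so a stored recipient which contains either character will match more than just the literal address.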
Once I figured out how to store the data in such a way that the database engine would select the rules for each message, I was able to re-use some of the functions from the original script, and write a new "main" function which queried the rules for the recipient, and then processed the rules from the database results.
Once the processing script was written and working, it ended up being very easy to convert the rules which had been encoded into that original Perl script into the equivalent database records. Once I did this, I switched over to the new script. The clients never knew the difference, and when they called up wanting rule changes, I was able to do them in a few minutes rather than an hour and a half.
After a few weeks of letting it run and watching the logs to make sure it didn't break anything, I started working on the beginning of the web interface, but then I suddenly found a full-time job which took up all of my time (and ended up being so stressful that it contributed to a heart attack. True story.)
The web interface idea sat "on hold" for a few years, and then last month (2012-06) the year-long contracting job I had been on ended, and I had enough free time (and mental clarity) to finish the web interface. Plus, while working on the web interface and documentation, I added some things which I hadn't originally thought of, such as:
The system described here is the one I'm actually using on my own server. The tarballs I offer for download on this site are produced from the actual scripts in place on my live web server.
2012-08-20 Niamh Holding pointed out an interesting issue: if a recipient address has a "-" character in it, and qmail is also using "-" as the separator character for extension addresses, it can cause problems. I need to look into this.
In addition, I just noticed that when I wrote rcptcheck.pl, I hard-coded the "-" character into the script. That needs to be fixed, but it won't happen today (I'm in the middle of packing up for a move.)