http:// qmail.jms1.net / logfiles.shtml

Working with qmail's log files

A lot of people ask questions about how to "track" emails as they travel through a qmail system, or they want a list of every message sent through the server (who it was from and who it was sent to), or just want counts of how many messages were processed. All of this information is sitting there in the log files, if you know how to read them.

Where are the log files?

This is actually the biggest problem for most people. There is no single standard location for log files, because there is no standard way to run qmail. I will try to describe how to find the log files on your own system, but most of the work is up to you.

If you are running qmail under daemontools (i.e. you have a directory like /service/qmail/ or /service/qmail-send/ and you control qmail by editing a "run" script in this directory) then your service directory (again, your /service/whatever directory) will contain a directory called "log", which will also contain a "run" script. By reading this "log/run" script you can tell where the log files are being sent.

If you are running qmail under some other scheme (such as the "sysvinit" scripts used by Redhat, Solaris, and many other distributions) you should find the script which is responsible for starting qmail-send. (This will probably be in the /etc/init.d directory if you are running "sysvinit" scripts.) This script may have its own commands, or it may just run one of the scripts in /var/qmail/boot.

In any event, once you find the script which actually starts qmail, examine the command line. You will see something like this:

exec env - PATH="/var/qmail/bin:$PATH" \
qmail-start ./Maildir/ splogger qmail

The first term after "qmail-start" is the default delivery location- usually a Maildir within the user's home directory. The next term (which is optional) tells how to handle the logs. qmail-send sends its log messages to its "standard out" channel. The daemontools "multilog" program catches this output and writes it to a "current" file in a given directory, while the "splogger" program catches this output and sends it to the syslog service.

If there is no term which tells where the logs should be sent, the logs will simply "fall through" to whatever is handling the standard out channel for the service. If there is no log service for a given service, any log entries will "fall through" to a program called readproctitle, which adds the messages to a string which is visible on its command line with the ps command. In some cases if a service directory has been damaged to the point where a log service does not exist or cannot start, this is another place to look for log output.

When using the splogger program, you can specify a "tag" to be prepended to each log line. In the example above, this tag is "qmail". You could also specify a numeric "facility" code- the default is 2, which means "mail".

To figure out where the log lines are going, you should check your /etc/syslog.conf file to find out where "mail" messages are being sent. In many cases, they will be going to a file such as "/var/log/mail", but don't blindly trust that- your /etc/syslog.conf will tell you exactly where to look.

If you are in a position to be able to choose how to handle the logs for a new or existing server, you should never use syslog. The syslog mechanism is not reliable- under heavy loads it will drop log entries. It's also very slow. The multilog program does not suffer from either of these problems.


What log files are available?

A qmail system is made up of several different programs, all running at the same time and doing their own little part of the overall "mail server" job. Each of these programs generally has its own log file, although if you are using syslog, the logs may be combined together. By understanding what each program does, you can easily tell which log file to look at when you need to check something.

The qmail-send program (which normally runs as a daemontools service called "qmail-send", or maybe just "qmail") manages the queue, and starts all delivery processes. The delivery programs run as children of qmail-queue, and therefore their output is contained with the qmail-queue output. If you have a problem with messages being "stuck" in the queue, this log file should be the place to look.

The qmail-smtpd program handles incoming SMTP traffic. If you have a problem with messages not being properly accepted or rejected from other machines, this log file should be the first place to look. Note that if you have multiple SMTP services on the machine (for example, a standard SMTP service on port 25, an SSL-enabled service on port 465, and an AUTH-only service on port 587) each service will have its own log file (again, unless they are being combined by the syslog mechanism.)

If you are using qmail-scanner as a way to have your incoming mail scanned for viruses and/or spam content, the qmail-scanner program also generates a log file of its own, which is totally separate from the qmail-smtpd log. You will find this file in your qmail-scanner directory, with the name qmail-queue.log. This will have a line-by-line description of everything the qmail-scanner program does.

If you are using simscan as a way to have your incoming mail scanned for viruses and/or spam content, you will normally find simscan's logs in the qmail-smtpd service log.

If you are using clamav, the "clamd" program keeps its own log file. The /etc/clamd.conf file should have a "LogFile" line which tells you where the log is being written.

If you are using spamassassin, its "spamd" program also generates logs. By default it sends the logs to the syslog with the "mail" facility code, but by adding a "-s" option to spamd's command line it is possible to send the log output to a file, or to the "standard error" channel (which makes it easy to run spamd under daemontools.) You should examine the command line for spamd in order to figure out where its log file will be found.

Any POP3 or IMAP servers will also be generating their own log entries. If you or your users are having a problem with POP3 or IMAP, those would be the place to look.


Timestamps

When dealing with log files generated by multilog, you will encounter strange looking timestamps which look like an "@" character followed by a bunch of hex digits. This format is called TAI64N. The first sixteen hex digits represent a 64-bit number which counts how many seconds have passed since the beginning of 1970, while the remaining eight hex digits are a 32-bit number which counts nanoseconds since the beginning of the second.

At first glance it sounds like a more precise version of the traditional unix time() function, and for most purposes it can be used as such (subject to the resolution of your system clock.) However, instead of interpreting the count of seconds using the UTC notation, it uses TAI notation. The difference is that TAI does not have "leap seconds", which means that when doing arithmetic on TAI timestamps, these "leap seconds" do not have to be taken into account. This web page on djb's web site explains the difference in more detail. The idea is this- if you try to compare TAI and UTC timestamps, there will be a difference of several seconds (because of the leap seconds added in the UTC system but not in the TAI system.)

While these timestamps are very useful for a computer, they're not very friendly to work with from a human standpoint. The daemontools package includes the tai64nlocal program, which reads lines starting with TAI64N timestamps (such as log files) and write the same data with the timestamps converted from TAI to human-readable UTC. Here's an example:

$ cat zzz
@40000000433225833b6e1a8c tcpserver: status: 5/30
@40000000433225833b6e2644 tcpserver: pid 26162 from 194.206.24.40
@40000000433225840c85ba04 tcpserver: ok 26162 a.mx.jms1.net:209.114.200.128:25 :194.206.24.40::1521
@40000000433225840c8f0cbc rblsmtpd: 194.206.24.40 pid 26162: 451 We do not accept mail from IP addresses without reverse DNS.
@40000000433225852a9ada4c tcpserver: status: 6/30
@40000000433225852a9ae604 tcpserver: pid 26163 from 193.123.2.227
@40000000433225852aa997bc tcpserver: ok 26163 a.mx.jms1.net:209.114.200.128:25 pixelwww.pixelpower.com:193.123.2.227::4232
@40000000433225852aa9a374 rblsmtpd: 193.123.2.227 pid 26163: 553 Sent mail to honeypot qmmycemglyiuq@delete.net on 2005-01-04
@400000004332258538eae27c tcpserver: end 26163 status 0
@400000004332258538eaea4c tcpserver: status: 5/30
$ cat zzz | tai64nlocal
2005-09-21 23:31:05.997071500 tcpserver: status: 5/30
2005-09-21 23:31:05.997074500 tcpserver: pid 26162 from 194.206.24.40
2005-09-21 23:31:06.210090500 tcpserver: ok 26162 a.mx.jms1.net:209.114.200.128:25 :194.206.24.40::1521
2005-09-21 23:31:06.210701500 rblsmtpd: 194.206.24.40 pid 26162: 451 We do not accept mail from IP addresses without reverse DNS.
2005-09-21 23:31:07.714791500 tcpserver: status: 6/30
2005-09-21 23:31:07.714794500 tcpserver: pid 26163 from 193.123.2.227
2005-09-21 23:31:07.715757500 tcpserver: ok 26163 a.mx.jms1.net:209.114.200.128:25 pixelwww.pixelpower.com:193.123.2.227::4232
2005-09-21 23:31:07.715760500 rblsmtpd: 193.123.2.227 pid 26163: 553 Sent mail to honeypot qmmycemglyiuq@delete.net on 2005-01-04
2005-09-21 23:31:07.954917500 tcpserver: end 26163 status 0
2005-09-21 23:31:07.954919500 tcpserver: status: 5/30

These are actual log entries taken from my own server at the time I was writing this page. My server does not accept incoming mail from IP addresses without reverse DNS names, and I have my own blacklist which the spammers are kind enough to update for me, by sending mail to addresses in (at that time) the "@delete.net" domain, although I now use the "@dont-spam.us" domain for this purpose. The same blacklist also includes the IP addresses of machines which try to send viruses to my machine.


The multilog program

The multilog program works by reading the output from another program (usually a service running under daemontools) and writing the output to one or more log files. The program is capable of filtering the data, adding tai64n timestamps to each line, and maintaining a directory where log files are automatically created, cut off when they reach a certain size, and old files deleted when too many exist.

The official multilog web page gives a brief explanation of each feature of the program. Below is information about how most people end up using multilog- as part of an automatically created log service in conjunction with qmail or djbdns.

The common invocation of multilog looks like this:

multilog t ./main

The "t" command (which, if present, must be the first option) tells multilog to add a tai64n timestamp to the beginning of each line. If this option is not there, the lines sent to the log file(s) will be exactly as sent by the service being logged.

The "./main" portion specifies a self-rotating directory. multilog will append each line of output to a file called "current" within this directory. When the file approaches a certain maximum size (which defaults to 100,000 bytes) multilog will rename "current" to a file whose name is a tai64n timestamp of the time the rename was done, and create a new "current" file. At this time it also counts how many old log files are in the directory, and if more than a certain number exist (default 10) it will delete the oldest files.

You can also add the "s" command to specify the maximum size of each file before it gets renamed, and the "n"> command to specify the maximum number of old log files which may exist before they are deleted. For example...

multilog t n16 s1048576 ./main

This allows each file to grow up to 1MB (1048576 bytes) and allows up to 16 files to exist before the oldest files are deleted.


Interpreting the logs

Once you've found the logs you're looking for, and understand how the timestamps work, the only remaining step is to actually read the messages being sent to the logs. The biggest problem that most people have is that they don't understand what they're reading, because they don't understand how the programs which are generating the log entries actually work.

Below I will show samples of what typical log entries look like and what they actually mean. Note that the examples will say TIMESTAMP at the beginning of each line- this is because different log files may have different types of timestamps (i.e. TAI64N, the output from tai64nlocal, or the *nix-format timestamps added by most syslog mechanisms) and I don't want to have to keep explaining over and over that "your timestamps may be different."


qmail-send logs

The qmail-send service logs the activities of the qmail-send process, along with any messages generated by the qmail-local and qmail-remote processes it will generate in order to actually process the individual deliveries. These examples will track the lifetime of a particular message on its journey through the queue.

The mtrack script reads the log file, uses the information in each line to correlate which lines correspond to the same messages, and prints those lines back out, grouped by the message they're talking about. Trust me, this script will make your life MUCH easier.

TIMESTAMP new msg 12345

This message means that qmail-send has just become aware that the indicated message number exists in the queue. This usually happens shortly after qmail-queue is finished adding the message to the queue.

The number (12345 in this example) is the "message number". Technically, this is the inode number of the file within the /var/qmail/queue/mess directory which contains the text of the message, and the filename of the various control files within other parts of the /var/qmail/queue directory.

Because inode numbers are guaranteed to be unique within any given filesystem, only one message may have a given "message number" at any given time. However, you will often see cases where the same "message number" is used for multiple messages. This happens when qmail-send is finished with a given message and deletes it from the queue, and another incoming message happens to use the same inode.

TIMESTAMP info msg 12345: bytes 3189 from <sender@domain.xyz> qp 27763 uid 46

Remember that qmail-send has two jobs- classifying each recipient of an incoming message as "local" or "remote", and scheduling deliveries. This message means that the classifier portion of qmail-send (or if you're using the EXT_TODO patch, the qmail-todo process) has finished classifying the recipients and the message is ready to be delivered to each recipient.

The first number (12345 in this example) is the "message number", and will match the "message number" from the "new msg" message above. This number can be used to tie log entries together and follow the progress of any given message in the queue.

The email address is the envelope sender of the message- usually this is the argument of the MAIL FROM command which was given to qmail-smtpd, although if a message originated using some other interface (i.e. qmail-inject) this value can be specified using some other mechanism. Regardless, this is the address to which any bounce messages will be sent if any of the recipients are unable or unwilling to accept the message.

The qp number (27763 in this example) is the process ID of the qmail-queue process which added the message to the queue. This can be used to link the qmail-queue and qmail-send logs together if needed.

The uid number (46 in this exmple) is the numeric uid under which the qmail-queue process was running when the message was added to the queue. In the case of an SMTP server using qmail-scanner, this will be the userid as which qmail-scanner-queue.pl runs.

TIMESTAMP starting delivery 37051: msg 12345 to local recip@domain
TIMESTAMP starting delivery 37051: msg 12345 to remote recip@domain

These messages tell you that a delivery attempt has been started. Each delivery attempt is assigned a number (37051 in this example) which is simply a sequence number which starts with 1 when qmail-send starts. The message tells you whether this is a local or remote delivery, and the recipient's email address.

TIMESTAMP delivery 37051: success: did_1+0+0/
TIMESTAMP delivery 37051: deferral: Sorry,_I_wasn't_able_to_establish_an_SMTP_connection._(#4.4.1)/
TIMESTAMP delivery 37051: failure: 1.2.3.4_does_not_like_recipient./Remote_host_said:_550_recip@domain_User_unknown/Giving_up_on_1.2.3.4./

These messages tell you when a delivery attempt is finished, whether the delivery attempt was successful or not, and any additional information about the delivery. The delivery number (37051 in this example) tells you which delivery is finished, and allows you to correlate this line with the appropriate starting delivery line (above.)

Note that the additional information (the red portion at the end of each line in the examples above) is "cleaned" before being sent to the log, so that it's safe to log without allowing a malicious remote host to falsify log entries. Any spaces in the message are replaced with "_", and any newlines are replaced with "/". For example, the third message above actually started out as three lines of extra information.

The delivery status code will be one of the following:

TIMESTAMP end msg 12345

This message tells you that the indicated "message number" (12345 in this example) is being deleted from the queue. If you are "following" a message through the queue, this message is the end- any messages after this one which may refer to the same "message number" are in fact referring to a new message which happens to be using the same inode number as the old one.

TIMESTAMP status: local 1/30 remote 7/50
TIMESTAMP status: qmail-todo stop processing asap
TIMESTAMP status: exiting

Any time a delivery process (i.e. qmail-local or qmail-remote starts or ends, this message is sent to the log. In this example, there are 30 concurrent local delivery slots available (i.e. the concurrencylocal control file) and 1 of those slots is being used, and there are 50 concurrent remote delivery slots available (the concurrencyremote control file) and 7 of those slots are being used.

TIMESTAMP status: local 1/30 remote 7/50 exitasap

When you tell qmail-send to shut itself down, it will add exitasap to the end of each status message (as shown here.) When all of the deliveries are done, you will see one or two last messages- you will only see the one referring to qmail-todo if you are using the EXT_TODO patch, but the exiting message is the last thing qmail-send does before stopping.

Note that there is no "starting" message when qmail-send starts- the first thing it logs when it starts up is a "status: local 0/x remote 0/x" line.

These status: messages do not have no direct correlation to any individual message, although if you want to track the timestamps they can give you an idea of how busy the server was at the time.


qmail-smtpd logs

The qmail-smtpd logs are more complicated, because many programs are sending log entries into the file. I will cover the most common programs here.

The strack script reads the log file, uses the information in each line to correlate which lines correspond to the same connections, and prints those lines back out, grouped by the connection they're talking about. Trust me, this script will make your life MUCH easier.

tcpserver

tcpserver is normally the process which watches for new client connections, verifies the client's IP address against an access control file, and either accepts or denies (accepts and immediately disconnects) the client's connection. It logs the following types of entries:

TIMESTAMP tcpserver: pid 27730 from 1.2.3.4

When a client initially connects, tcpserver forks, the new child process handles that one connection, and the parent process continues to wait for new connections. This log entry means that a new connection has arrived, the new child process has the listed process ID (27730 in this example), and the client's IP address is listed. At this point, it has not looked at the access control file or decided whether or not to accept the connection- this simply tells you that the client is trying to connect.

TIMESTAMP tcpserver: ok 27730 myhostname:myip:myport hostname:ip:ident:port
TIMESTAMP tcpserver: deny 27730 myhostname:myip:myport hostname:ip:ident:port

After looking up any information specified by the "-h", "-p", and/or "-r" options, and consulting the access control file, tcpserver logs this line to indicate whether the connection was accepted or not, as well as the full information gathered by the options. Any unknown information (such as the ident field, if you are using the "-R" option) will simply be blank (i.e. two "::" characters with nothing between them.)

TIMESTAMP tcpserver: end 27730 status 0

When the actual server program (such as qmail-smtpd) is finished, tcpserver logs this line to show when it finished and to show its return code (0 in this example.) For most processes, a return code of 0 is normal, while anything else is usually an indication of some kind of error.

TIMESTAMP tcpserver: status 3/50

Just as qmail-send logs a status message whenever a delivery starts or finishes, tcpserver logs a status message whenever a new child process is forked to handle a connection, and whenever one of the child processes finishes.

rblsmtpd

rblsmtpd only generates logs when it prevents a client from reaching qmail-smtpd. The log message it generates looks like one of these:

TIMESTAMP rblsmtpd: 1.2.3.4 pid 27730: 451 message
TIMESTAMP rblsmtpd: 1.2.3.4 pid 27730: 553 message

The pid (27730 in this example) will match the process ID from the tcpserver process which launched it. The status code will be 451 or 553, depending on whether rblsmtpd is issuing a "soft error" (meaning "try again later") or a "hard errror" (meaning "don't try again".)

The status code and the message after it are the text which is sent back to the client as an error message. The message normally comes from the TXT record corresponding to the reversed IP address within the RBL zone, and will often identify which blacklist contained the client's IP address.

jgreylist

If you are using my jgreylist program to implement greylisting, the program will log its decision about each IP which connects. The log entries will look like one of the following:

TIMESTAMP jgreylist[27730]: 1.2.3.4 OK known
TIMESTAMP jgreylist[27730]: 1.2.3.4 DENY no reverse DNS
TIMESTAMP jgreylist[27730]: 1.2.3.4 GREY first time
TIMESTAMP jgreylist[27730]: 1.2.3.4 GREY too soon

These log entries indicate that the IP address was allowed to connect, denied because its IP address has no reverse DNS name, or temporarily greylisted- and if so, it tells whether this is the first time the IP has tried to connect, or if this is an additional time and it's still too soon after the first time to be allowed to connect.

If you have configured $show_log = 1 in the script, you will also see a log of the phony SMTP conversation which jgreylist carries out with the client.

If you have configured $log_pid = 0 in the script, you will NOT see the process ID (27730 in this example) in the log lines. If you do log the pid, you will find that the pid will be same as the pid logged by tcpserver for the same connection.

qmail-smtpd patches

Normally, qmail-smtpd doesn't log anything. However, many of the patches that people use with qmail include logging of various things. This is a quick list of the log entries generated by various features in my own combined patch.

When I released vestion "6cd" of the combined patch, I changed every log message generated within qmail-smtpd to use the same "qmail-smtpd[nnn]:" format. This page reflects that change. If you are using an older version, you will see log entries which look similar to these but are not the same format. I highly recommend you upgrade to at least the "current" version of the patch.

MFCHECK

The MFCHECK code checks the domain portion of the envelope sender address (in the MAIL FROM command) to make sure it's a real domain which has at least one MX record. This prevents spammers from being able to use phony domain names in their forged sender addresses.

TIMESTAMP qmail-smtpd[12345]: MFCHECK pass [1.2.3.4] domain.xyz
TIMESTAMP qmail-smtpd[12345]: MFCHECK fail [1.2.3.4] domain.xyz
TIMESTAMP qmail-smtpd[12345]: MFCHECK bypassed [1.2.3.4] sender@domain.xyz

SPF

You need to understand SPF in order to understand what these lines mean. The idea is that I, as the owner of the jms1.net domain, have published a list of the IP addresses which are allowed to send email which claims to be from my domain. Any server which receives a message claiming to be from my domain can look up this record, and if the IP which is sending the message is not found on the list that I have published, then that server knows that the message is forged and can refuse the message.

If your server is checking the SPF records for incoming mail, these log lines will tell you the result of each check. The format is the same as the Received-SPF header which is added to each message which is accepted.

TIMESTAMP qmail-smtpd[12345]: Received-SPF: pass (myhostname: ...)
TIMESTAMP qmail-smtpd[12345]: Received-SPF: none (myhostname: ...)
TIMESTAMP qmail-smtpd[12345]: Received-SPF: unknown (myhostname: ...)
TIMESTAMP qmail-smtpd[12345]: Received-SPF: neutral (myhostname: ...)
TIMESTAMP qmail-smtpd[12345]: Received-SPF: softfail (myhostname: ...)
TIMESTAMP qmail-smtpd[12345]: Received-SPF: fail (myhostname: ...)
TIMESTAMP qmail-smtpd[12345]: Received-SPF: error (myhostname: ...)

QMAILQUEUE_LOG_MAIL and QMAILQUEUE_LOG_RCPT

These entries show you the arguments of every successful MAIL FROM and/or RCPT TO command sent to your server.

TIMESTAMP qmail-smtpd[12345]: MAIL FROM:<email@domain.xyz>
TIMESTAMP qmail-smtpd[12345]: RCPT TO:<email@domain.xyz>

validrcptto.cdb

The validrcptto.cdb patch allows qmail-smtpd to verify that the full email address is valid for messages sent to "rcpthosts" domains, rather than just checking the domain and allowing every mailbox name to come in.

This is useful to prevent your queue from filling up with bounce messages to bogus senders, because their recipient addresses were no good. It's the same idea as the "chkuser" patch, except that instead of tying into vpopmail to verify the addresses, it just checks a cdb file which you can build using any method you like.

TIMESTAMP qmail-smtpd[12345]: validrcptto [1.2.3.4] RCPT TO: mailbox-ext@domain.xyz
TIMESTAMP qmail-smtpd[12345]: validrcptto [1.2.3.4] trying: mailbox-ext@domain.xyz
TIMESTAMP qmail-smtpd[12345]: validrcptto [1.2.3.4] trying: mailbox-default@domain.xyz
TIMESTAMP qmail-smtpd[12345]: validrcptto [1.2.3.4] trying: @domain.xyz
TIMESTAMP qmail-smtpd[12345]: validrcptto [1.2.3.4] found: @domain.xyz
TIMESTAMP qmail-smtpd[12345]: validrcptto [1.2.3.4] reject: badname@domain.xyz

AUTH

These messages show that the client sent an AUTH command, and tell whether or not the command was successful. They also show the client IP and the username they authenticated (or tried to authenticate) with.

TIMESTAMP qmail-smtpd[12345]: AUTH successful [1.2.3.4] user@domain.xyz
TIMESTAMP qmail-smtpd[12345]: AUTH failed [1.2.3.4] user@domain.xyz

GREETDELAY and DROP_PRE_GREET

If you are using the GREETDELAY and DROP_PRE_GREET features of my combined patch, these messages show when a client disconnects before the initial banner was sent, and when a client is forcibly disconnected because they sent data before the server sent the banner (which is a violation of RFC 2821.)

TIMESTAMP qmail-smtpd[12345]: before greeting: [1.2.3.4] client disconnected
TIMESTAMP qmail-smtpd[12345]: before greeting: [1.2.3.4] client sent data

recordio

If you are using recordio with your service, you will see a full log of everything sent between the client and the server. This is a quick sample of what these logs look like:

TIMESTAMP 12345 > 220 a.mx.jms1.net NO UCE ESMTP
TIMESTAMP 12345 < EHLO xyzzy.not
TIMESTAMP 12345 > 250-a.mx.jms1.net NO UCE
TIMESTAMP 12345 > 250-STARTTLS
TIMESTAMP 12345 > 250-SIZE 0
TIMESTAMP 12345 > 250-PIPELINING
TIMESTAMP 12345 > 250 8BITMIME

Note that the first number (12345 in the example) is the pid (process ID) of the service program (i.e. qmail-smtpd), and will match the pid logged by tcpserver for the same connection.


Useful Scripts

mtrack is a script which reads the log output from qmail-send and groups the lines together by the message they correspond with, making it easy to "follow" each message into and out of the queue. It assumes that the log lines will be presented in chronological order, and can deal with either tai64n or human-readable timestamps- it simply copies them to the output.

strack is the same kind of script, but it works on the qmail-smtpd log entries.

convert-multilog is a script which searches "/service/*/log/main" for any "@4*" files (the automatic cut-off files generated by multilog), converts their timestamps from tai64n to human-readable format, and writes them to /var/log/{service}.{date}. Once the lines from a given "@4*" file have been converted, the file is deleted.

I run this as a cron job on all of the servers I own or manage- some once an hour and some once every ten minutes, depending on how busy the server is.