Red Hat Bugzilla – Bug 161785
spamassassin restart fails - functions bug?
Last modified: 2007-11-30 17:11:08 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050524 Fedora/1.0.4-4 Firefox/1.0.4
Description of problem:
When I try to 'restart' spamassassin (SA), it fails with:
service spamassassin restart
Shutting down spamd: [ OK ]
Starting spamd: Could not create INET socket on 127.0.0.1:783: Address already
in use (IO::Socket::INET: Address already in use) [FAILED]
The problem seems to be that whilst a 'restart' does a 'stop' and then a 'start', the 'stop' calls 'killproc' to find the SA pids. 'killproc' is a function in /etc/init.d/functions. It first looks for a pid file, '/var/run/spamd.pid', but this does not exist, so it then calls the 'pidof' command.
This returns the spamd process pids, but it seems to return the children's pids first and the parent's last. As a result, while the children are being killed off, the parent notices and starts a replacement child process. Running 'ps auxww|grep -i spamd' during the restart process shows that a new spamd child has appeared before the 'start' executes. When 'start' does execute, it fails because that new child process still has the socket open.
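To make the lookup order concrete, here is a simplified bash sketch of "pid file first, pidof as fallback". The function name find_daemon_pids and its optional PIDFILE argument are hypothetical; the real killproc in /etc/init.d/functions differs in detail, and this sketch only illustrates why a missing pid file leaves everything up to pidof's ordering.

```shell
#!/bin/bash
# Hypothetical, simplified sketch of the lookup order described above.
# This is NOT the actual /etc/init.d/functions code.
# Usage: find_daemon_pids PROGRAM [PIDFILE]
find_daemon_pids() {
    local base=${1##*/}
    local pidfile=${2:-/var/run/$base.pid}
    local pid="" p line

    # 1. Prefer the pid file: it names the parent process explicitly.
    if [ -f "$pidfile" ]; then
        read -r line < "$pidfile"
        for p in $line; do
            # Keep only PIDs that still exist.
            [ -d "/proc/$p" ] && pid="$pid $p"
        done
    fi

    # 2. Otherwise fall back to pidof. Its output order (children
    #    versus parent) is what produces the restart race.
    if [ -z "$pid" ]; then
        pid=$(pidof -c -o $$ -o $PPID -o %PPID -x "$base")
    fi

    # Unquoted on purpose: collapses the leading space from step 1.
    echo $pid
}
```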
I have modified /etc/init.d/functions to reverse the order of pids returned by 'pidof', so that the parent gets killed off first. With this change it works fine every time.
In /etc/init.d/functions I modified the code at line 185:
if [ -z "$pid" ]; then
pid=`pidof -c -o $$ -o $PPID -o %PPID -x $1 || \
pidof -c -o $$ -o $PPID -o %PPID -x $base`
# JH - fix for SA 'restart'
pid=`echo $pid | tac -s ' '`
pid=`echo -n $pid`
I used 'tac' because it seemed the easiest way to reverse the list of pids.
Note: SA could write a pid file, but 'killproc' only looks in /var/run, and we start SA as a non-root user, which cannot write a pid file into /var/run. So we must rely on pidof to get the pids.
We start SA with the '-m 15' option because we run busy mail servers. That means there are a lot of child processes, which in itself probably contributes to the problem. However, the problem still occurs if we reduce the number to 10, or even use the default of 5. Our /etc/sysconfig/spamassassin file contains:
SPAMDOPTIONS="-d -x -m 15 -s daemon -u mail --max-conn-per-child=100"
Version-Release number of selected component (if applicable):
Steps to Reproduce:
2. Issue the command 'service spamassassin restart' on a busy server.
Actual Results: The error mentioned above appears - SA fails to restart.
Expected Results: SA should have restarted with no errors.
tac is in /usr/bin, so you can't use it in this context: /etc/init.d/functions
has to work even when /usr is not mounted.
I suspect that if PID wraparound has occurred since the parent started, this
wouldn't do what you want.
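One way around the /usr/bin objection is to reverse the list using shell builtins alone. The helper below is hypothetical (reverse_pids is not part of /etc/init.d/functions). It only changes the order of whatever pidof returns, so it does nothing about the PID wraparound problem just raised.

```shell
#!/bin/bash
# Hypothetical helper: reverse a whitespace-separated list of PIDs
# using only shell builtins, so nothing under /usr/bin (such as tac)
# is required.
reverse_pids() {
    local out="" p
    for p in "$@"; do
        # Prepend each PID; ${out:+ $out} adds a space only when
        # out is already non-empty.
        out="$p${out:+ $out}"
    done
    echo "$out"
}

# Typical use inside killproc-style code (unquoted so the list
# is split into separate arguments):
#   pid=$(reverse_pids $pid)
```

For example, `reverse_pids 2303 2302 2301 2300` prints `2300 2301 2302 2303`, putting a parent with the lowest PID first, which holds only as long as PIDs have not wrapped.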
I admit 'tac' is not ideal; it was the only thing I could find that would
reverse 'something' for me. My second thought was to modify 'pidof' to accept
something like 'pidof -r', where '-r' reverses the pid order.
Likewise, I agree that pid wraparound would probably cause this to fail. I can,
of course, get SA to write the parent pid out to a file, but then 'functions'
would need to know where to find it, so we are back to modifying 'functions'.
Unfortunately SA itself has some comments/warnings about writing the pid out
before changing to a non-root user, so this option didn't seem practicable
despite being the most obvious.
I have thought more about this and have to admit that my first idea was somewhat
nonsense :-) The problem in trying to create a generic solution is that SA can
be run as any user the sysadmin wishes, and can write the pid file wherever
he/she wants to. There is a precedent, though: the ISC BIND 'named' process can
similarly be run as a non-root user and still write out a pid file.
To that end I have scrapped the 'functions' changes mentioned initially. I
have modified /etc/sysconfig/spamassassin and /etc/init.d/spamassassin to
recognise the environment variable 'SPAMD_PID'. In our case, I created
/var/run/spamassassin and ran 'chown mail:mail /var/run/spamassassin' so that SA
can write the pid file there.
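The directory setup can be captured in a small bash helper. The name make_pid_dir and its arguments are hypothetical; run as root, `make_pid_dir /var/run/spamassassin mail:mail` corresponds to the setup described above.

```shell
#!/bin/bash
# Hypothetical helper: create a pid-file directory that a non-root
# daemon user can write to. Giving ownership to another user
# (e.g. mail) normally requires root.
# Usage: make_pid_dir DIR OWNER[:GROUP]
make_pid_dir() {
    local dir=$1 owner=$2
    mkdir -p "$dir" || return 1
    # chown (not chmod) is what changes the owner and group.
    chown "$owner" "$dir" || return 1
    chmod 0755 "$dir"
}
```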
The /etc/sysconfig/spamassassin file becomes:
# Options to spamd
# Set SPAMD_PID to the PID file path if the SpamAssassin '-r' option is used.
#SPAMDOPTIONS="-d -c -m5 -H"
SPAMDOPTIONS="-d -x -m 15 -s daemon -u mail --max-conn-per-child=100 -r
The /etc/init.d/spamassassin file becomes (relevant bits):
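# Start daemon.
echo -n "Starting spamd: "
daemon $NICELEVEL spamd $SPAMDOPTIONS
RETVAL=$?
echo
# [ $RETVAL = 0 ] && touch /var/lock/subsys/spamassassin
if [ $RETVAL = 0 ]; then
    touch /var/lock/subsys/spamassassin
    # Link the real pid file to where killproc looks for it.
    [ -n "$SPAMD_PID" ] && ln -s $SPAMD_PID /var/run/spamd.pid
fi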
# Stop daemons.
echo -n "Shutting down spamd: "
killproc spamd
RETVAL=$?
echo
# [ $RETVAL = 0 ] && rm -f /var/lock/subsys/spamassassin
if [ $RETVAL = 0 ]; then
    rm -f /var/lock/subsys/spamassassin
    # Remove the symlink created at start.
    rm -f /var/run/spamd.pid
fi
This is similar to how named is dealt with. As with named, the SA 'start' creates a
symlink at /var/run/spamd.pid, and when stopping or restarting, killproc in
/etc/init.d/functions uses it. In testing, SA restarts now work fine.
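The symlink lifecycle can be sketched as two small bash functions. The names link_spamd_pid and unlink_spamd_pid are hypothetical, and the run directory is a parameter only so the logic can be exercised outside /var/run. One deliberate difference from the snippet above: `ln -sf` replaces a stale link left behind by an unclean shutdown, where `ln -s` would fail.

```shell
#!/bin/bash
# Hypothetical sketch of the SPAMD_PID symlink scheme described above.
# In the real init script RUNDIR would be /var/run.
# Usage: link_spamd_pid RUNDIR    (after a successful start)
#        unlink_spamd_pid RUNDIR  (after a successful stop)
link_spamd_pid() {
    local rundir=$1
    # Only link when SPAMD_PID names the file spamd writes via -r.
    [ -n "$SPAMD_PID" ] || return 0
    # -f replaces a stale link from an earlier, unclean shutdown.
    ln -sf "$SPAMD_PID" "$rundir/spamd.pid"
}

unlink_spamd_pid() {
    rm -f "$1/spamd.pid"
}
```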
OK, assigning to spamassassin.
*** Bug 141323 has been marked as a duplicate of this bug. ***
The specific implementation suggested in Comment #3 is not good because during
package upgrades, if the /etc/sysconfig/spamassassin file had been previously
modified it will not be replaced and your init.d script would fail. Given this
problem, hardcoding the pid path in the script may be the only supportable solution.
See http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4655 for the solution
that I am currently testing for future spamassassin packages.
Please help me to test this package that will soon go to FC3 and FC4 updates.
It is essentially an upstream 3.0.5 release candidate.
Re comment 8: I have installed onto one of our FC4 mailhubs
spamassassin-3.0.4-2.fc4 from the fedora-updates repo. This seems to work fine;
restarting spamassassin repeatedly using 'service spamassassin restart' worked
every time. Previously this would have failed pretty much immediately.
FWIW, killproc and other initscript functions now support -p to specify the
PID file location, so the symlink hack should no longer be necessary.
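With -p, the stop path can hand killproc the pid file directly instead of relying on a symlink in /var/run. A sketch of such a stop function, assuming /etc/init.d/functions has already been sourced (as Red Hat init scripts do) and assuming a pid file at /var/run/spamassassin/spamd.pid. That file name is hypothetical; it must match whatever path is passed to spamd's -r option. The function name stop_spamd is also illustrative.

```shell
#!/bin/bash
# Assumes /etc/init.d/functions has been sourced earlier in the
# script, providing killproc with -p support.

# Hypothetical path; must match the path given to spamd's -r option.
SPAMD_PIDFILE=/var/run/spamassassin/spamd.pid

stop_spamd() {
    echo -n "Shutting down spamd: "
    # -p makes killproc read the parent PID from this file instead of
    # falling back to pidof and its parent/child ordering.
    killproc -p "$SPAMD_PIDFILE" spamd
    RETVAL=$?
    echo
    [ $RETVAL = 0 ] && rm -f /var/lock/subsys/spamassassin
    return $RETVAL
}
```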
This is essentially fixed in FC5 and RHEL4U3, but I want to do a little more
cleaning up of this before closing this bug.
I'm not so sure this is "fixed" in RHEL4U3. We recently ran up2date on our
RHEL4 system to bring it up to U3. As part of the upgrade, we got a new
version of spamassassin:
spamassassin-3.0.5-3.el4 Thu 16 Mar 2006 05:34:40 PM CST
The install of other packages lasted until 05:49:10 PM, and then up2date
restarted all the daemons. But our boot.log indicates spamd didn't restart:
Mar 16 17:51:43 zeus spamassassin: spamd shutdown succeeded
Mar 16 17:51:44 zeus spamd: Could not create INET socket on 127.0.0.1:783:
Address already in use (IO::Socket::INET: Address already in use)
Mar 16 17:51:44 zeus spamassassin: spamd startup failed
The result was that everything worked fine for an hour, until the last child
exited due to the default --max-conn-per-child=200. After that we got a stream of:
Mar 16 18:46:30 zeus spamc: connect(AF_INET) to spamd at 127.0.0.1
failed, retrying (#1 of 3): Connection refused
At the time, ps output indicated no spamd processes running, and a
'/etc/init.d/spamassassin restart' worked fine to fix it.
I'm assuming that the restart would have used the newly-installed init script,
so that suggests that this fix was incomplete. Or does the fix only work if
spamd was started using it (due to treatment of a pid file)?
Unfortunately there is nothing we can do to ensure that this works when
upgrading, because the old init script didn't generate the pid file. This fix
will only prevent failures in future upgrades and in regular restarts.