Bug 161785 - spamassassin restart fails - functions bug?
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: spamassassin
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Warren Togami
Duplicates: 141323
Blocks: 171491
 
Reported: 2005-06-27 06:57 EDT by John Horne
Modified: 2007-11-30 17:11 EST
Doc Type: Bug Fix
Last Closed: 2006-05-09 11:49:16 EDT


Description John Horne 2005-06-27 06:57:06 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050524 Fedora/1.0.4-4 Firefox/1.0.4

Description of problem:
When I try to 'restart' spamassassin (SA), it fails with:

  service spamassassin restart
  Shutting down spamd:                                       [  OK  ]
  Starting spamd: Could not create INET socket on 127.0.0.1:783: Address already in use (IO::Socket::INET: Address already in use)          [FAILED]

The problem seems to be that whilst a 'restart' does a 'stop' and then a 'start', the 'stop' calls 'killproc' to find the SA pids. 'killproc' is a function in /etc/init.d/functions. It first looks for a pid file, '/var/run/spamd.pid', but this does not exist, so it then calls the 'pidof' command.

This returns the pids of the spamd processes, but it seems to return the child pids first and the parent pid last. As a result, while the children are being killed off, the parent sees this and starts a new child process. Running 'ps auxww|grep -i spamd' during the restart process shows that a new spamd child has appeared before the 'start' executes. When the 'start' does execute, it fails because the surviving child process still has 127.0.0.1:783 in use.

I have modified /etc/init.d/functions to reverse the order of pids returned by 'pidof'. As such the parent gets killed off first. Testing this, it works fine every time now.

In /etc/init.d/functions I modified at line 185:

===========================================================
  if [ -z "$pid" ]; then
          pid=`pidof -c -o $$ -o $PPID -o %PPID -x $1 || \
                  pidof -c -o $$ -o $PPID -o %PPID -x $base`

# JH - fix for SA 'restart'
          pid=`echo $pid | tac -s ' '`
          pid=`echo -n $pid`
  fi
===========================================================

I used 'tac' because it seemed the easiest way to reverse the list of pids.
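
For reference, the same reversal can be done with shell builtins alone, so it does not depend on any external command. This is only a sketch: the function name reverse_words and the pids in the example are made up, and it still relies on the observed (not guaranteed) order in which 'pidof' lists the processes.

```shell
# reverse_words: print the whitespace-separated words of "$1" in reverse
# order, using only shell builtins (no tac or other external commands).
reverse_words() {
    reversed=
    for word in $1; do
        # Prepend each word, adding a separating space once the list is non-empty.
        reversed="$word${reversed:+ $reversed}"
    done
    printf '%s\n' "$reversed"
}

# Example with made-up pids: children listed first, parent last.
reverse_words "2101 2102 2103 2050"
```

Inside /etc/init.d/functions the two helper variables would normally be declared 'local' so they do not leak into the calling script.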

Note: SA could write a pid file. But 'killproc' only looks in /var/run, and we start SA as a non-root user. As such that user cannot write a pid file into /var/run. So we must rely on pidof to get the pids.

We start SA with the '-m 15' option because we run busy mail servers. As such there are a lot of child processes, and this in itself probably contributes to the problem. However, the problem exists if we reduce the number to 10, or even to use the default of 5. Our /etc/sysconfig/spamassassin file contains:

  SPAMDOPTIONS="-d -x -m 15 -s daemon -u mail --max-conn-per-child=100"




John.


Version-Release number of selected component (if applicable):
initscripts-8.11.1-1

How reproducible:
Always

Steps to Reproduce:
1. Start spamassassin.
2. Issue the command 'service spamassassin restart' on a busy server.

Actual Results:  The error mentioned above appears - SA fails to restart.

Expected Results:  SA should have restarted with no errors.

Additional info:
Comment 1 Bill Nottingham 2005-06-27 11:16:42 EDT
tac is in /usr/bin - you can't use that in this context.

I'd suspect if pid wraparound has happened since the parent started, this
wouldn't do what you'd want.
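
One way to avoid depending on pid ordering at all is to pick the parent by its parent pid rather than by its position in the list. The following is a minimal sketch of that idea, not something taken from initscripts: the function name root_pids is hypothetical, and it expects "pid ppid" pairs on standard input.

```shell
# root_pids: read "pid ppid" pairs on stdin and print the pids whose parent
# is not itself in the list, i.e. the top-level process(es).
root_pids() {
    pairs=$(cat)
    # Build a space-delimited list of every pid, e.g. " 2101 2102 2050 ".
    all=" $(printf '%s\n' "$pairs" | while read -r pid ppid; do printf '%s ' "$pid"; done)"
    printf '%s\n' "$pairs" | while read -r pid ppid; do
        [ -n "$pid" ] || continue
        case "$all" in
            *" $ppid "*) ;;              # parent is another listed process: a child
            *) printf '%s\n' "$pid" ;;   # parent is outside the list: a root
        esac
    done
}

# Example with made-up pids: 2050 is the parent of 2101 and 2102.
printf '2101 2050\n2102 2050\n2050 1\n' | root_pids
```

On a Linux system with procps, the input could come from something like 'ps -o pid= -o ppid= -C spamd', and only the pids printed by root_pids would be signalled.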
Comment 2 John Horne 2005-06-27 13:22:00 EDT
I admit 'tac' is not ideal, it was the only thing I could find that would
reverse 'something' for me. Second thought was perhaps modifying 'pidof' to
something like 'pidof -r' where the '-r' reverses the pid order.

Likewise, I agree that pid wraparound would probably cause this to fail. I can,
of course, get SA to write the parent pid out to a file, but then 'functions'
would need to know where to find it - so, again back to modifying 'functions'.

Unfortunately SA itself has some comments/warnings about writing the pid out
before changing to a non-root user. As such this option didn't seem practicable,
despite being the most obvious.
Comment 3 John Horne 2005-06-27 18:28:03 EDT
I have thought more about this and have to admit that my first idea was somewhat
nonsense :-) The problem in trying to create a generic solution is that SA can
be run as any user the sysadmin wishes, and can write the pid file wherever
he/she wants to. However, the ISC BIND 'named' process can similarly be run as a
non-root user and write a pid file out.

To that extent I have scrapped the 'functions' changes mentioned initially. I
have modified /etc/sysconfig/spamassassin and /etc/init.d/spamassassin to
recognise the environment variable 'SPAMD_PID'. I created, in our case,
/var/run/spamassassin and 'chown mail:mail /var/run/spamassassin' to let SA
write the pid file into there.

The /etc/sysconfig/spamassassin file becomes:

===============================================================
  # Options to spamd
  # Set SPAMD_PID to the PID file path if the SpamAssassin '-r' option is used.
  SPAMD_PID=/var/run/spamassassin/spamd.pid
  #
  #SPAMDOPTIONS="-d -c -m5 -H"
  SPAMDOPTIONS="-d -x -m 15 -s daemon -u mail --max-conn-per-child=100 -r $SPAMD_PID"
===============================================================


The /etc/init.d/spamassassin file becomes (relevant bits):

===============================================================
  start)
        # Start daemon.
        echo -n "Starting spamd: "
        daemon $NICELEVEL spamd $SPAMDOPTIONS
        RETVAL=$?
        echo
#       [ $RETVAL = 0 ] && touch /var/lock/subsys/spamassassin
        if [ $RETVAL = 0 ]; then
                [ -n "$SPAMD_PID" ] && ln -s $SPAMD_PID /var/run/spamd.pid
                touch /var/lock/subsys/spamassassin
        fi
        ;;
  stop)
        # Stop daemons.
        echo -n "Shutting down spamd: "
        killproc spamd
        RETVAL=$?
        echo
#       [ $RETVAL = 0 ] && rm -f /var/lock/subsys/spamassassin
        if [ $RETVAL = 0 ]; then
                rm -f /var/lock/subsys/spamassassin
                rm -f /var/run/spamd.pid
        fi
        ;;
===============================================================


This is similar to how named is dealt with. As with named, SA 'start' creates a
soft link for /var/run/spamd.pid, and when stopping or restarting this is used
(by killproc in /etc/init.d/functions). Upon testing SA restarts work fine.
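
The reason a pid file helps is that only the recorded parent pid is signalled, so no child is killed out from under a parent that would respawn it. The sketch below illustrates that idea and additionally waits for the process to exit before returning. It is a hypothetical helper (stop_by_pidfile), not the actual killproc implementation, and the 10 second default timeout is an arbitrary assumption.

```shell
# stop_by_pidfile: hypothetical helper, not the real killproc.
# Sends TERM to the pid recorded in a pid file, then waits up to
# $2 seconds (default 10) for that process to exit.
stop_by_pidfile() {
    pidfile=$1
    remaining=${2:-10}
    [ -r "$pidfile" ] || return 1
    # Accept a pid file with or without a trailing newline.
    read -r pid < "$pidfile" || [ -n "$pid" ] || return 1
    case $pid in
        ''|*[!0-9]*) return 1 ;;    # refuse anything that is not a plain number
    esac
    kill -TERM "$pid" 2>/dev/null || return 1
    while [ "$remaining" -gt 0 ] && kill -0 "$pid" 2>/dev/null; do
        sleep 1
        remaining=$((remaining - 1))
    done
    ! kill -0 "$pid" 2>/dev/null    # success only if the process is gone
}
```

A 'restart' built on this would call 'stop_by_pidfile /var/run/spamd.pid' and run the 'start' step only if it returned success.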



John.
Comment 4 Bill Nottingham 2005-06-27 23:13:39 EDT
OK, assigning to spamassassin.
Comment 5 Warren Togami 2005-08-17 00:19:25 EDT
*** Bug 141323 has been marked as a duplicate of this bug. ***
Comment 6 Warren Togami 2005-09-14 03:23:16 EDT
%config(noreplace) %{_sysconfdir}/sysconfig/spamassassin

The specific implementation suggested in Comment #3 is not good because during
package upgrades, if the /etc/sysconfig/spamassassin file had been previously
modified it will not be replaced and your init.d script would fail.  Given this
problem, hardcoding the pid path in the script may be the only supportable solution.

Any objections?
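
A rough sketch of what hardcoding could look like: the init script owns the pid file path and appends the '-r' option itself, so an old, unmodified-by-upgrade /etc/sysconfig/spamassassin without any pid setting still works. The variable SPAMD_PIDFILE, the function spamd_args and the path are assumptions for illustration, not the contents of any shipped package.

```shell
# Pid file path fixed in the init script rather than read from sysconfig.
# The path is an assumption for illustration.
SPAMD_PIDFILE=/var/run/spamd.pid

# spamd_args: print the options to pass to spamd, appending "-r <pidfile>"
# unless the administrator's SPAMDOPTIONS already contains a -r option.
spamd_args() {
    case " $1 " in
        *" -r "*) printf '%s\n' "$1" ;;
        *) printf '%s\n' "${1:+$1 }-r $SPAMD_PIDFILE" ;;
    esac
}

# Example with the stock options from the sysconfig file quoted above.
spamd_args "-d -c -m5 -H"
```

One limitation of this sketch: if the administrator already passes '-r' with a different path, the 'stop' step would still need to be told about that path.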
Comment 7 Warren Togami 2005-10-30 22:28:34 EST
See http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4655 for the solution
that I am currently testing for future spamassassin packages.
Comment 8 Warren Togami 2005-11-09 12:41:56 EST
http://people.redhat.com/wtogami/temp/spamassassin/fc3/
http://people.redhat.com/wtogami/temp/spamassassin/fc4/

Please help me to test this package that will soon go to FC3 and FC4 updates. 
It is essentially an upstream 3.0.5 release candidate.
Comment 9 John Horne 2005-11-10 09:50:18 EST
Re comment 8: I have installed onto one of our FC4 mailhubs
spamassassin-3.0.4-2.fc4 from the fedora-updates repo. This seems to work fine;
restarting spamassassin repeatedly using 'service spamassassin restart' worked
every time. Previously this would have failed pretty much immediately.

Many thanks.

John.
Comment 10 Miloslav Trmač 2006-02-03 01:42:07 EST
FWIW, killproc and other initscript functions now support -p to specify the
PID file location, so the symlink hack should no longer be necessary.
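
As a sketch of how a 'stop' step might use that option, assuming the pid file location from Comment 3 (the variable SPAMD_PIDFILE and the wrapper stop_spamd are hypothetical names, and the guard is only there so the fragment does nothing harmful where killproc is missing):

```shell
# Red Hat style init scripts source this file to get killproc; guarded so
# the sketch can be read on systems that do not ship it.
if [ -r /etc/init.d/functions ]; then
    . /etc/init.d/functions
fi

# Illustrative path, matching the directory used earlier in this report.
SPAMD_PIDFILE=/var/run/spamassassin/spamd.pid

stop_spamd() {
    if ! command -v killproc >/dev/null 2>&1; then
        echo "killproc is not available on this system" >&2
        return 1
    fi
    # -p tells killproc which pid file to read instead of guessing
    # /var/run/<name>.pid, so no symlink is needed.
    killproc -p "$SPAMD_PIDFILE" spamd
}
```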
Comment 11 Warren Togami 2006-03-08 11:40:59 EST
This is essentially fixed in FC5 and RHEL4U3, but I want to do a little more
cleaning up of this before closing this bug.
Comment 12 Damian Menscher 2006-03-17 13:42:57 EST
I'm not so sure this is "fixed" in RHEL4U3.  We recently ran up2date on our
RHEL4 system, to bring it up to U3.  As part of the upgrade, we got a new
version of spamassassin:

spamassassin-3.0.5-3.el4         Thu 16 Mar 2006 05:34:40 PM CST

The install of other packages lasted until 05:49:10 PM, and then up2date
restarted all the daemons.  But our boot.log indicates spamd didn't restart
properly:

Mar 16 17:51:43 zeus spamassassin: spamd shutdown succeeded
Mar 16 17:51:44 zeus spamd: Could not create INET socket on 127.0.0.1:783: Address already in use (IO::Socket::INET: Address already in use)
Mar 16 17:51:44 zeus spamassassin: spamd startup failed

The result was that everything worked fine for an hour, until the last child
exited due to the default --max-conn-per-child=200.  And then we got the mess of

Mar 16 18:46:30 zeus spamc[20565]: connect(AF_INET) to spamd at 127.0.0.1 failed, retrying (#1 of 3): Connection refused

At the time, ps output indicated no spamd processes running, and a
'/etc/init.d/spamassassin restart' worked fine to fix it.

I'm assuming that the restart would have used the newly-installed init script,
so that suggests that this fix was incomplete.  Or does the fix only work if
spamd was started using it (due to treatment of a pid file)?
Comment 13 Warren Togami 2006-05-09 11:49:16 EDT
Unfortunately there is nothing we can do to ensure that this works when
upgrading because the old init script didn't generate the pid file.  This fix
will only prevent failures in future upgrades, and regular restarts.
