Bug 158268 - am-utils init script calls amq -f an insane number of times (when stopping service)
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: am-utils
Version: 3.0
Hardware: All Linux
Priority: medium
Severity: low
Assigned To: Peter Vrabec
Jay Turner
RHEL3U7NAK
Depends On:
Blocks: 170445
Reported: 2005-05-19 23:34 EDT by Jonathan Peatfield
Modified: 2015-01-07 19:09 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2006-03-23 23:21:38 EST


Description Jonathan Peatfield 2005-05-19 23:34:24 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040616

Description of problem:
The stop part of the script does a whole bunch of amq -uf calls on all mounted file-systems.

Leaving aside that amq -u will take a list of filesystems so the xargs -n 1 is pointless (apart from making things worse), there are worse things.
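To illustrate the xargs point, here is a small sketch that counts invocations. It uses echo as a stand-in for amq -u, and the mount names are invented for the example; it only shows how xargs behaves, not what the init script literally contains.

```shell
#!/bin/sh
# Hypothetical mount points; echo stands in for "amq -u".
mounts="/home/a /home/b /home/c"

# With -n 1, xargs runs the command once per argument,
# so each output line corresponds to one invocation.
per_arg=$(printf '%s\n' $mounts | xargs -n 1 echo | wc -l | tr -d ' ')

# Without -n 1, the whole (short) list goes to a single invocation.
batched=$(printf '%s\n' $mounts | xargs echo | wc -l | tr -d ' ')

echo "invocations with -n 1: $per_arg"
echo "invocations without:   $batched"
```

With N mounted filesystems the first form grows to N invocations per pass, while the second stays at one for any list that fits on a command line.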

Clearly the author didn't know what -f would do (it asks amd to "flush" its cache of maps).

This means that if you have N mounted systems, the stop script asks amd to "flush" itself 10*N times (and very quickly on a fast machine).  If the map data is provided by (say) a nis/yp map, then ypserv will get somewhat stressed (and start refusing to answer in the case of the Solaris8 server).

The net result is that *frequently* it ends up with amd dying (that itself is a bug, but we are stressing it here) and not cleaning up after itself (i.e. leaving *toplvl* mounts which of course are not being serviced any more).

The author clearly expected amd to die in the middle of the amq storm because it actually tests if amd is alive (to cause a sleep), after each set of amq -uf calls.

am-utils comes with a script ctl-amd which works perfectly well on all other platforms we use, and its stop logic goes (roughly):

  kill amd
  loop (up to 5 times) waiting for amd to die

A simple call to killproc would probably be sufficient.  However, if you want to ensure that no toplvl mounts are left over, they can be forcibly unmounted (with the -fl option), e.g.

	killproc $amd
	RETVAL=$?

	for dir in $(/bin/awk '/\w+:\(pid/ {print $2}' </proc/mounts)
	do
		# The -fl forces a "lazy" unmount which can ALWAYS work,
		# clearing the mountpoint for later re-use
		/bin/umount -fl "$dir"
	done

You may want to improve the umount loop, or just not do that step at all, since amd really shouldn't leave toplvl mounts behind (I added it to my script 'cos I'm paranoid).
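As a rough check of what the awk selection picks up, the sketch below runs a similar pattern over invented sample data in /proc/mounts layout. The hostnames, pid, and paths are made up, and it assumes amd's toplvl entries have a device field of the form host:(pidN), which is what the pattern above implies. It matches on the first field only and avoids \w, which not every awk supports.

```shell
#!/bin/sh
# Invented sample in /proc/mounts layout (hostnames, pid, and paths are made up).
sample='rootfs / rootfs rw 0 0
/dev/sda1 / ext3 rw 0 0
myhost:(pid1234) /home nfs rw 0 0
myhost:(pid1234) /net nfs rw 0 0
fileserver:/export/data /mnt/data nfs rw 0 0'

# Print the mount point (field 2) of every entry whose device field
# ends in ":(pidN)"; ordinary NFS mounts are left alone.
toplvl=$(printf '%s\n' "$sample" | awk '$1 ~ /:\(pid[0-9]+\)$/ {print $2}')

printf '%s\n' "$toplvl"
```

On this sample only /home and /net are selected; the regular NFS mount from fileserver is skipped.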

Version-Release number of selected component (if applicable):
am-utils-6.0.9-2.4

How reproducible:
Always

Steps to Reproduce:
1. read the am-utils.init script
2. run sh -x /etc/rc.d/init.d/amd
3. see how many calls to amq there are...
  

Actual Results:  Be shocked by the script.  More seriously, it seems to be responsible for machines failing to cleanly go down.  Any stuck toplvl mountpoints will cause any process which touches them to get stuck in a disk-wait (which will never end)...

I've convinced myself (ha) that this is the cause of most of the problems we see 

Expected Results:  amd should just get a signal and go down cleanly (removing its toplvl mountpoints).  There is little point in doing the amq -u and absolutely no point in the -f at all.

Additional info:

The same script is present as far back as RH62 (that I can easily check), and quite possibly further.  The same code also exists in am-utils-6.0.9-10 in RHEL4.

It may be that when the code was added amq -u didn't take a list of file-systems, but the oldest machines I can easily test certainly do.

I'm told that the "asking amd to umount things" was a piece of voodoo which was an attempt to work round earlier bugs in amd (long since fixed).
Comment 1 Jonathan Peatfield 2005-05-19 23:38:47 EDT
Sorry about the bad formatting.

I'd intended to say:

  I've convinced myself (ha) that this is the cause of most of the problems
  we see with machines occasionally failing to shut down or reboot.

rather than ending part way through the sentence.
Comment 2 Peter Vrabec 2005-10-05 09:46:05 EDT
I suggest:

stop() {
        echo -n $"Stopping $prog: "
        killproc $amd -TERM
        # this part is from wait4amd2die
        delay=3
        count=10
        i=1
        maxcount=`expr $count + 1`
        while [ $i != $maxcount ]; do
                # run amq
                /usr/sbin/amq > /dev/null 2>&1
                RETVAL=$?
                if [ $RETVAL != 0 ]
                then
                        # amq failed to run (because amd is dead)
                        rm -f /var/lock/subsys/amd /var/run/amd.pid
                        echo
                        return $RETVAL
                fi
                sleep $delay
                i=`expr $i + 1`
        done
        failure $"amd shutdown"
        echo
        echo "amd is still up"
        return $RETVAL
}
Comment 3 Jonathan Peatfield 2005-10-05 10:01:14 EDT
wait4amd2die takes arguments for delay/count so it has (half) an excuse to do
the maxcount=`expr $count + 1` nonsense, though to be honest starting i at 0 and
just counting to $count seems more obvious to me anyway.  Can you assume bash
extensions and use $(( ... )) rather than `expr ...`?

So it would become:

stop() {
        echo -n $"Stopping $prog: "
        killproc $amd -TERM
        # this part is from wait4amd2die
        delay=1
        count=10
        i=0
        while [ $i != $count ]; do
                # run amq
                /usr/sbin/amq > /dev/null 2>&1
                RETVAL=$?
                if [ $RETVAL != 0 ]
                then
                        # amq failed to run (because amd is dead)
                        rm -f /var/lock/subsys/amd /var/run/amd.pid
                        echo
                        return $RETVAL
                fi
                sleep $delay
                i=$(( $i + 1 ))
        done
        failure $"amd shutdown"
        echo
        echo "amd is still up"
        return $RETVAL
}
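
One thing worth noting about both versions of stop() above: as written they appear to return amq's non-zero status when amd has actually gone away, and 0 when amd is still up after the last try. A minimal sketch of the same polling loop with the statuses the other way round follows. The function name wait_for_exit and its arguments are hypothetical, and the probe command is passed in so the loop can be tried without amd installed.

```shell
#!/bin/sh
# wait_for_exit PROBE [COUNT] [DELAY]   (hypothetical helper, not part of am-utils)
# PROBE is a command that succeeds while the daemon is up
# (/usr/sbin/amq in the functions above).
# Returns 0 once PROBE fails (daemon gone),
# 1 if PROBE still succeeds after COUNT tries.
wait_for_exit() {
    probe=$1
    count=${2:-10}
    delay=${3:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        # $probe is deliberately unquoted so it may carry arguments.
        if ! $probe >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

Inside stop() this could be used as `killproc $amd -TERM; wait_for_exit /usr/sbin/amq 10 1`, removing the lock and pid files only when it returns 0.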

Comment 4 Jonathan Peatfield 2005-10-05 10:11:16 EDT
Oh I forgot to say, am-utils since 6.1.1 (I think) does the force unmount of the
toplvls itself, so if you are thinking of building a newer version that bit of
code I suggested shouldn't be needed any more...

Comment 5 Peter Vrabec 2005-10-05 15:36:39 EDT
fixed in devel, am-utils-6.1.2.1-1
Comment 6 Jonathan Peatfield 2005-10-18 08:35:15 EDT
I assume that you know that am-utils 6.1.3 was released a few days ago...
