Description of Problem:
The /etc/init.d/amd script uses the function 'killproc' from /etc/init.d/functions to terminate amd. The not-so-wonderful feature of 'killproc' is that it sends a TERM signal to amd and then, about five or six seconds later, sends a KILL signal. This can leave things in a state where amd cannot be restarted.

Because the amd process is killed aggressively before it has finished unmounting all of the toplvl nodes, mtab entries are left behind and toplvl nodes remain 'connected' to processes that no longer exist. When you restart amd, you either get a stale file handle or, more likely, the former toplvl node is restarted with type 'link' and is useless. The restart as type 'link' happens because the amd process is killed before /etc/mtab is updated. The only way to clear this state is a system reboot.

This can make amd look unreliable, when the real culprit is the /etc/init.d/amd script. For example, on our systems with only three toplvl nodes, running /etc/init.d/amd restart worked about 95% of the time. We recently changed a number of systems so that they now have six toplvl nodes, so amd takes longer to exit. In this configuration, the 'last' of the six hits the bug 99% of the time.

Version-Release number of selected component (if applicable):

How Reproducible:
Easily

Steps to Reproduce:
1. Create an /etc/amd.conf with six or more top-level nodes.
2. Run /etc/init.d/amd start, then /etc/init.d/amd stop.
3. Look in /etc/mtab for nodes linked to dead processes.
4. Run /etc/init.d/amd start.
5. Run amq and look for nodes with type 'link' that should be toplvl.

Actual Results:
amq shows nodes of type 'link' that should be of type 'toplvl'.

Expected Results:
Immediately after a start, the output of amq should show only the root node and the toplvl nodes.

Additional Information:
A new /etc/init.d/amd script (fix) is attached.
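Step 3 of the reproduction can be checked mechanically. The following is a minimal sketch, not part of the attached fix, and it rests on two assumptions: that amd records its toplvl mounts in /etc/mtab with a device field of the form pid<N>@<host>:<path>, and that the system is Linux with /proc mounted (a live PID has a /proc/<N> directory). The function name find_stale_amd_mounts is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print amd mount entries from an mtab-format file whose
# owning amd process no longer exists.
# ASSUMPTIONS: amd's device field looks like "pid<N>@<host>:<path>", and
# /proc/<N> exists for every live process (Linux-specific).
find_stale_amd_mounts() {
    mtab=${1:-/etc/mtab}
    # Field 1 is the device, field 2 the mount point; the rest is ignored.
    while read -r dev mnt _rest; do
        case $dev in
            pid[0-9]*@*) ;;          # looks like an amd-owned entry
            *) continue ;;           # anything else is not ours
        esac
        # Drop the "pid" prefix, then everything from the first non-digit on.
        pid=${dev#pid}
        pid=${pid%%[!0-9]*}
        [ -n "$pid" ] || continue
        # No /proc entry means the amd that owned this mount is gone.
        if [ ! -d "/proc/$pid" ]; then
            echo "$mnt (amd pid $pid is gone)"
        fi
    done < "$mtab"
}
```

After an /etc/init.d/amd stop, running find_stale_amd_mounts /etc/mtab should print nothing; any line it does print corresponds to a mount left behind by a killed amd.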
Created attachment 33238: /etc/init.d/amd, which correctly waits for amd to shut down
I suggest:

stop() {
    echo -n $"Stopping $prog: "
    # With an explicit signal, killproc sends only TERM and does not
    # follow up with KILL, so amd gets time to unmount its toplvl nodes.
    killproc $amd -TERM
    # this part is from wait4amd2die
    delay=3
    count=10
    i=0
    while [ $i -lt $count ]; do
        # amq fails once amd is no longer running
        if ! /usr/sbin/amq > /dev/null 2>&1; then
            rm -f /var/lock/subsys/amd /var/run/amd.pid
            echo
            RETVAL=0
            return $RETVAL
        fi
        sleep $delay
        i=$((i + 1))
    done
    failure $"amd shutdown"
    echo
    echo "amd is still up"
    RETVAL=1
    return $RETVAL
}
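The wait loop in stop() polls amq until it fails. The same idea can be factored into a reusable helper that takes the probe command as arguments, which also makes it testable without a running amd. This is a hypothetical sketch; the name wait_for_down and its argument order are illustrative and do not come from the attached script.

```shell
#!/bin/sh
# Hypothetical generalization of the stop() wait loop.
# Usage: wait_for_down DELAY COUNT PROBE [ARGS...]
# Runs PROBE up to COUNT times, DELAY seconds apart. Returns 0 as soon as
# PROBE exits non-zero (the daemon stopped answering), or 1 if PROBE still
# succeeds after the last attempt.
wait_for_down() {
    delay=$1
    count=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        # Only the probe's exit status matters; discard its output.
        if ! "$@" > /dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        # Skip the sleep after the final attempt.
        if [ "$i" -lt "$count" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

In stop(), after killproc $amd -TERM, the loop would then reduce to wait_for_down 3 10 /usr/sbin/amq, removing the lock and pid files only when it returns 0.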