Description of problem:
When automount manages a large number of automounts, unmounting all of them while exiting can take longer than anticipated. As shipped, the autofs5 init script waits 45 seconds for the automount daemon to exit, after which it reports that the daemon failed to exit and itself exits. During this time the daemon is still busy umounting NFS mounts before exiting.

Version-Release number of selected component (if applicable):
autofs5-5.0.1-0.rc2.106.el4_8.2
RHEL4U8 x86_64

How reproducible:
Always (for the customer)

Steps to Reproduce:
1. Configure an autofs map with wildcards that match a large number of directories on an NFS export.
2. Access the directories.
3. Shut down the automount daemon without waiting for the mounts to expire.

Actual results:
'service autofs5 stop' reports FAIL.

Expected results:
The init script should wait for a longer time interval to allow autofs to exit.

Additional info:
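To make the reproduction steps concrete, here is a sketch of the kind of wildcard map described; the mount point, map name, server, and export path are hypothetical, not taken from the customer's setup:

```
# /etc/auto.master entry: autofs manages a hypothetical mount point
/data   /etc/auto.data  --timeout=60

# /etc/auto.data: the wildcard key '*' matches any directory name looked
# up under /data, and '&' substitutes the matched key into the NFS path
*   nfsserver:/export/data/&
```

Accessing many directories under /data (e.g. listing them in a loop) then creates a correspondingly large number of NFS mounts, and stopping the service before they expire leaves the daemon umounting them all at shutdown.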
Created attachment 404612 [details] autofs5 debug log
The customer works around this bug by modifying line 80 of /etc/init.d/autofs5 from:

[ $RETVAL = 0 -a -z "`pidof $DAEMON`" ] || sleep 3

to:

[ $RETVAL = 0 -a -z "`pidof $DAEMON`" ] || sleep 60
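Rather than lengthening the fixed sleep, the same spot in the script could poll for the daemon to go away and return as soon as it does, so a fast shutdown stays fast. A minimal sketch, assuming a 60-second budget; the wait_for_exit helper is illustrative and not part of the shipped init script:

```shell
#!/bin/sh
# Illustrative replacement for a fixed 'sleep': poll pidof once per
# second and stop waiting as soon as the daemon has exited.
DAEMON=automount
WAIT_SECS=60

wait_for_exit() {
    secs=$WAIT_SECS
    while [ $secs -gt 0 ]; do
        # pidof prints nothing once the daemon is gone
        [ -z "`pidof $1`" ] && return 0
        sleep 1
        secs=`expr $secs - 1`
    done
    return 1    # still running after WAIT_SECS seconds
}

if wait_for_exit $DAEMON; then
    echo "stopped"
else
    echo "failed to stop"
fi
```

This bounds the wait without forcing every shutdown to sit through the full interval.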
Created attachment 407970 [details] Patch - improve shutdown wait
A test package with the above patch has been built and is available at: http://people.redhat.com/~ikent/autofs5-debuginfo-5.0.1-0.rc2.109.bz579631.3 Please test this package and report results.
Created attachment 410307 [details] Patch - improve shutdown wait (rev 2)
A test package with the above patch has been built and is available at:

http://people.redhat.com/~ikent/autofs5-5.0.1-0.rc2.109.bz579631.4

While this change only increases the time to wait for autofs to exit, it still isn't as long as originally requested, so hopefully it won't attract criticism from those who dislike long waits at shutdown.

However, the most recent trace of the init script stop action may be showing a problem with autofs not exiting at all at shutdown, so please pay attention to this and report the results of your testing. We have had another report of this but I have been unable to reproduce the issue; if it is happening here I'm keen to get more information.

Please test this package and report results.
How many entries are present in /proc/mounts and /etc/mtab when the shutdown is initiated? During the shutdown, when we are seeing umounts take a long time, what is the CPU usage of automount? If it is high, how long does it stay high and what is the reported percentage of CPU usage?
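For reference, one way to collect the numbers asked for above; this is a sketch, and the suggested 5-second sampling interval is arbitrary:

```shell
#!/bin/sh
# Count the mount entries at shutdown time.
echo "/proc/mounts: `wc -l < /proc/mounts` entries"
[ -f /etc/mtab ] && echo "/etc/mtab: `wc -l < /etc/mtab` entries"

# Sample automount's CPU usage; repeat this (e.g. every 5 seconds)
# while the umounts are running to see how long usage stays high:
#   while pidof automount >/dev/null; do
#       ps -o pid,pcpu,etime -C automount
#       sleep 5
#   done
ps -o pid,pcpu,etime,comm -C automount || echo "automount not running"
```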
The log from comment #14 shows that one umount occurred after the first 6 seconds and the second umount took 4 minutes 46 seconds. The remaining umounts proceeded at between 5 and 8 per second. So the server providing patools3:/t1sw4/sopt7/icds either was down for a time or responded unreasonably slowly.

The reason the init script gives up and returns an error is to stop long waits like this from preventing system shutdown. The problem is that this behaviour isn't appropriate for a restart, where we need to wait for autofs to exit before starting it again. I'm not sure yet which way I will resolve this.

We also see from the log that, after all mounts have gone away, it takes a further 2 minutes 37 seconds before the automount process finishes. This seems longer than it should be, but it is very difficult to work out what is causing it. It may be due to latency in thread termination together with the state queue manager thread not being responsive enough to thread terminations. I don't think this is worth chasing specifically, as it may be more difficult to resolve than we expect and would not yield enough benefit to warrant the effort. Also, this may be improved by coming RHEL-4 changes.
Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.