Bug 197605
Summary: | Initscripts should not try to umount network mounts at the end | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 3 | Reporter: | Bastien Nocera <bnocera> |
Component: | initscripts | Assignee: | Bill Nottingham <notting> |
Status: | CLOSED WONTFIX | QA Contact: | Brock Organ <borgan> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 3.0 | CC: | rvokal, tao |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2007-10-19 18:42:33 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 190430, 200915, 426370, 426371 |
Description
Bastien Nocera
2006-07-04 14:56:56 UTC
Copying the question from #200915...

Can you please provide a specific test case? On RHEL4, neither downing the interface used for NFS nor stopping the NFS server causes init.d/halt to hang. Blindly ignoring NFS mounts in init.d/halt can actually prevent other local filesystems from unmounting, so I'd like to be sure the change actually fixes something before potentially breaking other setups. QE will also need a test case to ensure the bug is indeed fixed.

The customer was able to track down the problem on his system. It appears to have been caused by a daemon used by the customer, which tries to write some data to an NFS-mounted file when an attempt is made to shut it down. Pasting the complete comment passed on by the customer.

===============

I believe that I've managed to track down what was happening. We run our system configuration tool (cfengine) on bootup and shutdown. There was a bug in the part of the configuration that ensures the LSF daemons are running (these run out of an NFS-mounted directory). When the system would shut down, the LSF daemons would be stopped (as expected), but then the configuration stuff would run and mistakenly start them back up. Then when autofs tried to stop, it had no way of unmounting the LSF directories out from under them. And since the KILLALL portion of the shutdown script doesn't happen until after all filesystems have been unmounted, the system gets into a deadlock.

While this is certainly an example of a misconfiguration on my end, I still believe that RHEL should be smart enough to go ahead and shut down after some sort of timeout rather than hanging forever.

To reproduce:

1. Create an item in /etc/rc.d/rc0.d that brings up a daemon that holds an NFS file open at step K55.
2. Shut down or reboot the system.
3. The daemons started in K55 will cause autofs in K72 to fail to shut down (but it will not hang).
4. When S00killall is reached, signals are sent to the daemons, which refuse to shut down because they cannot write their current state to an NFS-mounted directory (that was successfully unmounted when autofs was shut down).
5. System hangs forever, as daemons refuse to die.

To clarify step 4 in my reproduction instructions, the daemons are "stupid": when they receive a KILL, TERM, or INT signal, they immediately try to open() an NFS-mounted file (one that still happens to be mounted because of the open file that the daemon created). Since the open() succeeds (as the path exists, according to the kernel), the daemon tries to write some data to the file. Since the network was stopped in K90, the daemons deadlock, because they are dumb and never time out on the write.

...and to further clarify step 5 in my reproduction instructions, the S01halt/S01reboot script has code in it that tries to unmount all filesystems before shutting down. Since there are "dumb" daemons on the system that have refused to die, even after S00killall executed, and are holding NFS filesystems open, the unmounts in the halt script are where the system actually hangs.

So it's like a cascading hang -- an NFS-mounted binary hangs because it managed to sneak an NFS mount past the 'autofs' shutdown process. That binary then causes S01reboot/S01halt to hang because it can't unmount all filesystems. Thus the S01reboot/S01halt script never actually reaches the reboot/poweroff command.

========

This event sent from IssueTracker by sprabhu
issue 95591

This bug is filed against RHEL 3, which is in maintenance phase. During the maintenance phase, only security errata and select mission critical bug fixes will be released for enterprise products. Since this bug does not meet those criteria, it is now being closed.

For more information on the RHEL errata support policy, please visit: http://www.redhat.com/security/updates/errata/

If you feel this bug is indeed mission critical, please contact your support representative. You may be asked to provide detailed information on how this bug is affecting you.
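
For illustration only: the customer's request above, that shutdown proceed "after some sort of timeout rather than hanging forever", could be sketched roughly as below. This is not the initscripts code (the halt script is shell), and the names `run_with_deadline`, `unmount_all`, and the `umount_cmd` parameter are hypothetical helpers made up for this sketch. It assumes a modern Python 3 and that giving up on a stuck unmount is acceptable, which, as noted earlier in this report, may not be safe for every setup.

```python
import subprocess


def run_with_deadline(cmd, timeout_s):
    """Run cmd and return its exit status, or None if the deadline passed.

    On timeout the child is sent SIGKILL, but we deliberately do not block
    waiting for it to exit: a process stuck in an uninterruptible NFS write
    may never go away, and waiting for it would simply recreate the hang.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # best effort only
        return None


def unmount_all(mount_points, timeout_s=30.0, umount_cmd=("umount",)):
    """Try to unmount each mount point; return those that failed or timed out.

    umount_cmd is a parameter so the sketch can be exercised without root
    privileges (for example by substituting a harmless command).
    """
    failed = []
    for mount_point in mount_points:
        status = run_with_deadline([*umount_cmd, mount_point], timeout_s)
        if status != 0:  # non-zero exit status, or None on timeout
            failed.append(mount_point)
    return failed
```

A caller could log whatever `unmount_all` returns and continue to the reboot/poweroff step instead of blocking on it. Passing `umount_cmd=("umount", "-l")` would request a lazy unmount instead; whether that is appropriate depends on the filesystems involved.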