Bug 197605

Summary:	Initscripts should not try to umount network mounts at the end
Product:	Red Hat Enterprise Linux 3	Reporter:	Bastien Nocera <bnocera>
Component:	initscripts	Assignee:	Bill Nottingham <notting>
Status:	CLOSED WONTFIX	QA Contact:	Brock Organ <borgan>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	3.0	CC:	rvokal, tao
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2007-10-19 18:42:33 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	190430, 200915, 426370, 426371

Description Bastien Nocera 2006-07-04 14:56:56 UTC

initscripts-7.31.30.EL-1

1. Launch a reboot on a running machine
2. Find some way to make sure the network is brought down before an
NFS/autofs/other network filesystem can be umounted
3. See that the last umount -a doesn't ignore network filesystems, and sits
there without finishing the reboot

/etc/init.d/halt could ignore autofs/nfs/etc. filesystems instead of stalling
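
One way to read this suggestion is as a filter over the mount table before the final umount pass. The following is a minimal sketch, not the actual /etc/init.d/halt code: the function name `local_mount_points`, the `NETWORK_FS_TYPES` list, and the optional file argument are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not taken from initscripts: print the mount points
# from a /proc/mounts-style table whose filesystem type is not a network
# type. The type list is an assumption and would need local adjustment.
NETWORK_FS_TYPES="nfs nfs4 autofs smbfs cifs ncpfs"

local_mount_points() {
    mounts_file=${1:-/proc/mounts}
    awk -v skip="$NETWORK_FS_TYPES" '
        BEGIN { n = split(skip, t, " "); for (i = 1; i <= n; i++) net[t[i]] = 1 }
        # Fields: device, mount point, fstype, options, dump, pass
        !($3 in net) { print $2 }
    ' "$mounts_file"
}
```

A halt script could then restrict its last unmount attempt to the mount points this prints and leave the network ones alone.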

Comment 4 Miloslav Trmač 2007-01-15 09:16:52 UTC

Copying the question from #200915...

Can you please provide a specific test case?  On RHEL4, neither downing the
interface used for NFS nor stopping the NFS server causes init.d/halt to hang.

Blindly ignoring NFS mounts in init.d/halt can actually prevent other local
filesystems from unmounting, so I'd like to be sure the change actually fixes
something before potentially breaking other setups.  QE will also need a test
case to ensure the bug is indeed fixed.
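
One situation where skipping NFS could block a local unmount is, presumably, a network filesystem mounted beneath a local one: the local filesystem stays busy while the nested mount remains. The sketch below lists such local mount points. It is only an illustration; `blocked_local_mounts`, the type list, and the file argument are hypothetical and do not come from initscripts.

```shell
#!/bin/sh
# Hypothetical helper: list local mount points that have a network
# filesystem mounted somewhere beneath them. The type list is an assumption.
NETWORK_FS_TYPES="nfs nfs4 autofs smbfs cifs ncpfs"

blocked_local_mounts() {
    mounts_file=${1:-/proc/mounts}
    awk -v skip="$NETWORK_FS_TYPES" '
        BEGIN { n = split(skip, t, " "); for (i = 1; i <= n; i++) net[t[i]] = 1 }
        ($3 in net) { netmp[++nn] = $2; next }
        { localmp[++nl] = $2 }
        END {
            for (i = 1; i <= nl; i++) {
                # Root is skipped here because it cannot be unmounted anyway.
                if (localmp[i] == "/") continue
                prefix = localmp[i] "/"
                for (j = 1; j <= nn; j++)
                    if (index(netmp[j], prefix) == 1) { print localmp[i]; break }
            }
        }
    ' "$mounts_file"
}
```

An empty result would suggest that, for nesting at least, ignoring the network mounts does not stand in the way of the local unmounts.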

Comment 22 Issue Tracker 2007-08-21 10:02:23 UTC

The customer was able to track down the problem on his system. It appears
to have been caused by a daemon run by the customer that tries to write
data to an NFS-mounted file when an attempt is made to shut it down.

Pasting the complete comment passed by the customer.

===============
I believe that I've managed to track down what was happening.

We run our system configuration tool (cfengine) on bootup and shutdown. 
There was a bug in the configuration portion that ensures that the LSF
daemons were running (these run out of an NFS-mounted directory).  When
the system would shut down, the LSF daemons would be stopped (as
expected), but then the configuration stuff would run and mistakenly start
them back up.  Then when autofs tried to stop, it had no way of unmounting
the LSF directories out from under them.  And since the KILLALL portion of
the shutdown script doesn't happen until after all filesystems have been
unmounted, the system gets into a deadlock.

While this is certainly an example of a misconfiguration on my end, I
still believe that RHEL should be smart enough to go ahead and shut down
after some sort of timeout rather than hanging forever.  To reproduce:

1. Create an item in /etc/rc.d/rc0.d that brings up a daemon that holds an
NFS file open at step K55.
2. Shut down or reboot the system.
3. The daemons started in K55 will cause autofs in K72 to fail to shut down
(but it will not hang).
4. When S00killall is reached, signals are sent to the daemons, which
refuse to shut down because they cannot write their current state to an
NFS mounted directory (that was successfully unmounted when autofs was
shut down)
5. System hangs forever, as daemons refuse to die.

To clarify step 4 in my reproduction instructions, the daemons are
"stupid" and when they receive a KILL or TERM or INT signal, they
immediately try to open() an NFS-mounted file (one that still happens to
be mounted because of the open file that the daemon created).  Since the
open() succeeds (as the path exists, according to the kernel), it tries to
write some data to the file.  Since the network was stopped in K90, the
daemons deadlock since they are dumb and never time out on the write.

...and to further clarify step 5 in my reproduction instructions, the
S01halt/S01reboot script has code in it that tries to unmount all
filesystems before shutting down.  Since there are "dumb" daemons on the
system that have refused to die, even after S00killall executed, and are
holding NFS filesystems open, the unmounts in the halt script are where
the actual system hangs up.
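
To see which processes are pinning a mount in a situation like this, the open file descriptors under /proc can be scanned. The following is a rough sketch only: `holders_of` is a hypothetical name, it assumes a Linux /proc layout, and it looks at open descriptors alone, so a process that merely runs a binary from the directory or has it as its working directory would be missed.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs of processes that hold a file open
# somewhere under the given directory. Only /proc/<pid>/fd is inspected;
# executables, working directories, and memory maps are not checked.
holders_of() {
    dir=${1%/}
    for fd in /proc/[0-9]*/fd/*; do
        # Descriptors of other users' processes may be unreadable; skip them.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            "$dir"/*)
                pid=${fd#/proc/}
                echo "${pid%%/*}"
                ;;
        esac
    done | sort -un
}
```

Run as root against the stuck mount point, this would show which daemons still need to go away before the unmount can succeed.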

So it's like a cascading hang -- an NFS-mounted binary hangs because it
managed to sneak an NFS mount past the 'autofs' shutdown process.  That
binary then causes S01reboot/S01halt to hang because it can't unmount all
filesystems.  Thus S01reboot/S01halt script never actually reaches the
reboot/poweroff command.
========
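
The timeout the customer asks for could look roughly like the following. This is a minimal sketch under stated assumptions, not code from initscripts: `run_with_timeout` and its exit code 124 are hypothetical choices, and a process blocked in uninterruptible I/O on an unreachable NFS server may not react even to SIGKILL, which is why the helper gives up instead of waiting for the command to die.

```shell
#!/bin/sh
# Hypothetical helper: run a command, but stop waiting for it after a
# number of seconds. Returns the command's exit status, or 124 (an
# arbitrary choice for this sketch) if the limit was reached.
run_with_timeout() {
    limit=$1
    shift
    "$@" &
    cmd_pid=$!
    elapsed=0
    # Poll once per second while the command is still running.
    while kill -0 "$cmd_pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$limit" ]; then
            # Try to kill it, then move on without waiting for it to exit.
            kill -KILL "$cmd_pid" 2>/dev/null
            return 124
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    wait "$cmd_pid"
}

# Possible use in a halt-like script (the 30-second limit is illustrative):
#   run_with_timeout 30 umount -a || echo "umount did not finish, continuing"
```

With a wrapper like this, the script would reach the reboot/poweroff command even when an unmount never returns, at the cost of possibly leaving filesystems mounted.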




This event sent from IssueTracker by sprabhu 
 issue 95591

Comment 23 RHEL Program Management 2007-10-19 18:42:33 UTC

This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.