Red Hat Bugzilla – Bug 100537
loopback-mounted NFS hangs shutdown
Last modified: 2014-03-16 22:37:33 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030703
Description of problem:
Since NFS servers are stopped before NFS filesystems are unmounted, a
loopback-mounted NFS filesystem, such as that created by amd when referencing
/net/localhost, causes shutdown to hang indefinitely, while umount waits for the
server to come back. It obviously never will before shutdown completes => deadlock.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Export some local filesystem.
2. Start nfs and amd.
3. Access /net/localhost to get it mounted.
4. Shut down or reboot.
Actual Results: The NFS server stops before amd and before netfs, so the
loopback mount can't be unmounted, and shutdown hangs indefinitely.
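
A minimal sketch of the ordering problem described above, assuming SysV-style
K links that rc runs in lexical order of their names. The link names and
priorities in the usage note are illustrative assumptions, not values taken
from this report, and both helper names are hypothetical.

```python
def stop_order(kill_links):
    """Return service names in the order rc would stop them.

    Assumes each link is named 'K' + two-digit priority + service name,
    and that rc walks the links in lexical order (lower number first).
    """
    return [name[3:] for name in sorted(kill_links)]


def stopped_after_server(kill_links, server, dependents):
    """List the dependents that are still running when the server stops."""
    order = stop_order(kill_links)
    server_pos = order.index(server)
    return [d for d in dependents if order.index(d) > server_pos]
```

With assumed links K20nfs, K28amd and K75netfs, calling
stopped_after_server(links, "nfs", ["amd", "netfs"]) reports both amd and
netfs, which is the situation that leaves the loopback mount without a server.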
Expected Results: It should give up at some point. The NFS server should
probably be stopped very late in shutdown, so that even when two cross-mounted
NFS servers shut down at the same time, both of them can succeed.
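
One way "give up at some point" could look, as a hedged sketch rather than
what initscripts actually does: bound each umount attempt with a timeout and
fall back to umount -f, then umount -l. The function names, the 30-second
default and the fallback order are assumptions made for illustration; whether
a forced or lazy unmount would avoid this particular hang is not confirmed by
this report.

```python
import subprocess


def try_with_timeout(cmd, timeout_s):
    """Run cmd; return True only if it exits 0 within timeout_s seconds."""
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # Request a kill but do not wait for the child: a process blocked on
        # a dead NFS server may not exit until the server comes back.
        proc.kill()
        return False


def unmount_with_fallback(mountpoint, timeout_s=30.0, run=try_with_timeout):
    """Try plain, forced, then lazy umount.

    Returns the list of flags that worked, or None if every attempt failed
    or timed out. The run parameter exists so the policy can be exercised
    without touching real mounts.
    """
    for flags in ([], ["-f"], ["-l"]):
        if run(["umount", *flags, mountpoint], timeout_s):
            return flags
    return None
```

The point of the design is that no single step can block shutdown forever:
each attempt either succeeds or is abandoned after timeout_s seconds.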
Getting a process to keep an open file in /net/localhost could be used as a
denial of service attack: if the server is told to reboot or shut down
immediately, it won't, and manual intervention is required to power it down.
That may cause a lot of inconvenience or even loss of data (consider a RAID 5
system that has to be powered down, where a disk is lost between the update of
a block and the update of its parity block).
The patch in bug 63602 could help solve this problem, even though it's not the
ideal solution, as we'd better umount the filesystem before the server goes
down, otherwise we might lose data.
Fixed in Fedora Core test3. Even if I start a screen session, cd to
/net/localhost/<dir> in it and disconnect, then request a reboot, the machine
comes down, even though there are RPC sendmsg errors logged to the console just
before the machine goes down. This is probably as good as it gets.
Whatever fix it was, it didn't make it to RHEL 3 :-(
There actually aren't any changes in that area between Taroon and Cambridge.
I noticed there hadn't been changes to initscripts, so fuser was my prime
suspect for having fixed it. Now I see fuser is unchanged too, yet the problem
is definitely gone. Something must have fixed it, even if it's just that the
kernel is responding differently to accesses to broken NFS mounts.
Are you still seeing this? You may also want to see bug 138788.
Not on Fedora devel, no. The problem is gone there. Dunno about RHEL3.
Closing as DEFERRED for a later RHEL release, then.