Red Hat Bugzilla – Bug 63602
NFS client won't shut down if server is down
Last modified: 2007-04-18 12:42:02 EDT
Description of problem:
I shut down an internal NFS server, and then shut down its client. The client
was left 4 hours trying to shut down, but it wouldn't. I had to power it off.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Automount (with amd) a filesystem from a remote NFS server.
2. Shut the server down.
3. While the server is down, shut the client down.
Actual Results: the client's /var/log/messages says:
Apr 15 06:53:53 free ntpd: ntpd shutdown succeeded
Apr 15 06:53:53 free umount: umount: can't get address for libero
Apr 15 06:53:53 free umount: umount2: Device or resource busy
Apr 15 06:53:53 free umount: umount: /.automount/libero/root/l: device is busy
Apr 15 06:53:53 free umount: Cannot MOUNTPROG RPC: RPC: Program not registered
Apr 15 06:53:54 free netfs: Unmounting NFS filesystems: failed
Apr 15 06:54:06 free kernel: nfs: server libero not responding, still trying
When I woke up, I noticed the machine had not shut down, and rebooted it:
Apr 15 10:51:45 free shutdown: shutting down for system reboot
Apr 15 10:51:45 free init: Switching to runlevel: 6
Apr 15 10:51:47 free umount: umount: can't get address for libero
Apr 15 10:51:47 free umount: umount2: Device or resource busy
Apr 15 10:51:47 free umount: umount: /.automount/libero/root/l: device is busy
Apr 15 10:51:47 free netfs: Unmounting NFS filesystems: failed
Apr 15 10:52:00 free kernel: nfs: server libero not responding, still trying
Apr 15 10:52:22 free shutdown: shutting down for system reboot
and it remained like that for a few more minutes. I gave up, powered the
machine off and went back to bed for a while longer :-)
Expected Results: I'd expected the shutdown to time out and give up on waiting
for the server to come back.
AFAIK, there were no pending writes to the NFS server that might have caused the
kernel to play safe and not reboot. In any case, it would still be nice to have
some way to tell it ``really shut down, the server is not coming back.'' In
general, once you get to that point, you can no longer get a shell or log in
remotely, which makes this tricky.
I don't know whether this makes any difference, but at the time the client was
going down, the only DNS server configured to resolve names for it (127.0.0.1)
had already gone down.
*** Bug 69802 has been marked as a duplicate of this bug. ***
Created attachment 90275
Patch to /etc/init.d/netfs that fixes the problem
This patch seems to fix the problem for me. It pretty much waits for fuser to
complete, but if fuser remains blocked in disk wait for about 5 seconds, it
gives up on waiting for it to complete.
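For readers without the attachment, the general idea of waiting for a command with a deadline can be sketched as below. This is not the code from the attached patch: the function name wait_with_timeout and the exit status 124 on timeout are assumptions made for illustration. The command runs in the background and is polled once a second; when the limit is reached, the caller simply stops waiting.

```shell
#!/bin/sh
# Hypothetical helper, NOT the code from attachment 90275.
# Usage: wait_with_timeout SECONDS COMMAND [ARGS...]
# Returns the command's exit status, or 124 if the deadline passed first.
wait_with_timeout() {
    wwt_limit=$1
    shift
    "$@" &
    wwt_pid=$!
    wwt_waited=0
    # kill -0 delivers no signal; it only tests whether the process exists.
    while kill -0 "$wwt_pid" 2>/dev/null; do
        if [ "$wwt_waited" -ge "$wwt_limit" ]; then
            # A process blocked in uninterruptible disk wait will not react
            # to this signal, so the kill is best effort. What matters is
            # that we return instead of waiting any longer.
            kill -9 "$wwt_pid" 2>/dev/null
            return 124
        fi
        sleep 1
        wwt_waited=$((wwt_waited + 1))
    done
    # The command finished in time; collect and return its exit status.
    wait "$wwt_pid"
}
```

In an init script this might be called as `wait_with_timeout 5 fuser -k -m "$mountpoint"`, where `$mountpoint` is a placeholder, so that shutdown continues even if fuser never returns.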
*** Bug 82795 has been marked as a duplicate of this bug. ***
Correct me if I am wrong, but isn't this part of the point of hard
mounting over soft mounting?
The purpose of hard mounting is that the file system stays up
indefinitely, waiting for the server to come back online.
If the server has disappeared for whatever reason, then it is a
"serious" situation. Do we want to alter the scripts to just drop this
Would it not be better, if this problem is frequent, for users to
soft-mount these file systems, so that the kernel can receive the failure
messages and give up on the mount point?
Just my two pence worth, but I am a mere amateur :)
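For reference, a soft mount as suggested above might look like the following /etc/fstab entry. This is only a sketch: the export path and mount point are made up, the timeout values are arbitrary, and the reporter's setup used amd rather than fstab. The options soft, timeo and retrans are standard NFS mount options.

```
# Hypothetical export path and mount point; "libero" is the server from the report.
# soft     : return an error to the application once the retries are used up
# timeo=30 : wait 3 seconds (units are tenths of a second) before the first retry
# retrans=3: number of retries before the soft mount reports a failure
libero:/export  /mnt/libero  nfs  soft,timeo=30,retrans=3  0 0
```

The trade-off is that a soft mount can lose or corrupt data when a write times out, which is why hard mounts are the usual default.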