Red Hat Bugzilla – Bug 655726
/etc/init.d/nfs doesn't create proper subsys lock file
Last modified: 2010-12-01 12:07:35 EST
Description of problem:
During the shutdown process it is impossible to unmount /var.
This happens with the following fstab:
/dev/sda3 / ext4 defaults,noatime 1 1
/dev/sda1 /boot ext4 defaults 1 2
/dev/sda7 /home ext4 defaults 1 2
/dev/sda2 /var ext4 defaults 1 2
/dev/sda6 /var/lib ext4 defaults 1 2
/dev/sda5 /var/log ext4 defaults 1 2
/dev/sda8 swap swap defaults 0 0
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
tmpfs /tmp tmpfs defaults 0 0
During shutdown the system can't unmount /var because it is busy: /var/lib and /var/log are still mounted.
Maybe the awk script in /etc/init.d/halt should be rewritten: it seems to simply try to unmount /dev/sda1, /dev/sda2, ... in sequence, ignoring that /dev/sda2 can't be unmounted until /dev/sda6 and /dev/sda5 are unmounted.
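The ordering idea behind this hypothesis can be sketched in shell. This is an illustration only, not the code in /etc/init.d/halt: the function name `umount_order` and its optional mounts-file argument are hypothetical. It assumes a /proc/mounts-style table (field 2 is the mount point) and relies on the fact that a path sorts before any path it is a prefix of, so a reverse byte-wise sort lists children such as /var/log and /var/lib before /var. (Later comments in this report found that halt already unmounted /var/lib and /var/log correctly.)

```sh
#!/bin/sh
# Hypothetical helper: print mount points deepest-first, so that
# nested mounts (/var/lib, /var/log) come before their parent (/var).
# $1: optional /proc/mounts-style table (field 2 = mount point).
umount_order() {
    # LC_ALL=C forces byte-wise comparison; a prefix always sorts
    # before its extensions, so reversing puts children first.
    awk '{ print $2 }' "${1:-/proc/mounts}" | LC_ALL=C sort -r
}

# Usage (only prints the order, does not unmount anything):
# umount_order /proc/mounts
```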
Version-Release number of selected component (if applicable):
Actual results:
The system doesn't unmount filesystems during shutdown.
Expected results:
The system unmounts all filesystems during shutdown.
Additional info:
The same fstab on Fedora 10 allows a regular shutdown that unmounts all filesystems.
I added some debug commands to /etc/init.d/functions that check whether /var/lib and /var/log are unmounted and, using lsof, whether anything is still using /var.
It seems that /var/lib and /var/log are correctly unmounted, so the awk script seems to work fine. However, lsof doesn't show anything still using /var, yet umount keeps saying it's busy.
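A debug check of this kind might look like the sketch below. The exact commands added to /etc/init.d/functions are not shown in the report, so the function names `is_mounted` and `debug_var_state` are hypothetical. Passing a mount point to lsof makes it list open files on that whole filesystem; like fuser's process list, it only reports userspace processes, which is consistent with it showing nothing here.

```sh
#!/bin/sh
# Hypothetical debug helpers, not part of /etc/init.d/functions.

# Succeeds if $1 appears as a mount point (field 2) in a
# /proc/mounts-style table ($2, default /proc/mounts).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit (found ? 0 : 1) }' "${2:-/proc/mounts}"
}

debug_var_state() {
    for mp in /var/lib /var/log; do
        if is_mounted "$mp"; then
            echo "$mp: still mounted"
        else
            echo "$mp: unmounted"
        fi
    done
    # With a mount point as argument, lsof lists the open files on that
    # filesystem; references held inside the kernel are not shown.
    if command -v lsof >/dev/null 2>&1; then
        lsof /var
    fi
}
```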
Mmm... I've just added the line:
fuser -mv /var >/root/out.txt
After the reboot, out.txt contains just "kernel".
After deeper investigation, it seems this only happens when something in /var is exported over NFS. All userspace programs are terminated, but knfsd still holds on to the exported paths even during shutdown.
So in the end it doesn't seem to be an initscript issue but a kernel issue.
For reference: kernel-126.96.36.199-48.fc14.i686
It is also reproducible after switching to runlevel 1.
After calling telinit 1, I had to stop cups, nfs, nfslock, rpcbind, rpcidmapd and rsyslog.
Then I unmounted /var/lib and /var/log.
fuser -mv /var shows PID:kernel ACCESS:mount
umount -v /var says /var is busy.
After waiting a few minutes I saw a message saying the last nfs server thread had exited; after that, umount /var worked fine.
lsof and fuser can't detect any process claiming access to any file in /var.
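The manual reproduction above can be collected into one script, sketched below under the assumption that it is run as root after `telinit 1` and that each service is stopped with `service NAME stop`. The `DRY_RUN` switch and the function names are hypothetical additions so the sequence can be reviewed without touching a live system.

```sh
#!/bin/sh
# Hypothetical wrapper: with DRY_RUN set, print commands instead of running them.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Sketch of the runlevel-1 reproduction described above.
reproduce_var_busy() {
    for svc in cups nfs nfslock rpcbind rpcidmapd rsyslog; do
        run service "$svc" stop
    done
    run umount /var/lib
    run umount /var/log
    run fuser -mv /var    # reported output: PID "kernel", ACCESS "mount"
    run umount -v /var || echo "/var is still busy"
}

# Usage: DRY_RUN=1 to review, then unset it to actually run the steps.
```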
Just to be sure, I repeated the test running exportfs -u -a before unmounting /var, with the same result.
In my exports I have /var/lib exported through NFS, yet /var/lib unmounted successfully. I can't explain why /var is still busy.
I extended the test by exporting other paths on other partitions.
The problem occurs on every partition that contains an NFS-exported path.
However, adding the command
exportfs -u -a
just before __umount_loop in /etc/init.d/halt allows a clean unmount of all partitions, with the sole exception of /var.
Maybe it is caused by something at the kernel nfsd level that holds a reference to something in /var/lib/nfs?
Looking closely at the shutdown procedure, it seems that the NFS service is never stopped before /etc/init.d/halt runs.
I added an explicit call to /etc/init.d/nfs stop as the first action in /etc/init.d/halt.
Now I can see the NFS service stopping, but the nfsd kernel server is still up when the umount loop begins.
I also tried adding
exportfs -u -a
rpc.nfsd -- 0
just before killproc nfsd -2 in /etc/init.d/nfs, with no effect.
So it seems there are two bugs:
- the shutdown procedure doesn't stop all the services before calling halt
- the nfsd kernel server doesn't stop.
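Since /var could be unmounted once the last nfs server thread exited, one workaround is to wait for the nfsd kernel threads to disappear before the unmount loop. The sketch below assumes the threads are visible to `pgrep` under the exact name `nfsd` and that `pgrep` is installed (if it is missing, the loop treats nfsd as already gone); the function name and the 60-second default are hypothetical. It only waits, it does not make nfsd exit any sooner.

```sh
#!/bin/sh
# Hypothetical helper: wait up to $1 seconds (default 60) for all
# nfsd kernel threads to exit. Returns 1 if they are still running.
wait_for_nfsd_exit() {
    remaining=${1:-60}
    while pgrep -x nfsd >/dev/null 2>&1; do
        if [ "$remaining" -le 0 ]; then
            echo "nfsd threads still running" >&2
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Usage, for example just before the umount loop in /etc/init.d/halt:
# wait_for_nfsd_exit 120
```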
OK, part 1: the NFS service is not stopped during the shutdown procedure:
for i in /etc/rc$runlevel.d/K* ; do
[ -f /var/lock/subsys/$subsys ] || [ -f /var/lock/subsys/$subsys.init ] || continue
So during shutdown the script finds /etc/rc0.d/K60nfs, sets subsys to nfs, and then checks for /var/lock/subsys/nfs.
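The check described above can be reproduced in isolation. This sketch mirrors the two lines quoted from the kill loop: it strips the directory and the `Knn` prefix to get the subsys name, then requires `$subsys` or `$subsys.init` to exist in the lock directory. The function names and the lock-directory argument are hypothetical, added so it can be tried against a scratch directory instead of /var/lock/subsys.

```sh
#!/bin/sh
# Hypothetical re-creation of the kill-loop check.

# /etc/rc0.d/K60nfs -> nfs
subsys_of() {
    name=${1##*/}        # drop the directory: K60nfs
    echo "${name#K??}"   # drop "K" plus the two-digit order: nfs
}

# Succeeds only if the rc loop would actually run "$1 stop".
# $2: lock directory (default /var/lock/subsys).
would_stop() {
    lockdir=${2:-/var/lock/subsys}
    subsys=$(subsys_of "$1")
    [ -f "$lockdir/$subsys" ] || [ -f "$lockdir/$subsys.init" ]
}
```

With no file named exactly `nfs` (or `nfs.init`) in the lock directory, `would_stop /etc/rc0.d/K60nfs` fails, so K60nfs is silently skipped at shutdown.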
Now examining /etc/init.d/nfs from nfs-utils-1.2.3-1.fc14.i686: the script creates ... but doesn't create /var/lock/subsys/nfs, so the nfs service can't be stopped at shutdown.
Just adding the following line in the start case:
touch /var/lock/subsys/nfs
allows the service to stop properly.
Then, in the stop case, add the corresponding line to remove the lock file.
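A minimal sketch of the fix described above, not the actual nfs-utils script: the start branch creates a lock file named after the init script (`nfs`) and the stop branch removes it, so the rc kill loop finds it at shutdown. The function name and the `LOCKDIR` override are hypothetical, and starting or stopping the NFS daemons themselves is omitted.

```sh
#!/bin/sh
# Hypothetical start/stop skeleton showing only the lock-file handling.
# LOCKDIR is an assumed override; the default is /var/lock/subsys.
nfs_lock_action() {
    lockfile=${LOCKDIR:-/var/lock/subsys}/nfs
    case "$1" in
        start)
            # (starting the NFS daemons is omitted here)
            touch "$lockfile"
            ;;
        stop)
            # (stopping the NFS daemons is omitted here)
            rm -f "$lockfile"
            ;;
        *)
            echo "usage: $0 {start|stop}" >&2
            return 2
            ;;
    esac
}
```

Using `rm -f` keeps stop idempotent: stopping an already stopped service does not fail on a missing lock file.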
I hope this will be fixed as soon as possible.
Verifying the patch from comment #8: the shutdown sequence is now properly restored, with the nfs service stopped at step 60 and rpc at step 87. /var is no longer busy and can be unmounted properly.
In the end this is only an nfs-utils bug that could be fixed in a few minutes.
Please push the update as soon as possible.
Any additional info needed?
Added the Regression keyword, since this bug is not present in, for example, Fedora 10.
*** Bug 656003 has been marked as a duplicate of this bug. ***
I see this on my F14 machines as well, and a workmate told me about similar problems on his F13 box. It looks like the hard-coded /var/lock/subsys/XXX handling and generic code that makes assumptions about the names do not play well together. The init scripts probably should not manage these lock files directly at all, since that seems error-prone; it would be better to abstract this into functions.
I'm not sure this bug is correctly assigned. Could someone assign it to the correct person?
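One way the suggested abstraction could look is sketched below; the helper names `subsys_name`, `subsys_lock` and `subsys_unlock` and the `LOCKDIR` override are hypothetical, not existing initscripts functions. The lock name is derived from the script's own path, stripping an rc `Snn`/`Knn` prefix, so it always matches the name the rc kill loop computes instead of relying on a hard-coded string.

```sh
#!/bin/sh
# Hypothetical helpers: derive the lock name from the script path.
# /etc/init.d/nfs, /etc/rc3.d/S30nfs and /etc/rc0.d/K60nfs all map to "nfs".
subsys_name() {
    name=${1##*/}
    case "$name" in
        [SK][0-9][0-9]?*) name=${name#???} ;;   # strip "Snn" / "Knn"
    esac
    echo "$name"
}

subsys_lock()   { touch "${LOCKDIR:-/var/lock/subsys}/$(subsys_name "$1")"; }
subsys_unlock() { rm -f "${LOCKDIR:-/var/lock/subsys}/$(subsys_name "$1")"; }

# Usage inside an init script:
#   start) ...; subsys_lock "$0" ;;
#   stop)  ...; subsys_unlock "$0" ;;
```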
*** This bug has been marked as a duplicate of bug 652786 ***