Bug 655726

Summary: /etc/init.d/nfs doesn't create proper subsys lock file
Product: [Fedora] Fedora
Reporter: Sandro Bonazzola <sandro.bonazzola>
Component: nfs-utils
Assignee: Steve Dickson <steved>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Docs Contact:
Priority: low
Version: 14
CC: dougsland, gansalmon, iarlyy, itamar, jlayton, jonathan, kernel-maint, madhu.chinakonda, notting, plautrba, rwahl, steved
Target Milestone: ---
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-12-01 17:07:35 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Sandro Bonazzola 2010-11-22 10:02:52 UTC
Description of problem:
During the shutdown process it is impossible to unmount /var.
This happens with the following fstab:
/dev/sda3		/                       ext4    defaults,noatime	1 1
/dev/sda1		/boot                   ext4    defaults        1 2
/dev/sda7		/home                   ext4    defaults        1 2
/dev/sda2		/var                    ext4    defaults        1 2
/dev/sda6		/var/lib                ext4    defaults        1 2
/dev/sda5		/var/log                ext4    defaults        1 2
/dev/sda8		swap                    swap    defaults        0 0
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts			/dev/pts		devpts	gid=5,mode=620	0 0
tmpfs			/tmp			tmpfs	defaults	0 0

During shutdown the system can't umount /var: it is reported as busy, apparently because /var/lib and /var/log are still mounted.

Maybe the awk script in /etc/init.d/halt should be rewritten: it seems that the script just tries to unmount /dev/sda1, /dev/sda2, ... sequentially, ignoring that /dev/sda2 can't be unmounted until /dev/sda6 and /dev/sda5 are unmounted.
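The ordering constraint described here can be illustrated with a small sketch. This is not the logic of /etc/init.d/halt; umount_order is a hypothetical helper that only shows one way to make nested mount points come before their parents, by sorting the paths in reverse byte order.

```shell
#!/bin/sh
# Hypothetical helper, not part of initscripts: order mount points so
# that nested ones come first. A parent path is always a prefix of its
# children, so a reverse byte-wise sort lists /var/lib and /var/log
# before /var.
umount_order() {
    LC_ALL=C sort -r
}

# Mount points taken from the fstab above (swap and pseudo filesystems omitted).
printf '%s\n' / /boot /home /var /var/lib /var/log | umount_order
```

Each path printed could then be passed to umount in turn; the sketch stops at printing so that it can be run without root.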


Version-Release number of selected component (if applicable):
initscripts-9.20.1-1.fc14.i686

  
Actual results:
The system doesn't umount filesystems during shutdown

Expected results:
The system unmounts all filesystems during shutdown

Additional info:
The same fstab on Fedora 10 allows a regular shutdown, unmounting all filesystems.

Comment 1 Sandro Bonazzola 2010-11-22 10:50:27 UTC
I've added some debug commands to /etc/init.d/functions to check whether /var/lib and /var/log are unmounted and, using lsof, whether something is still using /var.

It seems that /var/lib and /var/log are correctly unmounted, so the awk script seems to work fine. However, lsof doesn't show anything still using /var, yet umount keeps saying it's busy.

Comment 2 Sandro Bonazzola 2010-11-22 11:17:57 UTC
Mmm... I've just added the line:
fuser -mv /var >/root/out.txt

and after the reboot in out.txt there's just "kernel".

Comment 3 Sandro Bonazzola 2010-11-22 14:05:28 UTC
After a deeper investigation it seems that this only happens if something in /var is exported over NFS. The user space programs are all terminated, but knfsd still tries to mount the exported paths even during shutdown.
So in the end it doesn't seem to be an initscripts issue but a kernel issue.

Comment 4 Sandro Bonazzola 2010-11-22 14:07:08 UTC
for reference, kernel-2.6.35.6-48.fc14.i686

Comment 5 Sandro Bonazzola 2010-11-22 15:42:45 UTC
Also reproducible by switching to init 1.
After calling telinit 1, I had to stop cups, nfs, nfslock, rpcbind, rpcidmapd, rsyslog.
I then unmounted /var/lib and /var/log.
fuser -mv /var shows PID:kernel ACCESS:mount
umount -v /var says /var is busy.
After waiting a few minutes I saw a message saying the last nfs server thread had exited, and then umount /var worked fine.

lsof and fuser can't detect any process that claims access to any file in /var.

Just to be sure, I tried repeating the test, calling exportfs -u -a before unmounting /var, with the same result.

My exports include /var/lib, shared through NFS, yet /var/lib unmounted successfully. I can't explain why /var is still busy.

Comment 6 Sandro Bonazzola 2010-11-24 09:37:54 UTC
I've extended the test by exporting other paths on other partitions.
The problem exists on every single partition that contains an NFS exported path.
However, adding the command:
exportfs -u -a
just before __umount_loop in /etc/init.d/halt allows a clean umount of all the partitions, with the sole exception of /var.

Maybe it is caused by something at the kernel nfsd level that holds a reference to something in /var/lib/nfs?

Comment 7 Sandro Bonazzola 2010-11-24 11:18:21 UTC
Looking carefully at the shutdown procedure, it seems that the NFS service is never stopped before /etc/init.d/halt is executed.
I have added an explicit call to /etc/init.d/nfs stop as the first action in /etc/init.d/halt.
Now I can see the NFS service stopping, but the nfsd kernel server is still up when the umount loop begins.

nfs-utils-1.2.3-1.fc14.i686
initscripts-9.20.1-1.fc14.i686

I've also tried adding
exportfs -u -a
rpc.nfsd -- 0

just before killproc nfsd -2 in /etc/init.d/nfs
with no effect.

So it seems there are 2 bugs:
 - the shutdown procedure doesn't stop all the services before calling halt
 - the nfsd kernel server doesn't stop.

Comment 8 Sandro Bonazzola 2010-11-24 13:17:38 UTC
Ok, part 1: NFS service is not stopped during shutdown procedure:
initscripts-9.20.1-1.fc14.i686
file /etc/rc:

for i in /etc/rc$runlevel.d/K* ; do
  subsys=${i#/etc/rc$runlevel.d/K??}
  [ -f /var/lock/subsys/$subsys ] || [ -f /var/lock/subsys/$subsys.init ] || continue
 ...


So during shutdown the script finds /etc/rc0.d/K60nfs, sets subsys to nfs,
and then checks for /var/lock/subsys/nfs.
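The check can be reproduced outside the boot scripts with a small sketch. It reuses the parameter expansion and the lock file test quoted from /etc/rc; LOCKDIR and should_stop are hypothetical names, and LOCKDIR is a temporary directory standing in for /var/lock/subsys so that the example runs without root. The two lock files it creates are the ones the rest of this comment attributes to /etc/init.d/nfs.

```shell
#!/bin/sh
# Sketch of the /etc/rc decision for one K* link.
# LOCKDIR is a hypothetical stand-in for /var/lock/subsys.
LOCKDIR=$(mktemp -d)
runlevel=0

should_stop() {
    i=$1
    # Strip the directory, the leading K and the two-character
    # priority: /etc/rc0.d/K60nfs -> nfs
    subsys=${i#/etc/rc$runlevel.d/K??}
    # Same test as /etc/rc: a missing lock file means "skip this service".
    [ -f "$LOCKDIR/$subsys" ] || [ -f "$LOCKDIR/$subsys.init" ]
}

# Lock files reported for /etc/init.d/nfs in nfs-utils-1.2.3-1.fc14
touch "$LOCKDIR/rpc.mountd" "$LOCKDIR/nfsd"

if should_stop /etc/rc0.d/K60nfs; then
    echo "nfs: stop script would run"
else
    echo "nfs: skipped, no lock file named nfs"
fi
```

With only rpc.mountd and nfsd present the service is skipped; creating a file named nfs in LOCKDIR makes should_stop succeed.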

Now examining /etc/init.d/nfs from nfs-utils-1.2.3-1.fc14.i686: the script creates
/var/lock/subsys/rpc.mountd
/var/lock/subsys/nfsd

but doesn't create /var/lock/subsys/nfs, so the nfs service is never stopped.
Adding the following line to the start case:
touch /var/lock/subsys/nfs
allows the service to be stopped properly.

Then add to the stop case:
rm /var/lock/subsys/nfs

to remove the lock file.
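Put together, the proposed change amounts to the pattern sketched below. This is not the real /etc/init.d/nfs, which does much more in its start and stop cases; nfs_start, nfs_stop and LOCKDIR are hypothetical names, and LOCKDIR defaults to a temporary directory instead of /var/lock/subsys so the sketch can be tried safely. It uses rm -f rather than plain rm so that stopping an already stopped service does not fail.

```shell
#!/bin/sh
# Sketch only: the lock file must carry the same name as the init
# script (nfs), because /etc/rc looks for /var/lock/subsys/<script name>.
LOCKDIR=${LOCKDIR:-$(mktemp -d)}   # hypothetical stand-in for /var/lock/subsys

nfs_start() {
    # ... the daemons would be started here ...
    touch "$LOCKDIR/nfs"
}

nfs_stop() {
    # ... the daemons would be stopped here ...
    rm -f "$LOCKDIR/nfs"
}

case "${1:-}" in
    start) nfs_start ;;
    stop)  nfs_stop ;;
esac
```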

I hope this will be fixed as soon as possible.

Comment 9 Sandro Bonazzola 2010-11-24 13:27:17 UTC
Verifying the patch from comment #8: the shutdown sequence is now properly restored, with the nfs service stopped at step 60 and rpc at step 87. /var is no longer busy and can be unmounted properly.
In the end this is only an nfs-utils bug that could be fixed in a few minutes.
Please push the update as soon as possible.

Comment 10 Sandro Bonazzola 2010-11-25 07:05:44 UTC
Any additional info needed?

Comment 11 Sandro Bonazzola 2010-11-26 08:59:32 UTC
Added keyword Regression since this bug is not present for example in Fedora 10.

Comment 12 Ronald Wahl 2010-11-27 23:27:27 UTC
*** Bug 656003 has been marked as a duplicate of this bug. ***

Comment 13 Ronald Wahl 2010-11-27 23:35:00 UTC
I see this on my F14 machines as well. A workmate told me of similar problems with his F13 box. It looks like this hard-coded /var/lock/subsys/XXX stuff and some generic code that makes assumptions about the names do not play well together. Probably the init scripts should not deal with these lock files directly at all, as it seems to be error-prone, and should abstract this into functions.

Comment 14 Ronald Wahl 2010-11-27 23:38:46 UTC
I'm not sure if this bug is correctly assigned. Can anyone assign this to the correct person?

Comment 15 Steve Dickson 2010-12-01 17:07:35 UTC

*** This bug has been marked as a duplicate of bug 652786 ***