Bug 454286 - problems bringing up lockd after it has been taken down
Summary: problems bringing up lockd after it has been taken down
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jeff Layton
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-07-07 14:52 UTC by Jeff Layton
Modified: 2009-05-20 19:29 UTC
CC: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-09-29 12:02:02 UTC
Target Upstream Version:
Embargoed:



Description Jeff Layton 2008-07-07 14:52:16 UTC
Do this on a 2.6.18-92.1.6.el5debug kernel after a fresh reboot:

1) mount a tcp NFSv3 filesystem
2) unmount it
3) service nfs start
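
For concreteness, the steps amount to something like this (hypothetical server
and export names; any TCP NFSv3 mount should do):

# 1) mount a TCP NFSv3 filesystem (server:/export is a placeholder)
mount -t nfs -o vers=3,proto=tcp server:/export /mnt
# 2) unmount it
umount /mnt
# 3) start the NFS server
service nfs start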

...nfsd will fail to start because lockd_up fails. From dmesg:

FS-Cache: Loaded
FS-Cache: netfs 'nfs' registered for caching
SELinux: initialized (dev 0:17, type nfs), uses genfs_contexts
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
SELinux: initialized (dev nfsd, type nfsd), uses genfs_contexts
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
lockd_up: makesock failed, error=-98
lockd_down: no lockd running.
nfsd: last server has exited
nfsd: unexporting all filesystems

...then if you do a "service nfs restart":

lockd_up: no pid, 2 users??
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
nfsd: last server has exited
nfsd: unexporting all filesystems

...so I think we have a couple of bugs here. Something is causing the makesock
call to fail (error -98 is -EADDRINUSE, i.e. the address lockd wants is already
in use), and when that happens, lockd_up isn't handling the error condition
appropriately, which throws off the nlmsvc_users counter.
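
As a first diagnostic pass, something like the following can show whether a
stale nlockmgr registration is lingering or, when lockd's port has been pinned,
what currently holds it (a sketch; assumes the lockd module is loaded so the
fs.nfs sysctls exist):

# is a stale nlockmgr registration still known to the portmapper?
rpcinfo -p | grep nlockmgr
# if lockd's TCP port is pinned (nonzero), see what currently holds it
port=$(cat /proc/sys/fs/nfs/nlm_tcpport)
[ "$port" -ne 0 ] && netstat -tlnp | grep ":$port "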

I suspect this is a regression from 5.1, but I need to confirm it.

Comment 1 Jeff Layton 2008-07-07 16:27:14 UTC
Actually, this doesn't appear to be a regression. When I do the same test on
-8.el5, I get these messages:

lockd_up: makesock failed, error=-98
lockd_up: no pid, 2 users??
lockd_up: no pid, 3 users??
lockd_up: no pid, 4 users??
lockd_up: no pid, 5 users??
lockd_up: no pid, 6 users??
lockd_up: no pid, 7 users??
lockd_up: no pid, 8 users??


...and lockd isn't started; each failed lockd_up call apparently leaves
nlmsvc_users incremented, hence the escalating counts. Since no one has
complained about this, I'll put this on 5.4 proposed for now. If the fix turns
out to be simple, I may move it to 5.3...



Comment 2 Jeff Layton 2008-07-08 15:46:03 UTC
This problem has strangely "fixed itself". Yesterday, I could reliably reproduce
this. Today, I can't make it happen.

The host where I saw this was a RHEL5 FV (fully virtualized) Xen guest. It
looked like the power blinked at the office and the Xen dom0 rebooted. I
brought my RHEL5 image back up, and now this isn't happening anymore. It seems
unlikely, but maybe this has something to do with being a guest on a
long-running dom0?

I'll leave this open for now in case it happens again...


Comment 3 Jeff Layton 2008-09-29 12:01:44 UTC
Closing this out. I've not seen this problem since, though it still worries me that I saw it at all. I'll reopen it if it returns.

Comment 4 Ram Kesavan 2009-05-20 19:29:53 UTC
I am not sure if this is important, but you will get this error if the
portmapper is not running. Start the portmapper (the init script is
/etc/init.d/portmap), then try the mount again and it will work properly.
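
On RHEL 5 that amounts to roughly the following (a sketch; the stock init
script is /etc/init.d/portmap):

# make sure the portmapper is running first
service portmap status || service portmap start
# then retry the mount, or restart nfs for the failure in this bug
service nfs restart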

