Bug 454286 - problems bringing up lockd after it has been taken down
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Jeff Layton
QA Contact: Martin Jenner
Depends On:
Blocks:
Reported: 2008-07-07 10:52 EDT by Jeff Layton
Modified: 2009-05-20 15:29 EDT
CC List: 3 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-09-29 08:02:02 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Jeff Layton 2008-07-07 10:52:16 EDT
Do this on a 2.6.18-92.1.6.el5debug kernel after a fresh reboot:

1) mount a tcp NFSv3 filesystem
2) unmount it
3) service nfs start

...nfsd will fail to start because lockd_up fails. From dmesg:

FS-Cache: Loaded
FS-Cache: netfs 'nfs' registered for caching
SELinux: initialized (dev 0:17, type nfs), uses genfs_contexts
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
SELinux: initialized (dev nfsd, type nfsd), uses genfs_contexts
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
lockd_up: makesock failed, error=-98
lockd_down: no lockd running.
nfsd: last server has exited
nfsd: unexporting all filesystems

...then if you do a "service nfs restart":

lockd_up: no pid, 2 users??
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
nfsd: last server has exited
nfsd: unexporting all filesystems

...so I think we have a couple of bugs here. Something is causing the makesock
call to fail, and when that happens, lockd_up isn't handling the error condition
appropriately, which throws off the nlmsvc_users counter.

I suspect this is a regression from 5.1, but I need to confirm it.
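To make the suspected accounting problem concrete, here's a small userspace model of the pattern I think is in play. It's illustrative only: the real code is lockd_up() in fs/lockd/svc.c and differs in detail, and make_socks()/lockd_up_model() below are made-up stand-ins, though nlmsvc_users and nlmsvc_pid mirror the kernel variable names. Error -98 is -EADDRINUSE on Linux. If the user count is bumped before the sockets are created and the failure path returns without undoing it, every failed start leaves the counter one higher, which matches the "no pid, 2 users??" progression above.

#include <errno.h>
#include <stdio.h>

static int nlmsvc_users;        /* how many callers want lockd running */
static int nlmsvc_pid;          /* nonzero once the lockd thread exists */

/* stand-in for the socket setup; pretend the NLM port is already bound */
static int make_socks(void)
{
        return -EADDRINUSE;     /* -98, i.e. "makesock failed, error=-98" */
}

static int lockd_up_model(void)
{
        int error;

        nlmsvc_users++;                 /* bumped unconditionally */
        if (nlmsvc_pid)
                return 0;               /* lockd already running */

        if (nlmsvc_users > 1)
                printf("lockd_up: no pid, %d users??\n", nlmsvc_users);

        error = make_socks();
        if (error < 0) {
                printf("lockd_up: makesock failed, error=%d\n", error);
                return error;           /* counter is never decremented here */
        }

        nlmsvc_pid = 1;                 /* pretend the kernel thread started */
        return 0;
}

int main(void)
{
        int i;

        /* three failed "service nfs start" attempts in a row */
        for (i = 0; i < 3; i++)
                lockd_up_model();
        return 0;
}

Running that prints the makesock failure on every attempt and a stale-count warning on every attempt after the first, just like the messages above.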
Comment 1 Jeff Layton 2008-07-07 12:27:14 EDT
Actually, this doesn't appear to be a regression. When I do the same test on
-8.el5, I get these messages:

lockd_up: makesock failed, error=-98
lockd_up: no pid, 2 users??
lockd_up: no pid, 3 users??
lockd_up: no pid, 4 users??
lockd_up: no pid, 5 users??
lockd_up: no pid, 6 users??
lockd_up: no pid, 7 users??
lockd_up: no pid, 8 users??


...and lockd isn't started. Since no one has complained about this, I'll put
this on 5.4 proposed for now. If the fix turns out to be simple, I may move it to
5.3...

Comment 2 Jeff Layton 2008-07-08 11:46:03 EDT
This problem has strangely "fixed itself". Yesterday, I could reliably reproduce
this. Today, I can't make it happen.

The host where I saw this was a RHEL5 FV xen guest. It looked like the power
blinked at the office and the xen dom0 rebooted. I brought my RHEL5 image back
up, and now this isn't happening anymore. It seems unlikely, but maybe it has
something to do with being a guest on a long-running dom0?

I'll leave this open for now in case it happens again...
Comment 3 Jeff Layton 2008-09-29 08:01:44 EDT
Closing this out. I've not seen this problem since, though it still worries me that I saw it at all. I'll reopen it if it returns.
Comment 4 Ram Kesavan 2009-05-20 15:29:53 EDT
I am not sure if this is important, but you will get this error if the portmapper is not running. Start the portmapper (/etc/init.d/portmapper), then try the mount again and it will work properly.
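For what it's worth, a quick userspace way to confirm whether anything is listening on the portmapper's well-known TCP port (111) is a plain connect() test. This is just an illustrative sketch assuming the standard port, not a tool from this bug:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        struct sockaddr_in addr = {
                .sin_family = AF_INET,
                .sin_port   = htons(111),       /* portmapper's well-known port */
        };
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
        if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
                printf("nothing is listening on port 111; the portmapper is probably down\n");
        else
                printf("the portmapper appears to be listening on port 111\n");
        if (fd >= 0)
                close(fd);
        return 0;
}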
