Bug 454286 - problems bringing up lockd after it has been taken down
Summary: problems bringing up lockd after it has been taken down
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jeff Layton
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-07-07 14:52 UTC by Jeff Layton
Modified: 2009-05-20 19:29 UTC
CC: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-09-29 12:02:02 UTC
Target Upstream Version:
Embargoed:



Description Jeff Layton 2008-07-07 14:52:16 UTC
Do this on a 2.6.18-92.1.6.el5debug kernel after a fresh reboot:

1) mount a tcp NFSv3 filesystem
2) unmount it
3) service nfs start
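
For concreteness, the steps amount to something like this (hypothetical server
and export names; any TCP NFSv3 mount should do):

# 1) mount a TCP NFSv3 filesystem (server:/export is a placeholder)
mount -t nfs -o vers=3,proto=tcp server:/export /mnt
# 2) unmount it
umount /mnt
# 3) start the NFS server
service nfs start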

...nfsd will fail to start because lockd_up fails. From dmesg:

FS-Cache: Loaded
FS-Cache: netfs 'nfs' registered for caching
SELinux: initialized (dev 0:17, type nfs), uses genfs_contexts
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
SELinux: initialized (dev nfsd, type nfsd), uses genfs_contexts
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
lockd_up: makesock failed, error=-98
lockd_down: no lockd running.
nfsd: last server has exited
nfsd: unexporting all filesystems

...then if you do a "service nfs restart":

lockd_up: no pid, 2 users??
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
nfsd: last server has exited
nfsd: unexporting all filesystems

...so I think we have a couple of bugs here. Something is causing the makesock
call to fail (error -98 is -EADDRINUSE, i.e. the address lockd wants is already
in use), and when that happens, lockd_up isn't handling the error condition
appropriately, which throws off the nlmsvc_users counter.
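
As a first diagnostic pass, something like the following can show whether a
stale nlockmgr registration is lingering or, when lockd's port has been pinned,
what currently holds it (a sketch; assumes the lockd module is loaded so the
fs.nfs sysctls exist):

# is a stale nlockmgr registration still known to the portmapper?
rpcinfo -p | grep nlockmgr
# if lockd's TCP port is pinned (nonzero), see what currently holds it
port=$(cat /proc/sys/fs/nfs/nlm_tcpport)
[ "$port" -ne 0 ] && netstat -tlnp | grep ":$port "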

I suspect this is a regression from 5.1, but I need to confirm it.

Comment 1 Jeff Layton 2008-07-07 16:27:14 UTC
Actually, this doesn't appear to be a regression. When I do the same test on
-8.el5, I get these messages:

lockd_up: makesock failed, error=-98
lockd_up: no pid, 2 users??
lockd_up: no pid, 3 users??
lockd_up: no pid, 4 users??
lockd_up: no pid, 5 users??
lockd_up: no pid, 6 users??
lockd_up: no pid, 7 users??
lockd_up: no pid, 8 users??


...and lockd isn't started; each failed lockd_up call apparently leaves
nlmsvc_users incremented, hence the escalating counts. Since no one has
complained about this, I'll put this on 5.4 proposed for now. If the fix turns
out to be simple, I may move it to 5.3...



Comment 2 Jeff Layton 2008-07-08 15:46:03 UTC
This problem has strangely "fixed itself". Yesterday, I could reliably reproduce
this. Today, I can't make it happen.

The host where I saw this was a RHEL5 FV (fully virtualized) Xen guest. It
looked like the power blinked at the office and the Xen dom0 rebooted. I
brought my RHEL5 image back up, and now this isn't happening anymore. It seems
unlikely, but maybe this has something to do with being a guest on a
long-running dom0?

I'll leave this open for now in case it happens again...


Comment 3 Jeff Layton 2008-09-29 12:01:44 UTC
Closing this out. I've not seen this problem since, though it still worries me that I saw it at all. I'll reopen it if it returns.

Comment 4 Ram Kesavan 2009-05-20 19:29:53 UTC
I am not sure if this is important, but you will get this error if the
portmapper is not running. Start the portmapper (the init script is
/etc/init.d/portmap), then try the mount again and it will work properly.
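
On RHEL 5 that amounts to roughly the following (a sketch; the stock init
script is /etc/init.d/portmap):

# make sure the portmapper is running first
service portmap status || service portmap start
# then retry the mount, or restart nfs for the failure in this bug
service nfs restart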

