199620 – nfslock script starts statd before lockd is up so lock recovery fails

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	nfs-utils
Sub Component:
Version:	9
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Jeff Layton
QA Contact:	Ben Levenson
Docs Contact:
URL:
Whiteboard:	bzcl34nup
Depends On:
Blocks:

Reported:	2006-07-20 19:36 UTC by Jeff Layton
Modified:	2008-05-20 15:22 UTC
CC List:	3 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2008-05-20 15:22:15 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
trivial patch to init script (460 bytes, patch) 2006-07-20 19:40 UTC, Jeff Layton

Description Jeff Layton 2006-07-20 19:36:08 UTC

+++ This bug was initially created as a clone of Bug #199586 +++

Description of problem:

The default chkconfig line in the nfslock init script starts statd long before
lockd is up. statd then notifies clients of the reboot, and they try to recover
their locks before lockd has registered with the portmapper. Here's a sample
network trace that shows PROGRAM_NOT_AVAILABLE after the client attempted to
recover locks on a reboot:

163.999543 172.16.57.30 -> 172.16.57.138 Portmap V2 GETPORT Call STAT(100024) V:1 UDP
164.000380 172.16.57.138 -> 172.16.57.30 Portmap V2 GETPORT Reply (Call In 73) Port:32768
164.000832 172.16.57.30 -> 172.16.57.138 STAT V1 NOTIFY Call
164.003574 172.16.57.138 -> 172.16.57.30 STAT V1 NOTIFY Reply (Call In 75)
164.004416 172.16.57.138 -> 172.16.57.30 TCP 32866 > sunrpc [SYN] Seq=0 Len=0 MSS=1460 TSV=407443 TSER=0 WS=0
164.004700 172.16.57.30 -> 172.16.57.138 TCP sunrpc > 32866 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 TSV=4294700213 TSER=407443 WS=2
164.004765 172.16.57.138 -> 172.16.57.30 TCP 32866 > sunrpc [ACK] Seq=1 Ack=1 Win=5840 Len=0 TSV=407443 TSER=4294700213
164.004947 172.16.57.138 -> 172.16.57.30 Portmap V2 GETPORT Call NLM(100021) V:1 TCP
164.005222 172.16.57.30 -> 172.16.57.138 TCP sunrpc > 32866 [ACK] Seq=1 Ack=61 Win=5792 Len=0 TSV=4294700214 TSER=407443
164.005832 172.16.57.30 -> 172.16.57.138 Portmap V2 GETPORT Reply (Call In 80) PROGRAM_NOT_AVAILABLE

Changing nfslock.init chkconfig line to this:

# chkconfig: 345 61 19

seems to fix the problem. Opening this for RHEL4, since that's where I
originally noticed the problem, but it looks like FC has the same issue.
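
For readers unfamiliar with the header format: a chkconfig line lists the
runlevels, then the start priority, then the stop priority, and services with a
higher start priority start later. The following is a minimal Python sketch of
that ordering check, not part of the attached patch. The helper names
(parse_chkconfig, starts_after) are hypothetical, and the nfs header used for
comparison is an assumed value for illustration only; check the real nfs init
script on the system in question.

```python
import re
from typing import NamedTuple


class ChkconfigHeader(NamedTuple):
    runlevels: str  # e.g. "345", or "-" for "off by default"
    start: int      # start priority (lower starts earlier)
    stop: int       # stop priority


_PATTERN = re.compile(r"^#\s*chkconfig:\s*(\S+)\s+(\d+)\s+(\d+)\s*$")


def parse_chkconfig(line: str) -> ChkconfigHeader:
    """Parse a '# chkconfig: <runlevels> <start> <stop>' header line."""
    match = _PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not a chkconfig header: {line!r}")
    return ChkconfigHeader(match.group(1), int(match.group(2)), int(match.group(3)))


def starts_after(service: ChkconfigHeader, dependency: ChkconfigHeader) -> bool:
    """True if `service` has a later start priority than `dependency`."""
    return service.start > dependency.start


# Proposed nfslock header, taken from this report.
nfslock = parse_chkconfig("# chkconfig: 345 61 19")

# ASSUMPTION: illustrative nfs header, not taken from this report.
nfs = parse_chkconfig("# chkconfig: - 60 20")

print(starts_after(nfslock, nfs))
```

With a start priority of 61, nfslock would start after any service whose start
priority is 60 or lower, which is the ordering the proposed change aims for.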

-- Additional comment from jlayton on 2006-07-20 12:27 EST --
Going ahead and adding this to the 4.5 proposed list. Should be a pretty trivial
fix and bad lock recovery can cause data corruption. I've not seen any customer
complaints about this particular problem yet, but with the work happening on
lock recovery, it's probably just a matter of time.

Comment 1 Jeff Layton 2006-07-20 19:40:09 UTC

Created attachment 132769 [details]
trivial patch to init script

A solution is to make the nfslock script run after the nfs script. This trivial
patch changes the chkconfig line so that this happens by default.

Comment 2 Steve Dickson 2007-03-09 13:33:42 UTC

The client should continue to retry when trying to reclaim a lock,
regardless of the error that was returned. If the client does
not continue to retry, it's a bug in the client, IMHO.
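
The behavior described here can be sketched as a simple retry loop. This is a
minimal Python illustration under stated assumptions, not the kernel client's
actual reclaim code: ProgramNotAvailable, reclaim_with_retry, and
make_flaky_server are hypothetical names, and the fixed delay and attempt limit
are arbitrary choices for the example.

```python
import time
from typing import Callable


class ProgramNotAvailable(Exception):
    """Hypothetical stand-in for the portmapper's PROGRAM_NOT_AVAILABLE reply."""


def reclaim_with_retry(
    reclaim: Callable[[], None],
    max_attempts: int = 5,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `reclaim` until it succeeds; return the number of attempts used.

    Re-raises the last error once `max_attempts` is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            reclaim()
            return attempt
        except ProgramNotAvailable:
            if attempt == max_attempts:
                raise
            sleep(delay)  # wait before asking the portmapper again
    raise ValueError("max_attempts must be at least 1")


def make_flaky_server(failures: int) -> Callable[[], None]:
    """Simulate a server whose lockd registers only after `failures` lookups."""
    remaining = [failures]

    def reclaim() -> None:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise ProgramNotAvailable("NLM (100021) not registered yet")

    return reclaim


# lockd is "down" for the first two lookups, then the reclaim succeeds.
attempts_used = reclaim_with_retry(make_flaky_server(2), sleep=lambda _seconds: None)
print(attempts_used)
```

A client that retries this way would tolerate statd starting before lockd,
which is why the start order of the init scripts would matter less.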

Comment 3 Bug Zapper 2008-04-03 17:49:41 UTC

Based on the date this bug was created, it appears to have been reported
against rawhide during the development of a Fedora release that is no
longer maintained. In order to refocus our efforts as a project we are
flagging all of the open bugs for releases which are no longer
maintained. If this bug remains in NEEDINFO thirty (30) days from now,
we will automatically close it.

If you can reproduce this bug in a maintained Fedora version (7, 8, or
rawhide), please change this bug to the respective version and change
the status to ASSIGNED. (If you're unable to change the bug's version
or status, add a comment to the bug and someone will change it for you.)

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we're following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.

Comment 4 Bug Zapper 2008-05-07 00:41:21 UTC

This bug has been in NEEDINFO for more than 30 days since feedback was
first requested. As a result we are closing it.

If you can reproduce this bug in the future against a maintained Fedora
version please feel free to reopen it against that version.

The process we're following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

Comment 5 Jeff Layton 2008-05-07 00:55:21 UTC

This one slipped through the cracks. I'll have a look at it again when I get the
chance and see if this is a bug in the client like Steve suggests...

Comment 6 Bug Zapper 2008-05-14 02:14:54 UTC

Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 7 Jeff Layton 2008-05-20 15:22:15 UTC

Ok, looks like Steve was right on this, and this seems to work properly on
rawhide (at least). Closing as NOTABUG.
