Bug 139101 - NFS lock reclaim not working
Status: CLOSED DUPLICATE of bug 182137
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: nfs-utils
Version: 4.0
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Steve Dickson
QA Contact: Ben Levenson
Depends On:
Blocks: 135876
Reported: 2004-11-12 17:04 EST by marc eshel
Modified: 2007-11-30 17:07 EST

Doc Type: Bug Fix
Last Closed: 2006-04-13 09:38:32 EDT
Attachments
Test program used to debug this problem (737 bytes, text/plain)
2004-11-30 05:58 EST, Steve Dickson
Patch for client side (2.47 KB, patch)
2004-12-16 06:59 EST, Steve Dickson
Patch for server side (6.62 KB, patch)
2004-12-16 07:01 EST, Steve Dickson
Patch for statd (3.35 KB, patch)
2004-12-16 07:02 EST, Steve Dickson
binary tethereal network trace of what the client is seeing (2.88 KB, application/octet-stream)
2006-02-03 02:11 EST, Asha Ramamurthy Yarangatta

Description marc eshel 2004-11-12 17:04:12 EST
Description of problem:
The problem is that after the NFS server machine reboots, its statd sends a
notification to all NFS clients that had locking activity, but the clients
fail to reclaim their locks.

I tried it with RedHat ES, 2.6.8 kernel, and nfs-utils 1.0.6, and also with
RedHat Fedora, 2.6.5 kernel, and nfs-utils 1.0.6.

It did work when I mounted with '-o nfsvers=2', which used lockd version 1
instead of lockd version 4.

Here are the debug messages on the NFS client.
The debug messages with 'xxx' were added by me.
As you can see in the 4th line, the protocol and version are both 0
(p=0, v=0).
In the following 2 lines you can see a valid protocol and version,
but because they don't match the input protocol and version, the host
is not found and the client will not reclaim its locks.

Nov 11 11:35:03 hiper53 kernel: lockd: request from 7f000001
Nov 11 11:35:03 hiper53 kernel: lockd: nlmsvc_dispatch vers 4 proc 16
Nov 11 11:35:03 hiper53 kernel: lockd: SM_NOTIFY     called
Nov 11 11:35:03 hiper53 kernel: lockd: nlm_lookup_host(09018c42, p=0, v=0)
Nov 11 11:35:03 hiper53 kernel: lockd: xxx1 nlm_lookup_host(server 0 s=0 p=17, v=4)
Nov 11 11:35:03 hiper53 kernel: lockd: xxx2 nlm_lookup_host(server 0 s=0 p=17, v=1)
Nov 11 11:35:03 hiper53 kernel: lockd: creating host entry
Nov 11 11:35:03 hiper53 kernel: lockd: rebind host 9.1.140.66
Nov 11 11:35:03 hiper53 kernel: NLM: reclaiming locks for host 9.1.140.66
lockd: xxx2 nlmclnt_recovery h_reclaiming 1
Nov 11 11:35:03 hiper53 kernel: lockd: get host 9.1.140.66
Nov 11 11:35:03 hiper53 kernel: lockd: release host 9.1.140.66
Nov 11 11:35:03 hiper53 kernel: nlmsvc_retry_blocked(00000000, when=0)
Nov 11 11:35:03 hiper53 kernel: nlmsvc_retry_blocked(00000000, when=0)
Nov 11 11:35:03 hiper53 kernel: lockd: xxx3 reclaimer start
Nov 11 11:35:03 hiper53 kernel: lockd: xxx4 reclaimer magic 6969 6969
Nov 11 11:35:03 hiper53 kernel: lockd: xxx5 reclaimer host f7d43d00(9.1.140.66) f744bb80(9.1.140.66)
Nov 11 11:35:04 hiper53 kernel: lockd: release host 9.1.140.66
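
To make the mismatch in the log above easier to follow, here is a small Python model of a host lookup keyed on address, protocol, and version. It is only a sketch of the behavior described in this report, not the kernel code: the HostCache class and its method names are hypothetical, and the address-only lookup is just one conceivable alternative, not necessarily what the attached patches do.

```python
# Simplified model of the lookup mismatch described in this report.
# NOT the kernel implementation; HostCache and its methods are
# hypothetical names used only for illustration.

class HostCache:
    def __init__(self):
        # Cached entries keyed by (address, protocol, version).
        self.hosts = {}

    def add(self, addr, proto, vers, locks):
        self.hosts[(addr, proto, vers)] = list(locks)

    def lookup_exact(self, addr, proto, vers):
        """Strict match on all three fields, as the log suggests."""
        return self.hosts.get((addr, proto, vers))

    def lookup_by_addr(self, addr):
        """Assumed alternative: match on the address only."""
        return [locks for (a, _, _), locks in self.hosts.items() if a == addr]


cache = HostCache()
# Entry created when the client took its locks over UDP (p=17) with NLM v4.
cache.add("9.1.140.66", 17, 4, ["lock on file A"])

# SM_NOTIFY arrives with p=0, v=0, so the strict lookup misses the entry.
missed = cache.lookup_exact("9.1.140.66", 0, 0)
# An address-only lookup would still find the host that holds the locks.
found = cache.lookup_by_addr("9.1.140.66")
print(missed)
print(found)
```

In this model the strict lookup returns nothing for (p=0, v=0), which corresponds to the "creating host entry" line in the log: a fresh entry with no locks attached is created, so there is nothing to reclaim.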


Version-Release number of selected component (if applicable):


How reproducible: Every time


Steps to Reproduce:
1. Mount a file system from an NFS server.
2. Get some NLM locks using fcntl on files from the NFS server.
3. Reboot the server.
4. Check that the locks got reclaimed on the server using 'cat /proc/locks'.
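
For step 2, a lock can be taken with a few lines of Python. This is a minimal sketch and not the attached test program (whose contents are not shown here); the helper names and the example path are assumptions. On an NFSv2/v3 mount, a POSIX lock taken this way would normally be forwarded to the server through NLM.

```python
import fcntl
import os
import tempfile


def take_write_lock(path):
    """Open path and take an exclusive POSIX lock on the whole file."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    # lockf() uses fcntl()-style byte-range locking; the default
    # length of 0 means the lock covers the file through its end.
    fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    return fd


def release_lock(fd):
    """Drop the lock and close the descriptor."""
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)


# Demonstration on a local temporary file. For the actual test, the path
# would point into the NFS mount (for example a hypothetical
# /mnt/nfs/lockfile) and the lock would be held across the server reboot.
demo_path = os.path.join(tempfile.mkdtemp(), "lockfile")
fd = take_write_lock(demo_path)
locked = True
release_lock(fd)
```

To reproduce the report, the process would have to keep the descriptor open and the lock held while the server reboots, rather than releasing it immediately as the demonstration does.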
  
Actual results: the locks are not reclaimed (see above description) 


Expected results: the locks should be reclaimed by the client after the server reboots
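
The check in step 4 can also be scripted instead of reading 'cat /proc/locks' by eye. The sketch below parses the eight-column layout documented in proc(5); the function name is hypothetical and the sample text is made up for illustration, not output captured from the machines in this report.

```python
def parse_proc_locks(text):
    """Parse /proc/locks content into a list of dicts (layout per proc(5))."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Blocked waiters carry a "->" marker after the ordinal; drop it.
        if len(fields) > 1 and fields[1] == "->":
            del fields[1]
        if len(fields) != 8:
            continue  # skip blank or unexpected lines
        _, kind, mode, access, pid, dev_inode, start, end = fields
        entries.append({
            "kind": kind,          # e.g. POSIX or FLOCK
            "mode": mode,          # ADVISORY or MANDATORY
            "access": access,      # READ or WRITE
            "pid": int(pid),
            "dev_inode": dev_inode,
            "start": start,
            "end": end,
        })
    return entries


# Illustrative sample only; on the server, read the real file instead:
#   with open("/proc/locks") as f: text = f.read()
sample = """\
1: POSIX  ADVISORY  WRITE 2345 08:01:131074 0 EOF
2: FLOCK  ADVISORY  WRITE 1200 08:01:131099 0 EOF
"""
posix_locks = [e for e in parse_proc_locks(sample) if e["kind"] == "POSIX"]
print(len(posix_locks))
```

Comparing the POSIX entries before the reboot and after the grace period would show whether the client's locks came back.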


Additional info:
Comment 1 Kiersten (Kerri) Anderson 2004-11-16 14:09:35 EST
Per Steve's request, adding this to the blocking list for RHEL4
Comment 3 Steve Dickson 2004-11-30 05:56:52 EST
Fixed in nfs-utils-1.0.6-45 and in kernel-smp-2.6.9-1.785_EL
Comment 4 Steve Dickson 2004-11-30 05:58:20 EST
Created attachment 107613
Test program used to debug this problem
Comment 5 Steve Dickson 2004-12-16 06:59:56 EST
Created attachment 108686
Patch for client side
Comment 6 Steve Dickson 2004-12-16 07:01:12 EST
Created attachment 108687
Patch for server side
Comment 7 Steve Dickson 2004-12-16 07:02:35 EST
Created attachment 108688
Patch for statd
Comment 8 Jay Turner 2005-01-06 10:38:19 EST
IBM, can we close this issue out?
Comment 9 marc eshel 2005-01-06 12:17:05 EST
(In reply to comment #8)
> IBM, can we close this issue out?

Yes you can. Thank you for the quick fix.
Comment 10 Jay Turner 2005-02-08 10:02:06 EST
Closing out based on comment 9.
Comment 11 Asha Ramamurthy Yarangatta 2006-01-31 01:37:50 EST
(In reply to comment #3)
> Fixed in nfs-utils-1.0.6-45 and in kernel-smp-2.6.9-1.785_EL

Hi,

I am using RedHat4 Update3. The problem is that after the NFS server machine
reboots, its statd sends a notification to all NFS clients that had locking
activity, but the clients fail to reclaim their locks. I am using NFS v3 for
mounting NFS clients.

Is the NFS lock reclaim issue resolved in RedHat4 Update3? Is a patch still
needed to resolve it?

Comment 12 Steve Dickson 2006-02-02 09:44:09 EST
I just looked, and both patches are in nfs-utils and the RHEL4 U3 kernel,
but I am concerned about your assertion that lock recovery is not working,
since it seems to work in my testing...

So to begin, could you please post a bzip2-compressed binary tethereal network
trace of what the client is seeing, by doing the following (on the client):

tethereal -w /tmp/data.pcap host <servername>
bzip2 /tmp/data.pcap

Comment 13 Asha Ramamurthy Yarangatta 2006-02-03 02:11:53 EST
Created attachment 124097
binary tethereal network trace of what the client is seeing

I am posting a binary tethereal network trace of what the client is seeing.
From this I can see that the client is using NLM V4 LOCK. Is this causing the
problem?
Comment 14 Steve Dickson 2006-03-08 09:31:04 EST
Please try the patch in bz182137; that should take care of the problem.
Comment 15 Steve Dickson 2006-04-13 09:38:32 EDT

*** This bug has been marked as a duplicate of 182137 ***
