Bug 139101

Summary: NFS lock reclaim not working
Product: Red Hat Enterprise Linux 4
Reporter: marc eshel <eshel>
Component: nfs-utils
Assignee: Steve Dickson <steved>
Status: CLOSED DUPLICATE
QA Contact: Ben Levenson <benl>
Severity: medium
Priority: medium
Version: 4.0
CC: asha.yarangatta, chrisw, dff, jturner
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-04-13 13:38:32 UTC
Bug Blocks: 135876    
Attachments:
Test program used to debug this problem
Patch for client side
Patch for server side
Patch for statd
Binary tethereal network trace of what the client is seeing

Description marc eshel 2004-11-12 22:04:12 UTC
Description of problem:
The problem is that after the NFS server machine reboots, its statd sends a
notification to all NFS clients that had locking activity, but the clients
fail to reclaim their locks.

I tried it with Red Hat ES (2.6.8 kernel, nfs-utils 1.0.6) and also with
Red Hat Fedora (2.6.5 kernel, nfs-utils 1.0.6).

It does work when I mount with '-o nfsvers=2', which uses lockd (NLM)
version 1 instead of lockd version 4.

Here are the debug messages on the NFS client. The messages containing
'xxx' were added by me. As you can see in the 4th line, the protocol and
version passed in are both 0 (p=0, v=0). The following 2 lines show the
valid protocol and version of the existing host entries, but because they
don't match the protocol and version passed in, the host is not found and
the client does not reclaim its locks.

Nov 11 11:35:03 hiper53 kernel: lockd: request from 7f000001
Nov 11 11:35:03 hiper53 kernel: lockd: nlmsvc_dispatch vers 4 proc 16
Nov 11 11:35:03 hiper53 kernel: lockd: SM_NOTIFY     called
Nov 11 11:35:03 hiper53 kernel: lockd: nlm_lookup_host(09018c42, p=0, v=0)
Nov 11 11:35:03 hiper53 kernel: lockd: xxx1 nlm_lookup_host(server 0 s=0 p=17, v=4)
Nov 11 11:35:03 hiper53 kernel: lockd: xxx2 nlm_lookup_host(server 0 s=0 p=17, v=1)
Nov 11 11:35:03 hiper53 kernel: lockd: creating host entry
Nov 11 11:35:03 hiper53 kernel: lockd: rebind host 9.1.140.66
Nov 11 11:35:03 hiper53 kernel: NLM: reclaiming locks for host 9.1.140.66
lockd: xxx2 nlmclnt_recovery h_reclaiming 1
Nov 11 11:35:03 hiper53 kernel: lockd: get host 9.1.140.66
Nov 11 11:35:03 hiper53 kernel: lockd: release host 9.1.140.66
Nov 11 11:35:03 hiper53 kernel: nlmsvc_retry_blocked(00000000, when=0)
Nov 11 11:35:03 hiper53 kernel: nlmsvc_retry_blocked(00000000, when=0)
Nov 11 11:35:03 hiper53 kernel: lockd: xxx3 reclaimer start
Nov 11 11:35:03 hiper53 kernel: lockd: xxx4 reclaimer magic 6969 6969
Nov 11 11:35:03 hiper53 kernel: lockd: xxx5 reclaimer host f7d43d00(9.1.140.66) f744bb80(9.1.140.66)
Nov 11 11:35:04 hiper53 kernel: lockd: release host 9.1.140.66
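
To make the mismatch concrete, here is a simplified model of the host lookup, written for illustration only: the struct and function names are hypothetical, and this is not lockd's actual nlm_lookup_host() code nor the fix that was shipped. It reuses the values from the log: 0x09018c42 is 9.1.140.66, and the cached entries carry p=17 (IPPROTO_UDP) with v=4 and v=1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a cached NLM host entry.
 * Not the kernel's struct nlm_host; field names are illustrative. */
struct host_entry {
    uint32_t addr;    /* server IPv4 address in host byte order */
    int      proto;   /* transport protocol, e.g. 17 = UDP */
    int      version; /* NLM version, e.g. 4 (NFSv3) or 1 (NFSv2) */
};

/* Exact match on address, protocol and version. A caller that passes
 * proto == 0 and version == 0 (as in the SM_NOTIFY path logged above)
 * never matches an entry created by a real mount. */
const struct host_entry *lookup_exact(const struct host_entry *tbl, size_t n,
                                      uint32_t addr, int proto, int version)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].addr == addr && tbl[i].proto == proto &&
            tbl[i].version == version)
            return &tbl[i];
    }
    return NULL;
}

/* Address-only match: counts every cached entry for the given server,
 * regardless of transport protocol or NLM version. */
size_t count_by_addr(const struct host_entry *tbl, size_t n, uint32_t addr)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].addr == addr)
            hits++;
    }
    return hits;
}
```

In this model, an exact lookup with p=0, v=0 finds nothing, which is consistent with the "creating host entry" line in the log, while an address-only comparison would find both cached entries for the rebooted server.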


Version-Release number of selected component (if applicable):


How reproducible: Every time


Steps to Reproduce:
1. Mount a file system from an NFS server.
2. Acquire some NLM locks with fcntl() on files from the NFS server.
3. Reboot the server.
4. Check on the server, using 'cat /proc/locks', that the locks were
   reclaimed.
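
Step 2 can be sketched in C as follows, assuming a POSIX system. lock_path() is a hypothetical helper, not the test program attached to this bug: it opens a file read-write and takes a whole-file write lock with fcntl(F_SETLK). On an NFS mount, the client forwards such fcntl() locks to the server's lockd over NLM.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open (creating if needed) the file at path and take
 * a non-blocking POSIX write lock covering the whole file.
 * Returns the open descriptor, which must stay open for the lock to be
 * held, or -1 on failure (errno is set by open() or fcntl()). */
int lock_path(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    struct flock fl = {0};
    fl.l_type = F_WRLCK;    /* exclusive lock; requires fd open for writing */
    fl.l_whence = SEEK_SET; /* offsets relative to the start of the file */
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 means "through end of file" */

    if (fcntl(fd, F_SETLK, &fl) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

For the reproduction, call lock_path() on a file under the NFS mount point and keep the process alive (for example with pause()) so the descriptor, and therefore the lock, is still held when the server reboots. Note that closing any descriptor the process has for that file releases its POSIX locks on it.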
  
Actual results: The locks are not reclaimed (see the description above).


Expected results: The client should reclaim its locks after the server
reboots.


Additional info:

Comment 1 Kiersten (Kerri) Anderson 2004-11-16 19:09:35 UTC
Per Steve's request, adding this to the blocking list for RHEL4

Comment 3 Steve Dickson 2004-11-30 10:56:52 UTC
Fixed in nfs-utils-1.0.6-45 and in kernel-smp-2.6.9-1.785_EL

Comment 4 Steve Dickson 2004-11-30 10:58:20 UTC
Created attachment 107613
Test program used to debug this problem

Comment 5 Steve Dickson 2004-12-16 11:59:56 UTC
Created attachment 108686
Patch for client side

Comment 6 Steve Dickson 2004-12-16 12:01:12 UTC
Created attachment 108687
Patch for server side

Comment 7 Steve Dickson 2004-12-16 12:02:35 UTC
Created attachment 108688
Patch for statd

Comment 8 Jay Turner 2005-01-06 15:38:19 UTC
IBM, can we close this issue out?

Comment 9 marc eshel 2005-01-06 17:17:05 UTC
(In reply to comment #8)
> IBM, can we close this issue out?

Yes, you can. Thank you for the quick fix.


Comment 10 Jay Turner 2005-02-08 15:02:06 UTC
Closing out based on comment 9.

Comment 11 Asha Ramamurthy Yarangatta 2006-01-31 06:37:50 UTC
(In reply to comment #3)
> Fixed in nfs-utils-1.0.6-45 and in kernel-smp-2.6.9-1.785_EL

Hi,

I am using RHEL4 Update 3. The problem is that after the NFS server machine
reboots, its statd sends a notification to all NFS clients that had locking
activity, but the clients fail to reclaim their locks. The clients mount
with NFS v3.

Is the NFS lock reclaim issue resolved in RHEL4 Update 3, or is the patch
still needed to fix it?



Comment 12 Steve Dickson 2006-02-02 14:44:09 UTC
I just checked, and both patches are in nfs-utils and the RHEL4 U3 kernel,
but I am concerned by your report that lock recovery is not working,
since it seems to work in my testing...

To begin, could you please post a bzip2-compressed binary tethereal network
trace of what the client is seeing? Run the following on the client:

tethereal -w /tmp/data.pcap host <servername>
bzip2 /tmp/data.pcap



Comment 13 Asha Ramamurthy Yarangatta 2006-02-03 07:11:53 UTC
Created attachment 124097
binary tethereal network trace of what the client is seeing

Attached is the binary tethereal network trace of what the client is seeing.
It shows that the client is sending NLM v4 LOCK requests. Could this be
causing the problem?

Comment 14 Steve Dickson 2006-03-08 14:31:04 UTC
Please try the patch in bz182137; that should take care of the problem.

Comment 15 Steve Dickson 2006-04-13 13:38:32 UTC

*** This bug has been marked as a duplicate of 182137 ***