Bug 802885

Summary: when nfs server is restarted, reclaim locks held by write operations on a file from nfs mount.
Product: [Community] GlusterFS Reporter: Shwetha Panduranga <shwetha.h.panduranga>
Component: nfs Assignee: Vinayaga Raman <vraman>
Status: CLOSED CURRENTRELEASE QA Contact: Saurabh <saujain>
Severity: high Docs Contact:
Priority: unspecified    
Version: mainline CC: gluster-bugs, mzywusko, rfortier, rwheeler, saujain, vbellur
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.4.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-07-24 17:26:48 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: 3.3.0qa35 Category: ---
Bug Depends On:    
Bug Blocks: 817967    
Attachments: nfs server log (flags: none)

Description Shwetha Panduranga 2012-03-13 16:54:37 UTC
Created attachment 569729
nfs server log

Description of problem:
When a volume is restarted, the NFS server restarts as well, and the NLM server currently does not reclaim the locks that applications held on files before the restart. As a result, a later unlock request finds no corresponding lock and fails with the following errors:

[2012-03-13 23:09:43.879988] E [nlm4.c:1595:nlm4_unlock_resume] 0-nfs-NLM: nlm_get_uniq() returned NULL
[2012-03-13 23:09:43.880096] E [nlm4.c:1607:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume
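
The failure mode described above can be illustrated with a toy model. This is only a sketch under the assumption that lock state lives purely in server memory; the class and method names (`ToyNlmServer`, `lock`, `unlock`, `restart`) are hypothetical and are not GlusterFS code.

```python
class ToyNlmServer:
    """Hypothetical stand-in for a lock server that keeps lock state only in memory."""

    def __init__(self):
        # (client, path) -> True; this table is lost whenever the process restarts
        self.locks = {}

    def lock(self, client, path):
        self.locks[(client, path)] = True

    def unlock(self, client, path):
        # No matching record means the unlock cannot be honoured,
        # which is the symptom reported in this bug.
        if self.locks.pop((client, path), None) is None:
            return "unlock failed: no matching lock"
        return "unlocked"

    def restart(self):
        # A restart discards the table and nothing reclaims the old locks.
        self.locks = {}


server = ToyNlmServer()
server.lock("client-a", "file1")
server.restart()
result = server.unlock("client-a", "file1")
print(result)
```

The client still believes it holds the lock, so its unlock arrives at a server that has no record of it.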

Version-Release number of selected component (if applicable):
3.3.0qa27

How reproducible:
often

Steps to Reproduce:
1. Create a distribute-replicate volume (2 x 3) and start it.
2. Create NFS mounts of the volume on the client.
3. Run "locktest -n 500 -f file1" on the NFS mount.
4. Bring down one brick.
5. Bring the brick back up while locktest is still in progress.
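
The report does not say what `locktest` does internally. Assuming it repeatedly takes and releases POSIX locks on the given file while writing to it (with `-n` read here as an iteration count, which is a guess), a rough equivalent of the workload in step 3 might look like the sketch below. The function name `lock_write_unlock_loop` is made up for illustration, and the demo runs against a local temporary file rather than an NFS mount.

```python
import fcntl
import os
import tempfile


def lock_write_unlock_loop(path, iterations):
    """Lock the first byte of `path`, write to it, and unlock, `iterations` times."""
    completed = 0
    with open(path, "r+b") as f:
        for _ in range(iterations):
            # Exclusive lock on 1 byte starting at offset 0
            fcntl.lockf(f, fcntl.LOCK_EX, 1, 0, os.SEEK_SET)
            f.seek(0)
            f.write(b"x")
            f.flush()
            # Release the same byte range
            fcntl.lockf(f, fcntl.LOCK_UN, 1, 0, os.SEEK_SET)
            completed += 1
    return completed


# Demo on a local temporary file; on an NFS mount the same calls would
# be handled by the NFS client's locking layer instead.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0")
    demo_path = tmp.name

count = lock_write_unlock_loop(demo_path, 500)
os.remove(demo_path)
print(count)
```

To approximate the reproduction, the loop would have to be pointed at a file on the NFS mount and kept running while the brick is taken down and brought back.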

Actual results:
[2012-03-13 23:00:33.369591] I [client-handshake.c:1533:select_server_supported_programs] 0-dstore1-client-3: Using Program GlusterFS 3.3.0qa27, Num (1298437), Version (330)
[2012-03-13 23:00:33.370580] I [client-handshake.c:1308:client_setvolume_cbk] 0-dstore1-client-3: clnt-lk-version = 1, server-lk-version = 0
[2012-03-13 23:00:33.370620] I [client-handshake.c:1334:client_setvolume_cbk] 0-dstore1-client-3: Connected to 192.168.2.35:24010, attached to remote volume '/export2/dstore1'.
[2012-03-13 23:01:27.189309] I [afr-common.c:1313:afr_launch_self_heal] 0-dstore1-replicate-1: background  data self-heal triggered. path: <gfid:00000000-0000-0000-0000-000000000000>, reason: lookup detected pending operations
[2012-03-13 23:09:19.731434] I [afr-self-heal-algorithm.c:131:sh_loop_driver_done] 0-dstore1-replicate-1: diff self-heal on <gfid:00000000-0000-0000-0000-000000000000>: completed. (1 blocks of 81920 were different (0.00%))
[2012-03-13 23:09:19.890119] I [afr-self-heal-common.c:2037:afr_self_heal_completion_cbk] 0-dstore1-replicate-1: background  data self-heal completed on <gfid:00000000-0000-0000-0000-000000000000>
[2012-03-13 23:09:43.879988] E [nlm4.c:1595:nlm4_unlock_resume] 0-nfs-NLM: nlm_get_uniq() returned NULL
[2012-03-13 23:09:43.880096] E [nlm4.c:1607:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume
[2012-03-13 23:09:43.880252] E [nlm4.c:1595:nlm4_unlock_resume] 0-nfs-NLM: nlm_get_uniq() returned NULL
[2012-03-13 23:09:43.880305] E [nlm4.c:1607:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume
[2012-03-13 23:09:43.880597] E [nlm4.c:1595:nlm4_unlock_resume] 0-nfs-NLM: nlm_get_uniq() returned NULL
[2012-03-13 23:09:43.880653] E [nlm4.c:1607:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume
[2012-03-13 23:09:43.880789] E [nlm4.c:1595:nlm4_unlock_resume] 0-nfs-NLM: nlm_get_uniq() returned NULL
[2012-03-13 23:09:43.880854] E [nlm4.c:1607:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume
[2012-03-13 23:09:43.880992] E [nlm4.c:1595:nlm4_unlock_resume] 0-nfs-NLM: nlm_get_uniq() returned NULL
[2012-03-13 23:09:43.881054] E [nlm4.c:1607:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume
[2012-03-13 23:09:43.881184] E [nlm4.c:1595:nlm4_unlock_resume] 0-nfs-NLM: nlm_get_uniq() returned NULL
[2012-03-13 23:09:43.881237] E [nlm4.c:1607:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume
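
For triage, the error lines above can be tallied per function. The sketch below assumes every log line follows the layout seen in this report (`[timestamp] LEVEL [file:line:function] domain: message`); the regular expression and the helper name `count_errors_by_function` are illustrative only and not part of any GlusterFS tooling.

```python
import re
from collections import Counter

# Layout assumed from the log lines quoted in this report
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<domain>\S+): (?P<msg>.*)$"
)


def count_errors_by_function(lines):
    """Count level-E log lines, grouped by the function that emitted them."""
    counts = Counter()
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if match and match.group("level") == "E":
            counts[match.group("func")] += 1
    return counts


sample = [
    "[2012-03-13 23:00:33.370580] I [client-handshake.c:1308:client_setvolume_cbk] "
    "0-dstore1-client-3: clnt-lk-version = 1, server-lk-version = 0",
    "[2012-03-13 23:09:43.879988] E [nlm4.c:1595:nlm4_unlock_resume] "
    "0-nfs-NLM: nlm_get_uniq() returned NULL",
    "[2012-03-13 23:09:43.880096] E [nlm4.c:1607:nlm4_unlock_resume] "
    "0-nfs-NLM: unable to unlock_fd_resume",
]
errors = count_errors_by_function(sample)
print(errors)
```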

Comment 1 Krishna Srinivas 2012-04-03 08:00:05 UTC
*** Bug 802767 has been marked as a duplicate of this bug. ***

Comment 2 Saurabh 2012-04-16 12:12:15 UTC
Tests executed:
1. gnfs restarted while locks were held; the unlock then happened.
2. Client restarted while locks were held; a fresh lock request passed.
3. Server rebooted while locks were held; the unlock happened after the system came back.

Comment 3 Anand Avati 2012-04-17 15:26:16 UTC
CHANGE: http://review.gluster.com/3096 (nlm: send sm-notify to clients whenever the nfs server is restarted so that clients reclaim the locks.) merged in master by Vijay Bellur (vijay)
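
The idea in the change summary, notifying clients after a restart so that they reclaim their locks, can be modelled roughly as follows. This is a toy sketch and not the merged patch: the classes `NotifyingServer` and `ReclaimingClient` are hypothetical, and the assumption that the list of clients to notify survives the restart is made only for the model.

```python
class ReclaimingClient:
    """Hypothetical client that remembers its locks and reclaims them when notified."""

    def __init__(self, name):
        self.name = name
        self.held = set()  # paths this client believes it has locked

    def on_restart_notify(self, server):
        # Re-request every lock still held so the server can rebuild its table.
        for path in sorted(self.held):
            server.lock(self, path)


class NotifyingServer:
    """Hypothetical server that tells known clients about each restart."""

    def __init__(self):
        self.locks = {}      # (client name, path) -> True; lost on restart
        self.monitored = []  # clients to notify; assumed to survive restarts

    def lock(self, client, path):
        self.locks[(client.name, path)] = True
        client.held.add(path)
        if client not in self.monitored:
            self.monitored.append(client)

    def unlock(self, client, path):
        client.held.discard(path)
        return self.locks.pop((client.name, path), None) is not None

    def restart(self):
        self.locks = {}
        # Notification step: each monitored client reclaims its locks.
        for client in self.monitored:
            client.on_restart_notify(self)


server = NotifyingServer()
client = ReclaimingClient("client-a")
server.lock(client, "file1")
server.restart()
unlock_ok = server.unlock(client, "file1")
print(unlock_ok)
```

In this model, the unlock after a restart finds a matching lock because the client re-registered it during the notification step.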