Bug 852570

Summary: nlm: server reboot gives lock to second application

Product: [Red Hat Storage] Red Hat Gluster Storage
Component: glusterd
Version: 2.0
Hardware: x86_64
OS: Linux
Status: CLOSED NOTABUG
Severity: high
Priority: high
Reporter: Vidya Sakar <vinaraya>
Assignee: Rajesh <rajesh>
QA Contact: Sudhir D <sdharane>
CC: gluster-bugs, rfortier, rhs-bugs, saujain, vagarwal, vbellur
Doc Type: Bug Fix
Type: Bug
Clone Of: 815330
Bug Depends On: 815330
Last Closed: 2013-04-15 07:41:20 UTC

Description Vidya Sakar 2012-08-29 00:53:28 UTC
+++ This bug was initially created as a clone of Bug #815330 +++

Description of problem:
The issue is that the second application gets the lock while the first application is still waiting for the server to finish rebooting and to receive the unlock message.


Version-Release number of selected component (if applicable):
3.3.0qa37

How reproducible:
always

Steps to Reproduce:
1. Create a volume.
2. Start an application that takes a lock on a file (a minimal client sketch follows these steps).
3. After the lock is taken, let the application go to sleep.
4. Reboot the server.
5. While the server is rebooting, try to take a lock on the same file from another machine (a second client), as in step 2.
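
A minimal client-side sketch of steps 2, 3 and 5, assuming a POSIX application; the mount point /mnt/dist and the file name testfile are illustrative and not taken from this report. It takes a whole-file write lock with fcntl(F_SETLKW), which the NFS client turns into an NLM lock request, and then sleeps while holding the lock. Running a second copy on another client machine reproduces step 5, since F_SETLKW is expected to block until the first lock is released.

/* lock_and_sleep.c - sketch of the reproducer's client side.
 * Assumptions: the volume is NFS-mounted at /mnt/dist; the path and
 * sleep duration are illustrative. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main (void)
{
        int fd = open ("/mnt/dist/testfile", O_RDWR | O_CREAT, 0644);
        if (fd < 0) {
                perror ("open");
                return 1;
        }

        struct flock fl = {
                .l_type   = F_WRLCK,   /* exclusive write lock    */
                .l_whence = SEEK_SET,
                .l_start  = 0,
                .l_len    = 0,         /* 0 = lock the whole file */
        };

        /* F_SETLKW blocks until the lock is granted; over NFS this
         * becomes an NLM lock request carrying the client's
         * caller_name. */
        if (fcntl (fd, F_SETLKW, &fl) < 0) {
                perror ("fcntl (F_SETLKW)");
                return 1;
        }
        printf ("lock acquired, sleeping...\n");

        sleep (600);            /* step 3: hold the lock across the server reboot */

        fl.l_type = F_UNLCK;    /* release the lock before exiting */
        fcntl (fd, F_SETLK, &fl);
        close (fd);
        return 0;
}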
  
Actual results:
1. The step fails: the lock is granted to the second application.
2. For the first application, the NFS log after the server reboot shows:
[2012-04-23 07:21:57.606389] I [client-handshake.c:456:client_set_lk_version_cbk] 0-dist-client-1: Server lk version = 1
[2012-04-23 07:23:47.209142] E [nlm4.c:1649:nlm4_unlock_resume] 0-nfs-NLM: nlm_get_uniq() returned NULL
[2012-04-23 07:23:47.209233] E [nlm4.c:1661:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume



Expected results:
 1. The first application should receive the unlock message.
 2. The second application should wait until the first application's lock is released.

Additional info:

--- Additional comment from saujain on 2012-04-23 08:00:36 EDT ---

The sosreports are placed at 10.16.156.3:/opt/qa/sosreport/bug815330

10.16.156.3:/opt is mountable via kernel NFS.

Comment 4 Rajesh 2013-04-15 07:41:20 UTC
This happens only if the two systems send the same "caller name" in the lock request to the server. RHS/RHEL systems usually send the output of `hostname` as the caller name, which on an unconfigured host is "localhost.localdomain"; in that case this behaviour is observed. In a properly configured environment, where each client sends its own hostname as caller_name, locking works fine.
If in doubt, one can always add this line:

diff --git a/xlators/nfs/server/src/nlm4.c b/xlators/nfs/server/src/nlm4.c
index 595738b..ee1ae80 100644
--- a/xlators/nfs/server/src/nlm4.c
+++ b/xlators/nfs/server/src/nlm4.c
@@ -1438,6 +1438,7 @@ nlm4svc_lock_common (rpcsvc_request_t *req, int mon)
         nlm4_volume_started_check (nfs3, vol, ret, rpcerr);
 
         ret = nlm_add_nlmclnt (cs->args.nlm4_lockargs.alock.caller_name);
+        gf_log ("debuggy", GF_LOG_CRITICAL, "lock requestor: %s", cs->args.nlm4_lockargs.alock.caller_name);
 
         ret = nfs3_fh_resolve_and_resume (cs, &fh,
                                           NULL, nlm4_lock_resume);

and confirm the behaviour (and the caller names sent by the clients); a small standalone hostname check is sketched below.
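
For reference, a hypothetical standalone check (not part of GlusterFS or of the patch above) that prints the hostname a client would normally send as the NLM caller_name; two clients that both report "localhost.localdomain" will collide as described above.

/* print_caller_name.c - hypothetical helper, not part of GlusterFS.
 * Prints the hostname that an NFS client would typically supply as
 * the NLM caller_name in its lock requests. */
#include <stdio.h>
#include <unistd.h>

int main (void)
{
        char name[256];

        if (gethostname (name, sizeof (name)) != 0) {
                perror ("gethostname");
                return 1;
        }
        printf ("caller_name this client would send: %s\n", name);
        return 0;
}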