Description of problem:
The issue occurs when the second application gets the lock while the first application is still waiting for the server to finish rebooting and to receive the unlock message.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create a volume.
2. Start an application that takes a lock on a file.
3. After this, let the application go to sleep.
4. Reboot the server.
5. While the server is rebooting, try to get a lock on the same file as in step 2 from another system.
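The lock-holding application from steps 2 and 3 can be sketched as below. This is only a minimal illustration, not the program used for this report: it assumes a POSIX `fcntl()` whole-file write lock, and the helper name `hold_lock_for` and its arguments are hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: take an exclusive lock on the whole file,
 * sleep for `seconds`, then release the lock.
 * Returns 0 on success, -1 on any failure. */
int hold_lock_for(const char *path, unsigned int seconds)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;    /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 means "until end of file" */

    /* F_SETLKW blocks until the lock is granted. */
    if (fcntl(fd, F_SETLKW, &fl) < 0) {
        perror("fcntl(F_SETLKW)");
        close(fd);
        return -1;
    }

    printf("lock acquired on %s, sleeping %u s\n", path, seconds);
    fflush(stdout);
    sleep(seconds);         /* window in which the server is rebooted */

    fl.l_type = F_UNLCK;    /* release the lock */
    int ret = fcntl(fd, F_SETLK, &fl);
    if (ret < 0)
        perror("fcntl(F_UNLCK)");

    close(fd);
    return ret < 0 ? -1 : 0;
}
```

Calling `hold_lock_for()` on a file under the NFS mount from the first client, with a sleep long enough to cover the reboot, and then calling it with a short sleep from the second client should show whether the second request is granted early.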
Actual results:
1. The step fails: the lock is granted to the second application.
2. For the first application, the NFS log after the server reboot shows:
[2012-04-23 07:21:57.606389] I [client-handshake.c:456:client_set_lk_version_cbk] 0-dist-client-1: Server lk version = 1
[2012-04-23 07:23:47.209142] E [nlm4.c:1649:nlm4_unlock_resume] 0-nfs-NLM: nlm_get_uniq() returned NULL
[2012-04-23 07:23:47.209233] E [nlm4.c:1661:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume
Expected results:
1. The first application should get the unlock message.
2. The second application should wait until the unlock has happened for the first application.
Additional info:
The sosreports are placed at 10.16.156.3:/opt/qa/sosreport/bug815330
10.16.156.3:/opt is mountable over NFS (kernel).
This happens only if the two systems send the same "caller name" in their lock requests to the server. For example, RHS/RHEL systems mostly send either the output of `hostname` or "localhost.localdomain" as the caller name. In the latter case this behaviour is observed, whereas in a properly configured environment, where clients send their real hostnames as caller_name, it works fine.
If in doubt, one can always add this log line:
diff --git a/xlators/nfs/server/src/nlm4.c b/xlators/nfs/server/src/nlm4.c
index 595738b..ee1ae80 100644
@@ -1438,6 +1438,7 @@ nlm4svc_lock_common (rpcsvc_request_t *req, int mon)
nlm4_volume_started_check (nfs3, vol, ret, rpcerr);
ret = nlm_add_nlmclnt (cs->args.nlm4_lockargs.alock.caller_name);
+ gf_log ("debuggy", GF_LOG_CRITICAL, "lock requestor: %s", cs->args.nlm4_lockargs.alock.caller_name);
ret = nfs3_fh_resolve_and_resume (cs, &fh,
and confirm the behaviour (and the caller names sent by the clients).
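To illustrate why identical caller names cause trouble, here is a simplified sketch. It is not the GlusterFS NLM code: it assumes, purely for illustration, that client state is looked up by the caller_name string alone. `struct client_table` and `client_slot` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

#define MAX_CLIENTS 8
#define NAME_LEN 64

/* Hypothetical registry in which a client is identified only by the
 * caller_name string it sent (an assumption made for this sketch). */
struct client_table {
    char names[MAX_CLIENTS][NAME_LEN];
    int count;
};

/* Return the slot index for caller_name, adding it if it has not been
 * seen before. Returns -1 if the table is full. */
int client_slot(struct client_table *t, const char *caller_name)
{
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->names[i], caller_name) == 0)
            return i;   /* same name, so treated as the same client */
    }
    if (t->count == MAX_CLIENTS)
        return -1;
    snprintf(t->names[t->count], NAME_LEN, "%s", caller_name);
    return t->count++;
}
```

With such a lookup, two different machines that both send "localhost.localdomain" land in the same slot and cannot be told apart, while machines sending distinct hostnames get distinct slots.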