Description of problem:
The issue happens when a second application gets the lock while the first application is still waiting for the server to finish rebooting and to receive the unlock message.

Version-Release number of selected component (if applicable):
3.3.0qa37

How reproducible:
always

Steps to Reproduce:
1. Create a volume.
2. Start an application that takes a lock on a file.
3. Once the lock is held, let the application go to sleep.
4. Reboot the server.
5. While the server is rebooting, try to get a lock on the same file (the one from step 2) from some other server.

Actual results:
1. Step 5 fails: the lock is granted to the second application.
2. For the first application, the NFS log after the server reboot shows:
[2012-04-23 07:21:57.606389] I [client-handshake.c:456:client_set_lk_version_cbk] 0-dist-client-1: Server lk version = 1
[2012-04-23 07:23:47.209142] E [nlm4.c:1649:nlm4_unlock_resume] 0-nfs-NLM: nlm_get_uniq() returned NULL
[2012-04-23 07:23:47.209233] E [nlm4.c:1661:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume
/export/dist^C

Expected results:
1. The first application should get the unlock message.
2. The second application should wait until the first application's lock has been released.

Additional info:
The sosreports are placed at 10.16.156.3:/opt/qa/sosreport/bug815330. 10.16.156.3:/opt can be mounted over NFS (kernel).
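Steps 2 and 3 of the reproduction can be sketched roughly as below. This is only a guess at what the locking application does: the report does not say which locking call was used, so the sketch assumes a POSIX fcntl() byte-range lock. The helper name lock_whole_file is hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: take an exclusive, non-blocking lock on the
 * whole file. Returns the open fd (the lock is held until the fd is
 * closed or the process exits), or -1 on failure. */
static int lock_whole_file(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    struct flock fl = {0};
    fl.l_type = F_WRLCK;    /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 means "until end of file" */

    /* F_SETLK returns immediately instead of waiting for the lock. */
    if (fcntl(fd, F_SETLK, &fl) < 0) {
        perror("fcntl(F_SETLK)");
        close(fd);
        return -1;
    }
    return fd;
}
```

The first application would call this on a file under the NFS mount and then sleep() while the server is rebooted. The second application would make the same call from the other machine, and is expected not to get the lock.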
This happens only if the two systems send the same "caller name" in their lock requests to the server. For example, RHS/RHEL systems mostly send either `hostname` or "localhost.localdomain" as the caller name. In the latter case this behaviour is observed, whereas in a properly configured environment, where each client sends its proper hostname as caller_name, locking works fine. If in doubt, one can always add this line:

diff --git a/xlators/nfs/server/src/nlm4.c b/xlators/nfs/server/src/nlm4.c
index 595738b..ee1ae80 100644
--- a/xlators/nfs/server/src/nlm4.c
+++ b/xlators/nfs/server/src/nlm4.c
@@ -1438,6 +1438,7 @@ nlm4svc_lock_common (rpcsvc_request_t *req, int mon)
         nlm4_volume_started_check (nfs3, vol, ret, rpcerr);
         ret = nlm_add_nlmclnt (cs->args.nlm4_lockargs.alock.caller_name);
+        gf_log ("debuggy", GF_LOG_CRITICAL, "lock requestor: %s", cs->args.nlm4_lockargs.alock.caller_name);
         ret = nfs3_fh_resolve_and_resume (cs, &fh, NULL, nlm4_lock_resume);

and confirm the behaviour (and the caller names sent by the clients).
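To illustrate why identical caller names matter, here is a toy model, not the actual nlm4.c logic. It assumes the server tells lock holders apart only by the caller_name string, so two machines that both send "localhost.localdomain" look like one client. struct toy_lock and toy_lock_request are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_NAME 64

/* Toy model: a single lock on a single file, owned by a caller name. */
struct toy_lock {
    bool held;
    char owner[TOY_MAX_NAME];
};

/* Returns true if the lock is granted to caller_name.
 * Assumption: the owner is identified by caller_name alone. */
static bool toy_lock_request(struct toy_lock *lk, const char *caller_name)
{
    assert(lk != NULL && caller_name != NULL);

    /* Held by a different caller name: conflict, request is refused. */
    if (lk->held && strcmp(lk->owner, caller_name) != 0)
        return false;

    /* Free, or held by what looks like the same client: grant it. */
    lk->held = true;
    strncpy(lk->owner, caller_name, TOY_MAX_NAME - 1);
    lk->owner[TOY_MAX_NAME - 1] = '\0';
    return true;
}
```

In this model a second request from "localhost.localdomain" is granted even though a different machine with the same name holds the lock, while requests from two distinct hostnames conflict as expected.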