Bug 802885 - When the NFS server is restarted, reclaim locks held by write operations on a file from an NFS mount.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: nfs
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Vinayaga Raman
QA Contact: Saurabh
URL:
Whiteboard:
Duplicates: 802767
Depends On:
Blocks: 817967
Reported: 2012-03-13 16:54 UTC by Shwetha Panduranga
Modified: 2016-01-19 06:10 UTC
CC List: 6 users

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:26:48 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: 3.3.0qa35
Embargoed:


Attachments
nfs server log (349.19 KB, text/x-log)
2012-03-13 16:54 UTC, Shwetha Panduranga
no flags

Description Shwetha Panduranga 2012-03-13 16:54:37 UTC
Created attachment 569729
nfs server log

Description of problem:
When the volume is restarted, the NFS server restarts, and the NLM server currently does not reclaim the locks that applications held on files before the restart. As a result, a later unlock finds no corresponding lock and fails with the following errors:

[2012-03-13 23:09:43.879988] E [nlm4.c:1595:nlm4_unlock_resume] 0-nfs-NLM: nlm_get_uniq() returned NULL
[2012-03-13 23:09:43.880096] E [nlm4.c:1607:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume
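The failure mode can be modeled in a few lines. The sketch below is a hypothetical simplification, not the GlusterFS nlm4.c code: it keeps granted locks in an in-memory table keyed by client name and file handle, wipes that table when the server "restarts", and shows that an unlock arriving afterwards finds nothing, which is analogous to the situation behind the "nlm_get_uniq() returned NULL" errors above. The names lock_table, grant_lock, find_lock, handle_unlock and server_restart are all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified server-side lock table.
 * All names are illustrative; this is not the GlusterFS NLM implementation. */
#define MAX_LOCKS 16

struct lock_entry {
    char caller[32];   /* client host name (like NLM's caller_name) */
    int  fh;           /* stand-in for the NFS file handle */
    bool in_use;
};

/* Lives only in process memory, so it does not survive a restart. */
static struct lock_entry lock_table[MAX_LOCKS];

/* Record a granted lock; returns false if the table is full. */
bool grant_lock(const char *caller, int fh)
{
    assert(caller != NULL);
    for (size_t i = 0; i < MAX_LOCKS; i++) {
        if (!lock_table[i].in_use) {
            strncpy(lock_table[i].caller, caller, sizeof lock_table[i].caller - 1);
            lock_table[i].caller[sizeof lock_table[i].caller - 1] = '\0';
            lock_table[i].fh = fh;
            lock_table[i].in_use = true;
            return true;
        }
    }
    return false;
}

/* Look up a lock; returns NULL when no matching entry exists. */
struct lock_entry *find_lock(const char *caller, int fh)
{
    for (size_t i = 0; i < MAX_LOCKS; i++) {
        if (lock_table[i].in_use && lock_table[i].fh == fh &&
            strcmp(lock_table[i].caller, caller) == 0)
            return &lock_table[i];
    }
    return NULL;
}

/* Handle an unlock; returns false when the lock is unknown
 * (the model's analogue of "unable to unlock_fd_resume"). */
bool handle_unlock(const char *caller, int fh)
{
    struct lock_entry *e = find_lock(caller, fh);
    if (e == NULL)
        return false;
    e->in_use = false;
    return true;
}

/* A process restart discards all in-memory lock state. */
void server_restart(void)
{
    memset(lock_table, 0, sizeof lock_table);
}
```

For example, after grant_lock("client1", 42) and then server_restart(), handle_unlock("client1", 42) returns false because the table entry no longer exists.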

Version-Release number of selected component (if applicable):
3.3.0qa27

How reproducible:
often

Steps to Reproduce:
1. Create a distribute-replicate volume (2 x 3) and start it.
2. Create NFS mounts from the client.
3. Start "locktest -n 500 -f file1".
4. Bring down one brick.
5. Bring the brick back while locktest is still in progress.
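For readers without the locktest tool, the kind of traffic step 3 generates can be approximated by a loop that repeatedly takes and releases a whole-file write lock with fcntl(). This is an assumed stand-in: it does not reproduce locktest's actual behavior or options, and the function name lock_unlock_loop is hypothetical. On an NFSv3 mount, the Linux client forwards such fcntl() record locks to the server's NLM service, so running the loop against a file on the mount while a brick is cycled exercises the same lock/unlock path as the reproduction.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for the lock traffic in step 3: take and release
 * a whole-file write lock `iterations` times.
 * Returns the number of completed lock/unlock cycles, or -1 if open fails. */
int lock_unlock_loop(const char *path, int iterations)
{
    assert(path != NULL && iterations >= 0);

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    struct flock fl = {0};
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                /* 0 means "through end of file" */

    int done = 0;
    for (int i = 0; i < iterations; i++) {
        fl.l_type = F_WRLCK;     /* blocking write lock (an NLM LOCK on NFSv3) */
        if (fcntl(fd, F_SETLKW, &fl) < 0) {
            perror("fcntl(F_WRLCK)");
            break;
        }
        fl.l_type = F_UNLCK;     /* release (an NLM UNLOCK on NFSv3) */
        if (fcntl(fd, F_SETLK, &fl) < 0) {
            perror("fcntl(F_UNLCK)");
            break;
        }
        done++;
    }
    close(fd);
    return done;
}
```

A run such as lock_unlock_loop("/mnt/nfs/file1", 500), with a hypothetical mount path, can be left going while steps 4 and 5 are carried out.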

Actual results:
[2012-03-13 23:00:33.369591] I [client-handshake.c:1533:select_server_supported_programs] 0-dstore1-client-3: Using Program GlusterFS 3.3.0qa27, Num (1298437), Version (330)
[2012-03-13 23:00:33.370580] I [client-handshake.c:1308:client_setvolume_cbk] 0-dstore1-client-3: clnt-lk-version = 1, server-lk-version = 0
[2012-03-13 23:00:33.370620] I [client-handshake.c:1334:client_setvolume_cbk] 0-dstore1-client-3: Connected to 192.168.2.35:24010, attached to remote volume '/export2/dstore1'.
[2012-03-13 23:01:27.189309] I [afr-common.c:1313:afr_launch_self_heal] 0-dstore1-replicate-1: background  data self-heal triggered. path: <gfid:00000000-0000-0000-0000-000000000000>, reason: lookup detected pending operations
[2012-03-13 23:09:19.731434] I [afr-self-heal-algorithm.c:131:sh_loop_driver_done] 0-dstore1-replicate-1: diff self-heal on <gfid:00000000-0000-0000-0000-000000000000>: completed. (1 blocks of 81920 were different (0.00%))
[2012-03-13 23:09:19.890119] I [afr-self-heal-common.c:2037:afr_self_heal_completion_cbk] 0-dstore1-replicate-1: background  data self-heal completed on <gfid:00000000-0000-0000-0000-000000000000>
[2012-03-13 23:09:43.879988] E [nlm4.c:1595:nlm4_unlock_resume] 0-nfs-NLM: nlm_get_uniq() returned NULL
[2012-03-13 23:09:43.880096] E [nlm4.c:1607:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume
[2012-03-13 23:09:43.880252] E [nlm4.c:1595:nlm4_unlock_resume] 0-nfs-NLM: nlm_get_uniq() returned NULL
[2012-03-13 23:09:43.880305] E [nlm4.c:1607:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume
[2012-03-13 23:09:43.880597] E [nlm4.c:1595:nlm4_unlock_resume] 0-nfs-NLM: nlm_get_uniq() returned NULL
[2012-03-13 23:09:43.880653] E [nlm4.c:1607:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume
[2012-03-13 23:09:43.880789] E [nlm4.c:1595:nlm4_unlock_resume] 0-nfs-NLM: nlm_get_uniq() returned NULL
[2012-03-13 23:09:43.880854] E [nlm4.c:1607:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume
[2012-03-13 23:09:43.880992] E [nlm4.c:1595:nlm4_unlock_resume] 0-nfs-NLM: nlm_get_uniq() returned NULL
[2012-03-13 23:09:43.881054] E [nlm4.c:1607:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume
[2012-03-13 23:09:43.881184] E [nlm4.c:1595:nlm4_unlock_resume] 0-nfs-NLM: nlm_get_uniq() returned NULL
[2012-03-13 23:09:43.881237] E [nlm4.c:1607:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume

Comment 1 Krishna Srinivas 2012-04-03 08:00:05 UTC
*** Bug 802767 has been marked as a duplicate of this bug. ***

Comment 2 Saurabh 2012-04-16 12:12:15 UTC
Tests executed:
1. gNFS restart while locks are held; the subsequent unlock succeeded.
2. Client restart while locks are held; a fresh lock request succeeded.
3. Server reboot while locks are held; the unlock succeeded after the system came back up.

Comment 3 Anand Avati 2012-04-17 15:26:16 UTC
CHANGE: http://review.gluster.com/3096 (nlm: send sm-notify to clients whenever the nfs server is restarted so that clients reclaim the locks.) merged in master by Vijay Bellur (vijay)
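The merged change relies on the usual NSM/NLM recovery handshake: the restarted server sends SM_NOTIFY to its clients, and each client re-sends the locks it believes it holds with the reclaim flag set. The sketch below models that handshake; it is not the code from review 3096. It assumes that the server's list of client hosts survives the restart while its lock table does not, that the NSM state number is bumped on each restart, and that during a grace period afterwards only reclaim requests are granted (in NLM4, non-reclaim requests during the grace period are answered with NLM4_DENIED_GRACE_PERIOD). All types and functions here (struct server, nlm_lock, on_sm_notify, restart_and_notify, end_grace) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of NSM-driven lock reclaim; not GlusterFS code. */
#define MAX_CLIENTS 4

/* Loosely mirrors NLM4_GRANTED / NLM4_DENIED_GRACE_PERIOD. */
enum model_stat { MODEL_GRANTED, MODEL_DENIED_GRACE_PERIOD };

struct client {
    const char *name;
    bool holds_lock;               /* the client's own belief that it holds a lock */
};

struct server {
    int  nsm_state;                /* bumped on every restart */
    bool in_grace;                 /* only reclaims are granted while true */
    int  monitored;                /* entries in monitor[]; survives restarts */
    struct client *monitor[MAX_CLIENTS];
    bool locked[MAX_CLIENTS];      /* in-memory lock state; lost on restart */
};

/* NLM LOCK handler: fresh (non-reclaim) locks are refused during grace. */
enum model_stat nlm_lock(struct server *s, int idx, bool reclaim)
{
    assert(idx >= 0 && idx < s->monitored);
    if (s->in_grace && !reclaim)
        return MODEL_DENIED_GRACE_PERIOD;
    s->locked[idx] = true;
    s->monitor[idx]->holds_lock = true;
    return MODEL_GRANTED;
}

/* Client reaction to SM_NOTIFY: re-send its lock with reclaim set.
 * The direct call stands in for the RPCs between the hosts. */
void on_sm_notify(struct server *s, int idx)
{
    if (s->monitor[idx]->holds_lock)
        (void)nlm_lock(s, idx, true);
}

/* Restart: lock state is lost, the state number changes, a grace period
 * starts, and every monitored client is notified so it can reclaim. */
void restart_and_notify(struct server *s)
{
    memset(s->locked, 0, sizeof s->locked);
    s->nsm_state++;
    s->in_grace = true;
    for (int i = 0; i < s->monitored; i++)
        on_sm_notify(s, i);
}

void end_grace(struct server *s)
{
    s->in_grace = false;
}
```

Without the notify loop in restart_and_notify, locked[0] would stay false after the restart and the client's later unlock would find nothing, which is the failure logged in the description.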

