Bug 1462121

Summary:	[GNFS+EC] Unable to release the lock when the other client tries to acquire the lock on the same file
Product:	[Community] GlusterFS	Reporter:	Pranith Kumar K <pkarampu>
Component:	disperse	Assignee:	bugs <bugs>
Status:	CLOSED CURRENTRELEASE	QA Contact:
Severity:	unspecified	Docs Contact:
Priority:	urgent
Version:	3.11	CC:	amukherj, aspandey, bugs, jthottan, msaini, pkarampu, rhinduja, sheggodu
Target Milestone:	---	Keywords:	Triaged
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	glusterfs-3.11.1	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:	1455049	Environment:
Last Closed:	2017-06-28 18:32:55 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1455049
Bug Blocks:	1411338, 1444515

Comment 1 Pranith Kumar K 2017-06-16 08:59:48 UTC

Steps to Reproduce:
1. Create a 6-node Gluster cluster.
2. Create a 2 x (4 + 2) EC volume and enable GNFS on it.
3. Mount the volume on 2 clients.
4. Create a file, say file1, of 512 bytes from client 1.
5. Take a lock on that file from client 1.
6. Try to take a lock on the same file from client 2. (The lock will not be granted to client 2 because it is already held by client 1.)
7. Release the lock from client 1.

Client 1:
-----
[root@dhcp37-192 home]# ./a.out /mnt/disperse/file1 
opening /mnt/disperse/file1
opened; hit Enter to lock... 
locking
locked; hit Enter to write... 
Write succeeeded 
locked; hit Enter to unlock... 
unlocking
-----

Client 2
-----
[root@dhcp37-142 home]# ./a.out /mnt/disperse1/file1 
opening /mnt/disperse1/file1
opened; hit Enter to lock... 
locking
-----
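
The source of `a.out` is not attached to this report. The sketch below is an assumption about what such a reproducer might do, based only on the transcript above: take a blocking exclusive fcntl lock on the whole file, write, then unlock. The helper names `lock_whole_file` and `lock_write_unlock` are hypothetical, and the interactive "hit Enter" prompts are left out.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Apply a lock of the given type (F_WRLCK, F_RDLCK or F_UNLCK) to the whole
 * file. With wait != 0 the call uses F_SETLKW and blocks until the lock is
 * granted; otherwise it uses F_SETLK and fails immediately on conflict. */
int lock_whole_file(int fd, short type, int wait)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0; /* 0 means "up to end of file" */

    return fcntl(fd, wait ? F_SETLKW : F_SETLK, &fl);
}

/* Assumed client sequence: open, lock, write, unlock. Returns 0 on success. */
int lock_write_unlock(const char *path)
{
    static const char buf[] = "data";
    int rc = -1;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;

    if (lock_whole_file(fd, F_WRLCK, 1) == 0) {
        ssize_t n = write(fd, buf, sizeof(buf) - 1);
        /* This unlock is the call that hangs on client 1 in this bug. */
        int unlocked = lock_whole_file(fd, F_UNLCK, 1);

        if (n == (ssize_t)(sizeof(buf) - 1) && unlocked == 0)
            rc = 0;
    }

    close(fd);
    return rc;
}
```

To mirror the steps above, client 1 would run this sequence against `/mnt/disperse/file1` while client 2 is already waiting in its own blocking `F_WRLCK` request on the same file.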

Actual results:
Client 1 is unable to release the lock on the file and hangs.

Expected results:
Client 1 should be able to release the lock.

Comment 2 Worker Ant 2017-06-16 09:35:01 UTC

REVIEW: https://review.gluster.org/17556 (cluster/ec: lk shouldn't be a transaction) posted (#1) for review on release-3.11 by Pranith Kumar Karampuri (pkarampu)

Comment 3 Worker Ant 2017-06-19 15:49:54 UTC

COMMIT: https://review.gluster.org/17556 committed in release-3.11 by Shyamsundar Ranganathan (srangana) 
------
commit 6e377faf4490f20a63634c8baecb76886c0dac8a
Author: Pranith Kumar K <pkarampu>
Date:   Tue Jun 13 23:35:40 2017 +0530

    cluster/ec: lk shouldn't be a transaction
    
    Problem:
    When an application sends a blocking lock, the lk fop actually waits under
    inodelk.  This can lead to a deadlock:
    1) Let's say app-1 takes an exclusive fcntl lock on the file.
    2) app-2 attempts an exclusive fcntl lock on the file, which goes to the
       blocking stage. Note: app-2 is blocked inside a transaction which holds
       an inode-lock.
    3) app-1 tries to perform a write, which needs the inode-lock, so it gets
       blocked on app-2 to unlock inodelk, and app-2 is blocked on app-1 to
       unlock the fcntl lock.
    
    Fix:
    The correct way to fix this issue and make fcntl locks perform well would
    be to introduce 2-phase locking for fcntl locks:
    1) Implement a try-lock phase where locks xlator will not merge lk call with
       existing calls until a commit-lock phase.
    2) If in try-lock phase we get quorum number of success without any EAGAIN
       error, then send a commit-lock which will merge locks.
    3) In case there are any errors, unlock should just delete the lock-object
       which was tried earlier and shouldn't touch the committed locks.
    
    Unfortunately this is a sizeable feature and needs to be thought through
    for any corner cases.  Until then, remove the transaction from the lk call.
    
     >BUG: 1455049
     >Change-Id: I18a782903ba0eb43f1e6526fb0cf8c626c460159
     >Signed-off-by: Pranith Kumar K <pkarampu>
     >Reviewed-on: https://review.gluster.org/17542
     >Smoke: Gluster Build System <jenkins.org>
     >NetBSD-regression: NetBSD Build System <jenkins.org>
     >CentOS-regression: Gluster Build System <jenkins.org>
     >Reviewed-by: Ashish Pandey <aspandey>
     >Reviewed-by: Xavier Hernandez <xhernandez>
    
    BUG: 1462121
    Change-Id: I18a782903ba0eb43f1e6526fb0cf8c626c460159
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: https://review.gluster.org/17556
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
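
The 2-phase locking idea from the Fix section of the commit message was not implemented by this patch. Purely as an illustration, it can be sketched as a toy in-memory model. Everything below is hypothetical: the `struct brick` layout, the function names, and the quorum value are assumptions and do not correspond to the real locks or EC xlator code.

```c
#include <stdbool.h>
#include <stddef.h>

#define BRICKS 6 /* one 4 + 2 subvolume, as in the volume above */
#define QUORUM 4 /* assumption: quorum equals the number of data bricks */

enum try_result { TRY_OK, TRY_EAGAIN, TRY_ERROR };

/* Hypothetical per-brick lock state. Owner 0 means "no lock". */
struct brick {
    int committed_owner; /* lock already merged (committed) */
    int tentative_owner; /* lock-object created by a try-lock */
    bool down;           /* brick unreachable: fails without EAGAIN */
};

/* Phase 1: create a tentative lock-object without merging it. */
enum try_result try_lock(struct brick *b, int owner)
{
    if (b->down)
        return TRY_ERROR;
    if ((b->committed_owner != 0 && b->committed_owner != owner) ||
        (b->tentative_owner != 0 && b->tentative_owner != owner))
        return TRY_EAGAIN;
    b->tentative_owner = owner;
    return TRY_OK;
}

/* Phase 2: merge the tentative lock-object into the committed locks. */
void commit_lock(struct brick *b, int owner)
{
    if (b->tentative_owner == owner) {
        b->committed_owner = owner;
        b->tentative_owner = 0;
    }
}

/* Rollback: delete only the tentative lock-object, never committed locks. */
void abort_try(struct brick *b, int owner)
{
    if (b->tentative_owner == owner)
        b->tentative_owner = 0;
}

/* Returns true if the lock was committed, false if it was rolled back. */
bool two_phase_lock(struct brick *bricks, size_t n, size_t quorum, int owner)
{
    size_t ok = 0;
    bool eagain = false;
    size_t i;

    for (i = 0; i < n; i++) {
        enum try_result r = try_lock(&bricks[i], owner);
        if (r == TRY_OK)
            ok++;
        else if (r == TRY_EAGAIN)
            eagain = true;
    }

    if (!eagain && ok >= quorum) {
        for (i = 0; i < n; i++)
            commit_lock(&bricks[i], owner);
        return true;
    }

    for (i = 0; i < n; i++)
        abort_try(&bricks[i], owner);
    return false;
}
```

In this model a second owner that conflicts with a committed lock gets EAGAIN in the try phase and rolls back without disturbing the first owner's committed lock, which is the property step 3 of the Fix section asks for.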

Comment 4 Shyamsundar 2017-06-28 18:32:55 UTC

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.1, please open a new bug report.

glusterfs-3.11.1 has been announced on the Gluster mailing lists [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-June/000074.html
[2] https://www.gluster.org/pipermail/gluster-users/