Bug 1462121

Summary: [GNFS+EC] Unable to release the lock when the other client tries to acquire the lock on the same file
Product: [Community] GlusterFS
Reporter: Pranith Kumar K <pkarampu>
Component: disperse
Assignee: bugs <bugs>
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: urgent
Version: 3.11
CC: amukherj, aspandey, bugs, jthottan, msaini, pkarampu, rhinduja, sheggodu
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.11.1
Clone Of: 1455049
Last Closed: 2017-06-28 18:32:55 UTC
Type: Bug
Bug Depends On: 1455049
Bug Blocks: 1411338, 1444515

Comment 1 Pranith Kumar K 2017-06-16 08:59:48 UTC
Steps to Reproduce:
1. Create a 6-node Gluster cluster.
2. Create an EC volume, 2 x (4 + 2), and enable GNFS on the volume.
3. Mount the volume on 2 clients.
4. Create a file, say file1, of 512 bytes from client 1.
5. Take a lock on that file from client 1.
6. Try to take a lock on the same file from client 2. (The lock will not be granted to client 2 because it is already held by client 1.)
7. Release the lock from client 1.

Client 1:
-----
[root@dhcp37-192 home]# ./a.out /mnt/disperse/file1 
opening /mnt/disperse/file1
opened; hit Enter to lock... 
locking
locked; hit Enter to write... 
Write succeeeded 
locked; hit Enter to unlock... 
unlocking
-----

Client 2:
-----
[root@dhcp37-142 home]# ./a.out /mnt/disperse1/file1 
opening /mnt/disperse1/file1
opened; hit Enter to lock... 
locking
-----

Actual results:
Client 1 is unable to release the lock on the file and hangs.

Expected results:
Client 1 should be able to release the lock.
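
The source of ./a.out is not attached to this report. As a rough, non-interactive sketch of the lock, write, unlock sequence it appears to perform, assuming it uses blocking POSIX (fcntl) byte-range locks, something like the following could be run against a file on the mount. The function name lock_write_unlock is hypothetical.

```python
import fcntl
import os


def lock_write_unlock(path: str, data: bytes) -> int:
    """Take an exclusive POSIX lock on path, write data, then unlock.

    Returns the number of bytes written. Illustrative only: this is an
    assumption about what the reporter's test program does.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_EX without LOCK_NB blocks until the lock is granted,
        # which corresponds to the blocking-lock case in this report.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        try:
            written = os.write(fd, data)
        finally:
            # Releasing the lock is the step that hung on client 1.
            fcntl.lockf(fd, fcntl.LOCK_UN)
        return written
    finally:
        os.close(fd)
```

To approximate the reported scenario, one would call this on client 1 while a second process on client 2 is blocked in the same LOCK_EX call on the same file.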

Comment 2 Worker Ant 2017-06-16 09:35:01 UTC
REVIEW: https://review.gluster.org/17556 (cluster/ec: lk shouldn't be a transaction) posted (#1) for review on release-3.11 by Pranith Kumar Karampuri (pkarampu)

Comment 3 Worker Ant 2017-06-19 15:49:54 UTC
COMMIT: https://review.gluster.org/17556 committed in release-3.11 by Shyamsundar Ranganathan (srangana) 
------
commit 6e377faf4490f20a63634c8baecb76886c0dac8a
Author: Pranith Kumar K <pkarampu>
Date:   Tue Jun 13 23:35:40 2017 +0530

    cluster/ec: lk shouldn't be a transaction
    
    Problem:
    When an application sends a blocking lock, the lk fop actually waits under
    inodelk.  This can lead to a dead-lock.
    1) Let's say app-1 takes exclusive-fcntl-lock on the file
    2) app-2 attempts an exclusive-fcntl-lock on the file which goes to blocking
       stage.  Note: app-2 is blocked inside a transaction which holds an
       inode-lock
    3) app-1 tries to perform a write which needs the inode-lock, so it gets
       blocked on app-2 to unlock inodelk, and app-2 is blocked on app-1 to
       unlock fcntl-lock
    
    Fix:
    The correct way to fix this issue and make fcntl locks perform well would be
    to introduce 2-phase locking for fcntl locks:
    1) Implement a try-lock phase where locks xlator will not merge lk call with
       existing calls until a commit-lock phase.
    2) If in try-lock phase we get quorum number of success without any EAGAIN
       error, then send a commit-lock which will merge locks.
    3) In case there are any errors, unlock should just delete the lock-object
       which was tried earlier and shouldn't touch the committed locks.
    
    Unfortunately this is a sizeable feature and needs to be thought through for
    any corner cases.  Until then, remove the transaction from the lk call.
    
     >BUG: 1455049
     >Change-Id: I18a782903ba0eb43f1e6526fb0cf8c626c460159
     >Signed-off-by: Pranith Kumar K <pkarampu>
     >Reviewed-on: https://review.gluster.org/17542
     >Smoke: Gluster Build System <jenkins.org>
     >NetBSD-regression: NetBSD Build System <jenkins.org>
     >CentOS-regression: Gluster Build System <jenkins.org>
     >Reviewed-by: Ashish Pandey <aspandey>
     >Reviewed-by: Xavier Hernandez <xhernandez>
    
    BUG: 1462121
    Change-Id: I18a782903ba0eb43f1e6526fb0cf8c626c460159
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: https://review.gluster.org/17556
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
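
The 2-phase scheme outlined in the commit message was not implemented by this change. As a toy illustration of the idea only, here is a small in-memory model. Brick, try_lock, commit_lock, abort_lock and two_phase_lock are hypothetical names, not GlusterFS APIs, and the model covers only non-blocking exclusive locks on a single file.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class Brick:
    """Hypothetical stand-in for the lock state kept by one brick."""
    up: bool = True
    committed: set = field(default_factory=set)  # owners with merged locks
    tentative: set = field(default_factory=set)  # owners in try-lock phase

    def try_lock(self, owner: str) -> int:
        if not self.up:
            return errno.ENOTCONN
        # Any lock held or being tried by another owner is a conflict.
        if (self.committed | self.tentative) - {owner}:
            return errno.EAGAIN
        self.tentative.add(owner)
        return 0

    def commit_lock(self, owner: str) -> None:
        # Phase 2: merge the tried lock into the committed locks.
        self.tentative.discard(owner)
        self.committed.add(owner)

    def abort_lock(self, owner: str) -> None:
        # Delete only the tried lock object; committed locks are untouched.
        self.tentative.discard(owner)


def two_phase_lock(bricks, owner: str, quorum: int) -> bool:
    """Try-lock on every brick, then commit only on quorum with no EAGAIN."""
    results = [b.try_lock(owner) for b in bricks]
    successes = sum(1 for r in results if r == 0)
    if successes >= quorum and errno.EAGAIN not in results:
        for brick, result in zip(bricks, results):
            if result == 0:
                brick.commit_lock(owner)
        return True
    for brick in bricks:
        brick.abort_lock(owner)
    return False
```

For one 4 + 2 subvolume this could be exercised with six Brick objects and quorum=4. A second owner's attempt then fails with EAGAIN on every brick and leaves the first owner's committed locks intact. How a blocked lock would wait and retry is left out of the sketch.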

Comment 4 Shyamsundar 2017-06-28 18:32:55 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.1, please open a new bug report.

glusterfs-3.11.1 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-June/000074.html
[2] https://www.gluster.org/pipermail/gluster-users/