Bug 1462121 - [GNFS+EC] Unable to release the lock when the other client tries to acquire the lock on the same file
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: disperse
Version: 3.11
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: unspecified
Assigned To: bugs@gluster.org
: Triaged
Depends On: 1455049
Blocks: 1411338 1444515
Reported: 2017-06-16 04:58 EDT by Pranith Kumar K
Modified: 2017-07-19 13:06 EDT (History)
8 users

See Also:
Fixed In Version: glusterfs-3.11.1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1455049
Environment:
Last Closed: 2017-06-28 14:32:55 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Comment 1 Pranith Kumar K 2017-06-16 04:59:48 EDT
Steps to Reproduce:
1. Create a 6-node Gluster cluster.
2. Create an EC volume, 2 x (4 + 2), and enable GNFS on the volume.
3. Mount the volume on 2 clients.
4. Create a file, say file1, of 512 bytes from client 1.
5. Take an exclusive lock on the file from client 1.
6. Try taking the lock on the same file from client 2. (The lock will not be granted to client 2 because it is already held by client 1.)
7. Release the lock from client 1.

Client 1:
-----
[root@dhcp37-192 home]# ./a.out /mnt/disperse/file1 
opening /mnt/disperse/file1
opened; hit Enter to lock... 
locking
locked; hit Enter to write... 
Write succeeeded 
locked; hit Enter to unlock... 
unlocking
-----

Client 2
-----
[root@dhcp37-142 home]# ./a.out /mnt/disperse1/file1 
opening /mnt/disperse1/file1
opened; hit Enter to lock... 
locking
-----
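The a.out helper used in the transcripts above is not attached to this report. A minimal sketch of the fcntl locking it performs, assuming it takes a blocking exclusive whole-file lock (the file path, prompt wording, and helper name below are illustrative, not from the report), could look like:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* set_lock: place (F_WRLCK) or release (F_UNLCK) an exclusive fcntl
 * byte-range lock covering the whole file. F_SETLKW makes the call
 * block until the lock can be granted, which is the blocking mode
 * exercised in this bug. Returns 0 on success, -1 on error. */
static int set_lock(int fd, short lock_type)
{
    struct flock fl;
    fl.l_type   = lock_type;  /* F_WRLCK or F_UNLCK */
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 0;          /* 0 = lock to end of file */
    return fcntl(fd, F_SETLKW, &fl);
}
```

The reporter's tester presumably opens the file, pauses for Enter, calls set_lock(fd, F_WRLCK), writes, pauses again, then calls set_lock(fd, F_UNLCK), producing the prompts seen in the transcripts.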

Actual results:
Client 1 is unable to release the lock on the file and hangs.

Expected results:
Client 1 should be able to release the lock.
Comment 2 Worker Ant 2017-06-16 05:35:01 EDT
REVIEW: https://review.gluster.org/17556 (cluster/ec: lk shouldn't be a transaction) posted (#1) for review on release-3.11 by Pranith Kumar Karampuri (pkarampu@redhat.com)
Comment 3 Worker Ant 2017-06-19 11:49:54 EDT
COMMIT: https://review.gluster.org/17556 committed in release-3.11 by Shyamsundar Ranganathan (srangana@redhat.com) 
------
commit 6e377faf4490f20a63634c8baecb76886c0dac8a
Author: Pranith Kumar K <pkarampu@redhat.com>
Date:   Tue Jun 13 23:35:40 2017 +0530

    cluster/ec: lk shouldn't be a transaction
    
    Problem:
    When an application sends a blocking lock, the lk fop actually waits under
    inodelk. This can lead to a deadlock.
    1) Let's say app-1 takes an exclusive-fcntl-lock on the file
    2) app-2 attempts an exclusive-fcntl-lock on the file, which goes to the
       blocking stage. Note: app-2 is blocked inside a transaction which holds
       an inode-lock
    3) app-1 tries to perform a write, which needs the inode-lock, so it gets
       blocked on app-2 to unlock inodelk, and app-2 is blocked on app-1 to
       unlock the fcntl-lock
    
    Fix:
    Correct way to fix this issue and make fcntl locks perform well would be to
    introduce
    2-phase locking for fcntl lock:
    1) Implement a try-lock phase where locks xlator will not merge lk call with
       existing calls until a commit-lock phase.
    2) If in try-lock phase we get quorum number of success without any EAGAIN
       error, then send a commit-lock which will merge locks.
    3) In case there are any errors, unlock should just delete the lock-object
       which was tried earlier and shouldn't touch the committed locks.
    
    Unfortunately this is a sizeable feature and needs to be thought through
    for corner cases. Until then, remove the transaction from the lk call.
    
     >BUG: 1455049
     >Change-Id: I18a782903ba0eb43f1e6526fb0cf8c626c460159
     >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
     >Reviewed-on: https://review.gluster.org/17542
     >Smoke: Gluster Build System <jenkins@build.gluster.org>
     >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
     >CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
     >Reviewed-by: Ashish Pandey <aspandey@redhat.com>
     >Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
    
    BUG: 1462121
    Change-Id: I18a782903ba0eb43f1e6526fb0cf8c626c460159
    Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
    Reviewed-on: https://review.gluster.org/17556
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Comment 4 Shyamsundar 2017-06-28 14:32:55 EDT
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.1, please open a new bug report.

glusterfs-3.11.1 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-June/000074.html
[2] https://www.gluster.org/pipermail/gluster-users/
