Red Hat Bugzilla – Bug 1462121
[GNFS+EC] Unable to release the lock when the other client tries to acquire the lock on the same file
Last modified: 2017-07-19 13:06:26 EDT
Steps to Reproduce:
1. Create a 6-node Gluster cluster.
2. Create an EC volume, 2 x (4 + 2). Enable GNFS on the volume.
3. Mount the volume on 2 clients.
4. Create a file, say file1, of 512 bytes from client 1.
5. Now take a lock on the file from client 1.
6. Try taking a lock on the same file from client 2. (The lock will not be granted to client 2 because it is already held by client 1.)
7. Now release the lock from client 1.
[root@dhcp37-192 home]# ./a.out /mnt/disperse/file1
opened; hit Enter to lock...
locked; hit Enter to write...
locked; hit Enter to unlock...
[root@dhcp37-142 home]# ./a.out /mnt/disperse1/file1
opened; hit Enter to lock...
Actual results: Client 1 is unable to release the lock on the file and hangs.
Expected results: Client 1 should be able to release the lock on the file.
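The source of the test program (a.out) is not attached to the bug, but judging from its prompts it performs an open/lock/write/unlock sequence with POSIX fcntl record locks. A minimal non-interactive sketch of that sequence (function names and file path are illustrative, not taken from the bug):

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Apply a POSIX record lock over the whole file; F_SETLKW blocks
   until the lock can be granted, which is why client 2 waits while
   client 1 holds the lock. */
static int set_lock(int fd, short type)
{
    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;      /* F_WRLCK to lock, F_UNLCK to release */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;          /* 0 = lock the whole file */
    return fcntl(fd, F_SETLKW, &fl);
}

/* Open, lock, write, unlock -- the sequence the two clients race on. */
int lock_write_unlock(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (set_lock(fd, F_WRLCK) < 0)   /* "locked; ..." */
        goto fail;
    if (write(fd, "data", 4) < 0)
        goto fail;
    if (set_lock(fd, F_UNLCK) < 0)   /* the step that hangs in the bug */
        goto fail;
    close(fd);
    return 0;
fail:
    close(fd);
    return -1;
}
```

On a healthy mount, the final F_UNLCK should return immediately; in the reported bug it never completes on the GNFS+EC mount.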
REVIEW: https://review.gluster.org/17556 (cluster/ec: lk shouldn't be a transaction) posted (#1) for review on release-3.11 by Pranith Kumar Karampuri (email@example.com)
COMMIT: https://review.gluster.org/17556 committed in release-3.11 by Shyamsundar Ranganathan (firstname.lastname@example.org)
Author: Pranith Kumar K <email@example.com>
Date: Tue Jun 13 23:35:40 2017 +0530
cluster/ec: lk shouldn't be a transaction
When the application sends a blocking lock, the lk fop actually waits under
an inodelk. This can lead to a deadlock:
1) Let's say app-1 takes an exclusive fcntl lock on the file.
2) app-2 attempts an exclusive fcntl lock on the file, which goes to the
blocking stage. Note: app-2 is blocked inside the transaction, which holds an inodelk.
3) app-1 tries to perform a write, which needs the inodelk, so it gets blocked on
app-2 to unlock the inodelk, while app-2 is blocked on app-1 to unlock the fcntl lock.
The correct way to fix this issue and make fcntl locks perform well would be to
implement 2-phase locking for fcntl locks:
1) Implement a try-lock phase where locks xlator will not merge lk call with
existing calls until a commit-lock phase.
2) If in try-lock phase we get quorum number of success without any EAGAIN
error, then send a commit-lock which will merge locks.
3) In case there are any errors, unlock should just delete the lock-object
which was tried earlier and shouldn't touch the committed locks.
Unfortunately this is a sizeable feature and needs to be thought through for any
corner cases. Until then, remove the transaction from the lk call.
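The try-lock/commit-lock design described above can be modelled abstractly as a lock table where provisional entries are never merged until committed. The sketch below is purely illustrative (these are not gluster locks-xlator APIs; all names are hypothetical) and only captures the bookkeeping rule from step 3: aborting a tried lock must not touch committed locks.

```c
#include <stdlib.h>

/* Hypothetical model of the proposed two-phase fcntl-lock protocol. */
enum lk_state { LK_TRIED, LK_COMMITTED };

struct lk_entry {
    enum lk_state state;
    int owner;
    struct lk_entry *next;
};

struct lk_table {
    struct lk_entry *head;
};

/* Phase 1 (try-lock): record the lock provisionally; it is NOT merged
   with existing committed locks yet. */
struct lk_entry *lk_try(struct lk_table *t, int owner)
{
    struct lk_entry *e = calloc(1, sizeof(*e));
    if (!e)
        return NULL;
    e->state = LK_TRIED;
    e->owner = owner;
    e->next = t->head;
    t->head = e;
    return e;
}

/* Phase 2 (commit-lock): only after quorum success with no EAGAIN is the
   lock made permanent (a real implementation would merge ranges here). */
void lk_commit(struct lk_entry *e)
{
    e->state = LK_COMMITTED;
}

/* On error: delete only the lock object that was tried earlier;
   committed locks are left untouched. */
void lk_abort(struct lk_table *t, struct lk_entry *e)
{
    struct lk_entry **pp = &t->head;
    while (*pp && *pp != e)
        pp = &(*pp)->next;
    if (*pp) {
        *pp = e->next;
        free(e);
    }
}
```

Because the try phase never merges, a failed attempt can be rolled back cleanly, which is exactly what makes the unlock-after-failure path safe.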
>Signed-off-by: Pranith Kumar K <firstname.lastname@example.org>
>Smoke: Gluster Build System <email@example.com>
>NetBSD-regression: NetBSD Build System <firstname.lastname@example.org>
>CentOS-regression: Gluster Build System <email@example.com>
>Reviewed-by: Ashish Pandey <firstname.lastname@example.org>
>Reviewed-by: Xavier Hernandez <email@example.com>
Signed-off-by: Pranith Kumar K <firstname.lastname@example.org>
Smoke: Gluster Build System <email@example.com>
NetBSD-regression: NetBSD Build System <firstname.lastname@example.org>
CentOS-regression: Gluster Build System <email@example.com>
Reviewed-by: Shyamsundar Ranganathan <firstname.lastname@example.org>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.1, please open a new bug report.
glusterfs-3.11.1 has been announced on the Gluster mailing lists; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list and the update infrastructure for your distribution.