Bug 1455049
| Field | Value |
| --- | --- |
| Summary | [GNFS+EC] Unable to release the lock when the other client tries to acquire the lock on the same file |
| Product | [Community] GlusterFS |
| Reporter | Ashish Pandey <aspandey> |
| Component | disperse |
| Assignee | bugs <bugs> |
| Status | CLOSED CURRENTRELEASE |
| QA Contact | |
| Severity | unspecified |
| Docs Contact | |
| Priority | urgent |
| Version | mainline |
| CC | amukherj, aspandey, bugs, jthottan, msaini, pkarampu, rhinduja, sheggodu |
| Target Milestone | --- |
| Keywords | Triaged |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Whiteboard | |
| Fixed In Version | glusterfs-3.12.0 |
| Doc Type | If docs needed, set a value |
| Doc Text | |
| Story Points | --- |
| Clone Of | 1444515 |
| | 1462121 (view as bug list) |
| Environment | |
| Last Closed | 2017-08-10 11:18:56 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 1411338, 1444515, 1462121 |
| Attachments | Lock script (attachment 1281848) |
Comment 1
Ashish Pandey
2017-05-24 07:27:54 UTC
Created attachment 1281848 [details]
Lock script
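The attachment itself is not reproduced here. As an illustration only, the following is a minimal sketch of a reproducer in the spirit of the attached lock script: it is assumed to open the file named on the command line, take an exclusive fcntl() write lock on bytes 0-9 (matching the start=0, len=10 range in the statedump), write, and unlock, pausing for Enter between steps. The prompts are modelled on the terminal output further down; everything else is hypothetical, not the actual attachment.

```c
/* Hypothetical reproducer sketch (not the actual attached script). Run one
 * instance from each of two clients against the same file and step them
 * through lock/write/unlock interactively. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void wait_enter(const char *msg)
{
    int c;
    printf("%s", msg);
    fflush(stdout);
    while ((c = getchar()) != '\n' && c != EOF)
        ;
}

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    printf("opening %s\n", argv[1]);
    int fd = open(argv[1], O_CREAT | O_RDWR, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    wait_enter("opened; hit Enter to lock...\n");

    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type   = F_WRLCK;    /* exclusive write lock */
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 10;         /* bytes 0-9, as seen in the statedump */

    printf("locking\n");
    if (fcntl(fd, F_SETLKW, &fl) < 0) {   /* blocking lock request */
        perror("fcntl(F_SETLKW)");
        return 1;
    }

    wait_enter("locked; hit Enter to write...\n");
    if (write(fd, "0123456789", 10) == 10)
        printf("Write succeeded\n");
    else
        perror("write");

    wait_enter("locked; hit Enter to unlock...\n");
    printf("unlocking\n");
    fl.l_type = F_UNLCK;
    if (fcntl(fd, F_SETLK, &fl) < 0)
        perror("fcntl(F_UNLCK)");

    close(fd);
    return 0;
}
```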
Lock state from the locks xlator statedump:

```
[xlator.features.locks.testvol-locks.inode]
path=/file
mandatory=0
inodelk-count=1
lock-dump.domain.domain=testvol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 20283, owner=70e500f8957f0000, client=0x7f6ff80b08b0, connection-id=apandey-20134-2017/05/24-07:57:36:665823-testvol-client-0-0-0, granted at 2017-05-24 07:59:24   <<< EC lock taken by the second lock request
posixlk-count=2
posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=10, pid = 20266, owner=2f33c5d2250866c3, client=0x7f6ff80b08b0, connection-id=(null), granted at 2017-05-24 07:58:18   <<< posix lock taken by the first lock request
posixlk.posixlk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=10, pid = 20283, owner=9f4f2ac611372d51, client=0x7f6ff80b08b0, connection-id=(null), blocked at 2017-05-24 07:59:24   <<< second posix lock request BLOCKED
```

Now, to unlock the first posix lock, the client has to acquire the EC inodelk, which cannot be taken because it is already held by the second lock request. This causes a deadlock.

With https://review.gluster.org/#/c/17542/ applied:

Term-1:

```
root@dhcp35-190 - /mnt/ec2
17:48:27 :) ⚡ /root/a.out a
opening a
opened; hit Enter to lock...
locking
locked; hit Enter to write...
Write succeeeded
locked; hit Enter to unlock...
unlocking
```

Term-2:

```
root@dhcp35-190 - /mnt/ec2
17:49:02 :) ⚡ /root/a.out a
opening a
opened; hit Enter to lock...
locking
locked; hit Enter to write...
Write succeeeded
locked; hit Enter to unlock...
unlocking
```

Will also do cthon tests.

```
** PARENT pass 1 results: 49/49 pass, 1/1 warn, 0/0 fail (pass/total).
** CHILD pass 1 results: 64/64 pass, 0/0 warn, 0/0 fail (pass/total).
Congratulations, you passed the locking tests!
All tests completed
```

REVIEW: https://review.gluster.org/17542 (cluster/ec: lk shouldn't be a transaction) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

COMMIT: https://review.gluster.org/17542 committed in master by Xavier Hernandez (xhernandez)

------

commit 26ca39ccf0caf0d55c88b05396883dd10ab66dc4
Author: Pranith Kumar K <pkarampu>
Date: Tue Jun 13 23:35:40 2017 +0530

cluster/ec: lk shouldn't be a transaction

Problem:
When an application sends a blocking lock, the lk fop actually waits under an inodelk, which can lead to a deadlock:

1) app-1 takes an exclusive fcntl lock on the file.
2) app-2 attempts an exclusive fcntl lock on the same file, which goes into the blocking stage. Note: app-2 is blocked inside a transaction that holds an inodelk.
3) app-1 tries to perform a write, which needs the inodelk; it therefore blocks on app-2 to release the inodelk, while app-2 is blocked on app-1 to release the fcntl lock.

Fix:
The correct way to fix this issue, and make fcntl locks perform well, would be to introduce two-phase locking for fcntl locks:

1) Implement a try-lock phase in which the locks xlator does not merge the lk call with existing locks until a commit-lock phase.
2) If the try-lock phase gets a quorum of successes without any EAGAIN error, send a commit-lock that merges the locks.
3) If there are any errors, unlock should just delete the lock object that was tried earlier and must not touch the committed locks.

Unfortunately this is a sizeable feature that needs to be thought through for corner cases. Until then, remove the transaction from the lk call.

BUG: 1455049
Change-Id: I18a782903ba0eb43f1e6526fb0cf8c626c460159
Signed-off-by: Pranith Kumar K <pkarampu>
Reviewed-on: https://review.gluster.org/17542
Smoke: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Ashish Pandey <aspandey>
Reviewed-by: Xavier Hernandez <xhernandez>

This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/
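The merged change itself only removes the transaction from the lk call; the two-phase approach described under "Fix" in the commit message above was left as future work. As a rough illustration of that proposal (and only an illustration: the brick-reply values, quorum count, and every function name below are hypothetical, not the GlusterFS locks/EC API), a simulation of the try-lock / commit-lock decision might look like this:

```c
/* Illustrative sketch only: the two-phase (try-lock / commit-lock) idea
 * proposed in the commit message, simulated over an array of per-brick
 * replies. None of this is the actual GlusterFS implementation. */
#include <stdio.h>

#define NBRICKS 3   /* e.g. a 2+1 disperse subvolume */
#define QUORUM  2

enum brick_reply { LOCK_OK, LOCK_EAGAIN, LOCK_ERROR };

/* Phase 1: each brick records the lock as "tried" but does not merge it
 * with already-granted locks, so it can still be rolled back cheaply. */
static int try_lock(const enum brick_reply replies[], int *eagain_seen)
{
    int ok = 0;
    for (int i = 0; i < NBRICKS; i++) {
        if (replies[i] == LOCK_OK)
            ok++;
        else if (replies[i] == LOCK_EAGAIN)
            *eagain_seen = 1;
    }
    return ok;
}

static int two_phase_lk(const enum brick_reply replies[])
{
    int eagain_seen = 0;
    int ok = try_lock(replies, &eagain_seen);

    if (!eagain_seen && ok >= QUORUM) {
        /* Phase 2: commit-lock merges the tried lock into the granted set. */
        printf("commit-lock: merged on %d bricks\n", ok);
        return 0;
    }

    /* Roll back: delete only the tried lock objects; locks that were
     * already committed on the bricks are never touched. */
    printf("undo try-lock (ok=%d, eagain=%d)\n", ok, eagain_seen);
    return -1;
}

int main(void)
{
    enum brick_reply clean[NBRICKS]     = { LOCK_OK, LOCK_OK, LOCK_OK };
    enum brick_reply contended[NBRICKS] = { LOCK_OK, LOCK_EAGAIN, LOCK_OK };

    two_phase_lk(clean);       /* quorum of clean successes -> commit */
    two_phase_lk(contended);   /* EAGAIN anywhere -> undo, no merge   */
    return 0;
}
```

The key property is point (3) of the proposal: a failed try-lock is rolled back by deleting only the lock object that was tried, so locks already committed on the bricks are never disturbed.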