Bug 1413062
| Field | Value |
|---|---|
| Summary | [Arbiter] IO's Halted and heal info command hung |
| Product | [Community] GlusterFS |
| Component | arbiter |
| Version | 3.9 |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | unspecified |
| Reporter | Pranith Kumar K <pkarampu> |
| Assignee | Pranith Kumar K <pkarampu> |
| CC | amukherj, bugs, ksandha, pkarampu, ravishankar, rhinduja, rhs-bugs, storage-qa-internal |
| Keywords | Triaged |
| Hardware | All |
| OS | Linux |
| Fixed In Version | glusterfs-3.9.1 |
| Clone Of | 1401404 |
| Bug Depends On | 1398188, 1401404 |
| Bug Blocks | 1412909 |
| Type | Bug |
| Last Closed | 2017-03-08 10:24:50 UTC |
Comment 1 by Worker Ant, 2017-01-13 14:41:33 UTC
COMMIT: http://review.gluster.org/16413 committed in release-3.9 by Pranith Kumar Karampuri (pkarampu)

------

commit 042d4aaa55706332d5390d8ee01c6a747c6c3118
Author: Pranith Kumar K <pkarampu>
Date: Mon Dec 5 13:20:51 2016 +0530

cluster/afr: Remove backward compatibility for locks with v1

When we have cascading locks with the same lk-owner, there is a possibility of a deadlock. One example is as follows: self-heal takes a lock in the data domain for a big name with 256 chars of "aaaa...a" and starts healing in a 3-way replication while brick-0 is offline and healing from brick-1 to brick-2 is in progress, so this lock is active on brick-1 and brick-2. Now brick-0 comes online, and another operation wants to take a full lock; that lock is granted on brick-0 and waits for the lock on brick-1. As part of entry healing, self-heal takes full locks on all the available bricks and then proceeds with healing the entry. This lock now starts waiting on brick-0, because the other operation already holds a granted lock there. The result is a deadlock: the operation is waiting for heal to unlock "aaaa...", while heal is waiting for the operation to unlock brick-0.

Initially I thought this was happening because healing tries to take a lock on all the available bricks instead of just the bricks that are participating in the heal. But I later realized that the same kind of deadlock can happen if a brick goes down after the heal starts but comes back before it completes. So the essential problem is the cascading locks with the same lk-owner, which were added for backward compatibility with afr-v1 and can be safely removed now that versions with afr-v1 are already EOL. This patch removes the compatibility with v1 that requires cascading locks with the same lk-owner. In the next version we can make the locking-scheme option a dummy and switch completely to v2.

>BUG: 1401404
>Change-Id: Ic9afab8260f5ff4dff5329eb0429811bcb879079
>Signed-off-by: Pranith Kumar K <pkarampu>
>Reviewed-on: http://review.gluster.org/16024
>Smoke: Gluster Build System <jenkins.org>
>Reviewed-by: Ravishankar N <ravishankar>
>NetBSD-regression: NetBSD Build System <jenkins.org>
>CentOS-regression: Gluster Build System <jenkins.org>

BUG: 1413062
Change-Id: I4f5d485d9e0646ad3dc384e5ec36682b0933c9d3
Signed-off-by: Pranith Kumar K <pkarampu>
Reviewed-on: http://review.gluster.org/16413
Smoke: Gluster Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>

This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.9.1, please open a new bug report.

glusterfs-3.9.1 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-January/029725.html
[2] https://www.gluster.org/pipermail/gluster-users/
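The deadlock described in the commit message is a classic lock-ordering cycle between two lock holders. Below is a minimal, illustrative pthreads sketch — not GlusterFS code; the `brick` mutex array and the `heal`/`op` threads are hypothetical stand-ins for AFR's per-brick domain locks — that reproduces the interleaving: heal holds brick-1 and brick-2 and then waits for brick-0, while another operation holds brick-0 and waits for brick-1. `pthread_mutex_timedlock` is used so the demo reports the cycle instead of hanging the way the halted IOs and `heal info` command did.

```c
/* Deadlock sketch: compile with `cc -pthread deadlock.c` on Linux.
 * (pthread_mutex_timedlock is POSIX but absent on e.g. macOS.)
 * Hypothetical model of the scenario in the commit message above. */
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Three "bricks", each modeled as a single lock. */
static pthread_mutex_t brick[3] = { PTHREAD_MUTEX_INITIALIZER,
                                    PTHREAD_MUTEX_INITIALIZER,
                                    PTHREAD_MUTEX_INITIALIZER };

/* Try to lock brick `b`; report a suspected deadlock after 3 seconds
 * instead of blocking the demo forever. */
static int take(const char *who, int b)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 3;
    if (pthread_mutex_timedlock(&brick[b], &deadline) != 0) {
        printf("%s: stuck waiting for brick-%d (deadlock)\n", who, b);
        return -1;
    }
    printf("%s: holds brick-%d\n", who, b);
    return 0;
}

/* Self-heal: holds brick-1 and brick-2 (brick-0 was offline), then
 * wants a full lock on the returned brick-0 as well. */
static void *heal(void *arg)
{
    (void)arg;
    take("heal", 1);
    take("heal", 2);
    sleep(1);            /* brick-0 comes back; meanwhile op locked it */
    take("heal", 0);     /* blocks: op holds brick-0 */
    return NULL;
}

/* Another operation: its full lock is granted on brick-0 first,
 * then it waits for brick-1, which heal holds. */
static void *op(void *arg)
{
    (void)arg;
    take("op", 0);
    sleep(1);
    take("op", 1);       /* blocks: heal holds brick-1 -> cycle */
    return NULL;
}

int main(void)
{
    pthread_t t_heal, t_op;
    pthread_create(&t_heal, NULL, heal, NULL);
    pthread_create(&t_op, NULL, op, NULL);
    pthread_join(t_heal, NULL);
    pthread_join(t_op, NULL);
    return 0;
}
```

As the commit message explains, the actual patch breaks this cycle at its source rather than by ordering or timing out locks: with afr-v1 compatibility removed, heal no longer takes cascading locks under the same lk-owner, so the second waiter in the cycle never forms.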