| Summary: | [Arbiter] IO's Halted and heal info command hung | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Pranith Kumar K <pkarampu> |
| Component: | arbiter | Assignee: | bugs <bugs> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | mainline | CC: | amukherj, bugs, ksandha, pkarampu, ravishankar, rhinduja, rhs-bugs, storage-qa-internal |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.10.0 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1398188 | | |
| : | 1412909 1413062 (view as bug list) | Environment: | |
| Last Closed: | 2017-03-06 17:37:56 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | 1398188 | | |
| Bug Blocks: | 1412909, 1413062 | | |
Comment 1
Worker Ant
2016-12-05 08:43:25 UTC
COMMIT: http://review.gluster.org/16024 committed in master by Pranith Kumar Karampuri (pkarampu)

------

commit c4b39198df40535f589c9304fd07b06d948df2f5
Author: Pranith Kumar K <pkarampu>
Date: Mon Dec 5 13:20:51 2016 +0530

cluster/afr: Remove backward compatibility for locks with v1

When we have cascading locks with the same lk-owner, there is a possibility of a deadlock. One example is as follows: self-heal takes a lock in the data-domain on a big name of 256 characters ("aaaa...a") and starts healing in a 3-way replication while brick-0 is offline and healing from brick-1 to brick-2 is in progress. This lock is therefore active on brick-1 and brick-2. Now brick-0 comes online, and another operation wants to take a full lock; that lock is granted at brick-0 and waits for the lock on brick-1. As part of entry healing, self-heal takes full locks on all the available bricks before proceeding to heal the entry. That lock now starts waiting on brick-0, because the other operation already holds a granted lock there. This leads to a deadlock: the operation is waiting for heal to unlock "aaaa...", whereas heal is waiting for the operation to unlock brick-0.

Initially I thought this was happening because healing takes a lock on all available bricks instead of just the bricks participating in the heal, but I later realized the same kind of deadlock can happen if a brick goes down after the heal starts and comes back before it completes. So the essential problem is the cascading locks with the same lk-owner, which were added for backward compatibility with afr-v1 and can be safely removed now that the versions shipping afr-v1 are already EOL. This patch removes the compatibility with v1, which required the cascading locks with the same lk-owner. In the next version we can make the locking-scheme option a dummy and switch completely to v2.

BUG: 1401404
Change-Id: Ic9afab8260f5ff4dff5329eb0429811bcb879079
Signed-off-by: Pranith Kumar K <pkarampu>
Reviewed-on: http://review.gluster.org/16024
Smoke: Gluster Build System <jenkins.org>
Reviewed-by: Ravishankar N <ravishankar>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>

This bug is being closed because a release is available that should address the reported issue. If the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/
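The deadlock described in the commit message boils down to a circular wait across two bricks: self-heal already holds its lock on brick-1 and asks for brick-0, while the other operation already holds its lock on brick-0 and asks for brick-1. The sketch below illustrates only that circular wait with plain POSIX threads; the `brick0`/`brick1` mutexes, the thread roles, the barrier, and the 2-second timeout are illustrative assumptions and not GlusterFS's actual AFR/locks-translator code.

```c
/*
 * Minimal sketch of the circular wait described in the commit message,
 * using plain pthreads.  The "brick" mutexes and thread roles are
 * illustrative only, not GlusterFS's real locking.
 *
 *   - the "self-heal" thread already holds its lock on brick-1 (taken
 *     while brick-0 was offline) and then asks for brick-0;
 *   - the "operation" thread already holds its lock on brick-0 and
 *     then asks for brick-1.
 *
 * pthread_mutex_timedlock() is used only so the demo reports the
 * deadlock instead of hanging forever.
 */
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

static pthread_mutex_t brick0 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t brick1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t both_hold_first;   /* forces the interleaving */

/* Hold @first, wait until both threads hold their first lock, then try
 * to take @second with a 2-second deadline. */
static void
lock_in_order(const char *who, pthread_mutex_t *first, pthread_mutex_t *second,
              const char *first_name, const char *second_name)
{
    struct timespec deadline;

    pthread_mutex_lock(first);
    printf("%s: holds %s, now waiting for %s\n", who, first_name, second_name);

    pthread_barrier_wait(&both_hold_first);

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 2;

    if (pthread_mutex_timedlock(second, &deadline) == ETIMEDOUT)
        printf("%s: still waiting for %s -> circular wait (deadlock)\n",
               who, second_name);
    else
        pthread_mutex_unlock(second);

    pthread_mutex_unlock(first);
}

static void *heal_thread(void *arg)
{
    (void)arg;
    /* entry heal already holds brick-1, then asks for brick-0 */
    lock_in_order("self-heal", &brick1, &brick0, "brick-1", "brick-0");
    return NULL;
}

static void *op_thread(void *arg)
{
    (void)arg;
    /* the other operation already holds brick-0, then asks for brick-1 */
    lock_in_order("operation", &brick0, &brick1, "brick-0", "brick-1");
    return NULL;
}

int main(void)
{
    pthread_t heal, op;

    pthread_barrier_init(&both_hold_first, NULL, 2);
    pthread_create(&heal, NULL, heal_thread, NULL);
    pthread_create(&op, NULL, op_thread, NULL);
    pthread_join(heal, NULL);
    pthread_join(op, NULL);
    pthread_barrier_destroy(&both_hold_first);
    return 0;
}
```

Built with `gcc -pthread`, both threads report that they are still waiting: neither lock request can ever be granted because each side is blocked on a lock the other side will only release after its own request succeeds. Removing the v1-compatibility cascading locks with the same lk-owner eliminates the second, conflicting lock request that creates this cycle.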