Bug 1398888
Summary: | self-heal info command hangs after triggering self-heal | ||
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Krutika Dhananjay <kdhananj> |
Component: | replicate | Assignee: | Krutika Dhananjay <kdhananj> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 3.9 | CC: | amukherj, bugs, rhs-bugs, sasundar, storage-qa-internal |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-3.9.1 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | 1398566 | Environment: |
RHV-RHGS HCI
|
Last Closed: | 2017-03-08 10:19:47 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1396166, 1398566 | ||
Bug Blocks: |
Description
Krutika Dhananjay
2016-11-27 02:20:55 UTC
REVIEW: http://review.gluster.org/15932 (cluster/afr: Fix deadlock due to compound fops) posted (#1) for review on release-3.9 by Krutika Dhananjay (kdhananj) COMMIT: http://review.gluster.org/15932 committed in release-3.9 by Pranith Kumar Karampuri (pkarampu) ------ commit 13725b3f30f90a11771602c546875eb70831ae5d Author: Krutika Dhananjay <kdhananj> Date: Fri Nov 25 15:54:30 2016 +0530 cluster/afr: Fix deadlock due to compound fops Backport of: http://review.gluster.org/15929 When an afr data transaction is eligible for using eager-lock, this information is represented in local->transaction.eager_lock_on. However, if non-blocking inodelk attempt (which is a full lock) fails, AFR falls back to blocking locks which are range locks. At this point, local->transaction.eager_lock[] per brick is reset but local->transaction.eager_lock_on is still true. When AFR decides to compound post-op and unlock, it is after confirming that the transaction did not use eager lock (well, except for a small bug where local->transaction.locks_acquired[] is not considered). But within afr_post_op_unlock_do(), afr again incorrectly sets the lock range to full-lock based on local->transaction.eager_lock_on value. This is a bug and can lead to deadlock since the locks acquired were range locks and a full unlock is being sent leading to unlock failure and thereby every other lock request (be it from SHD or other clients or glfsheal) getting blocked forever and the user perceives a hang. FIX: Unconditionally rely on the range locks in inodelk object for unlocking when using compounded post-op + unlock. Big thanks to Pranith for helping with the debugging. Change-Id: I2edcc13ac00bc1ba2e3558891ba98d0cd410b47a BUG: 1398888 Signed-off-by: Krutika Dhananjay <kdhananj> Reviewed-on: http://review.gluster.org/15932 Smoke: Gluster Build System <jenkins.org> NetBSD-regression: NetBSD Build System <jenkins.org> CentOS-regression: Gluster Build System <jenkins.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu> This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.9.1, please open a new bug report. glusterfs-3.9.1 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution. [1] http://lists.gluster.org/pipermail/gluster-users/2017-January/029725.html [2] https://www.gluster.org/pipermail/gluster-users/ |