Bug 847214
Summary: | glusterd operations hang if the other peers are down | |||
---|---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Pranith Kumar K <pkarampu> | |
Component: | glusterd | Assignee: | krishnan parthasarathi <kparthas> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | ||
Severity: | unspecified | Docs Contact: | ||
Priority: | medium | |||
Version: | mainline | CC: | amarts, gluster-bugs, jdarcy, nsathyan | |
Target Milestone: | --- | |||
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | glusterfs-3.4.0 | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 852147 | Environment: | ||
Last Closed: | 2013-07-24 17:43:19 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 852147, 918917 |
Description
Pranith Kumar K
2012-08-10 06:39:51 UTC
The infinite looping of state transitions in the operation state machine was fixed in http://review.gluster.org/4043. It happened because the notify function queued events into the operation state machine on every invocation (triggered on reconnect, once every 3 seconds). Concurrently, the operation state machine processes all the events in its queue, so it was possible for the state machine to keep dequeuing events ad infinitum. This was perceived as a hang: at any given moment the epoll thread could be busy executing glusterd_op_sm(), which processes every event in the op_sm queue.

Moving it to ON_DEV, since this is fixed on both master and release-3.4.

master, release-3.4: http://review.gluster.org/4043 - fixed before release-3.4 was branched from master.

REVIEW: http://review.gluster.org/4869 (glusterd: Removed 'proactive' failing of volume op) posted (#1) for review on master by Krishnan Parthasarathi (kparthas)

COMMIT: http://review.gluster.org/4869 committed in master by Vijay Bellur (vbellur)
------
commit 3b1ecc6a7fd961c709e82862fd4760b223365863
Author: Krishnan Parthasarathi <kparthas>
Date:   Mon Apr 22 12:27:07 2013 +0530

    glusterd: Removed 'proactive' failing of volume op

    Volume operations were failed 'proactively', on the first disconnect of
    a peer that was participating in the transaction. The reason behind
    having this kludgey code in the first place was to 'abort' an ongoing
    volume operation as soon as we perceive the first disconnect. But the
    rpc call backs themselves are capable of injecting appropriate state
    machine events, which would set things in motion for an eventual abort
    of the transaction.

    Change-Id: Iad7cb2bd076f22d89a793dfcd08c2d208b39c4be
    BUG: 847214
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/4869
    Reviewed-by: Jeff Darcy <jdarcy>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>
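The hang explained in the first comment (a loop that drains the op_sm queue while notify keeps adding to it) can be illustrated with a deliberately simplified, single-threaded Python model. This is a sketch under stated assumptions, not glusterd's code. drain_op_sm(), notify_every_time() and make_notify_once() are hypothetical names. The concurrent reconnect notify is modelled as a callback that fires once per dequeued event. The "notify once" variant is only an illustrative fix and not necessarily what review 4043 changed. The point is that a loop which runs "until the queue is empty" never returns if a producer adds at least one event for every event it removes.

```python
from collections import deque


def drain_op_sm(queue, on_dequeue, max_steps=10_000):
    """Process queued events until the queue is empty.

    Models the "process every event in the queue" loop described for
    glusterd_op_sm(). on_dequeue stands in for work that runs while the
    loop is active (here, a peer notify). Returns the number of events
    processed, or None if max_steps is reached, i.e. the loop would not
    have returned on its own.
    """
    processed = 0
    while queue:
        if processed == max_steps:
            return None
        queue.popleft()        # handle one event (handler body omitted)
        processed += 1
        on_dequeue(queue)      # the "concurrent" notify gets a turn
    return processed


def notify_every_time(queue):
    # Buggy pattern (hypothetical model): every notify queues another event.
    queue.append("peer-disconnect")


def make_notify_once():
    # Hypothetical fix for illustration only: queue at most one event.
    injected = False

    def notify(queue):
        nonlocal injected
        if not injected:
            queue.append("peer-disconnect")
            injected = True

    return notify


print(drain_op_sm(deque(["op-start"]), notify_every_time))   # None
print(drain_op_sm(deque(["op-start"]), make_notify_once()))  # 2
```

The first call stops only because of the artificial max_steps guard. The loop described in this bug had no such bound, which is why the epoll thread appeared to hang.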
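The rationale in the commit for review 4869 can be illustrated with a similar toy model. Everything in it is hypothetical. This includes the Txn class, the event names ABORT, REJECTED and ALL_ACCEPTED, and the assumption that the RPC layer still invokes the callback of an outstanding request, with an error, when the connection drops. Under those assumptions, proactive failing makes a disconnect inject an abort event of its own, and the failed RPC's callback then injects a second, redundant one. Without proactive failing, the callback alone drives the transaction toward its eventual abort.

```python
class Txn:
    """Hypothetical model of a volume-op transaction awaiting peer replies."""

    def __init__(self, peers):
        self.pending = set(peers)  # peers whose RPC reply is outstanding
        self.failed = set()        # peers whose RPC failed
        self.events = []           # events injected into the op state machine

    def rpc_callback(self, peer, ok):
        # Assumption: when a connection drops, the RPC layer still invokes
        # the callback of the outstanding request, with ok=False.
        if peer not in self.pending:
            return  # ignore late or duplicate replies
        self.pending.discard(peer)
        if not ok:
            self.failed.add(peer)
        if not self.pending:
            # Every reply is accounted for: inject exactly one outcome event.
            self.events.append("REJECTED" if self.failed else "ALL_ACCEPTED")

    def on_disconnect(self, peer, proactive):
        if proactive:
            # Old behaviour (removed by the commit): fail the op as soon as
            # a participating peer disconnects.
            self.events.append("ABORT")
        # New behaviour: nothing to do here; the failed RPC's callback
        # injects the event that leads to the abort.


def run(proactive):
    txn = Txn(["peer1", "peer2"])
    txn.rpc_callback("peer1", ok=True)
    txn.on_disconnect("peer2", proactive)
    txn.rpc_callback("peer2", ok=False)
    return txn.events


print(run(proactive=True))   # ['ABORT', 'REJECTED']
print(run(proactive=False))  # ['REJECTED']
```

Guarding on the pending set means each peer contributes exactly one reply, successful or failed, so the transaction emits a single outcome event however the disconnect is detected.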