Bug 1123733
Summary: | 'gluster volume status' looks like its hung, when there is no response from one of glusterd | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | SATHEESARAN <sasundar> |
Component: | glusterd | Assignee: | Atin Mukherjee <amukherj> |
Status: | CLOSED WONTFIX | QA Contact: | SATHEESARAN <sasundar> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | rhgs-3.0 | CC: | amukherj, asriram, mzywusko, nlevinki, smohan, vbellur |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Executing a command that involves glusterd-to-glusterd communication (for example, 'gluster volume status') immediately after one of the nodes goes down hangs and then fails after 2 minutes with a cli-timeout message. Subsequent commands fail with the error message 'Another transaction in progress' for 10 minutes (the frame timeout). Workaround: Set a non-zero value for 'ping-timeout' in the '/etc/glusterfs/glusterd.vol' file and restart glusterd.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2017-02-07 11:23:47 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
SATHEESARAN
2014-07-28 07:07:28 UTC
There are two workarounds, each with its own cost:

1. Wait a little over 10 minutes for gluster commands to work again without the error "Another transaction in progress".
   Cost: The user needs to wait at least 10 minutes before executing any gluster command. This happens only once. Once the network disconnect is identified, subsequent commands ignore the node that is not reachable.

2. Enable the ping timer, as follows:
   i) Edit the glusterd volfile to set the ping-timeout option to 30.
   ii) Restart glusterd on that node.
   Cost: Volume snapshot fails with the ping timer enabled. Refer to BZ 1096729.

The ideal solution would be to have the ping timer work in a separate epoll thread and then enable the ping timer; with that we would get rid of both this issue and the snapshot-related issues.
Can we mark this as a known issue for Denali?

(In reply to Atin Mukherjee from comment #3)
> The ideal solution would be to have the ping timer work in a separate epoll
> thread and then enable the ping timer; with that we would get rid of both this
> issue and the snapshot-related issues.
> Can we mark this as a known issue for Denali?

Marked this bug as a known issue for Denali.

We have a patch for multi-threaded epoll. There are two approaches, and we need to choose one of them:
http://review.gluster.org/#/c/8098/
http://review.gluster.org/#/c/3842/
It is risky to take this patch into Denali, as it requires complete testing.

It is always good to enable the ping timer in the file '/etc/glusterfs/glusterd.vol'. Set ping-timeout to 30 or higher. Disable it only if multiple snapshot operations are performed simultaneously from different nodes.

Please review and sign off on the edited doc text.

Doc text looks good to me.

There is no future plan to enable ping timeout for glusterd-to-glusterd communication; we will not be fixing this in GlusterD 1.0.
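
For reference, the ping-timer workaround discussed above might look like the following in '/etc/glusterfs/glusterd.vol'. This is only a sketch: the surrounding options are assumed from a typical stock volfile and may differ on a given installation, and the value 30 is the one suggested in the comments. The only line the workaround actually changes is 'option ping-timeout'.

```
volume management
    type mgmt/glusterd
    # The options below are assumed from a typical stock file; leave
    # whatever is already present on the node unchanged.
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    # Workaround: a non-zero value enables the ping timer.
    option ping-timeout 30
end-volume
```

After saving the file, glusterd has to be restarted on that node for the change to take effect (for example with 'service glusterd restart'; the exact command depends on the init system in use). As noted in the comments, volume snapshot operations are reported to fail with the ping timer enabled (BZ 1096729), so this trades one problem for another.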