Bug 1241336
Summary: | When one of RHGS node in the cluster, abruptly goes down, then all gluster cli commands fails | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | SATHEESARAN <sasundar> |
Component: | glusterd | Assignee: | Satish Mohan <smohan> |
Status: | CLOSED WORKSFORME | QA Contact: | Bala Konda Reddy M <bmekala> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | rhgs-3.1 | CC: | amukherj, asriram, asrivast, nlevinki, sasundar, smohan, vbellur |
Target Milestone: | --- | Keywords: | ZStream |
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | glusterd | ||
Fixed In Version: | Doc Type: | Known Issue | |
Doc Text: |
When a Red Hat Gluster Storage node is shut down due to power failure or hardware failure, or when the network interface on a node goes down abruptly, subsequent Gluster commands may time out. This happens because the corresponding TCP connection remains in the ESTABLISHED state. You can confirm this by executing the following command on one of the remaining nodes: "ss -tap state established '( dport = :24007 )' dst <IP-addr-of-powered-off-RHGS-node>"
Workaround: Restart the glusterd service on all other nodes.
|
Story Points: | --- |
Clone Of: | Environment: |
RHEL6
|
|
Last Closed: | 2018-01-30 02:04:20 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1250809 | ||
Bug Blocks: | 1216951 |
Description
SATHEESARAN
2015-07-09 03:42:31 UTC
With the latest testing, I had only 2 nodes in the cluster and did the following steps:
1. Created a 2 node 'Trusted Storage Pool'
2. Created a plain distribute volume with a single brick on node1
3. Powered off node2 ( as this RHGS node was a VM, I did 'virsh destroy rhsvm' )

Result - All gluster cli commands started to error out.

[root@ ~]# gluster v status
Error : Request timed out

Proposing this bug as a BLOCKER based on following thoughts,
Any node in the cluster could go down abruptly ( hardware failure can't be predicted ) and that leads to all gluster cli commands failing

I have tried the same case with baremetal machines and I see the same behaviour of gluster cli commands hanging after one of the machines is shutdown forcefully. Here I performed 'Power off server - Immediate', through supermicro console

Doc text is edited. Please sign off to be included in Known Issues.

Updated the doc text, please review and sign off.

Anjana, The updated documentation looks good to me. Thanks for editing it.
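
The check described in the Doc Text can be wrapped in a small script. The following is only a sketch, not part of the original report: the function names `count_stale` and `check_peer` are hypothetical, it assumes IPv4 peer addresses, and it uses `ss -tan` rather than the `-tap` given in the Doc Text, so that ports stay numeric and the peer column is easy to match. It only reports lingering connections and does not restart glusterd itself.

```shell
#!/bin/sh
# Sketch: detect connections to a powered-off peer's glusterd port that are
# still reported as ESTABLISHED. Function names here are illustrative only.

GLUSTERD_PORT=24007   # glusterd management port, as used in the Doc Text

# count_stale PEER_IP
# Reads `ss` output on stdin and prints how many rows have PEER_IP:24007
# in the peer column. With `state established`, ss omits the State column,
# so the peer address is field 4 (assumption: IPv4, numeric output).
count_stale() {
    awk -v target="$1:$GLUSTERD_PORT" '$4 == target { n++ } END { print n + 0 }'
}

# check_peer PEER_IP
# Runs the ss query from the Doc Text (numeric variant) and reports the result.
check_peer() {
    n=$(ss -tan state established "( dport = :$GLUSTERD_PORT )" dst "$1" | count_stale "$1")
    if [ "$n" -gt 0 ]; then
        echo "$n established connection(s) to $1:$GLUSTERD_PORT; consider restarting glusterd on this node"
    else
        echo "no established connections to $1:$GLUSTERD_PORT"
    fi
}
```

After sourcing the script on a surviving node, `check_peer <IP-addr-of-powered-off-RHGS-node>` prints one line. If it reports lingering connections, the workaround above (restarting glusterd on that node) applies.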