Description of problem:
-----------------------
When one of the RHGS nodes in the cluster goes down abruptly (due to a forced shutdown, power failure, hardware failure, or network disconnect), gluster is unable to detect that the host is down. As a consequence, all gluster CLI commands fail with "Error : Request timed out".

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHGS 3.1 Nightly build

How reproducible:
-----------------
Consistent

Steps to Reproduce:
--------------------
1. Power off one of the RHGS nodes in the 'Trusted Storage Pool'
2. Execute 'gluster volume status'

Actual results:
---------------
All gluster CLI commands fail with the error "Error : Request timed out".

Expected results:
-----------------
At least after some time, gluster should detect that the RHGS node is down and should not block or fail subsequent gluster CLI commands.

Additional info:
----------------
In RHGS 3.0.4, this issue did not occur, and gluster was able to detect when an RHGS node was down.
I also tried the test case by blocking all network traffic (both incoming and outgoing) between that particular RHGS node and all other nodes, and hit the same problem again.
With the latest testing, I had only 2 nodes in the cluster and did the following steps:
1. Created a 2-node 'Trusted Storage Pool'
2. Created a plain distribute volume with a single brick on node1
3. Powered off node2 (as this RHGS node was a VM, I did 'virsh destroy rhsvm')

Result: All gluster CLI commands started to error out.

[root@ ~]# gluster v status
Error : Request timed out

Proposing this bug as a BLOCKER based on the following reasoning: any node in the cluster could go down abruptly (hardware failures can't be predicted), and that leads to all gluster CLI commands failing.
I have tried the same case with bare-metal machines and see the same behaviour: gluster CLI commands hang after one of the machines is shut down forcefully. Here I performed 'Power off server - Immediate' through the Supermicro console.
The doc text has been edited. Please sign off so that it can be included in Known Issues.
Updated the doc text; please review and sign off.
Anjana, The updated documentation looks good to me. Thanks for editing it.