Description of problem: When server quorum is lost, glusterd sometimes does not kill the brick (glusterfsd) processes, and when quorum is regained it sometimes does not restart the killed bricks. The behavior appears random.

Steps to reproduce:

1. Enable server quorum: gluster volume set dist cluster.server-quorum-type server
2. Set the quorum ratio: gluster volume set all cluster.server-quorum-ratio 80
3. On one of the servers, drop all network traffic except ssh, so that we retain ssh access to the machine:
-----------------
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
iptables -A OUTPUT -p tcp --sport 22 -j ACCEPT
iptables -A INPUT -j DROP; iptables -A OUTPUT -j DROP
-----------------
4. Check the glusterfsd processes on all the machines: on some machines the glusterfsd processes are not killed even though quorum is lost.
5. Restore the network settings: iptables -F
6. At this point all the killed bricks are expected to be brought back up. Sometimes this does not happen.

The steps may have to be repeated a few times to observe the failure; the behavior is somewhat random.
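With cluster.server-quorum-ratio set to 80, glusterd keeps bricks running only while at least 80% of the peers in the cluster are reachable. A minimal sketch of that decision is below; the helper name quorum_met is hypothetical, and whether glusterd uses >= or a strict > at the boundary is not verified here.

```shell
#!/bin/bash
# Sketch of the server-quorum decision with ratio 80: bricks stay up
# only while active/total peers >= 80%.  quorum_met is a hypothetical
# helper, not glusterd's actual function.
quorum_met() {
  local active=$1 total=$2 ratio=$3
  # Integer comparison avoids floating point: active/total >= ratio/100
  [ $((active * 100)) -ge $((total * ratio)) ]
}

# With 4 peers and ratio 80, losing even a single peer drops quorum:
quorum_met 4 4 80 && echo "4/4 up: quorum met"
quorum_met 3 4 80 || echo "3/4 up: quorum lost"
```

This illustrates why isolating one server out of a small cluster at ratio 80 should trigger the kill on the isolated node, and why restoring the network (iptables -F) should bring quorum, and the bricks, back.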
Created attachment 684900 [details] sosreport for the failed instance
This was tested on Update 4. Installed RPMs:
glusterfs-fuse-3.3.0.5rhs-40.el6rhs.x86_64
vdsm-gluster-4.9.6-17.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-account-1.4.8-4.el6.noarch
glusterfs-3.3.0.5rhs-40.el6rhs.x86_64
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-server-3.3.0.5rhs-40.el6rhs.x86_64
glusterfs-rdma-3.3.0.5rhs-40.el6rhs.x86_64
gluster-swift-object-1.4.8-4.el6.noarch
gluster-swift-container-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-proxy-1.4.8-4.el6.noarch
glusterfs-geo-replication-3.3.0.5rhs-40.el6rhs.x86_64
Per Feb-06 bug triage meeting, targeting for 2.1.0.
The product version of Red Hat Storage on which this issue was reported has reached End Of Life (EOL) [1], hence this bug report is being closed. If the issue is still observed on a current version of Red Hat Storage, please file a new bug report on the current version. [1] https://rhn.redhat.com/errata/RHSA-2014-0821.html