Description of problem:
========================
In a cluster, if I reboot one of the nodes, the peer status on every other node shows the rebooted node as:

State: Peer in Cluster (Disconnected)

This state persists indefinitely. On the rebooted node itself, all peers show as connected, but I cannot perform any operation (for example, gluster v delete) because glusterd is unable to communicate with the other nodes.

Version-Release number of selected component (if applicable):
=============================================
[root@network glusterfs]# rpm -qa|grep gluster
glusterfs-client-xlators-3.7.9-1.el7rhgs.x86_64
glusterfs-server-3.7.9-1.el7rhgs.x86_64
python-gluster-3.7.5-19.el7rhgs.noarch
gluster-nagios-addons-0.2.5-1.el7rhgs.x86_64
vdsm-gluster-4.16.30-1.3.el7rhgs.noarch
glusterfs-3.7.9-1.el7rhgs.x86_64
glusterfs-api-3.7.9-1.el7rhgs.x86_64
glusterfs-cli-3.7.9-1.el7rhgs.x86_64
glusterfs-geo-replication-3.7.9-1.el7rhgs.x86_64
glusterfs-debuginfo-3.7.9-1.el7rhgs.x86_64
gluster-nagios-common-0.2.3-1.el7rhgs.noarch
glusterfs-libs-3.7.9-1.el7rhgs.x86_64
glusterfs-fuse-3.7.9-1.el7rhgs.x86_64
glusterfs-rdma-3.7.9-1.el7rhgs.x86_64

How reproducible:
================
Always

Steps to Reproduce:
1. Had a 6-node cluster.
2. Created a 4+2 EC (erasure-coded) volume.
3. To check I/O while one node is down, rebooted a node.

After the reboot, gluster peer status on any of the nodes that stayed up shows every peer as connected except the rebooted node, which shows as disconnected:

[root@dhcp35-228 ~]# gluster peer status
Number of Peers: 5

Hostname: dhcp35-70.lab.eng.blr.redhat.com
Uuid: a1089307-8628-4927-8016-dbf8a5e25370
State: Peer in Cluster (Connected)
Other names:
10.70.35.70

Hostname: 10.70.35.71
Uuid: 64828171-ee80-4b2c-b0ad-4db040c77778
State: Peer in Cluster (Connected)

Hostname: 10.70.35.108
Uuid: 16c133d3-7e32-4518-84e3-59d65e3fae3b
State: Peer in Cluster (Connected)

Hostname: 10.70.35.209
Uuid: c27b1ffb-382c-4771-94a1-4d972fa4263f
State: Peer in Cluster (Disconnected)  ---------------->rebooted node

Hostname: 10.70.35.140
Uuid: 5e69cedc-7070-4f6d-be92-5254fff17064
State: Peer in Cluster (Connected)
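On larger pools it helps to list only the peers that a node currently sees as Disconnected. The following is a minimal sketch: disconnected_peers is a hypothetical helper name, not part of glusterfs. It filters the output of gluster peer status and assumes the usual line-per-field layout, in which each peer's Hostname: line comes before its State: line.

```shell
# Hypothetical helper (not a gluster command): read `gluster peer status`
# output on stdin and print the hostname of every peer whose State line
# reports "(Disconnected)".
# Assumes each field (Hostname:, Uuid:, State:) is on its own line and that
# a peer's Hostname: line precedes its State: line.
disconnected_peers() {
  awk '
    /^Hostname:/ { host = $2 }                      # remember the current peer
    /^State:/ && /\(Disconnected\)/ { print host }  # print it if disconnected
  '
}

# Usage, on a node that stayed up:
#   gluster peer status | disconnected_peers
```

Run on dhcp35-228 in this report, the helper would print 10.70.35.209. awk tracks the most recent Hostname: line, so the helper does not depend on how many other fields, such as Other names:, appear between peers.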
From the sosreport of the down node (10.70.35.209), I observed that the iptables rules are enabled, and no rule allowing the glusterd port (24007) was added. On the nodes that stayed up, the iptables rules had been flushed instead of the glusterd ports being opened.

So the RCA for this issue is: the trusted storage pool was formed after flushing the iptables rules on all the nodes. After the reboot, the saved rules came back into effect on the rebooted node. Those rules do not allow glusterd traffic on port 24007, so the node became disconnected from the other nodes.

It is highly recommended to open the required ports instead of flushing the rules. A glusterfs service file for firewalld is available from RHGS 3.1.1 onward:

# firewall-cmd --zone=public --add-service=glusterfs
# firewall-cmd --zone=public --add-service=glusterfs --permanent

The first command opens the ports in the running configuration. The second adds the service to the permanent configuration so that the change survives a reboot.