Created attachment 1542818
Cluster dashboard

Description of problem:
Grafana sometimes reports the wrong number of hosts down when all nodes are shut down. When I stop the services tendrl-node-agent, collectd and tendrl-gluster-integration for the first time, Grafana usually shows correctly that all nodes are down. If I then start the services and, after a while, stop them again, Grafana reports that 4 hosts are down and 2 are up. This happens consistently across multiple installations with 6 nodes.

Version-Release number of selected component (if applicable):
tendrl-ansible-1.6.3-11.el7rhgs.noarch
tendrl-api-1.6.3-13.el7rhgs.noarch
tendrl-api-httpd-1.6.3-13.el7rhgs.noarch
tendrl-commons-1.6.3-17.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-21.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-3.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-21.el7rhgs.noarch
tendrl-node-agent-1.6.3-18.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-3.el7rhgs.noarch
tendrl-ui-1.6.3-15.el7rhgs.noarch

How reproducible:
60%

Steps to Reproduce:
1. Import a cluster with 6 nodes into Tendrl.
2. Stop the services tendrl-node-agent, collectd and tendrl-gluster-integration on all nodes.
3. Wait for 5 minutes.
4. Check the Cluster dashboard.
5. Start the services tendrl-node-agent, collectd and tendrl-gluster-integration on all nodes.
6. Wait for all nodes to start.
7. Repeat steps 2-6 multiple times.

Actual results:
Most of the time, the dashboard reports that 4 nodes are down and 2 nodes are up.

Expected results:
All nodes should always be reported as down.

Additional info:
Stopping and starting of the services is automated by:
https://github.com/usmqe/usmqe-setup/blob/master/test_setup.tendrl_services_stopped_on_nodes.yml
https://github.com/usmqe/usmqe-setup/blob/master/test_teardown.tendrl_services_stopped_on_nodes.yml
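For anyone reproducing this without the linked playbooks, the stop/wait/start cycle from the steps above could be sketched roughly as follows. This is only an illustration, not the automation actually used: the function names, the use of plain `ssh` with `systemctl`, and the 60-second wait after starting the services are assumptions; only the three service names and the 5-minute wait come from this report.

```python
import subprocess
import time
from typing import Callable, List, Sequence

# Service names taken from the report.
SERVICES = ("tendrl-node-agent", "collectd", "tendrl-gluster-integration")


def build_command(host: str, action: str,
                  services: Sequence[str] = SERVICES) -> List[str]:
    """Build an ssh command that stops or starts the services on one host.

    Hypothetical helper: assumes passwordless ssh with enough privileges
    to run systemctl on the storage nodes.
    """
    if action not in ("stop", "start"):
        raise ValueError("action must be 'stop' or 'start'")
    return ["ssh", host, "systemctl", action, *services]


def cycle_services(hosts: Sequence[str],
                   runner: Callable[[List[str]], object] = subprocess.check_call,
                   sleep: Callable[[float], None] = time.sleep,
                   rounds: int = 3,
                   down_wait: float = 300,  # 5 minutes, as in step 3
                   up_wait: float = 60) -> None:  # assumed; step 6 gives no value
    """Repeat steps 2-6: stop on all hosts, wait, start on all hosts, wait."""
    for _ in range(rounds):
        for host in hosts:
            runner(build_command(host, "stop"))
        sleep(down_wait)
        # Step 4: check the Cluster dashboard by hand at this point.
        for host in hosts:
            runner(build_command(host, "start"))
        sleep(up_wait)
```

Calling `cycle_services([...])` with the six node hostnames would run the default three rounds; the dashboard still has to be checked manually after each stop phase.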
PR: https://github.com/Tendrl/commons/pull/1077
Modified the tendrl-node-agent watcher thread logic to report node status correctly.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3251