Created attachment 1360864 [details]
gl1 is down

Description of problem:
This bug is probably related to bug 1508041; I found it while testing bug 1517468. See the attached screenshots.

Version-Release number of selected component (if applicable):
etcd-3.2.7-1.el7.x86_64
glusterfs-3.8.4-52.el7_4.x86_64
glusterfs-client-xlators-3.8.4-52.el7_4.x86_64
glusterfs-fuse-3.8.4-52.el7_4.x86_64
glusterfs-libs-3.8.4-52.el7_4.x86_64
python-etcd-0.4.5-1.el7rhgs.noarch
rubygem-etcd-0.3.0-1.el7rhgs.noarch
tendrl-ansible-1.5.4-2.el7rhgs.noarch
tendrl-api-1.5.4-3.el7rhgs.noarch
tendrl-api-httpd-1.5.4-3.el7rhgs.noarch
tendrl-commons-1.5.4-5.el7rhgs.noarch
tendrl-grafana-plugins-1.5.4-8.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-1.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-8.el7rhgs.noarch
tendrl-node-agent-1.5.4-8.el7rhgs.noarch
tendrl-notifier-1.5.4-5.el7rhgs.noarch
tendrl-selinux-1.5.4-1.el7rhgs.noarch
tendrl-ui-1.5.4-4.el7rhgs.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install and set up WA and Gluster, import the Gluster cluster into WA, and wait for about an hour.
2. Shut down all Gluster nodes (this can happen in a real deployment during a major failure).
3. After about 30 minutes (to be completely sure that the displayed data has had time to update), check the Grafana dashboards and the WA UI.

Actual results:
The data shown in Grafana and the WA UI is incorrect and doesn't reflect reality. Some of the information in the UI is also inconsistent: in one list node "gl1" is down, and in another list node "gl1" is up.

Expected results:
All status-related charts are red, and the related alerts about this situation are raised.
Created attachment 1360866 [details] gl1 is up
Created attachment 1360867 [details] some charts don't reflect status of nodes
I didn't request a screenshot because I didn't expect to find a new bug. I ran "shutdown -h now" on all Gluster nodes.
I am not able to reproduce this issue with the latest builds. After the reboot, I could see that the host status information is up to date in the Tendrl UI and the Grafana dashboard.

Having said that, there are issues with updates of volumes, bricks, etc. on the Grafana dashboard (not discussed as part of this bug) when all the nodes are shut down. This is because all the agents running on the nodes, which are responsible for these updates, are down. This needs to be tackled differently, and I don't think it can be taken in for this release.

This scenario is also very rare in a production environment. Even if it happens, the host down status is correctly indicated on the dashboard, and that is a good enough indication for the administrator to take action.

Having discussed it with QE (Sweta), it has been agreed to document this bug as a known_issue for this release.
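To illustrate the "tackled differently" point, here is a minimal sketch of deriving status on the server side from heartbeat age, so that bricks do not keep their last reported state when the agents that would update them are gone. This is an assumption for discussion only, not how Tendrl implements it: the `Heartbeat` type, the `ttl` value, and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Heartbeat:
    node: str         # hypothetical node name, e.g. "gl1"
    last_seen: float  # timestamp in seconds, same clock as `now`


def derive_node_status(heartbeats, now, ttl=60.0):
    """Return {node: "UP" | "DOWN"}.

    A node counts as DOWN once its last heartbeat is older than `ttl`
    seconds, even if nothing ever wrote an explicit DOWN status for it.
    """
    return {
        hb.node: ("UP" if now - hb.last_seen <= ttl else "DOWN")
        for hb in heartbeats
    }


def derive_brick_status(bricks, node_status):
    """Return {brick: status}.

    `bricks` maps a brick name to (node, last_reported_status).
    The last reported status is trusted only while the owning node is UP;
    otherwise the brick is reported as DOWN.
    """
    result = {}
    for brick, (node, reported) in bricks.items():
        result[brick] = reported if node_status.get(node) == "UP" else "DOWN"
    return result
```

With this approach the component that stays alive (the server) decides what is stale, so the result does not depend on any node agent surviving long enough to report the shutdown.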
Updated, please check.
*** Bug 1583724 has been marked as a duplicate of this bug. ***
*** Bug 1583727 has been marked as a duplicate of this bug. ***
I tested the scenario several times. The current status:
* All nodes on the Hosts page are Down, as expected.
* At least one node (the last one shut down) remains Up in Grafana.
* The volume disappears from the UI (BZ 1588436).
* Not all bricks are Down in the UI and in Grafana.

--> ASSIGNED

Tested with:
tendrl-ansible-1.6.3-4.el7rhgs.noarch
tendrl-api-1.6.3-3.el7rhgs.noarch
tendrl-api-httpd-1.6.3-3.el7rhgs.noarch
tendrl-commons-1.6.3-6.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-4.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-4.el7rhgs.noarch
tendrl-node-agent-1.6.3-6.el7rhgs.noarch
tendrl-notifier-1.6.3-3.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-3.el7rhgs.noarch
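For repeated runs of this scenario, the comparison of views against the expected state could be scripted. The following is a rough sketch, assuming the statuses have already been collected by hand or by some other means into plain dictionaries; the `find_mismatches` helper and the view names passed to it are hypothetical and not part of Tendrl or Grafana.

```python
def find_mismatches(expected, views):
    """Compare each view against the expected state.

    expected: {entity: status}, e.g. every node and brick mapped to "DOWN"
    views:    {view_name: {entity: status}}, one dict per UI page or dashboard
    Returns a sorted list of (view_name, entity, expected, observed) tuples.
    An entity absent from a view is reported with observed == "MISSING".
    """
    mismatches = []
    for view_name, observed in views.items():
        for entity, want in expected.items():
            got = observed.get(entity, "MISSING")
            if got != want:
                mismatches.append((view_name, entity, want, got))
    return sorted(mismatches)
```

An empty result means every view agrees with the expected state; any tuple in the list points at the view and entity that still shows stale or missing data.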
@anmol, please take a look at https://bugzilla.redhat.com/show_bug.cgi?id=1588436#c8
PRs are under review:
https://github.com/Tendrl/commons/pull/989
https://github.com/Tendrl/gluster-integration/pull/660
Looks ok. All status panels in Grafana and UI reflect the status of hosts and bricks correctly and alerts are raised.

--> VERIFIED

Tested with:
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-api-1.6.3-3.el7rhgs.noarch
tendrl-api-httpd-1.6.3-3.el7rhgs.noarch
tendrl-commons-1.6.3-7.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-5.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-5.el7rhgs.noarch
tendrl-node-agent-1.6.3-7.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-4.el7rhgs.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2616