Description of problem:

There are still some issues mentioned in upstream issue
https://github.com/Tendrl/monitoring-integration/issues/145

Incorrect data in charts:

*At glance*:
- Volumes
- Bricks

*Bricks dashboard*:
- Status is 'N/A' even if the brick is up on the last running node.
- Capacity utilization is 53.2% even though it is 46.8% according to the
  'gluster get-state' command output (note that 53.2% = 100% - 46.8%, which
  suggests used and free capacity may be swapped); the correct value is shown
  in the Capacity Utilization Trend chart.
- There is no data in the "Disk Load" section even if the brick is up.

*Hosts* dashboard for a host which is down:
- Most of the charts show 'Zero' even though they should show 'N/A' instead
  of 'Zero'.

*Hosts* dashboard for the last host which is up:
- Brick info in a couple of charts is not correct; it is not shown that
  bricks are up on this last node.
- See the screenshots for other issues on this dashboard.

Version-Release number of selected component (if applicable):

etcd-3.2.7-1.el7.x86_64
glusterfs-3.8.4-18.4.el7.x86_64
glusterfs-3.8.4-50.el7rhgs.x86_64
glusterfs-api-3.8.4-50.el7rhgs.x86_64
glusterfs-cli-3.8.4-50.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-18.4.el7.x86_64
glusterfs-client-xlators-3.8.4-50.el7rhgs.x86_64
glusterfs-events-3.8.4-50.el7rhgs.x86_64
glusterfs-fuse-3.8.4-18.4.el7.x86_64
glusterfs-fuse-3.8.4-50.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-50.el7rhgs.x86_64
glusterfs-libs-3.8.4-18.4.el7.x86_64
glusterfs-libs-3.8.4-50.el7rhgs.x86_64
glusterfs-server-3.8.4-50.el7rhgs.x86_64
python-etcd-0.4.5-1.noarch
rubygem-etcd-0.3.0-1.el7.noarch
tendrl-ansible-1.5.3-2.el7rhgs.noarch
tendrl-api-1.5.3-2.el7rhgs.noarch
tendrl-api-httpd-1.5.3-2.el7rhgs.noarch
tendrl-commons-1.5.3-1.el7rhgs.noarch
tendrl-gluster-integration-1.5.3-2.el7rhgs.noarch
tendrl-grafana-plugins-1.5.3-2.el7rhgs.noarch
tendrl-grafana-selinux-1.5.3-2.el7rhgs.noarch
tendrl-monitoring-integration-1.5.3-2.el7rhgs.noarch
tendrl-node-agent-1.5.3-3.el7rhgs.noarch
tendrl-notifier-1.5.3-1.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch
tendrl-ui-1.5.3-2.el7rhgs.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install a gluster cluster with one arbiter or disperse volume and import it into Tendrl.
2. Wait a couple of minutes.
3. Shut down 5 of the 6 nodes in the cluster.

Actual results:
There are charts in Grafana which don't reflect node statuses and info.

Expected results:
All charts in Grafana reflect that 5 of the 6 nodes are down.
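To cross-check the dashboard numbers against gluster itself, a minimal sketch (the state-file location and the exact field names vary between glusterfs versions, so treat both as assumptions):

# Dump the glusterd state; the command prints the output path, which by
# default is a timestamped file under /var/run/gluster/.
gluster get-state

# Compare the brick status and capacity fields with what Grafana shows
# (field names such as "status" or "size" differ between versions).
grep -iE 'brick.*(status|size)' /var/run/gluster/glusterd_state_*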
Based on the latest changes, I see the Grafana dashboards behave as below when a few nodes in the cluster are down (shut down):

*At glance*:
- Volumes - the number of partial/down volumes is shown when some volumes have bricks from the down nodes.
- Bricks - a down count is shown for the bricks from the down nodes.

*Bricks dashboard*:
- Status - 'Stopped' for the bricks from the down nodes.
- Capacity utilization - based on the current changes, the utilization percentage should be fine.
- "Disk Load" section - I see the charts populated for the bricks from the up node.

*Hosts* dashboard for a host which is down:
- For down nodes the values are shown as 'N/A'; for up nodes the values are populated.

*Hosts* dashboard for the last host which is up:
- On the up node, bricks are shown as started (green color).

Please verify the dashboards with the next build.
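For reference, the 'Zero' vs 'N/A' behavior is normally controlled by how a panel treats null datapoints in the dashboard JSON. A minimal sketch of the relevant singlestat fragment, assuming Grafana 4.x field names (the 'nullText' field and the 'nullPointMode' values are assumptions here; verify against the version Tendrl ships):

{
  "type": "singlestat",
  "nullPointMode": "null",
  "nullText": "N/A"
}

If 'nullPointMode' is instead left as "null as zero", missing datapoints from a down node render as 0, which matches the behavior reported in the original description.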
Tested with:

etcd-3.2.7-1.el7.x86_64
glusterfs-3.8.4-52.el7_4.x86_64
glusterfs-3.8.4-52.el7rhgs.x86_64
glusterfs-api-3.8.4-52.el7rhgs.x86_64
glusterfs-cli-3.8.4-52.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-52.el7_4.x86_64
glusterfs-client-xlators-3.8.4-52.el7rhgs.x86_64
glusterfs-events-3.8.4-52.el7rhgs.x86_64
glusterfs-fuse-3.8.4-52.el7_4.x86_64
glusterfs-fuse-3.8.4-52.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-52.el7rhgs.x86_64
glusterfs-libs-3.8.4-52.el7_4.x86_64
glusterfs-libs-3.8.4-52.el7rhgs.x86_64
glusterfs-rdma-3.8.4-52.el7rhgs.x86_64
glusterfs-server-3.8.4-52.el7rhgs.x86_64
gluster-nagios-addons-0.2.9-1.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.3.x86_64
python-etcd-0.4.5-1.el7rhgs.noarch
python-gluster-3.8.4-52.el7rhgs.noarch
rubygem-etcd-0.3.0-1.el7rhgs.noarch
tendrl-ansible-1.5.4-1.el7rhgs.noarch
tendrl-api-1.5.4-2.el7rhgs.noarch
tendrl-api-httpd-1.5.4-2.el7rhgs.noarch
tendrl-collectd-selinux-1.5.3-2.el7rhgs.noarch
tendrl-commons-1.5.4-2.el7rhgs.noarch
tendrl-gluster-integration-1.5.4-2.el7rhgs.noarch
tendrl-grafana-plugins-1.5.4-3.el7rhgs.noarch
tendrl-grafana-selinux-1.5.3-2.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-3.el7rhgs.noarch
tendrl-node-agent-1.5.4-2.el7rhgs.noarch
tendrl-notifier-1.5.4-2.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch
tendrl-ui-1.5.4-2.el7rhgs.noarch
vdsm-gluster-4.17.33-1.2.el7rhgs.noarch

and it works. --> VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3478