Description of problem:

There are still some issues mentioned in upstream issue
https://github.com/Tendrl/monitoring-integration/issues/145

Incorrect data in charts:

*At glance*:
- Volumes
- Bricks

*Bricks dashboard*:
- Status is 'N/A' even if the brick is up on the last running node.
- Capacity utilization is 53.2% even though it is 46.8% according to the
  'gluster get-state' command output (note that 53.2% = 100% - 46.8%, which
  suggests used and free capacity may be swapped); the correct value is shown
  in the Capacity Utilization Trend chart.
- There is no data in the "Disk Load" section even if the brick is up.

*Hosts* dashboard for a host which is down:
- Most of the charts show 'Zero' even though they should show 'N/A' instead
  of 'Zero'.

*Hosts* dashboard for the last host which is up:
- Brick info in a couple of charts is not correct; it is not shown that
  bricks are up on this last node.
- See the screenshots for other issues on this dashboard.

Version-Release number of selected component (if applicable):

etcd-3.2.7-1.el7.x86_64
glusterfs-3.8.4-18.4.el7.x86_64
glusterfs-3.8.4-50.el7rhgs.x86_64
glusterfs-api-3.8.4-50.el7rhgs.x86_64
glusterfs-cli-3.8.4-50.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-18.4.el7.x86_64
glusterfs-client-xlators-3.8.4-50.el7rhgs.x86_64
glusterfs-events-3.8.4-50.el7rhgs.x86_64
glusterfs-fuse-3.8.4-18.4.el7.x86_64
glusterfs-fuse-3.8.4-50.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-50.el7rhgs.x86_64
glusterfs-libs-3.8.4-18.4.el7.x86_64
glusterfs-libs-3.8.4-50.el7rhgs.x86_64
glusterfs-server-3.8.4-50.el7rhgs.x86_64
python-etcd-0.4.5-1.noarch
rubygem-etcd-0.3.0-1.el7.noarch
tendrl-ansible-1.5.3-2.el7rhgs.noarch
tendrl-api-1.5.3-2.el7rhgs.noarch
tendrl-api-httpd-1.5.3-2.el7rhgs.noarch
tendrl-commons-1.5.3-1.el7rhgs.noarch
tendrl-gluster-integration-1.5.3-2.el7rhgs.noarch
tendrl-grafana-plugins-1.5.3-2.el7rhgs.noarch
tendrl-grafana-selinux-1.5.3-2.el7rhgs.noarch
tendrl-monitoring-integration-1.5.3-2.el7rhgs.noarch
tendrl-node-agent-1.5.3-3.el7rhgs.noarch
tendrl-notifier-1.5.3-1.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch
tendrl-ui-1.5.3-2.el7rhgs.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install a gluster cluster with one arbiter or disperse volume and import it into Tendrl.
2. Wait a couple of minutes.
3. Shut down 5 of the 6 nodes in the cluster.

Actual results:
There are charts in Grafana which don't reflect node statuses and info.

Expected results:
All charts in Grafana reflect that 5 of the 6 nodes are down.
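To cross-check the dashboard numbers against gluster itself, a minimal sketch (the state-file location and the exact field names vary between glusterfs versions, so treat both as assumptions):

# Dump the glusterd state; the command prints the output path, which by
# default is a timestamped file under /var/run/gluster/.
gluster get-state

# Compare the brick status and capacity fields with what Grafana shows
# (field names such as "status" or "size" differ between versions).
grep -iE 'brick.*(status|size)' /var/run/gluster/glusterd_state_*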
Based on the latest changes, I see the Grafana dashboards behave as below when a few nodes in the cluster are down (shut down):

*At glance*:
- Volumes - the number of partial/down volumes is shown when some volumes have bricks from the down nodes.
- Bricks - a down count is shown for the bricks from the down nodes.

*Bricks dashboard*:
- Status - 'Stopped' for the bricks from the down nodes.
- Capacity utilization - based on the current changes, the utilization percentage should be fine.
- "Disk Load" section - I see the charts populated for the bricks from the up node.

*Hosts* dashboard for a host which is down:
- For down nodes the values are shown as 'N/A'; for up nodes the values are populated.

*Hosts* dashboard for the last host which is up:
- On the up node, bricks are shown as started (green color).

Please verify the dashboards with the next build.
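For reference, the 'Zero' vs 'N/A' behavior is normally controlled by how a panel treats null datapoints in the dashboard JSON. A minimal sketch of the relevant singlestat fragment, assuming Grafana 4.x field names (the 'nullText' field and the 'nullPointMode' values are assumptions here; verify against the version Tendrl ships):

{
  "type": "singlestat",
  "nullPointMode": "null",
  "nullText": "N/A"
}

If 'nullPointMode' is instead left as "null as zero", missing datapoints from a down node render as 0, which matches the behavior reported in the original description.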
Tested with:

etcd-3.2.7-1.el7.x86_64
glusterfs-3.8.4-52.el7_4.x86_64
glusterfs-3.8.4-52.el7rhgs.x86_64
glusterfs-api-3.8.4-52.el7rhgs.x86_64
glusterfs-cli-3.8.4-52.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-52.el7_4.x86_64
glusterfs-client-xlators-3.8.4-52.el7rhgs.x86_64
glusterfs-events-3.8.4-52.el7rhgs.x86_64
glusterfs-fuse-3.8.4-52.el7_4.x86_64
glusterfs-fuse-3.8.4-52.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-52.el7rhgs.x86_64
glusterfs-libs-3.8.4-52.el7_4.x86_64
glusterfs-libs-3.8.4-52.el7rhgs.x86_64
glusterfs-rdma-3.8.4-52.el7rhgs.x86_64
glusterfs-server-3.8.4-52.el7rhgs.x86_64
gluster-nagios-addons-0.2.9-1.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.3.x86_64
python-etcd-0.4.5-1.el7rhgs.noarch
python-gluster-3.8.4-52.el7rhgs.noarch
rubygem-etcd-0.3.0-1.el7rhgs.noarch
tendrl-ansible-1.5.4-1.el7rhgs.noarch
tendrl-api-1.5.4-2.el7rhgs.noarch
tendrl-api-httpd-1.5.4-2.el7rhgs.noarch
tendrl-collectd-selinux-1.5.3-2.el7rhgs.noarch
tendrl-commons-1.5.4-2.el7rhgs.noarch
tendrl-gluster-integration-1.5.4-2.el7rhgs.noarch
tendrl-grafana-plugins-1.5.4-3.el7rhgs.noarch
tendrl-grafana-selinux-1.5.3-2.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-3.el7rhgs.noarch
tendrl-node-agent-1.5.4-2.el7rhgs.noarch
tendrl-notifier-1.5.4-2.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch
tendrl-ui-1.5.4-2.el7rhgs.noarch
vdsm-gluster-4.17.33-1.2.el7rhgs.noarch

and it works. --> VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3478