Description of problem:
Grafana alert dashboard does not raise alerts when nodes have the string "tendrl" in their hostname.

Version-Release number of selected component (if applicable):

How reproducible:
Can be reproduced

Steps to Reproduce:
1. Create tendrl storage nodes with hostnames similar to: tendrl-node-1, tendrl-node-2, etc.
2. Let the cluster run for some time.
3. Check for alerts in the Grafana alert dashboard (a command-line check is sketched below).

Actual results:
Alerts are not raised.

Expected results:
Alerts should be raised, and they should not depend on the hostname.

Additional info:
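For step 3, the presence (or absence) of raised alerts can also be checked outside the dashboard UI via the Grafana HTTP alerting API. A minimal sketch, assuming default admin credentials and Grafana listening on port 3000 of the tendrl server (host, port, and credentials are placeholders, not taken from this setup):

# any rule that has fired reports "state": "alerting" in the JSON output
curl -s -u admin:admin http://tendrl-usm1-server:3000/api/alerts | python -m json.tool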
Created attachment 1447425 [details] Alerts not showing
Created attachment 1447426 [details] metrics column not appearing properly
Reproduced on:

# rpm -qa | grep -e tendrl -e grafana | sort
grafana-4.3.2-3.el7rhgs.x86_64
tendrl-ansible-1.6.3-3.el7rhgs.noarch
tendrl-api-1.6.3-2.el7rhgs.noarch
tendrl-api-httpd-1.6.3-2.el7rhgs.noarch
tendrl-commons-1.6.3-3.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-1.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-1.el7rhgs.noarch
tendrl-node-agent-1.6.3-3.el7rhgs.noarch
tendrl-notifier-1.6.3-2.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-1.el7rhgs.noarch

with Gluster storage nodes:

tendrl-usm1-gl1.usmqe.lab.eng.brq.redhat.com
tendrl-usm1-gl2.usmqe.lab.eng.brq.redhat.com
tendrl-usm1-gl3.usmqe.lab.eng.brq.redhat.com
tendrl-usm1-gl4.usmqe.lab.eng.brq.redhat.com
tendrl-usm1-gl5.usmqe.lab.eng.brq.redhat.com
tendrl-usm1-gl6.usmqe.lab.eng.brq.redhat.com
Created attachment 1455435 [details] Failed verification 1: some Bricks are utilized for more than 90%
Created attachment 1455436 [details] Failed verification 3: no alerts related to nearly full bricks are shown at all
Created attachment 1455437 [details] Failed verification 2: capacity utilization graphs are shown correctly, but no alerts visible
The issue seems to be partially fixed in the new packages, but there is still a problem with the alerts. I've tried to utilize one Gluster Volume (and correspondingly the underlying Bricks) to more than 90%, with the following results:

* the utilization data are properly visible in the graphs
* but no alerts related to capacity utilization are raised at all (see attachment 1455435 [details], attachment 1455436 [details] and attachment 1455437 [details])

When I retested the same scenario on a cluster with Storage nodes named differently, Brick capacity utilization alerts were properly populated and visible both on the RHGS WA Events page and in the Grafana Alerts - Brick Dashboard.

Version-Release number of selected component:

grafana-4.3.2-3.el7rhgs.x86_64
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-api-1.6.3-3.el7rhgs.noarch
tendrl-api-httpd-1.6.3-3.el7rhgs.noarch
tendrl-commons-1.6.3-7.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-5.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-5.el7rhgs.noarch
tendrl-node-agent-1.6.3-7.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-4.el7rhgs.noarch

>> ASSIGNED
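For reference, the "utilize a Volume to more than 90%" step above can be driven from a client mount; a minimal sketch, assuming a roughly 10 GiB volume named volume_alpha exported from one of the storage nodes (volume name, mount point, and sizes are placeholders, not taken from this setup):

# mount the volume via FUSE and write enough data to push utilization past 90%
mount -t glusterfs tendrl-usm1-gl1:/volume_alpha /mnt/volume_alpha
dd if=/dev/zero of=/mnt/volume_alpha/fill.bin bs=1M count=9500
df -h /mnt/volume_alpha

If alerting works, Brick utilization alerts should appear shortly after the next monitoring collection.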
I've tried to reproduce it with the newest packages and it seems to work... I'll have to retest it again and I'll update it here later this week.

# rpm -qa | grep -e tendrl -e grafana | sort
grafana-4.3.2-3.el7rhgs.x86_64
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-api-1.6.3-4.el7rhgs.noarch
tendrl-api-httpd-1.6.3-4.el7rhgs.noarch
tendrl-commons-1.6.3-8.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-6.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-6.el7rhgs.noarch
tendrl-node-agent-1.6.3-8.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-6.el7rhgs.noarch
@daniel I was not able to reproduce it either. Please give an update if you are able to reproduce this.
My observation is that with the older version, tendrl-monitoring-integration-1.6.3-5.el7rhgs.noarch, the result of the tested scenario is not consistent, which is why I moved it back to ASSIGNED in comment 10. With the newer version, tendrl-monitoring-integration-1.6.3-6.el7rhgs.noarch, it seems to work as expected. Based on this, please move it to ON_QA and update the Fixed In Version to the latest version, and I'll verify it.
Tested on a cluster consisting of nodes with the following hostnames:

[tendrl_server]
tendrl-usm1-server

[gluster_servers]
tendrl-usm1-gl1
tendrl-usm1-gl2
tendrl-usm1-gl3
tendrl-usm1-gl4
tendrl-usm1-gl5
tendrl-usm1-gl6

# rpm -qa | grep -e tendrl -e grafana | sort
grafana-4.3.2-3.el7rhgs.x86_64
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-api-1.6.3-4.el7rhgs.noarch
tendrl-api-httpd-1.6.3-4.el7rhgs.noarch
tendrl-commons-1.6.3-8.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-6.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-6.el7rhgs.noarch
tendrl-node-agent-1.6.3-8.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-6.el7rhgs.noarch

Grafana properly raises utilization alerts.

>> VERIFIED
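As a side note, the verification above can be cross-checked from the command line through the same Grafana alerting API; a minimal sketch, again with placeholder host and credentials (grep-based filtering is used only to keep the check compatible with the tooling available on RHEL 7):

# print the name and current state of every alert rule;
# triggered brick utilization alerts show "state": "alerting"
curl -s -u admin:admin http://tendrl-usm1-server:3000/api/alerts | python -m json.tool | grep -e '"name"' -e '"state"'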
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2616