Description of problem:

When gluster volume usage goes above 75%, an alert is generated that references only the brick's volume group, with no easy way to map back to the gluster volume name. This is a big issue for CRS and OpenShift: if a gluster volume goes above 75%, the only way to map to the OCP Persistent Volume name is via the gluster volume name, not the brick or volume group name.

Version-Release number of selected component (if applicable):

tendrl-gluster-integration-1.5.4-8.el7rhgs.noarch.rpm

How reproducible:

Always

Steps to Reproduce:
1. Identify a gluster volume and write data into it to exceed the 75% threshold.
2. A Tendrl alert should be generated for each brick in the volume.
3. The alert will look something like the one below (the actual gluster server hostname has been removed because the alert is from a customer Tendrl installation). In the case of CRS, 3 alerts will be generated and sent to the specified email address, one for each brick.

Brick utilization of <gluster_server_hostname>:|var|lib|heketi|mounts|vg_0bfd0da65ef15a9d75692a67b838cfc9|brick_c5e7ee1e0704c91888f04cfb4cb50017|brick in cluster 7bc6aa73-0c97-404b-88a2-077b5c77656a is 82.29 % which is above WARNING threshold (75 %)

4. There is no easy way to track which gluster volume these alerts are associated with (a possible mapping workaround is sketched below).

Actual results:

Expected results:

Additional info:
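A possible workaround for step 4 until the alert message is fixed: the vg_*/brick_* fragment from the alert can be mapped back to a gluster volume by scanning `gluster volume info` output, since each volume's brick paths (which contain the heketi VG and brick names) are listed there. A minimal sketch, not part of Tendrl; it assumes it is run on a node in the trusted storage pool:

```python
import re
import subprocess

def find_volume_for_brick(brick_fragment):
    """Return the gluster volume whose brick path contains brick_fragment.

    brick_fragment is e.g. the vg_.../brick_... part of the alert message
    (note the alert shows '|' where the real path has '/').
    Assumes `gluster volume info` can be run on this host.
    """
    out = subprocess.check_output(
        ["gluster", "volume", "info"], universal_newlines=True
    )
    current_volume = None
    for line in out.splitlines():
        m = re.match(r"Volume Name:\s*(\S+)", line)
        if m:
            current_volume = m.group(1)
            continue
        # Brick lines look like:
        # Brick1: <host>:/var/lib/heketi/mounts/vg_.../brick_.../brick
        if re.match(r"Brick\d+:", line) and brick_fragment in line:
            return current_volume
    return None

# Fragment taken from the alert quoted above, with '|' turned back into '/'
fragment = "vg_0bfd0da65ef15a9d75692a67b838cfc9/brick_c5e7ee1e0704c91888f04cfb4cb50017"
print(find_volume_for_brick(fragment))
```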
Warnings for utilization are now in the format:

```
Brick utilization on <host>:<brick> in <volume> at 75.31 % and nearing full capacity
```

The information about the gluster volume is there, but the information about the cluster was dropped. Is this expected?

Tested with:
tendrl-notifier-1.6.3-3.el7rhgs.noarch
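For anyone who needs to consume the new message programmatically (e.g. to map an alert back to an OCP PV), it can be picked apart with a simple regex. A sketch against the format shown above; the host, brick path, and volume name in the example are made up:

```python
import re

# Matches: Brick utilization on <host>:<brick> in <volume> at <pct> % ...
ALERT_RE = re.compile(
    r"Brick utilization on (?P<host>[^:]+):(?P<brick>\S+)"
    r" in (?P<volume>\S+) at (?P<percent>[\d.]+) %"
)

msg = ("Brick utilization on gl1.example.com:/var/lib/heketi/mounts/"
       "vg_abc/brick_def/brick in vol_xyz at 75.31 % and nearing full capacity")
m = ALERT_RE.search(msg)
if m:
    print(m.group("volume"), m.group("percent"))  # -> vol_xyz 75.31
```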
Reading through the bug, it appears this (showing the VG name instead of the volume name) is happening in a CRS deployment with heketi. This works fine in a standalone RHGS deployment; however, we have not tried WA in a CNS or CRS environment. I suspect that with CRS + heketi, something is happening at the heketi layer.

Note: WA does not currently support CNS/CRS.
@Ju, please take a look at the patch; basically, the volume name was missing and has now been added.

@filip, as part of enhancing the log/alert messages (based on suggestions from the UX team), this message was improved via upstream PR https://github.com/Tendrl/monitoring-integration/pull/407, so the message you are seeing now is good to go.
Ack @nthomas on the log/alert message fix. However, we still need to verify how this behaves in a CRS scenario.
While enhancing the log/alert messages, the information about the cluster that contains the brick was dropped [1]. This information should be present in the brick utilization alert, since a volume with the same name can exist in more than one managed cluster.

--> ASSIGNED

[1] https://github.com/Tendrl/monitoring-integration/pull/407/files#diff-1deee8133d7438510cd75699ed55c591L61
Filip, this should be a different bug. This bug covers the discussion about whether we need the volume name in the brick alert or not; for the cluster name, we have to create a new Bugzilla issue and start the discussion there.
(In reply to gowtham from comment #12)
> Filip, this should be a different bug. This bug covers the discussion about
> whether we need the volume name in the brick alert or not; for the cluster
> name, we have to create a new Bugzilla issue and start the discussion there.

Exactly. *This BZ is about the volume name in the brick alert*, and for this reason, the fix for this BZ should do just that. But Filip noticed that we have, for some reason, dropped the cluster name while adding the volume name to the alert, which *is not expected* to be part of a BZ dealing with the *volume name in the brick alert*. We can't tweak (or drop) features like that without any reasoning and agreement. And for this reason, I agree with you that we should have a separate BZ for the discussion about the cluster name in the alert.

So we need to:

* reintroduce the cluster name, so that this BZ can be verified
* propose removal of the cluster name from the alert in the separate BZ, and if approved, we will remove it

You can avoid reintroducing and then removing the cluster name by having the BZ for removal of the cluster name from the alert approved and acked first, before you move this one to the ON_QA stage again.
I agree with Martin.
BZ for adding the cluster name was created: BZ 1614334

--> VERIFIED

Tested with:
tendrl-ansible-1.6.3-6.el7rhgs.noarch
tendrl-api-1.6.3-5.el7rhgs.noarch
tendrl-api-httpd-1.6.3-5.el7rhgs.noarch
tendrl-commons-1.6.3-11.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-8.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-8.el7rhgs.noarch
tendrl-node-agent-1.6.3-9.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-9.el7rhgs.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2616