Description of problem:

When gluster volume usage goes above 75%, an alert is generated that references only the brick's volume group, with no easy way to map back to the gluster volume name. This is a big issue for CRS and OpenShift: if a gluster volume goes above 75%, the only way to map to the OCP Persistent Volume name is via the gluster volume name, not the brick or volume group name.

Version-Release number of selected component (if applicable):

tendrl-gluster-integration-1.5.4-8.el7rhgs.noarch.rpm

How reproducible:

Always

Steps to Reproduce:
1. Identify a gluster volume and write data into it to exceed the 75% threshold.
2. A Tendrl alert should be generated for each brick in the volume.
3. The alert will look something like the one below (the actual gluster server hostname has been removed because the alert is from a customer Tendrl installation). In the case of CRS, 3 alerts will be generated and sent to the specified email address, one for each brick.

Brick utilization of <gluster_server_hostname>:|var|lib|heketi|mounts|vg_0bfd0da65ef15a9d75692a67b838cfc9|brick_c5e7ee1e0704c91888f04cfb4cb50017|brick in cluster 7bc6aa73-0c97-404b-88a2-077b5c77656a is 82.29 % which is above WARNING threshold (75 %)

4. There is no easy way to track which gluster volume these alerts are associated with (a possible mapping workaround is sketched below).

Actual results:

Expected results:

Additional info:
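A possible workaround for step 4 until the alert message is fixed: the vg_*/brick_* fragment from the alert can be mapped back to a gluster volume by scanning `gluster volume info` output, since each volume's brick paths (which contain the heketi VG and brick names) are listed there. A minimal sketch, not part of Tendrl; it assumes it is run on a node in the trusted storage pool:

```python
import re
import subprocess

def find_volume_for_brick(brick_fragment):
    """Return the gluster volume whose brick path contains brick_fragment.

    brick_fragment is e.g. the vg_.../brick_... part of the alert message
    (note the alert shows '|' where the real path has '/').
    Assumes `gluster volume info` can be run on this host.
    """
    out = subprocess.check_output(
        ["gluster", "volume", "info"], universal_newlines=True
    )
    current_volume = None
    for line in out.splitlines():
        m = re.match(r"Volume Name:\s*(\S+)", line)
        if m:
            current_volume = m.group(1)
            continue
        # Brick lines look like:
        # Brick1: <host>:/var/lib/heketi/mounts/vg_.../brick_.../brick
        if re.match(r"Brick\d+:", line) and brick_fragment in line:
            return current_volume
    return None

# Fragment taken from the alert quoted above, with '|' turned back into '/'
fragment = "vg_0bfd0da65ef15a9d75692a67b838cfc9/brick_c5e7ee1e0704c91888f04cfb4cb50017"
print(find_volume_for_brick(fragment))
```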
Warnings for utilization are now in the format:

```
Brick utilization on <host>:<brick> in <volume> at 75.31 % and nearing full capacity
```

The information about the gluster volume is there, but the information about the cluster was dropped. Is this expected?

Tested with:
tendrl-notifier-1.6.3-3.el7rhgs.noarch
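For anyone who needs to consume the new message programmatically (e.g. to map an alert back to an OCP PV), it can be picked apart with a simple regex. A sketch against the format shown above; the host, brick path, and volume name in the example are made up:

```python
import re

# Matches: Brick utilization on <host>:<brick> in <volume> at <pct> % ...
ALERT_RE = re.compile(
    r"Brick utilization on (?P<host>[^:]+):(?P<brick>\S+)"
    r" in (?P<volume>\S+) at (?P<percent>[\d.]+) %"
)

msg = ("Brick utilization on gl1.example.com:/var/lib/heketi/mounts/"
       "vg_abc/brick_def/brick in vol_xyz at 75.31 % and nearing full capacity")
m = ALERT_RE.search(msg)
if m:
    print(m.group("volume"), m.group("percent"))  # -> vol_xyz 75.31
```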
Reading through the bug, it appears this (showing the VG name instead of the volume name) is happening in a CRS deployment with heketi. This works fine in a standalone RHGS deployment; however, we have not tried WA in a CNS or CRS environment. I suspect that with CRS + heketi, something is happening at the heketi layer.

Note: WA does not currently support CNS/CRS.
@Ju, please take a look at the patch; basically, the volume name was missing and has now been added.

@filip, as part of enhancing the log/alert messages (based on suggestions from the UX team), this message was improved via upstream PR https://github.com/Tendrl/monitoring-integration/pull/407, so the message you are seeing now is good to go.
Ack @nthomas on the log/alert message fix. However, we still need to verify how this behaves in a CRS scenario.
While enhancing the log/alert messages, the information about the cluster that contains the brick was dropped [1]. This information should be present in the brick utilization alert, since a volume with the same name can exist in more than one managed cluster.

--> ASSIGNED

[1] https://github.com/Tendrl/monitoring-integration/pull/407/files#diff-1deee8133d7438510cd75699ed55c591L61
Filip, this should be a different bug. This bug covers the discussion about whether we need the volume name in the brick alert or not; for the cluster name, we have to create a new Bugzilla issue and start the discussion there.
(In reply to gowtham from comment #12)
> Filip, this should be a different bug. This bug covers the discussion about
> whether we need the volume name in the brick alert or not; for the cluster
> name, we have to create a new Bugzilla issue and start the discussion there.

Exactly. *This BZ is about the volume name in the brick alert*, and for this reason, the fix for this BZ should do just that. But Filip noticed that we have, for some reason, dropped the cluster name while adding the volume name to the alert, which *is not expected* to be part of a BZ dealing with the *volume name in the brick alert*. We can't tweak (or drop) features like that without any reasoning and agreement. And for this reason, I agree with you that we should have a separate BZ for the discussion about the cluster name in the alert.

So we need to:

* reintroduce the cluster name, so that this BZ can be verified
* propose removal of the cluster name from the alert in the separate BZ, and if approved, we will remove it

You can avoid reintroducing and then removing the cluster name by having the BZ for removal of the cluster name from the alert approved and acked first, before you move this one to the ON_QA stage again.
I agree with Martin.
BZ for adding the cluster name was created: BZ 1614334

--> VERIFIED

Tested with:
tendrl-ansible-1.6.3-6.el7rhgs.noarch
tendrl-api-1.6.3-5.el7rhgs.noarch
tendrl-api-httpd-1.6.3-5.el7rhgs.noarch
tendrl-commons-1.6.3-11.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-8.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-8.el7rhgs.noarch
tendrl-node-agent-1.6.3-9.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-9.el7rhgs.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2616