Bug 1519742
| Summary: | Split Brain data is not reflected in grafana | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Vijay Avuthu <vavuthu> |
| Component: | web-admin-tendrl-monitoring-integration | Assignee: | Nishanth Thomas <nthomas> |
| Status: | CLOSED ERRATA | QA Contact: | Lubos Trilety <ltrilety> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | unspecified | CC: | fbalak, ltrilety, rhs-bugs, sanandpa, sankarshan, shtripat, ssaha |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | tendrl-monitoring-integration-1.5.4-14.el7rhgs.noarch | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-12-18 04:38:26 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Vijay Avuthu
2017-12-01 10:59:18 UTC
@Vijay, we have modified the logic to use `gluster volume heal <volname> info` and `gluster volume heal <volname> info split-brain` now. If the latter command returns a value, it should be reported in Grafana. It may take some time to appear there, since the next sync will pick it up. Also, please check that you are using the latest tendrl builds, as there have been changes around the heal info commands. If not, I would suggest migrating to the latest builds and verifying the scenario again.

I followed the reproducer for setting up a split-brain file from https://usmqe-testdoc.readthedocs.io/en/latest/web/alerting/glusternative_subvolume.html and ended up with:

```
# gluster volume heal volume_alpha_distrep_6x2 info split-brain
...
Brick fbalak-usm1-gl5.usmqe.com:/mnt/brick_alpha_distrep_2/2
/
Status: Connected
Number of entries in split-brain: 1

Brick fbalak-usm1-gl6.usmqe.com:/mnt/brick_alpha_distrep_2/2
/
Status: Connected
Number of entries in split-brain: 1
```

But even after an hour, the `Volume` dashboard shows `Split Brain: -`. --> ASSIGNED

Tested with:

- tendrl-gluster-integration-1.5.4-8.el7rhgs.noarch
- tendrl-ansible-1.5.4-2.el7rhgs.noarch
- tendrl-ui-1.5.4-5.el7rhgs.noarch
- tendrl-grafana-plugins-1.5.4-11.el7rhgs.noarch
- tendrl-selinux-1.5.4-1.el7rhgs.noarch
- tendrl-commons-1.5.4-6.el7rhgs.noarch
- tendrl-api-1.5.4-4.el7rhgs.noarch
- tendrl-api-httpd-1.5.4-4.el7rhgs.noarch
- tendrl-monitoring-integration-1.5.4-11.el7rhgs.noarch
- tendrl-grafana-selinux-1.5.4-1.el7rhgs.noarch
- tendrl-node-agent-1.5.4-9.el7rhgs.noarch
- tendrl-notifier-1.5.4-6.el7rhgs.noarch

Created attachment 1363305 [details]
Healing panel
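For readers unfamiliar with the heal-info format, the kind of parsing at issue here can be sketched in a few lines of Python. This is a hypothetical illustration built around the sample output above, not Tendrl's actual parser (the real handling lives in node-agent):

```python
# Hypothetical sketch: extract per-brick split-brain entry counts from the
# plain-text output of `gluster volume heal <vol> info split-brain`.
# SAMPLE mirrors the output captured above; it is not live command output.

SAMPLE = """\
Brick fbalak-usm1-gl5.usmqe.com:/mnt/brick_alpha_distrep_2/2
/
Status: Connected
Number of entries in split-brain: 1

Brick fbalak-usm1-gl6.usmqe.com:/mnt/brick_alpha_distrep_2/2
/
Status: Connected
Number of entries in split-brain: 1
"""

def parse_split_brain(output):
    """Return a {brick: entry_count} dict parsed from heal-info output."""
    counts = {}
    brick = None
    for line in output.splitlines():
        if line.startswith("Brick "):
            # Remember which brick the following lines describe.
            brick = line[len("Brick "):].strip()
        elif line.startswith("Number of entries in split-brain:") and brick:
            counts[brick] = int(line.rsplit(":", 1)[1])
    return counts

counts = parse_split_brain(SAMPLE)
print(counts)                 # per-brick counts
print(sum(counts.values()))   # -> 2: the total is a sum across bricks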
@filip, what is the system configuration of the Tendrl server and the storage nodes? We also need the logs and access to your setup; we have seen this working in our environment.

Added PR https://github.com/Tendrl/node-agent/pull/701 to handle parsing of `heal info` output in the split-brain case.

Tested with: tendrl-monitoring-integration-1.5.4-13.el7rhgs.noarch

The split-brain number is not zero or empty anymore. However, because the number is the sum of all entries, it is always a multiple of the replica count: one 'bad' file on a volume with replica 3 shows up as 'Split Brains - 3' in the Grafana Healing panel, because there is one occurrence of the same file, with different content, on each brick of the replica set. I am not sure whether this is correct behaviour.

Lubos, yes, this concern was discussed with Bala as well, and Bala will raise a separate BZ for it. Please refer to https://bugzilla.redhat.com/show_bug.cgi?id=1523786; the fix is available in the latest builds.

Tested with: tendrl-monitoring-integration-1.5.4-14.el7rhgs.noarch

The split-brain status is now displayed in Grafana per brick.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3478
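The double counting reported against the -13 build can be illustrated with a toy calculation (the numbers here are hypothetical, taken from the replica-3 example in the comment, not from this setup):

```python
# Toy illustration of the replica-count multiplication: each file in split
# brain is listed once per brick of its replica set, so summing entries
# across bricks multiplies the per-file count by the replica count.
files_in_split_brain = 1   # one 'bad' file (hypothetical)
replica_count = 3          # replica-3 volume (hypothetical)

reported_total = files_in_split_brain * replica_count
print(reported_total)  # -> 3, i.e. "Split Brains - 3" in the Healing panel
```

Reporting per brick, as the -14 build does, avoids presenting this multiple as a single volume-wide figure.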