Bug 1519742 - Split Brain data is not reflected in grafana
Summary: Split Brain data is not reflected in grafana
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: web-admin-tendrl-monitoring-integration
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Nishanth Thomas
QA Contact: Lubos Trilety
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-12-01 10:59 UTC by Vijay Avuthu
Modified: 2017-12-18 04:38 UTC (History)
7 users

Fixed In Version: tendrl-monitoring-integration-1.5.4-14.el7rhgs.noarch
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-18 04:38:26 UTC
Target Upstream Version:


Attachments (Terms of Use)
Healing panel (4.60 KB, image/png)
2017-12-05 18:13 UTC, Filip Balák
no flags


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:3478 normal SHIPPED_LIVE RHGS Web Administration packages 2017-12-18 09:34:49 UTC
Github https://github.com/Tendrl/node-agent/pull/701 None None None 2017-12-06 09:55:38 UTC
Red Hat Bugzilla 1523786 None None None Never

Internal Links: 1523786

Description Vijay Avuthu 2017-12-01 10:59:18 UTC
Description of problem:

Split Brain data is not reflected in Grafana. It shows 0 even though gluster reports a split-brain count.

Version-Release number of selected component (if applicable):

#dhcp46-247.lab.eng.blr.redhat.com
[root@dhcp46-247 ~]# rpm -qa | grep -i tendrl
tendrl-monitoring-integration-1.5.4-8.el7rhgs.noarch
tendrl-commons-1.5.4-5.el7rhgs.noarch
tendrl-api-httpd-1.5.4-3.el7rhgs.noarch
tendrl-selinux-1.5.4-1.el7rhgs.noarch
tendrl-grafana-plugins-1.5.4-8.el7rhgs.noarch
tendrl-notifier-1.5.4-5.el7rhgs.noarch
tendrl-node-agent-1.5.4-8.el7rhgs.noarch
tendrl-ui-1.5.4-4.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-1.el7rhgs.noarch
tendrl-ansible-1.5.4-2.el7rhgs.noarch
tendrl-api-1.5.4-3.el7rhgs.noarch
[root@dhcp46-247 ~]# 

How reproducible:

Always

Steps to Reproduce:

1. Create vol ( 1 * 2 )
2. Create split brain files ( stopping all heal daemons )

eg: 

[root@dhcp42-129 ~]# gluster vol heal vol12 info split-brain
Brick 10.70.42.127:/gluster/brick10/b10
/
Status: Connected
Number of entries in split-brain: 1

Brick 10.70.42.119:/gluster/brick10/b10
/
Status: Connected
Number of entries in split-brain: 1

[root@dhcp42-129 ~]# 

3. Check Grafana for the same volume.

Actual results:

Split Brain is shown as 0.

Expected results:

It should show Split Brain - 1

Additional info:

Comment 3 Shubhendu Tripathi 2017-12-04 05:39:37 UTC
@Vijay, we have modified the logic to use `gluster volume heal <volname> info` and `gluster volume heal <volname> info split-brain` now. If the latter command returns a value, it should be reported in Grafana. It might take some time to appear, as the next sync will pick it up.
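As an illustration of the logic described above, here is a minimal sketch (not Tendrl's actual code; the function name `count_split_brain_entries` is hypothetical) of how the per-brick counts in the `gluster volume heal <volname> info split-brain` output could be extracted and summed:

```python
import re

def count_split_brain_entries(cli_output):
    """Sum the per-brick 'Number of entries in split-brain' values
    from `gluster volume heal <volname> info split-brain` output."""
    counts = re.findall(r"Number of entries in split-brain:\s*(\d+)", cli_output)
    return sum(int(c) for c in counts)

# Sample output matching the reproduction steps in this report:
sample = """\
Brick 10.70.42.127:/gluster/brick10/b10
/
Status: Connected
Number of entries in split-brain: 1

Brick 10.70.42.119:/gluster/brick10/b10
/
Status: Connected
Number of entries in split-brain: 1
"""

print(count_split_brain_entries(sample))  # → 2
```

Note that summing across bricks counts each file once per replica; see comment 12 below for the discussion of that behaviour.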

Comment 4 Shubhendu Tripathi 2017-12-04 06:18:33 UTC
Also, please check whether you are using the latest Tendrl builds, as there have been changes around the heal info commands. If not, I would suggest migrating to the latest builds and verifying the scenario again.

Comment 6 Filip Balák 2017-12-05 18:11:53 UTC
I followed the reproducer for creating a split-brain file from https://usmqe-testdoc.readthedocs.io/en/latest/web/alerting/glusternative_subvolume.html and ended up with:

```
# gluster volume heal volume_alpha_distrep_6x2 info split-brain
...
Brick fbalak-usm1-gl5.usmqe.com:/mnt/brick_alpha_distrep_2/2
/
Status: Connected
Number of entries in split-brain: 1

Brick fbalak-usm1-gl6.usmqe.com:/mnt/brick_alpha_distrep_2/2
/
Status: Connected
Number of entries in split-brain: 1
```

But even after an hour, the `Volume` dashboard shows:
`Split Brain: -`

--> ASSIGNED

Tested with:
tendrl-gluster-integration-1.5.4-8.el7rhgs.noarch
tendrl-ansible-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.5.4-5.el7rhgs.noarch
tendrl-grafana-plugins-1.5.4-11.el7rhgs.noarch
tendrl-selinux-1.5.4-1.el7rhgs.noarch
tendrl-commons-1.5.4-6.el7rhgs.noarch
tendrl-api-1.5.4-4.el7rhgs.noarch
tendrl-api-httpd-1.5.4-4.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-11.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-1.el7rhgs.noarch
tendrl-node-agent-1.5.4-9.el7rhgs.noarch
tendrl-notifier-1.5.4-6.el7rhgs.noarch

Comment 7 Filip Balák 2017-12-05 18:13:22 UTC
Created attachment 1363305 [details]
Healing panel

Comment 9 Nishanth Thomas 2017-12-05 19:07:18 UTC
@filip, what is the system configuration of the Tendrl server and the storage nodes?
We also require the logs, and we need access to your setup. We have seen this working in our environment.

Comment 11 Shubhendu Tripathi 2017-12-06 09:55:39 UTC
Added a PR https://github.com/Tendrl/node-agent/pull/701 to handle parsing of `heal info` output in split brain case.

Comment 12 Lubos Trilety 2017-12-08 13:43:33 UTC
Tested with:
tendrl-monitoring-integration-1.5.4-13.el7rhgs.noarch

The split-brain number is not zero or empty anymore.

However, as the number is the sum of all entries, it is always a multiple of the replica count;
i.e. one 'bad' file on a volume with replica 3 means the Grafana Healing panel shows 'Split Brains - 3', because each brick in the replica set holds one occurrence of the same file with different content. I am not sure if this is correct behaviour.
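The arithmetic behind that observation can be sketched as follows (the brick names and variable names here are illustrative, not taken from Tendrl):

```python
# One file in split-brain on a replica-3 volume: each of the three
# bricks in the replica set reports one entry for the same file.
per_brick = {"brick1": 1, "brick2": 1, "brick3": 1}
replica_count = 3

# Summing per-brick entries (as the panel did) triple-counts the file:
total = sum(per_brick.values())

# Dividing by the replica count recovers the number of distinct files:
unique_files = total // replica_count

print(total, unique_files)  # → 3 1
```

This is exactly the concern tracked in BZ 1523786, referenced in comment 14; the eventual fix displays the status per brick instead.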

Comment 13 Shubhendu Tripathi 2017-12-08 15:23:29 UTC
Lubos, yes this concern was discussed with Bala as well. Bala would be raising a separate BZ for the same.

Comment 14 Nishanth Thomas 2017-12-10 11:23:55 UTC
Please refer to https://bugzilla.redhat.com/show_bug.cgi?id=1523786; the fix is available in the latest builds.

Comment 15 Lubos Trilety 2017-12-11 09:42:37 UTC
Tested with:
tendrl-monitoring-integration-1.5.4-14.el7rhgs.noarch

The split-brain status is displayed on Grafana per brick.

Comment 17 errata-xmlrpc 2017-12-18 04:38:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3478

