Bug 1516876 - Rebalance panel status in Grafana
Summary: Rebalance panel status in Grafana
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: web-admin-tendrl-monitoring-integration
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Darshan
QA Contact: Lubos Trilety
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-11-23 13:24 UTC by Lubos Trilety
Modified: 2017-12-18 04:37 UTC
CC List: 7 users

Fixed In Version: tendrl-monitoring-integration-1.5.4-11.el7rhgs.noarch tendrl-gluster-integration-1.5.4-14.el7rhgs.noarch
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-18 04:37:58 UTC
Embargoed:


Attachments
Rebalance status as completed for the volume ssssssh(arbiter volume) (139.85 KB, image/png)
2017-12-06 09:25 UTC, Bala Konda Reddy M
Rebalance status as not started for the volume arbiter which is just started (arbiter volume) (150.71 KB, image/png)
2017-12-06 12:50 UTC, Bala Konda Reddy M
rebalance status (53.77 KB, image/png)
2017-12-06 13:14 UTC, Lubos Trilety
On the volumes tab, I am able to see NA and Not started respectively (114.12 KB, image/png)
2017-12-06 13:51 UTC, Bala Konda Reddy M


Links
Github: https://github.com/Tendrl/gluster-integration/issues/497 (last updated 2017-11-23 14:39:30 UTC)
Github: https://github.com/Tendrl/monitoring-integration/issues/283 (last updated 2017-11-23 14:39:09 UTC)
Red Hat Product Errata: RHEA-2017:3478 (SHIPPED_LIVE) - RHGS Web Administration packages (2017-12-18 09:34:49 UTC)

Description Lubos Trilety 2017-11-23 13:24:50 UTC
Description of problem:
The Rebalance panel displays NA when a rebalance is run manually from the CLI.

# gluster volume rebalance <volume_name> start
volume rebalance: <volume_name>: success: Rebalance on <volume_name> has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: b7ab4d78-9e2a-4b44-a456-9a4e1e20440f

# gluster volume rebalance <volume_name> status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             7             0             0            completed        0:00:01
<hostname1>                0        0Bytes             5             0             0            completed        0:00:01
...

The same NA status is displayed on the Volume Details page in the RHGSWA UI.
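
For reference, a minimal sketch (not the actual Tendrl code) of how a dashboard panel could derive one rebalance status per volume from the gluster CLI. The --xml element names ('aggregate', 'node', 'statusStr') are assumptions about the XML output and should be verified against the installed gluster version:

#!/usr/bin/env python
# Minimal sketch (not the actual Tendrl code): derive a single rebalance
# status for a dashboard panel from the gluster CLI. The XML element names
# ('aggregate', 'node', 'statusStr') are assumptions about the --xml output.
import subprocess
import xml.etree.ElementTree as ET

def rebalance_panel_status(volume):
    out = subprocess.check_output(
        ["gluster", "volume", "rebalance", volume, "status", "--xml"])
    root = ET.fromstring(out)
    # Prefer the aggregate status if the CLI reports one.
    agg = root.find(".//aggregate/statusStr")
    if agg is not None and agg.text:
        return agg.text
    # Otherwise fall back to the per-node statuses shown in the table above.
    nodes = [e.text for e in root.findall(".//node/statusStr") if e.text]
    if not nodes:
        return "Not Started"   # rebalance never run on this volume
    if any(s == "in progress" for s in nodes):
        return "In Progress"
    return "Completed" if all(s == "completed" for s in nodes) else nodes[0]

if __name__ == "__main__":
    print(rebalance_panel_status("volume_beta_arbiter_2_plus_1x2"))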


Version-Release number of selected component (if applicable):
tendrl-ansible-1.5.4-1.el7rhgs.noarch
tendrl-ui-1.5.4-4.el7rhgs.noarch
tendrl-grafana-plugins-1.5.4-5.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch
tendrl-commons-1.5.4-4.el7rhgs.noarch
tendrl-api-1.5.4-2.el7rhgs.noarch
tendrl-api-httpd-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-5.el7rhgs.noarch
tendrl-grafana-selinux-1.5.3-2.el7rhgs.noarch
tendrl-node-agent-1.5.4-5.el7rhgs.noarch
tendrl-notifier-1.5.4-3.el7rhgs.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create volume (e.g. arbiter volume)
2. Start rebalance
3. Check Rebalance panel in Grafana, and rebalance status on Volume Details page

Actual results:
The Rebalance panel shows NA as the last rebalance status; the same is displayed on the Volume Details page.

Expected results:
The rebalance status should correspond to the status reported by:
gluster volume rebalance <volume_name> status

Additional info:
The rebalance finishes almost immediately because little or no data is loaded on the volume.
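
To make the rebalance take a measurable amount of time, some data can be written to the volume first. A small illustrative helper, assuming the volume is FUSE-mounted at the hypothetical path below:

#!/usr/bin/env python
# Illustrative helper: create many small files on the mounted volume so a
# subsequent rebalance has data to migrate and stays 'in progress' long
# enough to be observed. The mount point is a hypothetical example path.
import os

MOUNT = "/mnt/volume_beta_arbiter_2_plus_1x2"

def load_test_data(count=5000, size_kb=64):
    payload = os.urandom(size_kb * 1024)
    for i in range(count):
        path = os.path.join(MOUNT, "reb_test_%05d.dat" % i)
        with open(path, "wb") as f:
            f.write(payload)

if __name__ == "__main__":
    load_test_data()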

Comment 3 Lubos Trilety 2017-11-28 13:29:44 UTC
Tested with:
tendrl-gluster-integration-1.5.4-6.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-8.el7rhgs.noarch

It was not working for me; the panel still showed NA after more than 20 minutes of waiting.

Comment 7 Lubos Trilety 2017-12-05 14:36:58 UTC
Tested with:
tendrl-gluster-integration-1.5.4-8.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-11.el7rhgs.noarch

It was still not working for me when the volume was an arbiter volume. Interestingly, when the volume is a disperse volume, RHGSWA shows the status as it should.

Comment 8 Nishanth Thomas 2017-12-05 18:48:15 UTC
We would like to have a look at your setup.
Can you run 'gluster volume info' on the cluster where you are testing this scenario and paste the output here?

My suspicion is that you are trying to run rebalance on an invalid volume type; I just want to confirm that before we do any debugging.

Comment 9 Lubos Trilety 2017-12-06 08:37:29 UTC
(In reply to Nishanth Thomas from comment #8)
> We would like to have a look at your setup.
> Can you run 'gluster volume info' on the cluster where you are testing this
> scenario and paste the output here?
> 
> My suspicion is that you are trying to run rebalance on an invalid volume
> type; I just want to confirm that before we do any debugging.

OK, makes sense. Here's what I get:
# gluster volume info
 
Volume Name: volume_beta_arbiter_2_plus_1x2
Type: Distributed-Replicate
Volume ID: 30fc5ce2-8c10-4d28-b7f9-8a3126ef5ff8
Status: Started
Snapshot Count: 0
Number of Bricks: 6 x (2 + 1) = 18
Transport-type: tcp
Bricks:
Brick1: <hostname1>:/mnt/brick_beta_arbiter_1/1
Brick2: <hostname2>:/mnt/brick_beta_arbiter_1/1
Brick3: <hostname3>:/mnt/brick_beta_arbiter_1/1 (arbiter)
Brick4: <hostname4>:/mnt/brick_beta_arbiter_1/1
Brick5: <hostname5>:/mnt/brick_beta_arbiter_1/1
Brick6: <hostname6>:/mnt/brick_beta_arbiter_1/1 (arbiter)
Brick7: <hostname1>:/mnt/brick_beta_arbiter_2/2
Brick8: <hostname2>:/mnt/brick_beta_arbiter_2/2
Brick9: <hostname3>:/mnt/brick_beta_arbiter_2/2 (arbiter)
Brick10: <hostname4>:/mnt/brick_beta_arbiter_2/2
Brick11: <hostname5>:/mnt/brick_beta_arbiter_2/2
Brick12: <hostname6>:/mnt/brick_beta_arbiter_2/2 (arbiter)
Brick13: <hostname1>:/mnt/brick_beta_arbiter_3/3
Brick14: <hostname2>:/mnt/brick_beta_arbiter_3/3
Brick15: <hostname3>:/mnt/brick_beta_arbiter_3/3 (arbiter)
Brick16: <hostname4>:/mnt/brick_beta_arbiter_3/3
Brick17: <hostname5>:/mnt/brick_beta_arbiter_3/3
Brick18: <hostname6>:/mnt/brick_beta_arbiter_3/3 (arbiter)
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
transport.address-family: inet
nfs.disable: on

Comment 10 Bala Konda Reddy M 2017-12-06 09:24:20 UTC
(In reply to Lubos Trilety from comment #7)
> Tested with:
> tendrl-gluster-integration-1.5.4-8.el7rhgs.noarch
> tendrl-monitoring-integration-1.5.4-11.el7rhgs.noarch
> 
> It was still not working for me when the volume was an arbiter volume.
> Interestingly, when the volume is a disperse volume, RHGSWA shows the status
> as it should.

Lubos, I tried the same scenario on an arbiter volume.

gluster vol info ssssssh
 
Volume Name: ssssssh
Type: Distributed-Replicate
Volume ID: 66c88f57-5a05-4b15-aeb6-0412b225cf8e
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: dhcp42-119.lab.eng.blr.redhat.com:/gluster/brick10/first
Brick2: dhcp42-129.lab.eng.blr.redhat.com:/gluster/brick10/second
Brick3: dhcp42-127.lab.eng.blr.redhat.com:/gluster/brick10/third (arbiter)
Brick4: dhcp42-125.lab.eng.blr.redhat.com:/gluster/brick10/fourth
Brick5: dhcp42-129.lab.eng.blr.redhat.com:/gluster/brick3/fifth
Brick6: dhcp42-127.lab.eng.blr.redhat.com:/gluster/brick3/sixth (arbiter)
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
transport.address-family: inet
nfs.disable: on
cluster.enable-shared-storage: disable
nfs-ganesha: disable


I am able to see the rebalance information on the Grafana dashboard.

Steps I performed:
1. Created an arbiter volume with 6 bricks, 2 x (2 + 1).
2. Started the rebalance from one of the storage nodes.
3. I am able to see the rebalance information on the dashboard.

Am I missing something here?

Please see the attached screenshot.

Comment 11 Bala Konda Reddy M 2017-12-06 09:25:14 UTC
Created attachment 1363571 [details]
Rebalance status as completed for the volume ssssssh(arbiter volume)

Comment 12 Lubos Trilety 2017-12-06 09:52:14 UTC
Hmm, it seems it does not depend on the volume type then, but on the speed of the rebalance. Mine was very quick because there was no data, so I never saw the 'In Progress' state at all. I could see that state when I tried the same scenario with a disperse volume.

# gluster volume rebalance volume_beta_arbiter_2_plus_1x2 status
       Node Rebalanced-files          size   ...       status  run time in h:m:s
  ---------      -----------   -----------   ... ------------     --------------
  localhost                0        0Bytes   ...    completed        0:00:01
<hostname1>                0        0Bytes   ...    completed        0:00:01
<hostname2>                0        0Bytes   ...    completed        0:00:01
<hostname4>                0        0Bytes   ...    completed        0:00:01
<hostname5>                0        0Bytes   ...    completed        0:00:01
<hostname6>                0        0Bytes   ...    completed        0:00:01
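
Since the rebalance completes in about a second here, one quick way to check whether the volume ever reports 'in progress' is to poll the CLI in a loop; an illustrative sketch (not part of RHGSWA):

#!/usr/bin/env python
# Illustrative sketch: poll the rebalance status once per second and record
# every state string seen, to confirm whether 'in progress' ever appears
# before the status settles on 'completed'.
import subprocess
import time

def poll_rebalance(volume, seconds=30):
    seen = set()
    for _ in range(seconds):
        out = subprocess.check_output(
            ["gluster", "volume", "rebalance", volume, "status"]).decode()
        for state in ("not started", "in progress", "completed", "failed"):
            if state in out:
                seen.add(state)
        time.sleep(1)
    return seen

if __name__ == "__main__":
    print(poll_rebalance("volume_beta_arbiter_2_plus_1x2"))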

Comment 13 Nishanth Thomas 2017-12-06 10:07:01 UTC
Based on https://bugzilla.redhat.com/show_bug.cgi?id=1516876#c10, moving the bug back to ON_QA. In the development setup we also see the expected results.

Also make sure that the setup meets the requirements specified at https://github.com/Tendrl/documentation/wiki/Tendrl-release-v1.5.4-(install-guide)#tendrl-server-system-requirements

Comment 14 Lubos Trilety 2017-12-06 11:01:48 UTC
Bala, did you create the volume before or after the RHGSWA install? I have the volume present in Gluster before the RHGSWA install. BTW, I checked and found that the rebalance status differs from the beginning: when I had a disperse volume prepared, the status was 'Not started'; when I had an arbiter volume prepared, the status was 'NA'.

Comment 15 Bala Konda Reddy M 2017-12-06 12:48:30 UTC
Lubos, I created the volume after the RHGSWA install. I haven't tried the scenario you mentioned, but I feel it doesn't make any difference; correct me if I am wrong.

I am able to see information like Not started on the rebalance panel.

Please find the attachment helpful.

Comment 16 Bala Konda Reddy M 2017-12-06 12:50:24 UTC
Created attachment 1363649 [details]
Rebalance status as not started for the volume arbiter which is just started (arbiter volume)

Comment 17 Lubos Trilety 2017-12-06 13:10:18 UTC
(In reply to Bala Konda Reddy M from comment #15)
> Lubos, I created the volume after the RHGSWA install. I haven't tried the
> scenario you mentioned, but I feel it doesn't make any difference; correct
> me if I am wrong.
> 
> I am able to see information like Not started on the rebalance panel.
> 
> Please find the attachment helpful.

I thought so too, but when I created a new arbiter volume, its rebalance status was correct. So it does make a difference whether the volume is created before or after the RHGSWA install.

Comment 18 Lubos Trilety 2017-12-06 13:14:20 UTC
Created attachment 1363662 [details]
rebalance status

volume_beta_arbiter_2_plus_1x2: arbiter volume created before RHGSWA was installed
volume_gamma_arbiter_2_plus_1x2: arbiter volume created after RHGSWA was installed and the cluster imported
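
A purely illustrative sketch (hypothetical, not the actual monitoring-integration code) of the kind of behaviour that would explain this difference: if a rebalance status entry is only recorded for volumes seen after the cluster is imported, volumes that already existed have no entry and fall back to a default of NA:

# Hypothetical illustration only -- not the actual Tendrl code.
# If a status entry is recorded only for volumes created/seen after import,
# pre-existing volumes have no entry and the panel falls back to "NA".
known_statuses = {}                          # volume name -> last known status

def on_volume_created(volume):
    known_statuses[volume] = "Not Started"   # default for newly seen volumes

def panel_status(volume):
    return known_statuses.get(volume, "NA")  # missing entry renders as NA

on_volume_created("volume_gamma_arbiter_2_plus_1x2")
print(panel_status("volume_gamma_arbiter_2_plus_1x2"))   # Not Started
print(panel_status("volume_beta_arbiter_2_plus_1x2"))    # NA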

Comment 20 Bala Konda Reddy M 2017-12-06 13:51:00 UTC
Created attachment 1363680 [details]
On the volumes tab, I am able to see NA and Not started respectively

Comment 23 Nishanth Thomas 2017-12-12 02:00:01 UTC
Please test with the latest builds.

Comment 24 Lubos Trilety 2017-12-12 08:20:20 UTC
Tested with:
tendrl-gluster-integration-1.5.4-14.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-14.el7rhgs.noarch

Working properly; the rebalance status is displayed on the Grafana dashboard.

Comment 26 errata-xmlrpc 2017-12-18 04:37:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3478

