Bug 1022996

Summary: [RHSC] - Monitoring stop rebalance from CLI does not work.
Product: Red Hat Gluster Storage
Reporter: RamaKasturi <knarra>
Component: rhsc
Assignee: Sahina Bose <sabose>
Status: CLOSED ERRATA
QA Contact: RamaKasturi <knarra>
Severity: urgent
Docs Contact:
Priority: high
Version: 2.1
CC: dpati, dtsang, mmahoney, pprakash, rhs-bugs, ssampat
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 2.1.2
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: cb10
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-02-25 07:44:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On: 1040303
Bug Blocks:
Attachments:
Description: Attaching the screenshot; Flags: none
Description: Attaching the screenshot; Flags: none

Description RamaKasturi 2013-10-24 12:31:40 UTC
Description of problem:
Monitoring stop rebalance from CLI does not work.

Version-Release number of selected component (if applicable):
rhsc-2.1.2-0.21.beta1.el6_4.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a distribute volume and start it.
2. Start rebalance on the volume.
3. Once rebalance has started, go to the CLI and run the command "gluster vol rebalance <volName> stop".

Actual results:
When rebalance is stopped from the CLI, the following happens:

1) The Status dialog shows the status as aborted.
2) The Stop Rebalance button does not get disabled.
3) The Rebalance icon does not change to rebalance stopped.
4) The Tasks pane hangs.
5) The stop button stays enabled in the drop-down menu of the activities column.
6) Stopping rebalance from the UI succeeds.

Expected results:
When rebalance is stopped from the CLI, the following should happen:

1) The Stop Rebalance button should get disabled.
2) The Rebalance icon should change to rebalance stopped.
3) The Tasks pane should execute the task properly.
4) The stop button should get disabled in the drop-down menu of the activities column.
5) Stopping rebalance from the UI should not succeed.

Additional info:

Comment 2 RamaKasturi 2013-10-24 13:09:34 UTC
Please find the sosreports in the following link:

http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/rhsc/1022996/

Comment 3 RamaKasturi 2013-10-24 13:10:05 UTC
The following is seen from the gluster CLI.

[root@localhost vdsm]# gluster volume rebalance vol_dis stop
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes            50             0             0              stopped              13.00
                            10.70.37.140                0        0Bytes             1             0             0              stopped              13.00
                             10.70.37.43                2       253.0KB            24             0             0              stopped              14.00
                            10.70.37.108                0        0Bytes            73             0             0              stopped              13.00
volume rebalance: vol_dis: success: rebalance process may be in the middle of a file migration.
The process will be fully stopped once the migration of the file is complete.
Please check rebalance process for completion before doing any further brick related tasks on the volume.
[root@localhost vdsm]# gluster volume status vol_dis tasks
Task Status of Volume vol_dis
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@localhost vdsm]# gluster volume status vol_dis tasks
Task Status of Volume vol_dis
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@localhost vdsm]#
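
The output above is what a monitoring layer has to cope with: once the rebalance has been stopped from the CLI, gluster no longer lists any task for the volume. A minimal Python sketch of detecting that condition, assuming the plain-text output shown above is what gets inspected (the helper name has_active_tasks is hypothetical, and the real vdsm/engine path may rely on structured output rather than this text):

```python
# Marker line taken verbatim from the CLI output in this comment.
NO_TASKS_MARKER = "There are no active volume tasks"


def has_active_tasks(status_output: str) -> bool:
    """Return False when the CLI text says the volume has no active tasks.

    Hypothetical helper: it only looks for the marker line and does not
    parse individual task entries.
    """
    return NO_TASKS_MARKER not in status_output


sample = """Task Status of Volume vol_dis
------------------------------------------------------------------------------
There are no active volume tasks
"""

print(has_active_tasks(sample))
```

A caller that still tracks a rebalance task for vol_dis and sees no active tasks here has to decide what to do with that task; the later comments describe ending it with status UNKNOWN.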

Comment 4 Dusmant 2013-10-25 06:41:16 UTC
We might require a fix from GlusterFS also, along with our fix...

Comment 5 Sahina Bose 2013-10-30 09:21:12 UTC
Will add fix in engine to end Job with status "UNKNOWN" when gluster does not return the task information.

Comment 6 Sahina Bose 2013-11-06 09:55:33 UTC
Added code to end orphan tasks (tasks that gluster is no longer aware of) with status UNKNOWN.
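
The orphan-task handling described here can be sketched roughly as follows. This is an illustration in Python, not the engine code (the engine is a Java code base); the TrackedTask type, the status strings, and the end_orphan_tasks function are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class TrackedTask:
    """A task the management side believes is running (hypothetical type)."""
    task_id: str
    volume: str
    status: str  # illustrative values: "STARTED", "ABORTED", "UNKNOWN"


def end_orphan_tasks(tracked: Iterable[TrackedTask],
                     reported_ids: Set[str]) -> List[TrackedTask]:
    """Mark running tasks that gluster no longer reports as UNKNOWN.

    reported_ids is the set of task IDs gluster returned for the cluster.
    Returns the tasks that were ended as orphans.
    """
    orphans = []
    for task in tracked:
        if task.status == "STARTED" and task.task_id not in reported_ids:
            task.status = "UNKNOWN"
            orphans.append(task)
    return orphans


tasks = [
    TrackedTask("task-a", "vol_dis", "STARTED"),
    TrackedTask("task-b", "vol_other", "STARTED"),
]
# Assume gluster still reports task-b but has forgotten task-a.
ended = end_orphan_tasks(tasks, {"task-b"})
print([t.task_id for t in ended])
```

Only tasks still considered running are touched, so a task that already ended with another status is not overwritten.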

Comment 7 Sahina Bose 2013-11-18 08:56:40 UTC
Added code as per Comment 5

Comment 8 RamaKasturi 2013-11-19 06:31:19 UTC
Able to reproduce the issue in cb8. Steps to reproduce:

1. Create a distribute volume and start it.
2. Start rebalance on the volume.
3. Once rebalance has started, go to the CLI and run the command "gluster vol rebalance <volName> stop".

Actual results:
When rebalance is stopped from the CLI, the following happens:

1) The Status dialog shows the status as aborted.
2) The Stop Rebalance button does not get disabled.
3) The Rebalance icon does not change to rebalance stopped.
4) The Tasks pane hangs.
5) The stop button stays enabled in the drop-down menu of the activities column.
6) Stopping rebalance from the UI succeeds.

Please find the sosreports at the link below:

http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/rhsc/1022996/

Comment 9 Sahina Bose 2013-11-21 09:23:07 UTC
The issue was that task cleanup is not invoked when any other cluster returns an error while fetching the task list from gluster. In this case, the Default cluster failed because it had no UP servers.

Posted a patch to make sure the task clean up is done for operational clusters.
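
The failure mode and the shape of the fix can be illustrated as below. This is a sketch under assumptions, written in Python rather than the engine's Java: the function clean_tasks_per_cluster and its callback parameters are hypothetical, and the point is only that an error from one cluster is contained so that cleanup still runs for the others.

```python
from typing import Callable, Dict, List


def clean_tasks_per_cluster(
    clusters: List[str],
    fetch_tasks: Callable[[str], List[str]],
    clean_up: Callable[[str, List[str]], None],
) -> Dict[str, str]:
    """Run task cleanup for each cluster, isolating per-cluster failures."""
    results = {}
    for cluster in clusters:
        try:
            reported = fetch_tasks(cluster)
        except Exception as exc:
            # One failing cluster (e.g. no UP servers) must not stop the loop.
            results[cluster] = f"skipped: {exc}"
            continue
        clean_up(cluster, reported)
        results[cluster] = "cleaned"
    return results


def fake_fetch(cluster: str) -> List[str]:
    # Stand-in for asking gluster for the task list of a cluster.
    if cluster == "Default":
        raise RuntimeError("no UP servers")
    return ["task-b"]


cleaned = []
outcome = clean_tasks_per_cluster(
    ["Default", "cluster1"],
    fake_fetch,
    lambda cluster, ids: cleaned.append(cluster),
)
print(outcome)
```

Without the per-cluster try/except, the error raised for the Default cluster would end the loop before cluster1 is reached, which matches the behaviour described in this comment.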

Comment 10 RamaKasturi 2013-12-03 12:34:45 UTC
As part of the fix provided, the following were the expected results.

1) Icon in the activities column should have the unknown symbol/icon with the drop down enabled.

2) Tasks pane should have the task marked as "UNKNOWN"

The following are the results seen while verifying the bug.

1) The icon in the activities column is updated to the Rebalance Stopped icon and the task is updated to aborted.

2) The icon in the activities column disappears, and only the drop-down is present.
The task is updated to INPROGRESS and the volume name is shown as <UNKNOWN>. Attaching a screenshot of this.

The two results above occur alternately.

Comment 11 RamaKasturi 2013-12-03 12:35:21 UTC
Created attachment 832037 [details]
Attaching the screenshot

Comment 12 Sahina Bose 2013-12-05 10:17:46 UTC
The issue seems to be that when you stop rebalance and start it again, the second start fails. The same behaviour is observed from the gluster CLI.

From engine log:
2013-12-05 21:15:22,928 INFO  [org.ovirt.engine.core.vdsbroker.gluster.StartRebalanceGlusterVolumeVDSCommand] (pool-4-thread-48) [528dd7f3] FINISH, StartRebalanceGlusterVolumeVDSCommand, return: org.ovirt.engine.core.common.asynctasks.gluster.GlusterAsyncTask@67e12cdf, log id: 5e125684
2013-12-05 21:15:22,964 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-4-thread-48) [528dd7f3] Correlation ID: 528dd7f3, Job ID: 55fb6cdb-8ce3-43f6-b769-56bab47e2e37, Call Stack: null, Custom Event ID: -1, Message: Could not start Gluster Volume vol_dis rebalance.

From vdsm log:
Thread-89862::ERROR::2013-12-05 21:15:25,524::BindingXMLRPC::1000::vds::(wrapper) vdsm exception occured
Traceback (most recent call last):
  File "/usr/share/vdsm/BindingXMLRPC.py", line 989, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/gluster/api.py", line 53, in wrapper
    rv = func(*args, **kwargs)
  File "/usr/share/vdsm/gluster/api.py", line 125, in volumeRebalanceStart
    force)
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
    **kwargs)
  File "<string>", line 2, in glusterVolumeRebalanceStart
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in _callmethod
    raise convert_to_error(kind, result)
GlusterVolumeRebalanceStartFailedException: Volume rebalance start failed
error: Rebalance on vol_dis is already started
return code: -1

Comment 13 Dusmant 2013-12-09 15:48:10 UTC
We have to release note this bug.

Comment 14 Sahina Bose 2013-12-10 05:58:56 UTC
The issue reported in Comment 10, where the Rebalance activity shows a Stopped icon, is due to the error where rebalance could not be started (because the earlier stop rebalance had not completed).

This is not related to monitoring, so please log a separate bug for it so that it can be release noted.

Moving this bug to ON_QA for verification.

Comment 15 RamaKasturi 2013-12-11 07:37:39 UTC
Verified in cb10. The following happens when rebalance is stopped from the gluster CLI.

1) In the volume activities column the icon disappears; hovering the mouse over the icon position shows the text "unknown", and the drop-down with Status is enabled. The icon should be updated to "?". Logged a bug for this.

https://bugzilla.redhat.com/show_bug.cgi?id=1035601

2) The Tasks pane does not get updated with the correct status. Logged a bug for this.
https://bugzilla.redhat.com/show_bug.cgi?id=1040303

3) If the status dialog is opened, before or after the stop command is issued, the activities column is updated with the rebalance stopped icon. Logged a bug for this.

https://bugzilla.redhat.com/show_bug.cgi?id=1040310

Comment 16 RamaKasturi 2013-12-11 07:38:52 UTC
Will mark this bug verified only after the following bugs are fixed.

https://bugzilla.redhat.com/show_bug.cgi?id=1040303

https://bugzilla.redhat.com/show_bug.cgi?id=1040303

Comment 17 RamaKasturi 2013-12-11 08:20:25 UTC
Will mark this bug verified only after the following bugs are fixed.

https://bugzilla.redhat.com/show_bug.cgi?id=1040303

https://bugzilla.redhat.com/show_bug.cgi?id=1035601

Comment 18 RamaKasturi 2013-12-19 11:21:33 UTC
Verified and works fine with cb12 build rhsc-2.1.2-0.28.beta.el6_5.noarch.

When rebalance is stopped from the CLI, a '?' icon appears in the volume activities column and the tasks pane is updated with a task marked with an 'x'. Expanding the task pane shows the message "Rebalancing gluster volume <volName> in cluster <clusterName> (UNKNOWN)".

An event message is displayed saying "Could not find information for rebalance on volume <volName> of Cluster <clusterName> from CLI. Marking it as unknown."

Attaching a screenshot of this.

Comment 19 RamaKasturi 2013-12-19 11:26:32 UTC
Created attachment 838935 [details]
Attaching the screenshot

Comment 22 errata-xmlrpc 2014-02-25 07:44:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html