Description of problem:
Monitoring does not work when rebalance is stopped from the CLI.

Version-Release number of selected component (if applicable):
rhsc-2.1.2-0.21.beta1.el6_4.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a distribute volume and start it.
2. Start rebalance on the volume.
3. Once rebalance has started, go to the CLI and run the command "gluster vol rebalance <volName> stop".

Actual results:
When rebalance is stopped from the CLI, the following happens.
1) Status dialog shows the status as aborted.
2) Stop Rebalance button does not get disabled.
3) Rebalance icon does not change to rebalance stopped.
4) Tasks pane hangs.
5) Stop button is enabled in the drop-down menu of the activities column.
6) Stopping rebalance from the UI succeeds.

Expected results:
When rebalance is stopped from the CLI, the following should happen.
1) Stop Rebalance button should get disabled.
2) Rebalance icon should change to rebalance stopped.
3) Tasks pane should execute the task properly.
4) Stop button should get disabled in the drop-down menu of the activities column.
5) Stopping rebalance from the UI should not succeed.

Additional info:
Please find the sosreports in the following link: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/rhsc/1022996/
The following is seen from the gluster CLI.

[root@localhost vdsm]# gluster volume rebalance vol_dis stop
Node           Rebalanced-files   size      scanned   failures   skipped   status    run time in secs
------------   ----------------   -------   -------   --------   -------   -------   ----------------
localhost      0                  0Bytes    50        0          0         stopped   13.00
10.70.37.140   0                  0Bytes    1         0          0         stopped   13.00
10.70.37.43    2                  253.0KB   24        0          0         stopped   14.00
10.70.37.108   0                  0Bytes    73        0          0         stopped   13.00
volume rebalance: vol_dis: success: rebalance process may be in the middle of a file migration.
The process will be fully stopped once the migration of the file is complete.
Please check rebalance process for completion before doing any further brick related tasks on the volume.

[root@localhost vdsm]# gluster volume status vol_dis tasks
Task Status of Volume vol_dis
------------------------------------------------------------------------------
There are no active volume tasks

[root@localhost vdsm]# gluster volume status vol_dis tasks
Task Status of Volume vol_dis
------------------------------------------------------------------------------
There are no active volume tasks

[root@localhost vdsm]#
We might require a fix from GlusterFS also, along with our fix...
Will add fix in engine to end Job with status "UNKNOWN" when gluster does not return the task information.
Added code to end orphan tasks (tasks that gluster is no longer aware of) with status UNKNOWN.
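For readers following along, the orphan-task handling described above can be sketched roughly as follows. This is only an illustration in Python, not the engine code (which is Java); the names `Job`, `JobStatus` and `end_orphan_tasks` are hypothetical and do not come from the patch.

```python
from dataclasses import dataclass
from enum import Enum


class JobStatus(Enum):
    # Hypothetical status values, chosen for illustration only.
    STARTED = "STARTED"
    ABORTED = "ABORTED"
    UNKNOWN = "UNKNOWN"


@dataclass
class Job:
    task_id: str
    volume: str
    status: JobStatus = JobStatus.STARTED


def end_orphan_tasks(engine_jobs, gluster_task_ids):
    """End running jobs that gluster no longer reports, with status UNKNOWN.

    engine_jobs: jobs the engine is currently tracking.
    gluster_task_ids: task ids returned by the latest gluster task query.
    Returns the list of jobs that were marked UNKNOWN.
    """
    known = set(gluster_task_ids)
    orphans = []
    for job in engine_jobs:
        # Only jobs still considered running can become orphans.
        if job.status is JobStatus.STARTED and job.task_id not in known:
            job.status = JobStatus.UNKNOWN
            orphans.append(job)
    return orphans
```

In the scenario from this bug, "gluster volume status vol_dis tasks" reports no active tasks after the CLI stop, so the rebalance job for vol_dis would no longer be in the known set and would be ended as UNKNOWN.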
Added code as per Comment 5
Able to reproduce the issue in cb8.

Steps to reproduce:
1. Create a distribute volume and start it.
2. Start rebalance on the volume.
3. Once rebalance has started, go to the CLI and run the command "gluster vol rebalance <volName> stop".

Actual results:
When rebalance is stopped from the CLI, the following happens.
1) Status dialog shows the status as aborted.
2) Stop Rebalance button does not get disabled.
3) Rebalance icon does not change to rebalance stopped.
4) Tasks pane hangs.
5) Stop button is enabled in the drop-down menu of the activities column.
6) Stopping rebalance from the UI succeeds.

Please find the sosreports in the link below:
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/rhsc/1022996/
The issue was that task cleanup is not called when some other cluster returns an error while getting the task list from gluster. In this case, the Default cluster failed because there were no UP servers in it. Posted a patch to make sure the task cleanup is done for operational clusters.
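A minimal sketch of the idea behind this fix, assuming the sync loop iterates over clusters and that fetching the task list can raise for a cluster with no UP servers. The function and parameter names (`sync_all_clusters`, `fetch_tasks`, `cleanup`) are hypothetical and only illustrate isolating a failure to the cluster that caused it; the actual patch is in the Java engine.

```python
def sync_all_clusters(clusters, fetch_tasks, cleanup):
    """Run task cleanup per cluster, so one failing cluster cannot block the rest.

    clusters: iterable of cluster names.
    fetch_tasks: callable(cluster) -> list of task ids; may raise on error.
    cleanup: callable(cluster, task_ids) that ends orphan tasks for that cluster.
    Returns a list of (cluster, error message) pairs for clusters that failed.
    """
    failed = []
    for cluster in clusters:
        try:
            task_ids = fetch_tasks(cluster)
        except Exception as exc:
            # Record the failure and move on; do not abort the whole sync.
            failed.append((cluster, str(exc)))
            continue
        cleanup(cluster, task_ids)
    return failed
```

With this shape, an error from the Default cluster is recorded and skipped, and cleanup still runs for the cluster that actually holds the stopped rebalance task.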
As part of the fix provided, the following were the expected results.
1) Icon in the activities column should have the unknown symbol/icon with the drop-down enabled.
2) Tasks pane should have the task marked as "UNKNOWN".

The following are the results seen while verifying the bug.
1) Icon in the activities column gets updated to the Rebalance Stopped icon and the task gets updated to aborted.
2) Icon in the activities column disappears, and only the drop-down is present. Task gets updated as INPROGRESS and the volume name appears as <UNKNOWN>.

Attaching the screenshot for the same. The two results above occur alternately.
Created attachment 832037 [details] Attaching the screenshot
The issue seems to be that when you stop rebalance and start it again, the second time it fails to start. The same behaviour is observed from the gluster CLI.

From engine log:
2013-12-05 21:15:22,928 INFO [org.ovirt.engine.core.vdsbroker.gluster.StartRebalanceGlusterVolumeVDSCommand] (pool-4-thread-48) [528dd7f3] FINISH, StartRebalanceGlusterVolumeVDSCommand, return: org.ovirt.engine.core.common.asynctasks.gluster.GlusterAsyncTask@67e12cdf, log id: 5e125684
2013-12-05 21:15:22,964 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-4-thread-48) [528dd7f3] Correlation ID: 528dd7f3, Job ID: 55fb6cdb-8ce3-43f6-b769-56bab47e2e37, Call Stack: null, Custom Event ID: -1, Message: Could not start Gluster Volume vol_dis rebalance.

From vdsm log:
Thread-89862::ERROR::2013-12-05 21:15:25,524::BindingXMLRPC::1000::vds::(wrapper) vdsm exception occured
Traceback (most recent call last):
  File "/usr/share/vdsm/BindingXMLRPC.py", line 989, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/gluster/api.py", line 53, in wrapper
    rv = func(*args, **kwargs)
  File "/usr/share/vdsm/gluster/api.py", line 125, in volumeRebalanceStart
    force)
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
    **kwargs)
  File "<string>", line 2, in glusterVolumeRebalanceStart
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in _callmethod
    raise convert_to_error(kind, result)
GlusterVolumeRebalanceStartFailedException: Volume rebalance start failed
error: Rebalance on vol_dis is already started
return code: -1
We have to release note this bug.
The issue reported in Comment 10, where the Rebalance activity has a Stopped icon, is due to the error where rebalance could not be started (because the earlier stop rebalance had not completed). This is not related to monitoring, so please log a separate bug for it so that it can be release noted. Moving this bug to ON_QA for verification.
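As a side note for anyone scripting a stop followed by a start: since the start fails with "already started" until the earlier stop has completed, a caller can retry the start a few times. The sketch below assumes the caller supplies its own `start` callable and maps the "already started" failure to a hypothetical `RebalanceAlreadyStarted` exception; neither name exists in vdsm or the engine.

```python
import time


class RebalanceAlreadyStarted(Exception):
    """Hypothetical error: the previous rebalance has not fully stopped yet."""


def start_rebalance_with_retry(start, attempts=5, delay=1.0, sleep=time.sleep):
    """Call start() until it succeeds or the attempts are exhausted.

    start: caller-supplied callable that starts rebalance and raises
           RebalanceAlreadyStarted while the earlier stop is still completing.
    attempts: maximum number of calls to start().
    delay: seconds to wait between attempts.
    sleep: injectable sleep function, useful for testing.
    """
    for attempt in range(1, attempts + 1):
        try:
            return start()
        except RebalanceAlreadyStarted:
            if attempt == attempts:
                # Give up and surface the last error to the caller.
                raise
            sleep(delay)
```

This does not change the behaviour tracked in this bug; it only illustrates why a start issued immediately after a stop can fail.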
Verified in cb10. The following happens when rebalance is stopped from the gluster CLI.
1) In the volume activities column the icon disappears; hovering the mouse over the icon's position shows the text "unknown", and the drop-down with status is enabled. The icon should get updated to "?". Logged a bug for the same: https://bugzilla.redhat.com/show_bug.cgi?id=1035601
2) Tasks pane does not get updated with the correct status. Logged a bug for that: https://bugzilla.redhat.com/show_bug.cgi?id=1040303
3) If the status dialog is opened, before or after the stop command is issued, the activities column gets updated with the rebalance stopped icon. Logged a bug for this: https://bugzilla.redhat.com/show_bug.cgi?id=1040310
Will mark this bug verified only after the following bugs are fixed. https://bugzilla.redhat.com/show_bug.cgi?id=1040303 https://bugzilla.redhat.com/show_bug.cgi?id=1040303
Will mark this bug verified only after the following bugs are fixed. https://bugzilla.redhat.com/show_bug.cgi?id=1040303 https://bugzilla.redhat.com/show_bug.cgi?id=1035601
Verified and works fine with cb12 build rhsc-2.1.2-0.28.beta.el6_5.noarch.

When rebalance is stopped from the CLI, a '?' icon appears in the volume activities column, and the tasks pane gets updated with a task carrying an 'x' mark. Expanding the task pane gives the message "Rebalancing gluster volume <volName> in cluster <clusterName> (UNKNOWN)". An event message gets displayed saying "Could not find information for rebalance on volume <volName> of Cluster <clusterName> from CLI. Marking it as unknown."

Attaching the screenshot for the same.
Created attachment 838935 [details] Attaching the screenshot
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-0208.html