Bug 1039982 - rest: start rebalance while migration in progress prevents volume from stopping
Summary: rest: start rebalance while migration in progress prevents volume from stopping
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: rhsc-sdk
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: high
Target Milestone: ---
Target Release: RHGS 2.1.2
Assignee: Shubhendu Tripathi
QA Contact: Dustin Tsang
URL:
Whiteboard:
Depends On:
Blocks: 1035040
Reported: 2013-12-10 13:04 UTC by Dustin Tsang
Modified: 2015-07-13 04:39 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
Cause: If a large file is being migrated, stopping the migration does not take effect until the migration of that file completes. Consequence: The volume cannot be stopped immediately after stopping rebalance while a large-file migration is in progress. Workaround (if any): NA Result: NA
Clone Of:
Environment:
Last Closed: 2013-12-13 11:10:34 UTC
Target Upstream Version:


Attachments
test automation log (1.23 MB, text/plain)
2013-12-10 13:04 UTC, Dustin Tsang

Description Dustin Tsang 2013-12-10 13:04:38 UTC
Created attachment 834735
test automation log

Description of problem:

This is an issue with the REST API: the volume cannot be stopped after starting a rebalance while a migration is in progress.

test automation log attached

Version-Release number of selected component (if applicable):
rhsc-cb10

How reproducible:
intermittent 

Steps to Reproduce:
1. create a distributed volume with data on each brick
2. start migration on a single brick
3. start rebalance on the volume
=> rebalance fails as expected
4. call stop migration 
=> stop migration succeeds
5. stop volume via rest api

Actual results:
HTTP 400 received

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<action>
    <status>
        <state>failed</state>
    </status>
    <fault>
        <reason>Operation Failed</reason>
        <detail>[volume stop failed
error: staging failed on 10.14.16.158. error: rebalance session is in progress for the volume 'rebalwhilemigration'
return code: -1]</detail>
    </fault>
</action>


Running 'gluster vol status' on a Gluster host before stopping the volume shows no tasks in progress:
Status of volume: rebalwhilemigration
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick latest-a:/tmp/201312100720580051942637805		49152	Y	11012
Brick latest-b:/tmp/20131210072058976850249563		49152	Y	7315
NFS Server on localhost					2049	Y	11024
NFS Server on latest-b					2049	Y	11679
 
Task Status of Volume rebalwhilemigration
------------------------------------------------------------------------------
There are no active volume tasks
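A client that drives the REST API directly can detect this particular failure from the response body. The following is a minimal Python sketch using only the standard library; it assumes the `<action>`/`<status>`/`<fault>` layout of the HTTP 400 body above, and `parse_action_response` and `blocked_by_rebalance` are hypothetical helper names, not part of the RHSC SDK.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional, Union


class ActionResult(NamedTuple):
    state: Optional[str]   # e.g. "failed"
    reason: Optional[str]  # e.g. "Operation Failed"
    detail: Optional[str]  # raw Gluster error text, if any


def parse_action_response(body: Union[str, bytes]) -> ActionResult:
    """Extract state, reason and detail from an <action> response body."""
    if isinstance(body, str):
        # Encode so the input matches the declaration's encoding="UTF-8".
        body = body.encode("utf-8")
    root = ET.fromstring(body)
    state = root.findtext("status/state")
    fault = root.find("fault")
    if fault is None:
        return ActionResult(state, None, None)
    return ActionResult(state, fault.findtext("reason"), fault.findtext("detail"))


def blocked_by_rebalance(result: ActionResult) -> bool:
    """True if Gluster refused the stop because a rebalance session is running.

    Gluster also reports remove-brick data migration as a rebalance session,
    so this matches the migration scenario in this bug as well.
    """
    return (
        result.state == "failed"
        and result.detail is not None
        and "rebalance session is in progress" in result.detail
    )
```

A caller could use `blocked_by_rebalance` to decide whether to wait and retry the stop rather than treat the HTTP 400 as a hard error.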



Expected results:

The volume stop succeeds.


Additional info:

Comment 2 Dusmant 2013-12-10 17:22:42 UTC
It might just be a timing issue: some file is still being migrated, even though Gluster says the migration has stopped. We might need to investigate this.

If the file in migration is completed and then this is tried, this might work out fine. If that's confirmed by Dustin, then we will mark it CLOSED.

Comment 3 Sahina Bose 2013-12-11 12:19:28 UTC
Dustin,

please check if rebalance process is running on the nodes when you get this error.

thanks!

Comment 4 Dustin Tsang 2013-12-11 13:40:01 UTC
Hi Sahina,

Running `gluster vol status` before stopping the volume shows that there are no tasks in progress.

Comment 5 Dustin Tsang 2013-12-11 16:56:42 UTC
Dusmant, 

I don't think this bug should be closed, even if putting a long delay between steps 4 and 5 avoids the error. There appear to be a few issues in this bug:

* One issue is that the error message reports that rebalance is in progress when it should say that migration is in progress.

* Stop migration does not stop the migration immediately, even though stop migration is not an asynchronous task.

Comment 6 Sahina Bose 2013-12-12 05:31:26 UTC
Dustin,

Regarding your point 2 - 
`gluster vol status` does not show stopped tasks. So even though a task is reported as stopped, it actually stops only once the migration of the file currently in progress completes. For now, the only way to know whether rebalance has finished is to grep for the rebalance process on the node where it was stopped.

Regarding point 1 - 
Gluster treats data migration during rebalance and during remove-brick alike as rebalancing data, and the error you see is the one reported by Gluster itself.
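The check Sahina describes for point 2 (look for the rebalance process on the node before stopping the volume) can be sketched in Python. This is illustrative only: it assumes the rebalance process's command line contains `rebalance/<volume>` (verify this on your Gluster version), it must run on the node where the rebalance or migration was stopped, and `stop_volume` is a hypothetical placeholder for whatever REST or SDK call actually stops the volume.

```python
import re
import subprocess
import time
from typing import Callable, Iterable, List


def local_cmdlines() -> List[str]:
    """Full command lines of all processes on this node (procps `ps`)."""
    out = subprocess.run(["ps", "-eo", "args"], capture_output=True,
                         text=True, check=True).stdout
    return out.splitlines()[1:]  # drop the "COMMAND" header line


def rebalance_running(cmdlines: Iterable[str], volume: str) -> bool:
    """True if any command line references rebalance/<volume>.

    ASSUMPTION: the rebalance process carries "rebalance/<volume>" in its
    arguments. The lookahead keeps "rebal" from matching "rebalwhilemigration".
    """
    pattern = re.compile(r"rebalance/" + re.escape(volume) + r"(?![\w-])")
    return any(pattern.search(line) for line in cmdlines)


def wait_then_stop(is_busy: Callable[[], bool],
                   stop_volume: Callable[[], None],
                   timeout: float = 600.0,
                   interval: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll is_busy() until it returns False, then call stop_volume().

    Returns False, without stopping the volume, if the timeout expires first.
    """
    deadline = clock() + timeout
    while is_busy():
        if clock() >= deadline:
            return False
        sleep(interval)
    stop_volume()
    return True
```

For example, `wait_then_stop(lambda: rebalance_running(local_cmdlines(), "rebalwhilemigration"), my_stop_call)`, where `my_stop_call` is your own function. Injecting `sleep` and `clock` keeps the polling loop testable without real delays.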

Comment 7 Dusmant 2013-12-13 11:10:34 UTC
Dustin,
    This is the expected behaviour from Gluster and as you and Sahina discussed, it's not a bug as such. Hence moving it to CLOSED state.

We are going to document the behaviour of Gluster and its impact on RHSC through bug 1022955.

Comment 8 Dustin Tsang 2014-01-23 13:17:53 UTC
I believe that both point 2 and point 1 need to be documented. Point 2 especially needs to be documented, because I don't believe users will expect the status of steps and jobs in RHSC not to reflect Gluster's real status.

Comment 9 Dustin Tsang 2014-01-23 13:22:06 UTC
^ I was hoping the documentation would be specific to the RHSC SDK.

