Bug 1038452 - When brick process is killed while remove-brick is in progress, the status of the remove-brick task is shown as stopped
Summary: When brick process is killed while remove-brick is in progress, the status of the remove-brick task is shown as stopped
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Kaushal
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1028995
 
Reported: 2013-12-05 07:08 UTC by Kaushal
Modified: 2014-11-11 08:25 UTC
CC List: 11 users

Fixed In Version: glusterfs-3.6.0beta1
Doc Type: Bug Fix
Doc Text:
Clone Of: 1028995
Environment:
Last Closed: 2014-11-11 08:25:19 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Kaushal 2013-12-05 07:08:52 UTC
+++ This bug was initially created as a clone of Bug #1028995 +++

Description of problem:
-----------------------
After remove-brick was started on a volume, the bricks were brought down. The status of the remove-brick operation is now shown as stopped.

[root@rhs ~]# gluster v status
Status of volume: test_dis
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.147:/rhs/brick1/b1                       N/A     N       N/A
Brick 10.70.37.147:/rhs/brick1/b2                       N/A     N       N/A
NFS Server on localhost                                 2049    Y       7094
 
Task Status of Volume test_dis
------------------------------------------------------------------------------
Task                 : Remove brick        
ID                   : b3b23f85-f5d5-4e48-a673-4c93a02177ad
Removed bricks:     
10.70.37.147:/rhs/brick1/b1
Status               : stopped             


IMO, brick processes going down while remove-brick is in progress should result in the remove-brick operation failing, not 'stopping'. The status should be shown as failed instead of stopped.

Version-Release number of selected component (if applicable):
glusterfs 3.4.0.35.1u2rhs

How reproducible:
Always

Steps to Reproduce:
1. Create a distribute volume with 2 bricks, start it, mount it and create data at the mount point.
2. Start remove-brick operation on one of the bricks.
3. Kill glusterfsd processes.
4. Check volume status (a shell sketch of these steps follows below).
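
A rough shell sketch of these steps, reusing the volume name and brick paths from the status output above; the mount point and the data-creation command are illustrative assumptions, not taken from the report:

# 1. Create a 2-brick distribute volume, start it, mount it and create some data
gluster volume create test_dis 10.70.37.147:/rhs/brick1/b1 10.70.37.147:/rhs/brick1/b2
gluster volume start test_dis
mount -t glusterfs 10.70.37.147:/test_dis /mnt/test_dis    # assumed mount point
cp -a /etc /mnt/test_dis/                                  # any data will do

# 2. Start a remove-brick operation on one of the bricks
gluster volume remove-brick test_dis 10.70.37.147:/rhs/brick1/b1 start

# 3. Kill the glusterfsd (brick) processes
pkill glusterfsd

# 4. Check volume status
gluster volume status test_dis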

Actual results:
The status of the remove-brick operation is shown as stopped.

Expected results:
The status of the remove-brick operation should be shown as failed, not stopped.

Additional info:
sosreports attached.

--- Additional comment from Shruti Sampat on 2013-11-11 18:23:31 IST ---

Find sosreport at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1028995/

--- Additional comment from Dusmant on 2013-11-11 18:51:52 IST ---

When brick processes are killed while remove-brick is in progress, the status of the remove-brick operation is shown as stopped instead of failed. glusterfs now expects either a commit or a stop of this operation before another task can be started.

This causes the RHSC engine to display the task as stopped, but neither Commit nor Retain is enabled in the UI. So a user on the Console can neither start a new task nor commit/stop the previous one.

This is causing a problem for RHSC.
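
For context, clearing such a pending remove-brick task is done with the standard remove-brick subcommands; a sketch reusing the volume and brick names from the report above:

# abandon the pending remove-brick task
gluster volume remove-brick test_dis 10.70.37.147:/rhs/brick1/b1 stop
# or finalize it
gluster volume remove-brick test_dis 10.70.37.147:/rhs/brick1/b1 commit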

--- Additional comment from RamaKasturi on 2013-11-21 13:55:20 IST ---

The above happens with rebalance as well.

When brick processes are killed, the rebalance status is shown as stopped instead of failed.
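
The rebalance variant of the scenario would look roughly like this (volume name reused from above, output omitted):

gluster volume rebalance test_dis start
pkill glusterfsd                          # kill the brick processes mid-rebalance
gluster volume rebalance test_dis status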

Comment 1 Anand Avati 2013-12-05 07:17:37 UTC
REVIEW: http://review.gluster.org/6435 (dht: Set status to FAILED when rebalance stops due to brick going down) posted (#1) for review on master by Kaushal M (kaushal)

Comment 2 Anand Avati 2013-12-11 06:49:44 UTC
COMMIT: http://review.gluster.org/6435 committed in master by Vijay Bellur (vbellur) 
------
commit 6a163b22144a689cd89a6a605715959e654ea015
Author: Kaushal M <kaushal>
Date:   Thu Dec 5 11:16:55 2013 +0530

    dht: Set status to FAILED when rebalance stops due to brick going down
    
    Change-Id: I98da41342127b1690d887a5bc025e4c9dd504894
    BUG: 1038452
    Signed-off-by: Kaushal M <kaushal>
    Reviewed-on: http://review.gluster.org/6435
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Shishir Gowda <gowda.shishir>
    Reviewed-by: Vijay Bellur <vbellur>
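
A rough way to verify the behaviour described by this commit (a sketch reusing the volume and brick from the report, not taken from the patch or its tests): after the brick processes are killed mid-migration, the task status is expected to read failed rather than stopped.

gluster volume remove-brick test_dis 10.70.37.147:/rhs/brick1/b1 start
pkill glusterfsd
gluster volume remove-brick test_dis 10.70.37.147:/rhs/brick1/b1 status   # expected: failed, not stopped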

Comment 3 Anand Avati 2013-12-23 08:59:30 UTC
REVIEW: http://review.gluster.org/6563 (dht: Set status to FAILED when rebalance stops due to brick going down) posted (#1) for review on release-3.5 by Krishnan Parthasarathi (kparthas)

Comment 4 Anand Avati 2013-12-23 14:57:01 UTC
COMMIT: http://review.gluster.org/6563 committed in release-3.5 by Vijay Bellur (vbellur) 
------
commit 48fb488818f82d5889a84ca36f489674c8557354
Author: Krishnan Parthasarathi <kparthas>
Date:   Mon Dec 23 14:07:47 2013 +0530

    dht: Set status to FAILED when rebalance stops due to brick going down
    
            Backport of http://review.gluster.org/6435
    
    Change-Id: I98da41342127b1690d887a5bc025e4c9dd504894
    BUG: 1038452
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/6563
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 5 Niels de Vos 2014-09-22 12:33:22 UTC
A beta release for GlusterFS 3.6.0 has been released [1]. Please verify if the release solves this bug report for you. In case the glusterfs-3.6.0beta1 release does not have a resolution for this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure (possibly an "updates-testing" repository) for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018836.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/

Comment 6 Niels de Vos 2014-11-11 08:25:19 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users

