Bug 1027094 - The outputs of "gluster volume status all tasks --xml" and "gluster volume remove-brick <vol> <brick> status --xml" are not in agreement
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Assignee: Kaushal
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1015659 1020325 1020331 1022511
 
Reported: 2013-11-06 07:04 UTC by Kaushal
Modified: 2014-11-11 08:24 UTC
CC List: 6 users

Fixed In Version: glusterfs-3.6.0beta1
Doc Type: Bug Fix
Doc Text:
Clone Of: 1020331
Environment:
Last Closed: 2014-11-11 08:24:22 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Kaushal 2013-11-06 07:04:02 UTC
+++ This bug was initially created as a clone of Bug #1020331 +++

Description of problem:
VDSM has a verb to get the list of tasks on a volume. Internally it:

1. Uses the command "gluster volume remove-brick <vol> <brick> status --xml" to get the status of a remove-brick action on a brick.
2. Uses the command "gluster volume status all tasks --xml" to get the overall status of the tasks.

If there are two hosts in a cluster and the overall task status is queried while a remove-brick action is in progress, the output of "gluster volume status all tasks --xml" differs between the two hosts.
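
For illustration, the sketch below shows how a management client such as VDSM might read the per-task status from the XML output on each host. This is not VDSM code, and the XML element names used (task, id, type, statusStr) are assumptions about the gluster CLI output that may differ between releases.

# Minimal sketch (assumed element names, not VDSM code): fetch the cluster-wide
# task list and report each task's status as seen from the local host.
import subprocess
import xml.etree.ElementTree as ET

def task_statuses():
    out = subprocess.check_output(
        ["gluster", "volume", "status", "all", "tasks", "--xml"])
    root = ET.fromstring(out)
    statuses = {}
    for task in root.iter("task"):  # one <task> element per known task
        statuses[task.findtext("id")] = (task.findtext("type"),
                                         task.findtext("statusStr"))
    return statuses

if __name__ == "__main__":
    # Before the fix, running this on each host during a remove-brick could
    # print different statusStr values for the same task id.
    for task_id, (task_type, status) in task_statuses().items():
        print(task_id, task_type, status)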

Version-Release number of selected component (if applicable):


How reproducible:
Almost always

Steps to Reproduce:
1. Make sure two hosts are present in the peer group
2. Create a distributed volume with 2 bricks (brick directories from server-1 only)
3. Populate the volume with data
4. Start remove-brick for one of the bricks of the volume
5. Run the command "gluster volume status all tasks --xml" on each host individually

Actual results:
The status values returned by the two hosts differ

Expected results:
Both hosts should return the same status value

--- Additional comment from Ramesh N on 2013-10-21 17:09:15 IST ---

The same scenario applies to the volume rebalance task.

Comment 1 Anand Avati 2013-11-06 07:20:53 UTC
REVIEW: http://review.gluster.org/6230 (glusterd: Aggregate tasks status in 'volume status [tasks]') posted (#1) for review on master by Kaushal M (kaushal)

Comment 2 Anand Avati 2013-11-06 12:14:19 UTC
REVIEW: http://review.gluster.org/6230 (glusterd: Aggregate tasks status in 'volume status [tasks]') posted (#2) for review on master by Kaushal M (kaushal)

Comment 3 Anand Avati 2013-11-07 08:37:21 UTC
REVIEW: http://review.gluster.org/6230 (glusterd: Aggregate tasks status in 'volume status [tasks]') posted (#3) for review on master by Kaushal M (kaushal)

Comment 4 Anand Avati 2013-11-07 12:12:35 UTC
REVIEW: http://review.gluster.org/6230 (glusterd: Aggregate tasks status in 'volume status [tasks]') posted (#4) for review on master by Kaushal M (kaushal)

Comment 5 Anand Avati 2013-11-08 06:04:36 UTC
REVIEW: http://review.gluster.org/6230 (glusterd: Aggregate tasks status in 'volume status [tasks]') posted (#5) for review on master by Kaushal M (kaushal)

Comment 6 Anand Avati 2013-12-04 11:03:32 UTC
REVIEW: http://review.gluster.org/6230 (glusterd: Aggregate tasks status in 'volume status [tasks]') posted (#7) for review on master by Kaushal M (kaushal)

Comment 7 Anand Avati 2013-12-04 21:41:03 UTC
COMMIT: http://review.gluster.org/6230 committed in master by Anand Avati (avati) 
------
commit b6c835282de500dff69e68bc4aebd3700c7388d0
Author: Kaushal M <kaushal>
Date:   Wed Oct 30 18:25:39 2013 +0530

    glusterd: Aggregate tasks status in 'volume status [tasks]'
    
    Previously, glusterd used to just send back the local status of a task
    in a 'volume status [tasks]' command. As the rebalance operation is
    distributed and asynchronous, this meant that different peers could give
    different status values for a rebalance or remove-brick task.
    
    With this patch, all the peers will send back the tasks status as a part
    of the 'volume status' commit op, and the origin peer will aggregate
    these to arrive at a final status for the task.
    
    The aggregation is only done for rebalance or remove-brick tasks. The
    replace-brick task will have the same status on all the peers (see
    comment in glusterd_volume_status_aggregate_tasks_status() for more
    information) and need not be aggregated.
    
    The rebalance process has 5 states,
     NOT_STARTED - rebalance process has not been started on this node
     STARTED - rebalance process has been started and is still running
     STOPPED - rebalance process was stopped by a 'rebalance/remove-brick
               stop' command
     COMPLETED - rebalance process completed successfully
     FAILED - rebalance process failed to complete successfully
    The aggregation is done using the following precedence,
     STARTED > FAILED > STOPPED > COMPLETED > NOT_STARTED
    
    The new changes make the 'volume status tasks' command a distributed
    command as we need to get the task status from all peers.
    
    The following tests were performed,
    - Start a remove-brick task and do a status command on a peer which
      doesn't have the brick being removed. The remove-brick status was
      given correctly as 'in progress' and 'completed', instead of 'not
      started'
    - Start a rebalance task, run the status command. The status moved to
      'completed' only after rebalance completed on all nodes.
    
    Also, change the CLI xml output code for rebalance status to use the
    same algorithm for status aggregation.
    
    Change-Id: Ifd4aff705aa51609a612d5a9194acc73e10a82c0
    BUG: 1027094
    Signed-off-by: Kaushal M <kaushal>
    Reviewed-on: http://review.gluster.org/6230
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
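
For clarity, here is a small Python sketch of the precedence-based aggregation described in the commit message above. glusterd implements this in C, so this is only an illustration of the rule, with the state names taken from the commit message.

# Illustrative sketch only: reduce the per-peer task states to a single status
# using the precedence STARTED > FAILED > STOPPED > COMPLETED > NOT_STARTED.
PRECEDENCE = ["STARTED", "FAILED", "STOPPED", "COMPLETED", "NOT_STARTED"]

def aggregate_task_status(peer_statuses):
    """Return the highest-precedence status reported by any peer."""
    if not peer_statuses:
        return "NOT_STARTED"
    return min(peer_statuses, key=PRECEDENCE.index)

# Example: one peer has finished its part of a rebalance while another is
# still migrating files, so the aggregate remains STARTED.
assert aggregate_task_status(["COMPLETED", "STARTED", "NOT_STARTED"]) == "STARTED"
assert aggregate_task_status(["COMPLETED", "COMPLETED"]) == "COMPLETED"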

Comment 8 Anand Avati 2013-12-23 08:59:23 UTC
REVIEW: http://review.gluster.org/6562 (glusterd: Aggregate tasks status in 'volume status [tasks]') posted (#1) for review on release-3.5 by Krishnan Parthasarathi (kparthas)

Comment 9 Anand Avati 2013-12-23 14:56:42 UTC
COMMIT: http://review.gluster.org/6562 committed in release-3.5 by Vijay Bellur (vbellur) 
------
commit 9d592246d6121aa38cd6fb6a875be4473d4979c8
Author: Krishnan Parthasarathi <kparthas>
Date:   Mon Dec 23 14:07:45 2013 +0530

    glusterd: Aggregate tasks status in 'volume status [tasks]'
    
            Backport of http://review.gluster.org/6230
    Previously, glusterd used to just send back the local status of a task
    in a 'volume status [tasks]' command. As the rebalance operation is
    distributed and asynchronous, this meant that different peers could give
    different status values for a rebalance or remove-brick task.
    
    With this patch, all the peers will send back the tasks status as a part
    of the 'volume status' commit op, and the origin peer will aggregate
    these to arrive at a final status for the task.
    
    The aggregation is only done for rebalance or remove-brick tasks. The
    replace-brick task will have the same status on all the peers (see
    comment in glusterd_volume_status_aggregate_tasks_status() for more
    information) and need not be aggregated.
    
    The rebalance process has 5 states,
     NOT_STARTED - rebalance process has not been started on this node
     STARTED - rebalance process has been started and is still running
     STOPPED - rebalance process was stopped by a 'rebalance/remove-brick
               stop' command
     COMPLETED - rebalance process completed successfully
     FAILED - rebalance process failed to complete successfully
    The aggregation is done using the following precedence,
     STARTED > FAILED > STOPPED > COMPLETED > NOT_STARTED
    
    The new changes make the 'volume status tasks' command a distributed
    command as we need to get the task status from all peers.
    
    The following tests were performed,
    - Start a remove-brick task and do a status command on a peer which
      doesn't have the brick being removed. The remove-brick status was
      given correctly as 'in progress' and 'completed', instead of 'not
      started'
    - Start a rebalance task, run the status command. The status moved to
      'completed' only after rebalance completed on all nodes.
    
    Also, change the CLI xml output code for rebalance status to use the
    same algorithm for status aggregation.
    
    Change-Id: Ifd4aff705aa51609a612d5a9194acc73e10a82c0
    BUG: 1027094
    Signed-off-by: Krishnan Parthasarathi <kparthas>
     http://review.gluster.org/6230
    Reviewed-on: http://review.gluster.org/6562
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 10 Niels de Vos 2014-09-22 12:32:31 UTC
A beta release for GlusterFS 3.6.0 has been released [1]. Please verify whether this release solves this bug report for you. In case the glusterfs-3.6.0beta1 release does not have a resolution for this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure (possibly an "updates-testing" repository) for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018836.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/

Comment 11 Niels de Vos 2014-11-11 08:24:22 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users

