Bug 1040809

Summary: 'gluster volume status' command fails on a server after glusterd is brought down and back up, while remove-brick is in progress
Product: [Community] GlusterFS
Component: glusterd
Version: mainline
Hardware: Unspecified
OS: Unspecified
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Reporter: Kaushal <kaushal>
Assignee: Kaushal <kaushal>
QA Contact:
Docs Contact:
CC: dpati, dtsang, gluster-bugs, jbyers, knarra, mmahoney, pprakash, rwheeler, sasundar, sdharane, shaines, ssampat, vagarwal, vbellur
Target Milestone: ---
Target Release: ---
Whiteboard:
Fixed In Version: glusterfs-3.5.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1027699
Environment:
Last Closed: 2014-04-17 11:52:20 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1027699

Description Kaushal 2013-12-12 07:10:04 UTC
+++ This bug was initially created as a clone of Bug #1027699 +++

Description of problem:
-----------------------

In a single-node cluster, when remove-brick is in progress, glusterd is killed and then brought back up. Following this, the 'gluster volume status' command fails on the node:

[root@rhs ~]# gluster v status test_dis 
Commit failed on localhost. Please check the log file for more details.

The following errors are seen in the glusterd logs:

[2013-11-07 03:02:59.984190] I [glusterd-handler.c:3498:__glusterd_handle_status_volume] 0-management: Received status volume req for volume test_dis
[2013-11-07 03:02:59.984708] E [glusterd-op-sm.c:1973:_add_remove_bricks_to_dict] 0-management: Failed to get brick count
[2013-11-07 03:02:59.984737] E [glusterd-op-sm.c:2037:_add_task_to_dict] 0-management: Failed to add remove bricks to dict
[2013-11-07 03:02:59.984753] E [glusterd-op-sm.c:2122:glusterd_aggregate_task_status] 0-management: Failed to add task details to dict
[2013-11-07 03:02:59.984768] E [glusterd-syncop.c:993:gd_commit_op_phase] 0-management: Commit of operation 'Volume Status' failed on localhost    

Version-Release number of selected component (if applicable):
glusterfs 3.4.0.35.1u2rhs

How reproducible:
Always

Steps to Reproduce:
1. Create a distribute volume with two bricks, start it, fuse mount it and create some data on the mount point.
2. Start remove-brick of one of the bricks.
3. While remove-brick is in progress, kill glusterd and start it again.
4. Check the volume status:
# gluster volume status
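
The steps above can be consolidated into a shell session. This is an illustrative sketch: the volume name matches the report, but the server name, brick paths, and mount point are assumptions.

```shell
# Reproduction sketch (requires a running gluster setup; brick paths,
# server name, and mount point below are illustrative assumptions).
gluster volume create test_dis server1:/bricks/b1 server1:/bricks/b2
gluster volume start test_dis
mount -t glusterfs server1:/test_dis /mnt/test_dis
cp -r /etc /mnt/test_dis/        # create some data on the mount point

# Start remove-brick, then restart glusterd while it is in progress
gluster volume remove-brick test_dis server1:/bricks/b2 start
pkill glusterd && glusterd

# Before the fix, this fails with "Commit failed on localhost."
gluster volume status test_dis
```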

Actual results:
The command fails with the following message:

Commit failed on localhost. Please check the log file for more details.

Expected results:
The command should succeed and display the volume status.

Additional info:

--- Additional comment from Shruti Sampat on 2013-11-07 15:48:43 IST ---



--- Additional comment from Dusmant on 2013-11-07 15:50:13 IST ---

Because of this problem, RHSC does not update the status icon, and the task status does not get updated.

Comment 1 Anand Avati 2013-12-12 07:19:12 UTC
REVIEW: http://review.gluster.org/6492 (glusterd: Save/restore/sync rebalance dict) posted (#1) for review on master by Kaushal M (kaushal)

Comment 2 Anand Avati 2013-12-13 05:49:17 UTC
REVIEW: http://review.gluster.org/6492 (glusterd: Save/restore/sync rebalance dict) posted (#2) for review on master by Kaushal M (kaushal)

Comment 3 Anand Avati 2013-12-16 13:02:28 UTC
COMMIT: http://review.gluster.org/6492 committed in master by Vijay Bellur (vbellur) 
------
commit f502e28e8b416f80bd9506ac204948681610b305
Author: Kaushal M <kaushal>
Date:   Tue Dec 10 11:34:06 2013 +0530

    glusterd: Save/restore/sync rebalance dict
    
    A dictionary was added to store additional information of a rebalance
    process, like the bricks being removed in case of a rebalance started
    by remove-brick. This dictionary wasn't being stored/restored or synced
    during volume sync, leading to errors like a volume status command
    failing. These issues have been fixed in this patch. The rebalance dict
    is now stored/restored and also exported/imported during volume sync.
    
    Also, this makes sure that the rebalance dict is only created on
    remove-brick start. This adds a bricks decommissioned status to the
    information imported/exported during volume sync.
    
    Change-Id: I56fed23dc2de80a96648055fe705e9c3ffd55227
    BUG: 1040809
    Signed-off-by: Kaushal M <kaushal>
    Reviewed-on: http://review.gluster.org/6492
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Reviewed-by: Vijay Bellur <vbellur>
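
The failure mode the commit describes can be illustrated with a small Python analogy (glusterd itself is written in C; the names `save_volinfo`, `load_volinfo`, and `status_task_details` below are hypothetical, not glusterd's actual API): if the rebalance dict is kept only in memory and never persisted, a glusterd restart reloads volume info without it, and the status command's lookup of the brick count then fails exactly as in the logged errors.

```python
import json
import os
import tempfile

def save_volinfo(volinfo, path, persist_rebal_dict):
    """Persist volume info; before the fix, the rebalance dict was skipped."""
    data = dict(volinfo)
    if not persist_rebal_dict:
        data.pop("rebal_dict", None)  # the bug: dict never stored on disk
    with open(path, "w") as f:
        json.dump(data, f)

def load_volinfo(path):
    """Simulates glusterd restoring volume info after a restart."""
    with open(path) as f:
        return json.load(f)

def status_task_details(volinfo):
    """Mimics building status task details: needs the removed-brick count."""
    rebal = volinfo.get("rebal_dict")
    if rebal is None or "count" not in rebal:
        raise RuntimeError("Failed to get brick count")  # error from the logs
    return rebal["count"]

volinfo = {"name": "test_dis",
           "rebal_dict": {"count": 1, "brick1": "server1:/bricks/b2"}}
path = os.path.join(tempfile.mkdtemp(), "vol")

# Before the fix: a restart (reload from disk) loses the dict, status fails
save_volinfo(volinfo, path, persist_rebal_dict=False)
try:
    status_task_details(load_volinfo(path))
except RuntimeError as e:
    print(e)  # -> Failed to get brick count

# After the fix: the dict is stored/restored, so status succeeds
save_volinfo(volinfo, path, persist_rebal_dict=True)
print(status_task_details(load_volinfo(path)))  # -> 1
```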

Comment 4 Anand Avati 2013-12-17 06:00:25 UTC
REVIEW: http://review.gluster.org/6524 (glusterd: Save/restore/sync rebalance dict) posted (#1) for review on release-3.5 by Kaushal M (kaushal)

Comment 5 Anand Avati 2013-12-23 08:59:47 UTC
REVIEW: http://review.gluster.org/6565 (glusterd: Save/restore/sync rebalance dict) posted (#1) for review on release-3.5 by Krishnan Parthasarathi (kparthas)

Comment 6 Anand Avati 2013-12-23 14:58:11 UTC
COMMIT: http://review.gluster.org/6565 committed in release-3.5 by Vijay Bellur (vbellur) 
------
commit 3ea6954c120c968ec3b16916cf4fc304b9b4517a
Author: Krishnan Parthasarathi <kparthas>
Date:   Mon Dec 23 14:07:53 2013 +0530

    glusterd: Save/restore/sync rebalance dict
    
            Backport of http://review.gluster.org/6492
    
    A dictionary was added to store additional information of a rebalance
    process, like the bricks being removed in case of a rebalance started
    by remove-brick. This dictionary wasn't being stored/restored or synced
    during volume sync, leading to errors like a volume status command
    failing. These issues have been fixed in this patch. The rebalance dict
    is now stored/restored and also exported/imported during volume sync.
    
    Also, this makes sure that the rebalance dict is only created on
    remove-brick start. This adds a bricks decommissioned status to the
    information imported/exported during volume sync.
    
    Change-Id: I56fed23dc2de80a96648055fe705e9c3ffd55227
    BUG: 1040809
    Signed-off-by: Kaushal M <kaushal>
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/6565
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 7 Anand Avati 2014-01-03 11:11:12 UTC
COMMIT: http://review.gluster.org/6524 committed in release-3.5 by Vijay Bellur (vbellur) 
------
commit 52130cb989d32aa302912d2d75e11f4041db2e72
Author: Kaushal M <kaushal>
Date:   Tue Dec 10 11:34:06 2013 +0530

    glusterd: Save/restore/sync rebalance dict
    
    A dictionary was added to store additional information of a rebalance
    process, like the bricks being removed in case of a rebalance started
    by remove-brick. This dictionary wasn't being stored/restored or synced
    during volume sync, leading to errors like a volume status command
    failing. These issues have been fixed in this patch. The rebalance dict
    is now stored/restored and also exported/imported during volume sync.
    
    Also, this makes sure that the rebalance dict is only created on
    remove-brick start. This adds a bricks decommissioned status to the
    information imported/exported during volume sync.
    
    BUG: 1040809
    Change-Id: I46cee3e4e34a1f3266a20aac43368854594f01bc
    Signed-off-by: Kaushal M <kaushal>
    Backport-of: http://review.gluster.org/6492
    Reviewed-on: http://review.gluster.org/6524
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 8 Niels de Vos 2014-04-17 11:52:20 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.5.0, please reopen this bug report.

glusterfs-3.5.0 has been announced on the Gluster Developers mailing list [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user