Bug 1040809 - 'gluster volume status' command fails on a server after glusterd is brought down and back up, while remove-brick is in progress
Summary: 'gluster volume status' command fails on a server after glusterd is brought down and back up, while remove-brick is in progress
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Kaushal
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1027699
 
Reported: 2013-12-12 07:10 UTC by Kaushal
Modified: 2014-04-17 11:52 UTC
CC List: 14 users

Fixed In Version: glusterfs-3.5.0
Doc Type: Bug Fix
Doc Text:
Clone Of: 1027699
Environment:
Last Closed: 2014-04-17 11:52:20 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Kaushal 2013-12-12 07:10:04 UTC
+++ This bug was initially created as a clone of Bug #1027699 +++

Description of problem:
-----------------------

In a single-node cluster, while a remove-brick operation is in progress, glusterd is killed and then brought back up. Following this, the 'gluster volume status' command fails on the node - 

[root@rhs ~]# gluster v status test_dis 
Commit failed on localhost. Please check the log file for more details.

The following errors are seen in the glusterd logs - 

[2013-11-07 03:02:59.984190] I [glusterd-handler.c:3498:__glusterd_handle_status_volume] 0-management: Received status volume req for volume test_dis
[2013-11-07 03:02:59.984708] E [glusterd-op-sm.c:1973:_add_remove_bricks_to_dict] 0-management: Failed to get brick count
[2013-11-07 03:02:59.984737] E [glusterd-op-sm.c:2037:_add_task_to_dict] 0-management: Failed to add remove bricks to dict
[2013-11-07 03:02:59.984753] E [glusterd-op-sm.c:2122:glusterd_aggregate_task_status] 0-management: Failed to add task details to dict
[2013-11-07 03:02:59.984768] E [glusterd-syncop.c:993:gd_commit_op_phase] 0-management: Commit of operation 'Volume Status' failed on localhost    
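
To confirm that a failing status command is hitting this same code path, these commit-phase errors can be pulled out of the glusterd log. A minimal sketch; the log path below is the usual default for this release line and may differ on your installation:

# Grep the failed commit phase of the status command's task aggregation
grep -E '_add_remove_bricks_to_dict|_add_task_to_dict|gd_commit_op_phase' \
    /var/log/glusterfs/etc-glusterfs-glusterd.vol.log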

Version-Release number of selected component (if applicable):
glusterfs 3.4.0.35.1u2rhs

How reproducible:
Always

Steps to Reproduce:
1. Create a distribute volume with two bricks, start it, fuse mount it and create some data on the mount point.
2. Start remove-brick of one of the bricks.
3. While remove-brick is in progress, kill glusterd and start it again.
4. Check volume status (see the shell sketch after these steps) - 
# gluster volume status
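
A rough end-to-end sketch of the steps above as shell commands. The volume name matches the report; the brick paths, mount point, data set and the way glusterd is restarted are illustrative assumptions:

# 1. Two-brick distribute volume, started, fuse-mounted, with some data on it
#    (brick paths and mount point are assumptions)
gluster volume create test_dis $(hostname):/bricks/b1 $(hostname):/bricks/b2
gluster volume start test_dis
mkdir -p /mnt/test_dis
mount -t glusterfs $(hostname):/test_dis /mnt/test_dis
for i in $(seq 1 50); do dd if=/dev/urandom of=/mnt/test_dis/file_$i bs=1M count=10; done

# 2. Start removing one of the bricks (this starts a rebalance internally)
gluster volume remove-brick test_dis $(hostname):/bricks/b2 start

# 3. While the remove-brick is still in progress, kill glusterd and start it again
pkill glusterd
glusterd

# 4. Check volume status; with the bug present, the commit fails on localhost
gluster volume status test_dis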

Actual results:
The command fails with the following message - 

Commit failed on localhost. Please check the log file for more details.

Expected results:
The command should not fail.

Additional info:

--- Additional comment from Shruti Sampat on 2013-11-07 15:48:43 IST ---



--- Additional comment from Dusmant on 2013-11-07 15:50:13 IST ---

Because of this problem, RHSC does not update the icon and the task status does not get updated.

Comment 1 Anand Avati 2013-12-12 07:19:12 UTC
REVIEW: http://review.gluster.org/6492 (glusterd: Save/restore/sync rebalance dict) posted (#1) for review on master by Kaushal M (kaushal)

Comment 2 Anand Avati 2013-12-13 05:49:17 UTC
REVIEW: http://review.gluster.org/6492 (glusterd: Save/restore/sync rebalance dict) posted (#2) for review on master by Kaushal M (kaushal)

Comment 3 Anand Avati 2013-12-16 13:02:28 UTC
COMMIT: http://review.gluster.org/6492 committed in master by Vijay Bellur (vbellur) 
------
commit f502e28e8b416f80bd9506ac204948681610b305
Author: Kaushal M <kaushal>
Date:   Tue Dec 10 11:34:06 2013 +0530

    glusterd: Save/restore/sync rebalance dict
    
    A dictionary was added to store additional information of a rebalance
    process, like the bricks being removed in case of a rebalance started
    by remove-brick. This dictionary wasn't being stored/restored or synced
    during volume sync, leading to errors like a volume status command
    failing. These issues have been fixed in this patch. The rebalance dict
    is now stored/restored and also exported/imported during volume sync.
    
    Also, this makes sure that the rebalance dict is only created on
    remove-brick start. This adds a bricks decommissioned status to the
    information imported/exported during volume sync.
    
    Change-Id: I56fed23dc2de80a96648055fe705e9c3ffd55227
    BUG: 1040809
    Signed-off-by: Kaushal M <kaushal>
    Reviewed-on: http://review.gluster.org/6492
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Reviewed-by: Vijay Bellur <vbellur>

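As a note on verifying the fix: on a build carrying this patch, the same sequence should no longer break the commit phase. A hedged sketch, reusing the illustrative names from the reproduction steps in the description:

# Start a remove-brick and restart glusterd while it is in progress
gluster volume remove-brick test_dis $(hostname):/bricks/b2 start
pkill glusterd
glusterd

# With the rebalance dict now saved and restored, both commands should succeed,
# and the remove-brick task should still be listed in the volume's task status
gluster volume status test_dis
gluster volume remove-brick test_dis $(hostname):/bricks/b2 status
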
Comment 4 Anand Avati 2013-12-17 06:00:25 UTC
REVIEW: http://review.gluster.org/6524 (glusterd: Save/restore/sync rebalance dict) posted (#1) for review on release-3.5 by Kaushal M (kaushal)

Comment 5 Anand Avati 2013-12-23 08:59:47 UTC
REVIEW: http://review.gluster.org/6565 (glusterd: Save/restore/sync rebalance dict) posted (#1) for review on release-3.5 by Krishnan Parthasarathi (kparthas)

Comment 6 Anand Avati 2013-12-23 14:58:11 UTC
COMMIT: http://review.gluster.org/6565 committed in release-3.5 by Vijay Bellur (vbellur) 
------
commit 3ea6954c120c968ec3b16916cf4fc304b9b4517a
Author: Krishnan Parthasarathi <kparthas>
Date:   Mon Dec 23 14:07:53 2013 +0530

    glusterd: Save/restore/sync rebalance dict
    
            Backport of http://review.gluster.org/6492
    
    A dictionary was added to store additional information of a rebalance
    process, like the bricks being removed in case of a rebalance started
    by remove-brick. This dictionary wasn't being stored/restored or synced
    during volume sync, leading to errors like a volume status command
    failing. These issues have been fixed in this patch. The rebalance dict
    is now stored/restored and also exported/imported during volume sync.
    
    Also, this makes sure that the rebalance dict is only created on
    remove-brick start. This adds a bricks decommissioned status to the
    information imported/exported during volume sync.
    
    Change-Id: I56fed23dc2de80a96648055fe705e9c3ffd55227
    BUG: 1040809
    Signed-off-by: Kaushal M <kaushal>
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/6565
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 7 Anand Avati 2014-01-03 11:11:12 UTC
COMMIT: http://review.gluster.org/6524 committed in release-3.5 by Vijay Bellur (vbellur) 
------
commit 52130cb989d32aa302912d2d75e11f4041db2e72
Author: Kaushal M <kaushal>
Date:   Tue Dec 10 11:34:06 2013 +0530

    glusterd: Save/restore/sync rebalance dict
    
    A dictionary was added to store additional information of a rebalance
    process, like the bricks being removed in case of a rebalance started
    by remove-brick. This dictionary wasn't being stored/restored or synced
    during volume sync, leading to errors like a volume status command
    failing. These issues have been fixed in this patch. The rebalance dict
    is now stored/restored and also exported/imported during volume sync.
    
    Also, this makes sure that the rebalance dict is only created on
    remove-brick start. This adds a bricks decommissioned status to the
    information imported/exported during volume sync.
    
    BUG: 1040809
    Change-Id: I46cee3e4e34a1f3266a20aac43368854594f01bc
    Signed-off-by: Kaushal M <kaushal>
    Backport-of: http://review.gluster.org/6492
    Reviewed-on: http://review.gluster.org/6524
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 8 Niels de Vos 2014-04-17 11:52:20 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.5.0, please reopen this bug report.

glusterfs-3.5.0 has been announced on the Gluster Developers mailing list [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

