Bug 1027699

Summary: 'gluster volume status' command fails on a server after glusterd is brought down and back up, while remove-brick is in progress
Product: Red Hat Gluster Storage
Component: glusterfs
Version: 2.1
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Shruti Sampat <ssampat>
Assignee: Kaushal <kaushal>
QA Contact: Shruti Sampat <ssampat>
CC: dpati, dtsang, kaushal, knarra, mmahoney, pprakash, psriniva, sdharane, vagarwal, vbellur, vraman
Keywords: ZStream
Target Milestone: ---
Target Release: RHGS 2.1.2
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.4.0.50rhs-1
Doc Type: Bug Fix
Doc Text:
Previously, the gluster volume status command would fail on a node when glusterd was restarted while a remove-brick operation was in progress. With this fix, the command works as expected.
Story Points: ---
Clone Of:
: 1040809 (view as bug list) Environment:
Last Closed: 2014-02-25 08:01:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
Bug Depends On: 1040809
Attachments: sosreport (flags: none)

Description Shruti Sampat 2013-11-07 10:05:47 UTC
Description of problem:
-----------------------

In a single-node cluster, glusterd is killed and then brought back up while remove-brick is in progress. After the restart, the 'gluster volume status' command fails on the node:

[root@rhs ~]# gluster v status test_dis 
Commit failed on localhost. Please check the log file for more details.

The following errors are seen in the glusterd logs - 

[2013-11-07 03:02:59.984190] I [glusterd-handler.c:3498:__glusterd_handle_status_volume] 0-management: Received status volume req for volume test_dis
[2013-11-07 03:02:59.984708] E [glusterd-op-sm.c:1973:_add_remove_bricks_to_dict] 0-management: Failed to get brick count
[2013-11-07 03:02:59.984737] E [glusterd-op-sm.c:2037:_add_task_to_dict] 0-management: Failed to add remove bricks to dict
[2013-11-07 03:02:59.984753] E [glusterd-op-sm.c:2122:glusterd_aggregate_task_status] 0-management: Failed to add task details to dict
[2013-11-07 03:02:59.984768] E [glusterd-syncop.c:993:gd_commit_op_phase] 0-management: Commit of operation 'Volume Status' failed on localhost    
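
The error lines above form a call chain: the brick-count lookup fails first, and each caller then reports its own failure until the commit phase gives up. A minimal sketch of pulling that chain out of a glusterd log follows. The regular expression is inferred only from the lines quoted in this report, not from any glusterd logging specification, and the names `LogEntry`, `parse_line`, and `error_chain` are hypothetical helpers introduced for illustration.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed layout, inferred from the log lines quoted in this report:
# [timestamp] LEVEL [file:line:function] domain: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[\d-]+ [\d:.]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<domain>\S+): (?P<msg>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    source_file: str
    line_no: int
    function: str
    domain: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None if it does not match the assumed layout."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    return LogEntry(m["ts"], m["level"], m["file"], int(m["line"]),
                    m["func"], m["domain"], m["msg"])


def error_chain(lines: Iterable[str]) -> List[LogEntry]:
    """Return only the error-level ('E') entries, in log order."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and e.level == "E"]


# The log excerpt from this report, used as sample input.
SAMPLE = """\
[2013-11-07 03:02:59.984190] I [glusterd-handler.c:3498:__glusterd_handle_status_volume] 0-management: Received status volume req for volume test_dis
[2013-11-07 03:02:59.984708] E [glusterd-op-sm.c:1973:_add_remove_bricks_to_dict] 0-management: Failed to get brick count
[2013-11-07 03:02:59.984737] E [glusterd-op-sm.c:2037:_add_task_to_dict] 0-management: Failed to add remove bricks to dict
[2013-11-07 03:02:59.984753] E [glusterd-op-sm.c:2122:glusterd_aggregate_task_status] 0-management: Failed to add task details to dict
[2013-11-07 03:02:59.984768] E [glusterd-syncop.c:993:gd_commit_op_phase] 0-management: Commit of operation 'Volume Status' failed on localhost
""".splitlines()

for entry in error_chain(SAMPLE):
    print(f"{entry.function} ({entry.source_file}:{entry.line_no}): {entry.message}")
```

The first entry in the chain (`_add_remove_bricks_to_dict`) is the one closest to the root cause; the later entries only propagate the failure upward.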

Version-Release number of selected component (if applicable):
glusterfs 3.4.0.35.1u2rhs

How reproducible:
Always

Steps to Reproduce:
1. Create a distribute volume with two bricks, start it, fuse mount it and create some data on the mount point.
2. Start remove-brick of one of the bricks.
3. While remove-brick is in progress, kill glusterd and start it again.
4. Check volume status - 
# gluster volume status
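
The steps above can be sketched as a command sequence. This is only an illustration under stated assumptions: the volume name `test_dis` and host name `rhs` are taken from this report, while the brick paths, the mount point, the use of `dd` to create data, and the use of `pkill` and `service` to stop and start glusterd are hypothetical choices, not details given by the reporter. Depending on where the bricks live, `gluster volume create` may ask for additional confirmation. The script defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess
from typing import List


def build_repro_commands(volume: str = "test_dis",
                         host: str = "rhs",
                         brick_root: str = "/bricks",         # hypothetical path
                         mount_point: str = "/mnt/test_dis",  # hypothetical path
                         ) -> List[List[str]]:
    """Return the reproduction steps as a list of argv lists."""
    brick1 = f"{host}:{brick_root}/b1"
    brick2 = f"{host}:{brick_root}/b2"
    return [
        # Step 1: create and start a two-brick distribute volume, mount it, add data.
        ["gluster", "volume", "create", volume, brick1, brick2],
        ["gluster", "volume", "start", volume],
        ["mount", "-t", "glusterfs", f"{host}:/{volume}", mount_point],
        ["dd", "if=/dev/urandom", f"of={mount_point}/file1", "bs=1M", "count=100"],
        # Step 2: start removing one of the bricks.
        ["gluster", "volume", "remove-brick", volume, brick2, "start"],
        # Step 3: kill glusterd and start it again while remove-brick is running.
        ["pkill", "glusterd"],
        ["service", "glusterd", "start"],
        # Step 4: check volume status; this is the command that fails on affected builds.
        ["gluster", "volume", "status", volume],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print("+", shlex.join(argv))
        if not dry_run:
            # Raises CalledProcessError on a non-zero exit status.
            subprocess.run(argv, check=True)


# Dry run: prints the commands without touching the system.
run(build_repro_commands())
```

Passing `dry_run=False` executes the commands as root on a test node. The remove-brick must still be in progress when glusterd is restarted, so the amount of data written in step 1 may need to be increased.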

Actual results:
The command fails with the following message - 

Commit failed on localhost. Please check the log file for more details.

Expected results:
The command should not fail.

Additional info:

Comment 1 Shruti Sampat 2013-11-07 10:18:43 UTC
Created attachment 820981 [details]
sosreport

Comment 2 Dusmant 2013-11-07 10:20:13 UTC
Because of this problem, RHSC does not update the icon, and the task does not get updated.

Comment 3 Shruti Sampat 2013-12-19 09:43:09 UTC
Verified as fixed in glusterfs 3.4.0.50rhs.

Volume status command is successful after restarting glusterd while remove-brick is in progress.

Comment 4 Pavithra 2014-01-03 06:18:47 UTC
Can you please verify the doc text for technical accuracy?

Comment 5 Kaushal 2014-01-03 07:15:56 UTC
Doc text looks okay.

Comment 7 errata-xmlrpc 2014-02-25 08:01:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html