Bug 982184 - "Remove-brick start" command is not working
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Version: 2.1
Hardware: Unspecified, OS: Unspecified
Priority: high, Severity: urgent
Assigned To: Kaushal
QA Contact: Sudhir D
Keywords: TestBlocker
Reported: 2013-07-08 07:01 EDT by shylesh
Modified: 2013-09-23 18:29 EDT
Fixed In Version: glusterfs-3.4.0.12rhs.beta6-1
Doc Type: Bug Fix
Last Closed: 2013-09-23 18:29:50 EDT
Type: Bug


Description shylesh 2013-07-08 07:01:55 EDT
Description of problem:
The "remove-brick start" command fails on any volume.



Version-Release number of selected component (if applicable):

[root@beta2 glusterd]# rpm -qa | grep gluster
gluster-swift-container-1.4.8-4.el6.noarch
glusterfs-fuse-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.12rhs.beta3-1.el6rhs.x86_64
vdsm-gluster-4.10.2-22.5.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-server-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.12rhs.beta3-1.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-object-1.4.8-4.el6.noarch
glusterfs-devel-3.4.0.12rhs.beta3-1.el6rhs.x86_64


How reproducible:
always

Steps to Reproduce:
1. Create a 3-brick distribute volume and start the volume.
2. Run "gluster volume remove-brick <vol> brick1 start".
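
The steps above can be sketched as a small script. This is an illustration only and not part of the original report: the helper names repro_commands and run are hypothetical, the brick paths are the ones listed for volume a1 in this report, and the script defaults to a dry run that only prints the gluster commands it would execute.

```python
import subprocess


def repro_commands(volume, bricks, brick_to_remove):
    """Build the gluster CLI invocations for the reproduction steps.

    Hypothetical helper: returns argv lists, it does not run anything.
    """
    return [
        # Step 1: create a plain distribute volume and start it.
        ["gluster", "volume", "create", volume, *bricks],
        ["gluster", "volume", "start", volume],
        # Step 2: start removing one brick (this is the call that failed).
        ["gluster", "volume", "remove-brick", volume, brick_to_remove, "start"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            result = subprocess.run(argv)
            print("exit status:", result.returncode)


if __name__ == "__main__":
    # Brick paths taken from the volume info in this report.
    bricks = [
        "10.70.35.62:/brick1/a11",
        "10.70.35.64:/brick2/a221",
        "10.70.35.62:/brick1/a31",
    ]
    run(repro_commands("a1", bricks, bricks[2]))
```

Passing dry_run=False would execute the commands on a node in the trusted pool. It assumes the brick directories already exist and that no volume with the same name is present.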


Actual results:

The command fails with this message:
[root@beta1 ~]# gluster v remove-brick a1 10.70.35.62:/brick1/a31 start
volume remove-brick start: failed: A remove-brick task on volume a1 is not yet committed. Either commit or stop the remove-brick task.




RHS nodes
==========
10.70.35.62
10.70.35.64

Command issued from 10.70.35.62

Volume Name: a1
Type: Distribute
Volume ID: c5389939-7db4-4248-9236-7abfc8d720d2
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: 10.70.35.62:/brick1/a11
Brick2: 10.70.35.64:/brick2/a221
Brick3: 10.70.35.62:/brick1/a31



glusterd.log
=============
3698] I [socket.c:3487:socket_init] 0-management: SSL support is NOT enabled
[2013-07-08 10:38:55.643712] I [socket.c:3502:socket_init] 0-management: using system polling thread
[2013-07-08 10:38:55.644112] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now
[2013-07-08 10:38:59.019730] I [glusterd-handler.c:1021:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2013-07-08 10:38:59.020280] I [glusterd-handler.c:1021:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2013-07-08 10:38:59.020637] I [glusterd-handler.c:1021:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2013-07-08 10:38:59.021241] I [glusterd-handler.c:1021:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2013-07-08 10:39:13.836498] I [glusterd-brick-ops.c:593:__glusterd_handle_remove_brick] 0-management: Received rem brick req
[2013-07-08 10:39:13.840039] I [glusterd-utils.c:7724:glusterd_generate_and_set_task_id] 0-management: Generated task-id 53b196a0-a2ce-41d0-b940-a79139119421 for key remove-brick-id
[2013-07-08 10:39:13.840577] I [glusterd-op-sm.c:4032:glusterd_bricks_select_remove_brick] 0-management: force flag is not set
[2013-07-08 10:39:13.866741] E [glusterd-brick-ops.c:1712:glusterd_op_remove_brick] 0-management: failed to start the rebalance
[2013-07-08 10:39:13.866759] E [glusterd-syncop.c:927:gd_commit_op_phase] 0-management: Commit of operation 'Volume Remove brick' failed on localhost : A remove-brick task on volume another is not yet committed. Either commit or stop the remove-brick task.
[2013-07-08 10:52:26.849777] I [glusterd-utils.c:7724:glusterd_generate_and_set_task_id] 0-management: Generated task-id a868e373-298b-4262-a833-7292b7b0af89 for key rebalance-id
[2013-07-08 10:52:26.849814] E [glusterd-op-sm.c:2790:glusterd_op_ac_send_stage_op] 0-management: Staging of operation 'Volume Rebalance' failed on localhost : A remove-brick task on volume another is not yet committed. Either commit or stop the remove-brick task.
[2013-07-08 10:53:05.613725] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /brick1/a11 on port 49159
[2013-07-08 10:53:05.614575] I [rpc-clnt.c:961:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2013-07-08 10:53:05.614619] I [socket.c:3487:socket_init] 0-management: SSL support is NOT enabled
[2013-07-08 10:53:05.614630] I [socket.c:3502:socket_init] 0-management: using system polling thread
[2013-07-08 10:53:05.628241] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /brick1/a31 on port 49160
[2013-07-08 10:53:05.628984] I [rpc-clnt.c:961:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2013-07-08 10:53:05.629062] I [socket.c:3487:socket_init] 0-management: SSL support is NOT enabled
[2013-07-08 10:53:05.629073] I [socket.c:3502:socket_init] 0-management: using system polling thread
[2013-07-08 10:53:05.658005] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=2236 max=0 total=0
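
To pick the error entries out of a glusterd.log excerpt like this one, a short filter can help. The following is a minimal sketch and not a GlusterFS tool: the names HEADER, parse_entries and errors are hypothetical, and the regular expression assumes the "[timestamp] LEVEL [file:line:function] domain: message" layout seen in the excerpt. Lines that do not start with a full timestamp are treated as hard-wrapped continuations and glued to the previous entry without a separator, which is an assumption about how the text was wrapped.

```python
import re

# Layout assumed from the excerpt:
# [timestamp] LEVEL [file:line:function] domain: message
HEADER = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<domain>\S+): (?P<msg>.*)$"
)


def parse_entries(text):
    """Split log text into entries, rejoining hard-wrapped lines."""
    entries = []
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            entries.append(match.groupdict())
        elif entries and line.strip():
            # Continuation of a wrapped line: append with no separator,
            # assuming the wrap happened at a fixed column.
            entries[-1]["msg"] += line
        # A fragment before the first complete header is skipped.
    return entries


def errors(text):
    """Return only the entries logged at level E."""
    return [entry for entry in parse_entries(text) if entry["level"] == "E"]


if __name__ == "__main__":
    sample = (
        "[2013-07-08 10:39:13.866741] E [glusterd-brick-ops.c:1712:"
        "glusterd_op_remove_brick] 0-management: failed to start the rebalance\n"
    )
    for entry in errors(sample):
        print(entry["ts"], entry["func"], entry["msg"])
```

Filtering on the function name as well, for example glusterd_op_remove_brick, would narrow the output to the failing operation.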





Attached the sosreports.
Comment 4 Amar Tumballi 2013-07-19 06:22:29 EDT
Shylesh, was another rebalance running at this time?

[2013-07-08 10:52:26.849814] E [glusterd-op-sm.c:2790:glusterd_op_ac_send_stage_op] 0-management: Staging of operation 'Volume Rebalance' failed on localhost : A remove-brick task on volume another is not yet committed. Either commit or stop the remove-brick task.
Comment 5 shylesh 2013-07-19 06:27:59 EDT
(In reply to Amar Tumballi from comment #4)
> Shylesh, was another rebalance running at this time?
> 
> [2013-07-08 10:52:26.849814] E [glusterd-op-sm.c:2790:glusterd_op_ac_send_stage_op] 0-management: Staging of operation 'Volume Rebalance' failed on localhost : A remove-brick task on volume another is not yet committed. Either commit or stop the remove-brick task.

Amar,
No rebalance or other remove-brick operation was running at the time.
Comment 6 Kaushal 2013-07-20 01:47:42 EDT
Going through this now. Will update by EOD.
Comment 7 Kaushal 2013-07-20 08:46:20 EDT
This was caused by an incomplete backport of the upstream patch. Sent out a patch to add the missing code.
Patch under review at https://code.engineering.redhat.com/gerrit/10517
Comment 8 Kaushal 2013-07-21 23:37:22 EDT
Patch merged as commit b1c79ed82ea3ce2853588e77f5bbcdffdd65dfcd
Comment 10 shylesh 2013-08-01 05:27:18 EDT
Verified on 3.4.0.14rhs-1.el6rhs.x86_64
Comment 11 Scott Haines 2013-09-23 18:29:50 EDT
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
