Description of problem:
remove-brick start on any volume is not working

Version-Release number of selected component (if applicable):
[root@beta2 glusterd]# rpm -qa | grep gluster
gluster-swift-container-1.4.8-4.el6.noarch
glusterfs-fuse-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.12rhs.beta3-1.el6rhs.x86_64
vdsm-gluster-4.10.2-22.5.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-server-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.12rhs.beta3-1.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-object-1.4.8-4.el6.noarch
glusterfs-devel-3.4.0.12rhs.beta3-1.el6rhs.x86_64

How reproducible:
always

Steps to Reproduce:
1. Create a 3-brick distribute volume and start the volume
2. Run "gluster volume remove-brick <vol> brick1 start"

Actual results:
It fails with the message

[root@beta1 ~]# gluster v remove-brick a1 10.70.35.62:/brick1/a31 start
volume remove-brick start: failed: A remove-brick task on volume a1 is not yet committed. Either commit or stop the remove-brick task.
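The reproduction steps can be scripted roughly as follows. This is only a sketch: the volume name and brick paths are taken from the volume info in this report, the helper names `repro_commands` and `run` are made up for illustration, and the create command assumes default options. By default it only prints the commands; pass `dry_run=False` on a test cluster to execute them.

```python
import subprocess

# Values taken from this report; adjust for another cluster.
VOLUME = "a1"
BRICKS = [
    "10.70.35.62:/brick1/a11",
    "10.70.35.64:/brick2/a221",
    "10.70.35.62:/brick1/a31",
]


def repro_commands(volume, bricks, victim):
    """Build the gluster CLI invocations for the steps in this report.

    Hypothetical helper: it only assembles argument lists.
    """
    return [
        # Step 1: create a plain distribute volume and start it.
        ["gluster", "volume", "create", volume, *bricks],
        ["gluster", "volume", "start", volume],
        # Step 2: start remove-brick on one brick.
        ["gluster", "volume", "remove-brick", volume, victim, "start"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("+", " ".join(cmd))
        if not dry_run:
            result = subprocess.run(cmd, capture_output=True, text=True)
            # Show whichever stream the CLI wrote to.
            print(result.stdout or result.stderr)


if __name__ == "__main__":
    run(repro_commands(VOLUME, BRICKS, BRICKS[2]))
```

On an affected build, the last command is the one expected to fail with the "not yet committed" message.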
RHS nodes
==========
10.70.35.62
10.70.35.64

command issued from 10.70.35.62

Volume Name: a1
Type: Distribute
Volume ID: c5389939-7db4-4248-9236-7abfc8d720d2
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: 10.70.35.62:/brick1/a11
Brick2: 10.70.35.64:/brick2/a221
Brick3: 10.70.35.62:/brick1/a31

glusterd.log
=============
3698] I [socket.c:3487:socket_init] 0-management: SSL support is NOT enabled
[2013-07-08 10:38:55.643712] I [socket.c:3502:socket_init] 0-management: using system polling thread
[2013-07-08 10:38:55.644112] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now
[2013-07-08 10:38:59.019730] I [glusterd-handler.c:1021:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2013-07-08 10:38:59.020280] I [glusterd-handler.c:1021:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2013-07-08 10:38:59.020637] I [glusterd-handler.c:1021:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2013-07-08 10:38:59.021241] I [glusterd-handler.c:1021:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2013-07-08 10:39:13.836498] I [glusterd-brick-ops.c:593:__glusterd_handle_remove_brick] 0-management: Received rem brick req
[2013-07-08 10:39:13.840039] I [glusterd-utils.c:7724:glusterd_generate_and_set_task_id] 0-management: Generated task-id 53b196a0-a2ce-41d0-b940-a79139119421 for key remove-brick-id
[2013-07-08 10:39:13.840577] I [glusterd-op-sm.c:4032:glusterd_bricks_select_remove_brick] 0-management: force flag is not set
[2013-07-08 10:39:13.866741] E [glusterd-brick-ops.c:1712:glusterd_op_remove_brick] 0-management: failed to start the rebalance
[2013-07-08 10:39:13.866759] E [glusterd-syncop.c:927:gd_commit_op_phase] 0-management: Commit of operation 'Volume Remove brick' failed on localhost : A remove-brick task on volume another is not yet committed. Either commit or stop the remove-brick task.
[2013-07-08 10:52:26.849777] I [glusterd-utils.c:7724:glusterd_generate_and_set_task_id] 0-management: Generated task-id a868e373-298b-4262-a833-7292b7b0af89 for key rebalance-id
[2013-07-08 10:52:26.849814] E [glusterd-op-sm.c:2790:glusterd_op_ac_send_stage_op] 0-management: Staging of operation 'Volume Rebalance' failed on localhost : A remove-brick task on volume another is not yet committed. Either commit or stop the remove-brick task.
[2013-07-08 10:53:05.613725] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /brick1/a11 on port 49159
[2013-07-08 10:53:05.614575] I [rpc-clnt.c:961:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2013-07-08 10:53:05.614619] I [socket.c:3487:socket_init] 0-management: SSL support is NOT enabled
[2013-07-08 10:53:05.614630] I [socket.c:3502:socket_init] 0-management: using system polling thread
[2013-07-08 10:53:05.628241] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /brick1/a31 on port 49160
[2013-07-08 10:53:05.628984] I [rpc-clnt.c:961:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2013-07-08 10:53:05.629062] I [socket.c:3487:socket_init] 0-management: SSL support is NOT enabled
[2013-07-08 10:53:05.629073] I [socket.c:3502:socket_init] 0-management: using system polling thread
[2013-07-08 10:53:05.658005] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=2236 max=0 total=0

Attached the sosreports.
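To pull only the error-level entries out of a glusterd.log like the one above, something along these lines may help. It is a minimal sketch that assumes every entry sits on one line in the layout seen in this report (timestamp, one-letter level, source location, domain, message). The names `LogEntry`, `parse_entries` and `errors_only` are made up for illustration and are not part of any gluster tooling.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple


class LogEntry(NamedTuple):
    timestamp: str
    level: str    # single letter as seen in this report, e.g. "I" or "E"
    source: str   # "file.c:line:function"
    domain: str   # e.g. "0-management"
    message: str


# Assumed layout, based only on the lines quoted in this report:
# [timestamp] LEVEL [file:line:function] domain: message
LINE_RE = re.compile(
    r"\[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<source>[^\]]+)\] "
    r"(?P<domain>[^:]+): "
    r"(?P<message>.*)"
)


def parse_entries(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield one LogEntry per line that matches the assumed layout."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # lines that do not match (e.g. truncated ones) are skipped
            yield LogEntry(**match.groupdict())


def errors_only(lines: Iterable[str]) -> List[LogEntry]:
    """Return only the entries logged at level E."""
    return [entry for entry in parse_entries(lines) if entry.level == "E"]


# Three entries copied from the log in this report.
SAMPLE = [
    "[2013-07-08 10:39:13.840577] I [glusterd-op-sm.c:4032:glusterd_bricks_select_remove_brick] 0-management: force flag is not set",
    "[2013-07-08 10:39:13.866741] E [glusterd-brick-ops.c:1712:glusterd_op_remove_brick] 0-management: failed to start the rebalance",
    "[2013-07-08 10:39:13.866759] E [glusterd-syncop.c:927:gd_commit_op_phase] 0-management: Commit of operation 'Volume Remove brick' failed on localhost : A remove-brick task on volume another is not yet committed. Either commit or stop the remove-brick task.",
]

if __name__ == "__main__":
    for entry in errors_only(SAMPLE):
        print(entry.timestamp, entry.source, entry.message)
```

For a real file, `errors_only(open(path))` would work the same way, with `path` pointing at wherever glusterd writes its log on the node.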
Shylesh, was another rebalance running at this time?

[2013-07-08 10:52:26.849814] E [glusterd-op-sm.c:2790:glusterd_op_ac_send_stage_op] 0-management: Staging of operation 'Volume Rebalance' failed on localhost : A remove-brick task on volume another is not yet committed. Either commit or stop the remove-brick task.
(In reply to Amar Tumballi from comment #4)
> Shylesh, was another rebalance running at this time?
>
> [2013-07-08 10:52:26.849814] E
> [glusterd-op-sm.c:2790:glusterd_op_ac_send_stage_op] 0-management: Staging
> of operation 'Volume Rebalance' failed on localhost : A remove-brick task
> on volume another is not yet committed. Either commit or stop the
> remove-brick task.

Amar,
No rebalance or any other remove-brick operation was running.
Going through this now. Will update by EOD.
This was caused by an incomplete backport of the upstream patch. Sent out a patch to add the missing code. Patch under review at https://code.engineering.redhat.com/gerrit/10517
Patch merged as commit b1c79ed82ea3ce2853588e77f5bbcdffdd65dfcd
Verified on 3.4.0.14rhs-1.el6rhs.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html