Description of problem:

Volume Name: doa
Type: Replicate
Volume ID: 1f0ef1ab-4f35-4dd3-ada9-f1b5d37a2876
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: vostro:/root/bricks/doa/d1
Brick2: vostro:/root/bricks/doa/d2
Brick3: vostro:/root/bricks/doa/d3

root@vostro:~# gluster volume remove-brick doa replica 3 vostro:/root/bricks/doa/d3
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
Remove Brick successful
root@vostro:~# gluster volume info

Volume Name: doa
Type: Replicate
Volume ID: 1f0ef1ab-4f35-4dd3-ada9-f1b5d37a2876
Status: Started
Number of Bricks: 0 x 3 = 2
Transport-type: tcp
Bricks:
Brick1: vostro:/root/bricks/doa/d1
Brick2: vostro:/root/bricks/doa/d2

root@vostro:~# gluster volume remove-brick doa replica 1 vostro:/root/bricks/doa/d3
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
Connection failed. Please check if gluster daemon is operational.

Version-Release number of selected component (if applicable):
[d05708d7976a8340ae7647fd26f38f22f1863b6a]release-3.3

How reproducible:
always

Additional info:
And replicate translator has only one sub-volume

volume doa-client-0
    type protocol/client
    option remote-host vostro
    option remote-subvolume /root/bricks/doa/d1
    option transport-type tcp
end-volume

volume doa-client-1
    type protocol/client
    option remote-host vostro
    option remote-subvolume /root/bricks/doa/d2
    option transport-type tcp
end-volume

volume doa-replicate-0
    type cluster/replicate
    subvolumes doa-client-0 doa-client-1
end-volume

This is the back-trace-
###############################################
#0  0x00007f0bdc961ee2 in gd_rmbr_validate_replica_count (volinfo=0x231ef60, replica_count=1, brick_count=1, err_str=0x7fff97d7bc40 "") at glusterd-brick-ops.c:294
#1  0x00007f0bdc963273 in glusterd_handle_remove_brick (req=0x7f0bdc86f04c) at glusterd-brick-ops.c:609
#2  0x00007f0bdfdf1279 in rpcsvc_handle_rpc_call (svc=0x23115d0, trans=0x231bdc0, msg=0x23c0a70) at rpcsvc.c:520
#3  0x00007f0bdfdf15f6 in rpcsvc_notify (trans=0x231bdc0, mydata=0x23115d0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x23c0a70) at rpcsvc.c:616
#4  0x00007f0bdfdf72ac in rpc_transport_notify (this=0x231bdc0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x23c0a70) at rpc-transport.c:498
#5  0x00007f0bdc663317 in socket_event_poll_in (this=0x231bdc0) at socket.c:1686
#6  0x00007f0bdc663880 in socket_event_handler (fd=33, idx=25, data=0x231bdc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:1801
#7  0x00007f0be005179c in event_dispatch_epoll_handler (event_pool=0x22f73a0, events=0x231b180, i=0) at event.c:794
#8  0x00007f0be00519af in event_dispatch_epoll (event_pool=0x22f73a0) at event.c:856
#9  0x00007f0be0051d22 in event_dispatch (event_pool=0x22f73a0) at event.c:956
#10 0x0000000000408247 in main (argc=3, argv=0x7fff97d7cb48) at glusterfsd.c:1624
(gdb) f 1
#1  0x00007f0bdc963273 in glusterd_handle_remove_brick (req=0x7f0bdc86f04c) at glusterd-brick-ops.c:609
609 ret = gd_rmbr_validate_replica_count (volinfo, replica_count,
(gdb) f 2
#2  0x00007f0bdfdf1279 in rpcsvc_handle_rpc_call (svc=0x23115d0, trans=0x231bdc0, msg=0x23c0a70) at rpcsvc.c:520
520 ret = actor->actor (req);
(gdb) f 23
#0  0x0000000000000000 in ?? ()
(gdb) f 3
#3  0x00007f0bdfdf15f6 in rpcsvc_notify (trans=0x231bdc0, mydata=0x23115d0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x23c0a70) at rpcsvc.c:616
616 ret = rpcsvc_handle_rpc_call (svc, trans, msg);
(gdb) f 4
#4  0x00007f0bdfdf72ac in rpc_transport_notify (this=0x231bdc0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x23c0a70) at rpc-transport.c:498
498 ret = this->notify (this, this->mydata, event, data);
(gdb)
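For illustration only, here is a minimal sketch of the arithmetic behind the "Number of Bricks: 0 x 3 = 2" line and of the kind of check that would have rejected the first remove-brick command. It is not glusterd's code: the function names brick_summary and validate_remove_brick are hypothetical, and the formula (distribute count = total bricks divided by replica count, rounded down) is an assumption inferred from the output in this report.

```python
def brick_summary(total_bricks: int, replica_count: int) -> str:
    """Format a 'Number of Bricks' value as 'dist x replica = total'.

    Assumption: the distribute count is total // replica, which is what
    the '0 x 3 = 2' output in this report suggests.
    """
    dist_count = total_bricks // replica_count
    return f"{dist_count} x {replica_count} = {total_bricks}"


def validate_remove_brick(total_bricks: int, new_replica: int,
                          bricks_to_remove: int):
    """Hypothetical pre-check; returns an error string, or None if accepted."""
    if new_replica < 1:
        return "replica count must be at least 1"
    if bricks_to_remove < 1 or bricks_to_remove >= total_bricks:
        return "must remove at least one brick and leave at least one"
    remaining = total_bricks - bricks_to_remove
    # The bricks left behind must still divide evenly into replica sets.
    if remaining % new_replica != 0:
        return (f"{remaining} remaining bricks is not a multiple of "
                f"replica count {new_replica}")
    return None


if __name__ == "__main__":
    print(brick_summary(3, 3))             # volume before the first command
    print(brick_summary(2, 3))             # inconsistent state after it
    print(validate_remove_brick(3, 3, 1))  # the first command from the report
    print(validate_remove_brick(3, 2, 1))  # reducing to replica 2 instead
```

Under these assumptions, removing one brick from the 1 x 3 volume while keeping "replica 3" leaves two bricks that cannot form a replica set of three, so the sketch rejects it, whereas the same removal with "replica 2" passes.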
CHANGE: http://review.gluster.com/3050 (glusterd: remove-brick validation behavior fix) merged in master by Vijay Bellur (vijay)
The fix in the change above handled only one of several cases. There is still an issue with the remove-brick pattern on plain replicate volumes, hence re-opening this bug. Thanks to Shwetha and Shylesh for trying to verify the bug and finding the other cases.
CHANGE: http://review.gluster.com/3278 (glusterd: remove-brick: add more error handling) merged in master by Vijay Bellur (vijay)