Bug 803711 - Remove-brick with a wrong replica value gives wrong info, and a further remove-brick operation causes glusterd to crash
Summary: Remove-brick with a wrong replica value gives wrong info, and a further remove-brick operation causes glusterd to crash
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: ---
Assignee: Amar Tumballi
QA Contact: Vijaykumar Koppad
URL:
Whiteboard:
Depends On:
Blocks: 817967
 
Reported: 2012-03-15 13:41 UTC by Vijaykumar Koppad
Modified: 2015-12-01 16:45 UTC

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:19:48 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: 3.3.0qa42
Embargoed:



Description Vijaykumar Koppad 2012-03-15 13:41:58 UTC
Description of problem:
Volume Name: doa
Type: Replicate
Volume ID: 1f0ef1ab-4f35-4dd3-ada9-f1b5d37a2876
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: vostro:/root/bricks/doa/d1
Brick2: vostro:/root/bricks/doa/d2
Brick3: vostro:/root/bricks/doa/d3
root@vostro:~# gluster volume remove-brick doa replica 3 vostro:/root/bricks/doa/d3
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
Remove Brick successful
root@vostro:~# gluster volume info
 
Volume Name: doa
Type: Replicate
Volume ID: 1f0ef1ab-4f35-4dd3-ada9-f1b5d37a2876
Status: Started
Number of Bricks: 0 x 3 = 2
Transport-type: tcp
Bricks:
Brick1: vostro:/root/bricks/doa/d1
Brick2: vostro:/root/bricks/doa/d2


root@vostro:~# gluster volume remove-brick doa replica 1 vostro:/root/bricks/doa/d3
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
Connection failed. Please check if gluster daemon is operational.
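
The bogus "Number of Bricks: 0 x 3 = 2" line above is consistent with the distribute count being derived by integer division of the brick count by the (now stale) replica count. The following is only a minimal illustrative sketch of that assumption, not the actual gluster CLI code; print_brick_header is a hypothetical helper.

/* Minimal sketch (not GlusterFS source): assumes the
 * "Number of Bricks: D x R = N" header is computed as
 * D = brick_count / replica_count (integer division), which is how a
 * brick count of 2 with a stale replica count of 3 degenerates into
 * the "0 x 3 = 2" line shown above. */
#include <stdio.h>

static void
print_brick_header (int brick_count, int replica_count)
{
        int dist_count = brick_count / replica_count;  /* integer division */

        printf ("Number of Bricks: %d x %d = %d\n",
                dist_count, replica_count, brick_count);
}

int
main (void)
{
        print_brick_header (3, 3);  /* healthy volume: 1 x 3 = 3           */
        print_brick_header (2, 3);  /* state after the bad remove-brick:
                                       0 x 3 = 2                           */
        return 0;
}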


Version-Release number of selected component (if applicable):
release-3.3 (git commit d05708d7976a8340ae7647fd26f38f22f1863b6a)

How reproducible: always


Additional info:
And the replicate translator has only one sub-volume:
volume doa-client-0
    type protocol/client
    option remote-host vostro
    option remote-subvolume /root/bricks/doa/d1
    option transport-type tcp
end-volume

volume doa-client-1
    type protocol/client
    option remote-host vostro
    option remote-subvolume /root/bricks/doa/d2
    option transport-type tcp
end-volume

volume doa-replicate-0
    type cluster/replicate
    subvolumes doa-client-0 doa-client-1
end-volume

This is the backtrace:
###############################################
#0  0x00007f0bdc961ee2 in gd_rmbr_validate_replica_count (volinfo=0x231ef60, replica_count=1, brick_count=1, err_str=0x7fff97d7bc40 "") at glusterd-brick-ops.c:294
#1  0x00007f0bdc963273 in glusterd_handle_remove_brick (req=0x7f0bdc86f04c) at glusterd-brick-ops.c:609
#2  0x00007f0bdfdf1279 in rpcsvc_handle_rpc_call (svc=0x23115d0, trans=0x231bdc0, msg=0x23c0a70) at rpcsvc.c:520
#3  0x00007f0bdfdf15f6 in rpcsvc_notify (trans=0x231bdc0, mydata=0x23115d0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x23c0a70) at rpcsvc.c:616
#4  0x00007f0bdfdf72ac in rpc_transport_notify (this=0x231bdc0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x23c0a70) at rpc-transport.c:498
#5  0x00007f0bdc663317 in socket_event_poll_in (this=0x231bdc0) at socket.c:1686
#6  0x00007f0bdc663880 in socket_event_handler (fd=33, idx=25, data=0x231bdc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:1801
#7  0x00007f0be005179c in event_dispatch_epoll_handler (event_pool=0x22f73a0, events=0x231b180, i=0) at event.c:794
#8  0x00007f0be00519af in event_dispatch_epoll (event_pool=0x22f73a0) at event.c:856
#9  0x00007f0be0051d22 in event_dispatch (event_pool=0x22f73a0) at event.c:956
#10 0x0000000000408247 in main (argc=3, argv=0x7fff97d7cb48) at glusterfsd.c:1624
(gdb) f 1 
#1  0x00007f0bdc963273 in glusterd_handle_remove_brick (req=0x7f0bdc86f04c) at glusterd-brick-ops.c:609
609	                ret = gd_rmbr_validate_replica_count (volinfo, replica_count,
(gdb) f 2 
#2  0x00007f0bdfdf1279 in rpcsvc_handle_rpc_call (svc=0x23115d0, trans=0x231bdc0, msg=0x23c0a70) at rpcsvc.c:520
520	                        ret = actor->actor (req);
(gdb) f 23
#0  0x0000000000000000 in ?? ()
(gdb) f 3
#3  0x00007f0bdfdf15f6 in rpcsvc_notify (trans=0x231bdc0, mydata=0x23115d0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x23c0a70) at rpcsvc.c:616
616	                ret = rpcsvc_handle_rpc_call (svc, trans, msg);
(gdb) f 4
#4  0x00007f0bdfdf72ac in rpc_transport_notify (this=0x231bdc0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x23c0a70) at rpc-transport.c:498
498	                ret = this->notify (this, this->mydata, event, data);
(gdb)
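
For context, the frames above show glusterd going down inside gd_rmbr_validate_replica_count() while handling the second remove-brick request. The sketch below is a hypothetical, simplified illustration of the kind of up-front validation such a path needs; it is not the GlusterFS source or the merged fix, and volinfo_t / validate_remove_brick_replica are made-up names for illustration only.

/* Hypothetical sketch of up-front replica-count validation; the names
 * and rules here are illustrative assumptions, not glusterd code. */
#include <stdio.h>

typedef struct {
        int brick_count;    /* bricks currently in the volume     */
        int replica_count;  /* current replica count (1 = no AFR) */
} volinfo_t;

/* Return 0 if the request looks sane, -1 (with a message) otherwise. */
static int
validate_remove_brick_replica (const volinfo_t *vol, int new_replica,
                               int bricks_to_remove, char *err, size_t len)
{
        if (new_replica < 1 || new_replica > vol->replica_count) {
                snprintf (err, len, "replica count %d is not valid for a "
                          "replica %d volume", new_replica, vol->replica_count);
                return -1;
        }

        if (new_replica == vol->replica_count) {
                snprintf (err, len, "replica count must be reduced when "
                          "removing bricks from a %d-brick replica %d volume",
                          vol->brick_count, vol->replica_count);
                return -1;
        }

        /* Reducing replica N -> M must remove exactly (N - M) bricks from
           every replica set, otherwise the volume is left in an
           inconsistent state such as the "0 x 3 = 2" seen above. */
        int sets = vol->brick_count / vol->replica_count;
        if (bricks_to_remove != sets * (vol->replica_count - new_replica)) {
                snprintf (err, len, "need %d brick(s) to reduce replica %d "
                          "to replica %d",
                          sets * (vol->replica_count - new_replica),
                          vol->replica_count, new_replica);
                return -1;
        }

        return 0;
}

int
main (void)
{
        char err[256] = "";
        volinfo_t doa = { 3, 3 };   /* the 1 x 3 volume from the report */

        /* "remove-brick doa replica 3 <one brick>": replica unchanged,
           one brick removed -- should be rejected, not half-applied.   */
        if (validate_remove_brick_replica (&doa, 3, 1, err, sizeof (err)) != 0)
                printf ("rejected: %s\n", err);

        /* "remove-brick doa replica 1 <one brick>": going from replica 3
           to replica 1 needs two bricks, so reject instead of crashing. */
        if (validate_remove_brick_replica (&doa, 1, 1, err, sizeof (err)) != 0)
                printf ("rejected: %s\n", err);

        return 0;
}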

Comment 1 Anand Avati 2012-03-31 14:39:06 UTC
CHANGE: http://review.gluster.com/3050 (glusterd: remove-brick validation behavior fix) merged in master by Vijay Bellur (vijay)

Comment 2 Amar Tumballi 2012-05-05 04:22:34 UTC
Noticed that the fix above covered only one of the few cases. There is still an issue with the remove-brick pattern on a plain replicate type of volume, hence re-opening this bug. Thanks to Shwetha and Shylesh for trying to verify the bug and finding the other cases.

Comment 3 Anand Avati 2012-05-08 09:41:06 UTC
CHANGE: http://review.gluster.com/3278 (glusterd: remove-brick: add more error handling) merged in master by Vijay Bellur (vijay)

