Bug 770914 - glusterd crashes with add-brick (replica to dist-replica)
Summary: glusterd crashes with add-brick (replica to dist-replica)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: krishnan parthasarathi
QA Contact: shylesh
URL:
Whiteboard:
Depends On:
Blocks: 817967
 
Reported: 2011-12-30 06:27 UTC by shishir gowda
Modified: 2013-12-09 01:28 UTC (History)

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:25:03 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description shishir gowda 2011-12-30 06:27:34 UTC
Description of problem:
When converting a replica volume (2 bricks) to a distribute-replica volume, glusterd crashes.

Version-Release number of selected component (if applicable):
mainline

How reproducible:
Every time

Steps to Reproduce:
1. Create a replica volume with 2 bricks.
2. Add a new replica pair with add-brick.
 
Actual results:

gluster> volume create new replica 2 sng:/export/dir1 sng:/export/dir2
Multiple bricks of a replicate volume are present on the same server. This setup is not optimal.
Do you still want to continue creating the volume?  (y/n) y
Creation of volume new has been successful. Please start the volume to access data.
gluster> volume start new
Starting volume new has been successful

gluster> volume add-brick new replica 2 sng:/export/dir3 sng:/export/dir4
gluster> volume info 

Expected results:


Additional info:

bt:

Core was generated by `glusterd'.
Program terminated with signal 8, Arithmetic exception.
#0  0x00007f607a67f8c4 in add_brick_at_right_order (brickinfo=0x1a56800, volinfo=0x1a4ba80, count=0, stripe_cnt=0, replica_cnt=2)
    at glusterd-brick-ops.c:79
79	        idx = (count / (replica_cnt - sub_cnt) * sub_cnt) +
(gdb) bt
#0  0x00007f607a67f8c4 in add_brick_at_right_order (brickinfo=0x1a56800, volinfo=0x1a4ba80, count=0, stripe_cnt=0, replica_cnt=2)
    at glusterd-brick-ops.c:79
#1  0x00007f607a6825d8 in glusterd_op_perform_add_bricks (volinfo=0x1a4ba80, count=2, bricks=0x1a54c50 " sng:/export/dir3 sng:/export/dir4 ", 
    dict=0x1a55b10) at glusterd-brick-ops.c:826
#2  0x00007f607a683dd7 in glusterd_op_add_brick (dict=0x1a55b10, op_errstr=0x7fff14fda620) at glusterd-brick-ops.c:1327
#3  0x00007f607a63424d in glusterd_op_commit_perform (op=GD_OP_ADD_BRICK, dict=0x1a55b10, op_errstr=0x7fff14fda620, rsp_dict=0x0)
    at glusterd-op-sm.c:2304
#4  0x00007f607a63257b in glusterd_op_ac_send_commit_op (event=0x1a55a40, ctx=0x1a3f3a0) at glusterd-op-sm.c:1681
#5  0x00007f607a6369ca in glusterd_op_sm () at glusterd-op-sm.c:3321
#6  0x00007f607a680f3e in glusterd_handle_add_brick (req=0x7f607a54d02c) at glusterd-brick-ops.c:489
#7  0x00007f607d35f1ef in rpcsvc_handle_rpc_call (svc=0x1a44ec0, trans=0x1a48800, msg=0x1a50da0) at rpcsvc.c:507
#8  0x00007f607d35f56c in rpcsvc_notify (trans=0x1a48800, mydata=0x1a44ec0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1a50da0) at rpcsvc.c:603
#9  0x00007f607d36518c in rpc_transport_notify (this=0x1a48800, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1a50da0) at rpc-transport.c:498
#10 0x00007f607a3412e7 in socket_event_poll_in (this=0x1a48800) at socket.c:1675
#11 0x00007f607a341850 in socket_event_handler (fd=14, idx=7, data=0x1a48800, poll_in=1, poll_out=0, poll_err=0) at socket.c:1790
#12 0x00007f607d5b9c6c in event_dispatch_epoll_handler (event_pool=0x1a3e500, events=0x1a47be0, i=0) at event.c:794
#13 0x00007f607d5b9e7f in event_dispatch_epoll (event_pool=0x1a3e500) at event.c:856
#14 0x00007f607d5ba1f2 in event_dispatch (event_pool=0x1a3e500) at event.c:956
#15 0x0000000000407d1e in main (argc=1, argv=0x7fff14fdb468) at glusterfsd.c:1601


(gdb) p replica_cnt 
$1 = 2
(gdb) p sub_cnt
$2 = 2
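
With replica_cnt = 2 and sub_cnt = 2, the divisor (replica_cnt - sub_cnt) on line 79 is zero, which is consistent with the signal 8 above. A minimal sketch of that arithmetic follows. The helper names are hypothetical and are not glusterd functions; only the leading term of the expression is modelled, since the rest of line 79 is truncated in the backtrace.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of glusterd: true when the divisor
 * used in the expression at glusterd-brick-ops.c:79 would be zero. */
static bool
divisor_is_zero (int replica_cnt, int sub_cnt)
{
        return (replica_cnt - sub_cnt) == 0;
}

/* Hypothetical guarded form of the leading term only:
 *     count / (replica_cnt - sub_cnt) * sub_cnt
 * Returns -1 instead of dividing by zero. The remainder of the
 * original expression is not reproduced here. */
static int
leading_term (int count, int replica_cnt, int sub_cnt)
{
        int divisor = replica_cnt - sub_cnt;

        if (divisor == 0)
                return -1;

        return count / divisor * sub_cnt;
}

/* Usage: the values printed by gdb in this report hit the zero divisor. */
static void
check_reported_values (void)
{
        assert (divisor_is_zero (2, 2));
        assert (leading_term (0, 2, 2) == -1);
}
```

The guard only illustrates why the crash happens; it is not the fix that was merged.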

Comment 1 krishnan parthasarathi 2012-01-02 14:09:16 UTC
In the fix for bug 765774, the code that set the op dictionary with the modified stripe and replica counts was accidentally removed, which broke correctness. The stage and op functions of add-brick rely on the stripe/replica counts being zero when this add-brick operation does not change them.
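
The "zero means unchanged" convention described above can be sketched as follows. This is an illustration under assumptions, not the merged change: the function name is hypothetical, and it assumes the handler knows both the count given on the command line and the volume's current count.

```c
#include <assert.h>

/* Hypothetical helper, not part of glusterd: returns the replica (or
 * stripe) count to store in the op dictionary. A value of 0 tells the
 * stage and op functions that this add-brick does not change the count. */
static int
count_for_op_dict (int requested, int current)
{
        /* Not specified on the command line, or same as the volume's
         * current count: report "no change". */
        if (requested == 0 || requested == current)
                return 0;

        return requested;
}

/* Usage: "add-brick new replica 2 ..." on a replica 2 volume is a
 * no-op for the replica count, so 0 would be stored. */
static void
check_unchanged_count (void)
{
        assert (count_for_op_dict (2, 2) == 0);
        assert (count_for_op_dict (3, 2) == 3);
}
```

In the reported backtrace, replica_cnt reached add_brick_at_right_order as 2 rather than 0, which matches the missing normalization this comment describes.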

Comment 2 Anand Avati 2012-01-03 04:50:13 UTC
CHANGE: http://review.gluster.com/2548 (glusterd: Fixed add-brick handler algorithm.) merged in master by Vijay Bellur (vijay)

Comment 3 Kaushal 2012-05-29 11:14:05 UTC
Checked on release-3.3. glusterd no longer crashes on add brick.
The following are the steps followed and the output obtained:


gluster> volume create test replica 2 arch:/export/test1 arch:/export/test2
Multiple bricks of a replicate volume are present on the same server. This setup is not optimal.
Do you still want to continue creating the volume?  (y/n) y
Creation of volume test has been successful. Please start the volume to access data.
gluster> volume add-brick test arch:/export/test3 arch:/export/test4
Add Brick successful
gluster>

