Bug 770914

Summary: glusterd crashes with add-brick (replica to dist-replica)
Product: [Community] GlusterFS Reporter: shishir gowda <sgowda>
Component: glusterd    Assignee: krishnan parthasarathi <kparthas>
Status: CLOSED CURRENTRELEASE QA Contact: shylesh <shmohan>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: mainline    CC: gluster-bugs, kaushal, nsathyan
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.4.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-07-24 17:25:03 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 817967    

Description shishir gowda 2011-12-30 06:27:34 UTC
Description of problem:
When converting a replica volume (2 bricks) to a distribute-replicate volume, glusterd crashes.

Version-Release number of selected component (if applicable):
mainline

How reproducible:
Every time

Steps to Reproduce:
1. create a replica vol with 2 bricks
2. add a new replica pair
 
Actual results:

gluster> volume create new replica 2 sng:/export/dir1 sng:/export/dir2
Multiple bricks of a replicate volume are present on the same server. This setup is not optimal.
Do you still want to continue creating the volume?  (y/n) y
Creation of volume new has been successful. Please start the volume to access data.
gluster> volume start new
Starting volume new has been successful

gluster> volume add-brick new replica 2 sng:/export/dir3 sng:/export/dir4
gluster> volume info 

Expected results:

add-brick should convert the volume to distribute-replicate without crashing glusterd.

Additional info:

bt:

Core was generated by `glusterd'.
Program terminated with signal 8, Arithmetic exception.
#0  0x00007f607a67f8c4 in add_brick_at_right_order (brickinfo=0x1a56800, volinfo=0x1a4ba80, count=0, stripe_cnt=0, replica_cnt=2)
    at glusterd-brick-ops.c:79
79	        idx = (count / (replica_cnt - sub_cnt) * sub_cnt) +
(gdb) bt
#0  0x00007f607a67f8c4 in add_brick_at_right_order (brickinfo=0x1a56800, volinfo=0x1a4ba80, count=0, stripe_cnt=0, replica_cnt=2)
    at glusterd-brick-ops.c:79
#1  0x00007f607a6825d8 in glusterd_op_perform_add_bricks (volinfo=0x1a4ba80, count=2, bricks=0x1a54c50 " sng:/export/dir3 sng:/export/dir4 ", 
    dict=0x1a55b10) at glusterd-brick-ops.c:826
#2  0x00007f607a683dd7 in glusterd_op_add_brick (dict=0x1a55b10, op_errstr=0x7fff14fda620) at glusterd-brick-ops.c:1327
#3  0x00007f607a63424d in glusterd_op_commit_perform (op=GD_OP_ADD_BRICK, dict=0x1a55b10, op_errstr=0x7fff14fda620, rsp_dict=0x0)
    at glusterd-op-sm.c:2304
#4  0x00007f607a63257b in glusterd_op_ac_send_commit_op (event=0x1a55a40, ctx=0x1a3f3a0) at glusterd-op-sm.c:1681
#5  0x00007f607a6369ca in glusterd_op_sm () at glusterd-op-sm.c:3321
#6  0x00007f607a680f3e in glusterd_handle_add_brick (req=0x7f607a54d02c) at glusterd-brick-ops.c:489
#7  0x00007f607d35f1ef in rpcsvc_handle_rpc_call (svc=0x1a44ec0, trans=0x1a48800, msg=0x1a50da0) at rpcsvc.c:507
#8  0x00007f607d35f56c in rpcsvc_notify (trans=0x1a48800, mydata=0x1a44ec0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1a50da0) at rpcsvc.c:603
#9  0x00007f607d36518c in rpc_transport_notify (this=0x1a48800, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1a50da0) at rpc-transport.c:498
#10 0x00007f607a3412e7 in socket_event_poll_in (this=0x1a48800) at socket.c:1675
#11 0x00007f607a341850 in socket_event_handler (fd=14, idx=7, data=0x1a48800, poll_in=1, poll_out=0, poll_err=0) at socket.c:1790
#12 0x00007f607d5b9c6c in event_dispatch_epoll_handler (event_pool=0x1a3e500, events=0x1a47be0, i=0) at event.c:794
#13 0x00007f607d5b9e7f in event_dispatch_epoll (event_pool=0x1a3e500) at event.c:856
#14 0x00007f607d5ba1f2 in event_dispatch (event_pool=0x1a3e500) at event.c:956
#15 0x0000000000407d1e in main (argc=1, argv=0x7fff14fdb468) at glusterfsd.c:1601


(gdb) p replica_cnt 
$1 = 2
(gdb) p sub_cnt
$2 = 2

Comment 1 krishnan parthasarathi 2012-01-02 14:09:16 UTC
The fix for bug 765774 accidentally removed the step that sets the modified stripe and replica counts in the op dictionary, affecting correctness. The stage and op functions of add-brick rely on the stripe/replica count being zero when it is unchanged by the add-brick operation.

Comment 2 Anand Avati 2012-01-03 04:50:13 UTC
CHANGE: http://review.gluster.com/2548 (glusterd: Fixed add-brick handler algorithm.) merged in master by Vijay Bellur (vijay)

Comment 3 Kaushal 2012-05-29 11:14:05 UTC
Checked on release-3.3. glusterd no longer crashes on add-brick.
The following are the steps followed and the output obtained:


gluster> volume create test replica 2 arch:/export/test1 arch:/export/test2
Multiple bricks of a replicate volume are present on the same server. This setup is not optimal.
Do you still want to continue creating the volume?  (y/n) y
Creation of volume test has been successful. Please start the volume to access data.
gluster> volume add-brick test arch:/export/test3 arch:/export/test4
Add Brick successful
gluster>