Created attachment 573949 [details]
rebalance log

Description of problem:

Core was generated by `/usr/local/sbin/glusterfs -s localhost --volfile-id dstore --xlator-option *dht'.
Program terminated with signal 6, Aborted.
#0  0x0000003638632885 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6.x86_64 libgcc-4.4.6-3.el6.x86_64
(gdb) bt full
#0  0x0000003638632885 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000003638634065 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x000000363866f7a7 in __libc_message () from /lib64/libc.so.6
No symbol table info available.
#3  0x00000036386750c6 in malloc_printerr () from /lib64/libc.so.6
No symbol table info available.
#4  0x00007fba53136eb9 in client3_1_xattrop_cbk (req=0x7fba4b675a80, iov=0x7fba4b675ac0, count=1, myframe=0x7fba5645a928) at client3_1-fops.c:1711
        frame = 0x7fba5645a928
        dict = 0x19d899c
        rsp = {op_ret = 0, op_errno = 0, dict = {dict_len = 100, dict_val = 0x189cc60 ""}, xdata = {xdata_len = 0, xdata_val = 0x0}}
        ret = 116
        op_errno = 0
        local = 0x1914fd0
        this = 0x1868470
        xdata = 0x0
        __FUNCTION__ = "client3_1_xattrop_cbk"
#5  0x00007fba574029fc in rpc_clnt_handle_reply (clnt=0x1928be0, pollin=0x19e0e10) at rpc-clnt.c:797
        conn = 0x1928c10
        saved_frame = 0x192914c
        ret = 0
        req = 0x7fba4b675a80
        xid = 7774
        __FUNCTION__ = "rpc_clnt_handle_reply"
#6  0x00007fba57402d99 in rpc_clnt_notify (trans=0x1938770, mydata=0x1928c10, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x19e0e10) at rpc-clnt.c:916
        conn = 0x1928c10
        clnt = 0x1928be0
        ret = -1
        req_info = 0x0
        pollin = 0x19e0e10
        tv = {tv_sec = 0, tv_usec = 0}
#7  0x00007fba573fee7c in rpc_transport_notify (this=0x1938770, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x19e0e10) at rpc-transport.c:498
        ret = -1
        __FUNCTION__ = "rpc_transport_notify"
#8  0x00007fba53f83270 in socket_event_poll_in (this=0x1938770) at socket.c:1686
        ret = 0
        pollin = 0x19e0e10
#9  0x00007fba53f837f4 in socket_event_handler (fd=11, idx=5, data=0x1938770, poll_in=1, poll_out=0, poll_err=0) at socket.c:1801
        this = 0x1938770
        priv = 0x1938b20
        ret = 0
        __FUNCTION__ = "socket_event_handler"
#10 0x00007fba5765a628 in event_dispatch_epoll_handler (event_pool=0x18313a0, events=0x185fb30, i=0) at event.c:794
        event_data = 0x185fb34
        handler = 0x7fba53f835d7 <socket_event_handler>
        data = 0x1938770
        idx = 5
        ret = -1
        __FUNCTION__ = "event_dispatch_epoll_handler"
#11 0x00007fba5765a84b in event_dispatch_epoll (event_pool=0x18313a0) at event.c:856
        events = 0x185fb30
        size = 2
        i = 0
        ret = 2
        __FUNCTION__ = "event_dispatch_epoll"
#12 0x00007fba5765abd6 in event_dispatch (event_pool=0x18313a0) at event.c:956
        ret = -1
        __FUNCTION__ = "event_dispatch"
#13 0x0000000000408057 in main (argc=21, argv=0x7fff603a47c8) at glusterfsd.c:1650
        ctx = 0x1819010
        ret = 0
        __FUNCTION__ = "main"

Version-Release number of selected component (if applicable):
mainline

Steps to Reproduce:
1. Create a distribute-replicate volume (2x2) and start it.
2. Create FUSE and NFS mounts.
3. Start dd in a loop on both the NFS and FUSE mounts.
4. Run add-brick on the volume.
5. Start rebalance.
6. Stop rebalance.
7. Bring down one brick from each replicate pair.
8. Bring the bricks back online.
9. Force-start rebalance.
10. Bring down one brick from each replicate pair.
11. Query the rebalance status.

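The steps above can be sketched as a script. This is only an approximation of the procedure, not the reporter's exact commands: the volume name dstore comes from the report, but the hostnames, brick paths and mount points are hypothetical, and using "gluster volume start ... force" to bring bricks back is an assumption. The dd loops and the brick shutdowns are left as manual steps because the report does not say how they were done. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the reproduction steps. Only the volume name "dstore" comes from
# the report; hosts, brick paths and mount points are hypothetical.
VOL=dstore
H1=${H1:-server1}
H2=${H2:-server2}
FUSE_MNT=${FUSE_MNT:-/mnt/fuse}
NFS_MNT=${NFS_MNT:-/mnt/nfs}
DRY_RUN=${DRY_RUN:-1}        # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Steps left to the operator; waits for Enter when not in dry-run mode.
manual() {
    echo "# manual: $*"
    if [ "$DRY_RUN" != 1 ]; then
        printf 'Press Enter when done: '
        read -r _reply
    fi
}

repro() {
    # 1. 2x2 distribute-replicate volume
    run gluster volume create "$VOL" replica 2 \
        "$H1:/export/b1" "$H2:/export/b1" "$H1:/export/b2" "$H2:/export/b2"
    run gluster volume start "$VOL"
    # 2. FUSE and NFS mounts
    run mkdir -p "$FUSE_MNT" "$NFS_MNT"
    run mount -t glusterfs "$H1:/$VOL" "$FUSE_MNT"
    run mount -t nfs -o vers=3 "$H1:/$VOL" "$NFS_MNT"
    # 3. I/O load on both mounts
    manual "start a dd loop on $FUSE_MNT and $NFS_MNT"
    # 4-6. add a replica pair, then start and stop rebalance
    run gluster volume add-brick "$VOL" "$H1:/export/b3" "$H2:/export/b3"
    run gluster volume rebalance "$VOL" start
    run gluster volume rebalance "$VOL" stop
    # 7-8. take bricks down, then bring them back (assumed: start ... force)
    manual "stop one brick process from each replica pair"
    run gluster volume start "$VOL" force
    # 9-11. force-start rebalance, take bricks down again, query status
    run gluster volume rebalance "$VOL" start force
    manual "stop one brick process from each replica pair"
    run gluster volume rebalance "$VOL" status
}

repro
```

Running it with DRY_RUN=0 executes the commands and pauses at each manual step; the crash in the report was seen at the final status query.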
Actual results:
crash in rebalance process

Additional Info:

before crash rebalance status:
-----------------------------
[03/30/12 - 20:56:54 root@APP-SERVER1 ~]# gluster volume rebalance dstore status
        Node  Rebalanced-files         size      scanned        status
   ---------       -----------  -----------  -----------  ------------
   localhost                36    937426944         1198   in progress
192.168.2.36                 0            0          257     completed

after crash rebalance status:
------------------------------
[03/30/12 - 20:57:04 root@APP-SERVER1 ~]# gluster volume rebalance dstore status
        Node  Rebalanced-files         size      scanned        status
   ---------       -----------  -----------  -----------  ------------
   localhost                36    937426944         1198     completed
192.168.2.36                 0            0          257     completed

CHANGE: http://review.gluster.com/3062 (dht/rebalance: Send PARENT_DOWN event before cleanup in rebalance) merged in master by Vijay Bellur (vijay)
CHANGE: http://review.gluster.com/3107 (rebalance: revert sending PARENT_DOWN event to xlators) merged in master by Vijay Bellur (vijay)
Bug is fixed. Verified on 3.3.0qa43