Created attachment 575716
rebalance logs

Description of problem:
Running I/O on the mount point while a rebalance is in progress, and then initiating remove-brick on the same volume, crashes glusterfs.

Version-Release number of selected component (if applicable):
3.3.0qa33

How reproducible:

Steps to Reproduce:
1. Create a distribute volume with 6 bricks.
2. Initiate rebalance.
3. Do some I/O on the mount point while the rebalance is running.
4. Initiate remove-brick on the same volume.

Actual results:
glusterfs crashed.

Expected results:
remove-brick should not start while a rebalance is in progress.

Additional info:
Program terminated with signal 11, Segmentation fault.
#0 0x0000003d7b0157f8 in ?? () from /lib64/libgcc_s.so.1
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.5.x86_64 libgcc-4.4.6-3.el6.x86_64
==================================================
(gdb) bt
#0 0x0000003d7b0157f8 in ?? () from /lib64/libgcc_s.so.1
#1 0x00007fba83a74933 in xlator_notify (xl=0x2274fb0, event=6, data=0x226ee10) at xlator.c:457
#2 0x00007fba83a87dda in default_notify (this=0x226ee10, event=6, data=0x0) at defaults.c:1334
#3 0x00007fba7f578c7c in client_rpc_notify (rpc=0x23a1c40, mydata=0x226ee10, event=RPC_CLNT_DISCONNECT, data=0x0) at client.c:2107
#4 0x00007fba8384fe2b in rpc_clnt_notify (trans=0x23b16a0, mydata=0x23a1c70, event=RPC_TRANSPORT_DISCONNECT, data=0x23b16a0) at rpc-clnt.c:887
#5 0x00007fba8384bee4 in rpc_transport_notify (this=0x23b16a0, event=RPC_TRANSPORT_DISCONNECT, data=0x23b16a0) at rpc-transport.c:498
#6 0x00007fba803cd1d3 in socket_event_poll_err (this=0x23b16a0) at socket.c:694
#7 0x00007fba803d188c in socket_event_handler (fd=13, idx=6, data=0x23b16a0, poll_in=1, poll_out=0, poll_err=16) at socket.c:1808
#8 0x00007fba83aa8640 in event_dispatch_epoll_handler (event_pool=0x223adb0, events=0x2268830, i=0) at event.c:794
#9 0x00007fba83aa8863 in event_dispatch_epoll (event_pool=0x223adb0) at event.c:856
#10 0x00007fba83aa8bee in event_dispatch (event_pool=0x223adb0) at event.c:956
#11 0x000000000040801c in main (argc=21, argv=0x7fffb6c96b88) at glusterfsd.c:1650
===========================================================================
The logs are attached.
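For anyone trying to reproduce this, the steps in the report above can be sketched as a small script. This is only a rough sketch, not the reporter's actual commands: the volume name (testvol), host name (server1), brick paths (/bricks/b1 to /bricks/b6), mount point (/mnt/testvol) and the dd workload are all hypothetical and need adjusting to the test setup. The script only prints the commands unless RUN=1 is set, and newer gluster releases may ask for confirmation or need extra options for bricks on a single host.

```shell
#!/bin/sh
# Hypothetical names, not taken from the report: adjust before use.
VOL=testvol
HOST=server1
MNT=/mnt/testvol

# Print each command by default; set RUN=1 to actually execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

reproduce() {
    # Step 1: create, start and mount a 6-brick distribute volume.
    bricks=""
    for i in 1 2 3 4 5 6; do
        bricks="$bricks $HOST:/bricks/b$i"
    done
    # $bricks is intentionally unquoted so each brick is a separate argument.
    run gluster volume create "$VOL" $bricks
    run gluster volume start "$VOL"
    run mkdir -p "$MNT"
    run mount -t glusterfs "$HOST:/$VOL" "$MNT"

    # Step 2: initiate rebalance.
    run gluster volume rebalance "$VOL" start

    # Step 3: start I/O on the mount in the background (assumed workload).
    run sh -c "dd if=/dev/zero of=$MNT/io-test bs=1M count=1024 >/dev/null 2>&1 &"

    # Step 4: initiate remove-brick while rebalance and I/O are still running.
    run gluster volume remove-brick "$VOL" "$HOST:/bricks/b6" start
}

reproduce
```

Run as is, the script lists the commands in order; run with RUN=1 as root on a test node, it executes them. Whether the crash triggers will depend on timing, since the remove-brick has to land while the rebalance is still in progress.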
The same crash also occurs with the following steps:
1. Create a 2-brick distribute volume from a single node.
2. Keep doing some I/O on the mount.
3. Attach another node to the cluster.
4. Add a brick from the new node to this volume.
5. Initiate fix-layout and rebalance.
The same steps lead to another crash whose stack frames are almost the same.

Program terminated with signal 11, Segmentation fault.
#0 0x00007fcd94f82da4 in default_notify (this=0x2502040, event=6, data=0x24ff840) at defaults.c:1333
1333 if (parent->xlator->init_succeeded)
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6.x86_64 libgcc-4.4.6-3.el6.x86_64
======================================================================
(gdb) p this
$1 = (xlator_t *) 0x2502040
(gdb) p this->ctx
$2 = (glusterfs_ctx_t *) 0x24b4010
(gdb) p this->ctx->master
$3 = (void *) 0x0
(gdb) p this->graph
$4 = (glusterfs_graph_t *) 0x24fa420
====================================================================
(gdb) bt
#0 0x00007fcd94f82da4 in default_notify (this=0x2502040, event=6, data=0x24ff840) at defaults.c:1333
#1 0x00007fcd90842759 in dht_notify (this=0x2502040, event=6, data=0x24ff840) at dht-common.c:4703
#2 0x00007fcd90853bed in notify (this=0x2502040, event=6, data=0x24ff840) at dht.c:201
#3 0x00007fcd94f6f933 in xlator_notify (xl=0x2502040, event=6, data=0x24ff840) at xlator.c:457
#4 0x00007fcd94f82dda in default_notify (this=0x24ff840, event=6, data=0x0) at defaults.c:1334
#5 0x00007fcd90a73c7c in client_rpc_notify (rpc=0x2579d70, mydata=0x24ff840, event=RPC_CLNT_DISCONNECT, data=0x0) at client.c:2107
#6 0x00007fcd94d4ae2b in rpc_clnt_notify (trans=0x25897d0, mydata=0x2579da0, event=RPC_TRANSPORT_DISCONNECT, data=0x25897d0) at rpc-clnt.c:887
#7 0x00007fcd94d46ee4 in rpc_transport_notify (this=0x25897d0, event=RPC_TRANSPORT_DISCONNECT, data=0x25897d0) at rpc-transport.c:498
#8 0x00007fcd918c81d3 in socket_event_poll_err (this=0x25897d0) at socket.c:694
#9 0x00007fcd918cc88c in socket_event_handler (fd=9, idx=4, data=0x25897d0, poll_in=1, poll_out=0, poll_err=16) at socket.c:1808
#10 0x00007fcd94fa3640 in event_dispatch_epoll_handler (event_pool=0x24cbdb0, events=0x24f9800, i=0) at event.c:794
#11 0x00007fcd94fa3863 in event_dispatch_epoll (event_pool=0x24cbdb0) at event.c:856
#12 0x00007fcd94fa3bee in event_dispatch (event_pool=0x24cbdb0) at event.c:956
#13 0x000000000040801c in main (argc=21, argv=0x7fff8844c918) at glusterfsd.c:1650
Can you please check if this bug is still valid?
This bug is not reproducible on the latest master.