Created attachment 618500 [details]
Included log files, strace

Description of problem:
glusterd hangs when the rebalance command is started on a distributed-stripe volume.

Version-Release number of selected component (if applicable):
master

How reproducible:
May not be reproducible on all machines.

Steps to Reproduce:
1. Create & start a distributed-stripe volume
2. Mount it & add some files
3. Add bricks & initiate rebalance

Actual results:
gluster volume rebalance doesn't succeed.

Expected results:
gluster volume rebalance should succeed.

Additional info:
strace, backtrace, logs and volume info files are attached.

Backtrace:
(gdb) bt
#0  0x00007ffaa29ae88d in waitpid () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007ffaa303c60b in runner_end_reuse (runner=0x7fff558c1790) at run.c:345
#2  0x00007ffaa303c750 in runner_run_generic (rfin=0x7ffaa303c5e0 <runner_end_reuse>, runner=0x7fff558c1790) at run.c:386
#3  runner_run_reuse (runner=0x7fff558c1790) at run.c:417
#4  0x00007ffa9f25a277 in glusterd_handle_defrag_start (volinfo=0x235e2c0, op_errstr=<optimized out>, len=<optimized out>, cmd=1, cbk=0) at glusterd-rebalance.c:253
#5  0x00007ffa9f25afb7 in glusterd_op_rebalance (dict=0x7ffaa0e87748, op_errstr=0x7fff558c8198, rsp_dict=<optimized out>) at glusterd-rebalance.c:542
#6  0x00007ffa9f2314ee in glusterd_op_commit_perform (op=<optimized out>, dict=0x7ffaa0e87748, op_errstr=0x7fff558c8198, rsp_dict=0x0) at glusterd-op-sm.c:3021
#7  0x00007ffa9f233346 in glusterd_op_ac_send_commit_op (event=<optimized out>, ctx=<optimized out>) at glusterd-op-sm.c:2325
#8  0x00007ffa9f230233 in glusterd_op_sm () at glusterd-op-sm.c:4602
#9  0x00007ffa9f25aafd in glusterd_handle_defrag_volume (req=0x7ffa9f19402c) at glusterd-rebalance.c:435
#10 0x00007ffaa2dd9be5 in rpcsvc_handle_rpc_call (svc=0x2353b60, trans=<optimized out>, msg=<optimized out>) at rpcsvc.c:535
#11 0x00007ffaa2dda0f3 in rpcsvc_notify (trans=0x237eac0, mydata=<optimized out>, event=<optimized out>, data=0x23992b0) at rpcsvc.c:633
#12 0x00007ffaa2ddd3f7 in rpc_transport_notify (this=<optimized out>, event=<optimized out>, data=<optimized out>) at rpc-transport.c:495
#13 0x00007ffa9ef890a4 in socket_event_poll_in (this=0x237eac0) at socket.c:1986
#14 0x00007ffa9ef89814 in socket_event_handler (fd=<optimized out>, idx=<optimized out>, data=0x237eac0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2098
#15 0x00007ffaa3044e43 in event_dispatch_epoll_handler (i=<optimized out>, events=0x235c4d0, event_pool=0x234eea0) at event-epoll.c:384
#16 event_dispatch_epoll (event_pool=0x234eea0) at event-epoll.c:445
#17 0x00000000004049d1 in main (argc=3, argv=0x7fff558c8538) at glusterfsd.c:1883
(gdb)

Volume info:
type=1
count=6
status=1
sub_count=2
stripe_count=2
replica_count=1
version=11
transport-type=0
volume-id=01cc6f21-859d-4da1-b041-3f68c1987022
username=b956693c-5d9e-4bc3-86e7-42f9aac726ec
password=5fabdb6e-683e-4e7b-aebc-0a6b08c6d3e5
performance.io-cache=off
diagnostics.brick-log-level=DEBUG
diagnostics.client-log-level=DEBUG
performance.io-thread-count=1
performance.quick-read=off
performance.write-behind=off
performance.read-ahead=off
brick-0=vpshastry:-export1-b1
brick-1=vpshastry:-export2-b1
brick-2=vpshastry:-export1-b2
brick-3=vpshastry:-export2-b2
brick-4=vpshastry:-export1-b3
brick-5=vpshastry:-export2-b3

The rebalance log was empty.
http://review.gluster.com/4024 fixes the issue on master.