Description of problem:
The test case glusterd/add-brick-and-validate-replicated-volume-options.t crashes when brick_mux is enabled.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Enable brick_mux in add-brick-and-validate-replicated-volume-options.t
2. Run the .t in a loop; after some attempts the test crashes

Actual results:
The test case crashes.

Expected results:
The test case should not crash.

Additional info:
Hi,

The test case generates the crash below right after kill_brick is called.

#0 0x0000560df20e821b in STACK_DESTROY (stack=0x3) at ../../libglusterfs/src/glusterfs/stack.h:182
182 LOCK(&stack->pool->lock);
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.168-8.el7.x86_64 elfutils-libs-0.168-8.el7.x86_64 glibc-2.17-196.el7_4.2.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libacl-2.2.51-12.el7.x86_64 libaio-0.3.109-13.el7.x86_64 libattr-2.4.46-12.el7.x86_64 libcap-2.22-9.el7.x86_64 libcom_err-1.42.9-10.el7.x86_64 libgcc-4.8.5-16.el7_4.2.x86_64 libselinux-2.5-11.el7.x86_64 libuuid-2.23.2-43.el7_4.2.x86_64 openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 systemd-libs-219-42.el7_4.10.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0 0x0000560df20e821b in STACK_DESTROY (stack=0x3) at ../../libglusterfs/src/glusterfs/stack.h:182
#1 mgmt_pmap_signin_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7fda6802ddb8) at glusterfsd-mgmt.c:2824
#2 0x00007fda86559161 in rpc_clnt_handle_reply (clnt=clnt@entry=0x560df2d53b30, pollin=pollin@entry=0x560df2ea98f0) at rpc-clnt.c:755
#3 0x00007fda865594c7 in rpc_clnt_notify (trans=0x560df2d53e50, mydata=0x560df2d53b60, event=<optimized out>, data=0x560df2ea98f0) at rpc-clnt.c:922
#4 0x00007fda86555b33 in rpc_transport_notify (this=this@entry=0x560df2d53e50, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x560df2ea98f0) at rpc-transport.c:541
#5 0x00007fda7ab7f95d in socket_event_poll_in (notify_handled=true, this=0x560df2d53e50) at socket.c:2516
#6 socket_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x560df2d53e50, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:2918
#7 0x00007fda86814e15 in event_dispatch_epoll_handler (event=0x7fda34ff8e70, event_pool=0x560df2d03560) at event-epoll.c:642
#8 event_dispatch_epoll_worker (data=0x7fda40054740) at event-epoll.c:756
#9 0x00007fda855eee25 in start_thread () from /usr/lib64/libpthread.so.0
#10 0x00007fda84ebb34d in clone () from /usr/lib64/libc.so.6
(gdb) f 1
#1 mgmt_pmap_signin_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7fda6802ddb8) at glusterfsd-mgmt.c:2824
2824 STACK_DESTROY(frame->root);
(gdb) p frame
$1 = (call_frame_t *) 0x7fda6802ddb8
(gdb) p *frame
$2 = {root = 0x4, parent = 0x400000001, frames = {next = 0xffffffffffffffff, prev = 0x7fda6802de18}, local = 0x7fda68059478, this = 0x0, ret = 0x0, ref_count = 0, lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x7fda68059478}}, __size = '\000' <repeats 32 times>, "x\224\005h\332\177\000", __align = 0}}, cookie = 0x0, complete = 232, op = 32730, begin = {tv_sec = 0, tv_nsec = 140576024633944}, end = {tv_sec = 140576024654704, tv_nsec = 1125216510}, wind_from = 0x1 <Address 0x1 out of bounds>, wind_to = 0x0, unwind_from = 0x0, unwind_to = 0x0}
(gdb) p frame->root
$3 = (call_stack_t *) 0x4
(gdb)

After checking the code, I found that the current glusterfs_mgmt_pmap_signin code does not send the sign-in requests safely: it uses the same frame to send multiple requests. Since mgmt_pmap_signin_cbk calls STACK_DESTROY(frame->root) for each reply, a later reply on the shared frame would operate on a frame that has already been destroyed, which is consistent with the garbage frame contents shown above.

>>>>>>>
.......
.......
    if (ctx->active) {
        top = ctx->active->first;
        for (trav_p = &top->children; *trav_p; trav_p = &(*trav_p)->next) {
            req.brick = (*trav_p)->xlator->name;
            ret = mgmt_submit_request(&req, frame, ctx, &clnt_pmap_prog,
                                      GF_PMAP_SIGNIN, mgmt_pmap_signin_cbk,
                                      (xdrproc_t)xdr_pmap_signin_req);
            if (ret < 0) {
                gf_log(THIS->name, GF_LOG_WARNING,
                       "failed to send sign in request; brick = %s",
                       req.brick);
            }
            count++;
        }
    } else {
        ret = mgmt_submit_request(&req, frame, ctx, &clnt_pmap_prog,
                                  GF_PMAP_SIGNIN, mgmt_pmap_signin_cbk,
                                  (xdrproc_t)xdr_pmap_signin_req);
    }
>>>>>>>>>>>>>

Thanks,
Mohit Agrawal
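To illustrate the ownership rule that the shared frame violates, here is a minimal standalone sketch in C. It is not the patch posted for review and does not use the GlusterFS API: demo_frame_t, demo_frame_create, demo_signin_cbk, demo_signin_all and demo_live_frames are hypothetical names invented for this example, and the "reply" is delivered synchronously for simplicity. The point it shows is that when the callback destroys the frame it receives, every request needs a frame of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a call frame; not the real call_frame_t. */
typedef struct demo_frame {
    const char *brick; /* brick this sign-in request is for */
} demo_frame_t;

/* Number of frames allocated and not yet destroyed. */
static int live_frames = 0;

static demo_frame_t *demo_frame_create(const char *brick)
{
    demo_frame_t *frame = calloc(1, sizeof(*frame));
    if (frame) {
        frame->brick = brick;
        live_frames++;
    }
    return frame;
}

/* Reply callback: it owns the frame it is handed and destroys it,
 * in the same spirit as a callback that ends with STACK_DESTROY. */
static void demo_signin_cbk(demo_frame_t *frame)
{
    assert(live_frames > 0); /* would trip on a double destroy */
    free(frame);
    live_frames--;
}

/* Send one sign-in per brick, allocating a fresh frame for each request.
 * Returns the number of requests that were submitted. */
int demo_signin_all(const char *const *bricks, size_t n)
{
    int sent = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        demo_frame_t *frame = demo_frame_create(bricks[i]);
        if (!frame)
            continue; /* allocation failed; skip this brick */
        /* Assumption: the reply arrives immediately. In a real RPC
         * client the callback would run later on another thread. */
        demo_signin_cbk(frame);
        sent++;
    }
    return sent;
}

int demo_live_frames(void)
{
    return live_frames;
}
```

With one frame per request, each callback destroys only its own frame, so the live-frame count returns to zero after all replies. Passing a single frame to several submissions instead would make the second callback free memory that the first one had already released.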
REVIEW: https://review.gluster.org/22015 (core: glusterd/add-brick-and-validate-replicated-volume-options.t is crash) posted (#1) for review on master by MOHIT AGRAWAL
REVIEW: https://review.gluster.org/22015 (core: glusterd/add-brick-and-validate-replicated-volume-options.t is crash) merged (#3) on master by Amar Tumballi
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/