+++ This bug was initially created as a clone of Bug #1577672 +++

The failures have been traced to the changes that were added for a proper cleanup sequence. These patches need to be reverted from the branch, and the tests rerun with mux enabled.
REVIEW: https://review.gluster.org/20059 (Revert "gluster: Sometimes Brick process is crashed at the time of stopping brick") posted (#3) for review on release-4.1 by Shyamsundar Ranganathan
REVIEW: https://review.gluster.org/20058 (Revert "server: fix unresolved symbols by moving them to libglusterfs") posted (#3) for review on release-4.1 by Shyamsundar Ranganathan
REVIEW: https://review.gluster.org/20060 (Revert "glusterfsd: Memleak in glusterfsd process while brick mux is on") posted (#3) for review on release-4.1 by Shyamsundar Ranganathan
Failures post reverting the above patches (runs of mux regression on https://review.gluster.org/#/c/20060/3) show the following errors:

## Tests that failed:

### Cores:
./tests/bugs/bug-1371806.t (Fix 1)
./tests/basic/mgmt_v3-locks.t (Problem 1)
./tests/basic/tier/tier.t (Problem 1 & 2)
./tests/bugs/bug-1371806_3.t (Fix 1)

### Failed:
./tests/bugs/fuse/bug-1309462.t
./tests/bugs/glusterd/rebalance-operations-in-single-node.t
./tests/basic/volume-snapshot-xml.t

### Test runs and cores data:

#### Problem 1: glusterd crash on cleanup and exit path

===> https://build.gluster.org/job/regression-on-demand-multiplex/45/console <=== (4.1)
Similar:
===> https://build.gluster.org/job/regression-on-demand-multiplex/47/console <=== (4.1)

```
16:58:06 1 test(s) failed
16:58:06 ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
16:58:06
16:58:06 1 test(s) generated core
16:58:06 ./tests/basic/mgmt_v3-locks.t
16:58:06
16:58:06 2 test(s) needed retry
16:58:06 ./tests/basic/tier/tier.t
16:58:06 ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
16:58:07 Core was generated by `glusterd --xlator-option management.working-directory=/d/backends/3/glusterd --'.
16:58:07 Program terminated with signal 6, Aborted.
16:58:07 #0 0x00007f9eb4d47277 in raise () from /lib64/libc.so.6
16:58:07 Thread 1 (Thread 0x7f9ea65cc700 (LWP 25880)):
16:58:07 #0 0x00007f9eb4d47277 in raise () from /lib64/libc.so.6
16:58:07 No symbol table info available.
16:58:07 #1 0x00007f9eb4d48968 in abort () from /lib64/libc.so.6
16:58:07 No symbol table info available.
16:58:07 #2 0x00007f9eb4d40096 in __assert_fail_base () from /lib64/libc.so.6
16:58:07 No symbol table info available.
16:58:07 #3 0x00007f9eb4d40142 in __assert_fail () from /lib64/libc.so.6
16:58:07 No symbol table info available.
16:58:07 #4 0x00007f9eb676e6a4 in event_unregister_epoll_common (event_pool=0x1666c30, fd=6, idx=-1, do_close=1) at /home/jenkins/root/workspace/regression-on-demand-multiplex/libglusterfs/src/event-epoll.c:409
16:58:07 ret = -1
16:58:07 slot = 0x16ac260
16:58:07 __FUNCTION__ = "event_unregister_epoll_common"
16:58:07 __PRETTY_FUNCTION__ = "event_unregister_epoll_common"
16:58:07 #5 0x00007f9eb676e843 in event_unregister_close_epoll (event_pool=0x1666c30, fd=6, idx_hint=-1) at /home/jenkins/root/workspace/regression-on-demand-multiplex/libglusterfs/src/event-epoll.c:453
16:58:07 ret = -1
16:58:07 #6 0x00007f9eb672cdcd in event_unregister_close (event_pool=0x1666c30, fd=6, idx=-1) at /home/jenkins/root/workspace/regression-on-demand-multiplex/libglusterfs/src/event.c:95
16:58:07 ret = -1
16:58:07 __FUNCTION__ = "event_unregister_close"
16:58:07 #7 0x00007f9ea837152b in __socket_reset (this=0x7f9e980180a0) at /home/jenkins/root/workspace/regression-on-demand-multiplex/rpc/rpc-transport/socket/src/socket.c:972
16:58:07 priv = 0x7f9e98018600
16:58:07 __FUNCTION__ = "__socket_reset"
16:58:07 #8 0x00007f9ea837d034 in fini (this=0x7f9e980180a0) at /home/jenkins/root/workspace/regression-on-demand-multiplex/rpc/rpc-transport/socket/src/socket.c:4710
16:58:07 priv = 0x7f9e98018600
16:58:07 __FUNCTION__ = "fini"
16:58:07 #9 0x00007f9eb64b9479 in rpc_transport_destroy (this=0x7f9e980180a0) at /home/jenkins/root/workspace/regression-on-demand-multiplex/rpc/rpc-lib/src/rpc-transport.c:470
16:58:07 ret = -1
16:58:07 __FUNCTION__ = "rpc_transport_destroy"
16:58:07 #10 0x00007f9eb64b968b in rpc_transport_unref (this=0x7f9e980180a0) at /home/jenkins/root/workspace/regression-on-demand-multiplex/rpc/rpc-lib/src/rpc-transport.c:520
16:58:07 refcount = 0
16:58:07 ret = -1
16:58:07 __FUNCTION__ = "rpc_transport_unref"
16:58:07 #11 0x00007f9ea83789b7 in socket_server_event_handler (fd=10, idx=0, gen=1, data=0x168bd60, poll_in=1, poll_out=0, poll_err=0) at /home/jenkins/root/workspace/regression-on-demand-multiplex/rpc/rpc-transport/socket/src/socket.c:3117
16:58:07 this = 0x168bd60
16:58:07 priv = 0x16abe10
16:58:07 ret = -1
16:58:07 new_sock = 6
16:58:07 new_trans = 0x7f9e980180a0
16:58:07 new_sockaddr = {ss_family = 2, __ss_padding = "\277\345\177\000\000\001", '\000' <repeats 111 times>, __ss_align = 0}
16:58:07 addrlen = 16
16:58:07 new_priv = 0x7f9e98018600
16:58:07 ctx = 0x162f010
16:58:07 cname = 0x0
16:58:07 __FUNCTION__ = "socket_server_event_handler"
16:58:07 #12 0x00007f9eb676ed0c in event_dispatch_epoll_handler (event_pool=0x1666c30, event=0x7f9ea65cbea0) at /home/jenkins/root/workspace/regression-on-demand-multiplex/libglusterfs/src/event-epoll.c:579
16:58:07 ev_data = 0x7f9ea65cbea4
16:58:07 slot = 0x16ac2c0
16:58:07 handler = 0x7f9ea8377f2f <socket_server_event_handler>
16:58:07 data = 0x168bd60
16:58:07 idx = 0
16:58:07 gen = 1
16:58:07 ret = -1
16:58:07 fd = 10
16:58:07 handled_error_previously = false
16:58:07 __FUNCTION__ = "event_dispatch_epoll_handler"
16:58:07 #13 0x00007f9eb676efff in event_dispatch_epoll_worker (data=0x16c8910) at /home/jenkins/root/workspace/regression-on-demand-multiplex/libglusterfs/src/event-epoll.c:655
16:58:07 event = {events = 1, data = {ptr = 0x100000000, fd = 0, u32 = 0, u64 = 4294967296}}
16:58:07 ret = 1
16:58:07 ev_data = 0x16c8910
16:58:07 event_pool = 0x1666c30
16:58:07 myindex = 1
16:58:07 timetodie = 0
16:58:07 __FUNCTION__ = "event_dispatch_epoll_worker"
16:58:07 #14 0x00007f9eb5546e25 in start_thread () from /lib64/libpthread.so.0
16:58:07 No symbol table info available.
16:58:07 #15 0x00007f9eb4e0fbad in clone () from /lib64/libc.so.6
16:58:07 No symbol table info available.
```

#### Problem 2: No clear root cause

===> https://build.gluster.org/job/regression-on-demand-multiplex/50/console <=== (4.1)

```
07:57:32 Core was generated by `/build/install/sbin/glusterfsd -s builder100.cloud.gluster.org --volfile-id pat'.
07:57:32 Program terminated with signal 11, Segmentation fault.
07:57:32 #0 0x00007f7d6238f85e in default_lookup (frame=0x7f7d500c8f28, this=0x7f7d5002f430, loc=0x7f7d555dd8d0, xdata=0x0) at defaults.c:2714
07:57:32 2714 STACK_WIND_TAIL (frame,
07:57:32
07:57:32 Thread 1 (Thread 0x7f7d555de700 (LWP 28094)):
07:57:32 #0 0x00007f7d6238f85e in default_lookup (frame=0x7f7d500c8f28, this=0x7f7d5002f430, loc=0x7f7d555dd8d0, xdata=0x0) at defaults.c:2714
07:57:32 old_THIS = 0x0
07:57:32 next_xl = 0x7f7d500c8fa0
07:57:32 next_xl_fn = 0x7f7d500008c0
07:57:32 opn = 1
07:57:32 __FUNCTION__ = "default_lookup"
07:57:32 #1 0x00007f7d6230c399 in syncop_lookup (subvol=0x7f7d5002f430, loc=0x7f7d555dd8d0, iatt=0x7f7d555dd830, parent=0x0, xdata_in=0x0, xdata_out=0x0) at /home/jenkins/root/workspace/regression-on-demand-multiplex/libglusterfs/src/syncop.c:1260
07:57:32 _new = 0x7f7d500c8f28
07:57:32 old_THIS = 0x7f7d5002f430
07:57:32 next_xl_fn = 0x7f7d6238f836 <default_lookup>
07:57:32 tmp_cbk = 0x7f7d6230bddd <syncop_lookup_cbk>
07:57:32 task = 0x0
07:57:32 frame = 0x7f7d500c8e18
07:57:32 args = {op_ret = 0, op_errno = 0, iatt1 = {ia_flags = 0, ia_ino = 0, ia_dev = 0, ia_rdev = 0, ia_size = 0, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_mtime = 0, ia_ctime = 0, ia_btime = 0, ia_atime_nsec = 0, ia_mtime_nsec = 0, ia_ctime_nsec = 0, ia_btime_nsec = 0, ia_attributes = 0, ia_attributes_mask = 0, ia_gfid = '\000' <repeats 15 times>, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}}, iatt2 = {ia_flags = 0, ia_ino = 0, ia_dev = 0, ia_rdev = 0, ia_size = 0, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_mtime = 0, ia_ctime = 0, ia_btime = 0, ia_atime_nsec = 0, ia_mtime_nsec = 0, ia_ctime_nsec = 0, ia_btime_nsec = 0, ia_attributes = 0, ia_attributes_mask = 0, ia_gfid = '\000' <repeats 15 times>, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}}, xattr = 0x0, statvfs_buf = {f_bsize = 0, f_frsize = 0, f_blocks = 0, f_bfree = 0, f_bavail = 0, f_files = 0, f_ffree = 0, f_favail = 0, f_fsid = 0, f_flag = 0, f_namemax = 0, __f_spare = {0, 0, 0, 0, 0, 0}}, vector = 0x0, count = 0, iobref = 0x0, buffer = 0x0, xdata = 0x0, flock = {l_type = 0, l_whence = 0, l_start = 0, l_len = 0, l_pid = 0, l_owner = {len = 0, data = '\000' <repeats 1023 times>}}, lease = {cmd = 0, lease_type = NONE, lease_id = '\000' <repeats 15 times>, lease_flags = 0}, dict_out = 0x0, uuid = '\000' <repeats 15 times>, errstr = 0x0, dict = 0x0, lock_dict = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, barrier = {initialized = false, guard = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}, waitq = {next = 0x0, prev = 0x0}, count = 0, waitfor = 0}, task = 0x0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}, done = 0, entries = {{list = {next = 0x0, prev = 0x0}, {next = 0x0, prev = 0x0}}, d_ino = 0, d_off = 0, d_len = 0, d_type = 0, d_stat = {ia_flags = 0, ia_ino = 0, ia_dev = 0, ia_rdev = 0, ia_size = 0, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_mtime = 0, ia_ctime = 0, ia_btime = 0, ia_atime_nsec = 0, ia_mtime_nsec = 0, ia_ctime_nsec = 0, ia_btime_nsec = 0, ia_attributes = 0, ia_attributes_mask = 0, ia_gfid = '\000' <repeats 15 times>, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}}, dict = 0x0, inode = 0x0, d_name = 0x7f7d555dd368 ""}, offset = 0, locklist = {list = {next = 0x0, prev = 0x0}, flock = {l_type = 0, l_whence = 0, l_start = 0, l_len = 0, l_pid = 0, l_owner = {len = 0, data = '\000' <repeats 1023 times>}}, client_uid = 0x0, lk_flags = 0}}
07:57:32 __FUNCTION__ = "syncop_lookup"
07:57:32 #2 0x00007f7d4d6e3105 in server_first_lookup (this=0x7f7d5002f430, client=0x7f7d50093ee0, reply=0x7f7d5008d288) at /home/jenkins/root/workspace/regression-on-demand-multiplex/xlators/protocol/server/src/server-handshake.c:382
07:57:32 loc = {path = 0x7f7d4d70a425 "/", name = 0x7f7d4d70a52a "", inode = 0x7f7d50093b98, parent = 0x0, gfid = '\000' <repeats 15 times>, "\001", pargfid = '\000' <repeats 15 times>}
07:57:32 iatt = {ia_flags = 0, ia_ino = 0, ia_dev = 0, ia_rdev = 0, ia_size = 0, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_mtime = 0, ia_ctime = 0, ia_btime = 0, ia_atime_nsec = 0, ia_mtime_nsec = 0, ia_ctime_nsec = 0, ia_btime_nsec = 0, ia_attributes = 0, ia_attributes_mask = 0, ia_gfid = '\000' <repeats 15 times>, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}}
07:57:32 dict = 0x0
07:57:32 ret = 0
07:57:32 xl = 0x7f7d5002f430
07:57:32 msg = 0x0
07:57:32 inode = 0x0
07:57:32 bname = 0x0
07:57:32 str = 0x0
07:57:32 tmp = 0x0
07:57:32 saveptr = 0x0
07:57:32 __FUNCTION__ = "server_first_lookup"
07:57:32 #3 0x00007f7d4d6e4ad2 in server_setvolume (req=0x7f7d50002668) at /home/jenkins/root/workspace/regression-on-demand-multiplex/xlators/protocol/server/src/server-handshake.c:863
07:57:32 args = {dict = {dict_len = 831, dict_val = 0x7f7d50003650 ""}}
07:57:32 rsp = 0x0
07:57:32 client = 0x7f7d50093ee0
07:57:32 serv_ctx = 0x7f7d50094260
07:57:32 conf = 0x7f7d50036c40
07:57:32 peerinfo = 0x7f7d5008c510
07:57:32 reply = 0x7f7d5008d288
07:57:32 config_params = 0x7f7d5008d0e8
07:57:32 params = 0x7f7d5008dca8
07:57:32 name = 0x7f7d500a9df0 "/d/backends/patchy0"
07:57:32 client_uid = 0x7f7d500a9560 "CTX_ID:417d7b05-569a-40b1-b036-30e9ae7365a9-GRAPH_ID:0-PID:29450-HOST:builder100.cloud.gluster.org-PC_NAME:patchy-client-0-RECON_NO:-0"
07:57:32 clnt_version = 0x7f7d500a9160 "4.1.0alpha"
07:57:32 xl = 0x7f7d5002f430
07:57:32 msg = 0x0
07:57:32 volfile_key = 0x7f7d500a8f50 "rebalance/patchy"
07:57:32 this = 0x7f7d5002f430
07:57:32 checksum = 0
07:57:32 ret = 0
07:57:32 op_ret = 0
07:57:32 op_errno = 22
07:57:32 buf = 0x0
07:57:32 opversion = 40100
07:57:32 xprt = 0x7f7d500367d0
07:57:32 fop_version = 1298437
07:57:32 mgmt_version = 0
07:57:32 ctx = 0x8e6010
07:57:32 tmp = 0x7f7d50032590
07:57:32 subdir_mount = 0x0
07:57:32 client_name = 0x7f7d500a9360 "rebalance"
07:57:32 __FUNCTION__ = "server_setvolume"
07:57:32 __PRETTY_FUNCTION__ = "server_setvolume"
07:57:32 #4 0x00007f7d62078842 in rpcsvc_handle_rpc_call (svc=0x7f7d50044170, trans=0x7f7d5008c450, msg=0x7f7d500cca10) at /home/jenkins/root/workspace/regression-on-demand-multiplex/rpc/rpc-lib/src/rpcsvc.c:721
07:57:32 actor = 0x7f7d4d91d8c0 <gluster_handshake_actors+64>
07:57:32 actor_fn = 0x7f7d4d6e33f9 <server_setvolume>
07:57:32 req = 0x7f7d50002668
07:57:32 ret = -1
07:57:32 port = 49128
07:57:32 is_unix = false
07:57:32 empty = false
07:57:32 unprivileged = true
07:57:32 reply = 0x0
07:57:32 drc = 0x0
07:57:32 __FUNCTION__ = "rpcsvc_handle_rpc_call"
07:57:32 #5 0x00007f7d62078b95 in rpcsvc_notify (trans=0x7f7d5008c450, mydata=0x7f7d50044170, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f7d500cca10) at /home/jenkins/root/workspace/regression-on-demand-multiplex/rpc/rpc-lib/src/rpcsvc.c:815
07:57:32 ret = -1
07:57:32 msg = 0x7f7d500cca10
07:57:32 new_trans = 0x0
07:57:32 svc = 0x7f7d50044170
07:57:32 listener = 0x0
07:57:32 __FUNCTION__ = "rpcsvc_notify"
07:57:32 #6 0x00007f7d6207e7ab in rpc_transport_notify (this=0x7f7d5008c450, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f7d500cca10) at /home/jenkins/root/workspace/regression-on-demand-multiplex/rpc/rpc-lib/src/rpc-transport.c:537
07:57:32 ret = -1
07:57:32 __FUNCTION__ = "rpc_transport_notify"
07:57:32 #7 0x00007f7d56e8fed8 in socket_event_poll_in (this=0x7f7d5008c450, notify_handled=true) at /home/jenkins/root/workspace/regression-on-demand-multiplex/rpc/rpc-transport/socket/src/socket.c:2462
07:57:32 ret = 0
07:57:32 pollin = 0x7f7d500cca10
07:57:32 priv = 0x7f7d50002060
07:57:32 ctx = 0x8e6010
07:57:32 #8 0x00007f7d56e90546 in socket_event_handler (fd=9, idx=4, gen=4, data=0x7f7d5008c450, poll_in=1, poll_out=0, poll_err=0) at /home/jenkins/root/workspace/regression-on-demand-multiplex/rpc/rpc-transport/socket/src/socket.c:2618
07:57:32 this = 0x7f7d5008c450
07:57:32 priv = 0x7f7d50002060
07:57:32 ret = 0
07:57:32 ctx = 0x8e6010
07:57:32 socket_closed = false
07:57:32 notify_handled = false
07:57:32 __FUNCTION__ = "socket_event_handler"
07:57:32 #9 0x00007f7d62333d0c in event_dispatch_epoll_handler (event_pool=0x91dc30, event=0x7f7d555ddea0) at /home/jenkins/root/workspace/regression-on-demand-multiplex/libglusterfs/src/event-epoll.c:579
07:57:32 ev_data = 0x7f7d555ddea4
07:57:32 slot = 0x94efe0
07:57:32 handler = 0x7f7d56e90278 <socket_event_handler>
07:57:32 data = 0x7f7d5008c450
07:57:32 idx = 4
07:57:32 gen = 4
07:57:32 ret = -1
07:57:32 fd = 9
07:57:32 handled_error_previously = false
07:57:32 __FUNCTION__ = "event_dispatch_epoll_handler"
07:57:32 #10 0x00007f7d62333fff in event_dispatch_epoll_worker (data=0x967f00) at /home/jenkins/root/workspace/regression-on-demand-multiplex/libglusterfs/src/event-epoll.c:655
07:57:32 event = {events = 1, data = {ptr = 0x400000004, fd = 4, u32 = 4, u64 = 17179869188}}
07:57:32 ret = 1
07:57:32 ev_data = 0x967f00
07:57:32 event_pool = 0x91dc30
07:57:32 myindex = 1
07:57:32 timetodie = 0
07:57:32 __FUNCTION__ = "event_dispatch_epoll_worker"
07:57:32 #11 0x00007f7d6110be25 in start_thread () from /lib64/libpthread.so.0
07:57:32 No symbol table info available.
07:57:32 #12 0x00007f7d609d4bad in clone () from /lib64/libc.so.6
07:57:32 No symbol table info available.
```

#### Fix 1: https://review.gluster.org/#/c/20060/1..2/xlators/protocol/server/src/server-rpc-fops.c

===> https://build.gluster.org/job/regression-on-demand-multiplex/51/console <=== (4.0)
Similar:
===> https://build.gluster.org/job/regression-on-demand-multiplex/53/console <=== (4.0)
===> https://build.gluster.org/job/regression-on-demand-multiplex/52/console <=== (4.0)
===> https://build.gluster.org/job/regression-on-demand-multiplex/46/console <=== (4.1)
===> https://build.gluster.org/job/regression-on-demand-multiplex/47/console <=== (4.1)
===> https://build.gluster.org/job/regression-on-demand-multiplex/49/console <=== (4.1)

```
18:24:55 Core was generated by `/build/install/sbin/glusterfsd -s builder102.cloud.gluster.org --volfile-id pat'.
18:24:55 Program terminated with signal 11, Segmentation fault.
18:24:55 #0 0x00007f222aea1f17 in server_inode_new (itable=0x0, gfid=0x7f2200c56f40 "") at /home/jenkins/root/workspace/regression-on-demand-multiplex/xlators/protocol/server/src/server-helpers.c:1428
18:24:55 1428 return itable->root;
18:24:55
18:24:55 Thread 1 (Thread 0x7f21d6eee700 (LWP 4721)):
18:24:55 #0 0x00007f222aea1f17 in server_inode_new (itable=0x0, gfid=0x7f2200c56f40 "") at /home/jenkins/root/workspace/regression-on-demand-multiplex/xlators/protocol/server/src/server-helpers.c:1428
18:24:55 No locals.
18:24:55 #1 0x00007f222ae9ab54 in resolve_gfid (frame=0x7f2200677538) at /home/jenkins/root/workspace/regression-on-demand-multiplex/xlators/protocol/server/src/server-resolve.c:205
18:24:55 state = 0x7f2200c56e40
18:24:55 this = 0x7f222c02f100
18:24:55 resolve = 0x7f2200c56ed8
18:24:55 resolve_loc = 0x7f2200c56f20
18:24:55 xdata = 0x0
18:24:55 __FUNCTION__ = "resolve_gfid"
18:24:55 #2 0x00007f222ae9b3e0 in server_resolve_entry (frame=0x7f2200677538) at /home/jenkins/root/workspace/regression-on-demand-multiplex/xlators/protocol/server/src/server-resolve.c:358
18:24:55 state = 0x7f2200c56e40
18:24:55 ret = 1
18:24:55 loc = 0x7f2200c56e58
18:24:55 #3 0x00007f222ae9b9ca in server_resolve (frame=0x7f2200677538) at /home/jenkins/root/workspace/regression-on-demand-multiplex/xlators/protocol/server/src/server-resolve.c:555
18:24:55 state = 0x7f2200c56e40
18:24:55 resolve = 0x7f2200c56ed8
18:24:55 __FUNCTION__ = "server_resolve"
18:24:55 #4 0x00007f222ae9bb86 in server_resolve_all (frame=0x7f2200677538) at /home/jenkins/root/workspace/regression-on-demand-multiplex/xlators/protocol/server/src/server-resolve.c:611
18:24:55 state = 0x7f2200c56e40
18:24:55 this = 0x7f222c02f100
18:24:55 __FUNCTION__ = "server_resolve_all"
18:24:55 #5 0x00007f222ae9bc9b in resolve_and_resume (frame=0x7f2200677538, fn=0x7f222aeee1cb <server4_lookup_resume>) at /home/jenkins/root/workspace/regression-on-demand-multiplex/xlators/protocol/server/src/server-resolve.c:642
18:24:55 state = 0x7f2200c56e40
18:24:55 #6 0x00007f222aef538f in server4_0_lookup (req=0x7f2220146bf8) at /home/jenkins/root/workspace/regression-on-demand-multiplex/xlators/protocol/server/src/server-rpc-fops_v2.c:5404
18:24:55 frame = 0x7f2200677538
18:24:55 state = 0x7f2200c56e40
18:24:55 args = {gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>, "\001", flags = 0, bname = 0x7f21d6eedc80 "tls", xdata = {xdr_size = 312, count = 8, pairs = {pairs_len = 8, pairs_val = 0x7f2200c56860}}}
18:24:55 ret = 0
18:24:55 __FUNCTION__ = "server4_0_lookup"
18:24:55 #7 0x00007f223f8899c9 in rpcsvc_request_handler (arg=0x7f222c045e50) at /home/jenkins/root/workspace/regression-on-demand-multiplex/rpc/rpc-lib/src/rpcsvc.c:2002
18:24:55 program = 0x7f222c045e50
18:24:55 req = 0x7f2220146bf8
18:24:55 actor = 0x7f222b110140 <glusterfs4_0_fop_actors+1728>
18:24:55 done = false
18:24:55 ret = 0
18:24:55 __FUNCTION__ = "rpcsvc_request_handler"
18:24:55 #8 0x00007f223e91ae25 in start_thread () from /lib64/libpthread.so.0
18:24:55 No symbol table info available.
18:24:55 #9 0x00007f223e1e3bad in clone () from /lib64/libc.so.6
18:24:55 No symbol table info available.
```
Summary of comment #4:
- Problem 1: Crash in glusterd due to a race during cleanup and exit.
- Problem 2: Crash in the brick process while winding a lookup, due to an uninitialized(?) brick during the first_lookup wind.

Problem 1 exists from before the 4.1 code base; Problem 2 looks to be new. @mohit/milind/du, please add more details as required; we may need the information to understand whether we have degraded, or are at similar failure points as before, in the release code base.
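Both Problem 2 and the Fix 1 cores point at a request being wound into a brick graph that is not (fully) initialized; in the Fix 1 core, server_inode_new() is entered with itable=0x0 and crashes on `return itable->root;`. Below is a minimal defensive sketch of that pattern, an illustration only: the types are stubbed, the `_guarded` name is hypothetical, and this is not the actual fix that was merged.

```c
/* Hypothetical sketch only -- NOT the actual gluster code or patch.
 * It mirrors the crashing Fix 1 frame: dereferencing itable->root
 * unconditionally segfaults when the brick's inode table is not
 * attached (itable == 0x0 in the core above). */
#include <stddef.h>
#include <string.h>

typedef struct inode inode_t;                          /* opaque stub */
typedef struct inode_table { inode_t *root; } inode_table_t;

/* The root gfid in gluster is 15 zero bytes followed by 0x01. */
static int
is_root_gfid(const unsigned char gfid[16])
{
    static const unsigned char root[16] = { [15] = 1 };
    return memcmp(gfid, root, 16) == 0;
}

static inode_t *
server_inode_new_guarded(inode_table_t *itable, const unsigned char gfid[16])
{
    /* Refuse to resolve against a brick whose inode table is missing
     * (e.g. still attaching or already detaching under brick mux)
     * instead of crashing; the caller must fail the fop cleanly. */
    if (itable == NULL)
        return NULL;
    if (is_root_gfid(gfid))
        return itable->root;
    return NULL; /* non-root path (inode_new(itable)) elided in this sketch */
}
```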
Does the full backtrace of the glusterd core show the crash in urcu at the cleanup code path? If yes, then this is known and isn't a blocker. Otherwise we need to look into it.
COMMIT: https://review.gluster.org/20058 committed in release-4.1 by "Shyamsundar Ranganathan" <srangana> with a commit message:

Revert "server: fix unresolved symbols by moving them to libglusterfs"
Updates: bz#1582286
This reverts commit 408a6d07ababde234ddeafe16687aacd2b810b42.
Change-Id: If8247d7980d698141f47130a3c532b942408ec2b

COMMIT: https://review.gluster.org/20059 committed in release-4.1 by "Shyamsundar Ranganathan" <srangana> with a commit message:

Revert "gluster: Sometimes Brick process is crashed at the time of stopping brick"
Updates: bz#1582286
This reverts commit 0043c63f70776444f69667a4ef9596217ecb42b7.
Change-Id: Iab3b4f4a54e122c589e515add93c6effc966b3e0

COMMIT: https://review.gluster.org/20060 committed in release-4.1 by "Shyamsundar Ranganathan" <srangana> with a commit message:

Revert "glusterfsd: Memleak in glusterfsd process while brick mux is on"
Updates: bz#1582286
This reverts commit 7c3cc485054e4ede1efb358552135b432fb7047a.
Change-Id: I831d646112bcfa13d0c2153482ad00ff1b23aa6c
Signed-off-by: Mohit Agrawal <moagrawa>
(In reply to Atin Mukherjee from comment #6)
> Does the full backtrace of the glusterd core show the crash in urcu at the
> cleanup code path? If yes, then this is known and isn't a blocker. Otherwise
> we need to look into it.

The stack is as in Problem 1, detailed above. The crash is in https://github.com/gluster/glusterfs/blob/release-4.1/libglusterfs/src/event-epoll.c#L407 (an assert failure). It does not seem to be a urcu-related crash.

Here is another thread from the same dump that shows cleanup/exit in progress (https://build.gluster.org/job/regression-on-demand-multiplex/47/console); the same can be seen in the dump at https://build.gluster.org/job/regression-on-demand-multiplex/45/console. For the record, this has not reappeared in the last 5 runs.

```
16:58:07 Thread 8 (Thread 0x7f9ead4e6700 (LWP 25850)):
16:58:07 #0 0x00007f9eb4df4a47 in sched_yield () from /lib64/libc.so.6
16:58:07 No symbol table info available.
16:58:07 #1 0x00007f9eb4d8ec09 in _IO_cleanup () from /lib64/libc.so.6
16:58:07 No symbol table info available.
16:58:07 #2 0x00007f9eb4d4ab8b in __run_exit_handlers () from /lib64/libc.so.6
16:58:07 No symbol table info available.
16:58:07 #3 0x00007f9eb4d4ac27 in exit () from /lib64/libc.so.6
16:58:07 No symbol table info available.
16:58:07 #4 0x0000000000408d61 in ?? ()
16:58:07 No symbol table info available.
16:58:07 #5 0x0000000000000001 in ?? ()
16:58:07 No symbol table info available.
16:58:07 #6 0x00000000000186c0 in ?? ()
16:58:07 No symbol table info available.
16:58:07 #7 0x00000000004146b0 in ?? ()
16:58:07 No symbol table info available.
16:58:07 #8 0x000000000000000f in ?? ()
16:58:07 No symbol table info available.
16:58:07 #9 0x0000000000000000 in ?? ()
16:58:07 No symbol table info available.
```
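To illustrate the failure mode behind this assert (a connection being unregistered while exit handlers concurrently tear state down), here is a much-simplified, hypothetical sketch; none of these symbols are the real libglusterfs ones, and the real slot bookkeeping is different:

```c
/* Hypothetical sketch -- it only shows how an event-slot lookup racing
 * with process cleanup can trip an assertion and abort with SIGABRT,
 * matching the idx=-1 seen in frame #4 of the glusterd core above. */
#include <assert.h>

#define SLOT_COUNT 1024

struct event_slot {
    int fd;     /* -1 when the slot is free */
    int in_use;
};

static struct event_slot slots[SLOT_COUNT];

/* Find the slot registered for an fd; returns -1 if none matches. */
static int
slot_lookup(int fd)
{
    for (int i = 0; i < SLOT_COUNT; i++)
        if (slots[i].in_use && slots[i].fd == fd)
            return i;
    return -1;
}

static void
event_unregister(int fd)
{
    int idx = slot_lookup(fd);
    /* If the exit/cleanup path has already torn down or recycled the
     * slot for this fd, idx is -1 here and the assert aborts -- the
     * same shape as event_unregister_epoll_common(fd=6, idx=-1). */
    assert(idx >= 0 && slots[idx].fd == fd);
    slots[idx].in_use = 0;
    slots[idx].fd = -1;
}
```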
Regarding the glusterd crash: the problem was in the socket code, and Milind has posted a patch upstream.

Regarding the brick crash in tier.t: I have updated the test case so that, after stopping the volume, it checks the volume status; with that change I am not getting any crash. I executed the brick-mux regression 5 times on https://review.gluster.org/20087 and did not get any crash. For 4.1 it is OK to go with the test case change, because we have reverted the cleanup patch.

Regards,
Mohit Agrawal
This no longer blocks 4.1, as the specific code bits have been reverted, keeping the state of these changes limited to what it was in 4.0.
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-v4.1.0, please open a new bug report. glusterfs-v4.1.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution. [1] http://lists.gluster.org/pipermail/announce/2018-June/000102.html [2] https://www.gluster.org/pipermail/gluster-users/