Bug 1582286 - Brick-mux regressions failing on master branch
Summary: Brick-mux regressions failing on master branch
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: unclassified
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On: 1582704 1583734
Blocks:
 
Reported: 2018-05-24 18:01 UTC by Shyamsundar
Modified: 2018-06-20 18:07 UTC
CC List: 5 users

Fixed In Version: glusterfs-v4.1.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1577672
Environment:
Last Closed: 2018-06-20 18:07:14 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Shyamsundar 2018-05-24 18:01:41 UTC
+++ This bug was initially created as a clone of Bug #1577672 +++

The failures have been identified as caused by the changes added for the proper cleanup sequence. These patches need to be reverted from the branch and the regression retested with mux enabled.

Comment 1 Worker Ant 2018-05-24 18:03:19 UTC
REVIEW: https://review.gluster.org/20059 (Revert "gluster: Sometimes Brick process is crashed at the time of stopping brick") posted (#3) for review on release-4.1 by Shyamsundar Ranganathan

Comment 2 Worker Ant 2018-05-24 18:04:18 UTC
REVIEW: https://review.gluster.org/20058 (Revert "server: fix unresolved symbols by moving them to libglusterfs") posted (#3) for review on release-4.1 by Shyamsundar Ranganathan

Comment 3 Worker Ant 2018-05-24 18:05:18 UTC
REVIEW: https://review.gluster.org/20060 (Revert "glusterfsd: Memleak in glusterfsd process while  brick mux is on") posted (#3) for review on release-4.1 by Shyamsundar Ranganathan

Comment 4 Shyamsundar 2018-05-24 18:07:47 UTC
Failures after reverting the above patches (runs of the mux regression on https://review.gluster.org/#/c/20060/3) show the following errors:

## Tests that failed:

### Cores:
./tests/bugs/bug-1371806.t (Fix 1)
./tests/basic/mgmt_v3-locks.t (Problem 1)
./tests/basic/tier/tier.t (Problem 1 & 2)
./tests/bugs/bug-1371806_3.t (Fix 1)

### Failed:
./tests/bugs/fuse/bug-1309462.t
./tests/bugs/glusterd/rebalance-operations-in-single-node.t
./tests/basic/volume-snapshot-xml.t

### Test runs and cores data:

#### Problem 1: glusterd crash on cleanup and exit path
===> https://build.gluster.org/job/regression-on-demand-multiplex/45/console <=== (4.1)
similar:
===> https://build.gluster.org/job/regression-on-demand-multiplex/47/console <=== (4.1)

```
16:58:06 1 test(s) failed 
16:58:06 ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
16:58:06 
16:58:06 1 test(s) generated core 
16:58:06 ./tests/basic/mgmt_v3-locks.t
16:58:06 
16:58:06 2 test(s) needed retry 
16:58:06 ./tests/basic/tier/tier.t
16:58:06 ./tests/bugs/glusterd/rebalance-operations-in-single-node.t

16:58:07 Core was generated by `glusterd --xlator-option management.working-directory=/d/backends/3/glusterd --'.
16:58:07 Program terminated with signal 6, Aborted.
16:58:07 #0  0x00007f9eb4d47277 in raise () from /lib64/libc.so.6

16:58:07 Thread 1 (Thread 0x7f9ea65cc700 (LWP 25880)):
16:58:07 #0  0x00007f9eb4d47277 in raise () from /lib64/libc.so.6
16:58:07 No symbol table info available.
16:58:07 #1  0x00007f9eb4d48968 in abort () from /lib64/libc.so.6
16:58:07 No symbol table info available.
16:58:07 #2  0x00007f9eb4d40096 in __assert_fail_base () from /lib64/libc.so.6
16:58:07 No symbol table info available.
16:58:07 #3  0x00007f9eb4d40142 in __assert_fail () from /lib64/libc.so.6
16:58:07 No symbol table info available.
16:58:07 #4  0x00007f9eb676e6a4 in event_unregister_epoll_common (event_pool=0x1666c30, fd=6, idx=-1, do_close=1) at /home/jenkins/root/workspace/regression-on-demand-multiplex/libglusterfs/src/event-epoll.c:409
16:58:07         ret = -1
16:58:07         slot = 0x16ac260
16:58:07         __FUNCTION__ = "event_unregister_epoll_common"
16:58:07         __PRETTY_FUNCTION__ = "event_unregister_epoll_common"
16:58:07 #5  0x00007f9eb676e843 in event_unregister_close_epoll (event_pool=0x1666c30, fd=6, idx_hint=-1) at /home/jenkins/root/workspace/regression-on-demand-multiplex/libglusterfs/src/event-epoll.c:453
16:58:07         ret = -1
16:58:07 #6  0x00007f9eb672cdcd in event_unregister_close (event_pool=0x1666c30, fd=6, idx=-1) at /home/jenkins/root/workspace/regression-on-demand-multiplex/libglusterfs/src/event.c:95
16:58:07         ret = -1
16:58:07         __FUNCTION__ = "event_unregister_close"
16:58:07 #7  0x00007f9ea837152b in __socket_reset (this=0x7f9e980180a0) at /home/jenkins/root/workspace/regression-on-demand-multiplex/rpc/rpc-transport/socket/src/socket.c:972
16:58:07         priv = 0x7f9e98018600
16:58:07         __FUNCTION__ = "__socket_reset"
16:58:07 #8  0x00007f9ea837d034 in fini (this=0x7f9e980180a0) at /home/jenkins/root/workspace/regression-on-demand-multiplex/rpc/rpc-transport/socket/src/socket.c:4710
16:58:07         priv = 0x7f9e98018600
16:58:07         __FUNCTION__ = "fini"
16:58:07 #9  0x00007f9eb64b9479 in rpc_transport_destroy (this=0x7f9e980180a0) at /home/jenkins/root/workspace/regression-on-demand-multiplex/rpc/rpc-lib/src/rpc-transport.c:470
16:58:07         ret = -1
16:58:07         __FUNCTION__ = "rpc_transport_destroy"
16:58:07 #10 0x00007f9eb64b968b in rpc_transport_unref (this=0x7f9e980180a0) at /home/jenkins/root/workspace/regression-on-demand-multiplex/rpc/rpc-lib/src/rpc-transport.c:520
16:58:07         refcount = 0
16:58:07         ret = -1
16:58:07         __FUNCTION__ = "rpc_transport_unref"
16:58:07 #11 0x00007f9ea83789b7 in socket_server_event_handler (fd=10, idx=0, gen=1, data=0x168bd60, poll_in=1, poll_out=0, poll_err=0) at /home/jenkins/root/workspace/regression-on-demand-multiplex/rpc/rpc-transport/socket/src/socket.c:3117
16:58:07         this = 0x168bd60
16:58:07         priv = 0x16abe10
16:58:07         ret = -1
16:58:07         new_sock = 6
16:58:07         new_trans = 0x7f9e980180a0
16:58:07         new_sockaddr = {ss_family = 2, __ss_padding = "\277\345\177\000\000\001", '\000' <repeats 111 times>, __ss_align = 0}
16:58:07         addrlen = 16
16:58:07         new_priv = 0x7f9e98018600
16:58:07         ctx = 0x162f010
16:58:07         cname = 0x0
16:58:07         __FUNCTION__ = "socket_server_event_handler"
16:58:07 #12 0x00007f9eb676ed0c in event_dispatch_epoll_handler (event_pool=0x1666c30, event=0x7f9ea65cbea0) at /home/jenkins/root/workspace/regression-on-demand-multiplex/libglusterfs/src/event-epoll.c:579
16:58:07         ev_data = 0x7f9ea65cbea4
16:58:07         slot = 0x16ac2c0
16:58:07         handler = 0x7f9ea8377f2f <socket_server_event_handler>
16:58:07         data = 0x168bd60
16:58:07         idx = 0
16:58:07         gen = 1
16:58:07         ret = -1
16:58:07         fd = 10
16:58:07         handled_error_previously = false
16:58:07         __FUNCTION__ = "event_dispatch_epoll_handler"
16:58:07 #13 0x00007f9eb676efff in event_dispatch_epoll_worker (data=0x16c8910) at /home/jenkins/root/workspace/regression-on-demand-multiplex/libglusterfs/src/event-epoll.c:655
16:58:07         event = {events = 1, data = {ptr = 0x100000000, fd = 0, u32 = 0, u64 = 4294967296}}
16:58:07         ret = 1
16:58:07         ev_data = 0x16c8910
16:58:07         event_pool = 0x1666c30
16:58:07         myindex = 1
16:58:07         timetodie = 0
16:58:07         __FUNCTION__ = "event_dispatch_epoll_worker"
16:58:07 #14 0x00007f9eb5546e25 in start_thread () from /lib64/libpthread.so.0
16:58:07 No symbol table info available.
16:58:07 #15 0x00007f9eb4e0fbad in clone () from /lib64/libc.so.6
16:58:07 No symbol table info available.
```
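
The abort in frame #4 is an assert failure in event_unregister_epoll_common() when it is handed an index of -1 (comment #10 below points at event-epoll.c line 407). One plausible way such an assert fires during exit is sketched below; this is a minimal, self-contained C sketch, not the GlusterFS source, and the "destroying" flag and slot_lookup() helper are assumptions used only for illustration.

```
#include <assert.h>
#include <stdio.h>

struct event_pool {
    int destroying;   /* assumed flag: set once exit handlers start tearing the pool down */
};

/* Hypothetical stand-in for the slot lookup: once the pool is being
 * torn down nothing is registered any more, so the lookup returns -1. */
static int slot_lookup(struct event_pool *pool, int fd, int idx_hint)
{
    if (pool->destroying)
        return -1;
    return (idx_hint >= 0) ? idx_hint : fd;
}

static int event_unregister_common(struct event_pool *pool, int fd, int idx_hint)
{
    int idx = slot_lookup(pool, fd, idx_hint);

    /* An assert on a valid slot index is what turns this cleanup-time
     * race into an abort (signal 6), as in frame #4 of the backtrace. */
    assert(idx >= 0);

    printf("unregistered fd=%d at idx=%d\n", fd, idx);
    return 0;
}

int main(void)
{
    struct event_pool pool = { .destroying = 1 };   /* exit handlers already running */

    /* fini() -> __socket_reset() -> event_unregister_close(pool, fd=6, idx=-1) */
    event_unregister_common(&pool, 6, -1);          /* aborts via the assert above */
    return 0;
}
```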

#### Problem 2: No clear root cause
===> https://build.gluster.org/job/regression-on-demand-multiplex/50/console <=== (4.1)
```
07:57:32 Core was generated by `/build/install/sbin/glusterfsd -s builder100.cloud.gluster.org --volfile-id pat'.
07:57:32 Program terminated with signal 11, Segmentation fault.
07:57:32 #0  0x00007f7d6238f85e in default_lookup (frame=0x7f7d500c8f28, this=0x7f7d5002f430, loc=0x7f7d555dd8d0, xdata=0x0) at defaults.c:2714
07:57:32 2714		STACK_WIND_TAIL (frame,
07:57:32 

07:57:32 Thread 1 (Thread 0x7f7d555de700 (LWP 28094)):
07:57:32 #0  0x00007f7d6238f85e in default_lookup (frame=0x7f7d500c8f28, this=0x7f7d5002f430, loc=0x7f7d555dd8d0, xdata=0x0) at defaults.c:2714
07:57:32         old_THIS = 0x0
07:57:32         next_xl = 0x7f7d500c8fa0
07:57:32         next_xl_fn = 0x7f7d500008c0
07:57:32         opn = 1
07:57:32         __FUNCTION__ = "default_lookup"
07:57:32 #1  0x00007f7d6230c399 in syncop_lookup (subvol=0x7f7d5002f430, loc=0x7f7d555dd8d0, iatt=0x7f7d555dd830, parent=0x0, xdata_in=0x0, xdata_out=0x0) at /home/jenkins/root/workspace/regression-on-demand-multiplex/libglusterfs/src/syncop.c:1260
07:57:32         _new = 0x7f7d500c8f28
07:57:32         old_THIS = 0x7f7d5002f430
07:57:32         next_xl_fn = 0x7f7d6238f836 <default_lookup>
07:57:32         tmp_cbk = 0x7f7d6230bddd <syncop_lookup_cbk>
07:57:32         task = 0x0
07:57:32         frame = 0x7f7d500c8e18
07:57:32         args = {op_ret = 0, op_errno = 0, iatt1 = {ia_flags = 0, ia_ino = 0, ia_dev = 0, ia_rdev = 0, ia_size = 0, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_mtime = 0, ia_ctime = 0, ia_btime = 0, ia_atime_nsec = 0, ia_mtime_nsec = 0, ia_ctime_nsec = 0, ia_btime_nsec = 0, ia_attributes = 0, ia_attributes_mask = 0, ia_gfid = '\000' <repeats 15 times>, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}}, iatt2 = {ia_flags = 0, ia_ino = 0, ia_dev = 0, ia_rdev = 0, ia_size = 0, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_mtime = 0, ia_ctime = 0, ia_btime = 0, ia_atime_nsec = 0, ia_mtime_nsec = 0, ia_ctime_nsec = 0, ia_btime_nsec = 0, ia_attributes = 0, ia_attributes_mask = 0, ia_gfid = '\000' <repeats 15 times>, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}}, xattr = 0x0, statvfs_buf = {f_bsize = 0, f_frsize = 0, f_blocks = 0, f_bfree = 0, f_bavail = 0, f_files = 0, f_ffree = 0, f_favail = 0, f_fsid = 0, f_flag = 0, f_namemax = 0, __f_spare = {0, 0, 0, 0, 0, 0}}, vector = 0x0, count = 0, iobref = 0x0, buffer = 0x0, xdata = 0x0, flock = {l_type = 0, l_whence = 0, l_start = 0, l_len = 0, l_pid = 0, l_owner = {len = 0, data = '\000' <repeats 1023 times>}}, lease = {cmd = 0, lease_type = NONE, lease_id = '\000' <repeats 15 times>, lease_flags = 0}, dict_out = 0x0, uuid = '\000' <repeats 15 times>, errstr = 0x0, dict = 0x0, lock_dict = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, barrier = {initialized = false, guard = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}, waitq = {next = 0x0, prev = 0x0}, count = 0, waitfor = 0}, task = 0x0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}, done = 0, entries = {{list = {next = 0x0, prev = 0x0}, {next = 0x0, prev = 0x0}}, d_ino = 0, d_off = 0, d_len = 0, d_type = 0, d_stat = {ia_flags = 0, ia_ino = 0, ia_dev = 0, ia_rdev = 0, ia_size = 0, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_mtime = 0, ia_ctime = 0, ia_btime = 0, ia_atime_nsec = 0, ia_mtime_nsec = 0, ia_ctime_nsec = 0, ia_btime_nsec = 0, ia_attributes = 0, ia_attributes_mask = 0, ia_gfid = '\000' <repeats 15 times>, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = 
{read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}}, dict = 0x0, inode = 0x0, d_name = 0x7f7d555dd368 ""}, offset = 0, locklist = {list = {next = 0x0, prev = 0x0}, flock = {l_type = 0, l_whence = 0, l_start = 0, l_len = 0, l_pid = 0, l_owner = {len = 0, data = '\000' <repeats 1023 times>}}, client_uid = 0x0, lk_flags = 0}}
07:57:32         __FUNCTION__ = "syncop_lookup"
07:57:32 #2  0x00007f7d4d6e3105 in server_first_lookup (this=0x7f7d5002f430, client=0x7f7d50093ee0, reply=0x7f7d5008d288) at /home/jenkins/root/workspace/regression-on-demand-multiplex/xlators/protocol/server/src/server-handshake.c:382
07:57:32         loc = {path = 0x7f7d4d70a425 "/", name = 0x7f7d4d70a52a "", inode = 0x7f7d50093b98, parent = 0x0, gfid = '\000' <repeats 15 times>, "\001", pargfid = '\000' <repeats 15 times>}
07:57:32         iatt = {ia_flags = 0, ia_ino = 0, ia_dev = 0, ia_rdev = 0, ia_size = 0, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_mtime = 0, ia_ctime = 0, ia_btime = 0, ia_atime_nsec = 0, ia_mtime_nsec = 0, ia_ctime_nsec = 0, ia_btime_nsec = 0, ia_attributes = 0, ia_attributes_mask = 0, ia_gfid = '\000' <repeats 15 times>, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}}
07:57:32         dict = 0x0
07:57:32         ret = 0
07:57:32         xl = 0x7f7d5002f430
07:57:32         msg = 0x0
07:57:32         inode = 0x0
07:57:32         bname = 0x0
07:57:32         str = 0x0
07:57:32         tmp = 0x0
07:57:32         saveptr = 0x0
07:57:32         __FUNCTION__ = "server_first_lookup"
07:57:32 #3  0x00007f7d4d6e4ad2 in server_setvolume (req=0x7f7d50002668) at /home/jenkins/root/workspace/regression-on-demand-multiplex/xlators/protocol/server/src/server-handshake.c:863
07:57:32         args = {dict = {dict_len = 831, dict_val = 0x7f7d50003650 ""}}
07:57:32         rsp = 0x0
07:57:32         client = 0x7f7d50093ee0
07:57:32         serv_ctx = 0x7f7d50094260
07:57:32         conf = 0x7f7d50036c40
07:57:32         peerinfo = 0x7f7d5008c510
07:57:32         reply = 0x7f7d5008d288
07:57:32         config_params = 0x7f7d5008d0e8
07:57:32         params = 0x7f7d5008dca8
07:57:32         name = 0x7f7d500a9df0 "/d/backends/patchy0"
07:57:32         client_uid = 0x7f7d500a9560 "CTX_ID:417d7b05-569a-40b1-b036-30e9ae7365a9-GRAPH_ID:0-PID:29450-HOST:builder100.cloud.gluster.org-PC_NAME:patchy-client-0-RECON_NO:-0"
07:57:32         clnt_version = 0x7f7d500a9160 "4.1.0alpha"
07:57:32         xl = 0x7f7d5002f430
07:57:32         msg = 0x0
07:57:32         volfile_key = 0x7f7d500a8f50 "rebalance/patchy"
07:57:32         this = 0x7f7d5002f430
07:57:32         checksum = 0
07:57:32         ret = 0
07:57:32         op_ret = 0
07:57:32         op_errno = 22
07:57:32         buf = 0x0
07:57:32         opversion = 40100
07:57:32         xprt = 0x7f7d500367d0
07:57:32         fop_version = 1298437
07:57:32         mgmt_version = 0
07:57:32         ctx = 0x8e6010
07:57:32         tmp = 0x7f7d50032590
07:57:32         subdir_mount = 0x0
07:57:32         client_name = 0x7f7d500a9360 "rebalance"
07:57:32         __FUNCTION__ = "server_setvolume"
07:57:32         __PRETTY_FUNCTION__ = "server_setvolume"
07:57:32 #4  0x00007f7d62078842 in rpcsvc_handle_rpc_call (svc=0x7f7d50044170, trans=0x7f7d5008c450, msg=0x7f7d500cca10) at /home/jenkins/root/workspace/regression-on-demand-multiplex/rpc/rpc-lib/src/rpcsvc.c:721
07:57:32         actor = 0x7f7d4d91d8c0 <gluster_handshake_actors+64>
07:57:32         actor_fn = 0x7f7d4d6e33f9 <server_setvolume>
07:57:32         req = 0x7f7d50002668
07:57:32         ret = -1
07:57:32         port = 49128
07:57:32         is_unix = false
07:57:32         empty = false
07:57:32         unprivileged = true
07:57:32         reply = 0x0
07:57:32         drc = 0x0
07:57:32         __FUNCTION__ = "rpcsvc_handle_rpc_call"
07:57:32 #5  0x00007f7d62078b95 in rpcsvc_notify (trans=0x7f7d5008c450, mydata=0x7f7d50044170, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f7d500cca10) at /home/jenkins/root/workspace/regression-on-demand-multiplex/rpc/rpc-lib/src/rpcsvc.c:815
07:57:32         ret = -1
07:57:32         msg = 0x7f7d500cca10
07:57:32         new_trans = 0x0
07:57:32         svc = 0x7f7d50044170
07:57:32         listener = 0x0
07:57:32         __FUNCTION__ = "rpcsvc_notify"
07:57:32 #6  0x00007f7d6207e7ab in rpc_transport_notify (this=0x7f7d5008c450, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f7d500cca10) at /home/jenkins/root/workspace/regression-on-demand-multiplex/rpc/rpc-lib/src/rpc-transport.c:537
07:57:32         ret = -1
07:57:32         __FUNCTION__ = "rpc_transport_notify"
07:57:32 #7  0x00007f7d56e8fed8 in socket_event_poll_in (this=0x7f7d5008c450, notify_handled=true) at /home/jenkins/root/workspace/regression-on-demand-multiplex/rpc/rpc-transport/socket/src/socket.c:2462
07:57:32         ret = 0
07:57:32         pollin = 0x7f7d500cca10
07:57:32         priv = 0x7f7d50002060
07:57:32         ctx = 0x8e6010
07:57:32 #8  0x00007f7d56e90546 in socket_event_handler (fd=9, idx=4, gen=4, data=0x7f7d5008c450, poll_in=1, poll_out=0, poll_err=0) at /home/jenkins/root/workspace/regression-on-demand-multiplex/rpc/rpc-transport/socket/src/socket.c:2618
07:57:32         this = 0x7f7d5008c450
07:57:32         priv = 0x7f7d50002060
07:57:32         ret = 0
07:57:32         ctx = 0x8e6010
07:57:32         socket_closed = false
07:57:32         notify_handled = false
07:57:32         __FUNCTION__ = "socket_event_handler"
07:57:32 #9  0x00007f7d62333d0c in event_dispatch_epoll_handler (event_pool=0x91dc30, event=0x7f7d555ddea0) at /home/jenkins/root/workspace/regression-on-demand-multiplex/libglusterfs/src/event-epoll.c:579
07:57:32         ev_data = 0x7f7d555ddea4
07:57:32         slot = 0x94efe0
07:57:32         handler = 0x7f7d56e90278 <socket_event_handler>
07:57:32         data = 0x7f7d5008c450
07:57:32         idx = 4
07:57:32         gen = 4
07:57:32         ret = -1
07:57:32         fd = 9
07:57:32         handled_error_previously = false
07:57:32         __FUNCTION__ = "event_dispatch_epoll_handler"
07:57:32 #10 0x00007f7d62333fff in event_dispatch_epoll_worker (data=0x967f00) at /home/jenkins/root/workspace/regression-on-demand-multiplex/libglusterfs/src/event-epoll.c:655
07:57:32         event = {events = 1, data = {ptr = 0x400000004, fd = 4, u32 = 4, u64 = 17179869188}}
07:57:32         ret = 1
07:57:32         ev_data = 0x967f00
07:57:32         event_pool = 0x91dc30
07:57:32         myindex = 1
07:57:32         timetodie = 0
07:57:32         __FUNCTION__ = "event_dispatch_epoll_worker"
07:57:32 #11 0x00007f7d6110be25 in start_thread () from /lib64/libpthread.so.0
07:57:32 No symbol table info available.
07:57:32 #12 0x00007f7d609d4bad in clone () from /lib64/libc.so.6
07:57:32 No symbol table info available.
```
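
Comment #5 below hypothesizes that this is a lookup being wound against a brick that is not fully initialized during the first_lookup wind. Purely as a hedged illustration of that hypothesis (this is not the GlusterFS source; the structures and names below are hypothetical), the pattern amounts to winding the first lookup through a translator graph whose links are not set up yet:

```
#include <stdio.h>
#include <stdlib.h>

typedef struct xlator xlator_t;
typedef int (*lookup_fn)(xlator_t *this, const char *path);

/* Hypothetical, heavily simplified translator: the child pointer is only
 * valid once graph initialization has completed. */
struct xlator {
    const char *name;
    xlator_t   *first_child;
    lookup_fn   lookup;
};

static int default_lookup(xlator_t *this, const char *path)
{
    xlator_t *next = this->first_child;  /* not yet set while the brick is initializing */

    /* Rough analogue of STACK_WIND_TAIL: call directly into the next xlator.
     * If the graph is not fully initialized this dereferences bad state. */
    return next->lookup(next, path);
}

int main(void)
{
    /* A brick graph caught mid-initialization: first_child is still NULL. */
    xlator_t server = { "server", NULL, default_lookup };

    /* server_setvolume() -> server_first_lookup() -> syncop_lookup() -> wind */
    default_lookup(&server, "/");        /* segfaults, as in the trace above */
    return 0;
}
```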

#### Fix 1: https://review.gluster.org/#/c/20060/1..2/xlators/protocol/server/src/server-rpc-fops.c
===> https://build.gluster.org/job/regression-on-demand-multiplex/51/console <=== (4.0)
Similar:
===> https://build.gluster.org/job/regression-on-demand-multiplex/53/console <=== (4.0)
===> https://build.gluster.org/job/regression-on-demand-multiplex/52/console <=== (4.0)
===> https://build.gluster.org/job/regression-on-demand-multiplex/46/console <=== (4.1)
===> https://build.gluster.org/job/regression-on-demand-multiplex/47/console <=== (4.1)
===> https://build.gluster.org/job/regression-on-demand-multiplex/49/console <=== (4.1)

```
18:24:55 Core was generated by `/build/install/sbin/glusterfsd -s builder102.cloud.gluster.org --volfile-id pat'.
18:24:55 Program terminated with signal 11, Segmentation fault.
18:24:55 #0  0x00007f222aea1f17 in server_inode_new (itable=0x0, gfid=0x7f2200c56f40 "") at /home/jenkins/root/workspace/regression-on-demand-multiplex/xlators/protocol/server/src/server-helpers.c:1428
18:24:55 1428	                return itable->root;
18:24:55 

18:24:55 Thread 1 (Thread 0x7f21d6eee700 (LWP 4721)):
18:24:55 #0  0x00007f222aea1f17 in server_inode_new (itable=0x0, gfid=0x7f2200c56f40 "") at /home/jenkins/root/workspace/regression-on-demand-multiplex/xlators/protocol/server/src/server-helpers.c:1428
18:24:55 No locals.
18:24:55 #1  0x00007f222ae9ab54 in resolve_gfid (frame=0x7f2200677538) at /home/jenkins/root/workspace/regression-on-demand-multiplex/xlators/protocol/server/src/server-resolve.c:205
18:24:55         state = 0x7f2200c56e40
18:24:55         this = 0x7f222c02f100
18:24:55         resolve = 0x7f2200c56ed8
18:24:55         resolve_loc = 0x7f2200c56f20
18:24:55         xdata = 0x0
18:24:55         __FUNCTION__ = "resolve_gfid"
18:24:55 #2  0x00007f222ae9b3e0 in server_resolve_entry (frame=0x7f2200677538) at /home/jenkins/root/workspace/regression-on-demand-multiplex/xlators/protocol/server/src/server-resolve.c:358
18:24:55         state = 0x7f2200c56e40
18:24:55         ret = 1
18:24:55         loc = 0x7f2200c56e58
18:24:55 #3  0x00007f222ae9b9ca in server_resolve (frame=0x7f2200677538) at /home/jenkins/root/workspace/regression-on-demand-multiplex/xlators/protocol/server/src/server-resolve.c:555
18:24:55         state = 0x7f2200c56e40
18:24:55         resolve = 0x7f2200c56ed8
18:24:55         __FUNCTION__ = "server_resolve"
18:24:55 #4  0x00007f222ae9bb86 in server_resolve_all (frame=0x7f2200677538) at /home/jenkins/root/workspace/regression-on-demand-multiplex/xlators/protocol/server/src/server-resolve.c:611
18:24:55         state = 0x7f2200c56e40
18:24:55         this = 0x7f222c02f100
18:24:55         __FUNCTION__ = "server_resolve_all"
18:24:55 #5  0x00007f222ae9bc9b in resolve_and_resume (frame=0x7f2200677538, fn=0x7f222aeee1cb <server4_lookup_resume>) at /home/jenkins/root/workspace/regression-on-demand-multiplex/xlators/protocol/server/src/server-resolve.c:642
18:24:55         state = 0x7f2200c56e40
18:24:55 #6  0x00007f222aef538f in server4_0_lookup (req=0x7f2220146bf8) at /home/jenkins/root/workspace/regression-on-demand-multiplex/xlators/protocol/server/src/server-rpc-fops_v2.c:5404
18:24:55         frame = 0x7f2200677538
18:24:55         state = 0x7f2200c56e40
18:24:55         args = {gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>, "\001", flags = 0, bname = 0x7f21d6eedc80 "tls", xdata = {xdr_size = 312, count = 8, pairs = {pairs_len = 8, pairs_val = 0x7f2200c56860}}}
18:24:55         ret = 0
18:24:55         __FUNCTION__ = "server4_0_lookup"
18:24:55 #7  0x00007f223f8899c9 in rpcsvc_request_handler (arg=0x7f222c045e50) at /home/jenkins/root/workspace/regression-on-demand-multiplex/rpc/rpc-lib/src/rpcsvc.c:2002
18:24:55         program = 0x7f222c045e50
18:24:55         req = 0x7f2220146bf8
18:24:55         actor = 0x7f222b110140 <glusterfs4_0_fop_actors+1728>
18:24:55         done = false
18:24:55         ret = 0
18:24:55         __FUNCTION__ = "rpcsvc_request_handler"
18:24:55 #8  0x00007f223e91ae25 in start_thread () from /lib64/libpthread.so.0
18:24:55 No symbol table info available.
18:24:55 #9  0x00007f223e1e3bad in clone () from /lib64/libc.so.6
18:24:55 No symbol table info available.
```
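
Here the crash is a plain NULL-pointer dereference: server_inode_new() is entered with itable == 0x0 (frame #0) and crashes on "return itable->root;". The actual fix is whatever the linked review diff contains; the snippet below is only a hedged C illustration of guarding such a resolve path, with all structure and helper names hypothetical.

```
#include <stdio.h>
#include <stdlib.h>

/* Minimal stand-ins for the structures involved; names are hypothetical. */
typedef struct inode       { int unused; }    inode_t;
typedef struct inode_table { inode_t *root; } inode_table_t;

/* The crashing frame boils down to "return itable->root" with itable == NULL.
 * This guarded variant is an illustration only (not the actual patch in the
 * linked review): fail the resolve instead of dereferencing a NULL table,
 * which can happen when the brick is not (or no longer) attached under
 * brick multiplexing. */
static inode_t *inode_new_guarded(inode_table_t *itable)
{
    if (itable == NULL) {
        fprintf(stderr, "no inode table bound, failing the resolve\n");
        return NULL;
    }
    return itable->root;
}

int main(void)
{
    inode_t *in = inode_new_guarded(NULL);  /* returns NULL instead of SIGSEGV */
    printf("guarded resolve returned %p\n", (void *)in);
    return 0;
}
```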

Comment 5 Shyamsundar 2018-05-24 18:35:41 UTC
Summary of comment #4:

Problem 1: Crash in glusterd due to race during cleanup and exit

Problem 2: Crash in the brick process while winding a lookup, due to an uninitialized(?) brick during the first_lookup wind

Problem 1 exists from before the 4.1 code base; problem 2 looks to be new.

@mohit/milind/du, please add more details as required; we may need the information to understand whether we have degraded or are at similar failure points as before in the release code base.

Comment 6 Atin Mukherjee 2018-05-25 01:49:47 UTC
Does the full backtrace of the glusterd core show the crash in urcu at the cleanup code path? If yes, then this is known and isn't a blocker. Otherwise we need to look into it.

Comment 7 Worker Ant 2018-05-25 02:06:05 UTC
COMMIT: https://review.gluster.org/20058 committed in release-4.1 by "Shyamsundar Ranganathan" <srangana> with a commit message- Revert "server: fix unresolved symbols by moving them to libglusterfs"

Updates: bz#1582286
This reverts commit 408a6d07ababde234ddeafe16687aacd2b810b42.
Change-Id: If8247d7980d698141f47130a3c532b942408ec2b

Comment 8 Worker Ant 2018-05-25 02:06:26 UTC
COMMIT: https://review.gluster.org/20059 committed in release-4.1 by "Shyamsundar Ranganathan" <srangana> with a commit message- Revert "gluster: Sometimes Brick process is crashed at the time of stopping brick"

Updates: bz#1582286
This reverts commit 0043c63f70776444f69667a4ef9596217ecb42b7.
Change-Id: Iab3b4f4a54e122c589e515add93c6effc966b3e0

Comment 9 Worker Ant 2018-05-25 02:06:46 UTC
COMMIT: https://review.gluster.org/20060 committed in release-4.1 by "Shyamsundar Ranganathan" <srangana> with a commit message- Revert "glusterfsd: Memleak in glusterfsd process while  brick mux is on"

Updates: bz#1582286
This reverts commit 7c3cc485054e4ede1efb358552135b432fb7047a.
Change-Id: I831d646112bcfa13d0c2153482ad00ff1b23aa6c
Signed-off-by: Mohit Agrawal <moagrawa>

Comment 10 Shyamsundar 2018-05-25 14:52:33 UTC
(In reply to Atin Mukherjee from comment #6)
> Does the full backtrace of the glusterd core show the crash in urcu at the
> cleanup code path? If yes then this is known and isn’t a blocker. Othereise
> we need to look into it.

The stack is as in Problem 1, detailed above.

The crash is in https://github.com/gluster/glusterfs/blob/release-4.1/libglusterfs/src/event-epoll.c#L407 (an assert failure).

It does not seem to be a urcu-related crash.

Here is another thread from the same dump (https://build.gluster.org/job/regression-on-demand-multiplex/47/console) that shows cleanup/exit is in progress; the same can be seen in the dump at https://build.gluster.org/job/regression-on-demand-multiplex/45/console

For the record, in the last 5 runs this has not reappeared.

```
16:58:07 Thread 8 (Thread 0x7f9ead4e6700 (LWP 25850)):
16:58:07 #0  0x00007f9eb4df4a47 in sched_yield () from /lib64/libc.so.6
16:58:07 No symbol table info available.
16:58:07 #1  0x00007f9eb4d8ec09 in _IO_cleanup () from /lib64/libc.so.6
16:58:07 No symbol table info available.
16:58:07 #2  0x00007f9eb4d4ab8b in __run_exit_handlers () from /lib64/libc.so.6
16:58:07 No symbol table info available.
16:58:07 #3  0x00007f9eb4d4ac27 in exit () from /lib64/libc.so.6
16:58:07 No symbol table info available.
16:58:07 #4  0x0000000000408d61 in ?? ()
16:58:07 No symbol table info available.
16:58:07 #5  0x0000000000000001 in ?? ()
16:58:07 No symbol table info available.
16:58:07 #6  0x00000000000186c0 in ?? ()
16:58:07 No symbol table info available.
16:58:07 #7  0x00000000004146b0 in ?? ()
16:58:07 No symbol table info available.
16:58:07 #8  0x000000000000000f in ?? ()
16:58:07 No symbol table info available.
16:58:07 #9  0x0000000000000000 in ?? ()
16:58:07 No symbol table info available.
```

Comment 11 Mohit Agrawal 2018-05-28 02:32:36 UTC
Regarding the glusterd crash, the problem was in the socket code.
Milind has posted a patch upstream.

Regarding the brick crash in tier.t:
I have updated the test case so that, after stopping the volume, it checks the volume status; with that change I am no longer getting any crash.
I executed the brick-mux regression 5 times on https://review.gluster.org/20087 and did not get any crash.
For 4.1 it is OK to go with the test case change, because we have reverted the cleanup patch.

Regards
Mohit Agrawal

Comment 12 Shyamsundar 2018-06-12 14:50:19 UTC
No longer blocks 4.1, as the specific code changes have been reverted to keep the state of further changes limited to what it was in 4.0.

Comment 13 Shyamsundar 2018-06-20 18:07:14 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-v4.1.0, please open a new bug report.

glusterfs-v4.1.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-June/000102.html
[2] https://www.gluster.org/pipermail/gluster-users/

