Here is the verbatim posting of a mail from Nithya related to this issue:

<mail>
Hi,

I am not sure how correct this is but I tried some stuff with objdump and
the stacktrace in the BZ. Here is what I have:

/lib64/libc.so.6(+0x2e1f2)[0x7fac3d50e1f2]
/lib64/libglusterfs.so.0(+0x8a072)[0x7fac3ee8c072]
/usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so(+0x3814)[0x7fac39fbc814]
/usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so(+0xba18)[0x7fac39fc4a18]

Messing around with objdump and debuginfo I have come up with:

/lib64/libc.so.6 (best guess - assert)

/lib64/libglusterfs.so.0 (+0x8a072 = event_unregister_epoll_common + offset)

static int
event_unregister_epoll_common (struct event_pool *event_pool, int fd,
                               int idx, int do_close)
{
...
        slot = event_slot_get (event_pool, idx);

        assert (slot->fd == fd);

   8a053:       48 8d 0d 86 09 02 00    lea    0x20986(%rip),%rcx        # aa9e0 <_fini@@Base+0xbd9c>
   8a05a:       48 8d 35 d3 06 02 00    lea    0x206d3(%rip),%rsi        # aa734 <_fini@@Base+0xbaf0>
   8a061:       48 8d 3d f2 06 02 00    lea    0x206f2(%rip),%rdi        # aa75a <_fini@@Base+0xbb16>
   8a068:       ba 9d 01 00 00          mov    $0x19d,%edx
   8a06d:       e8 ee 0e f9 ff          callq  1af60 <__assert_fail@plt>
   8a072:       0f 1f 40 00             nopl   0x0(%rax)
   8a076:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
   8a07d:       00 00 00

/usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so (+0x3814 = __socket_reset + offset)

static void
__socket_reset (rpc_transport_t *this)
...
        memset (&priv->incoming, 0, sizeof (priv->incoming));

        event_unregister_close (this->ctx->event_pool, priv->sock, priv->idx);

    37ff:       48 8b 45 60             mov    0x60(%rbp),%rax
    3803:       8b 53 04                mov    0x4(%rbx),%edx
    3806:       8b 33                   mov    (%rbx),%esi
    3808:       48 8b b8 b0 01 00 00    mov    0x1b0(%rax),%rdi
    380f:       e8 3c f7 ff ff          callq  2f50 <event_unregister_close@plt>

        priv->sock = -1;
    3814:       c7 03 ff ff ff ff       movl   $0xffffffff,(%rbx)

/usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so (+0xba18 = socket_event_handler + offset)

static int
socket_event_handler (int fd, int idx, void *data,
                      int poll_in, int poll_out, int poll_err)
...
                __socket_ioq_flush (this);
    ba08:       48 89 df                mov    %rbx,%rdi
    ba0b:       e8 b0 a2 ff ff          callq  5cc0 <socket_connect_error_cbk@@Base+0x2980>

                __socket_reset (this);
    ba10:       48 89 df                mov    %rbx,%rdi
    ba13:       e8 48 7d ff ff          callq  3760 <socket_connect_error_cbk@@Base+0x420>
        }
        pthread_mutex_unlock (&priv->lock);

In event_unregister_epoll_common, the cause of the crash seems to be a call
to assert:

        slot = event_slot_get (event_pool, idx);

        assert (slot->fd == fd);    <---- here!!!

So the stack would be

 0 assert
 1 event_unregister_epoll_common
 2 event_unregister_close_epoll
 3 __socket_reset
 4 socket_event_handler

Many thanks to Shyam for helping me with objdump. Hope this helps.

Regards,
Nithya
</mail>
Failure of the assertion (slot->fd == fd) means that the slot is now allocated to a different fd. Slot deallocation happens only after receiving a POLLERR event, which means we got at least one other parallel POLLERR event on the socket. However, we register the socket with EPOLLONESHOT, so it has to be explicitly added back through epoll_ctl to receive further events. Normally we do this only after the handler completes processing of the current event. But event_select_on_epoll is one asynchronous codepath where the socket can be added back for polling while an event on the same socket is still being processed. event_select_on_epoll does check whether an event is being processed, in the form of slot->in_handler, but this check is not sufficient to prevent parallel events because slot->in_handler is not incremented atomically with respect to reception of the event. This means the following hypothetical sequence of events can happen:

* epoll_wait returns with a POLLIN - say POLLIN1 - on a socket (sock1) associated with slot s1.
* event_select_on, called from __socket_ioq_churn in the request/reply/msg submission codepath (as opposed to __socket_ioq_churn called as part of POLLOUT handling - we cannot receive a POLLOUT due to EPOLLONESHOT), adds sock1 back for polling.
* Since sock1 was added back for polling in step 2 and our polling is level-triggered, another thread picks up a second POLLIN - say POLLIN2.

Similarly we can argue for more than one POLLERR event. However, every event handler gets a reference on the slot and the slot is deallocated only when the refcount goes to 0, and I cannot think of a way this can happen even with parallel POLLERR events. So, as of now the RCA is unknown. While making event_unregister_epoll_common return without doing anything when (slot->fd != fd) seems to be the only sane fix, I would like to figure out the RCA before doing so. It would be better to avoid running into such a situation than to handle it after the fact.
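For reference, here is a small standalone C program (not GlusterFS code, purely illustrative) showing the EPOLLONESHOT behaviour the sequence above relies on: once an event is reported, the fd is disarmed until it is re-armed with epoll_ctl(EPOLL_CTL_MOD), and re-arming while data is still pending (level-triggered) makes the fd immediately ready again - which is how a second thread could pick up a parallel event on the same socket while the first handler is still running.

/* Standalone sketch (not GlusterFS code) of EPOLLONESHOT semantics. */
#include <stdio.h>
#include <unistd.h>
#include <sys/epoll.h>

int
main (void)
{
        int                pipefd[2];
        int                epfd;
        struct epoll_event ev = {0}, out;
        int                n;

        pipe (pipefd);
        epfd = epoll_create1 (0);

        ev.events  = EPOLLIN | EPOLLONESHOT;
        ev.data.fd = pipefd[0];
        epoll_ctl (epfd, EPOLL_CTL_ADD, pipefd[0], &ev);

        write (pipefd[1], "x", 1);        /* make the read end readable */

        n = epoll_wait (epfd, &out, 1, 0);
        printf ("first wait:  %d event(s)\n", n);   /* 1 - event delivered */

        n = epoll_wait (epfd, &out, 1, 0);
        printf ("second wait: %d event(s)\n", n);   /* 0 - fd is disarmed  */

        /* Re-arm the fd without consuming the pending byte: because the
         * interest is level-triggered, it becomes ready again at once.
         * This models event_select_on_epoll() adding the socket back for
         * polling while the earlier event is still being handled. */
        epoll_ctl (epfd, EPOLL_CTL_MOD, pipefd[0], &ev);

        n = epoll_wait (epfd, &out, 1, 0);
        printf ("after MOD:   %d event(s)\n", n);   /* 1 - delivered again */

        close (pipefd[0]);
        close (pipefd[1]);
        close (epfd);
        return 0;
}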
Some observations from reading the code:

* A slot won't be deallocated, and hence the fd associated with it won't change, as long as it has a non-zero refcount. Since we increment the refcount by one before calling the handler, the slot's fd won't change till the handler returns.
* From __socket_reset:

        event_unregister_close (this->ctx->event_pool, priv->sock, priv->idx);

        priv->sock = -1;
        priv->idx = -1;
        priv->connected = -1;

As can be seen above, priv->sock is passed as an argument to event_unregister_close and is set to -1 right after that. So, I guess we got more than one POLLERR event on the socket: the first event set priv->sock to -1, and the second event then crashed during event_unregister_close because slot->fd != -1. Since socket_event_handler logs only at DEBUG log-level, the logs cannot help me confirm this hypothesis.

As to parallel POLLERR events, comment #5 explains a possible scenario where this can happen.
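To make the hypothesis concrete, here is a minimal toy model in plain C (not GlusterFS code; all structs and function names are stand-ins for the real ones): the first POLLERR handler unregisters the socket and sets priv->sock to -1, while the slot keeps recording the real fd because it is still referenced (first observation above); a second POLLERR handler then passes the stale -1 down to the unregister path, where the assertion against the slot's recorded fd fires.

/* Toy model (not GlusterFS code) of the suspected double-POLLERR sequence. */
#include <assert.h>
#include <stdio.h>

struct slot { int fd; };
struct priv { int sock; int idx; };

static void
unregister_close (struct slot *slot, int fd)
{
        /* stand-in for event_unregister_close/event_unregister_epoll_common;
         * the slot keeps its fd while the handler still holds a reference */
        assert (slot->fd == fd);   /* 2nd call: slot->fd == 5, fd == -1 */
        printf ("unregistered fd %d\n", fd);
}

static void
socket_reset (struct priv *priv, struct slot *slot)
{
        /* stand-in for __socket_reset() */
        unregister_close (slot, priv->sock);
        priv->sock = -1;
        priv->idx  = -1;
}

int
main (void)
{
        struct slot s = { .fd = 5 };
        struct priv p = { .sock = 5, .idx = 0 };

        socket_reset (&p, &s);     /* first POLLERR: succeeds */
        socket_reset (&p, &s);     /* second POLLERR: assertion fires */
        return 0;
}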
A more comprehensive analysis can be found at https://bugzilla.redhat.com/show_bug.cgi?id=1486134#c3
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607