Description of problem:

The glusterfsd brick process crashed. Unfortunately the core file limit was set to 0 on that node, so no core file is available. Below is the backtrace of the crash:

[2017-08-17 19:01:08.790160] I [MSGID: 115013] [server-helpers.c:294:do_fd_cleanup] 0-dv_repo-server: fd cleanup on /
pending frames:
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(27)
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2017-08-17 19:01:08
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.1
[2017-08-17 19:01:08.790499] E [server.c:200:server_submit_reply] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fac3ee299d6] (--> /usr/lib64/glusterfs/3.7.1/xlator/protocol/server.so(server_submit_reply+0x2ee)[0x7fac34c6a02e] (--> /usr/lib64/glusterfs/3.7.1/xlator/protocol/server.so(server_lookup_cbk+0x55c)[0x7fac34c8641c] (--> /usr/lib64/glusterfs/3.7.1/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x154)[0x7fac34eaa6f4] (--> /lib64/libglusterfs.so.0(default_lookup_cbk+0xde)[0x7fac3ee2da4e] ))))) 0-: Reply submission failed
[2017-08-17 19:01:08.790536] E [server.c:200:server_submit_reply] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fac3ee299d6] (--> /usr/lib64/glusterfs/3.7.1/xlator/protocol/server.so(server_submit_reply+0x2ee)[0x7fac34c6a02e] (--> /usr/lib64/glusterfs/3.7.1/xlator/protocol/server.so(server_lookup_cbk+0x55c)[0x7fac34c8641c] (--> /usr/lib64/glusterfs/3.7.1/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x154)[0x7fac34eaa6f4] (--> /lib64/libglusterfs.so.0(default_lookup_cbk+0xde)[0x7fac3ee2da4e] ))))) 0-: Reply submission failed
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb2)[0x7fac3ee27f62]
/lib64/libglusterfs.so.0(gf_print_trace+0x32d)[0x7fac3ee463ed]
/lib64/libc.so.6(+0x35250)[0x7fac3d515250]
/lib64/libc.so.6(gsignal+0x37)[0x7fac3d5151d7]
/lib64/libc.so.6(abort+0x148)[0x7fac3d5168c8]
/lib64/libc.so.6(+0x2e146)[0x7fac3d50e146]
/lib64/libc.so.6(+0x2e1f2)[0x7fac3d50e1f2]
/lib64/libglusterfs.so.0(+0x8a072)[0x7fac3ee8c072]
/usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so(+0x3814)[0x7fac39fbc814]
/usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so(+0xba18)[0x7fac39fc4a18]
/lib64/libglusterfs.so.0(+0x8a77a)[0x7fac3ee8c77a]
/lib64/libpthread.so.0(+0x7dc5)[0x7fac3dc90dc5]
/lib64/libc.so.6(clone+0x6d)[0x7fac3d5d776d]
---------

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-11
Here is the verbatim posting of a mail from Nithya related to this issue:

<mail>
Hi,

I am not sure how correct this is but I tried some stuff with objdump and the stacktrace in the BZ. Here is what I have:

/lib64/libc.so.6(+0x2e1f2)[0x7fac3d50e1f2]
/lib64/libglusterfs.so.0(+0x8a072)[0x7fac3ee8c072]
/usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so(+0x3814)[0x7fac39fbc814]
/usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so(+0xba18)[0x7fac39fc4a18]

Messing around with objdump and debuginfo I have come up with:

/lib64/libc.so.6 (best guess - assert)

/lib64/libglusterfs.so.0 (+0x8a072 = event_unregister_epoll_common + offset)

static int
event_unregister_epoll_common (struct event_pool *event_pool, int fd,
                               int idx, int do_close)
{
...
        slot = event_slot_get (event_pool, idx);
        assert (slot->fd == fd);

   8a053:       48 8d 0d 86 09 02 00    lea    0x20986(%rip),%rcx        # aa9e0 <_fini@@Base+0xbd9c>
   8a05a:       48 8d 35 d3 06 02 00    lea    0x206d3(%rip),%rsi        # aa734 <_fini@@Base+0xbaf0>
   8a061:       48 8d 3d f2 06 02 00    lea    0x206f2(%rip),%rdi        # aa75a <_fini@@Base+0xbb16>
   8a068:       ba 9d 01 00 00          mov    $0x19d,%edx
   8a06d:       e8 ee 0e f9 ff          callq  1af60 <__assert_fail@plt>
   8a072:       0f 1f 40 00             nopl   0x0(%rax)
   8a076:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
   8a07d:       00 00 00

/usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so (+0x3814 = __socket_reset + offset)

static void
__socket_reset (rpc_transport_t *this)
...
        memset (&priv->incoming, 0, sizeof (priv->incoming));

        event_unregister_close (this->ctx->event_pool, priv->sock, priv->idx);

    37ff:       48 8b 45 60             mov    0x60(%rbp),%rax
    3803:       8b 53 04                mov    0x4(%rbx),%edx
    3806:       8b 33                   mov    (%rbx),%esi
    3808:       48 8b b8 b0 01 00 00    mov    0x1b0(%rax),%rdi
    380f:       e8 3c f7 ff ff          callq  2f50 <event_unregister_close@plt>

        priv->sock = -1;
    3814:       c7 03 ff ff ff ff       movl   $0xffffffff,(%rbx)

/usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so (+0xba18 = socket_event_handler + offset)

static int
socket_event_handler (int fd, int idx, void *data,
                      int poll_in, int poll_out, int poll_err)
...
                __socket_ioq_flush (this);
    ba08:       48 89 df                mov    %rbx,%rdi
    ba0b:       e8 b0 a2 ff ff          callq  5cc0 <socket_connect_error_cbk@@Base+0x2980>
                __socket_reset (this);
    ba10:       48 89 df                mov    %rbx,%rdi
    ba13:       e8 48 7d ff ff          callq  3760 <socket_connect_error_cbk@@Base+0x420>
        }
        pthread_mutex_unlock (&priv->lock);

In event_unregister_epoll_common, the cause of the crash seems to be a call to assert:

        slot = event_slot_get (event_pool, idx);
        assert (slot->fd == fd);    <---- here!!!

So the stack would be:

0 assert
1 event_unregister_epoll_common
2 event_unregister_close_epoll
3 __socket_reset
4 socket_event_handler

Many thanks to Shyam for helping me with objdump. Hope this helps.

Regards,
Nithya
</mail>
Some observations from reading the code:

* A slot won't be deallocated, and hence the fd associated with it won't change, as long as there is a positive non-zero refcount. Since we increment the refcount by one before calling the handler, the slot's fd won't change until the handler returns.

* From __socket_reset:

        event_unregister_close (this->ctx->event_pool, priv->sock, priv->idx);

        priv->sock = -1;
        priv->idx = -1;
        priv->connected = -1;

  As can be seen above, priv->sock is passed as an argument to event_unregister_close and is set to -1 right after. So I guess we got more than one POLLERR event on the socket: the first event set priv->sock to -1, and the second event resulted in this crash during event_unregister_close, since slot->fd != -1. Since socket_event_handler logs at DEBUG log-level, the logs cannot help me confirm this hypothesis.

As to parallel POLLERR events, we register the socket with EPOLLONESHOT, which means it has to be explicitly added back through epoll_ctl to receive more events. Normally we do this once the handler completes processing of the current event. But event_select_on_epoll is one asynchronous codepath where the socket can be added back for polling while an event on the same socket is being processed. event_select_on_epoll checks whether an event is being processed, in the form of slot->in_handler, but this check is not sufficient to prevent parallel events because slot->in_handler is not incremented atomically with respect to reception of the event. This means the following imaginary sequence of events can happen:

* epoll_wait returns with a POLLERR - say POLLERR1 - on a socket (sock1) associated with slot s1. socket_event_handle_pollerr is yet to be invoked.

* An event_select_on called from __socket_ioq_churn, which was called in the request/reply/msg submission codepath (as opposed to __socket_ioq_churn called as part of POLLOUT handling - we cannot receive a POLLOUT due to EPOLLONESHOT), adds sock1 back for polling.

* Since sock1 was added back for polling in step 2 and our polling is level-triggered, another thread picks up another POLLERR event - say POLLERR2. socket_event_handler is invoked as part of processing POLLERR2 and it completes execution, setting priv->sock to -1.

* event_unregister_epoll called as part of __socket_reset due to POLLERR1 would receive fd as -1, resulting in the assert failure.

Also, since the first pollerr event has done rpc_transport_unref, subsequent parallel events (not just pollerr, but other events too) could be acting on a freed-up transport as well.
> * epoll_wait returns with a POLLERR - say POLLERR1 - on a socket (sock1) associated with slot s1. socket_event_handle_pollerr is yet to be invoked.

Correction:

... associated with slot s1. event_slot_get is done, incrementing the refcount of the slot, but slot->in_handler is yet to be incremented.
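To make the window described above concrete, here is a minimal, self-contained C sketch of the ordering problem. The struct and function bodies are simplified placeholders invented for illustration, not the actual libglusterfs code; only the ordering of the refcount/in_handler updates relative to epoll_wait returning is the point.

/* Simplified sketch (hypothetical, not the actual libglusterfs code) of
 * why the slot->in_handler check in event_select_on_epoll() is not
 * enough: the flag is raised only after epoll_wait() has already
 * returned the event, so a re-arm of the EPOLLONESHOT fd can slip into
 * that window and a second event can be picked up and handled first. */

#include <stdio.h>

struct event_slot {
        int fd;
        int ref;          /* keeps the slot/fd alive                */
        int in_handler;   /* raised only inside the dispatch path   */
};

/* Thread A: dispatch path, after epoll_wait() has returned POLLERR1. */
static void
dispatch_event (struct event_slot *slot)
{
        slot->ref++;          /* event_slot_get()                        */

        /* <-- window: in_handler is still 0 here, so a concurrent
         *     event_select_on_epoll()-style re-arm is not blocked and
         *     a second POLLERR can be dispatched and fully processed
         *     (setting priv->sock to -1) before this handler starts.  */

        slot->in_handler++;
        /* socket_event_handler runs here; for POLLERR1 it eventually
         * reaches event_unregister_epoll_common() with fd == -1 and
         * trips assert (slot->fd == fd).                              */
        slot->in_handler--;
}

/* Thread B: event_select_on_epoll()-like re-arm. */
static void
rearm_if_idle (struct event_slot *slot)
{
        if (!slot->in_handler) {
                /* epoll_ctl (EPOLL_CTL_MOD, ...) would re-arm the
                 * one-shot fd even though POLLERR1 is in flight.      */
                printf ("re-armed fd %d\n", slot->fd);
        }
}

int
main (void)
{
        struct event_slot s = { .fd = 7, .ref = 0, .in_handler = 0 };

        /* In the buggy interleaving, rearm_if_idle() runs inside the
         * window marked above; calling the two sequentially here just
         * keeps the sketch compilable and runnable.                   */
        rearm_if_idle (&s);
        dispatch_event (&s);

        return 0;
}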
REVIEW: https://review.gluster.org/18129 (event/epoll: don't call handler for events received after a pollerr) posted (#1) for review on master by Raghavendra G (rgowdapp)
REVIEW: https://review.gluster.org/18129 (event/epoll: don't call handler for events received after a pollerr) posted (#2) for review on master by Raghavendra G (rgowdapp)
REVIEW: https://review.gluster.org/18129 (event/epoll: don't call handler for events received after a pollerr) posted (#3) for review on master by Raghavendra G (rgowdapp)
REVIEW: https://review.gluster.org/18129 (event/epoll: don't call handler for events received after a pollerr) posted (#4) for review on master by Raghavendra G (rgowdapp)
REVIEW: https://review.gluster.org/18129 (event/epoll: don't call handler for events received after a pollerr) posted (#5) for review on master by Raghavendra G (rgowdapp)
REVIEW: https://review.gluster.org/18129 (event/epoll: don't call handler for events received after a pollerr) posted (#6) for review on master by Raghavendra G (rgowdapp)
REVIEW: https://review.gluster.org/18129 (event/epoll: don't call handler for events received after a pollerr) posted (#7) for review on master by Raghavendra G (rgowdapp)
COMMIT: https://review.gluster.org/18129 committed in master by Raghavendra G (rgowdapp)

------

commit b1b49997574eeb7c6a42e6e8257c81ac8d2d7578
Author: Raghavendra G <rgowdapp>
Date:   Tue Aug 29 15:07:53 2017 +0530

    event/epoll: don't call handler for events received after a pollerr

    we register socket with EPOLLONESHOT, which means it has to be
    explicitly added back through epoll_ctl to receive more events.
    Normally we do this once the handler completes processing of current
    event. But event_select_on_epoll is one asynchronous codepath where
    socket can be added back for polling while an event on the same
    socket is being processed. event_select_on_epoll has a check whether
    an event is being processed in the form of slot->in_handler. But this
    check is not sufficient enough to prevent parallel events as
    slot->in_handler is not atomically incremented with respect to
    reception of the event. This means following imaginary sequence of
    events can happen:

    * epoll_wait returns with a POLLERR - say POLLERR1 - on a socket
      (sock1) associated with slot s1. socket_event_handle_pollerr is yet
      to be invoked.

    * an event_select_on called from __socket_ioq_churn which was called
      in request/reply/msg submission codepath (as opposed to
      __socket_ioq_churn called as part of POLLOUT handling - we cannot
      receive a POLLOUT due to EPOLLONESHOT) adds back sock1 for polling.

    * since sock1 was added back for polling in step 2 and our polling is
      level-triggered, another thread picks up another POLLERR event -
      say POLLERR2. socket_event_handler is invoked as part of processing
      POLLERR2 and it completes execution setting priv->sock to -1.

    * event_unregister_epoll called as part of __socket_reset due to
      POLLERR1 would receive fd as -1 resulting in assert failure.

    Also, since the first pollerr event has done rpc_transport_unref,
    subsequent parallel events (not just pollerr, but other events too)
    could be acting on a freed up transport too.

    Change-Id: I5db755068e7890ec755b59f7a35a57da110339eb
    BUG: 1486134
    Signed-off-by: Raghavendra G <rgowdapp>
    Reviewed-on: https://review.gluster.org/18129
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: mohammed rafi kc <rkavunga>
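For readers without access to the review, the following is a rough sketch of the approach the commit subject describes: once a pollerr has been handed to the handler for a slot, drop any further events picked up for the same registration instead of dispatching them. The field and helper names here (handled_error, should_dispatch) are hypothetical and do not mirror the real patch; the actual change is at https://review.gluster.org/18129.

/* Rough, hypothetical sketch of the idea in the commit subject; names
 * below are made up for illustration and are not the real patch.
 * Once an error event has been dispatched for a slot, later events on
 * the same registration are dropped rather than handed to the handler. */

#include <stdint.h>
#include <sys/epoll.h>

struct event_slot_sketch {
        int fd;
        int in_handler;
        int handled_error;   /* hypothetical: set once EPOLLERR/HUP is seen */
};

static int
should_dispatch (struct event_slot_sketch *slot, uint32_t revents)
{
        if (slot->handled_error)
                return 0;                  /* transport already being torn down */

        if (revents & (EPOLLERR | EPOLLHUP))
                slot->handled_error = 1;   /* first error event wins */

        return 1;
}

int
main (void)
{
        struct event_slot_sketch s = { .fd = 7, .in_handler = 0,
                                       .handled_error = 0 };

        /* The first POLLERR is dispatched; the one racing behind it is not. */
        int first  = should_dispatch (&s, EPOLLERR);   /* returns 1 */
        int second = should_dispatch (&s, EPOLLERR);   /* returns 0 */

        return !(first == 1 && second == 0);
}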
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.13.0, please open a new bug report.

glusterfs-3.13.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-December/000087.html
[2] https://www.gluster.org/pipermail/gluster-users/