Bug 1486134

Summary: glusterfsd (brick) process crashed
Product: [Community] GlusterFS
Reporter: Raghavendra G <rgowdapp>
Component: rpc
Assignee: bugs <bugs>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: mainline
CC: bugs, nbalacha, rabhat, rgowdapp, rhinduja, rhs-bugs, sheggodu, storage-qa-internal
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.13.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1483730
: 1489296 1489297 1489298 (view as bug list)
Environment:
Last Closed: 2017-12-08 17:39:41 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1483730
Bug Blocks: 1489296, 1489297, 1489298

Comment 1 Raghavendra G 2017-08-29 07:06:30 UTC
Description of problem:

The glusterfsd brick process crashed. Unfortunately, the core file limit was set to 0 on that node, so no core file is available.

Below is the backtrace of the crash
[2017-08-17 19:01:08.790160] I [MSGID: 115013] [server-helpers.c:294:do_fd_cleanup] 0-dv_repo-server: fd cleanup on /
pending frames:
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(27)
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash:
2017-08-17 19:01:08
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.1
[2017-08-17 19:01:08.790499] E [server.c:200:server_submit_reply] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fac3ee299d6] (--> /usr/lib64/glusterfs/3.7.1/xlator/protocol/server.so(server_submit_reply+0x2ee)[0x7fac34c6a02e] (--> /usr/lib64/glusterfs/3.7.1/xlator/protocol/server.so(server_lookup_cbk+0x55c)[0x7fac34c8641c] (--> /usr/lib64/glusterfs/3.7.1/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x154)[0x7fac34eaa6f4] (--> /lib64/libglusterfs.so.0(default_lookup_cbk+0xde)[0x7fac3ee2da4e] ))))) 0-: Reply submission failed
[2017-08-17 19:01:08.790536] E [server.c:200:server_submit_reply] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fac3ee299d6] (--> /usr/lib64/glusterfs/3.7.1/xlator/protocol/server.so(server_submit_reply+0x2ee)[0x7fac34c6a02e] (--> /usr/lib64/glusterfs/3.7.1/xlator/protocol/server.so(server_lookup_cbk+0x55c)[0x7fac34c8641c] (--> /usr/lib64/glusterfs/3.7.1/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x154)[0x7fac34eaa6f4] (--> /lib64/libglusterfs.so.0(default_lookup_cbk+0xde)[0x7fac3ee2da4e] ))))) 0-: Reply submission failed
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb2)[0x7fac3ee27f62]
/lib64/libglusterfs.so.0(gf_print_trace+0x32d)[0x7fac3ee463ed]
/lib64/libc.so.6(+0x35250)[0x7fac3d515250]
/lib64/libc.so.6(gsignal+0x37)[0x7fac3d5151d7]
/lib64/libc.so.6(abort+0x148)[0x7fac3d5168c8]
/lib64/libc.so.6(+0x2e146)[0x7fac3d50e146]
/lib64/libc.so.6(+0x2e1f2)[0x7fac3d50e1f2]
/lib64/libglusterfs.so.0(+0x8a072)[0x7fac3ee8c072]
/usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so(+0x3814)[0x7fac39fbc814]
/usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so(+0xba18)[0x7fac39fc4a18]
/lib64/libglusterfs.so.0(+0x8a77a)[0x7fac3ee8c77a]
/lib64/libpthread.so.0(+0x7dc5)[0x7fac3dc90dc5]
/lib64/libc.so.6(clone+0x6d)[0x7fac3d5d776d]
---------

Version-Release number of selected component (if applicable):

glusterfs-3.7.1-11
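
A note on the frame format above: lines such as /usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so(+0x3814)[0x7fac39fbc814] look like the output of glibc's backtrace()/backtrace_symbols_fd(). When a return address has no nearby exported symbol, only the module and the offset inside it are printed, which is why the analysis in the next comment has to map the offsets back to functions with objdump. A minimal standalone illustration (not GlusterFS code):

    /* Minimal sketch (not GlusterFS code) of where "module(+offset)[address]"
     * style frames come from: glibc's backtrace()/backtrace_symbols_fd(). */
    #include <execinfo.h>
    #include <unistd.h>

    static void dump_stack(void)
    {
            void *frames[16];
            int   n = backtrace(frames, 16);

            /* Prints one frame per line to stderr, in the same shape as
             * the crash log above. */
            backtrace_symbols_fd(frames, n, STDERR_FILENO);
    }

    int main(void)
    {
            dump_stack();
            return 0;
    }

Frames that show only module(+offset), with no function name, typically correspond to static functions (or binaries built without -rdynamic), which matches the static socket/event functions identified in the next comment.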

Comment 2 Raghavendra G 2017-08-29 07:07:11 UTC
Here is the verbatim posting of a mail from Nithya related to this issue:

<mail>

    Hi,


    I am not sure how correct this is but I tried some stuff with objdump and the stacktrace in the BZ. Here is what I have:

    /lib64/libc.so.6(+0x2e1f2)[0x7fac3d50e1f2]
    /lib64/libglusterfs.so.0(+0x8a072)[0x7fac3ee8c072]
    /usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so(+0x3814)[0x7fac39fbc814]
    /usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so(+0xba18)[0x7fac39fc4a18]


    Messing around with objdump and debuginfo I have come up with:

    /lib64/libc.so.6(best guess - assert)

    /lib64/libglusterfs.so.0(+0x8a072 = event_unregister_epoll_common + offset)

    static int                                                                      
    event_unregister_epoll_common (struct event_pool *event_pool, int fd,           
                                   int idx, int do_close)                           
    {                                                                               
    ...                                                                                
            slot = event_slot_get (event_pool, idx);                                
                                                                                    
            assert (slot->fd == fd);                                                
       8a053:       48 8d 0d 86 09 02 00    lea    0x20986(%rip),%rcx        # aa9e0 <_fini@@Base+0xbd9c>
       8a05a:       48 8d 35 d3 06 02 00    lea    0x206d3(%rip),%rsi        # aa734 <_fini@@Base+0xbaf0>
       8a061:       48 8d 3d f2 06 02 00    lea    0x206f2(%rip),%rdi        # aa75a <_fini@@Base+0xbb16>
       8a068:       ba 9d 01 00 00          mov    $0x19d,%edx                      
       8a06d:       e8 ee 0e f9 ff          callq  1af60 <__assert_fail@plt>        
       8a072:       0f 1f 40 00             nopl   0x0(%rax)                        
       8a076:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)             
       8a07d:       00 00 00  

    /usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so(+0x3814 = __socket_reset + offset)

    static void                                                                     
    __socket_reset (rpc_transport_t *this)   

    ...

            memset (&priv->incoming, 0, sizeof (priv->incoming));                   
                                                                                    
            event_unregister_close (this->ctx->event_pool, priv->sock, priv->idx);  
        37ff:       48 8b 45 60             mov    0x60(%rbp),%rax                  
        3803:       8b 53 04                mov    0x4(%rbx),%edx                   
        3806:       8b 33                   mov    (%rbx),%esi                      
        3808:       48 8b b8 b0 01 00 00    mov    0x1b0(%rax),%rdi                 
        380f:       e8 3c f7 ff ff          callq  2f50 <event_unregister_close@plt>
                                                                                    
            priv->sock = -1;                                                        
        3814:       c7 03 ff ff ff ff       movl   $0xffffffff,(%rbx)               
                                                                         


    /usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so(+0xba18 = socket_event_handler + offset)

    static int                                                                      
    socket_event_handler (int fd, int idx, void *data,
                          int poll_in, int poll_out, int poll_err)

    ...

                    __socket_ioq_flush (this);                                      
        ba08:       48 89 df                mov    %rbx,%rdi                        
        ba0b:       e8 b0 a2 ff ff          callq  5cc0 <socket_connect_error_cbk@@Base+0x2980>
                    __socket_reset (this);                                          
        ba10:       48 89 df                mov    %rbx,%rdi                        
        ba13:       e8 48 7d ff ff          callq  3760 <socket_connect_error_cbk@@Base+0x420>
            }                                                                       
            pthread_mutex_unlock (&priv->lock);    



    In event_unregister_epoll_common, the cause of the crash seems to be a call to assert:

            slot = event_slot_get (event_pool, idx);                                
                                                                                    
            assert (slot->fd == fd);  <---- here!!!


    So the stack would be 

    0 assert
    1 event_unregister_epoll_common
    2 event_unregister_close_epoll
    3  __socket_reset
    4  socket_event_handler

    Many thanks to Shyam for helping me with objdump.


    Hope this helps.

    Regards,
    Nithya

</mail>

Comment 3 Raghavendra G 2017-08-29 07:21:16 UTC
Some observations from reading the code:

* A slot won't be deallocated, and hence the fd associated with it won't change, as long as its refcount is positive. Since we increment the refcount by one before calling the handler, the slot's fd won't change until the handler returns.

* From __socket_reset,

        event_unregister_close (this->ctx->event_pool, priv->sock, priv->idx);

        priv->sock = -1;
        priv->idx = -1;
        priv->connected = -1;
  
  As can be seen above, priv->sock is passed as an argument to event_unregister_close and is set to -1 right after. So I guess we got more than one POLLERR event on the socket: the first event set priv->sock to -1, and the second event resulted in this crash during event_unregister_close, since slot->fd != -1.

Since socket_event_handler logs at DEBUG log level, the logs cannot help me confirm this hypothesis.
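
The hypothesis can be modelled in a few lines of standalone C (illustrative names only; struct slot, struct priv and the helpers below are not the real GlusterFS structures): the slot keeps its original fd while handlers hold a refcount on it, but priv->sock is cleared by whichever POLLERR handler finishes first, so the slower handler passes -1 down and assert(slot->fd == fd) aborts the process with SIGABRT - signal 6, as in the crash header above.

    /* Standalone model of the hypothesis above -- illustrative names only. */
    #include <assert.h>

    struct slot { int fd; };            /* stand-in for the event slot        */
    struct priv { int sock; int idx; }; /* stand-in for the socket private    */

    /* stand-in for event_unregister_epoll_common() */
    static void unregister_fd (struct slot *slot, int fd)
    {
            /* the slot still holds the original fd (the refcount keeps it
             * alive), so fd == -1 trips the assertion */
            assert (slot->fd == fd);
    }

    /* stand-in for __socket_reset(): unregister with priv->sock, then clear it */
    static void socket_reset (struct slot *slot, struct priv *priv)
    {
            unregister_fd (slot, priv->sock);
            priv->sock = -1;
            priv->idx  = -1;
    }

    int main (void)
    {
            struct slot s = { .fd = 7 };
            struct priv p = { .sock = 7, .idx = 3 };

            socket_reset (&s, &p); /* handler that completes first: fine      */
            socket_reset (&s, &p); /* the other handler: priv->sock is now -1,
                                    * assert(slot->fd == fd) aborts (SIGABRT)  */
            return 0;
    }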

As to parallel POLLERR events: we register the socket with EPOLLONESHOT, which means it has to be explicitly added back through epoll_ctl to receive more events. Normally we do this once the handler completes processing of the current event. But event_select_on_epoll is one asynchronous codepath where the socket can be added back for polling while an event on the same socket is being processed. event_select_on_epoll has a check for whether an event is being processed, in the form of slot->in_handler. But this check is not sufficient to prevent parallel events, as slot->in_handler is not incremented atomically with respect to reception of the event. This means the following hypothetical sequence of events can happen:

* epoll_wait returns with a POLLERR - say POLLERR1 - on a socket (sock1) associated with slot s1. socket_event_handle_pollerr is yet to be invoked.
* An event_select_on call from __socket_ioq_churn, invoked in the request/reply/msg submission codepath (as opposed to __socket_ioq_churn called as part of POLLOUT handling - we cannot receive a POLLOUT due to EPOLLONESHOT), adds sock1 back for polling.
* Since sock1 was added back for polling in step 2 and our polling is level-triggered, another thread picks up another POLLERR event - say POLLERR2. socket_event_handler is invoked as part of processing POLLERR2 and completes execution, setting priv->sock to -1.
* event_unregister_epoll, called as part of __socket_reset due to POLLERR1, then receives fd as -1, resulting in the assert failure.

Also, since the first POLLERR event has done an rpc_transport_unref, subsequent parallel events (not just POLLERR, but other events too) could be acting on a freed-up transport.
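
For reference, the EPOLLONESHOT mechanics that open this window can be demonstrated with the plain Linux epoll API alone (a minimal sketch, not GlusterFS code): an fd armed with EPOLLONESHOT is disabled after delivering one event and only becomes pollable again after an explicit epoll_ctl(EPOLL_CTL_MOD); if that re-arm is issued from another thread while the first event is still being handled, a second epoll_wait() caller can pick up a fresh, level-triggered event for the same fd in parallel.

    /* Minimal sketch of EPOLLONESHOT behaviour using only the Linux epoll API. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/epoll.h>
    #include <unistd.h>

    static void arm (int epfd, int fd, int op)
    {
            struct epoll_event ev;

            memset (&ev, 0, sizeof (ev));
            ev.events  = EPOLLIN | EPOLLONESHOT;  /* one event, then disabled */
            ev.data.fd = fd;
            epoll_ctl (epfd, op, fd, &ev);
    }

    int main (void)
    {
            int epfd = epoll_create1 (0);
            int fds[2];
            struct epoll_event ev;

            pipe (fds);                 /* fds[0] stands in for the transport socket */
            arm (epfd, fds[0], EPOLL_CTL_ADD);

            write (fds[1], "x", 1);     /* make the fd readable */

            /* First event is delivered ... */
            printf ("first wait:  %d event(s)\n", epoll_wait (epfd, &ev, 1, 100));
            /* ... and the fd is now disarmed: even though data is still pending
             * (level-triggered), no further event arrives. */
            printf ("second wait: %d event(s)\n", epoll_wait (epfd, &ev, 1, 100));

            /* Re-arming is what makes a further event possible.  In the race
             * described above, the re-arm happens from the msg-submission path
             * while the first handler is still running, so another poller
             * thread receives a second event for the same socket in parallel. */
            arm (epfd, fds[0], EPOLL_CTL_MOD);
            printf ("third wait:  %d event(s)\n", epoll_wait (epfd, &ev, 1, 100));

            close (fds[0]); close (fds[1]); close (epfd);
            return 0;
    }

In the sequence above, the re-arm corresponds to event_select_on_epoll being invoked from the message-submission path before slot->in_handler has been incremented.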

Comment 4 Raghavendra G 2017-08-29 08:58:58 UTC
> * epoll_wait returns with a POLLERR - say POLLERR1 - on a socket (sock1) associated with slot s1. socket_event_handle_pollerr is yet to be invoked.

Correction. ... with slot s1. event_slot_get has been done, incrementing the refcount of the slot. But slot->in_handler is yet to be incremented.

Comment 5 Worker Ant 2017-08-29 09:43:54 UTC
REVIEW: https://review.gluster.org/18129 (event/epoll: don't call handler for events received after a pollerr) posted (#1) for review on master by Raghavendra G (rgowdapp)

Comment 6 Worker Ant 2017-08-29 09:51:09 UTC
REVIEW: https://review.gluster.org/18129 (event/epoll: don't call handler for events received after a pollerr) posted (#2) for review on master by Raghavendra G (rgowdapp)

Comment 7 Worker Ant 2017-08-30 03:38:29 UTC
REVIEW: https://review.gluster.org/18129 (event/epoll: don't call handler for events received after a pollerr) posted (#3) for review on master by Raghavendra G (rgowdapp)

Comment 8 Worker Ant 2017-08-30 06:40:37 UTC
REVIEW: https://review.gluster.org/18129 (event/epoll: don't call handler for events received after a pollerr) posted (#4) for review on master by Raghavendra G (rgowdapp)

Comment 9 Worker Ant 2017-08-30 07:02:29 UTC
REVIEW: https://review.gluster.org/18129 (event/epoll: don't call handler for events received after a pollerr) posted (#5) for review on master by Raghavendra G (rgowdapp)

Comment 10 Worker Ant 2017-08-31 04:48:45 UTC
REVIEW: https://review.gluster.org/18129 (event/epoll: don't call handler for events received after a pollerr) posted (#6) for review on master by Raghavendra G (rgowdapp)

Comment 11 Worker Ant 2017-09-05 11:24:10 UTC
REVIEW: https://review.gluster.org/18129 (event/epoll: don't call handler for events received after a pollerr) posted (#7) for review on master by Raghavendra G (rgowdapp)

Comment 12 Worker Ant 2017-09-06 06:20:46 UTC
COMMIT: https://review.gluster.org/18129 committed in master by Raghavendra G (rgowdapp) 
------
commit b1b49997574eeb7c6a42e6e8257c81ac8d2d7578
Author: Raghavendra G <rgowdapp>
Date:   Tue Aug 29 15:07:53 2017 +0530

    event/epoll: don't call handler for events received after a pollerr
    
    we register socket with EPOLLONESHOT, which means it has to be
    explicitly added back through epoll_ctl to receive more
    events. Normally we do this once the handler completes processing of
    current event. But event_select_on_epoll is one asynchronous codepath
    where socket can be added back for polling while an event on the same
    socket is being processed. event_select_on_epoll has a check whether
    an event is being processed in the form of slot->in_handler. But this
    check is not sufficient enough to prevent parallel events as
    slot->in_handler is not atomically incremented with respect to
    reception of the event. This means following imaginary sequence of
    events can happen:
    
    * epoll_wait returns with a POLLERR - say POLLERR1 - on a socket
      (sock1) associated with slot s1. socket_event_handle_pollerr is yet
      to be invoked.
    * an event_select_on called from __socket_ioq_churn which was called
      in request/reply/msg submission codepath (as opposed to
      __socket_ioq_churn called as part of POLLOUT handling - we cannot
      receive a POLLOUT due to EPOLLONESHOT) adds back sock1 for polling.
    * since sock1 was added back for polling in step 2 and our polling is
      level-triggered, another thread picks up another POLLERR event - say
      POLLERR2. socket_event_handler is invoked as part of processing
      POLLERR2 and it completes execution setting priv->sock to -1.
    * event_unregister_epoll called as part of __socket_reset due to
      POLLERR1 would receive fd as -1 resulting in assert failure.
    
    Also, since the first pollerr event has done rpc_transport_unref,
    subsequent parallel events (not just pollerr, but other events too)
    could be acting on a freed up transport too.
    
    Change-Id: I5db755068e7890ec755b59f7a35a57da110339eb
    BUG: 1486134
    Signed-off-by: Raghavendra G <rgowdapp>
    Reviewed-on: https://review.gluster.org/18129
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: mohammed rafi  kc <rkavunga>

Comment 13 Shyamsundar 2017-12-08 17:39:41 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.13.0, please open a new bug report.

glusterfs-3.13.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-December/000087.html
[2] https://www.gluster.org/pipermail/gluster-users/