Bug 1483730

Summary: [GSS] glusterfsd (brick) process crashed
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Raghavendra Bhat <rabhat>
Component: rpc
Assignee: Raghavendra G <rgowdapp>
Status: CLOSED ERRATA
QA Contact: Vinayak Papnoi <vpapnoi>
Severity: unspecified
Priority: unspecified
Docs Contact:
Version: rhgs-3.1
CC: amukherj, mchangir, nbalacha, rgowdapp, rhs-bugs, sheggodu, storage-qa-internal, vdas
Target Milestone: ---
Target Release: RHGS 3.4.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: rebase
Fixed In Version: glusterfs-3.12.2-1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1486134 (view as bug list)
Environment:
Last Closed: 2018-09-04 06:35:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1486134, 1489296, 1489297, 1489298, 1503135

Comment 4 Raghavendra G 2017-08-24 03:40:42 UTC
Here is the verbatim posting of a mail from Nithya related to this issue:

<mail>

    Hi,


    I am not sure how correct this is but I tried some stuff with objdump and the stacktrace in the BZ. Here is what I have:

    /lib64/libc.so.6(+0x2e1f2)[0x7fac3d50e1f2]
    /lib64/libglusterfs.so.0(+0x8a072)[0x7fac3ee8c072]
    /usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so(+0x3814)[0x7fac39fbc814]
    /usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so(+0xba18)[0x7fac39fc4a18]


    Messing around with objdump and debuginfo I have come up with:

    /lib64/libc.so.6(best guess - assert)

    /lib64/libglusterfs.so.0(+0x8a072 = event_unregister_epoll_common + offset)

    static int                                                                      
    event_unregister_epoll_common (struct event_pool *event_pool, int fd,           
                                   int idx, int do_close)                           
    {                                                                               
    ...                                                                                
            slot = event_slot_get (event_pool, idx);                                
                                                                                    
            assert (slot->fd == fd);                                                
       8a053:       48 8d 0d 86 09 02 00    lea    0x20986(%rip),%rcx        # aa9e0 <_fini@@Base+0xbd9c>
       8a05a:       48 8d 35 d3 06 02 00    lea    0x206d3(%rip),%rsi        # aa734 <_fini@@Base+0xbaf0>
       8a061:       48 8d 3d f2 06 02 00    lea    0x206f2(%rip),%rdi        # aa75a <_fini@@Base+0xbb16>
       8a068:       ba 9d 01 00 00          mov    $0x19d,%edx                      
       8a06d:       e8 ee 0e f9 ff          callq  1af60 <__assert_fail@plt>        
       8a072:       0f 1f 40 00             nopl   0x0(%rax)                        
       8a076:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)             
       8a07d:       00 00 00  

    /usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so(+0x3814 = __socket_reset + offset)

    static void                                                                     
    __socket_reset (rpc_transport_t *this)   

    ...

            memset (&priv->incoming, 0, sizeof (priv->incoming));                   
                                                                                    
            event_unregister_close (this->ctx->event_pool, priv->sock, priv->idx);  
        37ff:       48 8b 45 60             mov    0x60(%rbp),%rax                  
        3803:       8b 53 04                mov    0x4(%rbx),%edx                   
        3806:       8b 33                   mov    (%rbx),%esi                      
        3808:       48 8b b8 b0 01 00 00    mov    0x1b0(%rax),%rdi                 
        380f:       e8 3c f7 ff ff          callq  2f50 <event_unregister_close@plt>
                                                                                    
            priv->sock = -1;                                                        
        3814:       c7 03 ff ff ff ff       movl   $0xffffffff,(%rbx)               
                                                                         


    /usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so(+0xba18 = socket_event_handler + offset)

    static int                                                                      
    socket_event_handler (int fd, int idx, void *data,                              

                          int poll_in, int poll_out, int poll_err)  

    ...

                    __socket_ioq_flush (this);                                      
        ba08:       48 89 df                mov    %rbx,%rdi                        
        ba0b:       e8 b0 a2 ff ff          callq  5cc0 <socket_connect_error_cbk@@Base+0x2980>
                    __socket_reset (this);                                          
        ba10:       48 89 df                mov    %rbx,%rdi                        
        ba13:       e8 48 7d ff ff          callq  3760 <socket_connect_error_cbk@@Base+0x420>
            }                                                                       
            pthread_mutex_unlock (&priv->lock);    



    In event_unregister_epoll_common, the cause of the crash seems to be a call to assert:

            slot = event_slot_get (event_pool, idx);                                
                                                                                    
            assert (slot->fd == fd);  <---- here!!!


    So the stack would be:

    0 assert
    1 event_unregister_epoll_common
    2 event_unregister_close_epoll
    3 __socket_reset
    4 socket_event_handler

    Many thanks to Shyam for helping me with objdump.


    Hope this helps.

    Regards,
    Nithya

</mail>

Comment 5 Raghavendra G 2017-08-24 09:35:33 UTC
Failure of the assertion (slot->fd == fd) means the slot has been allocated to a different fd. Slot deallocation happens only after receiving a POLLERR event, which means we got at least one other parallel POLLERR event on the socket. However, we register the socket with EPOLLONESHOT, so it has to be explicitly added back through epoll_ctl to receive more events. Normally we do this once the handler completes processing of the current event. But event_select_on_epoll is one asynchronous codepath where the socket can be added back for polling while an event on the same socket is still being processed. event_select_on_epoll does check whether an event is being processed, in the form of slot->in_handler, but this check is not sufficient to prevent parallel events because slot->in_handler is not incremented atomically with respect to the reception of the event. This means the following sequence of events can happen:

* epoll_wait returns with a POLLIN - say POLLIN1 - on a socket (sock1) associated with slot s1.
* an event_select_on, called from __socket_ioq_churn in the request/reply/msg submission codepath (as opposed to __socket_ioq_churn called as part of POLLOUT handling - we cannot receive a POLLOUT due to EPOLLONESHOT), adds sock1 back for polling.
* since sock1 was added back for polling in the previous step and our polling is level-triggered, another thread picks up a second POLLIN - say POLLIN2 - while POLLIN1 is still being handled.

A similar argument can be made for more than one POLLERR event. However, every event being processed holds a reference on the slot, and the slot is deallocated only when the refcount drops to 0; I cannot think of a way the slot could be reused even with parallel POLLERR events. So, as of now, the RCA is unknown.

While making event_unregister_epoll_common return without doing anything when (slot->fd != fd) seems to be the only sane fix, I would like to figure out the RCA before doing so. It would be better to avoid running into such a situation than to handle it after the fact.

Comment 7 Raghavendra G 2017-08-29 06:35:29 UTC
Some observations by reading code:

* A slot won't be deallocated - and hence the fd associated with it won't change - as long as it has a positive, non-zero refcount. Since we increment the refcount by one before calling the handler, the slot's fd won't change till the handler returns.
* From __socket_reset,

        event_unregister_close (this->ctx->event_pool, priv->sock, priv->idx);

        priv->sock = -1;
        priv->idx = -1;
        priv->connected = -1;
  
  As can be seen above, priv->sock is passed as an argument to event_unregister_close and is set to -1 right after. So, I guess we got more than one POLLERR event on the socket: the first event set priv->sock to -1, and the second event resulted in this crash during event_unregister_close, as slot->fd != -1.

Since socket_event_handler logs only at DEBUG log-level, the logs cannot help me confirm this hypothesis. As to parallel POLLERR events, comment #5 explains a possible scenario where this can happen.

Comment 8 Raghavendra G 2017-08-29 07:22:33 UTC
A more comprehensive analysis can be found at https://bugzilla.redhat.com/show_bug.cgi?id=1486134#c3

Comment 15 errata-xmlrpc 2018-09-04 06:35:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607