Bug 1489296

Summary: glusterfsd (brick) process crashed
Product: [Community] GlusterFS
Component: rpc
Version: 3.12
Hardware: Unspecified
OS: Unspecified
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Reporter: Raghavendra G <rgowdapp>
Assignee: Raghavendra G <rgowdapp>
CC: bugs, nbalacha, rabhat, rgowdapp, rhinduja, rhs-bugs, storage-qa-internal
Fixed In Version: glusterfs-3.12.1
Clone Of: 1486134
Type: Bug
Last Closed: 2017-09-14 07:43:25 UTC
Bug Depends On: 1483730, 1486134, 1489297, 1489298

Comment 1 Worker Ant 2017-09-07 07:13:57 UTC
REVIEW: https://review.gluster.org/18223 (event/epoll: don't call handler for events received after a pollerr) posted (#1) for review on release-3.12 by Raghavendra G (rgowdapp)

Comment 2 Worker Ant 2017-09-11 04:59:38 UTC
COMMIT: https://review.gluster.org/18223 committed in release-3.12 by jiffin tony Thottan (jthottan) 
------
commit 4867647db935439abdd8fb19d39416ce1d83b081
Author: Raghavendra G <rgowdapp>
Date:   Tue Aug 29 15:07:53 2017 +0530

    event/epoll: don't call handler for events received after a pollerr
    
    We register sockets with EPOLLONESHOT, which means a socket has to be
    explicitly added back through epoll_ctl to receive more
    events. Normally we do this once the handler completes processing of
    the current event. But event_select_on_epoll is one asynchronous
    codepath where the socket can be added back for polling while an
    event on the same socket is still being processed.
    event_select_on_epoll does check whether an event is being processed,
    in the form of slot->in_handler, but this check is not sufficient to
    prevent parallel events, as slot->in_handler is not incremented
    atomically with respect to reception of the event. This means the
    following sequence of events can happen:
    
    * epoll_wait returns with a POLLERR - say POLLERR1 - on a socket
      (sock1) associated with slot s1. socket_event_handle_pollerr is yet
      to be invoked.
    * an event_select_on call from __socket_ioq_churn, invoked in the
      request/reply/msg submission codepath (as opposed to
      __socket_ioq_churn called as part of POLLOUT handling - we cannot
      receive a POLLOUT due to EPOLLONESHOT), adds sock1 back for polling.
    * since sock1 was added back for polling in the previous step and our
      polling is level-triggered, another thread picks up another POLLERR
      event - say POLLERR2. socket_event_handler is invoked as part of
      processing POLLERR2 and completes execution, setting priv->sock
      to -1.
    * event_unregister_epoll, called as part of __socket_reset due to
      POLLERR1, then receives an fd of -1, resulting in an assert failure.
    
    Also, since the first pollerr event has already done
    rpc_transport_unref, subsequent parallel events (not just pollerr, but
    other events too) could be acting on a freed-up transport.
    
    >Change-Id: I5db755068e7890ec755b59f7a35a57da110339eb
    >BUG: 1486134
    >Signed-off-by: Raghavendra G <rgowdapp>
    >Reviewed-on: https://review.gluster.org/18129
    >Smoke: Gluster Build System <jenkins.org>
    >CentOS-regression: Gluster Build System <jenkins.org>
    >Reviewed-by: mohammed rafi  kc <rkavunga>
    
    (cherry picked from commit b1b49997574eeb7c6a42e6e8257c81ac8d2d7578)
    Change-Id: I5db755068e7890ec755b59f7a35a57da110339eb
    BUG: 1489296
    Signed-off-by: Raghavendra G <rgowdapp>
    Reviewed-on: https://review.gluster.org/18223
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: jiffin tony Thottan <jthottan>
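
For readers unfamiliar with EPOLLONESHOT, the sketch below illustrates the re-arm pattern the commit message refers to. This is not the glusterfs event-epoll code: the slot struct and the in_handler / saw_pollerr fields are simplified stand-ins for slot->in_handler and for the behaviour the patch title describes (do not dispatch handlers for events that arrive after a pollerr).

/* Minimal, self-contained illustration of the EPOLLONESHOT re-arm pattern
 * the commit message above refers to.  This is NOT the glusterfs
 * event-epoll code: the slot struct and the in_handler / saw_pollerr
 * fields are simplified stand-ins. */
#include <stdint.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

struct slot {
    int fd;
    int in_handler;   /* set while a handler runs; in glusterfs this flag
                       * alone is not updated atomically with respect to
                       * event reception, which opens the race above     */
    int saw_pollerr;  /* once set, later events on this slot are skipped */
};

static void handle_event(struct slot *s, uint32_t events)
{
    s->in_handler = 1;

    if (events & (EPOLLERR | EPOLLHUP)) {
        /* The behaviour suggested by the patch title: remember the error
         * so that events arriving afterwards are not dispatched.        */
        s->saw_pollerr = 1;
        printf("fd %d: POLLERR/POLLHUP, tearing down\n", s->fd);
    } else if (events & EPOLLIN) {
        char buf[64];
        ssize_t n = read(s->fd, buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            printf("fd %d: read \"%s\"\n", s->fd, buf);
        }
    }

    s->in_handler = 0;
}

int main(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return 1;

    int epfd = epoll_create1(0);
    struct slot s = { .fd = sv[0] };

    /* EPOLLONESHOT: after one event is delivered the fd is disarmed and
     * must be explicitly re-armed via epoll_ctl(EPOLL_CTL_MOD) to receive
     * further events.                                                    */
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT,
                              .data.ptr = &s };
    epoll_ctl(epfd, EPOLL_CTL_ADD, s.fd, &ev);

    write(sv[1], "ping", 4);

    for (int i = 0; i < 2; i++) {
        struct epoll_event out;
        if (epoll_wait(epfd, &out, 1, 1000) <= 0)
            break;

        struct slot *sp = out.data.ptr;
        if (sp->saw_pollerr)     /* no handler calls after a pollerr */
            continue;

        handle_event(sp, out.events);

        /* Re-arm only after the handler has finished.  Re-arming from
         * another thread while in_handler is still set is exactly the
         * window described in the commit message.                       */
        ev.events = EPOLLIN | EPOLLONESHOT;
        epoll_ctl(epfd, EPOLL_CTL_MOD, sp->fd, &ev);

        if (i == 0)
            close(sv[1]);  /* hang up so the next wakeup carries EPOLLHUP */
    }

    close(sv[0]);
    close(epfd);
    return 0;
}

In glusterfs the handler and the re-arm can run on different epoll worker threads, so with level-triggered polling a re-armed fd can be handed to a second thread immediately; that is why the patch, per its title, stops calling handlers for events received after a pollerr instead of relying on the in_handler check alone.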

Comment 3 Jiffin 2017-09-14 07:43:25 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.12.1, please open a new bug report.

glusterfs-3.12.1 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-September/032441.html
[2] https://www.gluster.org/pipermail/gluster-users/