Bug 1489296
Summary: | glusterfsd (brick) process crashed | |
---|---|---|---
Product: | [Community] GlusterFS | Reporter: | Raghavendra G <rgowdapp>
Component: | rpc | Assignee: | Raghavendra G <rgowdapp>
Status: | CLOSED CURRENTRELEASE | QA Contact: |
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | 3.12 | CC: | bugs, nbalacha, rabhat, rgowdapp, rhinduja, rhs-bugs, storage-qa-internal
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | glusterfs-3.12.1 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | 1486134 | Environment: |
Last Closed: | 2017-09-14 07:43:25 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1483730, 1486134, 1489297, 1489298 | |
Bug Blocks: | | |
Comment 1
Worker Ant
2017-09-07 07:13:57 UTC
COMMIT: https://review.gluster.org/18223 committed in release-3.12 by jiffin tony Thottan (jthottan)

------

commit 4867647db935439abdd8fb19d39416ce1d83b081
Author: Raghavendra G <rgowdapp>
Date: Tue Aug 29 15:07:53 2017 +0530

event/epoll: don't call handler for events received after a pollerr

We register the socket with EPOLLONESHOT, which means it has to be explicitly added back through epoll_ctl to receive more events. Normally we do this once the handler completes processing of the current event. But event_select_on_epoll is one asynchronous codepath where the socket can be added back for polling while an event on the same socket is being processed. event_select_on_epoll does check whether an event is being processed, in the form of slot->in_handler, but this check is not sufficient to prevent parallel events, as slot->in_handler is not incremented atomically with respect to reception of the event. This means the following sequence of events can happen:

* epoll_wait returns with a POLLERR - say POLLERR1 - on a socket (sock1) associated with slot s1. socket_event_handle_pollerr is yet to be invoked.
* an event_select_on called from __socket_ioq_churn, invoked in the request/reply/msg submission codepath (as opposed to __socket_ioq_churn called as part of POLLOUT handling - we cannot receive a POLLOUT due to EPOLLONESHOT), adds sock1 back for polling.
* since sock1 was added back for polling in step 2 and our polling is level-triggered, another thread picks up another POLLERR event - say POLLERR2. socket_event_handler is invoked as part of processing POLLERR2 and completes execution, setting priv->sock to -1.
* event_unregister_epoll, called as part of __socket_reset due to POLLERR1, would receive fd as -1, resulting in an assert failure.

Also, since the first pollerr event has done rpc_transport_unref, subsequent parallel events (not just pollerr, but other events too) could also be acting on a freed-up transport.

>Change-Id: I5db755068e7890ec755b59f7a35a57da110339eb
>BUG: 1486134
>Signed-off-by: Raghavendra G <rgowdapp>
>Reviewed-on: https://review.gluster.org/18129
>Smoke: Gluster Build System <jenkins.org>
>CentOS-regression: Gluster Build System <jenkins.org>
>Reviewed-by: mohammed rafi kc <rkavunga>

(cherry picked from commit b1b49997574eeb7c6a42e6e8257c81ac8d2d7578)

Change-Id: I5db755068e7890ec755b59f7a35a57da110339eb
BUG: 1489296
Signed-off-by: Raghavendra G <rgowdapp>
Reviewed-on: https://review.gluster.org/18223
Smoke: Gluster Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: jiffin tony Thottan <jthottan>

This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.12.1, please open a new bug report.

glusterfs-3.12.1 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-September/032441.html
[2] https://www.gluster.org/pipermail/gluster-users/
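For context on the epoll mechanics the commit message relies on, the sketch below is a minimal, self-contained illustration of the EPOLLONESHOT register/re-arm pattern. It is not GlusterFS code: handle_event() and rearm() are hypothetical helpers used only for this example. The race described above arises because the re-arm can be issued from another thread before the handler for the previous event has finished tearing the connection down.

```c
/* Minimal illustration of the EPOLLONESHOT register/re-arm pattern the
 * commit message describes. NOT GlusterFS code: handle_event() and
 * rearm() are hypothetical helpers for this sketch only. */
#include <stdint.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

static void rearm(int epfd, int sock)
{
    /* With EPOLLONESHOT the fd is disarmed after one event is delivered;
     * it reports further events only after an explicit EPOLL_CTL_MOD. */
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT, .data.fd = sock };
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, sock, &ev) == -1)
        perror("epoll_ctl(EPOLL_CTL_MOD)");
}

static void handle_event(int epfd, int sock, uint32_t events)
{
    if (events & (EPOLLERR | EPOLLHUP)) {
        /* Error path: tear down the connection. If another thread
         * re-arms sock before this teardown completes (the race in this
         * bug), a second EPOLLERR can be dispatched in parallel and act
         * on state that is already being freed. */
        close(sock);
        return;
    }

    char buf[64];
    read(sock, buf, sizeof(buf));   /* process the data */

    /* Safe point to re-arm: handling of the current event is complete. */
    rearm(epfd, sock);
}

int main(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        perror("socketpair");
        return 1;
    }

    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT, .data.fd = sv[0] };
    epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev);

    write(sv[1], "x", 1);           /* trigger one EPOLLIN on sv[0] */

    struct epoll_event out;
    if (epoll_wait(epfd, &out, 1, 1000) == 1)
        handle_event(epfd, out.data.fd, out.events);

    /* Without the rearm() call above, a second write on sv[1] would not
     * be reported by epoll_wait(): the one-shot registration stays
     * disarmed until it is explicitly re-armed. */
    return 0;
}
```

As the commit subject indicates, the actual fix closes this window on the event layer side by not invoking the handler for events received after a pollerr, rather than by changing the re-arm pattern sketched here.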