Bug 1346156

Summary: Possible crash due to a timer cancellation race
Product: [Community] GlusterFS
Component: disperse
Version: 3.7.11
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Reporter: Xavi Hernandez <jahernan>
Assignee: Xavi Hernandez <jahernan>
CC: bugs
Keywords: Triaged
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.7.14
Doc Type: If docs needed, set a value
Story Points: ---
Clone Of: 1345855
Last Closed: 2016-08-02 06:52:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Depends On: 1345855

Description Xavi Hernandez 2016-06-14 06:47:15 UTC
+++ This bug was initially created as a clone of Bug #1345855 +++

Description of problem:

Incorrect handling of timers that fail to be cancelled could lead to crashes: the timer callback may run after the cancelling thread has already released resources that the callback needs.
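
To illustrate the failure mode, here is a minimal C sketch of one common way to make a failed cancellation safe: the timer holds its own reference to the shared state, and the cancelling side drops that reference only when the cancel actually succeeded. This is not the patch from the review; every name in it (`delayed_unlock`, `timer_cancel`, `owner_finish`, and so on) is hypothetical and is not part of the GlusterFS timer API. The model is single-threaded on purpose, with the interleaving expressed by call order; real code would also need a lock or atomic operations around the state and the reference count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical timer states; this is NOT the GlusterFS timer API. */
enum timer_state { TIMER_ARMED, TIMER_FIRED, TIMER_DONE };

/* Hypothetical shared state used by both the owner and the timer callback. */
struct delayed_unlock {
    enum timer_state state;
    int refs;            /* one reference for the owner, one for the timer */
    bool released;       /* stand-in for "resources have been freed" */
    bool callback_ran;   /* set when the timer callback has executed */
};

static void du_init(struct delayed_unlock *du)
{
    du->state = TIMER_ARMED;
    du->refs = 2;
    du->released = false;
    du->callback_ran = false;
}

static void du_unref(struct delayed_unlock *du)
{
    assert(du->refs > 0);
    if (--du->refs == 0)
        du->released = true;   /* last reference gone: safe to free */
}

/* The timer subsystem picks the timer up; from now on cancel fails. */
static void timer_fire(struct delayed_unlock *du)
{
    if (du->state == TIMER_ARMED)
        du->state = TIMER_FIRED;
}

/* Returns true only if the callback is guaranteed never to run. */
static bool timer_cancel(struct delayed_unlock *du)
{
    if (du->state != TIMER_ARMED)
        return false;
    du->state = TIMER_DONE;
    return true;
}

/* Timer callback; it may run arbitrarily late after timer_fire(). */
static void timer_callback(struct delayed_unlock *du)
{
    assert(!du->released);     /* the crash in this bug: state already gone */
    du->callback_ran = true;
    du->state = TIMER_DONE;
    du_unref(du);              /* drop the timer's reference */
}

/* Cancelling side: it must not assume that the cancel succeeded. */
static void owner_finish(struct delayed_unlock *du)
{
    if (timer_cancel(du))
        du_unref(du);          /* callback will never run: drop its reference */
    du_unref(du);              /* always drop the owner's own reference */
}
```

Calling `du_init`, `timer_fire`, `owner_finish` and then `timer_callback` models the problematic interleaving: the cancel fails, the owner drops only its own reference, and the state stays valid until the late callback drops the last one.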

--- Additional comment from Vijay Bellur on 2016-06-13 12:47:26 CEST ---

REVIEW: http://review.gluster.org/14712 (cluster/ec: Fix race in timer cancellation) posted (#1) for review on master by Xavier Hernandez (xhernandez)

--- Additional comment from Vijay Bellur on 2016-06-13 12:49:57 CEST ---

REVIEW: http://review.gluster.org/14712 (cluster/ec: Fix race in timer cancellation) posted (#2) for review on master by Xavier Hernandez (xhernandez)

--- Additional comment from Vijay Bellur on 2016-06-13 13:40:39 CEST ---

REVIEW: http://review.gluster.org/14712 (cluster/ec: Fix race in timer cancellation) posted (#3) for review on master by Xavier Hernandez (xhernandez)

--- Additional comment from Vijay Bellur on 2016-06-14 03:03:24 CEST ---

COMMIT: http://review.gluster.org/14712 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit fb013a9db2cc019d36b07644f24e6c15ed39725c
Author: Xavier Hernandez <xhernandez>
Date:   Mon Jun 13 12:42:47 2016 +0200

    cluster/ec: Fix race in timer cancellation
    
    A race in timer cancellation for delayed unlock could cause a crash
    if the cancelling thread fails to cancel the timer because it has
    already been fired but not executed, and the callback is scheduled
    out of the CPU, delaying it until the thread has released important
    resources needed by the callback.
    
    This patch improves the handling of this case to make it robust.
    
    Change-Id: I5c8a8c6610c5136f71b938aa78b5878ba05238d4
    BUG: 1345855
    Signed-off-by: Xavier Hernandez <xhernandez>
    Reviewed-on: http://review.gluster.org/14712
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>

Comment 1 Vijay Bellur 2016-06-14 06:59:20 UTC
REVIEW: http://review.gluster.org/14724 (cluster/ec: Fix race in timer cancellation) posted (#1) for review on release-3.7 by Xavier Hernandez (xhernandez)

Comment 2 Vijay Bellur 2016-07-18 06:29:09 UTC
COMMIT: http://review.gluster.org/14724 committed in release-3.7 by Xavier Hernandez (xhernandez) 
------
commit 74d2aaf51c7ff601e4394cad9f8e23092267af55
Author: Xavier Hernandez <xhernandez>
Date:   Mon Jun 13 12:42:47 2016 +0200

    cluster/ec: Fix race in timer cancellation
    
    A race in timer cancellation for delayed unlock could cause a crash
    if the cancelling thread fails to cancel the timer because it has
    already been fired but not executed, and the callback is scheduled
    out of the CPU, delaying it until the thread has released important
    resources needed by the callback.
    
    This patch improves the handling of this case to make it robust.
    
    Backport of:
    > Change-Id: I5c8a8c6610c5136f71b938aa78b5878ba05238d4
    > BUG: 1345855
    > Signed-off-by: Xavier Hernandez <xhernandez>
    > Reviewed-on: http://review.gluster.org/14712
    > Smoke: Gluster Build System <jenkins.com>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.com>
    > Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    
    Change-Id: I5c8a8c6610c5136f71b938aa78b5878ba05238d4
    BUG: 1346156
    Signed-off-by: Xavier Hernandez <xhernandez>
    Reviewed-on: http://review.gluster.org/14724
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>

Comment 3 Kaushal 2016-08-02 07:24:32 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed in glusterfs-3.7.14, please open a new bug report.

glusterfs-3.7.14 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-devel/2016-August/050319.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user