Bug 1195120

Summary: DHT + epoll : client crashed
Product: [Community] GlusterFS
Component: distribute
Version: mainline
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Reporter: Raghavendra G <rgowdapp>
Assignee: bugs <bugs>
CC: bugs, gluster-bugs, mzywusko, nbalacha, rhs-bugs, shmohan, ssaha, storage-qa-internal
Fixed In Version: glusterfs-3.7.0
Doc Type: Bug Fix
Type: Bug
Clone Of: 1194605
Bug Depends On: 1194605
Last Closed: 2015-05-14 17:29:10 UTC

Comment 1 Anand Avati 2015-02-24 05:03:40 UTC
REVIEW: http://review.gluster.org/9729 (cluster/dht: serialize execution of dht_discover_complete and STACK_DESTROY (frame).) posted (#1) for review on master by Raghavendra G (rgowdapp)

Comment 2 Raghavendra G 2015-02-24 05:10:19 UTC
In the current code, dht_discover_complete can be invoked in two cases:
1. attempt_unwind is true
2. the reply being processed is from the last subvolume

In scenario 1, the following race is possible:
T1: calls dht_frame_return.
T2: calls dht_frame_return. This happens to be the last call, so it
    invokes dht_discover_complete and then destroys the frame.
T1: since attempt_unwind is true, calls dht_discover_complete. However,
    the frame has already been freed, so this call can crash.

The fix is to make sure that the frame is destroyed only by
the thread executing dht_discover_complete.
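
To illustrate the ordering problem outside of GlusterFS, here is a minimal pthreads sketch of one way to avoid it. It is not the committed patch. Every name prefixed with sketch_ is hypothetical, the frame is a plain struct rather than a real call frame, and the counters exist only so the behaviour can be checked. The assumption is that a thread with attempt_unwind set completes the discover before it gives up its own call count, so the frame cannot be freed underneath it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the discover frame; not a GlusterFS type. */
typedef struct {
    pthread_mutex_t lock;
    int call_cnt;   /* replies still outstanding */
    bool unwound;   /* set once the result has been delivered */
    int *unwinds;   /* caller-owned counters that outlive the frame */
    int *destroys;
} sketch_frame_t;

/* Deliver the result at most once, however many threads call this. */
static void sketch_discover_complete(sketch_frame_t *f)
{
    pthread_mutex_lock(&f->lock);
    if (!f->unwound) {
        f->unwound = true;
        (*f->unwinds)++;
    }
    pthread_mutex_unlock(&f->lock);
}

/* Drop one outstanding reply and report how many remain. */
static int sketch_frame_return(sketch_frame_t *f)
{
    pthread_mutex_lock(&f->lock);
    int remaining = --f->call_cnt;
    pthread_mutex_unlock(&f->lock);
    return remaining;
}

static void sketch_frame_destroy(sketch_frame_t *f)
{
    (*f->destroys)++;
    pthread_mutex_destroy(&f->lock);
    free(f);
}

/* Per-reply callback. The frame is only touched while this thread's
 * own reply is still counted, so no other thread can have freed it. */
static void sketch_discover_cbk(sketch_frame_t *f, bool attempt_unwind)
{
    if (attempt_unwind)
        sketch_discover_complete(f);   /* before sketch_frame_return */

    int remaining = sketch_frame_return(f);

    if (remaining == 0 && !attempt_unwind)
        sketch_discover_complete(f);
    if (remaining == 0)
        sketch_frame_destroy(f);       /* last reply owns destruction */
}

typedef struct {
    sketch_frame_t *frame;
    bool attempt_unwind;
} sketch_reply_t;

static void *sketch_reply_thread(void *arg)
{
    sketch_reply_t *r = arg;
    sketch_discover_cbk(r->frame, r->attempt_unwind);
    return NULL;
}

/* Simulate one reply per subvolume, each on its own thread. The reply at
 * unwind_index (or none, if it is -1) arrives with attempt_unwind set. */
static void sketch_run(int subvols, int unwind_index, int *unwinds, int *destroys)
{
    sketch_frame_t *f = malloc(sizeof(*f));
    pthread_t *tids = malloc(sizeof(*tids) * subvols);
    sketch_reply_t *replies = malloc(sizeof(*replies) * subvols);
    assert(f != NULL && tids != NULL && replies != NULL);

    pthread_mutex_init(&f->lock, NULL);
    f->call_cnt = subvols;
    f->unwound = false;
    f->unwinds = unwinds;
    f->destroys = destroys;

    for (int i = 0; i < subvols; i++) {
        replies[i].frame = f;
        replies[i].attempt_unwind = (i == unwind_index);
        int rc = pthread_create(&tids[i], NULL, sketch_reply_thread, &replies[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < subvols; i++)
        pthread_join(tids[i], NULL);

    free(replies);
    free(tids);
}
```

With zeroed counters, sketch_run(4, 1, &unwinds, &destroys) leaves both counters at 1 regardless of thread scheduling: the result is delivered once and the frame is freed once, after the last reply.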

Comment 3 Anand Avati 2015-02-26 07:15:26 UTC
COMMIT: http://review.gluster.org/9729 committed in master by Raghavendra G (rgowdapp) 
------
commit 2a60854e8360309347236852989d520a04975e9c
Author: Raghavendra G <rgowdapp>
Date:   Tue Feb 24 10:25:16 2015 +0530

    cluster/dht: serialize execution of dht_discover_complete and
    STACK_DESTROY (frame).
    
    In the current code, dht_discover_complete can be invoked because of:
    1. attempt_unwind is true
    2. we are processing reply from the last subvolume
    
    In scenario 1, following race is possible:
    
    T1: calls dht_frame_return.
    T2: calls dht_frame_return. This happens to be last call and hence it
        invokes dht_discover_complete, goes ahead and destroys frame
    T1: since attempt_unwind is true, calls
        dht_discover_complete. However, since frame is already freed, call
        to dht_discover_complete can result in a crash.
    
    The fix is to make sure that destruction of the frame is done only by
    the thread executing dht_discover_complete.
    
    Change-Id: I45765b90c4a9d0af0b33f8911b564d99e12d099e
    BUG: 1195120
    Signed-off-by: Raghavendra G <rgowdapp>
    Reviewed-on: http://review.gluster.org/9729
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
    Reviewed-by: N Balachandran <nbalacha>

Comment 4 Niels de Vos 2015-05-14 17:29:10 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
