Bug 1399134

Summary: GlusterFS client crashes during remove-brick operation
Product: [Community] GlusterFS
Component: distribute
Reporter: Raghavendra G <rgowdapp>
Assignee: Raghavendra G <rgowdapp>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Version: mainline
CC: bugs, nbalacha, rhs-bugs, storage-qa-internal, tdesala
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.10.0
Type: Bug
Clone Of: 1399100
Last Closed: 2017-03-06 17:36:17 UTC
Bug Depends On: 1399100
Bug Blocks: 1399422, 1399423, 1399424

Comment 1 Worker Ant 2016-11-28 11:09:04 UTC
REVIEW: http://review.gluster.org/15945 (cluster/dht: Fix memory corruption during reconfigure) posted (#1) for review on master by Raghavendra G (rgowdapp)

Comment 2 Worker Ant 2016-11-28 11:38:36 UTC
REVIEW: http://review.gluster.org/15945 (cluster/dht: Fix memory corruption while accessing regex stored in private) posted (#2) for review on master by Raghavendra G (rgowdapp)

Comment 3 Worker Ant 2016-12-02 07:26:29 UTC
REVIEW: http://review.gluster.org/15945 (cluster/dht: Fix memory corruption while accessing regex stored in private) posted (#3) for review on master by Raghavendra G (rgowdapp)

Comment 4 Worker Ant 2016-12-05 18:11:15 UTC
REVIEW: http://review.gluster.org/16030 (libglusterfs: serialize init/reconfigure calls) posted (#1) for review on master by Jeff Darcy (jdarcy)

Comment 5 Worker Ant 2016-12-05 21:57:28 UTC
REVIEW: http://review.gluster.org/16030 (libglusterfs: serialize init/reconfigure calls) posted (#2) for review on master by Jeff Darcy (jdarcy)

Comment 6 Worker Ant 2016-12-08 17:56:45 UTC
COMMIT: http://review.gluster.org/15945 committed in master by Shyamsundar Ranganathan (srangana) 
------
commit 64451d0f25e7cc7aafc1b6589122648281e4310a
Author: Raghavendra G <rgowdapp>
Date:   Tue Nov 8 12:09:57 2016 +0530

    cluster/dht: Fix memory corruption while accessing regex stored in
    private
    
    If reconfigure is executed parallely (or concurrently with dht_init),
    there are races that can corrupt memory. One such race is modification
    of regexes stored in conf (conf->rsync_regex_valid and
    conf->extra_regex_valid) through dht_init_regex. With change [1],
    reconfigure codepath can get executed parallely (with itself or with
    dht_init) and this fix is needed.
    
    Also, a reconfigure can race with any thread doing dht_layout_search,
    resulting in dht_layout_search accessing regex freed up by reconfigure
    (like in bz 1399134).
    
    [1] http://review.gluster.org/15046
    
    Change-Id: I039422a65374cf0ccbe0073441f0e8c442ebf830
    BUG: 1399134
    Signed-off-by: Raghavendra G <rgowdapp>
    Reviewed-on: http://review.gluster.org/15945
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: N Balachandran <nbalacha>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
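
Illustration (not part of the commit): the race described above is a use-after-free, where one thread frees and recompiles a regex held in the translator's private conf while another thread is still matching against it. A minimal sketch of one way to close such a race is to take a lock around both the recompile and the match. The names demo_conf, demo_conf_init, demo_set_regex and demo_match are hypothetical stand-ins for illustration; this is not the code from review 15945 and does not use the real dht_conf_t.

```c
#include <assert.h>
#include <pthread.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a translator's private conf (not dht_conf_t). */
struct demo_conf {
    pthread_mutex_t lock;
    regex_t         rsync_regex;
    bool            rsync_regex_valid;
};

void demo_conf_init(struct demo_conf *conf)
{
    assert(conf != NULL);
    pthread_mutex_init(&conf->lock, NULL);
    conf->rsync_regex_valid = false;
}

/* Writer side (init/reconfigure path): free and recompile under the lock,
 * so no reader can observe a half-replaced or already-freed regex. */
int demo_set_regex(struct demo_conf *conf, const char *pattern)
{
    int ret = -1;

    pthread_mutex_lock(&conf->lock);
    if (conf->rsync_regex_valid) {
        regfree(&conf->rsync_regex);
        conf->rsync_regex_valid = false;
    }
    if (regcomp(&conf->rsync_regex, pattern, REG_EXTENDED) == 0) {
        conf->rsync_regex_valid = true;
        ret = 0;
    }
    pthread_mutex_unlock(&conf->lock);

    return ret;
}

/* Reader side (lookup path): match only while holding the same lock.
 * Returns 1 on a match, 0 on no match or when no valid regex is set. */
int demo_match(struct demo_conf *conf, const char *name)
{
    int matched = 0;

    pthread_mutex_lock(&conf->lock);
    if (conf->rsync_regex_valid)
        matched = (regexec(&conf->rsync_regex, name, 0, NULL, 0) == 0);
    pthread_mutex_unlock(&conf->lock);

    return matched;
}
```

The valid flag is checked and the regex is used inside the same critical section; checking the flag outside the lock would reopen the window in which a concurrent reconfigure frees the regex.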

Comment 7 Worker Ant 2017-01-06 04:38:54 UTC
COMMIT: http://review.gluster.org/16030 committed in master by Raghavendra G (rgowdapp) 
------
commit c6b0adb483c1d0c4922e6d4cb77abfb69d314a8e
Author: Jeff Darcy <jdarcy>
Date:   Mon Dec 5 13:01:41 2016 -0500

    libglusterfs: serialize init/reconfigure calls
    
    These functions do not generally "expect" to be called more than once
    in parallel, and many are likely to misbehave in that case (one case
    in DHT already).  Such parallel calls have not generally happened
    because there are only a few places where we call these functions, and
    those have been implicitly serialized until recently.  However, recent
    changes in the epoll layer change that, as does brick multiplexing.
    Therefore, the serialization is now explicit at the init/reconfigure
    level.
    
    It would be sufficient to serialize calls to a particular translator's
    init and reconfigure functions, but that would require per-translator
    locks and a bit more complexity in maintaining/using them.  Since
    there's no clear reason why we would need or want to support a higher
    level of parallelism, the simpler approach of a global lock should
    suffice.
    
    Change-Id: I26296c2826e91dc00b7f0c2061bcc2964ef90c4c
    BUG: 1399134
    Signed-off-by: Jeff Darcy <jdarcy>
    Reviewed-on: http://review.gluster.org/16030
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>
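
Illustration (not part of the commit): the global-lock approach described above can be sketched as a pair of wrappers that take a single process-wide mutex before calling any translator's init or reconfigure callback. The type demo_xlator, the lock demo_init_lock and all function names below are assumptions made for this sketch; they are not the libglusterfs API and not the code from review 16030.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical translator object (not the real xlator_t). */
struct demo_xlator {
    int (*init)(struct demo_xlator *self);
    int (*reconfigure)(struct demo_xlator *self);
    int generation; /* bumped by every init/reconfigure in this demo */
};

/* One global lock for all translators: simpler than per-translator locks,
 * at the cost of never running two init/reconfigure calls in parallel. */
static pthread_mutex_t demo_init_lock = PTHREAD_MUTEX_INITIALIZER;

int demo_xlator_init(struct demo_xlator *xl)
{
    int ret;

    assert(xl != NULL && xl->init != NULL);
    pthread_mutex_lock(&demo_init_lock);
    ret = xl->init(xl);
    pthread_mutex_unlock(&demo_init_lock);
    return ret;
}

int demo_xlator_reconfigure(struct demo_xlator *xl)
{
    int ret;

    assert(xl != NULL && xl->reconfigure != NULL);
    pthread_mutex_lock(&demo_init_lock);
    ret = xl->reconfigure(xl);
    pthread_mutex_unlock(&demo_init_lock);
    return ret;
}

/* Example callbacks that are not thread-safe on their own; they rely on
 * the wrappers above for serialization. */
int demo_init(struct demo_xlator *self)
{
    self->generation = 1;
    return 0;
}

int demo_reconfigure(struct demo_xlator *self)
{
    self->generation++;
    return 0;
}
```

Callers would go through demo_xlator_init and demo_xlator_reconfigure rather than invoking the callbacks directly, so the callbacks themselves need no locking against each other. This only serializes init and reconfigure with one another; it does not protect other threads that read translator state, which is what the separate DHT fix addresses.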

Comment 8 Shyamsundar 2017-03-06 17:36:17 UTC
This bug is being closed because a release that should address the reported issue is now available. If the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/