Bug 1399134 - GlusterFS client crashes during remove-brick operation
Summary: GlusterFS client crashes during remove-brick operation
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard:
Depends On: 1399100
Blocks: 1399422 1399423 1399424
Reported: 2016-11-28 11:07 UTC by Raghavendra G
Modified: 2017-03-06 17:36 UTC (History)

Fixed In Version: glusterfs-3.10.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1399100
Clones: 1399422 1399423 1399424
Environment:
Last Closed: 2017-03-06 17:36:17 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Comment 1 Worker Ant 2016-11-28 11:09:04 UTC
REVIEW: http://review.gluster.org/15945 (cluster/dht: Fix memory corruption during reconfigure) posted (#1) for review on master by Raghavendra G (rgowdapp)

Comment 2 Worker Ant 2016-11-28 11:38:36 UTC
REVIEW: http://review.gluster.org/15945 (cluster/dht: Fix memory corruption while accessing regex stored in private) posted (#2) for review on master by Raghavendra G (rgowdapp)

Comment 3 Worker Ant 2016-12-02 07:26:29 UTC
REVIEW: http://review.gluster.org/15945 (cluster/dht: Fix memory corruption while accessing regex stored in private) posted (#3) for review on master by Raghavendra G (rgowdapp)

Comment 4 Worker Ant 2016-12-05 18:11:15 UTC
REVIEW: http://review.gluster.org/16030 (libglusterfs: serialize init/reconfigure calls) posted (#1) for review on master by Jeff Darcy (jdarcy)

Comment 5 Worker Ant 2016-12-05 21:57:28 UTC
REVIEW: http://review.gluster.org/16030 (libglusterfs: serialize init/reconfigure calls) posted (#2) for review on master by Jeff Darcy (jdarcy)

Comment 6 Worker Ant 2016-12-08 17:56:45 UTC
COMMIT: http://review.gluster.org/15945 committed in master by Shyamsundar Ranganathan (srangana) 
------
commit 64451d0f25e7cc7aafc1b6589122648281e4310a
Author: Raghavendra G <rgowdapp>
Date:   Tue Nov 8 12:09:57 2016 +0530

    cluster/dht: Fix memory corruption while accessing regex stored in
    private
    
    If reconfigure is executed in parallel (or concurrently with dht_init),
    there are races that can corrupt memory. One such race is modification
    of regexes stored in conf (conf->rsync_regex_valid and
    conf->extra_regex_valid) through dht_init_regex. With change [1], the
    reconfigure codepath can get executed in parallel (with itself or with
    dht_init), so this fix is needed.
    
    Also, a reconfigure can race with any thread doing dht_layout_search,
    resulting in dht_layout_search accessing regex freed up by reconfigure
    (like in bz 1399134).
    
    [1] http://review.gluster.org/15046
    
    Change-Id: I039422a65374cf0ccbe0073441f0e8c442ebf830
    BUG: 1399134
    Signed-off-by: Raghavendra G <rgowdapp>
    Reviewed-on: http://review.gluster.org/15945
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: N Balachandran <nbalacha>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
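
The race described in the commit message above (one thread freeing and recompiling a regex while another thread is matching against it) can be illustrated with a minimal sketch. This is not the code from the patch: `struct demo_conf`, `demo_conf_init`, `demo_conf_set_regex`, and `demo_conf_match` are hypothetical names, and a plain POSIX mutex stands in for whatever locking the actual fix uses. The only assumption is that both the writer and the reader take the same lock around every access to the stored regex.

```c
#include <assert.h>
#include <pthread.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a translator's private configuration. */
struct demo_conf {
    pthread_mutex_t lock;        /* guards regex and regex_valid */
    regex_t         regex;
    bool            regex_valid; /* true only while regex holds a compiled pattern */
};

void demo_conf_init(struct demo_conf *conf)
{
    assert(conf != NULL);
    pthread_mutex_init(&conf->lock, NULL);
    conf->regex_valid = false;
}

/* Writer side (the "reconfigure" role): free the old pattern and compile
 * the new one while holding the lock, so no reader can observe a freed
 * or half-built regex. Returns 0 on success, -1 if the pattern is invalid. */
int demo_conf_set_regex(struct demo_conf *conf, const char *pattern)
{
    int ret = -1;

    pthread_mutex_lock(&conf->lock);
    if (conf->regex_valid) {
        regfree(&conf->regex);
        conf->regex_valid = false;
    }
    if (regcomp(&conf->regex, pattern, REG_EXTENDED | REG_NOSUB) == 0) {
        conf->regex_valid = true;
        ret = 0;
    }
    pthread_mutex_unlock(&conf->lock);
    return ret;
}

/* Reader side (the "layout search" role): match under the same lock. */
bool demo_conf_match(struct demo_conf *conf, const char *name)
{
    bool matched = false;

    pthread_mutex_lock(&conf->lock);
    if (conf->regex_valid)
        matched = (regexec(&conf->regex, name, 0, NULL, 0) == 0);
    pthread_mutex_unlock(&conf->lock);
    return matched;
}
```

Holding the lock across `regexec` keeps the sketch simple at the cost of serializing readers; a read-write lock or reference counting would be alternatives if lookup concurrency matters.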

Comment 7 Worker Ant 2017-01-06 04:38:54 UTC
COMMIT: http://review.gluster.org/16030 committed in master by Raghavendra G (rgowdapp) 
------
commit c6b0adb483c1d0c4922e6d4cb77abfb69d314a8e
Author: Jeff Darcy <jdarcy>
Date:   Mon Dec 5 13:01:41 2016 -0500

    libglusterfs: serialize init/reconfigure calls
    
    These functions do not generally "expect" to be called more than once
    in parallel, and many are likely to misbehave in that case (one case
    in DHT already).  Such parallel calls have not generally happened
    because there are only a few places where we call these functions, and
    those have been implicitly serialized until recently.  However, recent
    changes in the epoll layer change that, as does brick multiplexing.
    Therefore, the serialization is now explicit at the init/reconfigure
    level.
    
    It would be sufficient to serialize calls to a particular translator's
    init and reconfigure functions, but that would require per-translator
    locks and a bit more complexity in maintaining/using them.  Since
    there's no clear reason why we would need or want to support a higher
    level of parallelism, the simpler approach of a global lock should
    suffice.
    
    Change-Id: I26296c2826e91dc00b7f0c2061bcc2964ef90c4c
    BUG: 1399134
    Signed-off-by: Jeff Darcy <jdarcy>
    Reviewed-on: http://review.gluster.org/16030
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>
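
The global-lock approach described in the commit message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the libglusterfs implementation: `struct demo_xlator`, `demo_lifecycle_lock`, `demo_xlator_init`, `demo_xlator_reconfigure`, and the counting callbacks are all hypothetical names, and the real translator structure and entry points differ.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified translator object; not the real xlator type. */
struct demo_xlator {
    int (*init)(struct demo_xlator *xl);
    int (*reconfigure)(struct demo_xlator *xl, const char *options);
    void *private_data;
};

/* A single process-wide lock shared by every translator. */
static pthread_mutex_t demo_lifecycle_lock = PTHREAD_MUTEX_INITIALIZER;

/* All callers go through these wrappers, so at most one init or
 * reconfigure callback runs at any time, across all translators. */
int demo_xlator_init(struct demo_xlator *xl)
{
    int ret;

    assert(xl != NULL && xl->init != NULL);
    pthread_mutex_lock(&demo_lifecycle_lock);
    ret = xl->init(xl);
    pthread_mutex_unlock(&demo_lifecycle_lock);
    return ret;
}

int demo_xlator_reconfigure(struct demo_xlator *xl, const char *options)
{
    int ret;

    assert(xl != NULL && xl->reconfigure != NULL);
    pthread_mutex_lock(&demo_lifecycle_lock);
    ret = xl->reconfigure(xl, options);
    pthread_mutex_unlock(&demo_lifecycle_lock);
    return ret;
}

/* Example callbacks that only count how often each entry point ran. */
struct demo_counts {
    int inits;
    int reconfigures;
};

int demo_count_init(struct demo_xlator *xl)
{
    struct demo_counts *counts = xl->private_data;

    counts->inits++;
    return 0;
}

int demo_count_reconfigure(struct demo_xlator *xl, const char *options)
{
    struct demo_counts *counts = xl->private_data;

    (void)options; /* unused in this sketch */
    counts->reconfigures++;
    return 0;
}
```

A per-translator lock would only require moving the mutex into `struct demo_xlator`; the commit message explains why the simpler global lock was preferred.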

Comment 8 Shyamsundar 2017-03-06 17:36:17 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/
