Bug 1342964

Summary: self heal daemon killed due to oom kills on a dist-disperse volume using nfs ganesha
Product: [Community] GlusterFS Reporter: Ashish Pandey <aspandey>
Component: disperse Assignee: Ashish Pandey <aspandey>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 3.7.11 CC: amukherj, bugs, byarlaga, nchilaka, pkarampu, rcyriac, rhinduja
Target Milestone: --- Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.7.12 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1342954 Environment:
Last Closed: 2016-06-28 12:19:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1342426, 1342796, 1342954    
Bug Blocks:    

Comment 1 Vijay Bellur 2016-06-06 08:53:43 UTC
REVIEW: http://review.gluster.org/14652 (cluster/ec: Restrict the launch of replace brick heal) posted (#1) for review on release-3.7 by Ashish Pandey (aspandey)

Comment 2 Vijay Bellur 2016-06-08 06:58:11 UTC
REVIEW: http://review.gluster.org/14652 (cluster/ec: Restrict the launch of replace brick heal) posted (#2) for review on release-3.7 by Ashish Pandey (aspandey)

Comment 3 Vijay Bellur 2016-06-09 09:04:14 UTC
COMMIT: http://review.gluster.org/14652 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu) 
------
commit 37ac79fe54e91a8ffe23318855f2cda582a27798
Author: Ashish Pandey <aspandey>
Date:   Mon Jun 6 10:17:54 2016 +0530

    cluster/ec: Restrict the launch of replace brick heal
    
    Problem: When features.cache-invalidation is ON, ec_notify
    gets called very often, which leads to the launch of
    too many heals. As a result no heal completes,
    and heals accumulate.
    
    Solution: ec_launch_replace_heal should not be launched
    for every event. Replace brick will trigger a child up
    event, and only then should this heal function be called.
    
    master -
    http://review.gluster.org/#/c/14649/
    
    Change-Id: I57b44c6a279d57230daea1d93229be6069245b7d
    BUG: 1342964
    Signed-off-by: Ashish Pandey <aspandey>
    Reviewed-on: http://review.gluster.org/14652
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Smoke: Gluster Build System <jenkins.com>
    Reviewed-by: Xavier Hernandez <xhernandez>
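
The idea in the commit message above can be illustrated with a minimal, self-contained sketch. This is not the actual patch: every name here (sketch_event_t, sketch_ctx_t, sketch_notify_unrestricted, sketch_notify_restricted, sketch_launch_replace_heal) is hypothetical, and the "before" behaviour is only assumed from the commit's wording that the heal was launched "for every event". It shows why gating the heal launch on a child-up event keeps frequent notifications, such as cache-invalidation upcalls, from piling up heals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds; the real GlusterFS event set is larger. */
typedef enum {
    SKETCH_EVENT_CHILD_UP,
    SKETCH_EVENT_CHILD_DOWN,
    SKETCH_EVENT_UPCALL /* e.g. a cache-invalidation notification */
} sketch_event_t;

/* Hypothetical per-volume state: only counts the heals launched. */
typedef struct {
    unsigned heals_launched;
} sketch_ctx_t;

/* Stand-in for the replace-brick heal launcher named in the commit. */
static void sketch_launch_replace_heal(sketch_ctx_t *ctx)
{
    ctx->heals_launched++;
}

/* Assumed "before" behaviour: every notification launches a heal. */
static void sketch_notify_unrestricted(sketch_ctx_t *ctx, sketch_event_t ev)
{
    assert(ctx != NULL);
    (void)ev; /* the event kind is ignored */
    sketch_launch_replace_heal(ctx);
}

/* Sketched "after" behaviour: only a child-up event launches a heal. */
static void sketch_notify_restricted(sketch_ctx_t *ctx, sketch_event_t ev)
{
    assert(ctx != NULL);
    if (ev == SKETCH_EVENT_CHILD_UP)
        sketch_launch_replace_heal(ctx);
}
```

Under these assumptions, feeding 1000 upcall events followed by one child-up event into each handler would launch 1001 heals in the unrestricted version and a single heal in the restricted one.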

Comment 4 Kaushal 2016-06-28 12:19:47 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.12, please open a new bug report.

glusterfs-3.7.12 has been announced on the Gluster mailing lists [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-devel/2016-June/049918.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user