Bug 1342954
| Summary: | self-heal daemon killed by the OOM killer on a dist-disperse volume using nfs-ganesha | |||
|---|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Ashish Pandey <aspandey> | |
| Component: | disperse | Assignee: | Ashish Pandey <aspandey> | |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | ||
| Severity: | urgent | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 3.8.0 | CC: | amukherj, bugs, byarlaga, nchilaka, pkarampu, rcyriac, rhinduja | |
| Target Milestone: | --- | Keywords: | ZStream | |
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | glusterfs-3.8.0 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | 1342796 | |||
| Clones: | 1342964 | Environment: | ||
| Last Closed: | 2016-06-16 12:33:36 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 1342426, 1342796 | |||
| Bug Blocks: | 1342964 | |||
Comment 1
Vijay Bellur
2016-06-06 08:40:04 UTC
--- Additional comment from Pranith Kumar K on 2016-06-04 07:52:19 EDT ---
Steps to re-create the issue without nfs-ganesha (the issue appears to be caused by cache-invalidation combined with ec; cache invalidation is enabled whenever nfs-ganesha is enabled):
On a single machine, the issue can be re-created with the following steps:
1) glusterd && gluster v create ec2 redundancy 2 localhost.localdomain:/home/gfs/ec_{0..5} force && gluster v start ec2 && mount -t glusterfs localhost.localdomain:/ec2 /mnt/fuse1 && mount -t glusterfs localhost.localdomain:/ec2 /mnt/ec2
2) gluster volume set ec2 features.cache-invalidation on
3) In two different terminals, one in /mnt/ec2 and the other in /mnt/fuse1, execute:
while true; do echo abc > a; done
4) Execute gluster volume heal ec2 in a loop 10 times; the command may hang partway through.
5) Keep observing the memory usage of the self-heal daemon (shd) going up with:
top -p <pid-of-shd>
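As a scriptable alternative to watching top in step 5, the shd's resident memory can be sampled from /proc and the growth logged. This is a Linux-only sketch that is not part of GlusterFS: the helper names read_vmrss_kb and watch_rss_growth are hypothetical, and the shd PID still has to be found separately (for example, gluster volume status lists a PID for the Self-heal Daemon).

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the resident set size (VmRSS) of process `pid` in kB, or -1 if
 * /proc/<pid>/status cannot be read or has no VmRSS line. Linux-only. */
static long read_vmrss_kb(pid_t pid)
{
    char path[64];
    char line[256];
    long rss_kb = -1;

    snprintf(path, sizeof(path), "/proc/%ld/status", (long)pid);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof(line), fp) != NULL) {
        /* The line looks like "VmRSS:\t   12345 kB"; %ld skips the blanks. */
        if (strncmp(line, "VmRSS:", 6) == 0) {
            if (sscanf(line + 6, "%ld", &rss_kb) != 1)
                rss_kb = -1;
            break;
        }
    }
    fclose(fp);
    return rss_kb;
}

/* Print `samples` VmRSS readings of `pid`, `interval_s` seconds apart, and
 * return the growth in kB from the first to the last reading
 * (LONG_MIN if samples < 1 or any reading fails). */
static long watch_rss_growth(pid_t pid, int samples, unsigned int interval_s)
{
    long first = 0, last = 0;

    if (samples < 1)
        return LONG_MIN;

    for (int i = 0; i < samples; i++) {
        long rss = read_vmrss_kb(pid);
        if (rss < 0)
            return LONG_MIN;
        printf("sample %d: VmRSS = %ld kB\n", i, rss);
        if (i == 0)
            first = rss;
        last = rss;
        if (i + 1 < samples)
            sleep(interval_s);
    }
    return last - first;
}
```

For example, calling watch_rss_growth(shd_pid, 60, 10) (shd_pid being a hypothetical variable holding the PID) logs roughly ten minutes of readings while steps 3 and 4 run. A steadily positive result matches the growth described in step 5, although RSS growth alone does not pinpoint the cause.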
REVIEW: http://review.gluster.org/14651 (cluster/ec: Restrict the launch of replace brick heal) posted (#2) for review on release-3.8 by Ashish Pandey (aspandey)

COMMIT: http://review.gluster.org/14651 committed in release-3.8 by Niels de Vos (ndevos)
------
commit c8d78fa265b8b938bbaee5bc8a59b60a58ae0440
Author: Ashish Pandey <aspandey>
Date: Mon Jun 6 10:17:54 2016 +0530

cluster/ec: Restrict the launch of replace brick heal

Problem: When features.cache-invalidation is ON, a lot of ec_notify function gets called which leads to launch of too many heals. This leads to no heal completion, which causes accumulation of heals.

Solution: ec_launch_replace_heal should not be launch for every event. Replace brick will trigger a child up event and then only this heal function should be called.

master - http://review.gluster.org/#/c/14649/

Change-Id: I57b44c6a279d57230daea1d93229be6069245b7d
BUG: 1342954
Signed-off-by: Ashish Pandey <aspandey>
Reviewed-on: http://review.gluster.org/14651
Reviewed-by: Xavier Hernandez <xhernandez>
Smoke: Gluster Build System <jenkins.com>
CentOS-regression: Gluster Build System <jenkins.com>
NetBSD-regression: NetBSD Build System <jenkins.org>

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
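The change in commit c8d78fa2 (launch the replace-brick heal only on a child-up event instead of on every notification) can be modelled in a few lines. This is a hypothetical sketch, not the actual ec_notify code: the event names EV_CHILD_UP, EV_CHILD_DOWN and EV_UPCALL and the functions notify_before_fix, notify_after_fix and count_heals are illustrative. It assumes that each cache-invalidation notification reaches the disperse translator as its own notify call.

```c
#include <stddef.h>

/* Hypothetical event types, loosely modelled on (but not identical to)
 * the notification events GlusterFS passes to ec_notify. */
typedef enum {
    EV_CHILD_UP,   /* a brick came up, e.g. after replace-brick */
    EV_CHILD_DOWN, /* a brick went down */
    EV_UPCALL      /* a cache-invalidation notification */
} event_t;

typedef void (*notify_fn)(size_t *heals_launched, event_t ev);

/* Before the fix: every notification launches a replace-brick heal. */
static void notify_before_fix(size_t *heals_launched, event_t ev)
{
    (void)ev; /* the event type is ignored, which is the problem */
    (*heals_launched)++;
}

/* After the fix: only a child-up event launches a replace-brick heal. */
static void notify_after_fix(size_t *heals_launched, event_t ev)
{
    if (ev == EV_CHILD_UP)
        (*heals_launched)++;
}

/* Feed `n` events through `notify` and return how many heals it launched. */
static size_t count_heals(notify_fn notify, const event_t *events, size_t n)
{
    size_t launched = 0;

    for (size_t i = 0; i < n; i++)
        notify(&launched, events[i]);
    return launched;
}
```

With one EV_CHILD_UP followed by 1000 EV_UPCALL events, count_heals returns 1001 for notify_before_fix and 1 for notify_after_fix. The model only counts launches; it does not attempt to reproduce how the unfinished heals accumulate memory in shd.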