Bug 1348095
Summary: | GlusterFS (shd) memory leak on bricks reconnection | | |
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Oleksandr Natalenko <oleksandr> |
Component: | core | Assignee: | Pranith Kumar K <pkarampu> |
Status: | CLOSED EOL | QA Contact: | |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | | |
Version: | 3.7.13 | CC: | bugs, einstcrazy, kdhananj, ndevos, oleksandr, pkarampu, ravishankar, rgowdapp, rhbugzilla |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | --- | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2017-03-08 10:47:55 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Oleksandr Natalenko, 2016-06-20 07:14:58 UTC
Also, I've tried to simulate bricks flapping by pkilling them. Indeed, client memory usage grows on each flap, but in the end everything seems to get cleaned up, because I saw no difference between two valgrind outputs (one taken after flapping the bricks and another without any flapping). So the only more or less reliable source of information about the leak should be a runtime statedump.

Joonas Vilenius (comment #2):

We've been using GlusterFS as a storage backend for OpenStack (Kilo) Cinder on CentOS 7 and have experienced quite a few memory leak issues starting with 3.6.9. We then upgraded to 3.7.11, in which the leak was roughly 10x smaller but still very much there, as can be seen from these graphs: http://ajaton.net/glusterfs/ - unfortunately this causes libvirtd to run out of memory over time (hours/days), which in turn causes other issues. Our journey to 3.8.0 faced other major difficulties (bug 1348935 filed), so we are back on 3.7.11.

We can reproduce the issue fastest by simply creating an instance and a volume and then repeatedly attaching/detaching the volume to the instance until libvirtd dies. It is also enough to just have the mount present for the leak to be visible. So far we have only observed the memory consumption of libvirtd (VmSize and VmRSS, as graphed at the link above).

I'm just adding the info here as was discussed on IRC; please let me know if there is anything I can do to help debug the issue!

(In reply to Joonas Vilenius from comment #2)

Joonas,

Are you using gfapi+libvirt for your setup? I believe the issue you are reporting is a bit different from what Oleksandr reported, because no disconnects are involved. Do you think you can recreate the same issue with a fuse mount? If yes, we can take statedumps of the mount process and then we will know which process is leaking.

Pranith

Yes, I noticed once I added the comment. However, this was the ticket I was referred to on IRC #gluster by Oleksandr Natalenko. I guess I could try switching to fuse for testing purposes; we had that initially, but since it belonged to the same cgroup as other components, we got restarts that disconnected the mounts completely (https://bugs.launchpad.net/nova/+bug/1530860).
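Since both the report above and Pranith's reply point to runtime statedumps as the most reliable way to see which process is leaking, here is a minimal sketch of how such dumps could be captured. Assumptions (not from this bug): the default statedump directory /var/run/gluster and the SIGUSR1 trigger honoured by GlusterFS daemons; the glustershd pid file path is the one shown in the ps output further down, and the pgrep pattern would need adjusting for a fuse mount.

```sh
# Minimal sketch: capture runtime statedumps of glustershd (or a fuse mount)
# before and after brick reconnections, then compare them.
# Assumption: default statedump directory /var/run/gluster.

pid=$(cat /var/lib/glusterd/glustershd/run/glustershd.pid)   # for a fuse mount: pgrep -f 'glusterfs.*<mount-point>'

kill -USR1 "$pid"            # GlusterFS processes write a statedump on SIGUSR1
sleep 5
ls -lt /var/run/gluster/     # the newest *.dump.* file is the dump just taken

# Repeat after the bricks have flapped/reconnected and diff the mem-pool /
# mem-usage sections of the two dumps to see which allocations keep growing.
```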
Verified with 3.7.13:

===
root 4691 0.0 21.3 10847624 319440 ? Ssl Jul14 0:31 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/be1fed1e4f202cab410a1e89561d15d9.socket --xlator-option *replicate*.node-uuid=cd31965a-ea4a-4a7e-a4e7-b56a07de715
===

10G VSZ after multiple reconnections due to network issues.

This bug is being closed because GlusterFS 3.7 has reached its end of life. Note: this bug is being closed using a script. No verification has been performed to check whether it still exists on newer releases of GlusterFS. If this bug still exists in newer GlusterFS releases, please reopen it against the newer release.
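For reference, a small sketch of how the VmSize/VmRSS growth described in this bug (the 10G VSZ of glustershd above, and the libvirtd graphs from comment #2) could be sampled over time. The pid file path comes from the ps output above; the sampling interval and log path are arbitrary choices, not part of the original report.

```sh
# Minimal sketch: log VmSize/VmRSS of glustershd every 5 minutes so that
# growth across brick reconnections is visible without full graphs.
pid=$(cat /var/lib/glusterd/glustershd/run/glustershd.pid)
while sleep 300; do
    printf '%s %s\n' "$(date -u '+%F %T')" \
        "$(grep -E 'VmSize|VmRSS' /proc/"$pid"/status | tr -s '[:space:]' ' ')"
done >> /var/log/glustershd-mem.log   # log path is an arbitrary example
```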