Bug 1348095 - GlusterFS (shd) memory leak on bricks reconnection
Summary: GlusterFS (shd) memory leak on bricks reconnection
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 3.7.13
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-06-20 07:14 UTC by Oleksandr Natalenko
Modified: 2017-03-08 10:47 UTC
CC: 9 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-03-08 10:47:55 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Oleksandr Natalenko 2016-06-20 07:14:58 UTC
We use v3.7.11 in a replica 2 setup between 2 nodes, plus 1 dummy node for 
keeping volume metadata.

We observe huge VSZ (VIRT) usage by glustershd on the dummy node:

===
root     15109  0.0 13.7 76552820 535272 ?     Ssl  May26   2:11 
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p 
/var/lib/glusterd/glustershd/run/glustershd.pid -l 
/var/log/glusterfs/glustershd.log -S 
/var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option 
*replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
===

That is ~73G. RSS seems to be OK (~522M). Here is the statedump of the 
glustershd process: https://gist.github.com/d2cfa25251136512580220fcdb8a6ce6

Also, here is the sum of the sizes reported in the statedump:

===
# cat /var/run/gluster/glusterdump.15109.dump.1465200139 | awk -F '=' 
'BEGIN {sum=0} /^size=/ {sum+=$2} END {print sum}'
353276406
===

That is ~337 MiB.

Also, I see lots of entries like these in the pmap output:

===
00007ef9ff8f3000      4K -----   [ anon ]
00007ef9ff8f4000   8192K rw---   [ anon ]
00007efa000f4000      4K -----   [ anon ]
00007efa000f5000   8192K rw---   [ anon ]
===

If I sum them, I get the following:

===
# pmap 15109 | grep '[ anon ]' | grep 8192K | wc -l
9261
$ echo "9261*(8192+4)" | bc
75903156
===

That is 75,903,156 KiB, roughly 72 GiB, which matches the 70G+ of VIRT shown above.

Also, here are the VIRT values from the 2 replica nodes:

===
root     24659  0.0  0.3 5645836 451796 ?      Ssl  May24   3:28 
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p 
/var/lib/glusterd/glustershd/run/glustershd.pid -l 
/var/log/glusterfs/glustershd.log -S 
/var/run/gluster/44ec3f29003eccedf894865107d5db90.socket --xlator-option 
*replicate*.node-uuid=a19afcc2-e26c-43ce-bca6-d27dc1713e87
root     18312  0.0  0.3 6137500 477472 ?      Ssl  May19   6:37 
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p 
/var/lib/glusterd/glustershd/run/glustershd.pid -l 
/var/log/glusterfs/glustershd.log -S 
/var/run/gluster/1670a3abbd1eea968126eb6f5be20322.socket --xlator-option 
*replicate*.node-uuid=52dca21b-c81c-48b5-9de2-1ed37987fbc2
===

Those are 5 to 6G, which is much less than on the dummy node, but still 
looks too big to us.

Digging into the logs showed that there are lots of reconnections to bricks (due to a failing network). VSZ and RSS grow on each reconnection: VSZ by ~24M, RSS by ~500K.

I've taken 5 statedumps, 30 minutes apart. Before taking each statedump, 
I recorded the memory usage.
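
For reference, a minimal sketch of how such measurements can be collected 
(assuming the default statedump directory /var/run/gluster and the usual 
SIGUSR1 statedump trigger of glusterfs processes; the pid is the one from 
this host):

===
# record the current VSZ/RSS of glustershd
ps -o vsz=,rss= -p 1010 >> shd-memory.log
# ask the process to write a statedump (lands in /var/run/gluster by default)
kill -USR1 1010
# show the newest dump file for this pid
ls -t /var/run/gluster/glusterdump.1010.dump.* | head -1
===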

Memory consumption:

1. root      1010  0.0  9.6 7538188 374864 ?      Ssl  Jun07   0:16 
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p 
/var/lib/glusterd/glustershd/run/glustershd.pid -l 
/var/log/glusterfs/glustershd.log -S 
/var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option 
*replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
2. root      1010  0.0  9.6 7825048 375312 ?      Ssl  Jun07   0:16 
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p 
/var/lib/glusterd/glustershd/run/glustershd.pid -l 
/var/log/glusterfs/glustershd.log -S 
/var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option 
*replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
3. root      1010  0.0  9.6 7825048 375312 ?      Ssl  Jun07   0:17 
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p 
/var/lib/glusterd/glustershd/run/glustershd.pid -l 
/var/log/glusterfs/glustershd.log -S 
/var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option 
*replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
4. root      1010  0.0  9.6 8202064 375892 ?      Ssl  Jun07   0:17 
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p 
/var/lib/glusterd/glustershd/run/glustershd.pid -l 
/var/log/glusterfs/glustershd.log -S 
/var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option 
*replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
5. root      1010  0.0  9.6 8316808 376084 ?      Ssl  Jun07   0:17 
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p 
/var/lib/glusterd/glustershd/run/glustershd.pid -l 
/var/log/glusterfs/glustershd.log -S 
/var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option 
*replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7

As you can see, VIRT grows constantly (except for one measurement), and 
RSS grows as well, although its increase is considerably smaller.

Now let's take a look at the statedumps:

1. https://gist.github.com/3fa121c7531d05b210b84d9db763f359
2. https://gist.github.com/87f48b8ac8378262b84d448765730fd9
3. https://gist.github.com/f8780014d8430d67687c70cfd1df9c5c
4. https://gist.github.com/916ac788f806328bad9de5311ce319d7
5. https://gist.github.com/8ba5dbf27d2cc61c04ca954d7fb0a7fd

I'd go with comparing the first statedump with the last one; here is the 
diff output: https://gist.github.com/e94e7f17fe8b3688c6a92f49cbc15193

I see numbers changing, but cannot yet conclude what is meaningful and 
what is not.
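
A rough way to narrow this down (a sketch, assuming the "[... memusage]" 
sections with size=/num_allocs= lines seen in the dumps above; first.dump 
and last.dump are placeholder filenames) is to pair each size counter with 
its section and diff the first and the last dump:

===
for f in first.dump last.dump; do
    # prefix every size= line with the section header it belongs to
    awk '/^\[/ {sec=$0} /^size=/ {print sec, $0}' "$f" | sort > "$f.sizes"
done
# show only the entries that changed, largest sizes last
diff -u first.dump.sizes last.dump.sizes | grep '^[+-][^+-]' | sort -t= -k2 -n | tail -20
===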

Comment 1 Oleksandr Natalenko 2016-06-20 09:13:22 UTC
Also, I've tried to simulate brick flapping by pkilling the bricks. Indeed, client memory usage grows on each flap, but in the end everything seems to get cleaned up, because I saw no difference between two valgrind outputs (one taken after flapping the bricks and another without any brick flapping).
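
The valgrind comparison mentioned above can be done along these lines (a 
sketch only; VOLNAME and /mnt/test are placeholders, not values from this 
setup):

===
# run a fuse client in the foreground (-N) under valgrind so the leak
# report covers the actual mount process
valgrind --leak-check=full --show-reachable=yes --log-file=valgrind-flap.log \
    /usr/sbin/glusterfs -N --volfile-server=localhost --volfile-id=VOLNAME /mnt/test
# in another shell: pkill the brick processes, wait for the client to
# reconnect, repeat a few times, then unmount
umount /mnt/test        # the leak summary is written when the client exits
diff valgrind-noflap.log valgrind-flap.log
===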

So, the only more or less reliable source of information about the leak should be a runtime statedump.

Comment 2 Joonas Vilenius 2016-06-22 20:49:38 UTC
We've been using GlusterFS as a storage backend for OpenStack (Kilo) Cinder on CentOS 7 and have experienced quite a few memory leak issues starting with 3.6.9. We then upgraded to 3.7.11, in which the leak was roughly 10x smaller but still very much there, as can be seen from these graphs: http://ajaton.net/glusterfs/ . This unfortunately causes libvirtd to run out of memory over time (hours/days), which in turn causes other issues. Our journey to 3.8.0 faced other major difficulties (bug 1348935 filed), so we are back on 3.7.11.

We can reproduce the issue fastest by simply creating an instance and a volume and then repeatedly attaching/detaching the volume to the instance until libvirtd dies. It is also enough to just have the mount present for the leak to be visible. So far we have only observed the memory consumption of libvirtd (VmSize and VmRSS, as graphed at the link above).

I'm just adding the info here as discussed on IRC; please let me know if there is anything I can do to help debug the issue!

Comment 3 Pranith Kumar K 2016-06-23 09:07:35 UTC
(In reply to Joonas Vilenius from comment #2)
> We've been using GlusterFS as a storage backend for OpenStack (Kilo) Cinder
> on CentOS 7 and have experienced quite a bit of memory leak issues starting
> with 3.6.9. We then upgraded to 3.7.11 in which the leak was roughly 10x
> less but still very much there as can be seen from these graphs
> http://ajaton.net/glusterfs/ - this unfortunately causes libvirtd to run out
> of memory over the time (hours/days) which again causes other issues. Our
> journey to 3.8.0 faced other major difficulties (bug filed 1348935) so we
> are back in 3.7.11. 
> 
> We can reproduce the issue fastest by simply creating an instance and volume
> and then keep attaching/detaching the volume to the instance until libvirtd
> dies. It is also enough to just have the mount there to have the leak
> visible. We have only observed so far the memory consumption of libvirtd
> (VmSize and VmRSS as graphed on above link).
> 
> I'm just adding the info here as was discussed on IRC, please let me know if
> there is anything i can do to help debugging the issue!

Joonas,
     Are you using gfapi+libvirt for your setup? I believe the issue you are reporting is a bit different from what Oleksandr reported, because no disconnects are involved. Do you think you can recreate the same issue with a fuse mount? If yes, we can take statedumps of the mount process, and then we will know which part of the process is leaking.

Pranith

Comment 4 Joonas Vilenius 2016-06-23 10:37:17 UTC
Yes, I noticed once I had added the comment. However, this was the ticket I was referred to from IRC #gluster by Oleksandr Natalenko. I guess I could try switching to fuse for testing purposes; we had that initially, but since it belonged to the same cgroup as other components, we got restarts that disconnected the mounts completely (https://bugs.launchpad.net/nova/+bug/1530860).

Comment 5 Oleksandr Natalenko 2016-07-22 14:57:33 UTC
Verified with 3.7.13:

===
root      4691  0.0 21.3 10847624 319440 ?     Ssl  Jul14   0:31 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/be1fed1e4f202cab410a1e89561d15d9.socket --xlator-option *replicate*.node-uuid=cd31965a-ea4a-4a7e-a4e7-b56a07de715
===

10G VSZ after multiple reconnections due to network issues.

Comment 6 Kaushal 2017-03-08 10:47:55 UTC
This bug is being closed because GlusterFS-3.7 has reached its end of life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.

