Created attachment 1594280 [details]  gluster volume statedump gluster

Description of problem:
3-node cluster with a single volume with 3 replicas that was running 3.12.15. We had a brick corruption on node 2, and while looking at the logs I saw that this node was leaking memory heavily in the glusterd process. Since 3.12 was EOL, we decided to upgrade to 6.4 after healing, and found that the leak was still present and even more aggressive. We have a cron job that restarts glusterd every morning so the machine does not OOM.

Version-Release number of selected component (if applicable):
https://launchpad.net/~gluster/+archive/ubuntu/glusterfs-6
```
ii  glusterfs-client  6.4-ubuntu1~xenial1  amd64  clustered file-system (client package)
ii  glusterfs-common  6.4-ubuntu1~xenial1  amd64  GlusterFS common libraries and translator modules
ii  glusterfs-server  6.4-ubuntu1~xenial1  amd64  clustered file-system (server package)
```

From the affected gluster node, volume named "gluster" (3 replicas):
```
> gluster peer status
Number of Peers: 2

Hostname: 172.27.39.82
Uuid: 2cc7ba6f-5478-4b27-b647-0c1527192f5a
State: Peer in Cluster (Connected)

Hostname: 172.27.39.84
Uuid: 180e8f78-fa85-4cb8-8bbd-b0924a16ba60
State: Peer in Cluster (Connected)

> gluster volume status gluster
Status of volume: gluster
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 172.27.39.82:/mnt/xfs-drive-gluster/b
rick                                        49152     0          Y       1435
Brick 172.27.39.81:/mnt/xfs-drive-gluster/b
rick                                        49152     0          Y       1458
Brick 172.27.39.84:/mnt/xfs-drive-gluster/b
rick                                        49152     0          Y       1496
Self-heal Daemon on localhost               N/A       N/A        Y       38966
Bitrot Daemon on localhost                  N/A       N/A        Y       39084
Scrubber Daemon on localhost                N/A       N/A        Y       39119
Self-heal Daemon on 172.27.39.82            N/A       N/A        Y       1452
Bitrot Daemon on 172.27.39.82               N/A       N/A        Y       1469
Scrubber Daemon on 172.27.39.82             N/A       N/A        Y       1507
Self-heal Daemon on 172.27.39.84            N/A       N/A        Y       1516
Bitrot Daemon on 172.27.39.84               N/A       N/A        Y       1535
Scrubber Daemon on 172.27.39.84             N/A       N/A        Y       1614

Task Status of Volume gluster
------------------------------------------------------------------------------
There are no active volume tasks
```

How reproducible:
Until shortly before the brick recovery it used to take weeks for the memory to grow; now it happens on every restart of glusterd.

Steps to Reproduce:
1. Boot the 3-node cluster.
2. Wait 12-24h.
3. Observe that node 2's glusterd process is consuming RAM at a very fast pace (~2.5GB/day) and needs to be restarted, while the other 2 nodes have only consumed < 500MB.

Actual results:
The machine OOM-kills the process once glusterd consumes all available memory.

Expected results:
glusterd should release unused memory and stay within the memory that is available.

Additional info:
Nodes have 8GB RAM and two cores, and run only Gluster as a Kubernetes storage backend for pods to mount from.
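For reference, a minimal sketch of the kind of daily-restart cron mentioned above (the file name, time, and service name are assumptions, not taken from the report; on Ubuntu the unit may be glusterd or glusterfs-server depending on packaging):
```
# /etc/cron.d/glusterd-restart  -- hypothetical file; restart glusterd every morning
# so its leaked memory is released before the node OOMs.
0 3 * * * root systemctl restart glusterd
```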
Some extra runtime context, let me know what else could be useful:

> lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.6 LTS
Release:        16.04
Codename:       xenial

> uname -a
Linux staging-glusterfs-delve-002 4.4.0-157-generic #185-Ubuntu SMP Tue Jul 23 09:17:01 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

> xfs_info /dev/mapper/vg_gluster-gluster
meta-data=/dev/mapper/vg_gluster-gluster isize=512    agcount=32, agsize=6143984 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0
data     =                       bsize=4096   blocks=196607488, imaxpct=25
         =                       sunit=16     swidth=16 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=96000, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

> getfattr -d -m. -ehex /mnt/xfs-drive-gluster/brick/
getfattr: Removing leading '/' from absolute path names
# file: mnt/xfs-drive-gluster/brick/
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x60ae0ddf67d04b23b6940250c17a2f04

> df -Th
Filesystem                      Type            Size  Used  Avail Use%  Mounted on
udev                            devtmpfs        3,9G     0   3,9G   0%  /dev
tmpfs                           tmpfs           799M   55M   745M   7%  /run
/dev/vda1                       ext4             40G  5,1G    33G  14%  /
tmpfs                           tmpfs           3,9G   12K   3,9G   1%  /dev/shm
tmpfs                           tmpfs           5,0M     0   5,0M   0%  /run/lock
tmpfs                           tmpfs           3,9G     0   3,9G   0%  /sys/fs/cgroup
/dev/mapper/vg_gluster-gluster  xfs             750G  306G   445G  41%  /mnt/xfs-drive-gluster
tmpfs                           tmpfs           3,9G   12K   3,9G   1%  /var/lib/kubelet/pods/1315d812-acc9-11e9-afb7-fa163e7471c5/volumes/kubernetes.io~secret/default-token-2hg82
tmpfs                           tmpfs           3,9G   12K   3,9G   1%  /var/lib/kubelet/pods/1315b953-acc9-11e9-afb7-fa163e7471c5/volumes/kubernetes.io~secret/weave-net-token-rdg8q
overlay                         overlay          40G  5,1G    33G  14%  /var/lib/docker/overlay2/892baf20d8f9577409f09da9edca4749212bb20253c78ea15d92ff02eb988f19/merged
shm                             tmpfs            64M     0    64M   0%  /var/lib/docker/containers/890608c20ee39b1f41b41168c60c09694e0e8e8a0652d5b9477f991fcb10ef5a/mounts/shm
overlay                         overlay          40G  5,1G    33G  14%  /var/lib/docker/overlay2/cf40ce40639e1f8a3159d084f2f3e23ec234e9f805a3e968dbf97451ee5252e7/merged
overlay                         overlay          40G  5,1G    33G  14%  /var/lib/docker/overlay2/d1629abc8688af0de0934b7ba7639bc6c4082ef094de1c2472ff4bc836e73286/merged
shm                             tmpfs            64M     0    64M   0%  /var/lib/docker/containers/4b21fe68abff88477711a1eb43898ec0d488c12043d7dc2511d56d14ca902c97/mounts/shm
overlay                         overlay          40G  5,1G    33G  14%  /var/lib/docker/overlay2/007e79620249bffe757e2faf6e0c5239143061d5197ad9846bdcf76b83f5e627/merged
overlay                         overlay          40G  5,1G    33G  14%  /var/lib/docker/overlay2/b1e359e20d3f81d89e1e20def9dafd1bf3e6aba51bb6d342d94b366ca832430a/merged
shm                             tmpfs            64M     0    64M   0%  /var/lib/docker/containers/9ae0233e20879f88c3128584a3c7bf4a35100037a2580e7bdb4b960237a8a5b0/mounts/shm
overlay                         overlay          40G  5,1G    33G  14%  /var/lib/docker/overlay2/88c1abb4dcb36b04e7755798bc067a85863a393cedfa22383bf6b25523d5cc45/merged
tmpfs                           tmpfs           799M     0   799M   0%  /run/user/0
172.27.39.81:/gluster           fuse.glusterfs  750G  315G   435G  42%  /mnt/gluster
Version 3.12 is EOL and we have made several memory-leak fixes since then; if this issue persists in the latest releases (release-5 or release-6), kindly reopen. Since there is no active 3.12 version, when moving the bug from RHGS to GlusterFS I have to choose the mainline version, even though this isn't actually applicable to mainline.
GLUSTERD version affected: 6.4

Hi,
I've only mentioned 3.12 for the background, but if you read further you'll see this is a bug on 6.4.
Thanks for reopening this.
(In reply to Alex from comment #4)
> GLUSTERD version affected: 6.4
>
> Hi,
> I've only mentioned 3.12 for the background, but if you read further you'll
> see this is a bug on 6.4.
> Thanks for reopening this.

Please provide the following:
1. gluster volume info
2. ps <pid> of the process that is consuming memory
3. A statedump of the process that is consuming memory, taken with: kill -SIGUSR1 <pid>

The statedump will be created in /var/run/gluster. This directory must exist - please create it if it does not.
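For anyone following along, a minimal sketch of capturing the requested glusterd statedump (assumes root on the affected node; the dump location is the default /var/run/gluster mentioned above):
```
#!/usr/bin/env bash
# Sketch: capture a glusterd statedump as requested above.
set -euo pipefail

mkdir -p /var/run/gluster          # statedumps are written here; create it if missing
pid=$(pidof glusterd)
kill -SIGUSR1 "$pid"               # ask glusterd to write a statedump
sleep 2                            # give the daemon a moment to finish writing
ls -lt /var/run/gluster | head     # the newest file is the statedump
```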
Created attachment 1602963 [details] Ram usage over a few weeks for node 002
Created attachment 1602964 [details] Ram usage over a few weeks for node 001 & 003
1.
> gluster volume info

Volume Name: gluster
Type: Replicate
Volume ID: 60ae0ddf-67d0-4b23-b694-0250c17a2f04
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 172.27.39.82:/mnt/xfs-drive-gluster/brick
Brick2: 172.27.39.81:/mnt/xfs-drive-gluster/brick
Brick3: 172.27.39.84:/mnt/xfs-drive-gluster/brick
Options Reconfigured:
cluster.self-heal-daemon: enable
cluster.consistent-metadata: off
ssl.dh-param: /etc/ssl/dhparam.pem
ssl.ca-list: /etc/ssl/glusterfs.ca
ssl.own-cert: /etc/ssl/glusterfs.pem
ssl.private-key: /etc/ssl/glusterfs.key
ssl.cipher-list: HIGH:!SSLv2:!SSLv3:!TLSv1:!TLSv1.1:TLSv1.2:!3DES:!RC4:!aNULL:!ADH
ssl.certificate-depth: 2
server.ssl: on
client.ssl: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
features.barrier: disable
features.bitrot: on
features.scrub: Active
auto-delete: enable

2.
Over the past month, since a recovery, glusterd has grown in RAM on node 002 every 24h and roughly every week on 001 and 003. Interestingly, since last week the rapid growth of glusterd seems to have stopped, and glusterfsd might now be the one consuming more RAM. See the attached graphs of RAM over the month; the fast drops in memory (the near-vertical lines) are due to the cron job that restarted glusterd.

glusterfs-001:
root 1435 118 41.3 5878344 3379376 ? Ssl jui24 31930:51 /usr/sbin/glusterfsd -s 172.27.39.82 --volfile-id gluster.172.27.39.82.mnt-xfs-drive-gluster-brick -p /var/run/gluster/vols/gluster/172.27.39.82-mnt-xfs-drive-gluster-brick.pid -S /var/run/gluster/b9ec53e974e8d080.socket --brick-name /mnt/xfs-drive-gluster/brick -l /var/log/glusterfs/bricks/mnt-xfs-drive-gluster-brick.log --xlator-option *-posix.glusterd-uuid=2cc7ba6f-5478-4b27-b647-0c1527192f5a --process-name brick --brick-port 49152 --xlator-option gluster-server.listen-port=49152
root 45129 0.2 17.8 1890584 1457448 ? Ssl aoû06 17:33 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

glusterfs-002:
root 1458 47.5 50.6 5878492 4141664 ? Ssl jui24 12775:43 /usr/sbin/glusterfsd -s 172.27.39.81 --volfile-id gluster.172.27.39.81.mnt-xfs-drive-gluster-brick -p /var/run/gluster/vols/gluster/172.27.39.81-mnt-xfs-drive-gluster-brick.pid -S /var/run/gluster/dcbebdf486b846e2.socket --brick-name /mnt/xfs-drive-gluster/brick -l /var/log/glusterfs/bricks/mnt-xfs-drive-gluster-brick.log --xlator-option *-posix.glusterd-uuid=be4912ac-b0a5-4a02-b8d6-7bccd3e1f807 --process-name brick --brick-port 49152 --xlator-option gluster-server.listen-port=49152
root 20329 0.0 1.2 506132 99128 ? Ssl 03:00 0:22 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

glusterfs-003:
root 1496 60.6 42.1 5878776 3443712 ? Ssl jui24 16308:37 /usr/sbin/glusterfsd -s 172.27.39.84 --volfile-id gluster.172.27.39.84.mnt-xfs-drive-gluster-brick -p /var/run/gluster/vols/gluster/172.27.39.84-mnt-xfs-drive-gluster-brick.pid -S /var/run/gluster/848c5dbe437c2451.socket --brick-name /mnt/xfs-drive-gluster/brick -l /var/log/glusterfs/bricks/mnt-xfs-drive-gluster-brick.log --xlator-option *-posix.glusterd-uuid=180e8f78-fa85-4cb8-8bbd-b0924a16ba60 --process-name brick --brick-port 49152 --xlator-option gluster-server.listen-port=49152
root 58242 0.2 17.6 1816852 1440608 ? Ssl aoû06 19:08 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

3.
kill -1 doesn't create anything in the /var/run/gluster folder for either the glusterd or glusterfsd PID. Does it create a different dump than the one generated above via `gluster volume statedump gluster`?
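For context, a minimal sketch of how per-process memory like the ps output above could be tracked over time (the log path and one-minute interval are assumptions, not part of the original report):
```
#!/usr/bin/env bash
# Sketch: log glusterd/glusterfsd RSS every minute so growth like the
# ~2.5GB/day described above is easy to plot.
LOG=/var/log/gluster-rss.log   # hypothetical path
while true; do
    ts=$(date -u +%FT%TZ)
    for name in glusterd glusterfsd; do
        for pid in $(pidof "$name"); do
            # ps -o rss= prints resident set size in KiB for the given pid
            echo "$ts $name pid=$pid rss_kib=$(ps -o rss= -p "$pid")" >> "$LOG"
        done
    done
    sleep 60
done
```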
Anything I am missing to have 6.4 dump its state? Thanks!
Are you running any monitoring commands? Could you attach the cmd_history.log files from all the nodes? Restarting glusterd without a constant stream of incoming commands wouldn't lead to such a massive leak, so something is very wrong with your setup.

A statedump can be captured with the 'kill -SIGUSR1 $(pidof glusterd)' command. If you still don't see any output file in /var/run/gluster, please also send us the glusterd.log file along with the output of gluster peer status.

Also, how are you monitoring the memory? Through the ps command?
Created attachment 1603770 [details] kill -SIGUSR1 $(pidof glusterd)
Created attachment 1603772 [details] cmd_history.log node 1
Created attachment 1603773 [details] cmd_history.log node 2
Created attachment 1603774 [details] cmd_history.log node 3
Statedump worked, my bad, I was thinking kill -1 rather than -10... :)
I've attached it, with the command used to generate it as its description.

I do have a glusterd-exporter for prometheus running. I just stopped it for a few days to see what happens. I've also attached all 3 cmd_history.log files.

Interestingly, since I stopped the glusterd-exporter at ~10AM EDT, 15 minutes prior to copying the logs (~14h UTC), the repeating messages ("tail" below):

[2019-08-14 13:59:35.027247]  : volume profile gluster info cumulative : FAILED : Profile on Volume gluster is not started
[2019-08-14 13:59:35.193063]  : volume status all detail : SUCCESS
[2019-08-14 13:59:35.199847]  : volume status all detail : SUCCESS
...

seem to have stopped at the same time!

Is that what you meant by a monitoring command? Is the problem with the exporter not clearing something after getting its data, or with gluster accumulating some sort of cache for those commands?

Thanks!
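A minimal sketch of what the exporter's polling effectively does, which may help reproduce the leak in-house (the 10-second interval and iteration count are guesses at the exporter's scrape behaviour, not taken from the report):
```
#!/usr/bin/env bash
# Sketch: issue the same CLI calls seen in cmd_history.log in a loop
# and watch glusterd's RSS for growth.
pid=$(pidof glusterd)
for i in $(seq 1 1000); do
    gluster volume profile gluster info cumulative > /dev/null 2>&1
    gluster volume status all detail > /dev/null 2>&1
    if (( i % 30 == 0 )); then
        echo "iteration $i: glusterd RSS $(ps -o rss= -p "$pid") KiB"
    fi
    sleep 10
done
```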
Well, after 2 weeks without the exporter, the memory seems to be staying stable on all 3 nodes. So this would indicate that something is not getting cleaned up by gluster after the specific requests made by the prometheus exporter?
We observed a leak in 'volume status all detail' which was fixed through https://bugzilla.redhat.com/show_bug.cgi?id=1694610; the fix went into 6.1. It's surprising that you're still seeing this leak in the latest glusterfs-6 series. We'll try to reproduce this in-house and get back to you.
Alex - Do you still see the leak with release-6.4? Can you please check whether your cluster.op-version is set to the latest value? "gluster volume get all cluster.op-version" gives you the op-version at which your cluster is running.
I did under 6.4; now on 6.5 I haven't retried, but here are the values:

> gluster volume get all cluster.op-version
Option                     Value
------                     -----
cluster.op-version         31202

> gluster volume get all cluster.max-op-version
Option                     Value
------                     -----
cluster.max-op-version     60000

Should I move them up to the max value?
Thanks
Alex, after every upgrade you have to bump up the op-version using the volume set command. In this case, although you've upgraded to 6.4, your cluster is still running at a lower op-version. Please set it to 60000; that can be done with "gluster v set all cluster.op-version 60000". Once this is done, your cluster will be running at the upgraded op-version and you shouldn't see the leak.

I'm closing this bug with the resolution "not a bug".

Thanks,
Sanju
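A minimal sketch of the bump-and-verify sequence described above, run on any one node of the trusted pool (60000 matches the max-op-version reported earlier; use the value your own cluster reports):
```
#!/usr/bin/env bash
# Sketch: raise the cluster op-version after the upgrade and confirm it took.
set -euo pipefail

gluster volume get all cluster.max-op-version    # should report 60000 on glusterfs 6.x
gluster volume set all cluster.op-version 60000  # bumps the whole trusted pool in one shot
gluster volume get all cluster.op-version        # verify the new value
```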