Created attachment 1594280 [details]  gluster volume statedump gluster

Description of problem:
3-node cluster with a single volume with 3 replicas that was running 3.12.15. We had a brick corruption on node 2, and while looking at the logs I saw that this node was leaking memory heavily in the glusterd process. Since 3.12 was EOL, we decided to upgrade to 6.4 after healing, and found that the leak was still present and even more aggressive. We have a cron job that restarts glusterd every morning so the machine does not OOM.

Version-Release number of selected component (if applicable):
https://launchpad.net/~gluster/+archive/ubuntu/glusterfs-6
```
ii  glusterfs-client  6.4-ubuntu1~xenial1  amd64  clustered file-system (client package)
ii  glusterfs-common  6.4-ubuntu1~xenial1  amd64  GlusterFS common libraries and translator modules
ii  glusterfs-server  6.4-ubuntu1~xenial1  amd64  clustered file-system (server package)
```

From the affected gluster node, volume named "gluster" (3 replicas):
```
> gluster peer status
Number of Peers: 2

Hostname: 172.27.39.82
Uuid: 2cc7ba6f-5478-4b27-b647-0c1527192f5a
State: Peer in Cluster (Connected)

Hostname: 172.27.39.84
Uuid: 180e8f78-fa85-4cb8-8bbd-b0924a16ba60
State: Peer in Cluster (Connected)

> gluster volume status gluster
Status of volume: gluster
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 172.27.39.82:/mnt/xfs-drive-gluster/b
rick                                        49152     0          Y       1435
Brick 172.27.39.81:/mnt/xfs-drive-gluster/b
rick                                        49152     0          Y       1458
Brick 172.27.39.84:/mnt/xfs-drive-gluster/b
rick                                        49152     0          Y       1496
Self-heal Daemon on localhost               N/A       N/A        Y       38966
Bitrot Daemon on localhost                  N/A       N/A        Y       39084
Scrubber Daemon on localhost                N/A       N/A        Y       39119
Self-heal Daemon on 172.27.39.82            N/A       N/A        Y       1452
Bitrot Daemon on 172.27.39.82               N/A       N/A        Y       1469
Scrubber Daemon on 172.27.39.82             N/A       N/A        Y       1507
Self-heal Daemon on 172.27.39.84            N/A       N/A        Y       1516
Bitrot Daemon on 172.27.39.84               N/A       N/A        Y       1535
Scrubber Daemon on 172.27.39.84             N/A       N/A        Y       1614

Task Status of Volume gluster
------------------------------------------------------------------------------
There are no active volume tasks
```

How reproducible:
Until shortly before the brick recovery it used to take weeks for the memory to grow; now it happens on every restart of glusterd.

Steps to Reproduce:
1. Boot the 3-node cluster.
2. Wait 12-24h.
3. Observe that node 2's glusterd process is consuming RAM at a very fast pace (~2.5GB/day) and needs to be restarted, while the other 2 nodes have only consumed < 500MB.

Actual results:
The machine OOM-kills the process once glusterd consumes all available memory.

Expected results:
glusterd should release unused memory and stay within the memory that is available.

Additional info:
Nodes have 8GB RAM and two cores, and run only Gluster as a Kubernetes storage backend for pods to mount from.
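For reference, a minimal sketch of the kind of daily-restart cron mentioned above (the file name, time, and service name are assumptions, not taken from the report; on Ubuntu the unit may be glusterd or glusterfs-server depending on packaging):
```
# /etc/cron.d/glusterd-restart  -- hypothetical file; restart glusterd every morning
# so its leaked memory is released before the node OOMs.
0 3 * * * root systemctl restart glusterd
```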
Some extra runtime context, let me know what else could be useful:

> lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.6 LTS
Release:        16.04
Codename:       xenial

> uname -a
Linux staging-glusterfs-delve-002 4.4.0-157-generic #185-Ubuntu SMP Tue Jul 23 09:17:01 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

> xfs_info /dev/mapper/vg_gluster-gluster
meta-data=/dev/mapper/vg_gluster-gluster isize=512    agcount=32, agsize=6143984 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0
data     =                       bsize=4096   blocks=196607488, imaxpct=25
         =                       sunit=16     swidth=16 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=96000, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

> getfattr -d -m. -ehex /mnt/xfs-drive-gluster/brick/
getfattr: Removing leading '/' from absolute path names
# file: mnt/xfs-drive-gluster/brick/
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x60ae0ddf67d04b23b6940250c17a2f04

> df -Th
Filesystem                      Type            Size  Used  Avail Use%  Mounted on
udev                            devtmpfs        3,9G     0   3,9G   0%  /dev
tmpfs                           tmpfs           799M   55M   745M   7%  /run
/dev/vda1                       ext4             40G  5,1G    33G  14%  /
tmpfs                           tmpfs           3,9G   12K   3,9G   1%  /dev/shm
tmpfs                           tmpfs           5,0M     0   5,0M   0%  /run/lock
tmpfs                           tmpfs           3,9G     0   3,9G   0%  /sys/fs/cgroup
/dev/mapper/vg_gluster-gluster  xfs             750G  306G   445G  41%  /mnt/xfs-drive-gluster
tmpfs                           tmpfs           3,9G   12K   3,9G   1%  /var/lib/kubelet/pods/1315d812-acc9-11e9-afb7-fa163e7471c5/volumes/kubernetes.io~secret/default-token-2hg82
tmpfs                           tmpfs           3,9G   12K   3,9G   1%  /var/lib/kubelet/pods/1315b953-acc9-11e9-afb7-fa163e7471c5/volumes/kubernetes.io~secret/weave-net-token-rdg8q
overlay                         overlay          40G  5,1G    33G  14%  /var/lib/docker/overlay2/892baf20d8f9577409f09da9edca4749212bb20253c78ea15d92ff02eb988f19/merged
shm                             tmpfs            64M     0    64M   0%  /var/lib/docker/containers/890608c20ee39b1f41b41168c60c09694e0e8e8a0652d5b9477f991fcb10ef5a/mounts/shm
overlay                         overlay          40G  5,1G    33G  14%  /var/lib/docker/overlay2/cf40ce40639e1f8a3159d084f2f3e23ec234e9f805a3e968dbf97451ee5252e7/merged
overlay                         overlay          40G  5,1G    33G  14%  /var/lib/docker/overlay2/d1629abc8688af0de0934b7ba7639bc6c4082ef094de1c2472ff4bc836e73286/merged
shm                             tmpfs            64M     0    64M   0%  /var/lib/docker/containers/4b21fe68abff88477711a1eb43898ec0d488c12043d7dc2511d56d14ca902c97/mounts/shm
overlay                         overlay          40G  5,1G    33G  14%  /var/lib/docker/overlay2/007e79620249bffe757e2faf6e0c5239143061d5197ad9846bdcf76b83f5e627/merged
overlay                         overlay          40G  5,1G    33G  14%  /var/lib/docker/overlay2/b1e359e20d3f81d89e1e20def9dafd1bf3e6aba51bb6d342d94b366ca832430a/merged
shm                             tmpfs            64M     0    64M   0%  /var/lib/docker/containers/9ae0233e20879f88c3128584a3c7bf4a35100037a2580e7bdb4b960237a8a5b0/mounts/shm
overlay                         overlay          40G  5,1G    33G  14%  /var/lib/docker/overlay2/88c1abb4dcb36b04e7755798bc067a85863a393cedfa22383bf6b25523d5cc45/merged
tmpfs                           tmpfs           799M     0   799M   0%  /run/user/0
172.27.39.81:/gluster           fuse.glusterfs  750G  315G   435G  42%  /mnt/gluster
Version 3.12 is EOL and we have made several memory-leak fixes since then; if this issue persists in the latest releases (release-5 or release-6), kindly reopen. Since there is no active 3.12 version, when moving the bug from RHGS to GlusterFS I have to choose the mainline version, even though this isn't actually applicable to mainline.
GLUSTERD version affected: 6.4

Hi,
I've only mentioned 3.12 for the background, but if you read further you'll see this is a bug on 6.4.
Thanks for reopening this.
(In reply to Alex from comment #4)
> GLUSTERD version affected: 6.4
>
> Hi,
> I've only mentioned 3.12 for the background, but if you read further you'll
> see this is a bug on 6.4.
> Thanks for reopening this.

Please provide the following:
1. gluster volume info
2. ps <pid> of the process that is consuming memory
3. A statedump of the process that is consuming memory, taken with: kill -SIGUSR1 <pid>

The statedump will be created in /var/run/gluster. This directory must exist - please create it if it does not.
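For anyone following along, a minimal sketch of capturing the requested glusterd statedump (assumes root on the affected node; the dump location is the default /var/run/gluster mentioned above):
```
#!/usr/bin/env bash
# Sketch: capture a glusterd statedump as requested above.
set -euo pipefail

mkdir -p /var/run/gluster          # statedumps are written here; create it if missing
pid=$(pidof glusterd)
kill -SIGUSR1 "$pid"               # ask glusterd to write a statedump
sleep 2                            # give the daemon a moment to finish writing
ls -lt /var/run/gluster | head     # the newest file is the statedump
```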
Created attachment 1602963 [details] Ram usage over a few weeks for node 002
Created attachment 1602964 [details] Ram usage over a few weeks for node 001 & 003
1.
> gluster volume info

Volume Name: gluster
Type: Replicate
Volume ID: 60ae0ddf-67d0-4b23-b694-0250c17a2f04
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 172.27.39.82:/mnt/xfs-drive-gluster/brick
Brick2: 172.27.39.81:/mnt/xfs-drive-gluster/brick
Brick3: 172.27.39.84:/mnt/xfs-drive-gluster/brick
Options Reconfigured:
cluster.self-heal-daemon: enable
cluster.consistent-metadata: off
ssl.dh-param: /etc/ssl/dhparam.pem
ssl.ca-list: /etc/ssl/glusterfs.ca
ssl.own-cert: /etc/ssl/glusterfs.pem
ssl.private-key: /etc/ssl/glusterfs.key
ssl.cipher-list: HIGH:!SSLv2:!SSLv3:!TLSv1:!TLSv1.1:TLSv1.2:!3DES:!RC4:!aNULL:!ADH
ssl.certificate-depth: 2
server.ssl: on
client.ssl: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
features.barrier: disable
features.bitrot: on
features.scrub: Active
auto-delete: enable

2.
Over the past month, since a recovery, glusterd has grown in RAM on node 002 every 24h and roughly every week on 001 and 003. Interestingly, since last week the rapid growth of glusterd seems to have stopped, and glusterfsd might now be the one consuming more RAM. See the attached graphs of RAM over the month; the fast drops in memory (the near-vertical lines) are due to the cron job that restarted glusterd.

glusterfs-001:
root 1435 118 41.3 5878344 3379376 ? Ssl jui24 31930:51 /usr/sbin/glusterfsd -s 172.27.39.82 --volfile-id gluster.172.27.39.82.mnt-xfs-drive-gluster-brick -p /var/run/gluster/vols/gluster/172.27.39.82-mnt-xfs-drive-gluster-brick.pid -S /var/run/gluster/b9ec53e974e8d080.socket --brick-name /mnt/xfs-drive-gluster/brick -l /var/log/glusterfs/bricks/mnt-xfs-drive-gluster-brick.log --xlator-option *-posix.glusterd-uuid=2cc7ba6f-5478-4b27-b647-0c1527192f5a --process-name brick --brick-port 49152 --xlator-option gluster-server.listen-port=49152
root 45129 0.2 17.8 1890584 1457448 ? Ssl aoû06 17:33 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

glusterfs-002:
root 1458 47.5 50.6 5878492 4141664 ? Ssl jui24 12775:43 /usr/sbin/glusterfsd -s 172.27.39.81 --volfile-id gluster.172.27.39.81.mnt-xfs-drive-gluster-brick -p /var/run/gluster/vols/gluster/172.27.39.81-mnt-xfs-drive-gluster-brick.pid -S /var/run/gluster/dcbebdf486b846e2.socket --brick-name /mnt/xfs-drive-gluster/brick -l /var/log/glusterfs/bricks/mnt-xfs-drive-gluster-brick.log --xlator-option *-posix.glusterd-uuid=be4912ac-b0a5-4a02-b8d6-7bccd3e1f807 --process-name brick --brick-port 49152 --xlator-option gluster-server.listen-port=49152
root 20329 0.0 1.2 506132 99128 ? Ssl 03:00 0:22 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

glusterfs-003:
root 1496 60.6 42.1 5878776 3443712 ? Ssl jui24 16308:37 /usr/sbin/glusterfsd -s 172.27.39.84 --volfile-id gluster.172.27.39.84.mnt-xfs-drive-gluster-brick -p /var/run/gluster/vols/gluster/172.27.39.84-mnt-xfs-drive-gluster-brick.pid -S /var/run/gluster/848c5dbe437c2451.socket --brick-name /mnt/xfs-drive-gluster/brick -l /var/log/glusterfs/bricks/mnt-xfs-drive-gluster-brick.log --xlator-option *-posix.glusterd-uuid=180e8f78-fa85-4cb8-8bbd-b0924a16ba60 --process-name brick --brick-port 49152 --xlator-option gluster-server.listen-port=49152
root 58242 0.2 17.6 1816852 1440608 ? Ssl aoû06 19:08 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

3.
kill -1 doesn't create anything in the /var/run/gluster folder for either the glusterd or glusterfsd PID. Does it create a different dump than the one generated above via `gluster volume statedump gluster`?
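For context, a minimal sketch of how per-process memory like the ps output above could be tracked over time (the log path and one-minute interval are assumptions, not part of the original report):
```
#!/usr/bin/env bash
# Sketch: log glusterd/glusterfsd RSS every minute so growth like the
# ~2.5GB/day described above is easy to plot.
LOG=/var/log/gluster-rss.log   # hypothetical path
while true; do
    ts=$(date -u +%FT%TZ)
    for name in glusterd glusterfsd; do
        for pid in $(pidof "$name"); do
            # ps -o rss= prints resident set size in KiB for the given pid
            echo "$ts $name pid=$pid rss_kib=$(ps -o rss= -p "$pid")" >> "$LOG"
        done
    done
    sleep 60
done
```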
Anything I am missing to have 6.4 dump its state? Thanks!
Are you running any monitoring commands? Could you attach the cmd_history.log files from all the nodes? Restarting glusterd without a constant stream of incoming commands wouldn't lead to such a massive leak, so something is very wrong with your setup.

A statedump can be captured with the 'kill -SIGUSR1 $(pidof glusterd)' command. If you still don't see any output file in /var/run/gluster, please also send us the glusterd.log file along with the output of gluster peer status.

Also, how are you monitoring the memory? Through the ps command?
Created attachment 1603770 [details] kill -SIGUSR1 $(pidof glusterd)
Created attachment 1603772 [details] cmd_history.log node 1
Created attachment 1603773 [details] cmd_history.log node 2
Created attachment 1603774 [details] cmd_history.log node 3
Statedump worked, my bad, I was thinking kill -1 rather than -10... :)
I've attached it, with the command used to generate it as its description.

I do have a glusterd-exporter for prometheus running. I just stopped it for a few days to see what happens. I've also attached all 3 cmd_history.log files.

Interestingly, since I stopped the glusterd-exporter at ~10AM EDT, 15 minutes prior to copying the logs (~14h UTC), the repeating messages ("tail" below):

[2019-08-14 13:59:35.027247]  : volume profile gluster info cumulative : FAILED : Profile on Volume gluster is not started
[2019-08-14 13:59:35.193063]  : volume status all detail : SUCCESS
[2019-08-14 13:59:35.199847]  : volume status all detail : SUCCESS
...

seem to have stopped at the same time!

Is that what you meant by a monitoring command? Is the problem with the exporter not clearing something after getting its data, or with gluster accumulating some sort of cache for those commands?

Thanks!
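A minimal sketch of what the exporter's polling effectively does, which may help reproduce the leak in-house (the 10-second interval and iteration count are guesses at the exporter's scrape behaviour, not taken from the report):
```
#!/usr/bin/env bash
# Sketch: issue the same CLI calls seen in cmd_history.log in a loop
# and watch glusterd's RSS for growth.
pid=$(pidof glusterd)
for i in $(seq 1 1000); do
    gluster volume profile gluster info cumulative > /dev/null 2>&1
    gluster volume status all detail > /dev/null 2>&1
    if (( i % 30 == 0 )); then
        echo "iteration $i: glusterd RSS $(ps -o rss= -p "$pid") KiB"
    fi
    sleep 10
done
```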
Well, after 2 weeks without the exporter, the memory seems to be staying stable on all 3 nodes. So this would indicate that something is not getting cleaned up by gluster after the specific requests made by the prometheus exporter?
We observed a leak in 'volume status all detail' which was fixed through https://bugzilla.redhat.com/show_bug.cgi?id=1694610; the fix went into 6.1. It's surprising that you're still seeing this leak in the latest glusterfs-6 series. We'll try to reproduce this in-house and get back to you.
Alex - Do you still see the leak with release-6.4? Can you please check whether your cluster.op-version is set to the latest value? "gluster volume get all cluster.op-version" gives you the op-version at which your cluster is running.
I did under 6.4; now on 6.5 I haven't retried, but here are the values:

> gluster volume get all cluster.op-version
Option                     Value
------                     -----
cluster.op-version         31202

> gluster volume get all cluster.max-op-version
Option                     Value
------                     -----
cluster.max-op-version     60000

Should I move them up to the max value?
Thanks
Alex, after every upgrade you have to bump up the op-version using the volume set command. In this case, although you've upgraded to 6.4, your cluster is still running at a lower op-version. Please set it to 60000; that can be done with "gluster v set all cluster.op-version 60000". Once this is done, your cluster will be running at the upgraded op-version and you shouldn't see the leak.

I'm closing this bug with the resolution "not a bug".

Thanks,
Sanju
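A minimal sketch of the bump-and-verify sequence described above, run on any one node of the trusted pool (60000 matches the max-op-version reported earlier; use the value your own cluster reports):
```
#!/usr/bin/env bash
# Sketch: raise the cluster op-version after the upgrade and confirm it took.
set -euo pipefail

gluster volume get all cluster.max-op-version    # should report 60000 on glusterfs 6.x
gluster volume set all cluster.op-version 60000  # bumps the whole trusted pool in one shot
gluster volume get all cluster.op-version        # verify the new value
```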