Description of problem:
Our hosts are configured to access/use a GlusterFS volume. Periodically the glusterfs service is OOM-killed due to lack of available RAM. This appears to be a problem fixed in GlusterFS 3.13, which is no longer available. The version of Gluster that oVirt 4.2.3 uses is 3.12.

Version-Release number of selected component (if applicable):
oVirt 4.2.3.1-1
glusterfs 3.12.9-1

How reproducible:
Leave a host running and over time it will run out of RAM. We haven't found any steps that reliably force this condition to happen.

Actual results:
# grep glusterfs messages* | grep -i kill
messages:Jun 20 23:52:11 il-hv3 kernel: Out of memory: Kill process 11659 (glusterfs) score 516 or sacrifice child
messages:Jun 20 23:52:11 il-hv3 kernel: Killed process 11659 (glusterfs) total-vm:27910560kB, anon-rss:26193264kB, file-rss:964kB, shmem-rss:0kB
messages-20180527:May 27 01:05:28 il-hv3 kernel: Out of memory: Kill process 11806 (glusterfs) score 476 or sacrifice child
messages-20180527:May 27 01:05:28 il-hv3 kernel: Killed process 11806 (glusterfs) total-vm:25821604kB, anon-rss:23623264kB, file-rss:964kB, shmem-rss:0kB

Expected results:
No killing of glusterfs.

Additional info:
I just noticed that one of our hosts was showing 70% memory consumption in the web interface. After I moved the 4 VMs off of it, it was still showing 40% memory consumption. top shows 45% of the system's memory consumed by glusterfs:

11958 45.2 21.7g 21.3g 92 21.6g 3916 0 0 0.0 glusterfs

In the oVirt web app, an event showed that power management had failed on this host as well. After putting the host into maintenance mode and then activating it, the glusterfs service is no longer consuming 45% of RAM; ps shows 0.1%:

root 25893 0.1 0.0 449104 14200 ? S<sl 10:06 0:00 /usr/sbin/glusterfs --volfile-server=10.4.16.11 --volfile-server=10.4.16.12 --volfile-server=10.4.16.19 --volfile-id=/EXPORTB /rhev/data-center/mnt/glusterSD/10.4.16.11:_EXPORTB
root 25984 0.1 0.0 440644 12408 ? S<sl 10:06 0:00 /usr/sbin/glusterfs --volfile-server=10.4.16.11 --volfile-server=10.4.16.19 --volfile-id=gv1 /rhev/data-center/mnt/glusterSD/10.4.16.11:gv1
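To keep an eye on the growth over time, a minimal logging sketch like the following could be used (the log path and interval are arbitrary choices, not something from this setup):

# Append the RSS (KiB) of every glusterfs client process to a log every 5 minutes
while true; do
    date >> /tmp/glusterfs-rss.log
    ps -C glusterfs -o pid=,rss=,args= >> /tmp/glusterfs-rss.log
    sleep 300
done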
GlusterFS team, any chance the fix can be backported to 3.12 from 3.13?
> It seems like this was a problem fixed in glusterfs 3.13 which is no longer available

Do you have the patch or the bugzilla which has the fix in GlusterFS 3.13?
This is what one of my co-workers pointed me to; it seems related to our memory consumption issue. https://bugzilla.redhat.com/show_bug.cgi?id=1496379
Edward answered the needinfo on me in comment #4.
I've encountered this memory use issue as well, on the client side. It's the glusterfs process mounting Gluster volumes that has consumed the memory; it seems to be fuse mounts that trigger it, not gfapi mounts (it occurred on my Hosted Engine host). It probably began appearing after upgrading to 3.12.9 and is also present in 3.12.11. I was able to downgrade my hosts and confirmed it's not present in 3.12.8 or 3.12.6. Attaching a glusterdump from a process that was using a lot of RAM before I restarted/downgraded. As I'm currently back on 3.12.6, I can't do a time-differential set of dumps, but if you need one I can update a node and provide it in fairly short order.
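For reference, the downgrade on CentOS can be done roughly as below (a sketch; it assumes 3.12.8 packages are still available in the configured repository, and the exact package set may differ per host):

# Downgrade the Gluster client stack to 3.12.8 (adjust the list to match what is installed)
yum downgrade glusterfs-3.12.8 glusterfs-libs-3.12.8 glusterfs-fuse-3.12.8 \
    glusterfs-api-3.12.8 glusterfs-cli-3.12.8 glusterfs-client-xlators-3.12.8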
Created attachment 1457569 [details] state dump showing large memory use
I just cleared a host with high memory consumption this morning. I created a dump of it afterwards; it now has 3 VMs running, and I will need to wait until it consumes more memory before taking another dump. I will upload both once I have them.
Hari, could you take a look?
Hi,

The above bug was backported to 3.12.2: https://bugzilla.redhat.com/show_bug.cgi?id=1497084

This could be some other memory leak existing in Gluster. The component that causes the leak has to be identified. As it's seen on the mount, the chances of glusterd being the component are low.

Budic, thanks for the statedump.

Edward, statedumps taken at intervals will be very helpful to see if there is a potential leak and to check where it is. Do upload the statedumps and I will take a look at them.

Regards,
Hari.
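For anyone collecting the requested dumps, interval statedumps of a fuse client can be taken roughly as follows (the dump directory shown is the usual default, but it can differ by build/configuration):

# SIGUSR1 makes a gluster process write a statedump, by default under
# /var/run/gluster as glusterdump.<pid>.dump.<timestamp>.
# Take one dump per hour for four hours from every glusterfs client process.
for i in 1 2 3 4; do
    for pid in $(pidof glusterfs); do
        kill -SIGUSR1 "$pid"
    done
    sleep 3600
done
ls -l /var/run/gluster/glusterdump.*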
The protocol clients seem to be consuming a lot of resources.

Client 0:

[protocol/client.gDBv1-client-0 - usage-type gf_common_mt_memdup memusage]
size=151041876
num_allocs=12586878
max_size=151042352
max_num_allocs=12587210
total_allocs=159274593

[protocol/client.gDBv1-client-0 - usage-type gf_common_mt_char memusage]
size=311525526
num_allocs=12586886
max_size=311534067
max_num_allocs=12587217
total_allocs=226411352

Client 2:

[protocol/client.gDBv1-client-2 - usage-type gf_common_mt_memdup memusage]
size=151042232
num_allocs=12586936
max_size=151042348
max_num_allocs=12587210
total_allocs=159380212

[protocol/client.gDBv1-client-2 - usage-type gf_common_mt_char memusage]
size=311526832
num_allocs=12586946
max_size=311533894
max_num_allocs=12587219
total_allocs=226444611

and so on for the other clients too. As seen, usage is high for gf_common_mt_memdup and gf_common_mt_char. To verify this, the interval statedumps from Edward will be helpful. It would be great if someone from the protocol team could take a look.
Had a little time, so I switched one of my oVirt hosted-engine hosts back up to 3.12.9 and let it run for 7 hours; glusterfs is up to 1G RSS. Dumps will be attached: the first one is from right after rebooting and mounting the volume, the 2nd is after moving the engine to it, the 3rd is ~1 hour later, and the 4th is 7 hours after moving the engine to it. The hosted engine is the only VM on the host and mounts the volume via fuse:

-drive file=/var/run/vdsm/storage/5a261b58-94a0-4783-a797-0cd457364de2/85d5a435-5176-4ffb-adad-0eb1a5ac1776/6615508f-946f-4656-8542-7c3823829d50,format=raw,if=none,id=drive-virtio-disk0

glusterfs-libs-3.12.9-1.el7.x86_64
centos-release-gluster312-1.0-1.el7.centos.noarch
glusterfs-api-3.12.9-1.el7.x86_64
libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.6.x86_64
glusterfs-client-xlators-3.12.9-1.el7.x86_64
glusterfs-cli-3.12.9-1.el7.x86_64
glusterfs-3.12.9-1.el7.x86_64
glusterfs-rdma-3.12.9-1.el7.x86_64
glusterfs-fuse-3.12.9-1.el7.x86_64
qemu-img-ev-2.10.0-21.el7_5.4.1.x86_64
qemu-kvm-common-ev-2.10.0-21.el7_5.4.1.x86_64
libvirt-daemon-driver-qemu-3.9.0-14.el7_5.6.x86_64
qemu-kvm-tools-ev-2.10.0-21.el7_5.4.1.x86_64
qemu-kvm-ev-2.10.0-21.el7_5.4.1.x86_64

Let me know if you need any other info; this is easily reproducible for me.
Created attachment 1458616 [details] statedump series 1 of 4
Created attachment 1458617 [details] statedump series 2 of 4
Created attachment 1458618 [details] statedump series 3 of 4
Created attachment 1458619 [details] statedump series 4 of 4
Whoops, those wound up being uploaded in reverse order: 1 is the last one I took, 4 was the first.
Amar, who can take a look at this?
Created attachment 1459214 [details] 1st state dump with low memory consumption
Created attachment 1459215 [details] 2nd state dump with large memory consumption
I've uploaded two state dumps. The server's memory consumption jumped to 92% between last Tuesday and today; I believe it was initially running at around 30% with 3 VMs. After migrating the VMs off of this host, memory consumption is still at 71%.
I am experiencing the same behavior. GlusterFS consumes all of the servers' RAM on a weekly basis, and I need to put the hosts into maintenance and then activate them to restore normal levels of RAM utilization.
Same issue with 3.12.12 from Gluster's xenial PPA. Clients used by k8s will, for a single mount, consume 5-6GB and eventually kill the machine. Is this bug fixed since 3.12? I.e., if we upgrade to 4.1, would it resolve the issue?
The patch https://review.gluster.org/#/c/16706/3 introduced a memory leak in the protocol client, and this was fixed as part of https://review.gluster.org/#/c/20437/. If you still see the issue persist, do let us know.
In GlusterFS 4.1.2 (and also 3.12.9) we still see memory leaks on the glusterfs fuse client side, so from my perspective (though I'm not the reporter) do not consider this solved. I have to say I'm not sure we're talking about the same bug here, but at the least, memory leak(s) are still there. We see this on a k8s cluster where the glusterfs fuse client for a Jenkins pod doing plenty of builds (and so handling lots of files and dirs with plenty of fs actions) slowly keeps growing until the node starts hanging.
The fix I mentioned above went in recently. I'm not sure if it made it into 4.1.2 (I can check and update you once Gerrit is available); I think it's only in master. It needs to be backported to the newer branches. You can try with the above patch and see if the memory is still leaking, or wait for the next release containing the patch and test on that. Without the patch, the memory leak will still be there. The bug hasn't been closed because the fix hasn't been made available in any release yet. Once the bug is closed, we will be able to see in which version the fix is available.
Hi Hari, do we have any estimate of when the patch will be included in 3.12 and up? Thanks
I tried to apt-get source the xenial 3.12.12 package and include the modifications in those two files, but it wouldn't build, so I guess someone who understands the code base does need to backport it. Hopefully this happens before the next LTS minor release.
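For anyone who wants to try the backport locally, cherry-picking the fix onto the 3.12 branch would look roughly like this (a sketch; the "/3" patchset suffix is a placeholder, so check the Gerrit change for the latest patchset):

git clone https://github.com/gluster/glusterfs.git && cd glusterfs
git checkout release-3.12
# Fetch change 20437 from Gerrit and apply it on top of the branch
git fetch https://review.gluster.org/glusterfs refs/changes/37/20437/3
git cherry-pick FETCH_HEAD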
Alex, I'm a bit held up; can you all wait till Monday/Tuesday? We will backport it for the next release.
@hari, I see that the leak is definitely present in 3.12.x bits, but I am not sure why the patch you suggested above (about memcpy) is the reason for this. I removed the lines added by the patch https://review.gluster.org/#/c/glusterfs/+/16706/. I reverted only that patch in the branch, and still see the leak.
Tried to check the leak using valgrind on the latest 3.12 branch...

==31275== 2,291,328 (313,344 direct, 1,977,984 indirect) bytes in 1,224 blocks are definitely lost in loss record 524 of 530
==31275==    at 0x4C2CB6B: malloc (vg_replace_malloc.c:299)
==31275==    by 0x4E888CB: mem_get (mem-pool.c:843)
==31275==    by 0x4E8897D: mem_get0 (mem-pool.c:741)
==31275==    by 0x12381B30: afr_fsync (afr-common.c:3607)
==31275==    by 0x1261201E: dht_fsync (dht-inode-read.c:998)
==31275==    by 0x12845F05: wb_fsync_helper (write-behind.c:1978)
==31275==    by 0x4E8499C: call_resume_keep_stub (call-stub.c:2582)
==31275==    by 0x1284B164: wb_do_winds (write-behind.c:1676)
==31275==    by 0x1284B27A: wb_process_queue (write-behind.c:1713)
==31275==    by 0x1284B336: wb_fulfill_cbk (write-behind.c:1058)
==31275==    by 0x12603DE5: dht_writev_cbk (dht-inode-write.c:119)
==31275==    by 0x1234DD20: afr_writev_unwind (afr-inode-write.c:246)
==31275==
==31275== 5,263,980 (719,872 direct, 4,544,108 indirect) bytes in 2,812 blocks are definitely lost in loss record 529 of 530
==31275==    at 0x4C2CB6B: malloc (vg_replace_malloc.c:299)
==31275==    by 0x4E888CB: mem_get (mem-pool.c:843)
==31275==    by 0x4E8897D: mem_get0 (mem-pool.c:741)
==31275==    by 0x1235B93F: afr_changelog_do (afr-transaction.c:1685)
==31275==    by 0x1235C9DC: afr_changelog_post_op_now (afr-transaction.c:992)
==31275==    by 0x123815C5: afr_fsync_cbk (afr-common.c:3569)
==31275==    by 0x120ECFE1: client3_3_fsync_cbk (client-rpc-fops.c:954)
==31275==    by 0x513CE7F: rpc_clnt_handle_reply (rpc-clnt.c:778)
==31275==    by 0x513D17A: rpc_clnt_notify (rpc-clnt.c:971)
==31275==    by 0x5139742: rpc_transport_notify (rpc-transport.c:538)
==31275==    by 0x1103AB01: socket_event_poll_in (socket.c:2315)
==31275==    by 0x1103CF72: socket_event_handler (socket.c:2467)

------

The script used to reproduce the leak is:

python << EOF
import os, time

# Rewrite the same offset of a file on the fuse mount and fsync in a tight
# loop; each fsync leaks allocations in the afr/write-behind path.
with open("/mnt/fuse/tmpfile", "w") as fh:
    while True:
        os.lseek(fh.fileno(), 0, os.SEEK_SET)
        os.write(fh.fileno(), str(time.time()).encode("ascii"))
        os.fsync(fh.fileno())
EOF
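For completeness, a valgrind run like the one above can be started along these lines (a sketch; the volfile server, volume name, and mount point are placeholders):

# Run the fuse client in the foreground (-N / --no-daemon) under valgrind so
# leak reports are printed when the process exits (unmount to stop it).
valgrind --leak-check=full --show-leak-kinds=all \
    /usr/sbin/glusterfs -N --volfile-server=server1 --volfile-id=gv0 /mnt/fuse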
(In reply to Amar Tumballi from comment #30)
> @hari, I see that the leak is definitely present in 3.12.x bits, but I am
> not sure why the patch you suggested above (about memcpy) is the reason for
> this. I removed the lines added by the patch
> https://review.gluster.org/#/c/glusterfs/+/16706/. I reverted only that
> patch in the branch, and still see the leak.

Hi Amar,

I came to know that the leak came in with the 3.12 version, so I checked the patches that went into 3.12 after 3.8 in this area. While inspecting them, I saw that the above patch does a memcpy of the gfid, which is a char array. That matched the leak seen in

[protocol/client.gDBv1-client-0 - usage-type gf_common_mt_char memusage]
size=311525526
num_allocs=12586886
max_size=311534067
max_num_allocs=12587217
total_allocs=226411352

This portion was fixed with https://review.gluster.org/#/c/20437/. The portion for

[protocol/client.gDBv1-client-2 - usage-type gf_common_mt_memdup memusage]
size=151042232
num_allocs=12586936
max_size=151042348
max_num_allocs=12587210
total_allocs=159380212

is yet to be identified. This might be the reason a leak is still seen without the patch.
Sorry, I just noticed that memcpy doesn't do a malloc. The patch I identified for the RCA was the wrong one.
*** Bug 1615443 has been marked as a duplicate of this bug. ***
Anything new? Can I help with anything? I am getting tired of having to restart my hypervisor every 6 days due to Gluster using all the memory...
Hi, this is fixed in 3.12.13 already with the patch mentioned above. The fix has also been available in the 4.1 series for some time.
OK, thanks. It seems oVirt is behind and does not have Gluster 3.12.13 yet; I will write there.