Bug 1593826 - Glusterfs being killed due to out of memory situation
Summary: Glusterfs being killed due to out of memory situation
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: protocol
Version: 3.12
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Duplicates: 1615443
Depends On:
Blocks:
 
Reported: 2018-06-21 15:58 UTC by Edward
Modified: 2018-08-28 15:44 UTC
CC: 16 users

Fixed In Version: glusterfs-3.12.13
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-27 12:06:25 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
state dump showing large memory use (365.95 KB, text/plain) - 2018-07-09 19:35 UTC, Darrell
statedump series 1 of 4 (122.34 KB, text/plain) - 2018-07-13 03:21 UTC, Darrell
statedump series 2 of 4 (111.65 KB, text/plain) - 2018-07-13 03:22 UTC, Darrell
statedump series 3 of 4 (100.63 KB, text/plain) - 2018-07-13 03:22 UTC, Darrell
statedump series 4 of 4 (58.03 KB, text/plain) - 2018-07-13 03:22 UTC, Darrell
1st state dump with low memory consumption (94.93 KB, text/plain) - 2018-07-16 15:39 UTC, Edward
2nd state dump with large memory consumption (126.36 KB, text/plain) - 2018-07-16 15:40 UTC, Edward

Description Edward 2018-06-21 15:58:31 UTC
Description of problem:
Our hosts are configured to access/use a GlusterFS volume. Periodically the glusterfs service is OOM-killed due to lack of available RAM. This appears to be a problem that was fixed in glusterfs 3.13, which is no longer available. The version of gluster that oVirt 4.2.3 uses is 3.12.

Version-Release number of selected component (if applicable):
Ovirt 4.2.3.1-1
glusterfs 3.12.9-1

How reproducible:
Leave the host running and over time it will run out of RAM. We haven't found any specific steps that reliably force this condition to happen.


Actual results:

# grep glusterfs messages* |grep -i kill
messages:Jun 20 23:52:11 il-hv3 kernel: Out of memory: Kill process 11659 (glusterfs) score 516 or sacrifice child
messages:Jun 20 23:52:11 il-hv3 kernel: Killed process 11659 (glusterfs) total-vm:27910560kB, anon-rss:26193264kB, file-rss:964kB, shmem-rss:0kB
messages-20180527:May 27 01:05:28 il-hv3 kernel: Out of memory: Kill process 11806 (glusterfs) score 476 or sacrifice child
messages-20180527:May 27 01:05:28 il-hv3 kernel: Killed process 11806 (glusterfs) total-vm:25821604kB, anon-rss:23623264kB, file-rss:964kB, shmem-rss:0kB

Expected results:
No killing of glusterfs

Additional info:

Comment 1 Edward 2018-06-21 16:12:03 UTC
I just noticed that one of our hosts was showing 70% memory consumption in the web interface. After I moved the 4 VMs off of it, it was still showing 40% memory consumption. Looking at top, 45% of the system's memory is consumed by glusterfs.

 11958 45.2   21.7g  21.3g     92   21.6g   3916    0    0   0.0 glusterfs                                                                                                                                       

In the oVirt web app an event showed that power management failed on this host as well. After putting the host in maintenance mode and then activating it, the glusterfs service is no longer consuming 45% of RAM; ps shows 0.1%.

root     25893  0.1  0.0 449104 14200 ?        S<sl 10:06   0:00 /usr/sbin/glusterfs --volfile-server=10.4.16.11 --volfile-server=10.4.16.12 --volfile-server=10.4.16.19 --volfile-id=/EXPORTB /rhev/data-center/mnt/glusterSD/10.4.16.11:_EXPORTB
root     25984  0.1  0.0 440644 12408 ?        S<sl 10:06   0:00 /usr/sbin/glusterfs --volfile-server=10.4.16.11 --volfile-server=10.4.16.19 --volfile-id=gv1 /rhev/data-center/mnt/glusterSD/10.4.16.11:gv1
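
For tracking this kind of growth over time, a small sampling loop can help correlate the glusterfs RSS with events in the logs. This is only a sketch: the pgrep pattern, log path and 5-minute interval are arbitrary choices, and it assumes a single fuse mount process of interest.

# GFS_PID=$(pgrep -f 'glusterfs --volfile-server' | head -1)
# while true; do echo "$(date -u) $(ps -o rss=,vsz= -p $GFS_PID)"; sleep 300; done >> /var/log/glusterfs-rss.log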

Comment 2 Sandro Bonazzola 2018-06-22 06:10:47 UTC
GlusterFS team, any chance the fix can be backported to 3.12 from 3.13?

Comment 3 Atin Mukherjee 2018-06-22 07:30:54 UTC
>  It seems like this was a problem fixed in glusterfs 3.13 which is no longer available

Do you have the patch or the bugzilla which has the fix in glusterfs 3.13?

Comment 4 Edward 2018-06-22 15:04:32 UTC
This is what one of my co-workers pointed me to that seemed to be related to our memory consumption issue.

https://bugzilla.redhat.com/show_bug.cgi?id=1496379

Comment 5 Sandro Bonazzola 2018-07-02 15:09:29 UTC
Edward answered the needinfo in comment #4.

Comment 6 Darrell 2018-07-09 19:34:14 UTC
I've encountered this memory use issue as well, on the client side. It's the glusterfs process mounting the gluster volumes that consumes the memory; it seems to be the fuse mounts that are affected, not gfapi mounts (it occurred on my Hosted Engine). It probably began appearing after upgrading to 3.12.9, and is also present in 3.12.11. I was able to downgrade my hosts and confirmed it's not present in 3.12.8 or 3.12.6.

Attaching a glusterdump from a process that was using a lot of RAM before I restarted/downgraded. As I'm currently back on 3.12.6, I can't do a time-differential set, but if you need one I can upgrade a node and provide it in fairly short order.

Comment 7 Darrell 2018-07-09 19:35:05 UTC
Created attachment 1457569 [details]
state dump showing large memory use

Comment 8 Edward 2018-07-09 19:48:46 UTC
I just cleared a host with high memory consumption this morning. I created a dump of it afterward; the host now has 3 VMs running, and I will need to wait until it consumes more memory to take another dump. I will upload both once I have them.

Comment 9 Sahina Bose 2018-07-10 05:14:34 UTC
Hari, could you take a look?

Comment 10 hari gowtham 2018-07-10 06:07:11 UTC
Hi,

The fix for the above bug was backported to 3.12.2: https://bugzilla.redhat.com/show_bug.cgi?id=1497084

This could be caused by some other memory leak existing in gluster. The component that causes the leak has to be identified. As it is seen in the mount process, glusterd is less likely to be the component.

Budic, thanks for the statedump.

Edward, statedumps taken at intervals will be very helpful to see whether there is a leak and where it is.

Do upload the statedumps; I will take a look at them.

Regards,
Hari.
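
For reference, a statedump of a fuse mount process can normally be triggered by sending it SIGUSR1; the dump is written under /var/run/gluster by default. A minimal sketch (the PID is a placeholder, and the exact dump file naming can vary between versions):

# kill -USR1 <pid-of-glusterfs-mount-process>
# ls /var/run/gluster/glusterdump.*

Repeating this every hour or so, while noting the process RSS each time, gives the interval series requested above.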

Comment 11 hari gowtham 2018-07-10 07:47:26 UTC
The protocol clients seem to be consuming a lot of resources:

Client 0:
[protocol/client.gDBv1-client-0 - usage-type gf_common_mt_memdup memusage]           
size=151041876                                                                       
num_allocs=12586878                                                                  
max_size=151042352                                                              
max_num_allocs=12587210                                                         
total_allocs=159274593 

[protocol/client.gDBv1-client-0 - usage-type gf_common_mt_char memusage]        
size=311525526                                                                  
num_allocs=12586886                                                                  
max_size=311534067                                                              
max_num_allocs=12587217                                                         
total_allocs=226411352 

Client 2:
[protocol/client.gDBv1-client-2 - usage-type gf_common_mt_memdup memusage]      
size=151042232                                                                  
num_allocs=12586936                                                                                     
max_size=151042348                                                              
max_num_allocs=12587210                                                         
total_allocs=159380212

[protocol/client.gDBv1-client-2 - usage-type gf_common_mt_char memusage]        
size=311526832                                                                  
num_allocs=12586946                                                             
max_size=311533894                                                              
max_num_allocs=12587219                                                         
total_allocs=226444611 

and so on for the other clients too.
As seen, usage is high for gf_common_mt_memdup and gf_common_mt_char.

To verify this, the statedumps taken at intervals by Edward will be helpful.

It would be great if someone from the protocol team can take a look.
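
When a series of dumps is available, the relevant counters can be pulled out and compared directly. A rough sketch (the -A5 context length and the glusterdump.* filenames are assumptions about the dump layout shown above):

# grep -A5 'usage-type gf_common_mt_char memusage' glusterdump.*.dump.*
# grep -A5 'usage-type gf_common_mt_memdup memusage' glusterdump.*.dump.*

If num_allocs keeps climbing between dumps while the workload stays steady, that points at a leak in the translator owning that section.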

Comment 12 Darrell 2018-07-13 03:20:20 UTC
Had a little time, so I switched one of my oVirt hosted engine hosts back up to 3.12.9 and let it run for 7 hours; glusterfs is up to 1G RSS. Dumps will be attached: the first one is right after rebooting and mounting the volume, the 2nd is after moving the engine to it, the 3rd is ~1 hour later, and the 4th is 7 hours after moving the engine to it.

The hosted engine is the only VM on it, and is mounting it via fuse:
-drive file=/var/run/vdsm/storage/5a261b58-94a0-4783-a797-0cd457364de2/85d5a435-5176-4ffb-adad-0eb1a5ac1776/6615508f-946f-4656-8542-7c3823829d50,format=raw,if=none,id=drive-virtio-disk0

glusterfs-libs-3.12.9-1.el7.x86_64
centos-release-gluster312-1.0-1.el7.centos.noarch
glusterfs-api-3.12.9-1.el7.x86_64
libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.6.x86_64
glusterfs-client-xlators-3.12.9-1.el7.x86_64
glusterfs-cli-3.12.9-1.el7.x86_64
glusterfs-3.12.9-1.el7.x86_64
glusterfs-rdma-3.12.9-1.el7.x86_64
glusterfs-fuse-3.12.9-1.el7.x86_64
qemu-img-ev-2.10.0-21.el7_5.4.1.x86_64
qemu-kvm-common-ev-2.10.0-21.el7_5.4.1.x86_64
libvirt-daemon-driver-qemu-3.9.0-14.el7_5.6.x86_64
qemu-kvm-tools-ev-2.10.0-21.el7_5.4.1.x86_64
qemu-kvm-ev-2.10.0-21.el7_5.4.1.x86_64

Let me know if you need any other info, this is easily reproducible for me.

Comment 13 Darrell 2018-07-13 03:21:40 UTC
Created attachment 1458616 [details]
statedump series  1 of 4

Comment 14 Darrell 2018-07-13 03:22:02 UTC
Created attachment 1458617 [details]
statedump series 2 of 4

Comment 15 Darrell 2018-07-13 03:22:26 UTC
Created attachment 1458618 [details]
statedump series 3 of 4

Comment 16 Darrell 2018-07-13 03:22:53 UTC
Created attachment 1458619 [details]
statedump series 4 of 4

Comment 17 Darrell 2018-07-13 03:24:08 UTC
Whoops, those wound up being uploaded in reverse order: 1 is the last one I took, 4 was the first.

Comment 18 Sahina Bose 2018-07-13 03:49:25 UTC
Amar, who can take a look at this?

Comment 19 Edward 2018-07-16 15:39:52 UTC
Created attachment 1459214 [details]
1st state dump with low memory consumption

Comment 20 Edward 2018-07-16 15:40:24 UTC
Created attachment 1459215 [details]
2nd state dump with large memory consumption

Comment 21 Edward 2018-07-16 15:42:12 UTC
I've uploaded two state dumps. The server's memory consumption jumped to 92% from last Tuesday to today. I believe it was running around 30% initially with 3 VMs. After migrating the VMs off of this host, memory consumption is still at 71%.

Comment 22 Alex Kaouris 2018-07-25 21:44:39 UTC
I am experiencing the same behavior. Glusterfs consumes all the RAM of the servers on a weekly basis, and I need to put the hosts into maintenance and then activate them to restore normal levels of RAM utilization.

Comment 23 Alex 2018-08-03 18:02:23 UTC
Same issue with 3.12.12 from gluster's xenial PPA.
Clients used by k8s will, for a single mount, consume 5-6GB and eventually kill the machine.

Is this bug fixed in any release since 3.12?
I.e., if we upgrade to 4.1, would it resolve the issue?

Comment 24 hari gowtham 2018-08-07 06:01:33 UTC
The patch https://review.gluster.org/#/c/16706/3 introduced a memory leak in the protocol client, and this was fixed as part of https://review.gluster.org/#/c/20437/

If you still see the issue persist, do let us know.

Comment 25 Dimitri Ars 2018-08-08 08:01:09 UTC
In glusterfs 4.1.2 (and also 3.12.9) we still see memory leaks in the glusterfs fuse client, so for me (though I'm not the reporter) do not consider this solved. I have to say I'm not sure we're talking about the same bug here, but at least memory leak(s) are still there. We see this on a k8s cluster where the glusterfs fuse client for a Jenkins pod, which does plenty of builds (each with lots of files and directories and plenty of filesystem actions), slowly keeps growing until the node starts hanging.

Comment 26 hari gowtham 2018-08-08 08:41:37 UTC
The fix I mentioned above went in only recently. I'm not sure if it got into 4.1.2 (I can check and update you once Gerrit is available); I think it's only in master. It needs backporting to the newer branches.

You can try building with the above patch and see if the memory is still leaking, or wait for the next release that includes the patch and test on it. Without the patch the memory leak will still be there.

The bug hasn't been closed, as the fix hasn't been made available in any release yet. Once the bug is closed, we will be able to see in which version the fix is available.
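
For anyone wanting to check this themselves once the fix lands, git can report which release tags contain a given commit; the commit hash has to be taken from the Gerrit change page and is left as a placeholder here.

# git clone https://github.com/gluster/glusterfs.git && cd glusterfs
# git tag --contains <commit-hash-of-the-fix>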

Comment 27 Alex Kaouris 2018-08-08 10:43:51 UTC
Hi Hari, do we have any estimate of when the patch will be included in 3.12 and up?
Thanks

Comment 28 Alex 2018-08-09 18:42:15 UTC
I tried to apt-get source the xenial 3.12.12 package and include the modifications in those 2 files, but it wouldn't build, so I guess someone who understands the code base needs to backport it indeed.

Hopefully this happens before the next LTS minor release.

Comment 29 hari gowtham 2018-08-10 05:59:36 UTC
Alex, 

I'm a bit held up; can you wait until Monday/Tuesday?

We will back-port it to the next release.

Comment 30 Amar Tumballi 2018-08-11 19:39:20 UTC
@hari, I see that the leak is definitely present in 3.12.x bits, but I am not sure why the patch you suggested above (about memcpy) is the reason for this. I removed the lines added by the patch https://review.gluster.org/#/c/glusterfs/+/16706/. I reverted only that patch in the branch, and still see the leak.

Comment 31 Amar Tumballi 2018-08-11 19:47:46 UTC
Tried to check the leak using valgrind on the latest code of the 3.12 branch...

==31275== 2,291,328 (313,344 direct, 1,977,984 indirect) bytes in 1,224 blocks are definitely lost in loss record 524 of 530
==31275==    at 0x4C2CB6B: malloc (vg_replace_malloc.c:299)
==31275==    by 0x4E888CB: mem_get (mem-pool.c:843)
==31275==    by 0x4E8897D: mem_get0 (mem-pool.c:741)
==31275==    by 0x12381B30: afr_fsync (afr-common.c:3607)
==31275==    by 0x1261201E: dht_fsync (dht-inode-read.c:998)
==31275==    by 0x12845F05: wb_fsync_helper (write-behind.c:1978)
==31275==    by 0x4E8499C: call_resume_keep_stub (call-stub.c:2582)
==31275==    by 0x1284B164: wb_do_winds (write-behind.c:1676)
==31275==    by 0x1284B27A: wb_process_queue (write-behind.c:1713)
==31275==    by 0x1284B336: wb_fulfill_cbk (write-behind.c:1058)
==31275==    by 0x12603DE5: dht_writev_cbk (dht-inode-write.c:119)
==31275==    by 0x1234DD20: afr_writev_unwind (afr-inode-write.c:246)
==31275== 
==31275== 5,263,980 (719,872 direct, 4,544,108 indirect) bytes in 2,812 blocks are definitely lost in loss record 529 of 530
==31275==    at 0x4C2CB6B: malloc (vg_replace_malloc.c:299)
==31275==    by 0x4E888CB: mem_get (mem-pool.c:843)
==31275==    by 0x4E8897D: mem_get0 (mem-pool.c:741)
==31275==    by 0x1235B93F: afr_changelog_do (afr-transaction.c:1685)
==31275==    by 0x1235C9DC: afr_changelog_post_op_now (afr-transaction.c:992)
==31275==    by 0x123815C5: afr_fsync_cbk (afr-common.c:3569)
==31275==    by 0x120ECFE1: client3_3_fsync_cbk (client-rpc-fops.c:954)
==31275==    by 0x513CE7F: rpc_clnt_handle_reply (rpc-clnt.c:778)
==31275==    by 0x513D17A: rpc_clnt_notify (rpc-clnt.c:971)
==31275==    by 0x5139742: rpc_transport_notify (rpc-transport.c:538)
==31275==    by 0x1103AB01: socket_event_poll_in (socket.c:2315)
==31275==    by 0x1103CF72: socket_event_handler (socket.c:2467)



------

The script used to reproduce the leak is:

python << EOF
import os, time

with open("/mnt/fuse/tmpfile", "w") as fh:
  while True:
    os.lseek(fh.fileno(), 0, os.SEEK_SET)
    os.write(fh.fileno(), str(time.time()).encode("ascii"))
    os.fsync(fh)
EOF
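
For anyone who wants to repeat this, the fuse client can be run in the foreground under valgrind roughly as follows; the server address, volume id and mountpoint are placeholders, and -N keeps the process from daemonizing so valgrind can track it.

# valgrind --leak-check=full --show-leak-kinds=definite --log-file=/tmp/glusterfs-valgrind.log \
      /usr/sbin/glusterfs -N --volfile-server=<server> --volfile-id=<volume> /mnt/fuse

After exercising the mount with the script above, unmounting /mnt/fuse lets the client exit so valgrind prints its leak summary.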

Comment 32 hari gowtham 2018-08-13 05:49:59 UTC
(In reply to Amar Tumballi from comment #30)
> @hari, I see that the leak is definitely present in 3.12.x bits, but I am
> not sure why the patch you suggested above (about memcpy) is the reason for
> this. I removed the lines added by the patch
> https://review.gluster.org/#/c/glusterfs/+/16706/. I reverted only that
> patch in the branch, and still see the leak.

Hi Amar,

I came to know that the leak got in with the 3.12 version, so I checked the patches that went into 3.12 after 3.8 in this area. While inspecting the patches, I noticed that the above patch does a memcpy of the gfid, which is a char array.

It confirms the leak seen in:
 
[protocol/client.gDBv1-client-0 - usage-type gf_common_mt_char memusage]        
size=311525526                                                                  
num_allocs=12586886                                                                  
max_size=311534067                                                              
max_num_allocs=12587217                                                         
total_allocs=226411352

This portion was fixed with https://review.gluster.org/#/c/20437/ 

The portion for 

[protocol/client.gDBv1-client-2 - usage-type gf_common_mt_memdup memusage]      
size=151042232                                                                  
num_allocs=12586936                                                                                     
max_size=151042348                                                              
max_num_allocs=12587210                                                         
total_allocs=159380212

is yet to be identified.

This might be the reason the leak is still seen without that patch.

Comment 33 hari gowtham 2018-08-13 06:00:42 UTC
Sorry, I just noticed that memcpy doesn't do a malloc. The patch I identified was the wrong one for the RCA.

Comment 34 okane 2018-08-14 09:22:21 UTC
*** Bug 1615443 has been marked as a duplicate of this bug. ***

Comment 35 okane 2018-08-27 09:43:53 UTC
Anything new? Can I help with anything? I am getting tired of having to restart my hypervisor every 6 days due to Gluster using all the memory...

Comment 36 Amar Tumballi 2018-08-27 12:06:25 UTC
Hi, this is fixed in 3.12.13 already, with the patch submitted. The fix has also been available in the 4.1 series for some time.

Comment 37 okane 2018-08-27 12:15:43 UTC
OK, thanks. It seems oVirt is behind and does not have gluster 3.12.13. I will report it there.

