Description of problem:
OOM kill of the ganesha process during creation and removal of files.

Version-Release number of selected component (if applicable):
[root@dhcp46-247 ~]# rpm -qa|grep glusterfs
glusterfs-debuginfo-3.7.9-3.el7rhgs.x86_64
glusterfs-api-3.7.9-5.el7rhgs.x86_64
glusterfs-ganesha-3.7.9-5.el7rhgs.x86_64
glusterfs-libs-3.7.9-5.el7rhgs.x86_64
glusterfs-client-xlators-3.7.9-5.el7rhgs.x86_64
glusterfs-fuse-3.7.9-5.el7rhgs.x86_64
glusterfs-server-3.7.9-5.el7rhgs.x86_64
glusterfs-geo-replication-3.7.9-5.el7rhgs.x86_64
glusterfs-3.7.9-5.el7rhgs.x86_64
glusterfs-cli-3.7.9-5.el7rhgs.x86_64
glusterfs-rdma-3.7.9-5.el7rhgs.x86_64

[root@dhcp46-247 ~]# rpm -qa|grep ganesha
glusterfs-ganesha-3.7.9-5.el7rhgs.x86_64
nfs-ganesha-2.3.1-7.el7rhgs.x86_64
nfs-ganesha-gluster-2.3.1-7.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a 4-node gluster cluster with nfs-ganesha configured.
2. Create several volumes and export them via nfs-ganesha.
3. Mount one of the volumes on 2 clients with vers=4.
4. Start creating and removing files from the 2 mount points (see the repro sketch below) and observe that the memory consumption of the ganesha process keeps increasing as more I/O is performed.
5. At some point, when the oom_score of the ganesha process on the mounted node reached 601, the ganesha process was OOM-killed.

Actual results:
OOM kill of the nfs-ganesha process when its oom_score reaches ~600.

Expected results:
There should not be any OOM kill.

Additional info:
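A rough sketch of the workload, run from both clients (the VIP, volume name, mount point, file count and file size below are placeholders, not the exact test script):

    # Mount the exported volume over NFSv4
    mount -t nfs -o vers=4 <ganesha-vip>:/<volname> /mnt/ganesha

    # Create a large number of small files, then remove them, while watching
    # ganesha.nfsd memory usage and oom_score on the server
    cd /mnt/ganesha
    for i in $(seq 1 200000); do
        dd if=/dev/zero of=file.$i bs=100K count=1 status=none
    done
    rm -rf /mnt/ganesha/*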
dmesg from the node where oom kill is observed: [4326841.284855] lrmd invoked oom-killer: gfp_mask=0x3000d0, order=2, oom_score_adj=0 [4326841.284863] lrmd cpuset=/ mems_allowed=0 [4326841.284867] CPU: 2 PID: 5582 Comm: lrmd Not tainted 3.10.0-327.el7.x86_64 #1 [4326841.284868] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 [4326841.284871] ffff880210c15080 00000000e02db7a9 ffff8800bbdefb68 ffffffff816351f1 [4326841.284875] ffff8800bbdefbf8 ffffffff81630191 ffff880213a520c0 ffff880213a520d8 [4326841.284878] 092834ff00000206 fffeefff00000000 000000000000000f ffffffff81128803 [4326841.284880] Call Trace: [4326841.284892] [<ffffffff816351f1>] dump_stack+0x19/0x1b [4326841.284896] [<ffffffff81630191>] dump_header+0x8e/0x214 [4326841.284901] [<ffffffff81128803>] ? delayacct_end+0x63/0xb0 [4326841.284906] [<ffffffff8116cdee>] oom_kill_process+0x24e/0x3b0 [4326841.284909] [<ffffffff8116d616>] out_of_memory+0x4b6/0x4f0 [4326841.284913] [<ffffffff811737f5>] __alloc_pages_nodemask+0xa95/0xb90 [4326841.284918] [<ffffffff81078d73>] copy_process.part.25+0x163/0x1610 [4326841.284921] [<ffffffff8107a401>] do_fork+0xe1/0x320 [4326841.284927] [<ffffffff811e3b5e>] ? SYSC_newstat+0x3e/0x60 [4326841.284929] [<ffffffff8107a6c6>] SyS_clone+0x16/0x20 [4326841.284934] [<ffffffff81645c59>] stub_clone+0x69/0x90 [4326841.284937] [<ffffffff81645909>] ? system_call_fastpath+0x16/0x1b [4326841.284938] Mem-Info: [4326841.284940] Node 0 DMA per-cpu: [4326841.284942] CPU 0: hi: 0, btch: 1 usd: 0 [4326841.284944] CPU 1: hi: 0, btch: 1 usd: 0 [4326841.284945] CPU 2: hi: 0, btch: 1 usd: 0 [4326841.284946] CPU 3: hi: 0, btch: 1 usd: 0 [4326841.284947] Node 0 DMA32 per-cpu: [4326841.284949] CPU 0: hi: 186, btch: 31 usd: 0 [4326841.284950] CPU 1: hi: 186, btch: 31 usd: 0 [4326841.284951] CPU 2: hi: 186, btch: 31 usd: 0 [4326841.284952] CPU 3: hi: 186, btch: 31 usd: 0 [4326841.284953] Node 0 Normal per-cpu: [4326841.284954] CPU 0: hi: 186, btch: 31 usd: 0 [4326841.284955] CPU 1: hi: 186, btch: 31 usd: 0 [4326841.284956] CPU 2: hi: 186, btch: 31 usd: 0 [4326841.284957] CPU 3: hi: 186, btch: 31 usd: 0 [4326841.284961] active_anon:1404313 inactive_anon:331748 isolated_anon:0 active_file:71 inactive_file:336 isolated_file:0 unevictable:13186 dirty:8 writeback:159 unstable:0 free:32924 slab_reclaimable:121102 slab_unreclaimable:65420 mapped:10154 shmem:53080 pagetables:7814 bounce:0 free_cma:0 [4326841.284965] Node 0 DMA free:15852kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15936kB managed:15852kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [4326841.284971] lowmem_reserve[]: 0 3329 7807 7807 [4326841.284973] Node 0 DMA32 free:66904kB min:28752kB low:35940kB high:43128kB active_anon:2378220kB inactive_anon:679188kB active_file:84kB inactive_file:0kB unevictable:24428kB isolated(anon):0kB isolated(file):0kB present:3653620kB managed:3409252kB mlocked:24428kB dirty:32kB writeback:316kB mapped:36792kB shmem:207928kB slab_reclaimable:185100kB slab_unreclaimable:47356kB kernel_stack:4048kB pagetables:7948kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:13323 all_unreclaimable? 
yes [4326841.284977] lowmem_reserve[]: 0 0 4478 4478 [4326841.284979] Node 0 Normal free:48940kB min:38692kB low:48364kB high:58036kB active_anon:3239032kB inactive_anon:647804kB active_file:200kB inactive_file:1400kB unevictable:28316kB isolated(anon):0kB isolated(file):0kB present:4718592kB managed:4585788kB mlocked:28316kB dirty:0kB writeback:320kB mapped:3824kB shmem:4392kB slab_reclaimable:299308kB slab_unreclaimable:214324kB kernel_stack:7040kB pagetables:23308kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:45564 all_unreclaimable? yes [4326841.284983] lowmem_reserve[]: 0 0 0 0 [4326841.284985] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15852kB [4326841.284994] Node 0 DMA32: 11275*4kB (UEMR) 2291*8kB (UEMR) 3*16kB (R) 0*32kB 1*64kB (R) 2*128kB (R) 1*256kB (R) 2*512kB (R) 0*1024kB 1*2048kB (R) 0*4096kB = 67124kB [4326841.285019] Node 0 Normal: 12245*4kB (UEM) 5*8kB (E) 3*16kB (E) 1*32kB (E) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 49100kB [4326841.285027] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [4326841.285029] 115076 total pagecache pages [4326841.285031] 59440 pages in swap cache [4326841.285032] Swap cache stats: add 10821688, delete 10762248, find 5997415/7172346 [4326841.285033] Free swap = 0kB [4326841.285034] Total swap = 3145724kB [4326841.285035] 2097037 pages RAM [4326841.285036] 0 pages HighMem/MovableOnly [4326841.285037] 94314 pages reserved [4326841.285038] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [4326841.285043] [ 663] 0 663 17497 5231 37 52 0 systemd-journal [4326841.285045] [ 688] 0 688 185885 326 44 597 0 lvmetad [4326841.285048] [ 702] 0 702 11532 281 23 531 -1000 systemd-udevd [4326841.285050] [ 932] 0 932 181339 13139 58 0 -1000 dmeventd [4326841.285052] [ 1215] 0 1215 29179 207 25 88 -1000 auditd [4326841.285054] [ 1238] 0 1238 6618 261 17 40 0 systemd-logind [4326841.285056] [ 1242] 0 1242 4807 219 13 39 0 irqbalance [4326841.285059] [ 1243] 81 1243 25153 386 19 42 -900 dbus-daemon [4326841.285061] [ 1244] 0 1244 50838 164 41 116 0 gssproxy [4326841.285070] [ 1245] 32 1245 16225 180 33 114 0 rpcbind [4326841.285072] [ 1253] 0 1253 81068 1358 79 3297 0 firewalld [4326841.285074] [ 1316] 0 1316 108686 462 64 291 0 NetworkManager [4326841.285076] [ 1508] 0 1508 13264 185 28 148 0 wpa_supplicant [4326841.285078] [ 1509] 997 1509 130884 350 53 1928 0 polkitd [4326841.285080] [ 1679] 0 1679 27627 307 55 3068 0 dhclient [4326841.285082] [ 1876] 0 1876 20636 235 42 199 -1000 sshd [4326841.285084] [ 1879] 0 1879 138266 755 90 2267 0 tuned [4326841.285086] [ 1882] 0 1882 28334 119 12 19 0 rhsmcertd [4326841.285087] [ 1999] 0 1999 22782 190 43 240 0 master [4326841.285089] [ 2001] 89 2001 22852 237 45 243 0 qmgr [4326841.285091] [ 2292] 0 2292 31581 228 19 132 0 crond [4326841.285093] [ 2307] 0 2307 26977 130 8 9 0 rhnsd [4326841.285095] [ 2318] 0 2318 22551 298 49 157 0 login [4326841.285097] [24723] 0 24723 28876 186 14 123 0 bash [4326841.285099] [ 2710] 0 2710 137654 1923 116 113 0 rsyslogd [4326841.285101] [ 4594] 0 4594 167021 1538 81 7568 0 glusterd [4326841.285103] [ 5563] 0 5563 50791 8337 64 949 0 corosync [4326841.285105] [ 5578] 0 5578 32622 329 66 848 0 pacemakerd [4326841.285107] [ 5580] 189 5580 33394 2157 69 1070 0 cib [4326841.285109] [ 5581] 0 5581 33407 671 66 624 0 stonithd [4326841.285110] [ 5582] 0 5582 25734 370 54 229 0 lrmd [4326841.285112] [ 5583] 
189 5583 31194 1108 63 381 0 attrd [4326841.285114] [ 5584] 189 5584 28724 242 59 308 0 pengine [4326841.285116] [ 5585] 189 5585 35790 610 72 792 0 crmd [4326841.285117] [ 5805] 0 5805 28843 186 13 108 0 pcsd [4326841.285119] [ 5811] 0 5811 28809 174 15 49 0 bash [4326841.285121] [ 5812] 0 5812 244778 9393 101 1300 0 ruby [4326841.285123] [ 7780] 0 7780 11147 224 25 183 0 rpc.statd [4326841.285131] [ 7273] 0 7273 624556 90931 413 34724 0 glusterfsd [4326841.285133] [ 7292] 0 7292 657325 120256 479 51678 0 glusterfsd [4326841.285135] [ 7311] 0 7311 685253 86834 417 29850 0 glusterfsd [4326841.285137] [ 7454] 0 7454 309861 3037 107 5493 0 glusterfsd [4326841.285139] [ 7534] 0 7534 326504 2925 108 5575 0 glusterfsd [4326841.285141] [ 7618] 0 7618 326759 4830 110 4112 0 glusterfsd [4326841.285143] [ 7722] 0 7722 328809 5324 109 3346 0 glusterfsd [4326841.285144] [ 7812] 0 7812 311912 4769 108 4191 0 glusterfsd [4326841.285146] [ 7971] 0 7971 328295 3988 109 5451 0 glusterfsd [4326841.285148] [ 8072] 0 8072 326246 2914 108 5586 0 glusterfsd [4326841.285150] [ 8092] 0 8092 557396 20320 195 25200 0 glusterfs [4326841.285152] [15968] 0 15968 156229 1761 117 33468 0 glusterfs [4326841.285154] [28337] 0 28337 2277022 1254348 3591 471903 0 ganesha.nfsd [4326841.285156] [27929] 89 27929 22808 339 45 140 0 pickup [4326841.285159] Out of memory: Kill process 28337 (ganesha.nfsd) score 601 or sacrifice child [4326841.287340] Killed process 28337 (ganesha.nfsd) total-vm:9108088kB, anon-rss:5017392kB, file-rss:0kB
sosreports and logs are placed under http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1338068
Created attachment 1160553 [details] valgrind_resuls_without_io.log
Created attachment 1160554 [details] valgrind_resuls_with_io_using_physical_ip.log
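The two attachments above are valgrind leak reports for ganesha.nfsd. A sketch of how such a report can be collected (the valgrind options and log path are illustrative; the exact invocation used for these attachments is not recorded here). Note that valgrind slows the daemon down considerably, which also affects how fast memory grows:

    systemctl stop nfs-ganesha
    valgrind --leak-check=full --log-file=/tmp/ganesha-valgrind.log \
        /usr/bin/ganesha.nfsd -F -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT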
Since the ganesha memory consumption never comes down (even if we remove the files), the chance of hitting this issue with continuous I/O is very high. Moreover, a customer has reported the same problem (https://bugzilla.redhat.com/show_bug.cgi?id=1282669), so this is a must-fix for 3.1.3. Raising a blocker flag.
Strangely, when the process was started under valgrind to check for memory leaks, we observed that its memory usage never grew beyond ~3GB (unlike the ~9GB observed when this issue was reported) while creating and removing ~100000 files. One thing we did observe is that nfs-ganesha memory increases by a few (k)bytes for each RPC request sent by a client (this can be checked by issuing the showmount command). So, looking at the nfs_rpc_execute-related leaks in valgrind_resuls_without_io.log (attached):

==831== 17 bytes in 1 blocks are possibly lost in loss record 66 of 758
==831==    at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==831==    by 0x6D37FE9: strdup (in /usr/lib64/libc-2.17.so)
==831==    by 0x161647: get_gsh_client (in /usr/bin/ganesha.nfsd)
==831==    by 0x1560E3: nfs_rpc_execute (in /usr/bin/ganesha.nfsd)
==831==    by 0x157D79: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x1F5658: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x643EDC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==831==    by 0x6DA81CC: clone (in /usr/lib64/libc-2.17.so)
==831==
==831== 192 bytes in 1 blocks are possibly lost in loss record 413 of 758
==831==    at 0x4C2B974: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==831==    by 0x1FDA19: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x1FDED0: server_stats_nfs_done (in /usr/bin/ganesha.nfsd)
==831==    by 0x156513: nfs_rpc_execute (in /usr/bin/ganesha.nfsd)
==831==    by 0x157D79: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x1F5658: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x643EDC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==831==    by 0x6DA81CC: clone (in /usr/lib64/libc-2.17.so)
==831==
==831== 208 bytes in 1 blocks are possibly lost in loss record 417 of 758
==831==    at 0x4C2B974: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==831==    by 0x1615EB: get_gsh_client (in /usr/bin/ganesha.nfsd)
==831==    by 0x1560E3: nfs_rpc_execute (in /usr/bin/ganesha.nfsd)
==831==    by 0x157D79: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x1F5658: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x643EDC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==831==    by 0x6DA81CC: clone (in /usr/lib64/libc-2.17.so)
==831==
==831== 270 bytes in 40 blocks are definitely lost in loss record 427 of 758
==831==    at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==831==    by 0x6D37FE9: strdup (in /usr/lib64/libc-2.17.so)
==831==    by 0x164103: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x202D10: foreach_gsh_export (in /usr/bin/ganesha.nfsd)
==831==    by 0x16448B: mnt_Export (in /usr/bin/ganesha.nfsd)
==831==    by 0x1564ED: nfs_rpc_execute (in /usr/bin/ganesha.nfsd)
==831==    by 0x157D79: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x1F5658: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x643EDC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==831==    by 0x6DA81CC: clone (in /usr/lib64/libc-2.17.so)
==831==
==831== 1,400 bytes in 5 blocks are possibly lost in loss record 559 of 758
==831==    at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==831==    by 0x686DC05: rbtx_init (in /usr/lib64/libntirpc.so.1.3.1)
==831==    by 0x1E1F88: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x1E38CF: nfs_dupreq_start (in /usr/bin/ganesha.nfsd)
==831==    by 0x15613E: nfs_rpc_execute (in /usr/bin/ganesha.nfsd)
==831==    by 0x157D79: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x1F5658: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x643EDC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==831==    by 0x6DA81CC: clone (in /usr/lib64/libc-2.17.so)
==831==
==831== 1,560 bytes in 5 blocks are possibly lost in loss record 565 of 758
==831==    at 0x4C2B974: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==831==    by 0x1E1EF0: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x1E38CF: nfs_dupreq_start (in /usr/bin/ganesha.nfsd)
==831==    by 0x15613E: nfs_rpc_execute (in /usr/bin/ganesha.nfsd)
==831==    by 0x157D79: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x1F5658: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x643EDC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==831==    by 0x6DA81CC: clone (in /usr/lib64/libc-2.17.so)
==831==
==831== 5,080 bytes in 5 blocks are possibly lost in loss record 659 of 758
==831==    at 0x4C2B974: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==831==    by 0x1E1FED: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x1E38CF: nfs_dupreq_start (in /usr/bin/ganesha.nfsd)
==831==    by 0x15613E: nfs_rpc_execute (in /usr/bin/ganesha.nfsd)
==831==    by 0x157D79: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x1F5658: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x643EDC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==831==    by 0x6DA81CC: clone (in /usr/lib64/libc-2.17.so)

From the above list, the major chunk seems to come from "nfs_rpc_execute -> nfs_dupreq_start". From the code, it looks like this code path can be avoided by disabling DRC.

Shashank, could you please test the below scenario (see the sketch after this list):
1) Add the below config parameter to '/etc/ganesha/ganesha.conf':
NFS_Core_Param {
    .....
    DRC_Disabled = true;
    .....
}
2) Restart the nfs-ganesha service.
3) Execute the I/O (same as mentioned in the bug description).
4) Verify the memory usage.
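A minimal sketch of the steps above (in an HA setup the service may be managed by the cluster, so the restart step may differ):

    # 1) Add DRC_Disabled = true inside the NFS_Core_Param block
    vi /etc/ganesha/ganesha.conf

    # 2) Restart the nfs-ganesha service
    systemctl restart nfs-ganesha

    # 3) Re-run the create/remove workload from the clients, then
    # 4) verify the memory usage of the new process
    ps aux | grep ganesha.nfsd
    cat /proc/$(pidof ganesha.nfsd)/oom_score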
==831== 208 bytes in 1 blocks are possibly lost in loss record 417 of 758
==831==    at 0x4C2B974: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==831==    by 0x1615EB: get_gsh_client (in /usr/bin/ganesha.nfsd)
==831==    by 0x1560E3: nfs_rpc_execute (in /usr/bin/ganesha.nfsd)
==831==    by 0x157D79: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x1F5658: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x643EDC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==831==    by 0x6DA81CC: clone (in /usr/lib64/libc-2.17.so)

While looking at the nfs_rpc_execute->get_gsh_client() path, we observed that after executing the RPC request the code calls put_gsh_client(), which does not free the memory even when the refcount drops to '0'. Attaching a pseudo fix (mem_leak_fix1.patch). But since it is a very intricate fix, it needs thorough testing.
Created attachment 1160982 [details] mem_leak_fix1.patch
Soumya, why can't we apply this patch and give a private build to QE to test whether this leak caused the OOM kill? ~Atin
Resetting needinfo on Shashank to test with the above mentioned option configured. Thanks!
Verified this by setting DRC_Disabled = true in ganesha.conf; below are the observations (tested with 200000 file creations and removals from 2 mount points).

>> When DRC_Disabled = true, memory consumption of the ganesha process:
[root@dhcp46-247 ~]# ps aux|grep ganesha
root 8852 7.2 24.2 4182068 1944876 ? Ssl May26 44:38 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -E 6289034494432444416
[root@dhcp46-247 ~]# cat /proc/8852/oom_score
169
top output shows ganesha using ~24% of memory.

>> When DRC_Disabled is not set in ganesha.conf:
[root@dhcp46-247 ~]# ps aux|grep ganesha
root 18830 30.3 24.7 4124524 1984040 ? Ssl 18:16 44:40 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -E 6288978071447076864
[root@dhcp46-247 ~]# cat /proc/18830/oom_score
172
top output shows ganesha using ~25% of memory.

>>>> Conclusion: There is no major difference in the memory consumption of ganesha with DRC_Disabled set to true, so this does not rule out ganesha getting killed under higher/continuous workloads, as seen in the previous tests.
Removing the needinfo
Additional Updates:
===================

>>>>>>>>>> With DRC_Disabled = true and 8 volumes exported through ganesha:

>>>> System in initial state, no I/O done so far:
Virtual memory usage of the ganesha process on the mounted node is 3582164 KiB (3.6G) and oom_score is 47.
[root@dhcp46-247 ~]# ps aux|grep ganesha
root 19579 0.3 6.7 3582164 538760 ? Ssl 14:40 0:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -E 6288922412965888000
[root@dhcp46-247 ~]# cat /proc/19579/oom_score
47

>>>> After creating 200000 files from 2 mount points and removing those files:
Virtual memory usage of the ganesha process on the mounted node is 4182068 KiB (4.2G) and oom_score is 169.
[root@dhcp46-247 ~]# ps aux|grep ganesha
root 8852 7.2 24.2 4182068 1944876 ? Ssl May26 44:38 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -E 6289034494432444416
[root@dhcp46-247 ~]# cat /proc/8852/oom_score
169

***********************************************************************************************************

>>>>>>>>>>> With DRC disabled false and 8 volumes exported:

>>>> System in initial state, no I/O done so far:
Virtual memory usage of the ganesha process on the mounted node is 3573828 KiB (3.6G) and oom_score is 47.
[root@dhcp46-247 ~]# ps aux|grep ganesha
root 18830 7.4 6.7 3573828 540212 ? Ssl 18:16 0:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -E 6288978071447076864
[root@dhcp46-247 ~]# cat /proc/18830/oom_score
47

>>>> After creating 200000 files from 2 mount points and removing those files:
Virtual memory usage of the ganesha process on the mounted node is 4124524 KiB (4.2G) and oom_score is 172.
[root@dhcp46-247 ~]# ps aux|grep ganesha
root 18830 30.3 24.7 4124524 1984040 ? Ssl 18:16 44:40 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -E 6288978071447076864
[root@dhcp46-247 ~]# cat /proc/18830/oom_score
172
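The numbers above were collected manually; a small loop like the following can be used to sample them over the course of a run (a sketch; the interval and log path are arbitrary):

    PID=$(pidof ganesha.nfsd)
    while sleep 60; do
        echo "$(date +%F_%T) rss_kb=$(awk '/VmRSS/ {print $2}' /proc/$PID/status) oom_score=$(cat /proc/$PID/oom_score)"
    done >> /tmp/ganesha-mem.log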
Conclusion:
===========
1. There is no major difference in memory consumption with DRC_Disabled = true.
2. For every 200000 files written and removed, the ganesha process consumes around ~18-20% of total memory and its oom_score increases by ~120-125.
   => With no I/O, the initial memory consumed is 3.6G. After creating 200000 files and removing them, the consumption becomes 4.2G.
   => We observed the OOM kill when we created and removed 1000000 files (100KB each) on 8G systems.
3. The oom_score of the ganesha process was ~600 when the OOM kill was seen in our environment.
4. In the customer environment, the oom_score of the ganesha process was ~800 when they saw this issue.
Post removal of the files, if I check statedumps of the ganesha/gfapi process, I see that although the number of upcall entry allocations is high, the number of outstanding allocations declines (this can be confirmed from the statedumps copied to 10.70.43.208://var/run/gluster/statedump/).

I suspect the following to be the cause of the high memory usage during remove:
* For each file removed, each brick process sends a cache-invalidation upcall request. So for ~1 million files on a 2x2 volume, ~2 million requests are sent.
* Each cache-invalidation request received is stored in a list whose entry is dynamically allocated with the mempool type "glfs_mt_upcall_entry_t". The entries in the list are freed only when polled by the application (NFS-Ganesha).
* NFS-Ganesha polls for upcall events with a gap of 1 sec between polls. That means processing/freeing ~2 million upcall events would take around ~2 million seconds (~23 days).

Since the test was to do 'rm -rf *', it resulted in a huge number of invalidation requests being sent. If I delete a small number of entries, I do not see an increase in memory usage.

There is an ongoing effort to filter out duplicate upcall events generated by the replica bricks. Still, to accommodate a large number of upcall events even with that support, I have changed the gap between polls from 1 sec to 10 us. I shall provide a test build with those changes. Kindly update the ganesha RPMs and verify whether it reduces the memory footprint.
The below link contains the test packages:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11352530
Kindly test whether they fix the issue.
Tested the latest RPMs provided by Soumya; the leaks are still seen during removal of files:

>>>> Initial setup:
[root@dhcp43-208 ~]# cat /proc/1866/oom_score
7

>>>> After 200000 file creations:
[root@dhcp43-208 ~]# cat /proc/1866/oom_score
23

>>>> After 200000 file removals:
[root@dhcp43-208 ~]# cat /proc/1866/oom_score
179
There is another issue found with upcall processing in gfapi: the entries are not removed under the mutex lock, which may result in list corruption and a memory leak. This is being addressed as part of bug 1358608.

When the tests were re-run with the below fixes -
http://review.gluster.org/#/c/14532/
https://review.gerrithub.io/284684
http://review.gluster.org/14972
we observed that there is no longer an upcall entry leak post removal of files (statedump: /var/run/gluster/glusterdump.3142.dump.1469080876 on 10.70.43.208). However, the oom_score still increases while removing the files. Shashank is continuing the testing to check which operations (LOOKUP is suspected) or entries lead to the increase in memory usage.
After the above fixes were applied, the oom_score was
* ~21 (post creation of 100,000 files) [statedump: glusterdump.10831.dump.1469084578]
* ~49 (post removal of those files) [statedump: glusterdump.10831.dump.1469086623]

From comment#28, I am assuming the same oom_score is observed even with cache-invalidation off. Shashank, could you please confirm that?

When comparing these two statedumps, the major difference I see is the inode count.

== post creation ==
pool-name=gfapi:inode_t
hot-count=24998
cold-count=106074

== post removal ==
pool-name=gfapi:inode_t
hot-count=99979
cold-count=31093

Also, the inode_ctx memory usage is higher in the second statedump. We have the cache-inode layer limit set to 25,000 in ganesha.conf. We can see that reflected during creation, but not during removal.
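For reference, the pool counters above can be pulled out of the two statedumps like this (a sketch, assuming they sit under the default /var/run/gluster statedump path used elsewhere in this bug):

    cd /var/run/gluster
    for d in glusterdump.10831.dump.1469084578 glusterdump.10831.dump.1469086623; do
        echo "== $d =="
        # hot-count/cold-count follow the pool-name line in the dump
        grep -A2 'pool-name=gfapi:inode_t' $d
    done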
In comment 8, the I/O was done for 1000000 files, while in comment 33 it was for 100000 files, which is 1/10 of what I tried. I can probably give the same set (1000000 files) a try with the updated fixes and confirm the results.
I meant comment 28 rather than comment 8.
Thanks, Shashank. I suspect a refleak in glfs_h_handle(), used in upcall processing. I am building gluster RPMs with the fix; we can re-test with that build.
The fix mentioned above has been submitted for review:
http://review.gluster.org/14984

Update from Shashank on the latest build:
> oom_score starts from 7
> after file creation: oom_score 22
> -rw-------. 1 root root 58337 Jul 22 05:03 glusterdump.25850.dump.1469178206
> after removal: oom_score 33
> -rw-------. 1 root root 58316 Jul 22 05:42 glusterdump.25850.dump.1469180577

I no longer see active inodes in the latest statedump. In fact, most of the memory appears to be freed in statedump-1469180577 compared to statedump-1469178206, so the leak is probably no longer in the gfAPI layer. I have also requested Shashank to update the test results with the cache-invalidation feature off as well.
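For reference, the cache-invalidation (upcall) feature is toggled per volume through the gluster CLI (a sketch; the volume name is a placeholder):

    # Turn the feature off for the exported volume (use 'on' to re-enable)
    gluster volume set <volname> features.cache-invalidation off

    # Confirm the current setting
    gluster volume info <volname> | grep cache-invalidation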
Ran the test with the latest build for 100000 files; below are the results:

>>> With cache-invalidation off (v4 mount):
oom_score starts from 7
after file creation: oom_score 21
-rw-------. 1 root root 58337 Jul 22 06:22 glusterdump.11894.dump.1469182928
after removal: oom_score 23
-rw-------. 1 root root 57373 Jul 22 06:47 glusterdump.11894.dump.1469184431

>>> With cache-invalidation on (v4 mount):
oom_score starts from 11
after file creation: oom_score 21
-rw-------. 1 root root 58337 Jul 22 07:25 glusterdump.6267.dump.1469186747
after removal: oom_score 26
-rw-------. 1 root root 57404 Jul 22 08:07 glusterdump.6267.dump.1469189235

>>> With cache-invalidation on (v3 mount):
oom_score starts from 7
after file creation: oom_score 25
-rw-------. 1 root root 2384348 Jul 22 08:42 glusterdump.7025.dump.1469191375
after removal: oom_score 26
-rw-------. 1 root root 59331 Jul 22 08:50 glusterdump.7025.dump.1469191806
*** Bug 1282669 has been marked as a duplicate of this bug. ***
Tried creating/removing ~200000 files; the issue is not seen in the latest build. Verified the fix in:
glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.1-1.el7rhgs.x86_64
Edited the doc text for the errata.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2017-0493.html