Description of problem:
OOM kill of the ganesha process during creation and removal of files.

Version-Release number of selected component (if applicable):
[root@dhcp46-247 ~]# rpm -qa|grep glusterfs
glusterfs-debuginfo-3.7.9-3.el7rhgs.x86_64
glusterfs-api-3.7.9-5.el7rhgs.x86_64
glusterfs-ganesha-3.7.9-5.el7rhgs.x86_64
glusterfs-libs-3.7.9-5.el7rhgs.x86_64
glusterfs-client-xlators-3.7.9-5.el7rhgs.x86_64
glusterfs-fuse-3.7.9-5.el7rhgs.x86_64
glusterfs-server-3.7.9-5.el7rhgs.x86_64
glusterfs-geo-replication-3.7.9-5.el7rhgs.x86_64
glusterfs-3.7.9-5.el7rhgs.x86_64
glusterfs-cli-3.7.9-5.el7rhgs.x86_64
glusterfs-rdma-3.7.9-5.el7rhgs.x86_64

[root@dhcp46-247 ~]# rpm -qa|grep ganesha
glusterfs-ganesha-3.7.9-5.el7rhgs.x86_64
nfs-ganesha-2.3.1-7.el7rhgs.x86_64
nfs-ganesha-gluster-2.3.1-7.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a 4-node gluster cluster with nfs-ganesha configured.
2. Create several volumes and export them via nfs-ganesha.
3. Mount one of the volumes on 2 clients with vers=4.
4. Start creating and removing files from the 2 mount points (see the repro sketch below) and observe that the memory consumption of the ganesha process keeps increasing as more I/O is performed.
5. At some point, when the oom_score of the ganesha process on the mounted node reached 601, the ganesha process was OOM-killed.

Actual results:
OOM kill of the nfs-ganesha process when its oom_score reaches ~600.

Expected results:
There should not be any OOM kill.

Additional info:
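A rough sketch of the workload, run from both clients (the VIP, volume name, mount point, file count and file size below are placeholders, not the exact test script):

    # Mount the exported volume over NFSv4
    mount -t nfs -o vers=4 <ganesha-vip>:/<volname> /mnt/ganesha

    # Create a large number of small files, then remove them, while watching
    # ganesha.nfsd memory usage and oom_score on the server
    cd /mnt/ganesha
    for i in $(seq 1 200000); do
        dd if=/dev/zero of=file.$i bs=100K count=1 status=none
    done
    rm -rf /mnt/ganesha/*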
dmesg from the node where oom kill is observed: [4326841.284855] lrmd invoked oom-killer: gfp_mask=0x3000d0, order=2, oom_score_adj=0 [4326841.284863] lrmd cpuset=/ mems_allowed=0 [4326841.284867] CPU: 2 PID: 5582 Comm: lrmd Not tainted 3.10.0-327.el7.x86_64 #1 [4326841.284868] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 [4326841.284871] ffff880210c15080 00000000e02db7a9 ffff8800bbdefb68 ffffffff816351f1 [4326841.284875] ffff8800bbdefbf8 ffffffff81630191 ffff880213a520c0 ffff880213a520d8 [4326841.284878] 092834ff00000206 fffeefff00000000 000000000000000f ffffffff81128803 [4326841.284880] Call Trace: [4326841.284892] [<ffffffff816351f1>] dump_stack+0x19/0x1b [4326841.284896] [<ffffffff81630191>] dump_header+0x8e/0x214 [4326841.284901] [<ffffffff81128803>] ? delayacct_end+0x63/0xb0 [4326841.284906] [<ffffffff8116cdee>] oom_kill_process+0x24e/0x3b0 [4326841.284909] [<ffffffff8116d616>] out_of_memory+0x4b6/0x4f0 [4326841.284913] [<ffffffff811737f5>] __alloc_pages_nodemask+0xa95/0xb90 [4326841.284918] [<ffffffff81078d73>] copy_process.part.25+0x163/0x1610 [4326841.284921] [<ffffffff8107a401>] do_fork+0xe1/0x320 [4326841.284927] [<ffffffff811e3b5e>] ? SYSC_newstat+0x3e/0x60 [4326841.284929] [<ffffffff8107a6c6>] SyS_clone+0x16/0x20 [4326841.284934] [<ffffffff81645c59>] stub_clone+0x69/0x90 [4326841.284937] [<ffffffff81645909>] ? system_call_fastpath+0x16/0x1b [4326841.284938] Mem-Info: [4326841.284940] Node 0 DMA per-cpu: [4326841.284942] CPU 0: hi: 0, btch: 1 usd: 0 [4326841.284944] CPU 1: hi: 0, btch: 1 usd: 0 [4326841.284945] CPU 2: hi: 0, btch: 1 usd: 0 [4326841.284946] CPU 3: hi: 0, btch: 1 usd: 0 [4326841.284947] Node 0 DMA32 per-cpu: [4326841.284949] CPU 0: hi: 186, btch: 31 usd: 0 [4326841.284950] CPU 1: hi: 186, btch: 31 usd: 0 [4326841.284951] CPU 2: hi: 186, btch: 31 usd: 0 [4326841.284952] CPU 3: hi: 186, btch: 31 usd: 0 [4326841.284953] Node 0 Normal per-cpu: [4326841.284954] CPU 0: hi: 186, btch: 31 usd: 0 [4326841.284955] CPU 1: hi: 186, btch: 31 usd: 0 [4326841.284956] CPU 2: hi: 186, btch: 31 usd: 0 [4326841.284957] CPU 3: hi: 186, btch: 31 usd: 0 [4326841.284961] active_anon:1404313 inactive_anon:331748 isolated_anon:0 active_file:71 inactive_file:336 isolated_file:0 unevictable:13186 dirty:8 writeback:159 unstable:0 free:32924 slab_reclaimable:121102 slab_unreclaimable:65420 mapped:10154 shmem:53080 pagetables:7814 bounce:0 free_cma:0 [4326841.284965] Node 0 DMA free:15852kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15936kB managed:15852kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [4326841.284971] lowmem_reserve[]: 0 3329 7807 7807 [4326841.284973] Node 0 DMA32 free:66904kB min:28752kB low:35940kB high:43128kB active_anon:2378220kB inactive_anon:679188kB active_file:84kB inactive_file:0kB unevictable:24428kB isolated(anon):0kB isolated(file):0kB present:3653620kB managed:3409252kB mlocked:24428kB dirty:32kB writeback:316kB mapped:36792kB shmem:207928kB slab_reclaimable:185100kB slab_unreclaimable:47356kB kernel_stack:4048kB pagetables:7948kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:13323 all_unreclaimable? 
yes [4326841.284977] lowmem_reserve[]: 0 0 4478 4478 [4326841.284979] Node 0 Normal free:48940kB min:38692kB low:48364kB high:58036kB active_anon:3239032kB inactive_anon:647804kB active_file:200kB inactive_file:1400kB unevictable:28316kB isolated(anon):0kB isolated(file):0kB present:4718592kB managed:4585788kB mlocked:28316kB dirty:0kB writeback:320kB mapped:3824kB shmem:4392kB slab_reclaimable:299308kB slab_unreclaimable:214324kB kernel_stack:7040kB pagetables:23308kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:45564 all_unreclaimable? yes [4326841.284983] lowmem_reserve[]: 0 0 0 0 [4326841.284985] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15852kB [4326841.284994] Node 0 DMA32: 11275*4kB (UEMR) 2291*8kB (UEMR) 3*16kB (R) 0*32kB 1*64kB (R) 2*128kB (R) 1*256kB (R) 2*512kB (R) 0*1024kB 1*2048kB (R) 0*4096kB = 67124kB [4326841.285019] Node 0 Normal: 12245*4kB (UEM) 5*8kB (E) 3*16kB (E) 1*32kB (E) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 49100kB [4326841.285027] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [4326841.285029] 115076 total pagecache pages [4326841.285031] 59440 pages in swap cache [4326841.285032] Swap cache stats: add 10821688, delete 10762248, find 5997415/7172346 [4326841.285033] Free swap = 0kB [4326841.285034] Total swap = 3145724kB [4326841.285035] 2097037 pages RAM [4326841.285036] 0 pages HighMem/MovableOnly [4326841.285037] 94314 pages reserved [4326841.285038] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [4326841.285043] [ 663] 0 663 17497 5231 37 52 0 systemd-journal [4326841.285045] [ 688] 0 688 185885 326 44 597 0 lvmetad [4326841.285048] [ 702] 0 702 11532 281 23 531 -1000 systemd-udevd [4326841.285050] [ 932] 0 932 181339 13139 58 0 -1000 dmeventd [4326841.285052] [ 1215] 0 1215 29179 207 25 88 -1000 auditd [4326841.285054] [ 1238] 0 1238 6618 261 17 40 0 systemd-logind [4326841.285056] [ 1242] 0 1242 4807 219 13 39 0 irqbalance [4326841.285059] [ 1243] 81 1243 25153 386 19 42 -900 dbus-daemon [4326841.285061] [ 1244] 0 1244 50838 164 41 116 0 gssproxy [4326841.285070] [ 1245] 32 1245 16225 180 33 114 0 rpcbind [4326841.285072] [ 1253] 0 1253 81068 1358 79 3297 0 firewalld [4326841.285074] [ 1316] 0 1316 108686 462 64 291 0 NetworkManager [4326841.285076] [ 1508] 0 1508 13264 185 28 148 0 wpa_supplicant [4326841.285078] [ 1509] 997 1509 130884 350 53 1928 0 polkitd [4326841.285080] [ 1679] 0 1679 27627 307 55 3068 0 dhclient [4326841.285082] [ 1876] 0 1876 20636 235 42 199 -1000 sshd [4326841.285084] [ 1879] 0 1879 138266 755 90 2267 0 tuned [4326841.285086] [ 1882] 0 1882 28334 119 12 19 0 rhsmcertd [4326841.285087] [ 1999] 0 1999 22782 190 43 240 0 master [4326841.285089] [ 2001] 89 2001 22852 237 45 243 0 qmgr [4326841.285091] [ 2292] 0 2292 31581 228 19 132 0 crond [4326841.285093] [ 2307] 0 2307 26977 130 8 9 0 rhnsd [4326841.285095] [ 2318] 0 2318 22551 298 49 157 0 login [4326841.285097] [24723] 0 24723 28876 186 14 123 0 bash [4326841.285099] [ 2710] 0 2710 137654 1923 116 113 0 rsyslogd [4326841.285101] [ 4594] 0 4594 167021 1538 81 7568 0 glusterd [4326841.285103] [ 5563] 0 5563 50791 8337 64 949 0 corosync [4326841.285105] [ 5578] 0 5578 32622 329 66 848 0 pacemakerd [4326841.285107] [ 5580] 189 5580 33394 2157 69 1070 0 cib [4326841.285109] [ 5581] 0 5581 33407 671 66 624 0 stonithd [4326841.285110] [ 5582] 0 5582 25734 370 54 229 0 lrmd [4326841.285112] [ 5583] 
189 5583 31194 1108 63 381 0 attrd [4326841.285114] [ 5584] 189 5584 28724 242 59 308 0 pengine [4326841.285116] [ 5585] 189 5585 35790 610 72 792 0 crmd [4326841.285117] [ 5805] 0 5805 28843 186 13 108 0 pcsd [4326841.285119] [ 5811] 0 5811 28809 174 15 49 0 bash [4326841.285121] [ 5812] 0 5812 244778 9393 101 1300 0 ruby [4326841.285123] [ 7780] 0 7780 11147 224 25 183 0 rpc.statd [4326841.285131] [ 7273] 0 7273 624556 90931 413 34724 0 glusterfsd [4326841.285133] [ 7292] 0 7292 657325 120256 479 51678 0 glusterfsd [4326841.285135] [ 7311] 0 7311 685253 86834 417 29850 0 glusterfsd [4326841.285137] [ 7454] 0 7454 309861 3037 107 5493 0 glusterfsd [4326841.285139] [ 7534] 0 7534 326504 2925 108 5575 0 glusterfsd [4326841.285141] [ 7618] 0 7618 326759 4830 110 4112 0 glusterfsd [4326841.285143] [ 7722] 0 7722 328809 5324 109 3346 0 glusterfsd [4326841.285144] [ 7812] 0 7812 311912 4769 108 4191 0 glusterfsd [4326841.285146] [ 7971] 0 7971 328295 3988 109 5451 0 glusterfsd [4326841.285148] [ 8072] 0 8072 326246 2914 108 5586 0 glusterfsd [4326841.285150] [ 8092] 0 8092 557396 20320 195 25200 0 glusterfs [4326841.285152] [15968] 0 15968 156229 1761 117 33468 0 glusterfs [4326841.285154] [28337] 0 28337 2277022 1254348 3591 471903 0 ganesha.nfsd [4326841.285156] [27929] 89 27929 22808 339 45 140 0 pickup [4326841.285159] Out of memory: Kill process 28337 (ganesha.nfsd) score 601 or sacrifice child [4326841.287340] Killed process 28337 (ganesha.nfsd) total-vm:9108088kB, anon-rss:5017392kB, file-rss:0kB
sosreports and logs are placed under http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1338068
Created attachment 1160553 [details] valgrind_resuls_without_io.log
Created attachment 1160554 [details] valgrind_resuls_with_io_using_physical_ip.log
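The two attachments above are valgrind leak reports for ganesha.nfsd. A sketch of how such a report can be collected (the valgrind options and log path are illustrative; the exact invocation used for these attachments is not recorded here). Note that valgrind slows the daemon down considerably, which also affects how fast memory grows:

    systemctl stop nfs-ganesha
    valgrind --leak-check=full --log-file=/tmp/ganesha-valgrind.log \
        /usr/bin/ganesha.nfsd -F -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT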
Since the ganesha memory consumption never comes down (even if we remove the files), the chance of hitting this issue with continuous I/O is very high. Moreover, a customer has reported the same problem (https://bugzilla.redhat.com/show_bug.cgi?id=1282669), so this is a must-fix for 3.1.3. Raising a blocker flag.
Strangely, when the process was started under valgrind to check for memory leaks, we observed that its memory usage never grew beyond ~3GB (unlike the ~9GB observed when this issue was reported) while creating and removing ~100000 files. One thing we did observe is that nfs-ganesha memory increases by a few (k)bytes for each RPC request sent by a client (this can be checked by issuing the showmount command). So, looking at the nfs_rpc_execute-related leaks in valgrind_resuls_without_io.log (attached):

==831== 17 bytes in 1 blocks are possibly lost in loss record 66 of 758
==831==    at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==831==    by 0x6D37FE9: strdup (in /usr/lib64/libc-2.17.so)
==831==    by 0x161647: get_gsh_client (in /usr/bin/ganesha.nfsd)
==831==    by 0x1560E3: nfs_rpc_execute (in /usr/bin/ganesha.nfsd)
==831==    by 0x157D79: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x1F5658: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x643EDC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==831==    by 0x6DA81CC: clone (in /usr/lib64/libc-2.17.so)
==831==
==831== 192 bytes in 1 blocks are possibly lost in loss record 413 of 758
==831==    at 0x4C2B974: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==831==    by 0x1FDA19: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x1FDED0: server_stats_nfs_done (in /usr/bin/ganesha.nfsd)
==831==    by 0x156513: nfs_rpc_execute (in /usr/bin/ganesha.nfsd)
==831==    by 0x157D79: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x1F5658: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x643EDC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==831==    by 0x6DA81CC: clone (in /usr/lib64/libc-2.17.so)
==831==
==831== 208 bytes in 1 blocks are possibly lost in loss record 417 of 758
==831==    at 0x4C2B974: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==831==    by 0x1615EB: get_gsh_client (in /usr/bin/ganesha.nfsd)
==831==    by 0x1560E3: nfs_rpc_execute (in /usr/bin/ganesha.nfsd)
==831==    by 0x157D79: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x1F5658: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x643EDC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==831==    by 0x6DA81CC: clone (in /usr/lib64/libc-2.17.so)
==831==
==831== 270 bytes in 40 blocks are definitely lost in loss record 427 of 758
==831==    at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==831==    by 0x6D37FE9: strdup (in /usr/lib64/libc-2.17.so)
==831==    by 0x164103: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x202D10: foreach_gsh_export (in /usr/bin/ganesha.nfsd)
==831==    by 0x16448B: mnt_Export (in /usr/bin/ganesha.nfsd)
==831==    by 0x1564ED: nfs_rpc_execute (in /usr/bin/ganesha.nfsd)
==831==    by 0x157D79: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x1F5658: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x643EDC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==831==    by 0x6DA81CC: clone (in /usr/lib64/libc-2.17.so)
==831==
==831== 1,400 bytes in 5 blocks are possibly lost in loss record 559 of 758
==831==    at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==831==    by 0x686DC05: rbtx_init (in /usr/lib64/libntirpc.so.1.3.1)
==831==    by 0x1E1F88: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x1E38CF: nfs_dupreq_start (in /usr/bin/ganesha.nfsd)
==831==    by 0x15613E: nfs_rpc_execute (in /usr/bin/ganesha.nfsd)
==831==    by 0x157D79: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x1F5658: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x643EDC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==831==    by 0x6DA81CC: clone (in /usr/lib64/libc-2.17.so)
==831==
==831== 1,560 bytes in 5 blocks are possibly lost in loss record 565 of 758
==831==    at 0x4C2B974: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==831==    by 0x1E1EF0: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x1E38CF: nfs_dupreq_start (in /usr/bin/ganesha.nfsd)
==831==    by 0x15613E: nfs_rpc_execute (in /usr/bin/ganesha.nfsd)
==831==    by 0x157D79: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x1F5658: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x643EDC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==831==    by 0x6DA81CC: clone (in /usr/lib64/libc-2.17.so)
==831==
==831== 5,080 bytes in 5 blocks are possibly lost in loss record 659 of 758
==831==    at 0x4C2B974: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==831==    by 0x1E1FED: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x1E38CF: nfs_dupreq_start (in /usr/bin/ganesha.nfsd)
==831==    by 0x15613E: nfs_rpc_execute (in /usr/bin/ganesha.nfsd)
==831==    by 0x157D79: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x1F5658: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x643EDC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==831==    by 0x6DA81CC: clone (in /usr/lib64/libc-2.17.so)

From the above list, the major chunk seems to come from "nfs_rpc_execute -> nfs_dupreq_start". From the code, it looks like this code path can be avoided by disabling DRC.

Shashank, could you please test the below scenario (see the sketch after this list):
1) Add the below config parameter to '/etc/ganesha/ganesha.conf':
NFS_Core_Param {
    .....
    DRC_Disabled = true;
    .....
}
2) Restart the nfs-ganesha service.
3) Execute the I/O (same as mentioned in the bug description).
4) Verify the memory usage.
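A minimal sketch of the steps above (in an HA setup the service may be managed by the cluster, so the restart step may differ):

    # 1) Add DRC_Disabled = true inside the NFS_Core_Param block
    vi /etc/ganesha/ganesha.conf

    # 2) Restart the nfs-ganesha service
    systemctl restart nfs-ganesha

    # 3) Re-run the create/remove workload from the clients, then
    # 4) verify the memory usage of the new process
    ps aux | grep ganesha.nfsd
    cat /proc/$(pidof ganesha.nfsd)/oom_score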
==831== 208 bytes in 1 blocks are possibly lost in loss record 417 of 758
==831==    at 0x4C2B974: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==831==    by 0x1615EB: get_gsh_client (in /usr/bin/ganesha.nfsd)
==831==    by 0x1560E3: nfs_rpc_execute (in /usr/bin/ganesha.nfsd)
==831==    by 0x157D79: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x1F5658: ??? (in /usr/bin/ganesha.nfsd)
==831==    by 0x643EDC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==831==    by 0x6DA81CC: clone (in /usr/lib64/libc-2.17.so)

While looking at the nfs_rpc_execute->get_gsh_client() path, we observed that after executing the RPC request the code calls put_gsh_client(), which does not free the memory even when the refcount drops to '0'. Attaching a pseudo fix (mem_leak_fix1.patch). But since it is a very intricate fix, it needs thorough testing.
Created attachment 1160982 [details] mem_leak_fix1.patch
Soumya, why can't we apply this patch and give a private build to QE to test whether this leak caused the OOM kill? ~Atin
Resetting needinfo on Shashank to test with the above mentioned option configured. Thanks!
Verified this by setting DRC_Disabled = true in ganesha.conf; below are the observations (tested with 200000 file creations and removals from 2 mount points).

>> When DRC_Disabled = true, memory consumption of the ganesha process:
[root@dhcp46-247 ~]# ps aux|grep ganesha
root 8852 7.2 24.2 4182068 1944876 ? Ssl May26 44:38 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -E 6289034494432444416
[root@dhcp46-247 ~]# cat /proc/8852/oom_score
169
top output shows ganesha using ~24% of memory.

>> When DRC_Disabled is not set in ganesha.conf:
[root@dhcp46-247 ~]# ps aux|grep ganesha
root 18830 30.3 24.7 4124524 1984040 ? Ssl 18:16 44:40 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -E 6288978071447076864
[root@dhcp46-247 ~]# cat /proc/18830/oom_score
172
top output shows ganesha using ~25% of memory.

>>>> Conclusion: There is no major difference in the memory consumption of ganesha with DRC_Disabled set to true, so this does not rule out ganesha getting killed under higher/continuous workloads, as seen in the previous tests.
Removing the needinfo
Additional Updates:
===================

>>>>>>>>>> With DRC_Disabled = true and 8 volumes exported through ganesha:

>>>> System in initial state, no I/O done so far:
Virtual memory usage of the ganesha process on the mounted node is 3582164 KiB (3.6G) and oom_score is 47.
[root@dhcp46-247 ~]# ps aux|grep ganesha
root 19579 0.3 6.7 3582164 538760 ? Ssl 14:40 0:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -E 6288922412965888000
[root@dhcp46-247 ~]# cat /proc/19579/oom_score
47

>>>> After creating 200000 files from 2 mount points and removing those files:
Virtual memory usage of the ganesha process on the mounted node is 4182068 KiB (4.2G) and oom_score is 169.
[root@dhcp46-247 ~]# ps aux|grep ganesha
root 8852 7.2 24.2 4182068 1944876 ? Ssl May26 44:38 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -E 6289034494432444416
[root@dhcp46-247 ~]# cat /proc/8852/oom_score
169

***********************************************************************************************************

>>>>>>>>>>> With DRC disabled false and 8 volumes exported:

>>>> System in initial state, no I/O done so far:
Virtual memory usage of the ganesha process on the mounted node is 3573828 KiB (3.6G) and oom_score is 47.
[root@dhcp46-247 ~]# ps aux|grep ganesha
root 18830 7.4 6.7 3573828 540212 ? Ssl 18:16 0:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -E 6288978071447076864
[root@dhcp46-247 ~]# cat /proc/18830/oom_score
47

>>>> After creating 200000 files from 2 mount points and removing those files:
Virtual memory usage of the ganesha process on the mounted node is 4124524 KiB (4.2G) and oom_score is 172.
[root@dhcp46-247 ~]# ps aux|grep ganesha
root 18830 30.3 24.7 4124524 1984040 ? Ssl 18:16 44:40 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -E 6288978071447076864
[root@dhcp46-247 ~]# cat /proc/18830/oom_score
172
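The numbers above were collected manually; a small loop like the following can be used to sample them over the course of a run (a sketch; the interval and log path are arbitrary):

    PID=$(pidof ganesha.nfsd)
    while sleep 60; do
        echo "$(date +%F_%T) rss_kb=$(awk '/VmRSS/ {print $2}' /proc/$PID/status) oom_score=$(cat /proc/$PID/oom_score)"
    done >> /tmp/ganesha-mem.log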
Conclusion:
===========
1. There is no major difference in memory consumption with DRC_Disabled = true.
2. For every 200000 files written and removed, the ganesha process consumes around ~18-20% of total memory and its oom_score increases by ~120-125.
   => With no I/O, the initial memory consumed is 3.6G. After creating 200000 files and removing them, the consumption becomes 4.2G.
   => We observed the OOM kill when we created and removed 1000000 files (100KB each) on 8G systems.
3. The oom_score of the ganesha process was ~600 when the OOM kill was seen in our environment.
4. In the customer environment, the oom_score of the ganesha process was ~800 when they saw this issue.
Post removal of the files, if I check statedumps of the ganesha/gfapi process, I see that although the number of upcall entry allocations is high, the number of outstanding allocations declines (this can be confirmed from the statedumps copied to 10.70.43.208://var/run/gluster/statedump/).

I suspect the following to be the cause of the high memory usage during remove:
* For each file removed, each brick process sends a cache-invalidation upcall request. So for ~1 million files on a 2x2 volume, ~2 million requests are sent.
* Each cache-invalidation request received is stored in a list whose entry is dynamically allocated with the mempool type "glfs_mt_upcall_entry_t". The entries in the list are freed only when polled by the application (NFS-Ganesha).
* NFS-Ganesha polls for upcall events with a gap of 1 sec between polls. That means processing/freeing ~2 million upcall events would take around ~2 million seconds (~23 days).

Since the test was to do 'rm -rf *', it resulted in a huge number of invalidation requests being sent. If I delete a small number of entries, I do not see an increase in memory usage.

There is an ongoing effort to filter out duplicate upcall events generated by the replica bricks. Still, to accommodate a large number of upcall events even with that support, I have changed the gap between polls from 1 sec to 10 us. I shall provide a test build with those changes. Kindly update the ganesha RPMs and verify whether it reduces the memory footprint.
The below link contains the test packages:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11352530
Kindly test whether they fix the issue.
Tested the latest RPMs provided by Soumya; the leaks are still seen during removal of files:

>>>> Initial setup:
[root@dhcp43-208 ~]# cat /proc/1866/oom_score
7

>>>> After 200000 file creations:
[root@dhcp43-208 ~]# cat /proc/1866/oom_score
23

>>>> After 200000 file removals:
[root@dhcp43-208 ~]# cat /proc/1866/oom_score
179
There is another issue found with upcall processing in gfapi: the entries are not removed under the mutex lock, which may result in list corruption and a memory leak. This is being addressed as part of bug 1358608.

When the tests were re-run with the below fixes -
http://review.gluster.org/#/c/14532/
https://review.gerrithub.io/284684
http://review.gluster.org/14972
we observed that there is no longer an upcall entry leak post removal of files (statedump: /var/run/gluster/glusterdump.3142.dump.1469080876 on 10.70.43.208). However, the oom_score still increases while removing the files. Shashank is continuing the testing to check which operations (LOOKUP is suspected) or entries lead to the increase in memory usage.
After the above fixes were applied, the oom_score was
* ~21 (post creation of 100,000 files) [statedump: glusterdump.10831.dump.1469084578]
* ~49 (post removal of those files) [statedump: glusterdump.10831.dump.1469086623]

From comment#28, I am assuming the same oom_score is observed even with cache-invalidation off. Shashank, could you please confirm that?

When comparing these two statedumps, the major difference I see is the inode count.

== post creation ==
pool-name=gfapi:inode_t
hot-count=24998
cold-count=106074

== post removal ==
pool-name=gfapi:inode_t
hot-count=99979
cold-count=31093

Also, the inode_ctx memory usage is higher in the second statedump. We have the cache-inode layer limit set to 25,000 in ganesha.conf. We can see that reflected during creation, but not during removal.
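For reference, the pool counters above can be pulled out of the two statedumps like this (a sketch, assuming they sit under the default /var/run/gluster statedump path used elsewhere in this bug):

    cd /var/run/gluster
    for d in glusterdump.10831.dump.1469084578 glusterdump.10831.dump.1469086623; do
        echo "== $d =="
        # hot-count/cold-count follow the pool-name line in the dump
        grep -A2 'pool-name=gfapi:inode_t' $d
    done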
In comment 8, the I/O was done for 1000000 files, while in comment 33 it was for 100000 files, which is 1/10 of what I tried. I can probably give the same set (1000000 files) a try with the updated fixes and confirm the results.
I meant comment 28 rather than comment 8.
Thanks, Shashank. I suspect a refleak in glfs_h_handle(), used in upcall processing. I am building gluster RPMs with the fix; we can re-test with that build.
The fix mentioned above has been submitted for review:
http://review.gluster.org/14984

Update from Shashank on the latest build:
> oom_score starts from 7
> after file creation: oom_score 22
> -rw-------. 1 root root 58337 Jul 22 05:03 glusterdump.25850.dump.1469178206
> after removal: oom_score 33
> -rw-------. 1 root root 58316 Jul 22 05:42 glusterdump.25850.dump.1469180577

I no longer see active inodes in the latest statedump. In fact, most of the memory appears to be freed in statedump-1469180577 compared to statedump-1469178206, so the leak is probably no longer in the gfAPI layer. I have also requested Shashank to update the test results with the cache-invalidation feature off as well.
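For reference, the cache-invalidation (upcall) feature is toggled per volume through the gluster CLI (a sketch; the volume name is a placeholder):

    # Turn the feature off for the exported volume (use 'on' to re-enable)
    gluster volume set <volname> features.cache-invalidation off

    # Confirm the current setting
    gluster volume info <volname> | grep cache-invalidation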
Ran the test with the latest build for 100000 files; below are the results:

>>> With cache-invalidation off (v4 mount):
oom_score starts from 7
after file creation: oom_score 21
-rw-------. 1 root root 58337 Jul 22 06:22 glusterdump.11894.dump.1469182928
after removal: oom_score 23
-rw-------. 1 root root 57373 Jul 22 06:47 glusterdump.11894.dump.1469184431

>>> With cache-invalidation on (v4 mount):
oom_score starts from 11
after file creation: oom_score 21
-rw-------. 1 root root 58337 Jul 22 07:25 glusterdump.6267.dump.1469186747
after removal: oom_score 26
-rw-------. 1 root root 57404 Jul 22 08:07 glusterdump.6267.dump.1469189235

>>> With cache-invalidation on (v3 mount):
oom_score starts from 7
after file creation: oom_score 25
-rw-------. 1 root root 2384348 Jul 22 08:42 glusterdump.7025.dump.1469191375
after removal: oom_score 26
-rw-------. 1 root root 59331 Jul 22 08:50 glusterdump.7025.dump.1469191806
*** Bug 1282669 has been marked as a duplicate of this bug. ***
Tried creating/removing ~200000 files; the issue is not seen in the latest build. Verified the fix in:
glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.1-1.el7rhgs.x86_64
Edited the doc text for the errata.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2017-0493.html