Description of problem:
We export a GlusterFS volume through NFS-Ganesha to meet a TibCo EMS shared storage requirement. The requirement is async NFSv4 for its message queue data, and the data must be shared because the implementation has a standby node for DR/HA. The implementation runs in a UAT environment where volume stress testing recurs regularly, and a lot of volume stress testing had been running in the two weeks before the incident. The problem is that, over time, ganesha.nfsd continuously increases its memory usage; in this incident it was killed by the OOM killer with an OOM score of 912 once the system had no memory left. See the kernel logs related to the incident below:

Nov 18 01:38:05 iahdvlgfsb001 kernel: CPU: 7 PID: 1661 Comm: nbdisco Kdump: loaded Not tainted 3.10.0-862.14.4.el7.x86_64 #1
Nov 18 01:38:05 iahdvlgfsb001 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
Nov 18 01:38:05 iahdvlgfsb001 kernel: Call Trace:
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa6f13754>] dump_stack+0x19/0x1b
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa6f0e91f>] dump_header+0x90/0x229
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa68fa982>] ? ktime_get_ts64+0x52/0xf0
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa695141f>] ? delayacct_end+0x8f/0xb0
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa699ac94>] oom_kill_process+0x254/0x3d0
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa699a73d>] ? oom_unkillable_task+0xcd/0x120
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa699a7e6>] ? find_lock_task_mm+0x56/0xc0
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa699b4d6>] out_of_memory+0x4b6/0x4f0
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa6f0f423>] __alloc_pages_slowpath+0x5d6/0x724
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa69a18b5>] __alloc_pages_nodemask+0x405/0x420
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa69ec058>] alloc_pages_current+0x98/0x110
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa6997087>] __page_cache_alloc+0x97/0xb0
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa6999788>] filemap_fault+0x298/0x490
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffc05d285f>] xfs_filemap_fault+0x5f/0xe0 [xfs]
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa69c35ea>] __do_fault.isra.58+0x8a/0x100
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa69c3b9c>] do_read_fault.isra.60+0x4c/0x1b0
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa69c8544>] handle_pte_fault+0x2f4/0xd10
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa69caefd>] handle_mm_fault+0x39d/0x9b0
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa6f20547>] __do_page_fault+0x197/0x4f0
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa6f208d5>] do_page_fault+0x35/0x90
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa6f1c758>] page_fault+0x28/0x30
Nov 18 01:38:05 iahdvlgfsb001 kernel: Mem-Info:
Nov 18 01:38:05 iahdvlgfsb001 kernel: active_anon:3502127 inactive_anon:432868 isolated_anon:0#012 active_file:641 inactive_file:2158 isolated_file:32#012 unevictable:15782 dirty:0 writeback:0 unstable:0#012 slab_reclaimable:10165 slab_unreclaimable:14689#012 mapped:10992 shmem:17124 pagetables:13400 bounce:0#012 free:33797 free_pcp:1947 free_cma:0
Nov 18 01:38:05 iahdvlgfsb001 kernel: Node 0 DMA free:15892kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB
shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes Nov 18 01:38:05 iahdvlgfsb001 kernel: lowmem_reserve[]: 0 2815 15869 15869 Nov 18 01:38:05 iahdvlgfsb001 kernel: Node 0 DMA32 free:64048kB min:11968kB low:14960kB high:17952kB active_anon:2203064kB inactive_anon:550840kB active_file:0kB inactive_file:2588kB unevictable:13316kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:2883480kB mlocked:13316kB dirty:0kB writeback:0kB mapped:11860kB shmem:18416kB slab_reclaimable:7168kB slab_unreclaimable:9804kB kernel_stack:4160kB pagetables:10052kB unstable:0kB bounce:0kB free_pcp:2248kB local_pcp:120kB free_cma:0kB writeback_tmp:0kB pages_scanned:977 all_unreclaimable? yes Nov 18 01:38:05 iahdvlgfsb001 kernel: lowmem_reserve[]: 0 0 13054 13054 Nov 18 01:38:05 iahdvlgfsb001 kernel: Node 0 Normal free:55248kB min:55548kB low:69432kB high:83320kB active_anon:11805444kB inactive_anon:1180632kB active_file:2876kB inactive_file:6044kB unevictable:49812kB isolated(anon):0kB isolated(file):128kB present:13631488kB managed:13367316kB mlocked:49812kB dirty:0kB writeback:0kB mapped:32108kB shmem:50080kB slab_reclaimable:33492kB slab_unreclaimable:48936kB kernel_stack:8560kB pagetables:43548kB unstable:0kB bounce:0kB free_pcp:5540kB local_pcp:628kB free_cma:0kB writeback_tmp:0kB pages_scanned:14381 all_unreclaimable? yes Nov 18 01:38:05 iahdvlgfsb001 kernel: lowmem_reserve[]: 0 0 0 0 Nov 18 01:38:05 iahdvlgfsb001 kernel: Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB Nov 18 01:38:05 iahdvlgfsb001 kernel: Node 0 DMA32: 1656*4kB (UE) 1399*8kB (UE) 859*16kB (UE) 392*32kB (UEM) 209*64kB (UEM) 38*128kB (UEM) 5*256kB (UM) 1*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 64136kB Nov 18 01:38:05 iahdvlgfsb001 kernel: Node 0 Normal: 13778*4kB (UE) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 55112kB Nov 18 01:38:05 iahdvlgfsb001 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB Nov 18 01:38:05 iahdvlgfsb001 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB Nov 18 01:38:05 iahdvlgfsb001 kernel: 25296 total pagecache pages Nov 18 01:38:05 iahdvlgfsb001 kernel: 3426 pages in swap cache Nov 18 01:38:05 iahdvlgfsb001 kernel: Swap cache stats: add 2228357, delete 2224938, find 2455975/2474063 Nov 18 01:38:05 iahdvlgfsb001 kernel: Free swap = 0kB Nov 18 01:38:05 iahdvlgfsb001 kernel: Total swap = 4194300kB Nov 18 01:38:05 iahdvlgfsb001 kernel: 4194174 pages RAM Nov 18 01:38:05 iahdvlgfsb001 kernel: 0 pages HighMem/MovableOnly Nov 18 01:38:05 iahdvlgfsb001 kernel: 127498 pages reserved Nov 18 01:38:05 iahdvlgfsb001 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 601] 0 601 9876 1567 23 39 0 systemd-journal Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 625] 0 625 102881 174 35 173 0 lvmetad Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 647] 0 647 11114 218 24 169 -1000 systemd-udevd Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 922] 0 922 13877 107 28 81 -1000 auditd Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 947] 998 947 2144 148 9 29 0 lsmd Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 949] 0 949 3304 130 13 28 0 rngd Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 950] 0 950 5419 206 16 41 0 irqbalance Nov 18 01:38:05 
iahdvlgfsb001 kernel: [ 951] 0 951 74540 359 92 350 0 sssd Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 953] 0 953 24914 182 42 403 0 VGAuthService Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 954] 0 954 56244 336 56 244 0 vmtoolsd Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 955] 999 955 137704 304 62 1797 0 polkitd Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 957] 0 957 55884 271 65 421 0 abrtd Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 960] 0 960 55308 240 64 345 0 abrt-watch-log Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 965] 0 965 6108 169 16 143 0 smartd Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 967] 996 967 29438 194 30 85 0 chronyd Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 971] 81 971 19665 284 35 77 -900 dbus-daemon Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 976] 32 976 17305 185 37 109 0 rpcbind Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 980] 0 980 67767 133 45 658 0 gssproxy Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1001] 0 1001 119044 557 85 332 0 NetworkManager Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1009] 0 1009 30077 210 29 223 0 rpc.gssd Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1012] 0 1012 132802 1794 189 1496 0 sssd_be Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1234] 0 1234 56451 632 66 975 0 snmpd Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1235] 0 1235 28203 308 59 231 -1000 sshd Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1236] 0 1236 13702 206 31 111 0 oddjobd Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1240] 0 1240 143454 362 96 2638 0 tuned Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1248] 0 1248 200916 46164 157 1004 0 glusterd Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1254] 0 1254 76829 742 100 252 0 sssd_nss Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1255] 0 1255 71355 366 91 246 0 sssd_pam Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1262] 0 1262 9168 247 21 36 0 systemd-logind Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1293] 29 1293 11145 235 26 176 0 rpc.statd Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1294] 0 1294 66134 1351 53 184 0 rsyslogd Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1520] 0 1520 22929 213 43 242 0 master Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1524] 89 1524 25541 341 48 236 0 qmgr Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1537] 0 1537 41213 218 27 242 0 pbx_exchange Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1594] 0 1594 44515 177 31 107 0 vnetd Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1616] 0 1616 54700 228 50 232 0 bpcd Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1647] 0 1647 199453 917 112 953 0 nbdisco Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1667] 0 1667 77801 90 36 222 0 mtstrmd Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1697] 0 1697 6476 160 18 47 0 atd Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1705] 0 1705 31570 202 18 130 0 crond Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1708] 0 1708 26991 108 9 8 0 rhnsd Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1712] 0 1712 27522 171 10 31 0 agetty Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 2956] 0 2956 55681 1980 65 541 0 osad Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 5999] 0 5999 344870 1815 108 3218 0 glusterfsd Nov 18 01:38:05 iahdvlgfsb001 kernel: [12007] 0 12007 154731 202 66 2000 0 glusterfs Nov 18 01:38:05 iahdvlgfsb001 kernel: [12023] 0 12023 171115 1414 68 1896 0 glusterfs Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 5580] 0 5580 631625 45546 218 6402 0 glusterfsd Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 5604] 0 5604 207103 2342 63 503 0 glusterfs Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 9869] 0 9869 280052 13933 132 1528 0 pcsd Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 9949] 0 9949 48561 23998 70 0 0 corosync Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 9956] 0 9956 26065 305 55 1271 0 pacemakerd 
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 9957] 189 9957 26866 548 56 1592 0 cib
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 9958] 0 9958 26953 513 54 549 0 stonithd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 9959] 0 9959 24694 370 51 143 0 lrmd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 9960] 189 9960 32325 594 62 607 0 attrd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 9961] 189 9961 20120 231 42 217 0 pengine
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 9962] 189 9962 35282 459 72 949 0 crmd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [28730] 0 28730 6051631 3799789 9893 1002189 0 ganesha.nfsd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [18820] 89 18820 25496 549 46 0 0 pickup
Nov 18 01:38:05 iahdvlgfsb001 kernel: [26295] 0 26295 42888 3933 87 0 0 pcs
Nov 18 01:38:05 iahdvlgfsb001 kernel: [26300] 0 26300 28824 258 14 0 0 IPaddr2
Nov 18 01:38:05 iahdvlgfsb001 kernel: [26305] 0 26305 2920 226 11 0 0 tibco-async
Nov 18 01:38:05 iahdvlgfsb001 kernel: [26307] 0 26307 2920 87 10 0 0 tibco-async
Nov 18 01:38:05 iahdvlgfsb001 kernel: [26308] 0 26308 6744 59 17 0 0 systemctl
Nov 18 01:38:05 iahdvlgfsb001 kernel: [26309] 0 26309 2272 142 10 0 0 grep
Nov 18 01:38:05 iahdvlgfsb001 kernel: [26310] 0 26310 1092 80 8 0 0 wc
Nov 18 01:38:05 iahdvlgfsb001 kernel: Out of memory: Kill process 28730 (ganesha.nfsd) score 912 or sacrifice child
Nov 18 01:38:05 iahdvlgfsb001 kernel: Killed process 28730 (ganesha.nfsd) total-vm:24206524kB, anon-rss:15199156kB, file-rss:0kB, shmem-rss:0kB

Version-Release number of selected component (if applicable):
[root@iahdvlgfsb001 log]# rpm -qa | grep -iE "ganesha|gluster"
centos-release-gluster41-1.0-1.el7.centos.x86_64
glusterfs-4.1.5-1.el7.x86_64
glusterfs-fuse-4.1.5-1.el7.x86_64
nfs-ganesha-gluster-2.6.3-1.el7.x86_64
glusterfs-client-xlators-4.1.5-1.el7.x86_64
glusterfs-api-4.1.5-1.el7.x86_64
nfs-ganesha-2.6.3-1.el7.x86_64
glusterfs-cli-4.1.5-1.el7.x86_64
glusterfs-server-4.1.5-1.el7.x86_64
glusterfs-libs-4.1.5-1.el7.x86_64

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
See the export configuration we have below:

EXPORT {
    Export_Id = 3;
    Path = "/tibUAT";
    FSAL {
        name = GLUSTER;
        hostname = "localhost";
        volume = "tibUAT";
    }
    CLIENT {
        Clients = 127.0.0.1,10.1.25.5,10.1.25.7,10.1.25.8,10.1.25.9,10.1.25.10,10.1.25.74,10.1.25.75,10.1.25.177,10.1.25.178,10.1.25.76,10.1.25.254;
        Access_type = RW;
    }
    # Access_type = RW;
    # Disable_ACL = true;
    Disable_ACL = false;
    Squash = "No_root_squash";
    Pseudo = "/tibUAT";
    Protocols = "3", "4";
    Transports = "UDP", "TCP";
    SecType = "sys";
}

Note: we have no special configuration for NFS_CORE_PARAM and have not set any thresholds; in other words, apart from the export block above, we run NFS-Ganesha with its default configuration. If this can be addressed simply by changing some configuration threshold, please advise.

On another note: could setting vm.min_free_kbytes to between 2% and 4% of memory help? On our current test environment I set vm.min_free_kbytes to 4% of memory (160 MB out of 4 GB). Since then, the memory usage of ganesha.nfsd has been maxing out at ~75%. See the top output for the ganesha.nfsd PID below:

 PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
6576 root      20   0 5109404   2.8g   3208 S  64.5 75.5   2802:53 ganesha.nfsd

However, there is noticeable, continuous growth in /proc/<pid>/oom_score; it has not yet reached the point where the OOM killer kills ganesha, but ganesha's oom_score keeps growing.
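For reference, a minimal sketch of how this growth can be tracked over time (assuming bash and a single ganesha.nfsd process; the 5-minute interval and the log file path are arbitrary):

while true; do
    pid=$(pidof ganesha.nfsd)
    rss_kb=$(awk '/VmRSS/ {print $2}' /proc/$pid/status)   # resident set size in kB
    score=$(cat /proc/$pid/oom_score)
    echo "$(date '+%F %T') rss_kb=$rss_kb oom_score=$score"
    sleep 300
done >> /var/log/ganesha-mem-watch.log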
I got the idea of setting vm.min_free_kbytes from the Veritas recommendation at the URL below:
https://sort.veritas.com/public/documents/isa/7.3/linux/productguides/html/vcs_access_admg/ch08s05.htm
The RHEL virtual memory tuning guide mentions not setting it too low, to prevent the OOM killer from killing multiple processes, and notes that setting it higher (5-10% of memory) results in the system spending too much time reclaiming memory. Refer to the URL below:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/s-memory-tunables
I have been trying to find better memory tuning guidance but never found a concise document that suits our needs. Do you have one you can share with us? Is there a setting in the NFS-Ganesha configuration that will handle flushing the memory used by the ganesha daemon without harming the shared data?
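For the record, this is roughly how the 4% value mentioned above could be applied (a minimal sketch: 4% of 4 GB is about 160 MB, i.e. 163840 kB; the sysctl.d file name is arbitrary):

# apply at runtime
sysctl -w vm.min_free_kbytes=163840
# persist across reboots
echo 'vm.min_free_kbytes = 163840' > /etc/sysctl.d/90-vm-min-free.conf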
I hope you have raised the issue below as well:
https://github.com/nfs-ganesha/nfs-ganesha/issues/377
Since you mentioned the default configuration: the default entry limit for the cache inode layer in ganesha is 100000, which can consume up to 1-2 GB (from my testing), and you have also turned on the md-cache settings for the gluster volume, which will consume additional memory for its caching. If your setup does not have much memory (from the command above I assume your machine has 4 GB of RAM), I suggest you decrease those values accordingly. Also, please share statedumps of the ganesha process at the beginning, in the middle, and at the peak (right before the OOM kill).
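For illustration only, a minimal ganesha.conf sketch of what lowering that limit could look like (this is not taken from the reporter's configuration; in the 2.6 line the cache block is typically named CACHEINODE, while newer releases call it MDCACHE, and 50000 is just an arbitrary value below the 100000 default):

CACHEINODE {
    # Entries_HWMark defaults to 100000; lowering it caps the metadata cache footprint
    Entries_HWMark = 50000;
}

nfs-ganesha generally needs to be restarted for a change like this to take effect.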
I can see a lot more discussion happening on the ganesha issue than here, so I am closing this since the issue is tracked properly in https://github.com/nfs-ganesha/nfs-ganesha/issues/377