Bug 1652408

Summary: ganesha.nfsd consumed all available memory and was killed by OOM.
Product: [Retired] nfs-ganesha
Reporter: Jaime Dulzura <jaime.dulzura>
Component: FSAL_GLUSTER
Assignee: Jiffin <jthottan>
Status: CLOSED INSUFFICIENT_DATA
Severity: medium
Priority: unspecified
Version: 2.6
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Last Closed: 2018-12-06 09:11:19 UTC
Type: Bug
Regression: ---
Mount Type: ---

Description Jaime Dulzura 2018-11-22 02:07:38 UTC
Description of problem:

We export a GlusterFS volume through NFS-Ganesha as shared storage for TIBCO EMS. The requirement is async NFSv4 for its message queue data, and the data must be shared because the deployment has a standby node for DR/HA. The deployment is in a UAT environment where volume stress testing runs regularly, and a lot of volume stress testing had been going on in the two weeks before the incident.

The problem is that, over time, ganesha.nfsd continuously increases its memory usage. In this incident it was killed by the OOM killer with an OOM score of 912, once the system had no free memory left.

See the kernel logs related to the incident below:

Nov 18 01:38:05 iahdvlgfsb001 kernel: CPU: 7 PID: 1661 Comm: nbdisco Kdump: loaded Not tainted 3.10.0-862.14.4.el7.x86_64 #1
Nov 18 01:38:05 iahdvlgfsb001 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
Nov 18 01:38:05 iahdvlgfsb001 kernel: Call Trace:
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa6f13754>] dump_stack+0x19/0x1b
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa6f0e91f>] dump_header+0x90/0x229
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa68fa982>] ? ktime_get_ts64+0x52/0xf0
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa695141f>] ? delayacct_end+0x8f/0xb0
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa699ac94>] oom_kill_process+0x254/0x3d0
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa699a73d>] ? oom_unkillable_task+0xcd/0x120
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa699a7e6>] ? find_lock_task_mm+0x56/0xc0
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa699b4d6>] out_of_memory+0x4b6/0x4f0
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa6f0f423>] __alloc_pages_slowpath+0x5d6/0x724
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa69a18b5>] __alloc_pages_nodemask+0x405/0x420
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa69ec058>] alloc_pages_current+0x98/0x110
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa6997087>] __page_cache_alloc+0x97/0xb0
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa6999788>] filemap_fault+0x298/0x490
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffc05d285f>] xfs_filemap_fault+0x5f/0xe0 [xfs]
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa69c35ea>] __do_fault.isra.58+0x8a/0x100
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa69c3b9c>] do_read_fault.isra.60+0x4c/0x1b0
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa69c8544>] handle_pte_fault+0x2f4/0xd10
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa69caefd>] handle_mm_fault+0x39d/0x9b0
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa6f20547>] __do_page_fault+0x197/0x4f0
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa6f208d5>] do_page_fault+0x35/0x90
Nov 18 01:38:05 iahdvlgfsb001 kernel: [<ffffffffa6f1c758>] page_fault+0x28/0x30
Nov 18 01:38:05 iahdvlgfsb001 kernel: Mem-Info:
Nov 18 01:38:05 iahdvlgfsb001 kernel: active_anon:3502127 inactive_anon:432868 isolated_anon:0#012 active_file:641 inactive_file:2158 isolated_file:32#012 unevictable:15782 dirty:0 writeback:0 unstable:0#012 slab_reclaimable:10165 slab_unreclaimable:14689#012 mapped:10992 shmem:17124 pagetables:13400 bounce:0#012 free:33797 free_pcp:1947 free_cma:0
Nov 18 01:38:05 iahdvlgfsb001 kernel: Node 0 DMA free:15892kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Nov 18 01:38:05 iahdvlgfsb001 kernel: lowmem_reserve[]: 0 2815 15869 15869
Nov 18 01:38:05 iahdvlgfsb001 kernel: Node 0 DMA32 free:64048kB min:11968kB low:14960kB high:17952kB active_anon:2203064kB inactive_anon:550840kB active_file:0kB inactive_file:2588kB unevictable:13316kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:2883480kB mlocked:13316kB dirty:0kB writeback:0kB mapped:11860kB shmem:18416kB slab_reclaimable:7168kB slab_unreclaimable:9804kB kernel_stack:4160kB pagetables:10052kB unstable:0kB bounce:0kB free_pcp:2248kB local_pcp:120kB free_cma:0kB writeback_tmp:0kB pages_scanned:977 all_unreclaimable? yes
Nov 18 01:38:05 iahdvlgfsb001 kernel: lowmem_reserve[]: 0 0 13054 13054
Nov 18 01:38:05 iahdvlgfsb001 kernel: Node 0 Normal free:55248kB min:55548kB low:69432kB high:83320kB active_anon:11805444kB inactive_anon:1180632kB active_file:2876kB inactive_file:6044kB unevictable:49812kB isolated(anon):0kB isolated(file):128kB present:13631488kB managed:13367316kB mlocked:49812kB dirty:0kB writeback:0kB mapped:32108kB shmem:50080kB slab_reclaimable:33492kB slab_unreclaimable:48936kB kernel_stack:8560kB pagetables:43548kB unstable:0kB bounce:0kB free_pcp:5540kB local_pcp:628kB free_cma:0kB writeback_tmp:0kB pages_scanned:14381 all_unreclaimable? yes
Nov 18 01:38:05 iahdvlgfsb001 kernel: lowmem_reserve[]: 0 0 0 0
Nov 18 01:38:05 iahdvlgfsb001 kernel: Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
Nov 18 01:38:05 iahdvlgfsb001 kernel: Node 0 DMA32: 1656*4kB (UE) 1399*8kB (UE) 859*16kB (UE) 392*32kB (UEM) 209*64kB (UEM) 38*128kB (UEM) 5*256kB (UM) 1*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 64136kB
Nov 18 01:38:05 iahdvlgfsb001 kernel: Node 0 Normal: 13778*4kB (UE) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 55112kB
Nov 18 01:38:05 iahdvlgfsb001 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Nov 18 01:38:05 iahdvlgfsb001 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Nov 18 01:38:05 iahdvlgfsb001 kernel: 25296 total pagecache pages
Nov 18 01:38:05 iahdvlgfsb001 kernel: 3426 pages in swap cache
Nov 18 01:38:05 iahdvlgfsb001 kernel: Swap cache stats: add 2228357, delete 2224938, find 2455975/2474063
Nov 18 01:38:05 iahdvlgfsb001 kernel: Free swap  = 0kB
Nov 18 01:38:05 iahdvlgfsb001 kernel: Total swap = 4194300kB
Nov 18 01:38:05 iahdvlgfsb001 kernel: 4194174 pages RAM
Nov 18 01:38:05 iahdvlgfsb001 kernel: 0 pages HighMem/MovableOnly
Nov 18 01:38:05 iahdvlgfsb001 kernel: 127498 pages reserved
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Nov 18 01:38:05 iahdvlgfsb001 kernel: [  601]     0   601     9876     1567      23       39             0 systemd-journal
Nov 18 01:38:05 iahdvlgfsb001 kernel: [  625]     0   625   102881      174      35      173             0 lvmetad
Nov 18 01:38:05 iahdvlgfsb001 kernel: [  647]     0   647    11114      218      24      169         -1000 systemd-udevd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [  922]     0   922    13877      107      28       81         -1000 auditd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [  947]   998   947     2144      148       9       29             0 lsmd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [  949]     0   949     3304      130      13       28             0 rngd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [  950]     0   950     5419      206      16       41             0 irqbalance
Nov 18 01:38:05 iahdvlgfsb001 kernel: [  951]     0   951    74540      359      92      350             0 sssd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [  953]     0   953    24914      182      42      403             0 VGAuthService
Nov 18 01:38:05 iahdvlgfsb001 kernel: [  954]     0   954    56244      336      56      244             0 vmtoolsd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [  955]   999   955   137704      304      62     1797             0 polkitd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [  957]     0   957    55884      271      65      421             0 abrtd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [  960]     0   960    55308      240      64      345             0 abrt-watch-log
Nov 18 01:38:05 iahdvlgfsb001 kernel: [  965]     0   965     6108      169      16      143             0 smartd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [  967]   996   967    29438      194      30       85             0 chronyd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [  971]    81   971    19665      284      35       77          -900 dbus-daemon
Nov 18 01:38:05 iahdvlgfsb001 kernel: [  976]    32   976    17305      185      37      109             0 rpcbind
Nov 18 01:38:05 iahdvlgfsb001 kernel: [  980]     0   980    67767      133      45      658             0 gssproxy
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1001]     0  1001   119044      557      85      332             0 NetworkManager
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1009]     0  1009    30077      210      29      223             0 rpc.gssd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1012]     0  1012   132802     1794     189     1496             0 sssd_be
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1234]     0  1234    56451      632      66      975             0 snmpd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1235]     0  1235    28203      308      59      231         -1000 sshd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1236]     0  1236    13702      206      31      111             0 oddjobd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1240]     0  1240   143454      362      96     2638             0 tuned
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1248]     0  1248   200916    46164     157     1004             0 glusterd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1254]     0  1254    76829      742     100      252             0 sssd_nss
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1255]     0  1255    71355      366      91      246             0 sssd_pam
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1262]     0  1262     9168      247      21       36             0 systemd-logind
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1293]    29  1293    11145      235      26      176             0 rpc.statd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1294]     0  1294    66134     1351      53      184             0 rsyslogd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1520]     0  1520    22929      213      43      242             0 master
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1524]    89  1524    25541      341      48      236             0 qmgr
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1537]     0  1537    41213      218      27      242             0 pbx_exchange
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1594]     0  1594    44515      177      31      107             0 vnetd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1616]     0  1616    54700      228      50      232             0 bpcd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1647]     0  1647   199453      917     112      953             0 nbdisco
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1667]     0  1667    77801       90      36      222             0 mtstrmd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1697]     0  1697     6476      160      18       47             0 atd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1705]     0  1705    31570      202      18      130             0 crond
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1708]     0  1708    26991      108       9        8             0 rhnsd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 1712]     0  1712    27522      171      10       31             0 agetty
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 2956]     0  2956    55681     1980      65      541             0 osad
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 5999]     0  5999   344870     1815     108     3218             0 glusterfsd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [12007]     0 12007   154731      202      66     2000             0 glusterfs
Nov 18 01:38:05 iahdvlgfsb001 kernel: [12023]     0 12023   171115     1414      68     1896             0 glusterfs
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 5580]     0  5580   631625    45546     218     6402             0 glusterfsd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 5604]     0  5604   207103     2342      63      503             0 glusterfs
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 9869]     0  9869   280052    13933     132     1528             0 pcsd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 9949]     0  9949    48561    23998      70        0             0 corosync
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 9956]     0  9956    26065      305      55     1271             0 pacemakerd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 9957]   189  9957    26866      548      56     1592             0 cib
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 9958]     0  9958    26953      513      54      549             0 stonithd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 9959]     0  9959    24694      370      51      143             0 lrmd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 9960]   189  9960    32325      594      62      607             0 attrd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 9961]   189  9961    20120      231      42      217             0 pengine
Nov 18 01:38:05 iahdvlgfsb001 kernel: [ 9962]   189  9962    35282      459      72      949             0 crmd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [28730]     0 28730  6051631  3799789    9893  1002189             0 ganesha.nfsd
Nov 18 01:38:05 iahdvlgfsb001 kernel: [18820]    89 18820    25496      549      46        0             0 pickup
Nov 18 01:38:05 iahdvlgfsb001 kernel: [26295]     0 26295    42888     3933      87        0             0 pcs
Nov 18 01:38:05 iahdvlgfsb001 kernel: [26300]     0 26300    28824      258      14        0             0 IPaddr2
Nov 18 01:38:05 iahdvlgfsb001 kernel: [26305]     0 26305     2920      226      11        0             0 tibco-async
Nov 18 01:38:05 iahdvlgfsb001 kernel: [26307]     0 26307     2920       87      10        0             0 tibco-async
Nov 18 01:38:05 iahdvlgfsb001 kernel: [26308]     0 26308     6744       59      17        0             0 systemctl
Nov 18 01:38:05 iahdvlgfsb001 kernel: [26309]     0 26309     2272      142      10        0             0 grep
Nov 18 01:38:05 iahdvlgfsb001 kernel: [26310]     0 26310     1092       80       8        0             0 wc
Nov 18 01:38:05 iahdvlgfsb001 kernel: Out of memory: Kill process 28730 (ganesha.nfsd) score 912 or sacrifice child
Nov 18 01:38:05 iahdvlgfsb001 kernel: Killed process 28730 (ganesha.nfsd) total-vm:24206524kB, anon-rss:15199156kB, file-rss:0kB, shmem-rss:0kB

Version-Release number of selected component (if applicable):

[root@iahdvlgfsb001 log]# rpm -qa | grep -iE "ganesha|gluster"
centos-release-gluster41-1.0-1.el7.centos.x86_64
glusterfs-4.1.5-1.el7.x86_64
glusterfs-fuse-4.1.5-1.el7.x86_64
nfs-ganesha-gluster-2.6.3-1.el7.x86_64
glusterfs-client-xlators-4.1.5-1.el7.x86_64
glusterfs-api-4.1.5-1.el7.x86_64
nfs-ganesha-2.6.3-1.el7.x86_64
glusterfs-cli-4.1.5-1.el7.x86_64
glusterfs-server-4.1.5-1.el7.x86_64
glusterfs-libs-4.1.5-1.el7.x86_64


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

See the export configuration we have below:

EXPORT {
      Export_Id = 3;
      Path = "/tibUAT";
      FSAL {
           name = GLUSTER;
           hostname = "localhost";
           volume = "tibUAT";
      }
      CLIENT {
           Clients = 127.0.0.1,10.1.25.5,10.1.25.7,10.1.25.8,10.1.25.9,10.1.25.10,10.1.25.74,10.1.25.75,10.1.25.177,10.1.25.178,10.1.25.76,10.1.25.254;
           Access_type = RW;
      }
      # Access_type = RW;
      # Disable_ACL = true;
      Disable_ACL = false;
      Squash = "No_root_squash";
      Pseudo = "/tibUAT";
      Protocols = "3", "4";
      Transports = "UDP", "TCP";
      SecType = "sys";
}

Note: we have no special configuration for NFS_CORE_PARAM and have not set any thresholds. In other words, we run NFS-Ganesha with its default configuration apart from the export config above.

If this only requires setting some threshold or making a configuration change, please advise.

On another note: could setting vm.min_free_kbytes to between 2% and 4% of memory help? In our current test environment I set vm.min_free_kbytes to 4% of memory (160 MB out of 4 GB). Since then, the memory usage of ganesha.nfsd has peaked at around 75%. See the top output for the ganesha.nfsd PID below:

 PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 6576 root      20   0 5109404   2.8g   3208 S  64.5 75.5   2802:53 ganesha.nfsd


However, /proc/<pid>/oom_score keeps growing noticeably. It has not yet reached the point where the OOM killer targets ganesha, but the oom_score of ganesha continues to climb.
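
For reference, here is a minimal shell sketch (assuming a single ganesha.nfsd process and standard /proc paths; the log path is just an example) of how we log the RSS and oom_score growth over time:

# Log RSS and oom_score of ganesha.nfsd once a minute.
while true; do
    pid=$(pidof ganesha.nfsd)
    if [ -n "$pid" ]; then
        rss_kb=$(awk '/^VmRSS:/ {print $2}' /proc/"$pid"/status)
        score=$(cat /proc/"$pid"/oom_score)
        echo "$(date '+%F %T') pid=$pid rss_kb=$rss_kb oom_score=$score"
    fi
    sleep 60
done >> /var/log/ganesha-memory.log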

I got the idea of setting vm.min_free_kbytes from the Veritas recommendation at the URL below:
https://sort.veritas.com/public/documents/isa/7.3/linux/productguides/html/vcs_access_admg/ch08s05.htm

The RHEL virtual memory tuning guide mentions not to set it too low, to prevent the OOM killer from killing multiple processes, and notes that setting it higher (5-10% of system memory) will result in the system spending too much time reclaiming memory.

Refer to the URL below:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/s-memory-tunables
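
For completeness, a sketch of how the 4% value was applied (plain sysctl usage; the 4% figure is our own experiment, not an official recommendation, and the sysctl.d file name is arbitrary):

# Set vm.min_free_kbytes to roughly 4% of total RAM.
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
min_free_kb=$(( total_kb * 4 / 100 ))
sysctl -w vm.min_free_kbytes="$min_free_kb"
# Persist across reboots:
echo "vm.min_free_kbytes = $min_free_kb" >> /etc/sysctl.d/99-minfree.conf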

I have been trying to find better memory tuning guidance but never found a concise one that suits our needs. Do you have one you can share with us? Is there a setting in the NFS-Ganesha configuration that would handle flushing the memory used by the ganesha daemon without harming the shared data?

Comment 1 Jiffin 2018-12-04 04:34:59 UTC
I hope you have raised the issue below as well.

https://github.com/nfs-ganesha/nfs-ganesha/issues/377

Since you mentioned the default configuration: the default entry limit for the cache inode layer in Ganesha is 100000, which can consume up to 1-2 GB (from my testing). In addition, if you have turned on the md-cache settings for the gluster volume, those will again consume memory for caching. If your setup does not have much memory (from the top output above I assume your machine has 4 GB of RAM), I suggest you decrease those values accordingly. Also, please share a statedump of the ganesha process at the beginning, in the middle, and at the peak (before the OOM kill).
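
As an illustration only, the entry limit can be lowered in ganesha.conf; this is a sketch assuming the MDCACHE block and Entries_HWMark parameter of the 2.6 config format, and 50000 is just an example figure, not a recommendation:

MDCACHE {
    # Default Entries_HWMark is 100000; lowering it reduces the cache-inode footprint.
    # 50000 is only an illustrative value; size it to your memory budget.
    Entries_HWMark = 50000;
}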

Comment 2 Jiffin 2018-12-06 09:11:19 UTC
I can see a lot more discussion happening on the ganesha GitHub issue than here. Hence closing this, since the issue is tracked properly in https://github.com/nfs-ganesha/nfs-ganesha/issues/377