Bug 2231151

Summary:	[perf] nfs-ganesha container OOM killed during perf testing
Product:	[Red Hat Storage] Red Hat OpenShift Data Foundation	Reporter:	Elvir Kuric <ekuric>
Component:	ceph	Assignee:	Frank Filz <ffilz>
ceph sub component:	NFS-Ganesha	QA Contact:	Amrita Mahapatra <ammahapa>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	unspecified
Priority:	unspecified	CC:	asriram, bniver, branto, brgardne, ekuric, etamir, gouthamr, kkeithle, mbenjamin, muagarwa, nberry, nojha, odf-bz-bot, pcuzner, sheggodu, sostapov, tnielsen, vumrao
Version:	4.13
Target Milestone:	---
Target Release:	ODF 4.16.1
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	4.16.0-110	Doc Type:	No Doc Update
Doc Text:		Story Points:	---
Clone Of:
Clones:	2295943 (view as bug list)		Environment:
Last Closed:	2024-08-19 07:41:43 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	2245261
Bug Blocks:	2295943

Description Elvir Kuric 2023-08-10 18:24:45 UTC

Created attachment 1982841 [details]
dmesg with oom kill

Description of problem (please be detailed as possible and provide log
snippests):

[I cannot find cephNFS in list to put BZ in that section]

nfs-ganesha container is killed during performance testing. 
Test description:

One fio pod writes in --client mode to multiple pods which mounts PVC from "ocs-storagecluster-ceph-nfs" storageclass. 

fio --client <listclints> 

Short after starting test the pod rook-ceph-nfs-ocs-storagecluster-cephnfs-a-584c957ff-cdbmj will end in CrashLoopBackOff which is caused by OOM of nfs-ganesha container 
rook-ceph-nfs-ocs-storagecluster-cephnfs-a-584c957ff-cdbmj        1/2     CrashLoopBackOff   5 (67s ago)      2d9h

--- 
Thu Aug 10 15:03:57 2023] Tasks state (memory values in pages):
[Thu Aug 10 15:03:57 2023] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[Thu Aug 10 15:03:57 2023] [ 124130]     0 124130 13217790  2062728 24743936        0          -997 ganesha.nfsd
[Thu Aug 10 15:03:57 2023] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-3d9ac230a9f61c6d2058c6de8d3d9bfae661b610057fd486d6d767e8bf9021b4.scope,mems_allowed=0-1,oom_memcg=/kubepods.slice/kubepods-podf30594d4_60a0_4447_97be_cad161008265.slice/crio-3d9ac230a9f61c6d2058c6de8d3d9bfae661b610057fd486d6d767e8bf9021b4.scope,task_memcg=/kubepods.slice/kubepods-podf30594d4_60a0_4447_97be_cad161008265.slice/crio-3d9ac230a9f61c6d2058c6de8d3d9bfae661b610057fd486d6d767e8bf9021b4.scope,task=ganesha.nfsd,pid=124130,uid=0
[Thu Aug 10 15:03:57 2023] Memory cgroup out of memory: Killed process 124130 (ganesha.nfsd) total-vm:52871160kB, anon-rss:8229748kB, file-rss:21164kB, shmem-rss:0kB, UID:0 pgtables:24164kB oom_score_adj:-997
--- 

We tested different setups, configurations up to 12 pods worked fine, for cases with more pods, the issue persistet. 
Possible workarond was to increase memory limits for nfs-ganesha in "rook-ceph-nfs-ocs-storagecluster-cephnfs-a" deployment and this helped to mitgate issue for 24 pods and 50 pods - in that cases we increased 
memory limits to 32 GB , 40 GB respectively. 


Version of all relevant components (if applicable):

ceph : ceph version 17.2.6-70.el9cp (fe62dcdbb2c6e05782a3e2b67d025b84ff5047cc) quincy (stable)

oc get storagecluster -n openshift-storage
NAME                 AGE   PHASE   EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   24d   Ready              2023-07-17T09:34:59Z   4.13.1

OCP v4.13


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3

Can this issue reproducible?
Yes, always. 

Can this issue reproduce from the UI?

NA
If this is a regression, please provide more details to justify this:
NA

Steps to Reproduce:
1. enable nfs on top of ODF 
2. create fio pod and direct it to write to pods with mounted pvc from nfs storage class. 
3. monitor "rook-ceph-nfs-ocs-storagecluster-cephnfs-a-" pod in openshift-namespace and once it crashes check logs on node where it scheduled ( dmesg -T )


Actual results:
rook-ceph-nfs-ocs-storagecluster-cephnfs-a- pod crash constanty due OOM 



Additional info:
Same test with ODF cephfs storage class does not show this issue

Comment 2 Blaine Gardner 2023-08-15 17:47:50 UTC

Short update: I'm loosely aware of and working to track down a known NFS-Ganesha memory footprint issue. Ideally, the fix could be made in RHCS. However, we have a short term option of raising the default allocations and/or updating ODF docs to reflect the client connection limitations if the NFS-Ganesha issue can't be fixed for 4.14.

Comment 3 Blaine Gardner 2023-08-21 17:49:15 UTC

Still trying to collect the right info to figure out what the right ODF recommendation is. 

If my interpretation of some non-ODF NFS-Ganesha performance testing is correct, 100 distinct PVCs with 3 Pods using each could cause Ganesha to use 23GB to 80GB of RAM. I don't think we want to set our default resource request to 80GB for CephNFS; that seems obscene. 

I've asked whether there are options that can be set for Ganesha to limit the cache size, but there is a possibility this is a memory leak issue in Ganesha.

From what I gather so far, Ganesha's default cache size is 999 _files_, but it doesn't seem like the actual cache memory utilization (the size of the files) is taken into account. I'm hoping for more clarity around that, since that seems like an obvious OOM risk.

Comment 4 Blaine Gardner 2023-08-21 18:00:36 UTC

@ekuric could you clarify some info about your testing?

When you say 12, 24, 50 pods, are those pods all using the same PVC, or does each pod have a separate PVC? Based on looking at other NFS-Ganesha FIO tests for RHCS, the [X pods : X PVCs] case may use significantly more memory than [X pods : 1 PVC].

Could you collect performance results from both cases ([X pods : 1 PVC] and [X pods : X PVCs]) and summarize the results here? No need to repeat the tests you've already done, but info from the tests that haven't been done will help us know how to advise users.

Comment 7 Blaine Gardner 2023-09-08 19:05:03 UTC

NFS-Ganesha has a known bug where resource usage balloons with increasing numbers of exports (PVCs in ODF's case). 

One of the RHCS performance tests showed that 100 exports with 3 clients each resulted in nearly 80 GIGAbytes of memory usage. Obviously, that is extreme, but it exemplifies the issue.

For the inverse case of 3 exports (PVCs) with 100 clients (`mount -t nfs` or Pod volumes) each, the resource usage should be much less, though I don't have strong numbers.

I think the best thing we can do for ODF 4.14 is to advise CEEs and customers to do some performance testing with the max usage they expect, and to increase the NFS resources appropriately.

@etamir what do you think? Should we create a doc bug to suggest this in the docs?

Comment 8 Blaine Gardner 2023-09-08 19:36:49 UTC

RHCS bug here: https://bugzilla.redhat.com/show_bug.cgi?id=2236325

Ganesha component is not available, so assigning to RADOS

Comment 10 Eran Tamir 2023-09-11 09:50:19 UTC

@brgardne We can add it to the docs or in a KCS (https://access.redhat.com/solutions/7003575), but we need a rule of thumb - every 10 exports require an additional 16GB RAM or something like that.  
Elvir, so you have such a rule of thumb for us?

Comment 13 Blaine Gardner 2023-10-11 20:12:15 UTC

I don't have very good info for a rule of thumb for a KCS. The information I have is that the usage often jumps hugely at some thresholds, but the graphs I have don't have the resolution for me to know for sure. I'm asking Paul Cuzner for more info.

Comment 17 Blaine Gardner 2023-10-23 19:32:18 UTC

As far as I understand, the NFS-Ganesha fix is not yet part of RHCS and isn't in any ODF versions. I think using the latest version of the v4.14 RC is probably sufficient.

Comment 38 errata-xmlrpc 2024-08-19 07:41:43 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.1 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:5547