Bug 1992473

Summary: ceph-nfs-pacemaker container crashes as the number of exported NFS Ganesha shares grows
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Victoria Martinez de la Cruz <vimartin>
Component: Ceph-AnsibleAssignee: Teoman ONAY <tonay>
Status: CLOSED ERRATA QA Contact: Ameena Suhani S H <amsyedha>
Severity: high Docs Contact: Aron Gunn <agunn>
Priority: unspecified    
Version: 4.2CC: agunn, aschoen, ceph-eng-bugs, dsavinea, gfidente, gmeno, nthomas, pasik, rmandyam, tbarron, tonay, tserlin, vereddy, ykaul, yocha
Target Milestone: ---Keywords: Performance, Scale
Target Release: 4.2z3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ceph-ansible-4.0.62.3-1.el8cp, ceph-ansible-4.0.62.3-1.el7cp Doc Type: Bug Fix
Doc Text:
Cause: The default pid limits for podman (2048) and docker (4096) can be insufficient when the user increases the number of NFS shares. Consequence: The container fails to start because the number of processes it needs exceeds the allowed limit. Fix: Remove the cap on the number of processes that can be started within the container by adding the --pids-limit parameter (-1 for podman, 0 for docker) to the systemd service file. Result: Containers start even when the user's configuration requires more processes than the default limits allow.
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-09-27 18:26:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1760354, 1890121, 1987235, 1993210    

Description Victoria Martinez de la Cruz 2021-08-11 07:46:20 UTC
Description of problem:

As reported in [0], after creating and exporting 200+ shares, the ceph-nfs-pacemaker container crashes and cannot be restarted.

After analyzing the issue with the Manila squad and the NFS Ganesha team, we found that each exported share consistently spawns roughly 15 threads. By default, containers are limited to 2048 pids under podman and 4096 under docker. After creating and exporting a certain number of shares, we reach that limit easily.
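As a back-of-the-envelope check (assuming the ~15-threads-per-export figure above and the default pid limits from the doc text), the limit is hit well within the reported 200+ share range:

```python
# Rough capacity estimate: how many exported shares fit under the default
# container pid limits, assuming ~15 threads per exported share.
THREADS_PER_SHARE = 15       # approximate, per the NFS Ganesha analysis
PODMAN_DEFAULT_PIDS = 2048   # podman's default pids limit
DOCKER_DEFAULT_PIDS = 4096   # docker's default pids limit

podman_max_shares = PODMAN_DEFAULT_PIDS // THREADS_PER_SHARE
docker_max_shares = DOCKER_DEFAULT_PIDS // THREADS_PER_SHARE
print(podman_max_shares, docker_max_shares)  # prints: 136 273
```

Both estimates ignore the container's baseline processes, so the real ceiling is lower still, consistent with the crash appearing at around 200 exported shares.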

In order to fix it, we can either bump PidsLimit to a higher value, or set it to unlimited so the ceph-nfs-pacemaker container is not constrained at all.
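For illustration, the fix amounts to a change along these lines in the container's systemd unit (a sketch, not the exact ceph-ansible template; the drop-in path and unit name here are hypothetical):

```ini
# /etc/systemd/system/ceph-nfs@.service.d/pids-limit.conf (hypothetical path)
# Remove the pid cap on the ceph-nfs container.
# podman expresses "unlimited" as --pids-limit=-1; docker uses --pids-limit=0.
[Service]
ExecStart=
ExecStart=/usr/bin/podman run --rm --net=host \
    --pids-limit=-1 \
    --name ceph-nfs-%i \
    ...
```

After editing, run `systemctl daemon-reload` and restart the service for the new limit to take effect.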

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1987235


How reproducible: Always reproducible

Steps to Reproduce:
1. Deploy an environment with Manila with CephFS NFS
2. Create shares (manila create, approximately, 200 shares)
3. Export those shares (manila allow-access)
4. See the container crash

Actual results: ceph-nfs-pacemaker container crashes, preventing the user from creating new shares and from accessing the shares they already created


Expected results: the user should be able to create a larger number of shares and have them available

Comment 23 errata-xmlrpc 2021-09-27 18:26:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 4.2 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3670

Comment 24 Red Hat Bugzilla 2023-09-15 01:13:32 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days