Bug 1992473 - ceph-nfs-pacemaker container crashes as NFS Ganesha exported shares number grow
Summary: ceph-nfs-pacemaker container crashes as NFS Ganesha exported shares number grow
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 4.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.2z3
Assignee: Teoman ONAY
QA Contact: Ameena Suhani S H
Docs Contact: Aron Gunn
URL:
Whiteboard:
Depends On:
Blocks: 1760354 1890121 1987235 1993210
Reported: 2021-08-11 07:46 UTC by Victoria Martinez de la Cruz
Modified: 2024-12-20 20:40 UTC
CC List: 15 users

Fixed In Version: ceph-ansible-4.0.62.3-1.el8cp, ceph-ansible-4.0.62.3-1.el7cp
Doc Type: Bug Fix
Doc Text:
Cause: The default pids-limit values for podman (2048) and docker (4096) might be insufficient when the user increases the number of NFS shares.
Consequence: The container fails to start because the number of processes it needs to start is higher than the allowed limit.
Fix: Remove the limit on the maximum number of processes that can be started within a container by adding the --pids-limit parameter (-1 for podman and 0 for docker) to the systemd service file.
Result: Containers start even if the user customizes internal processes in a way that requires running more processes than the default limits allow.
Clone Of:
Environment:
Last Closed: 2021-09-27 18:26:56 UTC
Embargoed:


Links
System                             ID                           Private  Priority  Status  Summary  Last Updated
Github                             ceph ceph-ansible pull 6777  0        None      None    None     2021-08-11 07:48:58 UTC
Github                             ceph ceph-ansible pull 6789  0        None      None    None     2021-08-11 11:48:06 UTC
Red Hat Issue Tracker              RHCEPH-698                   0        None      None    None     2021-08-18 21:55:59 UTC
Red Hat Knowledge Base (Solution)  6431861                      0        None      None    None     2021-10-19 04:58:44 UTC
Red Hat Product Errata             RHBA-2021:3670               0        None      None    None     2021-09-27 18:27:31 UTC

Description Victoria Martinez de la Cruz 2021-08-11 07:46:20 UTC
Description of problem:

As reported in [0], after creating and exporting 200+ shares, the ceph-nfs-pacemaker container crashes and cannot be restarted.

After analyzing the issue with the Manila squad and the NFS Ganesha team, we found that each exported share consistently spawns about 15 threads. By default, containers have a limit of 4096 PIDs, so once enough shares have been created and exported, the container easily reaches that limit.
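
The arithmetic behind that limit can be sketched as follows. This is only a rough upper bound using the figures quoted in this report (about 15 threads per share, a 4096 or 2048 PID limit); `baseline_threads` is a hypothetical parameter for threads the container runs regardless of exports, and its real value is not given here.

```python
# Rough estimate of how many exported shares fit under a container PID limit.
# The figures come from this report; baseline_threads is a hypothetical knob.

THREADS_PER_SHARE = 15      # approximate threads spawned per exported share
DEFAULT_PIDS_LIMIT = 4096   # default container PID limit cited in the report


def max_shares(pids_limit: int,
               threads_per_share: int = THREADS_PER_SHARE,
               baseline_threads: int = 0) -> int:
    """Return the largest share count whose threads stay within pids_limit."""
    available = pids_limit - baseline_threads
    return max(0, available // threads_per_share)


if __name__ == "__main__":
    print(max_shares(DEFAULT_PIDS_LIMIT))  # 273
    print(max_shares(2048))                # 136
```

Any baseline thread usage inside the container lowers these numbers further, so the practical ceiling is below this bound.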

To fix it, we can either raise PidsLimit to a higher value or set it to 0 so that the ceph-nfs-pacemaker container is not constrained to a fixed limit.
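
Before changing the limit, it may help to confirm how close the container is to it. The sketch below reads the `pids.current` and `pids.max` files that the Linux cgroup pids controller exposes; the cgroup directory of the ceph-nfs-pacemaker container is an assumption the caller has to supply, since its location depends on the host and container runtime. The helper names are illustrative, not part of ceph-ansible.

```python
# Minimal sketch: report PID headroom for a cgroup directory.
# The directory path is an assumption supplied by the caller.
from pathlib import Path
from typing import Optional, Tuple


def parse_pids_max(text: str) -> Optional[int]:
    """Parse pids.max: the literal 'max' means unlimited (returned as None)."""
    value = text.strip()
    return None if value == "max" else int(value)


def headroom(current: int, limit: Optional[int]) -> Optional[int]:
    """Remaining PIDs before the limit is hit, or None when unlimited."""
    return None if limit is None else limit - current


def read_cgroup_pids(cgroup_dir: str) -> Tuple[int, Optional[int]]:
    """Read (current, limit) from the pids controller files in cgroup_dir."""
    base = Path(cgroup_dir)
    current = int((base / "pids.current").read_text().strip())
    limit = parse_pids_max((base / "pids.max").read_text())
    return current, limit
```

A headroom that shrinks by roughly 15 for every newly exported share would be consistent with the analysis above.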

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1987235


How reproducible: Always

Steps to Reproduce:
1. Deploy an environment with Manila using CephFS NFS
2. Create approximately 200 shares (manila create)
3. Export those shares (manila allow-access)
4. Observe the container crash

Actual results: The ceph-nfs-pacemaker container crashes, preventing the user from creating more shares and from accessing the shares they have already created


Expected results: The user should be able to create a larger number of shares and have them available

Comment 23 errata-xmlrpc 2021-09-27 18:26:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 4.2 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3670

Comment 24 Red Hat Bugzilla 2023-09-15 01:13:32 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

