Bug 1766079 - default nofile ulimit is too small for ceph-osd container
Summary: default nofile ulimit is too small for ceph-osd container
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z4
Target Release: 3.3
Assignee: Dimitri Savineau
QA Contact: Vasishta
URL:
Whiteboard:
Depends On:
Blocks: 1578730
 
Reported: 2019-10-28 08:28 UTC by Meiyan Zheng
Modified: 2020-04-06 08:27 UTC
CC List: 12 users

Fixed In Version: RHEL: ceph-ansible-3.2.38-1.el7cp Ubuntu: ceph-ansible_3.2.38-2redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-06 08:27:05 UTC
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph-ansible pull 4696 0 'None' closed ceph-osd: Remove ulimit nofile on container start 2020-06-26 15:36:21 UTC
Github ceph ceph-ansible pull 4699 0 'None' closed ceph-osd: Remove ulimit nofile on container start (bp #4696) 2020-06-26 15:36:21 UTC
Github ceph ceph-ansible pull 4700 0 'None' closed ceph-osd: Remove ulimit nofile on container start (bp #4696) 2020-06-26 15:36:21 UTC
Github ceph ceph-container pull 1497 0 'None' closed src/daemon: enforce ceph-osd ulimit values 2020-06-26 15:36:20 UTC
Github ceph ceph-container pull 1499 0 'None' closed src/daemon: enforce ceph-osd ulimit values (bp #1497) 2020-06-26 15:36:20 UTC
Github ceph ceph-container pull 1500 0 'None' closed src/daemon: enforce ceph-osd ulimit values (bp #1497) 2020-06-26 15:36:21 UTC
Red Hat Product Errata RHBA-2020:1320 0 None None None 2020-04-06 08:27:32 UTC

Description Meiyan Zheng 2019-10-28 08:28:06 UTC
Description of problem:

The following errors occur on the customer side, and OSDs keep flapping (down/up) on all OSD nodes.

filestore(/var/lib/ceph/osd/ceph-10)  error (24) Too many open files not handled on operation 0x55dfec678a00 (805947768.0.0, or op 0, counting from 0)


Checking the nofile limits for the ceph-osd container:

$ cat sos_commands/docker/docker_inspect_f2863fa486e0 | grep limit -A5
            "Ulimits": [
                {
                    "Name": "nofile",
                    "Hard": 4096,
                    "Soft": 1024
                }




Version-Release number of selected component (if applicable):

$ rpm -qa | grep tripleo
ansible-tripleo-ipsec-8.1.1-0.20190513184007.7eb892c.el7ost.noarch
openstack-tripleo-heat-templates-8.3.1-87.el7ost.noarch
puppet-tripleo-8.4.1-27.el7ost.noarch
openstack-tripleo-image-elements-8.0.2-2.el7ost.noarch
openstack-tripleo-common-containers-8.6.8-16.el7ost.noarch
python-tripleoclient-9.2.7-11.el7ost.noarch
openstack-tripleo-puppet-elements-8.0.2-3.el7ost.noarch
openstack-tripleo-validations-8.4.5-2.el7ost.noarch
openstack-tripleo-ui-8.3.2-3.el7ost.noarch
openstack-tripleo-common-8.6.8-16.el7ost.noarch



How reproducible:
Deploy CephStorage nodes with director.

Steps to Reproduce:
1.
2.
3.

Actual results:
The nofile hard limit is 4096 and the soft limit is 1024.

Expected results:
The nofile limit should be much larger, e.g. 1048576.

Additional info:
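Two directions exist for a fix: pass a larger limit at container start (e.g. `docker run --ulimit nofile=1048576:1048576`), or, as the linked ceph-container pull requests do, enforce the limit inside the daemon at start-up. A minimal Python sketch of the latter idea follows; the `raise_nofile` helper is an illustrative assumption, not the actual ceph-container code (which is shell-based).

```python
import resource

def raise_nofile(target=1048576):
    """Raise the soft nofile limit toward `target`, capped at the hard limit.

    Raising the soft limit up to the hard limit needs no privileges;
    raising the hard limit itself would require CAP_SYS_RESOURCE.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    new_soft = max(soft, target)  # never lower the existing soft limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)

print(raise_nofile())
```

With the limits from this report (soft 1024, hard 4096), this would only get the soft limit up to 4096, which is why the container itself must be started with a larger hard limit (or none at all, as in the ceph-ansible fix).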

Comment 1 Dougal Matthews 2019-10-28 12:21:11 UTC
I think we could use some Storage DFG input on this one

Comment 4 RHEL Program Management 2019-10-29 13:36:53 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 10 Vasishta 2020-03-17 15:54:04 UTC
Working fine with 
ceph-ansible-3.2.40-1
ceph-3.3-rhel-7-containers-candidate-31439-20200312223408 

$ sudo docker exec -it ceph-osd-3 bash
# ulimit -n
1048576
#  ulimit -Hn
1048576
#  ulimit -Sn
1048576


Moving to VERIFIED state. Please let us know if there are any concerns.

Regards,
Vasishta Shastry
QE, Ceph

Comment 12 errata-xmlrpc 2020-04-06 08:27:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1320

