Bug 1564138 - oslo-rootwrap-daemon performing badly in docker containers
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-paunch
Version: 13.0 (Queens)
Hardware: All
OS: All
Priority: urgent
Severity: urgent
Target Milestone: beta
Target Release: 13.0 (Queens)
Assigned To: Emilien Macchi
QA Contact: Artem Hrechanychenko
Keywords: Triaged
Depends On:
Blocks:
Reported: 2018-04-05 08:54 EDT by wes hayutin
Modified: 2018-06-27 09:51 EDT (History)

See Also:
Fixed In Version: python-paunch-2.5.0-1.el7ost openstack-tripleo-heat-templates-8.0.2-0.20180410061339.b937f35.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-06-27 09:50:52 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---



External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Launchpad | 1760471 | None | None | None | 2018-04-05 08:55 EDT
OpenStack gerrit | 559462 | None | stable/queens: MERGED | paunch: Add ulimit option for run action (I0cfcf4e3e3e13578ec42e12f459732992fb3a760) | 2018-04-10 23:04 EDT
OpenStack gerrit | 559631 | None | stable/queens: MERGED | tripleo-heat-templates: Set ulimit for neutron agent containers (Iec722cdfd7642ff3149f50d940d8079b9e1b7147) | 2018-04-10 23:04 EDT
Red Hat Product Errata | RHEA-2018:2086 | None | None | None | 2018-06-27 09:51 EDT

Description wes hayutin 2018-04-05 08:54:46 EDT
Description of problem:

Tracking bug for https://bugs.launchpad.net/oslo.rootwrap/+bug/1760471
Comment 2 Victor Stinner 2018-04-05 09:21:58 EDT
The issue is specific to Docker containers, which are spawned with a NOFILE ulimit equal to 1048576. Using close_fds=False in oslo.rootwrap can impact security (it introduces a risk of leaking a sensitive file descriptor). Backporting the Python 3 code that optimizes close_fds=True to Python 2.7 is non-trivial, so I suggest not doing that.

As I wrote in the Launchpad issue, the bug is not specific to Python: slapd and rpm are also impacted by the high NOFILE ulimit value. I suggest instead adjusting the Docker configuration, at least for specific containers. Example: "sudo docker run --ulimit nofile=1024:1024 ...".
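
To make the cost concrete, here is a rough sketch of why the limit matters. It assumes that Python 2.7's subprocess, when close_fds=True, attempts to close every descriptor number from 3 up to the NOFILE limit for each spawned command. The helper names are hypothetical and are not part of oslo.rootwrap.

```python
import resource

# Descriptors 0, 1 and 2 (stdin, stdout, stderr) stay open in the child.
FIRST_FD_TO_CLOSE = 3


def close_attempts_per_spawn(nofile_limit):
    """Estimate the close() attempts made for one spawn with close_fds=True.

    Assumption: the child walks every descriptor number from 3 up to the
    limit, whether or not that descriptor is actually open.
    """
    return max(nofile_limit - FIRST_FD_TO_CLOSE, 0)


def current_soft_nofile():
    """Return the soft RLIMIT_NOFILE of the current process (Unix only)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    # Compare the suggested limit (1024) with the Docker default seen here.
    for limit in (1024, 1048576):
        print(limit, close_attempts_per_spawn(limit))
    print("this process:", current_soft_nofile())
```

Under that assumption, lowering the limit from 1048576 to 1024 cuts the per-spawn work by roughly three orders of magnitude, which is why a per-container ulimit is enough without touching close_fds.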
Comment 3 Yatin Karel 2018-04-05 12:00:22 EDT
For the record, the performance hit is also seen in the phase 2 job.

Some L3 agent operation logs, with the time taken:

1) http://cougar11.scl.lab.tlv.redhat.com/phase2-13_director-rhel-7.5-virthost-3cont_1comp_3ceph-ipv4-vxlan-ceph-containers/14/controller-0.tar.gz?controller-0/var/log/containers/neutron/l3-agent.log

----------------25 seconds--------------------
2018-04-02 12:36:10.818 131470 DEBUG neutron.agent.l3.agent [req-55ab23f4-0385-4da7-9b8a-0d50e64fc2ca 76e23c896d5d4bdba0ae71ab562859f2 d6d063764f424df58818a4b11bd15b3b - - -] Got routers updated notification :[u'7974c7f9-b57a-4395-b12e-e520693fbc0b'] routers_updated /usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py:424
2018-04-02 12:36:35.355 131470 DEBUG neutron.agent.l3.agent [-] Finished a router update for 7974c7f9-b57a-4395-b12e-e520693fbc0b _process_router_update /usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py:574


2) http://cougar11.scl.lab.tlv.redhat.com/phase2-13_director-rhel-7.5-virthost-3cont_1comp_3ceph-ipv4-vxlan-ceph-containers/14/controller-1.tar.gz?controller-1/var/log/containers/neutron/l3-agent.log

--------------------20 seconds-------------------------
2018-04-02 12:48:03.290 109676 DEBUG neutron.agent.l3.agent [req-09271469-1c18-4fb3-9682-486a600de182 76e23c896d5d4bdba0ae71ab562859f2 d6d063764f424df58818a4b11bd15b3b - - -] Got routers updated notification :[u'35036160-53f3-4592-822c-aab5155a6c05'] routers_updated /usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py:424

2018-04-02 12:48:23.809 109676 DEBUG neutron.agent.l3.agent [-] Finished a router update for 35036160-53f3-4592-822c-aab5155a6c05 _process_router_update /usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py:574


Yes, as Victor said, the best approach would be to change the ulimit on a per-container basis.
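
The durations quoted above can be recomputed from the log timestamps. This is a minimal sketch, assuming the timestamp format shown in the l3-agent.log lines; the function name is hypothetical.

```python
from datetime import datetime

# Timestamp layout used at the start of each l3-agent.log line quoted above.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(start, end):
    """Return the seconds between two log timestamps as a float."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    finished = datetime.strptime(end, LOG_TIME_FORMAT)
    return (finished - started).total_seconds()


if __name__ == "__main__":
    # controller-0: "Got routers updated notification" to "Finished a router update"
    print(elapsed_seconds("2018-04-02 12:36:10.818", "2018-04-02 12:36:35.355"))
    # controller-1: same pair of events
    print(elapsed_seconds("2018-04-02 12:48:03.290", "2018-04-02 12:48:23.809"))
```

The two pairs come out at about 24.5 and 20.5 seconds, consistent with the rounded figures in the separators.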
Comment 5 Emilien Macchi 2018-04-10 22:08:37 EDT
Both https://review.openstack.org/#/c/559631 and https://review.openstack.org/#/c/559462/ have been merged upstream and downstream for OSP13.
Comment 13 Artem Hrechanychenko 2018-04-19 13:09:00 EDT
VERIFIED

python-paunch-2.5.0-1.el7ost.noarch
openstack-tripleo-heat-templates-8.0.2-0.20180414062830.5f869f2.el7ost.noarch


[heat-admin@compute-0 ~]$ sudo docker inspect neutron_ovs_agent |grep ulimit
                "config_data": "{\"start_order\": 10, \"ulimit\": [\"nofile=1024\"]

[heat-admin@controller-0 ~]$ sudo docker inspect neutron_l3_agent |grep ulimit
                "config_data": "{\"start_order\": 10, \"ulimit\": [\"nofile=1024\"]

[heat-admin@controller-0 ~]$ sudo docker inspect neutron_dhcp |grep ulimit
                "config_data": "{\"start_order\": 10, \"ulimit\": [\"nofile=1024\"]


[heat-admin@controller-0 ~]$ sudo docker inspect neutron_l3_agent |grep ulimit
                "config_data": "{\"start_order\": 10, \"ulimit\": [\"nofile=1024\"]

[heat-admin@controller-0 ~]$ sudo docker inspect neutron_ovs_agent |grep ulimit
                "config_data": "{\"start_order\": 10, \"ulimit\": [\"nofile=1024\"]
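
The grep output above is truncated, so it only shows that the string "ulimit" appears. A stricter check could parse the value instead. This is a sketch only: it assumes the config_data value is a complete JSON object string with an "ulimit" list, as the truncated lines suggest, and the sample string and function name are hypothetical.

```python
import json


def nofile_ulimits(config_data):
    """Return the nofile entries from a config_data JSON string.

    Assumption: config_data decodes to an object whose optional "ulimit"
    key holds a list of strings such as "nofile=1024".
    """
    config = json.loads(config_data)
    return [u for u in config.get("ulimit", []) if u.startswith("nofile=")]


# Hypothetical, shortened value; the real config_data has more keys
# that grep cut off in the output above.
SAMPLE_CONFIG_DATA = '{"start_order": 10, "ulimit": ["nofile=1024"]}'

if __name__ == "__main__":
    print(nofile_ulimits(SAMPLE_CONFIG_DATA))
```

An empty list would mean that no nofile ulimit was set for that container.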
Comment 16 errata-xmlrpc 2018-06-27 09:50:52 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086
