Bug 1564138

Summary: oslo-rootwrap-daemon performing badly in docker containers
Product: Red Hat OpenStack
Reporter: wes hayutin <whayutin>
Component: python-paunch
Assignee: Emilien Macchi <emacchi>
Status: CLOSED ERRATA
QA Contact: Artem Hrechanychenko <ahrechan>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 13.0 (Queens)
CC: abeekhof, ahrechan, apevec, bhaley, dbecker, emacchi, jcoufal, jschluet, lhh, mburns, morazi, rhel-osp-director-maint, sclewis, srevivo, ykarel
Target Milestone: beta
Keywords: Triaged
Target Release: 13.0 (Queens)
Hardware: All
OS: All
Whiteboard:
Fixed In Version: python-paunch-2.5.0-1.el7ost openstack-tripleo-heat-templates-8.0.2-0.20180410061339.b937f35.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-06-27 13:50:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description wes hayutin 2018-04-05 12:54:46 UTC
Description of problem:

Tracking bug for https://bugs.launchpad.net/oslo.rootwrap/+bug/1760471

Comment 2 Victor Stinner 2018-04-05 13:21:58 UTC
The issue is specific to Docker containers, which are spawned with a NOFILE ulimit of 1048576. Using close_fds=False in oslo.rootwrap can impact security (it introduces a risk of leaking a sensitive file descriptor). Backporting the Python 3 code that optimizes close_fds=True to Python 2.7 is non-trivial, so I suggest not doing that.
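
To make the cost concrete, here is a minimal sketch. It assumes the slow path is a brute-force sweep that attempts to close every descriptor number from 3 up to the NOFILE limit, which is my understanding of what close_fds=True does in Python 2.7's subprocess module. The helper names are hypothetical and are not part of oslo.rootwrap.

```python
import resource


def sweep_close_attempts(nofile_limit, first_fd=3):
    """Return how many descriptor numbers a brute-force sweep visits.

    The sweep tries every number from first_fd up to (not including)
    nofile_limit, whether or not that descriptor is actually open.
    """
    return max(0, nofile_limit - first_fd)


def current_soft_nofile():
    """Return the soft RLIMIT_NOFILE of the calling process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


# Limit reported for the affected containers vs. the suggested value.
container_sweep = sweep_close_attempts(1048576)
suggested_sweep = sweep_close_attempts(1024)
```

Under that assumption, every command spawned through the daemon pays the sweep once, so a limit of 1048576 means roughly a thousand times more close attempts per command than a limit of 1024. Calling current_soft_nofile() inside a container shows which limit actually applies there.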

As I wrote in the Launchpad issue, the bug is not specific to Python: slapd and rpm are also impacted by the high NOFILE ulimit value. I suggest instead adjusting the docker configuration, at least for specific containers. Example: "sudo docker run --ulimit nofile=1024:1024 ...".

Comment 3 Yatin Karel 2018-04-05 16:00:22 UTC
Just for the record, the performance hit is seen in the phase 2 job as well.

Some operations logged by the l3 agent, with the time taken:

1) http://cougar11.scl.lab.tlv.redhat.com/phase2-13_director-rhel-7.5-virthost-3cont_1comp_3ceph-ipv4-vxlan-ceph-containers/14/controller-0.tar.gz?controller-0/var/log/containers/neutron/l3-agent.log

----------------25 seconds--------------------
2018-04-02 12:36:10.818 131470 DEBUG neutron.agent.l3.agent [req-55ab23f4-0385-4da7-9b8a-0d50e64fc2ca 76e23c896d5d4bdba0ae71ab562859f2 d6d063764f424df58818a4b11bd15b3b - - -] Got routers updated notification :[u'7974c7f9-b57a-4395-b12e-e520693fbc0b'] routers_updated /usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py:424
2018-04-02 12:36:35.355 131470 DEBUG neutron.agent.l3.agent [-] Finished a router update for 7974c7f9-b57a-4395-b12e-e520693fbc0b _process_router_update /usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py:574


2) http://cougar11.scl.lab.tlv.redhat.com/phase2-13_director-rhel-7.5-virthost-3cont_1comp_3ceph-ipv4-vxlan-ceph-containers/14/controller-1.tar.gz?controller-1/var/log/containers/neutron/l3-agent.log

--------------------20 seconds-------------------------
2018-04-02 12:48:03.290 109676 DEBUG neutron.agent.l3.agent [req-09271469-1c18-4fb3-9682-486a600de182 76e23c896d5d4bdba0ae71ab562859f2 d6d063764f424df58818a4b11bd15b3b - - -] Got routers updated notification :[u'35036160-53f3-4592-822c-aab5155a6c05'] routers_updated /usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py:424

2018-04-02 12:48:23.809 109676 DEBUG neutron.agent.l3.agent [-] Finished a router update for 35036160-53f3-4592-822c-aab5155a6c05 _process_router_update /usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py:574


Yes, as Victor said, the best option would be to change the ulimit on a per-container basis.
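
The durations quoted for the two router updates above can be recomputed from the log timestamps. This is only a small sketch; the function name is hypothetical and the timestamp format is assumed from the log lines in this comment.

```python
from datetime import datetime

# Format of the leading timestamp in the l3-agent log lines.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(start, end):
    """Return the seconds between two log timestamps."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    finished = datetime.strptime(end, LOG_TIME_FORMAT)
    return (finished - started).total_seconds()


# "Got routers updated notification" -> "Finished a router update"
controller_0 = elapsed_seconds("2018-04-02 12:36:10.818",
                               "2018-04-02 12:36:35.355")
controller_1 = elapsed_seconds("2018-04-02 12:48:03.290",
                               "2018-04-02 12:48:23.809")
```

This gives about 24.5 seconds on controller-0 and about 20.5 seconds on controller-1, matching the rounded 25 and 20 second figures.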

Comment 5 Emilien Macchi 2018-04-11 02:08:37 UTC
Both https://review.openstack.org/#/c/559631 and https://review.openstack.org/#/c/559462/ have been merged upstream and downstream for OSP13.

Comment 13 Artem Hrechanychenko 2018-04-19 17:09:00 UTC
VERIFIED

python-paunch-2.5.0-1.el7ost.noarch
openstack-tripleo-heat-templates-8.0.2-0.20180414062830.5f869f2.el7ost.noarch


[heat-admin@compute-0 ~]$ sudo docker inspect neutron_ovs_agent |grep ulimit
                "config_data": "{\"start_order\": 10, \"ulimit\": [\"nofile=1024\"]

[heat-admin@controller-0 ~]$ sudo docker inspect neutron_l3_agent |grep ulimit
                "config_data": "{\"start_order\": 10, \"ulimit\": [\"nofile=1024\"]

[heat-admin@controller-0 ~]$ sudo docker inspect neutron_dhcp |grep ulimit
                "config_data": "{\"start_order\": 10, \"ulimit\": [\"nofile=1024\"]


[heat-admin@controller-0 ~]$ sudo docker inspect neutron_l3_agent |grep ulimit
                "config_data": "{\"start_order\": 10, \"ulimit\": [\"nofile=1024\"]

[heat-admin@controller-0 ~]$ sudo docker inspect neutron_ovs_agent |grep ulimit
                "config_data": "{\"start_order\": 10, \"ulimit\": [\"nofile=1024\"]
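
The same check can be scripted instead of grepping. The sketch below assumes the config_data value is a JSON string carrying a "ulimit" list, as the output above suggests. The function names are hypothetical, and the sample string is a shortened, made-up stand-in, since the real label is truncated above and carries more keys.

```python
import json


def ulimits_from_config_data(config_data):
    """Return the 'ulimit' list from a config_data JSON string, or []."""
    return json.loads(config_data).get("ulimit", [])


def nofile_limit(ulimits):
    """Return the nofile soft limit from entries like 'nofile=1024'.

    Also accepts the 'nofile=soft:hard' form; returns None if no
    nofile entry is present.
    """
    for entry in ulimits:
        name, _, value = entry.partition("=")
        if name == "nofile":
            return int(value.split(":")[0])
    return None


# Hypothetical, shortened label value used only for illustration.
sample = '{"start_order": 10, "ulimit": ["nofile=1024"]}'
limit = nofile_limit(ulimits_from_config_data(sample))
```

A verification script could then compare the returned limit against 1024 for each neutron container rather than reading the grep output by eye.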

Comment 16 errata-xmlrpc 2018-06-27 13:50:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086