1731208 – [OSP15] large number of interfaces cause slow puppet executions due to fact generation

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-tripleo-heat-templates
Sub Component:
Version:	15.0 (Stein)
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	15.0 (Stein)
Assignee:	Alex Schultz
QA Contact:	Sasha Smolyak
Docs Contact:
URL:
Whiteboard:
Depends On:	1728402 1731209 1731210
Blocks:	1711267

Reported:	2019-07-18 15:37 UTC by Alex Schultz
Modified:	2019-09-27 10:44 UTC
CC List:	2 users
Fixed In Version:	openstack-tripleo-heat-templates-10.6.1-0.20190729150510.74ae8ba.el8ost
Doc Type:	No Doc Update
Doc Text:
Clone Of:	1728402
Environment:
Last Closed:	2019-09-21 11:24:01 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Launchpad	1835959	None	None	None	2019-07-18 15:37:03 UTC
OpenStack gerrit	672047	None	None	None	2019-07-22 14:03:55 UTC
Red Hat Product Errata	RHEA-2019:2811	None	None	None	2019-09-21 11:24:19 UTC

Description Alex Schultz 2019-07-18 15:37:03 UTC

+++ This bug was initially created as a clone of Bug #1728402 +++

Description of problem:
When a compute node (or controller) has many interfaces (bridges/tun devices), the puppet processes start thrashing during the application of updates as they try to generate the networking puppet facts. This affects both facter 2 and facter 3 based systems, because multiple processes each end up executing ip commands to gather the networking facts. It can be worked around by caching the facts for the life of the container-puppet.py (or docker-puppet.py) execution. This affects all versions of TripleO since Ocata. Prior to Ocata, facter 2 is very slow when there are more than 1000 interfaces on a host; at ~1500 interfaces it can take 10 minutes or more to generate the facts.
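
The caching workaround mentioned above can be illustrated with a minimal sketch. This is not the change that was merged for this bug; the function name cached_facts and the FACT_CACHE variable are hypothetical, and the sketch only assumes that facts are produced by some command (for example facter --json) whose output can be reused for the duration of one run.

```shell
#!/bin/sh
# Hypothetical helper (not part of TripleO or facter): run an expensive
# fact-gathering command once and reuse its output for later callers.
# FACT_CACHE is an illustrative name; point it at a per-run location.
FACT_CACHE="${FACT_CACHE:-${TMPDIR:-/tmp}/facts-cache.$$}"

cached_facts() {
    # "$@" is the command that generates the facts, e.g. facter --json.
    if [ ! -s "$FACT_CACHE" ]; then
        # Write to a temporary file first so that readers never see
        # partially written output, then move it into place.
        tmp="$FACT_CACHE.tmp.$$"
        "$@" > "$tmp" && mv "$tmp" "$FACT_CACHE" || { rm -f "$tmp"; return 1; }
    fi
    # Every caller, first or later, is served from the cache file.
    cat "$FACT_CACHE"
}
```

A caller would use it as cached_facts facter --json. The sketch does no locking, so two callers that start before the cache exists may both run the command; removing the cache file at the end of the run limits the cached facts to the life of that run.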

for i in $(seq 1 380); do ip tuntap add name dummy_tun$i mode tun; done
for i in $(seq 1 1274); do ip link add name dummy_br$i type bridge; done

$ time facter

facter2
real 9m51.817s
user 7m8.936s
sys 2m42.702s

facter3
real 0m2.954s
user 0m1.111s
sys 0m1.721s

$ time puppet facts

facter2
real 12m10.936s
user 8m16.478s
sys 3m54.138s

facter3
real 0m11.169s
user 0m5.522s
sys 0m4.002s


Steps to Reproduce:
1. Deploy a basic undercloud
2. Deploy a default overcloud (1 controller + 1 compute)
3. Create dummy network interfaces on the compute node:
for i in $(seq 1 380); do ip tuntap add name dummy_tun$i mode tun; done
for i in $(seq 1 1274); do ip link add name dummy_br$i type bridge; done
4. Run a stack update with no configuration changes
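
The dummy interfaces created in step 3 stay on the host until they are removed. A possible cleanup, sketched under the assumption that the counts and names match the loops in step 3 exactly, is shown here. The function name remove_dummy_interfaces and the IP override are hypothetical and only there to allow a dry run.

```shell
#!/bin/sh
# Hypothetical cleanup for the reproduction interfaces. IP can be set to
# "echo" to print the commands instead of running them (dry run).
IP="${IP:-ip}"

remove_dummy_interfaces() {
    # Mirror of the "ip tuntap add" loop from the reproduction steps.
    for i in $(seq 1 380); do
        $IP tuntap del name "dummy_tun$i" mode tun
    done
    # Mirror of the "ip link add ... type bridge" loop.
    for i in $(seq 1 1274); do
        $IP link delete "dummy_br$i" type bridge
    done
}
```

Running IP=echo and then remove_dummy_interfaces prints the commands without touching the host; calling the function as root with the default IP removes the interfaces.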

Actual results:
The stack update takes an excessive amount of time (almost 4 hours in my test) compared to a basic 1ctlr+1compute update without the extra interfaces (~30 mins in my test).

Expected results:
Stack update may take a bit longer but it shouldn't approach timeouts.

Comment 6 errata-xmlrpc 2019-09-21 11:24:01 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811