|Summary:||[OSP13] large number of interfaces cause slow puppet executions due to fact generation|
|Product:||Red Hat OpenStack||Reporter:||Alex Schultz <aschultz>|
|Component:||openstack-tripleo-heat-templates||Assignee:||Alex Schultz <aschultz>|
|Status:||CLOSED ERRATA||QA Contact:||Sasha Smolyak <ssmolyak>|
|Version:||13.0 (Queens)||CC:||amcleod, emacchi, jhajyahy, jschluet, knoha, mburns, mfuruta, slinaber, ssmolyak|
|Target Milestone:||z9||Keywords:||Triaged, ZStream|
|Target Release:||13.0 (Queens)|
|Fixed In Version:||openstack-tripleo-heat-templates-8.3.1-92.el7ost||Doc Type:||Enhancement|
This enhancement upgrades facter to version 3, which improves performance when you deploy and run updates on systems with a large number of network interfaces. This version of facter supports fact caching and generates the fact list significantly faster. NOTE You must run facter version 3 in the same containers that you deploy on the host system when you use the version of openstack-tripleo-heat-templates that implements facter version 3.
|Last Closed:||2019-11-07 14:01:17 UTC||Type:||---|
|Bug Depends On:||1728402, 1737512, 1749885|
|Bug Blocks:||1731209, 1711267, 1731208|
Description Alex Schultz 2019-07-18 15:39:55 UTC
+++ This bug was initially created as a clone of Bug #1728402 +++

Description of problem:
When a compute node (or controller) has many interfaces (bridges/tun), the puppet processes start thrashing during the application of updates while trying to generate the network puppet facts. This affects both facter 2 and facter 3 based systems because multiple processes end up executing ip commands at the same time to gather the networking facts. This can be worked around by caching the facts for the life of the container-puppet.py (or docker-puppet.py) executions. This affects all versions of TripleO since Ocata. Prior to Ocata, facter 2 is super slow when there are more than 1000 interfaces on a host. At ~1500, it can take >= 10 minutes to generate the facts.

    for i in $(seq 1 380); do ip tuntap add name dummy_tun$i mode tun; done
    for i in $(seq 1 1274); do ip link add name dummy_br$i type bridge; done

$ time facter
    facter2
    real 9m51.817s
    user 7m8.936s
    sys 2m42.702s
    facter3
    real 0m2.954s
    user 0m1.111s
    sys 0m1.721s

$ time puppet facts
    facter2
    real 12m10.936s
    user 8m16.478s
    sys 3m54.138s
    facter3
    real 0m11.169s
    user 0m5.522s
    sys 0m4.002s

Steps to Reproduce:
1. deploy basic undercloud
2. deploy default overcloud 1ctlr+1compute
3. Apply dummy network interfaces to compute
    for i in $(seq 1 380); do ip tuntap add name dummy_tun$i mode tun; done
    for i in $(seq 1 1274); do ip link add name dummy_br$i type bridge; done
4. deploy stack update with no configuration changes

Actual results:
Stack update will take an excessive amount of time (almost 4 hours in my test) compared to a basic 1ctlr+1compute update with no interfaces (~30 mins in my test)

Expected results:
Stack update may take a bit longer but it shouldn't approach timeouts.
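The caching workaround mentioned in the description could look roughly like the sketch below, assuming a facter 3 release that supports the `ttls` setting in `facter.conf`. The function name `write_facter_cache_conf`, the default TTL, and the choice to cache only the `networking` group are illustrative assumptions, not the contents of the actual patch.

```shell
#!/bin/sh
# Sketch only: write a facter 3 config file that caches the networking facts.
# write_facter_cache_conf is a hypothetical helper; the TTL default and the
# cached group are assumptions made for illustration.

write_facter_cache_conf() {
    conf_path=$1            # e.g. /var/lib/container-puppet/puppetlabs/facter.conf
    ttl=${2:-"8 hours"}     # how long a cached fact group stays valid (assumed)

    mkdir -p "$(dirname "$conf_path")" || return 1

    # facter.conf is written in HOCON; each "ttls" entry maps a cache group
    # to a lifetime.
    cat > "$conf_path" <<EOF
facts : {
  ttls : [
    { "networking" : $ttl },
  ]
}
EOF
}

# Usage (requires facter 3 on the host, so it is not executed here):
#   write_facter_cache_conf /var/lib/container-puppet/puppetlabs/facter.conf
#   facter --config /var/lib/container-puppet/puppetlabs/facter.conf >/dev/null
```

With a config like this in place, the first facter run pays the cost of walking all interfaces and later runs inside the TTL window can reuse the cached result instead of re-running the ip commands.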
Comment 1 Alex Schultz 2019-07-18 15:42:52 UTC
It should be noted that OSP13 uses facter2 which will still have a larger delay than with facter3 (osp15+). We are currently investigating upgrading facter as well for this version but this should improve the existing performance.
Comment 2 Keigo Noha 2019-08-05 08:43:41 UTC
Hi Alex,

I tried to backport the patch manually into RHOSP13z7. The patch in upstream doesn't work in RHOSP13z7 because RHOSP13 still uses facter2. If we ship this fix without facter3, user cannot do overcloud deployment.

~~~
- name: Pre-cache facts
  command: facter --config /var/lib/container-puppet/puppetlabs/facter.conf
  no_log: True
  ignore_errors: True
  tags:
    - container_config
    - container_config_tasks
~~~

This task fails due to the lack of command option support for --config. Could you check this issue as soon as possible?

Best Regards,
Keigo Noha
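One way to avoid the failure described above would be to check the facter version before passing `--config`. The sketch below is an assumption-laden illustration: `facter_major_at_least_3` is a hypothetical helper, and it treats any 3.x or later release as supporting `--config`, which is a simplification because it does not check which minor release introduced the option.

```shell
#!/bin/sh
# Sketch only: succeed when a facter version string has major version >= 3.
# facter_major_at_least_3 is a hypothetical helper, not part of the patch.

facter_major_at_least_3() {
    version=$1
    major=${version%%.*}    # keep everything before the first dot

    case $major in
        ''|*[!0-9]*) return 1 ;;    # empty or non-numeric: treat as unsupported
    esac

    [ "$major" -ge 3 ]
}

# Usage (requires facter on the host, so it is not executed here):
#   if facter_major_at_least_3 "$(facter --version)"; then
#       facter --config /var/lib/container-puppet/puppetlabs/facter.conf
#   fi
```

On a facter 2 host the guard skips the pre-cache step, so the deployment would proceed without cached facts rather than failing on the unknown option.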
Comment 3 Alex Schultz 2019-08-05 13:18:48 UTC
You're right, I'll roll it back for now.
Comment 4 Emilien Macchi 2019-08-29 14:33:13 UTC
This is a work in progress as you can see in the dependencies being rebased. Moving to ON_DEV. Alex please move it to POST/MODIFIED if you think it's ready. Thanks!
Comment 5 Alex Schultz 2019-09-25 18:32:53 UTC
*** Bug 1711267 has been marked as a duplicate of this bug. ***
Comment 7 Jad Haj Yahya 2019-10-06 07:12:50 UTC
Verified on 13 -p 2019-10-01.1
Comment 9 errata-xmlrpc 2019-11-07 14:01:17 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3794