Description of problem:
While running the ansible-playbook command on the undercloud, it takes too much time.

time ./ansible-playbook-command.sh

PLAY RECAP *********************************************************************
compute10r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute11r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute12r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute13r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute1r1-prod : ok=295 changed=119 unreachable=0 failed=0 skipped=1008 rescued=0 ignored=0
compute2r1-prod : ok=296 changed=118 unreachable=0 failed=0 skipped=1008 rescued=0 ignored=0
compute3r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute4r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute5r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute6r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute7r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute8r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute9r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
controller3v-prod : ok=306 changed=133 unreachable=0 failed=0 skipped=1035 rescued=0 ignored=0
swift3v-prod : ok=261 changed=108 unreachable=0 failed=0 skipped=1047 rescued=0 ignored=0

compute10r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute11r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute12r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute1r1-prod : ok=295 changed=119 unreachable=0 failed=0 skipped=1008 rescued=0 ignored=0
compute1r2-prod : ok=296 changed=120 unreachable=0 failed=0 skipped=1007 rescued=0 ignored=0
compute2r1-prod : ok=296 changed=118 unreachable=0 failed=0 skipped=1008 rescued=0 ignored=0
compute3r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute4r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute5r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute6r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute7r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute8r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute9r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
controller2v-prod : ok=288 changed=133 unreachable=0 failed=0 skipped=1026 rescued=0 ignored=0
swift2v-prod : ok=261 changed=108 unreachable=0 failed=0 skipped=1047 rescued=0 ignored=0

compute10r1-prod : ok=289 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute11r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute12r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute1r1-prod : ok=295 changed=119 unreachable=0 failed=0 skipped=1008 rescued=0 ignored=0
compute1r2-prod : ok=296 changed=120 unreachable=0 failed=0 skipped=1007 rescued=0 ignored=0
compute2r1-prod : ok=295 changed=119 unreachable=0 failed=0 skipped=1008 rescued=0 ignored=0
compute3r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute4r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute5r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute6r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute7r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute8r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
compute9r1-prod : ok=286 changed=114 unreachable=0 failed=0 skipped=1017 rescued=0 ignored=0
controller1v-prod : ok=288 changed=133 unreachable=0 failed=0 skipped=1026 rescued=0 ignored=0
swift1v-prod : ok=261 changed=108 unreachable=0 failed=0 skipped=1047 rescued=0 ignored=0
undercloud : ok=106 changed=30 unreachable=0 failed=0 skipped=8 rescued=0 ignored=0

Monday 14 December 2020 21:10:30 +0200 (0:00:00.067)  3:14:01.716 *******
===============================================================================
Wait for containers to start for step 4 using paunch ------------------ 529.03s
Wait for puppet host configuration to finish -------------------------- 257.09s
Render all_nodes data as group_vars for overcloud --------------------- 128.87s
Configure octavia on overcloud ---------------------------------------- 111.31s
tripleo-hosts-entries : Render out the hosts entries ------------------- 50.66s
tripleo-hieradata : Render hieradata from template --------------------- 50.56s
Wait for puppet host configuration to finish --------------------------- 37.02s
Wait for puppet host configuration to finish --------------------------- 36.97s
Wait for puppet host configuration to finish --------------------------- 36.95s
Wait for puppet host configuration to finish --------------------------- 36.60s
Wait for container-puppet tasks (generate config) to finish ------------ 30.88s
Wait for container-puppet tasks (bootstrap tasks) for step 5 to finish - 22.85s
Gathering Facts -------------------------------------------------------- 17.22s
Gathering Facts -------------------------------------------------------- 17.17s
install needed packages ------------------------------------------------ 15.50s
install needed packages ------------------------------------------------ 15.39s
Sync cached facts ------------------------------------------------------ 15.01s
Sync cached facts ------------------------------------------------------ 14.90s
Sync cached facts ------------------------------------------------------ 14.86s
Sync cached facts ------------------------------------------------------ 14.82s

real 194m3.649s
user 166m43.413s
sys 39m20.972s

Version-Release number of selected component (if applicable):

How reproducible:
Easily reproducible

Steps to Reproduce:
1. Create multiple instances on a compute node
2. If possible, add multiple NICs to the instances
3. Run the ansible-playbook command from the undercloud

Actual results:
ansible-playbook takes a significant amount of time to collect the facts

Expected results:
ansible-playbook should not take much time collecting the facts

Additional info:
This is likely an issue with ansible itself if gathering facts is taking too long. It's likely not something we can work around in tripleo-ansible. If possible, can we have the full deployment output log so we can see if there are other issues at play? We have a BZ for 16.2 to try to improve things: https://bugzilla.redhat.com/show_bug.cgi?id=1897890
BZ1897890 might help quite a lot, because we have 10 roles defined. Actually, the issue is not fact gathering itself, but that all the facts also need to be parsed when skipping hosts. If multiple hosts have 5 MB or more of facts, skipping gets a lot slower. If skipping takes 10 s per task, then 1000 tasks * 10 s is roughly 2.8 h spent just skipping tasks.
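The back-of-the-envelope estimate above can be sketched as follows (the numbers are illustrative, taken from the comment, not measurements):

```python
# Rough cost of merely skipping tasks when per-host facts are large (~5 MB)
# and each skip decision costs ~10 s of fact parsing, as described above.
tasks_skipped = 1000     # skipped tasks across a deployment run
seconds_per_skip = 10    # assumed worst-case per-task overhead

total_seconds = tasks_skipped * seconds_per_skip
print(f"{total_seconds / 3600:.1f} hours lost just skipping tasks")  # 2.8 hours
```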
Skipping tasks should not take 10 s; however, we've seen this happen when you end up using a default ansible.cfg instead of the one we generate. If you can provide exactly what was run, and whether there was an ansible.cfg in the current working directory, we can provide recommendations.
So to provide a bit of an update: I am currently looking into how fact gathering actually affects task execution. I have found that on compute systems, the network fact gathering alone can dramatically increase the overall fact size, due to all the tap interfaces created for the instances. This also has the side effect of increasing overall ansible memory utilization, which can directly impact the speed at which tasks are executed. This is likely an issue with ansible itself, and we're investigating whether there's anything we can do from an OSP standpoint to reduce the impact.

Currently I believe we rely on some of the network and hardware facts as part of the deployment, so we may not be able to simply turn them off. We might be able to reduce our reliance on them and improve things in the future.

In the meantime, if a customer wishes to write their own playbooks and execute them against these hosts, it's recommended to disable fact gathering if possible, or to reduce its scope. You can specify "gather_subset = !virtual,!ohai,!facter,!network,all" in ansible.cfg to reduce the amount of information collected, which should improve the overall execution time. I don't think this works with our deployment playbooks, but I will be investigating this.
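As a concrete example, the recommendation above would look like this in an ansible.cfg for user-written playbooks (this does not apply to the generated deployment playbooks, as noted):

```ini
[defaults]
# Skip the expensive virtual, ohai, facter and network fact subsets;
# "all" keeps the remaining default subsets (hardware, min, etc.).
gather_subset = !virtual,!ohai,!facter,!network,all
```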
I've filed an upstream ansible issue to try to figure out how we can improve this. It turns out you can reduce the impact by disabling the INJECT_FACTS_AS_VARS setting in ansible: https://docs.ansible.com/ansible/latest/reference_appendices/config.html#inject-facts-as-vars There is still a significant performance impact, but it's greatly reduced compared to the existing impact. See https://github.com/ansible/ansible/issues/73654 for additional information. If a user is running ansible manually, please consider configuring this value to false (it's true by default).
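For reference, disabling the setting in an ansible.cfg looks like the sketch below; note that with it disabled, facts are no longer injected as individual ansible_* variables and must be accessed through the ansible_facts dictionary instead:

```ini
[defaults]
# Don't copy every fact into its own ansible_* host variable; with large
# per-host fact sets this saves substantial time and memory on every task.
inject_facts_as_vars = False
```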
Verified using this procedure:

Created a 3 compute / 3 control / 1 ceph deployment, then added interfaces on all three compute nodes using:

for i in $(seq 1 380); do ip tuntap add name dummy_tun$i mode tun; done
for i in $(seq 1 1274); do ip link add name dummy_br$i type bridge; done

Reran the deployment:
With RHOS-16.1-RHEL-8-20210323.n.0, re-running the deploy took 4635 seconds.
With RHOS-16.1-RHEL-8-20210415.n.0, re-running the deploy took 1939 seconds.

Also did the same procedure with a 1 compute / 1 control / 1 ceph deployment:
With RHOS-16.1-RHEL-8-20210323.n.0, re-running the deploy took 2398 seconds.
With RHOS-16.1-RHEL-8-20210415.n.0, re-running the deploy took 1565 seconds.

When using a large number of interfaces, a significant reduction in the re-deploy time is seen with the fix.
*** Bug 1956321 has been marked as a duplicate of this bug. ***
*** Bug 1962589 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenStack Platform 16.1.6 (tripleo-ansible) security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2119