Bug 1749406 - FFWD ansible is slow due to gather facts
Summary: FFWD ansible is slow due to gather facts
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Target Milestone: z9
Target Release: 13.0 (Queens)
Assignee: mathieu bultel
QA Contact: Ronnie Rasouli
Whiteboard: Triaged
Duplicates: 1728215 1761395 1775869
Depends On:
Blocks: 1761395
Reported: 2019-09-05 14:36 UTC by Lukas Bezdicka
Modified: 2019-12-05 10:08 UTC (History)
19 users

Fixed In Version: openstack-tripleo-common-8.7.1-2.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1761395
Last Closed: 2019-12-05 10:08:11 UTC
Target Upstream Version:

Attachments (Terms of Use)

System ID Priority Status Summary Last Updated
OpenStack gerrit 682855 None MERGED Implement Ansible fact cache for Mistral executor 2020-05-28 11:33:42 UTC
OpenStack gerrit 686557 None MERGED Implement Ansible fact cache for Mistral executor 2020-05-28 11:33:39 UTC
Red Hat Knowledge Base (Solution) 4533001 None None None 2019-10-27 15:39:07 UTC
Red Hat Product Errata RHBA-2019:3794 None None None 2019-11-07 14:02:37 UTC

Description Lukas Bezdicka 2019-09-05 14:36:12 UTC
On a customer environment it was noticed that the ansible playbook run was extremely slow due to fact gathering.

Comment 3 Lukas Bezdicka 2019-09-18 12:03:15 UTC
*** Bug 1728215 has been marked as a duplicate of this bug. ***

Comment 4 Jesse Pretorius 2019-09-18 14:49:19 UTC
FYI, implementing fact caching in my testing has improved the performance of the update/upgrade/ffwd-upgrade playbooks. As such I've proposed https://review.opendev.org/682855 upstream. If it's accepted and merged, it may be a good candidate for backporting. I'm working out some other OSP10->OSP13 ffwd-upgrade issues in my environment - once they're resolved, I'll post some timings.

In order to validate this, I implemented the following on the undercloud host, which has the same effect and may be useful as a workaround if the patch is not suitable for backporting:

(undercloud) [stack@undercloud-0 ~]$ sudo mv /etc/ansible/ansible.cfg /etc/ansible/ansible.org.cfg
(undercloud) [stack@undercloud-0 ~]$ sudo tee /etc/ansible/ansible.cfg <<EOF
[defaults]
roles_path    = /etc/ansible/roles:/usr/share/ansible/roles

# improve fact gathering performance
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /var/tmp/ansible_fact_cache

# two hours timeout
fact_caching_timeout = 7200
EOF

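With the jsonfile plugin, the cache directory holds one JSON file per host, so cache freshness can be checked from the shell. A minimal sketch, assuming the cache path and 7200s timeout from the snippet above (check_fact_cache is a hypothetical helper, not part of Ansible or tripleo):

```shell
# Sketch: list cached hosts and flag entries older than the 2-hour timeout.
# check_fact_cache is a hypothetical helper; the default path matches the
# fact_caching_connection setting above - adjust if yours differs.
check_fact_cache() {  # usage: check_fact_cache [CACHE_DIR] [TIMEOUT_SECONDS]
  cache=${1:-/var/tmp/ansible_fact_cache}
  timeout=${2:-7200}
  now=$(date +%s)
  for f in "$cache"/*; do
    [ -e "$f" ] || continue          # empty or missing cache dir
    age=$(( now - $(stat -c %Y "$f") ))
    if [ "$age" -gt "$timeout" ]; then
      echo "stale: $(basename "$f") (${age}s old)"
    else
      echo "fresh: $(basename "$f")"
    fi
  done
}
check_fact_cache
```

Stale entries are re-gathered on the next run with gathering = smart; removing the cache directory forces a full re-gather.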
Comment 9 Lukas Bezdicka 2019-10-17 14:25:53 UTC
For workaround documentation:

Start by running openstack overcloud ffwd-upgrade prepare ....:

1) When the prepare finishes (I still suggest running prepare after you restore the undercloud), run the config download, which will save the config to a tripleo-config-XXXX directory:
    openstack overcloud config download

2) To speed up running each playbook, extract the inventory and store it in a temporary location. It is important that the environment does not change between this extraction and the completion of the process, i.e. do not add or remove nodes until the upgrade is complete.

ansible-inventory -i /usr/bin/tripleo-ansible-inventory --list --yaml > /tmp/ansible_inventory.yaml

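The extracted inventory can be sanity-checked before ansible.cfg is pointed at it. A minimal sketch (check_inventory is a hypothetical helper; the path matches step 2):

```shell
# Sketch: sanity-check an extracted static inventory file before use.
# check_inventory is a hypothetical helper, not part of tripleo.
check_inventory() {  # usage: check_inventory INVENTORY_FILE
  inv=$1
  # the file must exist, be non-empty, and contain at least one hosts: key
  if [ -s "$inv" ] && grep -q 'hosts:' "$inv"; then
    echo "ok: $inv"
    return 0
  fi
  echo "missing or empty: $inv" >&2
  return 1
}
check_inventory /tmp/ansible_inventory.yaml || true
```
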
3) Now apply the configuration by editing ansible.cfg (note that ssh_extra_args must live in the [ssh_connection] section):
[defaults]
# increased forks for better performance
forks = 100

# implement fact caching and gather a smaller subset of facts for improved performance
gathering = smart
gather_subset = !hardware,!facter,!ohai
fact_caching_connection = /tmp/ansible_fact_cache
fact_caching = jsonfile
# expire the fact cache after 2 hours
fact_caching_timeout = 7200

# work around ensuring the right modules are found
library = /usr/share/ansible-modules

# set the inventory to the extracted inventory location
inventory = /tmp/ansible_inventory.yaml

[ssh_connection]
ssh_extra_args = -o Compression=no -o TCPKeepAlive=yes -o VerifyHostKeyDNS=no -o ForwardX11=no -o ForwardAgent=yes -T

4) Tell ansible which ansible.cfg file to use:

export ANSIBLE_CONFIG=<path to ansible.cfg from config-download>

5) Run the fast_forward_upgrade_playbook.yaml playbook inside the generated config directory:
  ansible-playbook -b fast_forward_upgrade_playbook.yaml

At this step your databases will be upgraded and the OpenStack services are stopped and disabled.

6) Run the upgrade for the Controllers
  ansible-playbook --skip-tags=validation -b upgrade_steps_playbook.yaml deploy_steps_playbook.yaml post_upgrade_steps_playbook.yaml  --limit Controller

7) Run the upgrade for one compute. Note that Compute[0] selects the first node of the group in the inventory, which might not be your compute-0. If you want to upgrade a specific node, either find its index number or specify its hostname.
  ansible-playbook --skip-tags=validation -b upgrade_steps_playbook.yaml deploy_steps_playbook.yaml post_upgrade_steps_playbook.yaml  --limit Compute[0]

Consider this the last point from which you can revert. If this took too long, I suggest rolling back; otherwise continue with the rest of the computes.

8) Upgrade the Computes in batches. The ansible host pattern [1:10] selects the second through eleventh nodes of the group in the inventory. You can verify which hosts will be targeted by running:
ansible -m ping --list-hosts Compute[1:10]
  ansible-playbook --skip-tags=validation -b upgrade_steps_playbook.yaml deploy_steps_playbook.yaml post_upgrade_steps_playbook.yaml  --limit Compute[1:10]

9) Revert ansible.cfg to its original state, unset the ANSIBLE_CONFIG environment variable, and continue with converge.
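The [0] and [1:10] patterns in steps 7 and 8 are zero-based and end-inclusive, unlike Python slicing. A minimal sketch of the semantics (slice_hosts is an illustrative helper, not an Ansible command):

```shell
# Illustration of Ansible host-pattern slicing: Compute[1:10] selects the
# 2nd through 11th hosts of the group (zero-based, END-INCLUSIVE).
# slice_hosts is a hypothetical helper, not part of Ansible.
slice_hosts() {  # usage: slice_hosts START END host...
  start=$1; end=$2; shift 2
  i=0
  for h in "$@"; do
    if [ "$i" -ge "$start" ] && [ "$i" -le "$end" ]; then
      echo "$h"
    fi
    i=$((i + 1))
  done
}
slice_hosts 1 10 compute-0 compute-1 compute-2 compute-3
# prints compute-1, compute-2, compute-3
```

With twelve or more hosts in the group, slice_hosts 1 10 would return exactly ten of them, matching the "second to eleventh" wording in step 8.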

Comment 10 mbollo 2019-10-17 15:25:25 UTC
The package is ready.

Comment 14 Alex McLeod 2019-10-31 11:32:42 UTC
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to -.

Comment 18 errata-xmlrpc 2019-11-07 14:02:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


Comment 19 Jesse Pretorius 2019-11-14 10:41:00 UTC
*** Bug 1761395 has been marked as a duplicate of this bug. ***

Comment 20 Lukas Bezdicka 2019-12-03 10:03:36 UTC
*** Bug 1775869 has been marked as a duplicate of this bug. ***
