Bug 1749406

Summary: FFWD ansible is slow due to gather facts
Product: Red Hat OpenStack
Component: openstack-tripleo-common
Version: 13.0 (Queens)
Target Release: 13.0 (Queens)
Target Milestone: z9
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Keywords: ABIAssurance, Reopened, Triaged, ZStream
Whiteboard: Triaged
Reporter: Lukas Bezdicka <lbezdick>
Assignee: mathieu bultel <mbultel>
QA Contact: Ronnie Rasouli <rrasouli>
CC: augol, ccamacho, fsoppels, jfrancoa, johfulto, jpretori, jschluet, ltamagno, mbollo, mbracho, mbultel, mburns, morazi, nchandek, owalsh, sclewis, shdunne, slinaber
Fixed In Version: openstack-tripleo-common-8.7.1-2.el7ost
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2019-12-05 10:08:11 UTC
Clones: 1761395 (view as bug list)
Bug Blocks: 1761395

Description Lukas Bezdicka 2019-09-05 14:36:12 UTC
On a customer environment it was noticed that the ansible playbook runs were extremely slow due to fact gathering.

Comment 3 Lukas Bezdicka 2019-09-18 12:03:15 UTC
*** Bug 1728215 has been marked as a duplicate of this bug. ***

Comment 4 Jesse Pretorius 2019-09-18 14:49:19 UTC
FYI, implementing fact caching in my testing has improved the performance of the update/upgrade/ffwd-upgrade playbooks, so I've proposed https://review.opendev.org/682855 upstream. If it's accepted and merged, it may be a good candidate for backporting. I'm working out some other OSP10->OSP13 ffwd-upgrade issues in my environment; once they're resolved, I'll post some timings.

In order to validate this, I implemented the following on the undercloud host which has the same effect and may be useful as a workaround if the patch is not suitable as a backport:

(undercloud) [stack@undercloud-0 ~]$ sudo mv /etc/ansible/ansible.cfg /etc/ansible/ansible.org.cfg
(undercloud) [stack@undercloud-0 ~]$ sudo tee /etc/ansible/ansible.cfg <<EOF
[defaults]
roles_path    = /etc/ansible/roles:/usr/share/ansible/roles

# improve fact gathering performance
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /var/tmp/ansible_fact_cache

# two hours timeout
fact_caching_timeout = 7200

[inventory]

[privilege_escalation]

[paramiko_connection]

[ssh_connection]

[persistent_connection]

[accelerate]

[selinux]

[colors]

[diff]

EOF
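Once the workaround config is in place, a quick way to confirm that caching is actually working is to look for the per-host JSON files the jsonfile plugin writes. This is a hedged sketch; the cache path matches the fact_caching_connection value above:

```shell
# Check that the fact cache is being populated; one JSON file
# should appear per host after the next playbook run.
CACHE_DIR=/var/tmp/ansible_fact_cache
if [ -d "$CACHE_DIR" ] && [ -n "$(ls -A "$CACHE_DIR" 2>/dev/null)" ]; then
    echo "fact cache populated: $(ls "$CACHE_DIR" | wc -l) host(s)"
else
    echo "fact cache empty - facts will be gathered on the next run"
fi
```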

Comment 9 Lukas Bezdicka 2019-10-17 14:25:53 UTC
For workaround documentation:


Start by running openstack overcloud ffwd-upgrade prepare .... :

1) When the prepare step finishes (I still suggest re-running prepare after you restore the undercloud), run the config download, which saves the configuration to a tripleo-config-XXXX directory:
    openstack overcloud config download

2) To speed up running each playbook, extract the inventory and store it in a temporary location. It is important that the environment does not change between this extraction and the completion of the process, i.e. do not add or remove nodes until the upgrade is complete.

ansible-inventory -i /usr/bin/tripleo-ansible-inventory --list --yaml > /tmp/ansible_inventory.yaml
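Before pointing ansible at the extracted file, it is worth a quick sanity check that the expected groups made it into the static inventory. This is a minimal sketch; the group names assume the default TripleO roles:

```shell
# Verify the extracted static inventory contains the expected groups.
INV=/tmp/ansible_inventory.yaml
for group in Undercloud Controller Compute; do
    if grep -q "^ *${group}:" "$INV"; then
        echo "group ${group}: present"
    else
        echo "WARNING: group ${group} missing from ${INV}" >&2
    fi
done
```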

3) Now apply the configuration by editing ansible.cfg:
[defaults]
# increased forks for better performance
forks = 100

# implement fact caching and a smaller subset of facts gathered for improved performance
gathering = smart
gather_subset = !hardware,!facter,!ohai
fact_caching_connection = /tmp/ansible_fact_cache
fact_caching = jsonfile
# expire the fact cache after 2 hours
fact_caching_timeout = 7200

# workaround to ensure the right modules are found
library = /usr/share/ansible-modules

# set the inventory to the extracted inventory location
inventory = /tmp/ansible_inventory.yaml

[ssh_connection]
ssh_extra_args = -o Compression=no -o TCPKeepAlive=yes -o VerifyHostKeyDNS=no -o ForwardX11=no -o ForwardAgent=yes -T
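Before continuing, it can be worth checking that the edited file still parses as valid INI and that the key settings read back as expected. This is a hedged sketch using the stock python on the undercloud (the python2/python3 fallback and the config path are assumptions; adjust the path to wherever your ansible.cfg lives):

```shell
# Parse the edited ansible.cfg and echo a couple of key settings back.
python -c "
try:
    import configparser as cp   # python3
except ImportError:
    import ConfigParser as cp   # python2 on the RHEL7 undercloud
c = cp.ConfigParser()
c.read('/home/stack/ansible.cfg')  # hypothetical path - adjust
print(c.get('defaults', 'forks'))
print(c.get('defaults', 'fact_caching_timeout'))
"
```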

4) Tell ansible which ansible.cfg file to use:

export ANSIBLE_CONFIG=<path to ansible.cfg from config-download>

5) Run the fast_forward_upgrade_playbook.yaml playbook from the generated config:
  ansible-playbook -b fast_forward_upgrade_playbook.yaml

At this step the databases are updated and the OpenStack services are stopped and disabled.

6) Run the upgrade for the Controllers
  ansible-playbook --skip-tags=validation -b upgrade_steps_playbook.yaml deploy_steps_playbook.yaml post_upgrade_steps_playbook.yaml  --limit Controller

7) Run the upgrade for one compute. Note that Compute[0] selects the first node in the inventory group, which might not be your compute-0. To upgrade a specific node, either find its index number or specify its hostname.
  ansible-playbook --skip-tags=validation -b upgrade_steps_playbook.yaml deploy_steps_playbook.yaml post_upgrade_steps_playbook.yaml  --limit Compute[0]

Consider this the last point from which you can revert. If this step took too long, I suggest rolling back; otherwise, continue with the rest of the computes.

8) Upgrade the computes in batches. The ansible host pattern Compute[1:10] selects the second through eleventh nodes in the group (the range is inclusive). You can verify which hosts will be targeted by running:
ansible -m ping --list-hosts Compute[1:10]
  ansible-playbook --skip-tags=validation -b upgrade_steps_playbook.yaml deploy_steps_playbook.yaml post_upgrade_steps_playbook.yaml  --limit Compute[1:10]
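The per-batch invocations can be scripted. The sketch below is a hypothetical loop assuming roughly 50 computes upgraded 10 at a time (BATCH and TOTAL are assumptions; adjust them to your node count), stopping on the first failure so you can investigate before continuing:

```shell
# Upgrade the remaining computes in batches, stopping on failure.
# Compute[start:end] is an inclusive ansible slice pattern; node 0
# was already upgraded in the previous step.
BATCH=10
TOTAL=50   # hypothetical compute count - adjust
start=1
while [ "$start" -lt "$TOTAL" ]; do
    end=$((start + BATCH - 1))
    echo "upgrading Compute[${start}:${end}]"
    ansible-playbook --skip-tags=validation -b \
        upgrade_steps_playbook.yaml deploy_steps_playbook.yaml \
        post_upgrade_steps_playbook.yaml \
        --limit "Compute[${start}:${end}]" || break
    start=$((end + 1))
done
```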

9) Revert ansible.cfg to its original state, unset the ANSIBLE_CONFIG environment variable, and continue with the converge step.

Comment 10 mbollo 2019-10-17 15:25:25 UTC
The package is ready.

Comment 14 Alex McLeod 2019-10-31 11:32:42 UTC
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to -.

Comment 18 errata-xmlrpc 2019-11-07 14:02:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3794

Comment 19 Jesse Pretorius 2019-11-14 10:41:00 UTC
*** Bug 1761395 has been marked as a duplicate of this bug. ***

Comment 20 Lukas Bezdicka 2019-12-03 10:03:36 UTC
*** Bug 1775869 has been marked as a duplicate of this bug. ***