Bug 1746537
Summary: | Deployment with config-download too slow when compared to non config-download deployment | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Sai Sindhur Malleni <smalleni> |
Component: | openstack-tripleo-common | Assignee: | Emilien Macchi <emacchi> |
Status: | CLOSED ERRATA | QA Contact: | Alexander Chuzhoy <sasha> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 13.0 (Queens) | CC: | bfournie, dbecker, emacchi, jslagle, kecarter, marjones, mburns, morazi, slinaber, uemit.seren |
Target Milestone: | async | Keywords: | Triaged, ZStream |
Target Release: | 13.0 (Queens) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | openstack-tripleo-common-8.6.8-17.el7ost | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-11-07 14:02:10 UTC | Type: | Bug |
Description
Sai Sindhur Malleni
2019-08-28 17:31:31 UTC
ansible.cfg laid down:

[defaults]
roles_path = /etc/ansible/roles:/usr/share/ansible/roles
retry_files_enabled = False
log_path = /var/lib/mistral/7d13efac-08c8-4f9c-8d30-37a8bb89c0a8/ansible.log
forks = 25
timeout = 30
gather_timeout = 30

[inventory]

[privilege_escalation]

[paramiko_connection]

[ssh_connection]
ssh_args = -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o ControlMaster=auto -o ControlPersist=30m -o ServerAliveInterval=5 -o ServerAliveCountMax=5
control_path_dir = /var/lib/mistral/7d13efac-08c8-4f9c-8d30-37a8bb89c0a8/ansible-ssh
retries = 8
pipelining = True

[persistent_connection]

[accelerate]

[selinux]

[colors]

[diff]

The biggest offender seems to be the Host prep steps play. Due to the number of roles in this deployment, there is a lot of task skipping going on as we go through the task list role by role. The first task only applies to a single role, the second task only applies to a single role, etc. All other roles are skipped for each task.

It seems to be much quicker to use a separate play per role rather than a separate task per role. You can limit each play to just a single role, so Ansible does not need to compute what will be skipped.

I limited the testing to 2 roles to speed things up a bit. Using separate plays resulted in the total run time going from 23m to 4m. (Note that using strategy:free had little effect.)
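To make the skipping cost concrete, here is a rough Python sketch that counts how many (host, task) pairs Ansible has to look at under each layout. It is only a counting model, not Ansible code: the function names, the host names, the `OtherRole` entry and the task counts are all made up for illustration, and it says nothing about actual wall-clock time.

```python
from typing import Dict, List


def evaluations_single_play(hosts_by_role: Dict[str, List[str]],
                            tasks_by_role: Dict[str, int]) -> int:
    """One play targeting all hosts: every task is visited for every host,
    and is either run (host is in the task's role) or skipped."""
    total_hosts = sum(len(hosts) for hosts in hosts_by_role.values())
    total_tasks = sum(tasks_by_role.values())
    return total_hosts * total_tasks


def evaluations_play_per_role(hosts_by_role: Dict[str, List[str]],
                              tasks_by_role: Dict[str, int]) -> int:
    """One play per role: each play holds only that role's tasks and
    targets only that role's hosts, so nothing is skipped."""
    return sum(len(hosts_by_role.get(role, [])) * task_count
               for role, task_count in tasks_by_role.items())


# Hypothetical inventory: two roles with hosts, plus one role that is in the
# task list but has no hosts in the current --limit.
HOSTS_BY_ROLE = {
    "Controller": ["ctrl-0", "ctrl-1", "ctrl-2"],
    "r630Compute": ["cmp-0", "cmp-1", "cmp-2", "cmp-3"],
    "OtherRole": [],
}
# Hypothetical number of host prep tasks generated per role.
TASKS_BY_ROLE = {"Controller": 5, "r630Compute": 5, "OtherRole": 5}

single = evaluations_single_play(HOSTS_BY_ROLE, TASKS_BY_ROLE)      # 7 hosts x 15 tasks = 105
per_role = evaluations_play_per_role(HOSTS_BY_ROLE, TASKS_BY_ROLE)  # 3*5 + 4*5 + 0*5 = 35
print(f"single play: {single} evaluations, {single - per_role} of them skips")
print(f"play per role: {per_role} evaluations")
```

In this model the single-play count grows with (total hosts x total tasks across all roles), while the per-role count grows only with the work that actually runs, which would explain why the gap widens as more roles are added.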
# --tags host_prep_steps --limit Controller:r630Compute
real 23m31.119s
user 23m25.239s
sys 7m40.223s

# --tags host_prep_steps --limit Controller:r630Compute, strategy:free
real 23m48.704s
user 23m56.315s
sys 7m45.093s

# --tags host_prep_steps --limit Controller:r630Compute, separate plays
real 4m32.785s
user 4m16.400s
sys 1m42.193s

# --tags host_prep_steps --limit Controller:r630Compute, separate plays, strategy:free
real 4m15.653s
user 4m27.188s
sys 1m43.599s

# --tags host_prep_steps, separate plays
real 12m23.470s
user 12m33.849s
sys 6m40.841s

We'd need to patch tripleo-heat-templates and backport to 13 to get this benefit.

Another thing: the tasks from HostPrepDeployment are actually run twice, once as a standalone deployment during pre_deploy_steps and again during host_prep_steps. We can remove the standalone deployment. This patch could be backported to 13:
https://review.opendev.org/#/c/623098
This should save a few minutes.

We could also look at backporting the patches where we parallelized pre and post deployments:
https://review.opendev.org/#/c/574474/
https://review.opendev.org/#/c/574473/
This would likely also save some time.

I've posted the following patches for review:

Remove HostPrepConfig:
https://review.opendev.org/679146 (rocky)
https://review.opendev.org/679147 (queens)

Parallelize pre/post deployments:
https://review.opendev.org/679151 (queens)
https://review.opendev.org/679152 (queens)

Use separate plays for Host prep steps:
https://review.opendev.org/679149 (master)

This patch will fix the issue with tripleo-ssh-known-hosts:
https://review.opendev.org/#/c/680516/

I've done most of the backports of the patches mentioned in this BZ, except for https://review.opendev.org/#/c/680516/ for now, but Kevin Carter is looking at it.

I was building tripleo-heat-templates for another bug and picked up the fixes here, so the THT version has been added to FixedInVersion. Not moving to MODIFIED yet because of Comment 10.

https://review.opendev.org/#/c/680516/ has now been backported. Moving to MODIFIED at this time.

Is there an ETA for this patch? We have a 200-node OpenStack cloud and are considering switching to config-download. I guess that without this patch it is going to be painfully slow. Also, are there any plans to backport the --config-download-only flag for the CLI?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:3794