Bug 1794012
Summary: | Ansible consumes a large amount of CPU and RAM resources when running the update /etc/hosts task on a scale deployment | | |
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Sai Sindhur Malleni <smalleni> |
Component: | tripleo-ansible | Assignee: | Luke Short <lshort> |
Status: | CLOSED ERRATA | QA Contact: | Sasha Smolyak <ssmolyak> |
Severity: | medium | Docs Contact: | |
Priority: | medium | | |
Version: | 16.0 (Train) | CC: | aschultz, bdobreli, drosenfe, jschluet, jtanner |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | tripleo-ansible-0.4.2-0.20200207140442.b750574.el8ost | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2020-03-03 09:45:05 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Description
Sai Sindhur Malleni
2020-01-22 13:56:04 UTC
*** Bug 1794014 has been marked as a duplicate of this bug. ***

*** Bug 1794013 has been marked as a duplicate of this bug. ***

The link in the bug description was wrong. Here is the link to CPU consumption:
https://snapshot.raintank.io/dashboard/snapshot/g3Ije4s6TKilkNE063ykcojJsOqhSHkz?orgId=2
The spike to 5000% (50 cores) occurs during the Update /etc/hosts task.

To clarify, is the fork count for ansible-playbook set to 50?

This issue was not necessarily caused by the fork count (although it does default to 50 and we have increased it to 500 for scale testing). Ansible rendered a large Jinja template covering all of the hosts in the stack, and it repeated that rendering for every single host. We actually only need to render that template once. I patched this upstream and now have it available in a build for QE to test with.

*** Bug 1792425 has been marked as a duplicate of this bug. ***

Performed two tests:

- A test with 1 controller, 1 compute, and 3 Ceph nodes. In this case the Update /etc/hosts task, per /var/lib/mistral/overcloud/ansible.log, took 1.499s (task start at 15:16:34,265 to the last host at 15:16:35,764):

2020-02-05 15:16:34,265 p=734 u=mistral | TASK [tripleo-hosts-entries : Update /etc/hosts] *******************************
2020-02-05 15:16:34,265 p=734 u=mistral | Wednesday 05 February 2020 15:16:34 +0000 (0:00:00.469) 0:02:13.105 ****
2020-02-05 15:16:35,324 p=734 u=mistral | changed: [ceph-0]
2020-02-05 15:16:35,383 p=734 u=mistral | changed: [ceph-1]
2020-02-05 15:16:35,401 p=734 u=mistral | changed: [ceph-2]
2020-02-05 15:16:35,618 p=734 u=mistral | changed: [compute-0]
2020-02-05 15:16:35,764 p=734 u=mistral | changed: [controller-0]

- A 10 node scaling test. In this case the Update /etc/hosts task, per /var/lib/mistral/overcloud/ansible.log, took 16.152s:

2020-02-05 21:48:26,382 p=19469 u=mistral | TASK [tripleo-hosts-entries : Update /etc/hosts] *******************************
2020-02-05 21:48:26,382 p=19469 u=mistral | Wednesday 05 February 2020 21:48:26 +0000 (0:00:01.596) 0:05:48.368 ****
2020-02-05 21:48:36,027 p=19469 u=mistral | changed: [compute-6]
2020-02-05 21:48:37,212 p=19469 u=mistral | changed: [ceph-0]
2020-02-05 21:48:37,848 p=19469 u=mistral | changed: [compute-10]
2020-02-05 21:48:38,146 p=19469 u=mistral | changed: [compute-0]
2020-02-05 21:48:38,291 p=19469 u=mistral | changed: [ceph-2]
2020-02-05 21:48:38,589 p=19469 u=mistral | changed: [compute-1]
2020-02-05 21:48:39,483 p=19469 u=mistral | changed: [ceph-1]
2020-02-05 21:48:39,584 p=19469 u=mistral | changed: [compute-11]
2020-02-05 21:48:39,829 p=19469 u=mistral | changed: [compute-2]
2020-02-05 21:48:39,836 p=19469 u=mistral | changed: [compute-3]
2020-02-05 21:48:40,439 p=19469 u=mistral | changed: [compute-4]
2020-02-05 21:48:40,751 p=19469 u=mistral | changed: [controller-1]
2020-02-05 21:48:41,010 p=19469 u=mistral | changed: [compute-7]
2020-02-05 21:48:41,015 p=19469 u=mistral | changed: [compute-8]
2020-02-05 21:48:41,060 p=19469 u=mistral | changed: [compute-9]
2020-02-05 21:48:41,251 p=19469 u=mistral | changed: [compute-5]
2020-02-05 21:48:41,767 p=19469 u=mistral | changed: [controller-2]
2020-02-05 21:48:42,534 p=19469 u=mistral | changed: [controller-0]

In both cases the update took much less than a minute as specified in Comment 6.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:0655
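
Note on the fix: the comment above attributes the CPU spike to rendering the same large hosts template once per target host, when a single render would do. The following is a minimal Python sketch of that idea only. It is not the tripleo-ansible patch; `CountingRenderer`, `update_per_host`, and `update_render_once` are hypothetical names, the addresses are documentation-range placeholders, and a plain string join stands in for the Jinja template.

```python
from typing import Dict


class CountingRenderer:
    """Hypothetical stand-in for rendering the large hosts template."""

    def __init__(self) -> None:
        self.calls = 0  # number of times the expensive render ran

    def render(self, inventory: Dict[str, str]) -> str:
        self.calls += 1
        # One "<ip> <hostname>" line per host in the whole stack.
        return "\n".join(f"{ip} {name}" for name, ip in sorted(inventory.items()))


def update_per_host(inventory: Dict[str, str], renderer: CountingRenderer) -> Dict[str, str]:
    """Behaviour described in the report: render again for every target host."""
    return {host: renderer.render(inventory) for host in inventory}


def update_render_once(inventory: Dict[str, str], renderer: CountingRenderer) -> Dict[str, str]:
    """Idea behind the fix: render a single time and reuse the result."""
    block = renderer.render(inventory)
    return {host: block for host in inventory}


if __name__ == "__main__":
    # Assumed example inventory; hostnames mirror the log, IPs are placeholders.
    inventory = {
        "controller-0": "192.0.2.10",
        "compute-0": "192.0.2.20",
        "ceph-0": "192.0.2.30",
    }
    per_host, once = CountingRenderer(), CountingRenderer()
    update_per_host(inventory, per_host)
    update_render_once(inventory, once)
    print("renders per host:", per_host.calls, "renders once:", once.calls)
```

With N hosts, the first variant performs N renders of a template whose size also grows with N, while the second performs one, and both produce the same content for every host.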