Description of problem:
Currently, forks defaults to 10*CPU_COUNT in the ansible.cfg. On a 64-core undercloud this sets forks to 640. When the user doesn't pass the --limit option to ansible and the playbook ends up running against all the existing nodes (say, a large deployment with 600+ nodes), we see ansible consuming 230G+ of RSS memory.

Link to ansible memory usage with so many forks:
https://snapshot.raintank.io/dashboard/snapshot/zKg6pZnP1m6zHqHYDQdpXwRRS01zF4fc?orgId=2
The peak is when ansible runs against 630 overcloud nodes.

We need to:
1. Change the default calculation we currently have to reduce the number of forks by default
2. Place an upper limit on the number of forks, irrespective of the number of cores on the undercloud

Version-Release number of selected component (if applicable):
16.1

How reproducible:
100% at large scale

Steps to Reproduce:
1. Have enough overcloud nodes and an undercloud node with a lot of CPUs
2. Run the config-download ansible playbooks with default ansible.cfg
3.

Actual results:
Ansible consumes almost all the memory on the undercloud

Expected results:
Ansible shouldn't consume so many resources

Additional info:
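The two requested changes (a smaller per-CPU default and a hard upper limit) can be sketched as below. This is only an illustration of the idea, not the shipped fix: the multiplier of 4 and the cap of 100 are assumed values, and `capped_forks` and `current_default_forks` are hypothetical helper names.

```python
import os

# Assumed values for illustration only; the report does not specify them.
FORKS_PER_CPU = 4
MAX_FORKS = 100


def current_default_forks(cpu_count):
    """Default described in this report: 10 forks per CPU, no upper bound."""
    return 10 * cpu_count


def capped_forks(cpu_count=None, per_cpu=FORKS_PER_CPU, max_forks=MAX_FORKS):
    """Scale forks with the CPU count, but never exceed max_forks."""
    if cpu_count is None:
        # os.cpu_count() can return None, so fall back to a single CPU.
        cpu_count = os.cpu_count() or 1
    # Always allow at least one fork, even for a zero or bogus CPU count.
    return max(1, min(cpu_count * per_cpu, max_forks))


if __name__ == "__main__":
    cpus = 64  # the undercloud size from this report
    print("current default:", current_default_forks(cpus))
    print("capped default: ", capped_forks(cpus))
```

With these assumed values, the 64-core undercloud from this report would go from 640 forks to 100, while a small undercloud (for example 8 cores) would still scale with its CPU count.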
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform (RHOSP) 16.2 enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2021:3483