Description of problem:
Currently, forks is set to 10*CPU_COUNT by default in the ansible.cfg. On a 64 core undercloud this sets forks to 640. When the user doesn't use the --limit option in ansible and the playbook ends up running on all the existing nodes (let's say we really have a large number of nodes, 600+), we see ansible consuming 230G+ of RSS memory.
Link to ansible memory usage with so many forks: https://snapshot.raintank.io/dashboard/snapshot/zKg6pZnP1m6zHqHYDQdpXwRRS01zF4fc?orgId=2
The peak is when ansible runs against 630 overcloud nodes.
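As a rough back-of-the-envelope check of the numbers above, the sketch below estimates memory per worker process. It assumes that "230G" means GiB, that ansible runs at most min(forks, hosts) workers at a time, and that memory scales roughly linearly with the number of workers. None of this was measured separately; the function name is hypothetical.

```python
def estimate_rss_per_worker_mib(total_rss_gib, forks, hosts):
    """Estimate RSS per ansible worker, in MiB.

    Assumption: the number of concurrent workers is bounded by both
    the forks setting and the number of targeted hosts.
    """
    workers = min(forks, hosts)
    return total_rss_gib * 1024 / workers


# Figures from this report: 230G+ RSS, forks=640, 630 overcloud nodes.
per_worker = estimate_rss_per_worker_mib(230, forks=640, hosts=630)
print(f"{per_worker:.0f} MiB per worker")  # roughly 374 MiB
```

Under these assumptions, each additional fork costs a few hundred MiB at this scale, which is why a cap independent of the core count helps.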
We need to:
1. Change the default calculation we currently have to reduce the number of forks by default
2. Place an upper limit on the number of forks, irrespective of the number of cores on the undercloud
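The two changes above could be combined along these lines. This is only a sketch of the idea, not the fix that was actually shipped: the cap of 100 and the function name are hypothetical values chosen for illustration, and only the 10-per-CPU multiplier comes from this report.

```python
import os

FORKS_PER_CPU = 10  # current default multiplier described in this report
MAX_FORKS = 100     # hypothetical upper limit, independent of core count


def default_forks(cpu_count=None, per_cpu=FORKS_PER_CPU, max_forks=MAX_FORKS):
    """Return a forks value that scales with CPUs but never exceeds max_forks."""
    if cpu_count is None:
        # os.cpu_count() can return None, so fall back to 1.
        cpu_count = os.cpu_count() or 1
    return max(1, min(cpu_count * per_cpu, max_forks))


print(default_forks(64))  # 100 with the hypothetical cap, instead of 640
print(default_forks(4))   # 40, small underclouds are unaffected
```

Lowering per_cpu addresses item 1 and max_forks addresses item 2; the right values would need to be chosen from memory measurements at scale.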
Version-Release number of selected component (if applicable):
How reproducible:
100% at large scale
Steps to Reproduce:
1. Have enough overcloud nodes and an undercloud node with a lot of CPUs
2. Run the config-download ansible playbooks with default ansible.cfg
Actual results:
Ansible consumes almost all the memory on the undercloud
Expected results:
Ansible shouldn't consume so many resources
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Red Hat OpenStack Platform (RHOSP) 16.2 enhancement advisory), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.