Bug 1857451

Summary: Ansible forks value should have an upper limit and Current Calculation needs to change
Product: Red Hat OpenStack Reporter: Sai Sindhur Malleni <smalleni>
Component: python-tripleoclient Assignee: OSP Team <rhos-maint>
Status: CLOSED ERRATA QA Contact: David Rosenfeld <drosenfe>
Severity: medium Docs Contact:
Priority: medium    
Version: 16.2 (Train) CC: ahyder, aschultz, emacchi, hbrock, jslagle, mburns, ramishra, slinaber
Target Milestone: --- Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: python-tripleoclient-12.4.1-2.20210214005004.106048a.el8ost.1 Doc Type: If docs needed, set a value
Last Closed: 2021-09-15 07:08:45 UTC Type: Bug

Description Sai Sindhur Malleni 2020-07-15 21:11:03 UTC
Description of problem:
By default, forks is currently set to 10*CPU_COUNT in ansible.cfg. On a 64-core undercloud this sets forks to 640. When the user does not pass the --limit option to ansible and the playbook ends up running on all the existing nodes (say a really large number of nodes, 600+), we see ansible consuming 230G+ of RSS memory.
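
To make the arithmetic concrete, here is a minimal sketch of the calculation as described above. The function name default_forks is hypothetical and this is not the python-tripleoclient code; only the 10x multiplier and the 64-core example come from this report.

```python
import os

# Multiplier stated in this report: forks = 10 * CPU count.
FORKS_PER_CPU = 10


def default_forks(cpu_count=None):
    """Return the forks value produced by the 10 * CPU_COUNT rule.

    Hypothetical helper for illustration only.
    """
    if cpu_count is None:
        # os.cpu_count() can return None, so fall back to 1.
        cpu_count = os.cpu_count() or 1
    return FORKS_PER_CPU * cpu_count


# A 64-core undercloud, as in the example above.
print(default_forks(64))
```

With 64 cores this gives 640, so the value grows without bound as the undercloud gets more cores.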

Link to ansible memory usage with so many forks: https://snapshot.raintank.io/dashboard/snapshot/zKg6pZnP1m6zHqHYDQdpXwRRS01zF4fc?orgId=2

The peak occurs when ansible runs against 630 overcloud nodes.

We need to:
1. Change the current default calculation so that fewer forks are configured by default.
2. Place an upper limit on the number of forks, irrespective of the number of cores on the undercloud.
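
One possible shape for both changes is sketched below. The multiplier of 4 and the cap of 100 are assumed values picked only for illustration, and capped_forks is a hypothetical name; this report does not state which values or formula the eventual fix uses.

```python
import os

# Assumed values for illustration only; not taken from the fix.
FORKS_PER_CPU = 4
MAX_FORKS = 100


def capped_forks(cpu_count=None, per_cpu=FORKS_PER_CPU, max_forks=MAX_FORKS):
    """Scale forks with the CPU count, but never exceed max_forks.

    Hypothetical helper: a smaller multiplier addresses item 1,
    the min() against max_forks addresses item 2.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    # Always allow at least one fork, never more than the cap.
    return max(1, min(per_cpu * cpu_count, max_forks))


for cores in (4, 16, 64):
    print(cores, capped_forks(cores))
```

With these assumed values, small underclouds still scale with their core count, while a 64-core undercloud is held at the cap instead of reaching 640.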

Version-Release number of selected component (if applicable):

How reproducible:
100% at large scale

Steps to Reproduce:
1. Have enough overcloud nodes and an undercloud node with a lot of CPUs
2. Run the config-download ansible playbooks with default ansible.cfg
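
When reproducing, it can help to confirm which forks value the generated ansible.cfg actually carries before running the playbooks. The sketch below parses the [defaults] section with Python's configparser. The helper name read_forks and the sample file contents are made up for illustration; the location of the real ansible.cfg depends on the deployment and is not given in this report.

```python
import configparser


def read_forks(cfg_text):
    """Return the forks value from ansible.cfg text, or None if unset.

    Hypothetical helper for illustration only.
    """
    # Disable interpolation so '%' characters elsewhere in the file
    # do not raise parsing errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.getint("defaults", "forks", fallback=None)


# Made-up sample matching the 64-core case in this report.
sample = """
[defaults]
forks = 640
"""

print(read_forks(sample))
```

A result of None means forks is not set in the file, in which case ansible falls back to its own built-in default.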

Actual results:
Ansible consumes almost all the memory on the undercloud

Expected results:
Ansible shouldn't consume so many resources

Additional info:

Comment 13 errata-xmlrpc 2021-09-15 07:08:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform (RHOSP) 16.2 enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.