Bug 1421052

Summary: Adjusting OSP 10 Director components settings for scaling 100+ nodes
Product: Red Hat OpenStack
Reporter: Pablo Caruana <pcaruana>
Component: rhosp-director
Assignee: Angus Thomas <athomas>
Status: CLOSED ERRATA
QA Contact: Amit Ugol <augol>
Severity: high
Docs Contact:
Priority: high
Version: 10.0 (Newton)
CC: aschultz, dbecker, jbuchta, jtaleric, mburns, mcornea, morazi, pablo.iranzo, pcaruana, racedoro, rhel-osp-director-maint, rrubins, smalleni, vcojot
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2019-05-01 00:41:46 UTC
Type: Bug

Description Pablo Caruana 2017-02-10 09:06:15 UTC
Currently there is no official documentation providing hints and tweaks for reaching deployments with 3 controllers plus 97 or more compute nodes.

Some of the bottlenecks detected were resolved by increasing rpc_response_timeout to at least 3600 in the heat, ironic and nova configuration, enabling memcached caching, and increasing the heat engine RPC workers to 48, as the default (2) was too low. With these changes it was possible to enlarge the cloud to 80-90 compute nodes, at which point it hit some haproxy timeouts at the controllers, which needed values above the default ones.
With all this information we could create a basic solution article explaining some of the tunables that can be used. Still, as more details may appear, this should be treated as a whole and included in the standard documentation, as it is not uncommon for customers to ask about scaling the platform (especially cloud providers and partners running their own products at scale), and we expect this to become even more common in the future.
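
To make the tunables above concrete, here is a minimal Python sketch of how the reported values might be applied to a service configuration. Several things in it are assumptions, not taken from this report: that rpc_response_timeout lives in the [DEFAULT] section of each service's configuration, that "heat engine RPC workers" maps to heat's num_engine_workers option, and the helper names apply_overrides and overrides_for, which are hypothetical. It works on configuration text rather than real files, and rewriting through configparser drops comments and does not handle multi-valued options, so treat it as an illustration only. In a director-managed deployment such settings would normally be applied through director's own configuration rather than by hand.

```python
import configparser
import io

# Values reported in this bug. Which option and section they map to
# is an assumption made for this sketch.
RPC_RESPONSE_TIMEOUT = "3600"
HEAT_ENGINE_WORKERS = "48"


def overrides_for(service):
    """Return the [DEFAULT] overrides for one service (hypothetical helper)."""
    overrides = {"rpc_response_timeout": RPC_RESPONSE_TIMEOUT}
    if service == "heat":
        # Assumed mapping of "heat engine RPC workers" to num_engine_workers.
        overrides["num_engine_workers"] = HEAT_ENGINE_WORKERS
    return overrides


def apply_overrides(conf_text, overrides):
    """Return conf_text with the given options set in [DEFAULT]."""
    parser = configparser.RawConfigParser()
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(conf_text)
    for option, value in overrides.items():
        parser.set("DEFAULT", option, value)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Made-up sample content standing in for a service configuration file.
    sample = "[DEFAULT]\nrpc_response_timeout = 60\n\n[database]\nmax_retries = 10\n"
    for service in ("heat", "ironic", "nova"):
        print("# %s" % service)
        print(apply_overrides(sample, overrides_for(service)))
```

The memcached and haproxy changes are left out because the description does not give the specific options or values used.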

Comment 2 Joe Talerico 2017-02-10 17:02:55 UTC
Can you share what is failing here? 

Any sort of "tweaks" needed should be pushed back into the product vs having some sort of one-off documentation somewhere.

Comment 18 Sai Sindhur Malleni 2019-05-01 00:41:46 UTC
There is a general OSP 10 scale guide now at https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/recommendations_for_large_deployments/index

Closing this BZ, please reopen if needed.