Bug 2251191

Summary: Heat stack fails while scaling from 850 to 950 nodes with TimeoutError: resources[192].resources.NovaCompute: QueuePool limit of size 5 overflow 64 reached, connection timed out
Product: Red Hat OpenStack
Reporter: Asma Syed Hameed <asyedham>
Component: openstack-heat
Assignee: Rabi Mishra <ramishra>
Status: CLOSED CURRENTRELEASE
QA Contact: David Rosenfeld <drosenfe>
Severity: high
Docs Contact:
Priority: high
Version: 17.1 (Wallaby)
CC: bshephar, jraju, lsvaty, mori, pweeks, ralfieri
Target Milestone: ---
Keywords: TestBlocker, Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2024-12-16 19:20:36 UTC
Type: Bug

Description Asma Syed Hameed 2023-11-23 11:28:31 UTC
Description of problem:

While scaling out from 850 to 950 virtual compute nodes (with a baremetal undercloud, 3 baremetal controllers and 7 baremetal Ceph nodes), the stack creation fails with the following errors logged by heat.engine:

2023-11-23 09:23:34.833 23 INFO heat.engine.stack [req-03d92752-90b6-46c8-954f-69258a45ca36 admin admin - - -] Stack UPDATE FAILED (overcloud-ComputeR4-lcuju6fhmd2p): Resource CREATE failed: TimeoutError: resources[192].resources.NovaCompute: QueuePool limit of size 5 overflow 64 reached, connection timed out, timeout 30 (Background on this error at: http://sqlalche.me/e/13/3o7r)
2023-11-23 09:23:36.945 9 ERROR heat.engine.resource Traceback (most recent call last):
2023-11-23 09:23:36.945 9 ERROR heat.engine.resource   File "/usr/lib/python3.9/site-packages/heat/engine/resource.py", line 916, in _action_recorder
2023-11-23 09:23:36.945 9 ERROR heat.engine.resource     yield
2023-11-23 09:23:36.945 9 ERROR heat.engine.resource   File "/usr/lib/python3.9/site-packages/heat/engine/resource.py", line 1028, in _do_action
2023-11-23 09:23:36.945 9 ERROR heat.engine.resource     yield from self.action_handler_task(action, args=handler_args)
2023-11-23 09:23:36.945 9 ERROR heat.engine.resource   File "/usr/lib/python3.9/site-packages/heat/engine/resource.py", line 978, in action_handler_task
2023-11-23 09:23:36.945 9 ERROR heat.engine.resource     done = check(handler_data)
2023-11-23 09:23:36.945 9 ERROR heat.engine.resource   File "/usr/lib/python3.9/site-packages/heat/engine/resources/openstack/heat/resource_group.py", line 433, in check_create_complete
2023-11-23 09:23:36.945 9 ERROR heat.engine.resource     if not checker.step():
2023-11-23 09:23:36.945 9 ERROR heat.engine.resource   File "/usr/lib/python3.9/site-packages/heat/engine/scheduler.py", line 210, in step
2023-11-23 09:23:36.945 9 ERROR heat.engine.resource     poll_period = next(self._runner)
2023-11-23 09:23:36.945 9 ERROR heat.engine.resource   File "/usr/lib/python3.9/site-packages/heat/engine/resources/openstack/heat/resource_group.py", line 441, in _run_to_completion
2023-11-23 09:23:36.945 9 ERROR heat.engine.resource     while not super(ResourceGroup,
2023-11-23 09:23:36.945 9 ERROR heat.engine.resource   File "/usr/lib/python3.9/site-packages/heat/engine/resources/stack_resource.py", line 545, in check_update_complete
2023-11-23 09:23:36.945 9 ERROR heat.engine.resource     return self._check_status_complete(target_action,
2023-11-23 09:23:36.945 9 ERROR heat.engine.resource   File "/usr/lib/python3.9/site-packages/heat/engine/resources/stack_resource.py", line 450, in _check_status_complete
2023-11-23 09:23:36.945 9 ERROR heat.engine.resource     raise exception.ResourceFailure(status_reason, self,
2023-11-23 09:23:36.945 9 ERROR heat.engine.resource heat.common.exception.ResourceFailure: resources.ComputeR4: Resource CREATE failed: TimeoutError: resources[192].resources.NovaCompute: QueuePool limit of size 5 overflow 64 reached, connection timed out, timeout 30 (Background on this error at: http://sqlalche.me/e/13/3o7r)
2023-11-23 09:23:36.945 9 ERROR heat.engine.resource
2023-11-23 09:23:38.723 9 INFO heat.engine.stack [req-03d92752-90b6-46c8-954f-69258a45ca36 admin admin - - -] Stack CREATE FAILED (overcloud): Resource CREATE failed: resources.ComputeR4: Resource CREATE failed: TimeoutError: resources[192].resources.NovaCompute: QueuePool limit of size 5 overflow 64 reached, connection timed out, timeout 30 (Background on this error at: http://sqlalche.me/e/13/3o7r)
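
For readers unfamiliar with the numbers in the error message: "size 5 overflow 64 ... timeout 30" most likely means that at most 5 + 64 = 69 database connections can be checked out at once per pool, and that a further request waited 30 seconds for a free connection before giving up. The sketch below is a toy model of that behaviour using only the Python standard library. It is not SQLAlchemy's QueuePool implementation; the class name BoundedPool and the helper demo() are hypothetical, the built-in TimeoutError stands in for the SQLAlchemy exception, and the timeout is shortened so the example finishes quickly.

```python
import threading


class BoundedPool:
    """Toy stand-in for a connection pool with a base size plus overflow.

    Hypothetical class for illustration only; it hands out slots, not
    real database connections.
    """

    def __init__(self, pool_size, max_overflow, timeout):
        # Total number of connections that may be checked out at once.
        self.capacity = pool_size + max_overflow
        self.timeout = timeout
        self._slots = threading.BoundedSemaphore(self.capacity)

    def checkout(self):
        # Wait up to `timeout` seconds for a free slot, then give up.
        if not self._slots.acquire(timeout=self.timeout):
            raise TimeoutError(
                "pool limit of %d reached, connection timed out, timeout %s"
                % (self.capacity, self.timeout)
            )

    def checkin(self):
        # Return a slot so that a waiting caller can proceed.
        self._slots.release()


def demo():
    """Check out connections without returning any until the pool times out."""
    # Same size and overflow as in the log; timeout shortened from 30 s.
    pool = BoundedPool(pool_size=5, max_overflow=64, timeout=0.05)
    granted = 0
    timed_out = False
    try:
        for _ in range(pool.capacity + 1):
            pool.checkout()
            granted += 1
    except TimeoutError:
        timed_out = True
    return granted, timed_out


if __name__ == "__main__":
    print(demo())
```

In this model the 70th checkout fails because none of the first 69 were returned in time, which matches the shape of the failure above: more concurrent database work than the pool can serve within the timeout.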


Version-Release number of selected component (if applicable):
RHOS-17.1-RHEL-9-20230802.n.1

How reproducible:
100%

Steps to Reproduce:
1. Deploy 800+ nodes
2. Scale out to 950 nodes


Actual results:
Stack creation failed

Expected results:
Stack creation successful