Description of problem:

wait_for_message_queue() in heat_launcher.py sets an aggressive timeout. In one environment, ephemeral-heat took about 30 seconds to launch its heat-engine workers:

~~~
2023-06-05 13:51:06.526 1 DEBUG heat-api-noauth [-] ******************************************************************************** log_opt_values /usr/lib/python3.9/site-packages/oslo_config/cfg.py:2593
2023-06-05 13:51:06.527 1 DEBUG heat-api-noauth [-] Configuration options gathered from: log_opt_values /usr/lib/python3.9/site-packages/oslo_config/cfg.py:2594
2023-06-05 13:51:06.527 1 DEBUG heat-api-noauth [-] command line args: ['--config-file', '/etc/heat/heat.conf'] log_opt_values /usr/lib/python3.9/site-packages/oslo_config/cfg.py:2595
2023-06-05 13:51:06.527 1 DEBUG heat-api-noauth [-] config files: ['/etc/heat/heat.conf'] log_opt_values /usr/lib/python3.9/site-packages/oslo_config/cfg.py:2596
2023-06-05 13:51:06.527 1 DEBUG heat-api-noauth [-] ================================================================================ log_opt_values /usr/lib/python3.9/site-packages/oslo_config/cfg.py:2598
:
2023-06-05 13:51:06.558 1 INFO heat-api [-] Starting Heat REST API on 0.0.0.0:8006
2023-06-05 13:51:06.558 1 INFO heat.common.wsgi [-] Starting single process server
2023-06-05 13:51:06.559 1 INFO eventlet.wsgi.server [-] (1) wsgi starting up on http://0.0.0.0:8006
2023-06-05 13:51:06.599 1 WARNING heat.common.config [-] stack_user_domain_id or stack_user_domain_name
:
2023-06-05 13:51:07.110 1 DEBUG oslo_concurrency.lockutils [-] Acquired lock "singleton_lock" lock /usr/lib/python3.9/site-packages/oslo_concurrency/lockutils.py:266
2023-06-05 13:51:07.110 1 DEBUG oslo_concurrency.lockutils [-] Releasing lock "singleton_lock" lock /usr/lib/python3.9/site-packages/oslo_concurrency/lockutils.py:282
2023-06-05 13:51:07.110 1 INFO oslo_service.service [-] Starting 16 workers
:
2023-06-05 13:51:07.195 1 DEBUG oslo_service.service [-] ******************************************************************************** log_opt_values /usr/lib/python3.9/site-packages/oslo_config/cfg.py:2617
2023-06-05 13:51:37.152 2 INFO heat.engine.worker [-] Starting engine_worker (1.4) in engine UUID1.
2023-06-05 13:51:37.158 5 INFO heat.engine.worker [-] Starting engine_worker (1.4) in engine UUID2.
2023-06-05 13:51:37.163 6 INFO heat.engine.worker [-] Starting engine_worker (1.4) in engine UUID3.
2023-06-05 13:51:37.166 7 INFO heat.engine.worker [-] Starting engine_worker (1.4) in engine UUID4.
2023-06-05 13:51:37.168 3 INFO heat.engine.worker [-] Starting engine_worker (1.4) in engine UUID5.
2023-06-05 13:51:37.169 8 INFO heat.engine.worker [-] Starting engine_worker (1.4) in engine UUID6.
2023-06-05 13:51:37.169 9 INFO heat.engine.worker [-] Starting engine_worker (1.4) in engine UUID7.
2023-06-05 13:51:37.173 10 INFO heat.engine.worker [-] Starting engine_worker (1.4) in engine UUID8.
2023-06-05 13:51:37.173 11 INFO heat.engine.worker [-] Starting engine_worker (1.4) in engine UUID9.
2023-06-05 13:51:37.179 4 INFO heat.engine.worker [-] Starting engine_worker (1.4) in engine UUID10.
2023-06-05 13:51:37.181 13 INFO heat.engine.worker [-] Starting engine_worker (1.4) in engine UUID11.
2023-06-05 13:51:37.182 14 INFO heat.engine.worker [-] Starting engine_worker (1.4) in engine UUID12.
2023-06-05 13:51:37.183 12 INFO heat.engine.worker [-] Starting engine_worker (1.4) in engine UUID13.
2023-06-05 13:51:37.184 15 INFO heat.engine.worker [-] Starting engine_worker (1.4) in engine UUID14.
2023-06-05 13:51:37.187 16 INFO heat.engine.worker [-] Starting engine_worker (1.4) in engine UUID15.
2023-06-05 13:51:37.188 17 INFO heat.engine.worker [-] Starting engine_worker (1.4) in engine UUID16.
~~~

However, wait_for_message_queue() assumes that ephemeral-heat will launch and create the queue within 10 seconds:

~~~
    @retry(retry=retry_if_exception_type(HeatPodMessageQueueException),
           reraise=True,
           stop=(stop_after_delay(10) | stop_after_attempt(10)),
           wait=wait_fixed(0.5))
    def wait_for_message_queue(self):
        queue_name = 'engine.' + EPHEMERAL_HEAT_POD_NAME
        output = subprocess.check_output([
            'sudo', 'podman', 'exec', 'rabbitmq',
            'rabbitmqctl', 'list_queues'])
        if str(output).count(queue_name) < 1:
            msg = "Message queue for ephemeral heat not created in time."
            raise HeatPodMessageQueueException(msg)
~~~

Also, the 0.5-second wait between retries seems too aggressive. We should increase the values passed to wait_fixed() and stop_after_delay(), or make them configurable (a rough sketch of one possible approach is included under Additional info below).

Version-Release number of selected component (if applicable):
OSP17.0

How reproducible:
Every time overcloud deploy is run and the rabbitmq queue is not created within the 10-second window.

Steps to Reproduce:
1. Create an undercloud and run overcloud deploy.
2.
3.

Actual results:
overcloud deploy failed.

Expected results:
overcloud deploy succeeded.

Additional info:
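For illustration only, here is a minimal sketch of how the check could be made tunable, using tenacity's Retrying loop instead of the fixed decorator so the deadline and poll interval could be wired up to configuration. The parameter names (queue_check_timeout, queue_check_interval) and the EPHEMERAL_HEAT_POD_NAME value are assumptions for the sketch, not the actual tripleoclient implementation.

~~~
# Rough sketch only -- parameter names and the pod name constant are
# assumptions for illustration, not the real tripleoclient code.
import subprocess

from tenacity import (Retrying, retry_if_exception_type,
                      stop_after_delay, wait_fixed)

EPHEMERAL_HEAT_POD_NAME = 'ephemeral-heat'  # assumed value


class HeatPodMessageQueueException(Exception):
    pass


def wait_for_message_queue(queue_check_timeout=60, queue_check_interval=1):
    """Poll rabbitmq until the ephemeral-heat engine queue appears.

    queue_check_timeout / queue_check_interval are hypothetical knobs that
    could come from a config option instead of hard-coding 10s / 0.5s.
    """
    queue_name = 'engine.' + EPHEMERAL_HEAT_POD_NAME
    for attempt in Retrying(
            retry=retry_if_exception_type(HeatPodMessageQueueException),
            reraise=True,
            stop=stop_after_delay(queue_check_timeout),
            wait=wait_fixed(queue_check_interval)):
        with attempt:
            # List the rabbitmq queues inside the rabbitmq container and
            # retry until the ephemeral-heat engine queue shows up.
            output = subprocess.check_output([
                'sudo', 'podman', 'exec', 'rabbitmq',
                'rabbitmqctl', 'list_queues'])
            if str(output).count(queue_name) < 1:
                raise HeatPodMessageQueueException(
                    "Message queue for ephemeral heat not created in time.")
~~~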
I started an upstream patch to change the wait to 60s with a retry every 1s. https://review.opendev.org/c/openstack/python-tripleoclient/+/885580
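Assuming the patch keeps the same decorator shape as the snippet above, the change would look roughly like the following; see the review link for the exact values and whether stop_after_attempt is retained.

~~~
    # Rough sketch of the direction of the fix, not the actual patch:
    # stretch the overall deadline to 60s and poll once per second.
    @retry(retry=retry_if_exception_type(HeatPodMessageQueueException),
           reraise=True,
           stop=stop_after_delay(60),
           wait=wait_fixed(1))
    def wait_for_message_queue(self):
        ...
~~~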
Hi James,

Thank you for your work on this bugzilla and upstream. From the support side, the same inquiry may be raised by many customers once we ship OSP17.1. Could you please add the change into OSP17.1 GA as an exception?

Best regards,
Keigo Noha
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: RHOSP 17.1.4 (openstack-tripleo-common and python-tripleoclient) security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:9990