Description of problem:
Overcloud deploy times out with the following error:

2022-04-22 16:02:06.325490 | 09e0b99e-1910-4da1-b5c1-1f2cfa86c02d | INCLUDED | /usr/share/ansible/roles/tripleo_container_manage/tasks/create.yml | controller-0
2022-04-22 16:02:06.356468 | 525400f5-334d-2426-9356-00000001528b | TASK | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_3
2022-04-22 17:20:38.444 119459 INFO tripleoclient.utils.utils [-] Temporary directory [ /tmp/tripleok10jk27j ] cleaned up
2022-04-22 17:20:38.446 119459 ERROR tripleoclient.utils.utils [-] Ansible execution failed. playbook: /home/stack/overcloud-deploy/overcloud/config-download/overcloud/deploy_steps_playbook.yaml, Run Status: timeout, Return Code: 254, To rerun the failed command manually execute the following script: /home/stack/overcloud-deploy/overcloud/config-download/ansible-playbook-command.sh
2022-04-22 17:20:38.447 119459 WARNING tripleoclient.utils.safe_write [-] The output file /home/stack/overcloud-deploy/overcloud/overcloud-deployment_status.yaml will be overriden: RuntimeError: Ansible execution failed. playbook: /home/stack/overcloud-deploy/overcloud/config-download/overcloud/deploy_steps_playbook.yaml, Run Status: timeout, Return Code: 254, To rerun the failed command manually execute the following script: /home/stack/overcloud-deploy/overcloud/config-download/ansible-playbook-command.sh

Overcloud Endpoint: https://10.0.0.142:13000
Overcloud Horizon Dashboard URL: https://10.0.0.142:443/dashboard
Overcloud rc file: /home/stack/overcloud-deploy/overcloud/overcloudrc and /home/stack/overcloudrc
Overcloud Deployed with error

2022-04-22 17:20:40.661 119459 INFO tripleoclient.v1.overcloud_deploy.DeployOvercloud [-] Stopping ephemeral heat.
2022-04-22 17:20:40.894 119459 INFO tripleoclient.heat_launcher [-] Killing pod: ephemeral-heat
c841a8ff019437f0940d1b69aea0828f09fad1d0d39f3a5d63ac3b475ed4a795
2022-04-22 17:20:41.107 119459 INFO tripleoclient.heat_launcher [-] Killed pod: ephemeral-heat
2022-04-22 17:20:41.359 119459 INFO tripleoclient.heat_launcher [-] Starting back up of heat db
2022-04-22 17:20:55.559 119459 INFO tripleoclient.heat_launcher [-] Created tarfile /home/stack/overcloud-deploy/overcloud/heat-launcher/heat-db.sql-1650641987.7266178.tar.bzip2
2022-04-22 17:20:55.559 119459 INFO tripleoclient.heat_launcher [-] Deleting /home/stack/overcloud-deploy/overcloud/heat-launcher/heat-db.sql
2022-04-22 17:20:56.322 119459 INFO tripleoclient.heat_launcher [-] Removing pod: ephemeral-heat
c841a8ff019437f0940d1b69aea0828f09fad1d0d39f3a5d63ac3b475ed4a795
2022-04-22 17:20:57.565 119459 INFO tripleoclient.heat_launcher [-] Created tarfile /home/stack/overcloud-deploy/overcloud/heat-launcher/log/heat-1650641987.7266178.log-1650641987.7266178.tar.bzip2
2022-04-22 17:20:57.565 119459 INFO tripleoclient.heat_launcher [-] Deleting /home/stack/overcloud-deploy/overcloud/heat-launcher/log/heat-1650641987.7266178.log
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud [-] Exception occured while running the command: RuntimeError: Ansible execution failed. playbook: /home/stack/overcloud-deploy/overcloud/config-download/overcloud/deploy_steps_playbook.yaml, Run Status: timeout, Return Code: 254, To rerun the failed command manually execute the following script: /home/stack/overcloud-deploy/overcloud/config-download/ansible-playbook-command.sh
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud Traceback (most recent call last):
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud   File "/usr/lib/python3.9/site-packages/tripleoclient/command.py", line 34, in run
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud     super(Command, self).run(parsed_args)
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud   File "/usr/lib/python3.9/site-packages/osc_lib/command/command.py", line 39, in run
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud     return super(Command, self).run(parsed_args)
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud   File "/usr/lib/python3.9/site-packages/cliff/command.py", line 186, in run
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud     return_code = self.take_action(parsed_args) or 0
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud   File "/usr/lib/python3.9/site-packages/tripleoclient/v1/overcloud_deploy.py", line 1362, in take_action
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud     deployment.set_deployment_status(
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud   File "/usr/lib/python3.9/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud     self.force_reraise()
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud   File "/usr/lib/python3.9/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud     raise self.value
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud   File "/usr/lib/python3.9/site-packages/tripleoclient/v1/overcloud_deploy.py", line 1334, in take_action
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud     deployment.config_download(
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud   File "/usr/lib/python3.9/site-packages/tripleoclient/workflows/deployment.py", line 407, in config_download
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud     utils.run_ansible_playbook(
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud   File "/usr/lib/python3.9/site-packages/tripleoclient/utils.py", line 733, in run_ansible_playbook
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud     raise RuntimeError(err_msg)

Version-Release number of selected component (if applicable):
RHOS-17.0-RHEL-9-20220414.n.1

How reproducible:
Every time a job with composable roles is run.

Steps to Reproduce:
1. Execute one of the composable roles jobs. Ex: https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/df/view/deployment/job/DFG-df-deployment-17.0-virthost-3cont_1comp_3ceph_3db_2net_3msg-yes_UC_SSL-yes_OC_SSL-ceph-ipv4-geneve-remote_registry-RHELOSP-31897/
2.
3.

Actual results:
Overcloud deploy fails with error above

Expected results:
Overcloud successfully deploys

Additional info:
Log of a failing job: https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/df/view/deployment/job/DFG-df-deployment-17.0-virthost-3cont_1comp_3ceph_3db_2net_3msg-yes_UC_SSL-yes_OC_SSL-ceph-ipv4-geneve-remote_registry-RHELOSP-31897/3/
From: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-df-deployment-17.0-virthost-3cont_1comp_3ceph_3db_2net_3msg-yes_UC_SSL-yes_OC_SSL-ceph-ipv4-geneve-remote_registry-RHELOSP-31897/3/undercloud-0/home/stack/overcloud_install.log.gz

Deployment starts at 2022-04-22 15:43:36.078:

2022-04-22 15:43:36.078 119459 INFO tripleoclient.utils.utils [-] Running Ansible playbook with timeout 97m: /home/stack/overcloud-deploy/overcloud/config-download/overcloud/deploy_steps_playbook.yaml, Working directory: /home/stack/overcloud-deploy/overcloud/config-download, Playbook directory: /home/stack/overcloud-deploy/overcloud/config-download/overcloud

Stuck on this task:

2022-04-22 16:02:06.356468 | 525400f5-334d-2426-9356-00000001528b | TASK | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_3
2022-04-22 17:20:38.444 119459 INFO tripleoclient.utils.utils [-] Temporary directory [ /tmp/tripleok10jk27j ] cleaned up
2022-04-22 17:20:38.446 119459 ERROR tripleoclient.utils.utils [-] Ansible execution failed. playbook: /home/stack/overcloud-deploy/overcloud/config-download/overcloud/deploy_steps_playbook.yaml, Run Status: timeout, Return Code: 254, To rerun the failed command manually execute the following script: /home/stack/overcloud-deploy/overcloud/config-download/ansible-playbook-command.sh
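The timestamps in this log line up with the 97m playbook timeout rather than with a failure of the task itself. A quick arithmetic check, using only the three timestamps quoted from overcloud_install.log (the variable names are just for illustration):

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from overcloud_install.log.
start = datetime.strptime("2022-04-22 15:43:36.078", FMT)      # playbook started
stuck = datetime.strptime("2022-04-22 16:02:06.356468", FMT)   # last TASK line (step_3)
failed = datetime.strptime("2022-04-22 17:20:38.446", FMT)     # "Run Status: timeout"

timeout = timedelta(minutes=97)  # "Running Ansible playbook with timeout 97m"

deadline = start + timeout       # when the timeout should fire
overshoot = failed - deadline    # how long after the deadline the error was logged
stalled = failed - stuck         # how long the step_3 task sat without progress

print(deadline)   # 2022-04-22 17:20:36.078000
print(overshoot)  # 0:00:02.368000
print(stalled)    # 1:18:32.089532
```

The error is logged about 2 seconds after the 97-minute deadline, and the step_3 container task had produced no further output for roughly 78 minutes by then.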
Looks like the db syncs at step 3 can't connect to the db from the bootstrap node (controller-2):

http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-df-deployment-17.0-virthost-3cont_1comp_3ceph_3db_2net_3msg-yes_UC_SSL-yes_OC_SSL-ceph-ipv4-geneve-remote_registry-RHELOSP-31897/3/controller-2/var/log/containers/stdouts/nova_api_db_sync.log.gz
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-df-deployment-17.0-virthost-3cont_1comp_3ceph_3db_2net_3msg-yes_UC_SSL-yes_OC_SSL-ceph-ipv4-geneve-remote_registry-RHELOSP-31897/3/controller-2/var/log/containers/stdouts/heat_engine_db_sync.log.gz

I believe the issue is that the firewall rules to allow port 3306 are not created on the controller nodes that are running HAProxy. There are no such rules on controller-2:

http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-df-deployment-17.0-virthost-3cont_1comp_3ceph_3db_2net_3msg-yes_UC_SSL-yes_OC_SSL-ceph-ipv4-geneve-remote_registry-RHELOSP-31897/3/controller-2/var/log/extra/network.txt.gz

The rules are, however, present on the database nodes:

http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-df-deployment-17.0-virthost-3cont_1comp_3ceph_3db_2net_3msg-yes_UC_SSL-yes_OC_SSL-ceph-ipv4-geneve-remote_registry-RHELOSP-31897/3/database-0/var/log/extra/network.txt.gz

I believe this is a known issue and there are some in-progress upstream patches to fix it.
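One way to confirm the missing-rule theory on a live environment is a plain TCP probe of port 3306 from the bootstrap node. The sketch below is only an illustration: the helper name can_connect is made up here, and the address to probe (the HAProxy-fronted database endpoint) has to be taken from the deployment, since it is not shown in this report.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    A firewall that drops the packets shows up as a timeout, one that
    rejects them as a refused connection; both are OSError subclasses,
    so either case returns False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running can_connect(<db endpoint address>, 3306) on controller-2 and getting False, while the same call made locally on a database node returns True, would be consistent with the rules existing only on the database nodes.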
*** This bug has been marked as a duplicate of bug 2074541 ***