Bug 2078579 - OSP17 Timeout during overcloud deploy when using composable roles
Summary: OSP17 Timeout during overcloud deploy when using composable roles
Keywords:
Status: CLOSED DUPLICATE of bug 2074541
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 17.0 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: James Slagle
QA Contact: David Rosenfeld
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-04-25 17:17 UTC by David Rosenfeld
Modified: 2022-05-30 08:39 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-30 08:39:01 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1961799 0 None None None 2022-04-25 23:29:48 UTC
OpenStack gerrit 831547 0 None NEW tripleo_firewall: Allow injecting frontend rules 2022-04-25 23:29:48 UTC
OpenStack gerrit 831549 0 None NEW Define frontend firewall rules separately 2022-04-25 22:10:03 UTC
Red Hat Issue Tracker OSP-14864 0 None None None 2022-04-25 17:32:42 UTC

Description David Rosenfeld 2022-04-25 17:17:12 UTC
Description of problem: Overcloud deploy times out with the following error:

2022-04-22 16:02:06.325490 | 09e0b99e-1910-4da1-b5c1-1f2cfa86c02d |   INCLUDED | /usr/share/ansible/roles/tripleo_container_manage/tasks/create.yml | controller-0
2022-04-22 16:02:06.356468 | 525400f5-334d-2426-9356-00000001528b |       TASK | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_3
2022-04-22 17:20:38.444 119459 INFO tripleoclient.utils.utils [-] Temporary directory [ /tmp/tripleok10jk27j ] cleaned up
2022-04-22 17:20:38.446 119459 ERROR tripleoclient.utils.utils [-] Ansible execution failed. playbook: /home/stack/overcloud-deploy/overcloud/config-download/overcloud/deploy_steps_playbook.yaml, Run Status: timeout, Return Code: 254, To rerun the failed command manually execute the following script: /home/stack/overcloud-deploy/overcloud/config-download/ansible-playbook-command.sh
2022-04-22 17:20:38.447 119459 WARNING tripleoclient.utils.safe_write [-] The output file /home/stack/overcloud-deploy/overcloud/overcloud-deployment_status.yaml will be overriden: RuntimeError: Ansible execution failed. playbook: /home/stack/overcloud-deploy/overcloud/config-download/overcloud/deploy_steps_playbook.yaml, Run Status: timeout, Return Code: 254, To rerun the failed command manually execute the following script: /home/stack/overcloud-deploy/overcloud/config-download/ansible-playbook-command.sh
Overcloud Endpoint: https://10.0.0.142:13000
Overcloud Horizon Dashboard URL: https://10.0.0.142:443/dashboard
Overcloud rc file: /home/stack/overcloud-deploy/overcloud/overcloudrc and /home/stack/overcloudrc
Overcloud Deployed with error
2022-04-22 17:20:40.661 119459 INFO tripleoclient.v1.overcloud_deploy.DeployOvercloud [-] Stopping ephemeral heat.
2022-04-22 17:20:40.894 119459 INFO tripleoclient.heat_launcher [-] Killing pod: ephemeral-heat
c841a8ff019437f0940d1b69aea0828f09fad1d0d39f3a5d63ac3b475ed4a795
2022-04-22 17:20:41.107 119459 INFO tripleoclient.heat_launcher [-] Killed pod: ephemeral-heat
2022-04-22 17:20:41.359 119459 INFO tripleoclient.heat_launcher [-] Starting back up of heat db
2022-04-22 17:20:55.559 119459 INFO tripleoclient.heat_launcher [-] Created tarfile /home/stack/overcloud-deploy/overcloud/heat-launcher/heat-db.sql-1650641987.7266178.tar.bzip2
2022-04-22 17:20:55.559 119459 INFO tripleoclient.heat_launcher [-] Deleting /home/stack/overcloud-deploy/overcloud/heat-launcher/heat-db.sql
2022-04-22 17:20:56.322 119459 INFO tripleoclient.heat_launcher [-] Removing pod: ephemeral-heat
c841a8ff019437f0940d1b69aea0828f09fad1d0d39f3a5d63ac3b475ed4a795
2022-04-22 17:20:57.565 119459 INFO tripleoclient.heat_launcher [-] Created tarfile /home/stack/overcloud-deploy/overcloud/heat-launcher/log/heat-1650641987.7266178.log-1650641987.7266178.tar.bzip2
2022-04-22 17:20:57.565 119459 INFO tripleoclient.heat_launcher [-] Deleting /home/stack/overcloud-deploy/overcloud/heat-launcher/log/heat-1650641987.7266178.log
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud [-] Exception occured while running the command: RuntimeError: Ansible execution failed. playbook: /home/stack/overcloud-deploy/overcloud/config-download/overcloud/deploy_steps_playbook.yaml, Run Status: timeout, Return Code: 254, To rerun the failed command manually execute the following script: /home/stack/overcloud-deploy/overcloud/config-download/ansible-playbook-command.sh
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud Traceback (most recent call last):
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud   File "/usr/lib/python3.9/site-packages/tripleoclient/command.py", line 34, in run
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud     super(Command, self).run(parsed_args)
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud   File "/usr/lib/python3.9/site-packages/osc_lib/command/command.py", line 39, in run
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud     return super(Command, self).run(parsed_args)
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud   File "/usr/lib/python3.9/site-packages/cliff/command.py", line 186, in run
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud     return_code = self.take_action(parsed_args) or 0
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud   File "/usr/lib/python3.9/site-packages/tripleoclient/v1/overcloud_deploy.py", line 1362, in take_action
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud     deployment.set_deployment_status(
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud   File "/usr/lib/python3.9/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud     self.force_reraise()
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud   File "/usr/lib/python3.9/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud     raise self.value
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud   File "/usr/lib/python3.9/site-packages/tripleoclient/v1/overcloud_deploy.py", line 1334, in take_action
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud     deployment.config_download(
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud   File "/usr/lib/python3.9/site-packages/tripleoclient/workflows/deployment.py", line 407, in config_download
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud     utils.run_ansible_playbook(
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud   File "/usr/lib/python3.9/site-packages/tripleoclient/utils.py", line 733, in run_ansible_playbook
2022-04-22 17:21:01.399 119459 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud     raise RuntimeError(err_msg)


Version-Release number of selected component (if applicable): RHOS-17.0-RHEL-9-20220414.n.1


How reproducible: Every time a job with composable roles is run.


Steps to Reproduce:
1. Execute one of the composable roles jobs, for example: https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/df/view/deployment/job/DFG-df-deployment-17.0-virthost-3cont_1comp_3ceph_3db_2net_3msg-yes_UC_SSL-yes_OC_SSL-ceph-ipv4-geneve-remote_registry-RHELOSP-31897/

Actual results: Overcloud deploy fails with the timeout error shown above.


Expected results: Overcloud deploys successfully.


Additional info:

Comment 2 James Slagle 2022-04-25 21:27:17 UTC
From:

http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-df-deployment-17.0-virthost-3cont_1comp_3ceph_3db_2net_3msg-yes_UC_SSL-yes_OC_SSL-ceph-ipv4-geneve-remote_registry-RHELOSP-31897/3/undercloud-0/home/stack/overcloud_install.log.gz

Deployment starts at 2022-04-22 15:43:36.078:

2022-04-22 15:43:36.078 119459 INFO tripleoclient.utils.utils [-] Running Ansible playbook with timeout 97m: /home/stack/overcloud-deploy/overcloud/config-download/overcloud/deploy_steps_playbook.yaml, Working directory: /home/stack/overcloud-deploy/overcloud/config-download, Playbook directory: /home/stack/overcloud-deploy/overcloud/config-download/overcloud
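
As a quick sanity check on the timeline, the sketch below adds the 97m timeout to the playbook start time and compares the result with the timestamp of the "Ansible execution failed" error quoted in the description. The variable names are illustrative only; the timestamps are copied from the log.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from overcloud_install.log as quoted in this bug.
playbook_start = datetime.strptime("2022-04-22 15:43:36.078", FMT)
failure_logged = datetime.strptime("2022-04-22 17:20:38.446", FMT)

# From the log line: "Running Ansible playbook with timeout 97m"
timeout = timedelta(minutes=97)

# Point in time at which the playbook run is expected to be cut off.
deadline = playbook_start + timeout

# How long after the deadline the failure was actually logged.
overshoot = failure_logged - deadline

print("timeout deadline:        ", deadline)
print("failure logged:          ", failure_logged)
print("logged after deadline by:", overshoot)
```

The failure is logged a little over two seconds after the computed deadline, which is consistent with the run being ended by the 97m timeout rather than by a failing task.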

The deployment then stalls on this task until the playbook times out:

2022-04-22 16:02:06.356468 | 525400f5-334d-2426-9356-00000001528b |       TASK | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_3
2022-04-22 17:20:38.444 119459 INFO tripleoclient.utils.utils [-] Temporary directory [ /tmp/tripleok10jk27j ] cleaned up
2022-04-22 17:20:38.446 119459 ERROR tripleoclient.utils.utils [-] Ansible execution failed. playbook: /home/stack/overcloud-deploy/overcloud/config-download/overcloud/deploy_steps_playbook.yaml, Run Status: timeout, Return Code: 254, To rerun the failed command manually execute the following script: /home/stack/overcloud-deploy/overcloud/config-download/ansible-playbook-command.sh
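
One way to locate a stall like this in a long overcloud_install.log is to look for the largest jump between consecutive timestamps. The following is a minimal sketch and not part of tripleoclient: parse_ts and largest_gap are hypothetical helper names, and the sample lines are copied from the log excerpt in this bug.

```python
import re
from datetime import datetime

# Leading timestamp shared by Ansible task lines and tripleoclient log lines.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:\.(\d+))?")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    m = TS.match(line)
    if not m:
        return None
    base = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    # Fractional seconds have 3 or 6 digits in this log; normalise to microseconds.
    frac = (m.group(2) or "0")[:6].ljust(6, "0")
    return base.replace(microsecond=int(frac))


def largest_gap(lines):
    """Return (gap, line_before, line_after) for the biggest timestamp jump."""
    best = None
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # e.g. container IDs or blank lines
        if prev_ts is not None:
            gap = ts - prev_ts
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_ts, prev_line = ts, line
    return best


sample = [
    "2022-04-22 16:02:06.325490 | 09e0b99e-1910-4da1-b5c1-1f2cfa86c02d |   INCLUDED | /usr/share/ansible/roles/tripleo_container_manage/tasks/create.yml | controller-0",
    "2022-04-22 16:02:06.356468 | 525400f5-334d-2426-9356-00000001528b |       TASK | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_3",
    "2022-04-22 17:20:38.444 119459 INFO tripleoclient.utils.utils [-] Temporary directory [ /tmp/tripleok10jk27j ] cleaned up",
]

gap, before, after = largest_gap(sample)
print("largest gap:", gap)
print("last line before the gap:", before)
```

On these three lines the largest gap is roughly 78 minutes and begins at the step_3 "Create containers managed by Podman" task, matching the stall described in this comment. To scan a whole log, the same function could be called with an open file object instead of the sample list.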

Comment 7 Rabi Mishra 2022-05-30 08:39:01 UTC

*** This bug has been marked as a duplicate of bug 2074541 ***

