Description of problem:

The Undercloud upgrade fails during step 5 of the deployment tasks with the following error:

2022-02-05 22:08:36 | 2022-02-05 22:08:36.321034 | 52540022-e1f2-fc14-5813-0000000045d2 | TASK | Delete orphan containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_5
2022-02-05 22:08:36 | 2022-02-05 22:08:36.370107 | 52540022-e1f2-fc14-5813-0000000045d2 | TIMING | Delete orphan containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_5 | undercloud-0 | 0:20:11.949398 | 0.04s
2022-02-05 22:08:36 | 2022-02-05 22:08:36.515514 | 52540022-e1f2-fc14-5813-00000000461f | TIMING | tripleo_container_rm : include_tasks | undercloud-0 | 0:20:12.094821 | 0.10s
2022-02-05 22:08:36 | 2022-02-05 22:08:36.540016 | 52540022-e1f2-fc14-5813-000000004546 | TASK | Create containers from /var/lib/tripleo-config/container-startup-config/step_5
2022-02-05 22:08:36 | 2022-02-05 22:08:36.567910 | 52540022-e1f2-fc14-5813-000000004546 | TIMING | tripleo_container_manage : Create containers from /var/lib/tripleo-config/container-startup-config/step_5 | undercloud-0 | 0:20:12.147202 | 0.03s
2022-02-05 22:08:36 | 2022-02-05 22:08:36.580700 | a73ad766-66ef-4462-8b33-e7a14f124897 | INCLUDED | /usr/share/ansible/roles/tripleo_container_manage/tasks/create.yml | undercloud-0
2022-02-05 22:08:36 | 2022-02-05 22:08:36.610576 | 52540022-e1f2-fc14-5813-000000004649 | TASK | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_5
2022-02-05 22:18:53 | 2022-02-05 22:18:53.287277 | | WARNING | ERROR: Can't run container nova_wait_for_compute_service
2022-02-05 22:18:53 | stderr: + command -v python3
2022-02-05 22:18:53 | + python3 /container-config-scripts/nova_wait_for_compute_service.py
2022-02-05 22:18:53 | 2022-02-05 22:18:53.290911 | 52540022-e1f2-fc14-5813-000000004649 | FATAL | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_5 | undercloud-0 | error={"changed": false, "msg": "Failed containers: nova_wait_for_compute_service"}
2022-02-05 22:18:53 | 2022-02-05 22:18:53.292697 | 52540022-e1f2-fc14-5813-000000004649 | TIMING | tripleo_container_manage : Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_5 | undercloud-0 | 0:30:28.871975 | 616.68s
2022-02-05 22:18:53 |

Log: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-17.0-from-16.2-passed_phase1-3cont_2comp_3ceph_1ipa-ipv4-ovn_dvr/2/undercloud-0/home/stack/ffu_undercloud_upgrade.log.gz

Looking at the containers present after the failure, we can still see three old OSP16.2 containers which were deprecated in OSP17:

3ea4e5b9f5db  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-mistral-api:16.2_20220124.1  /usr/bin/bootstra...  24 minutes ago  Exited (0) 24 minutes ago  mistral_db_populate
a4e33ff8c0da  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-compute-ironic:16.2_20220124.1  kolla_start  24 minutes ago  Up 24 minutes ago  nova_compute
924b048cf4b3  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-compute-ironic:16.2_20220124.1  /container-config...  24 minutes ago  Exited (1) 14 minutes ago  nova_wait_for_compute_service

Log: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-17.0-from-16.2-passed_phase1-3cont_2comp_3ceph_1ipa-ipv4-ovn_dvr/2/undercloud-0/var/log/extra/podman/podman_allinfo.log.gz

And indeed, when we look at the tripleo container configs, we can see the configs belonging to those services still left in the /var/lib/tripleo-config/container-startup-config/step_5/ directory:
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-17.0-from-16.2-passed_phase1-3cont_2comp_3ceph_1ipa-ipv4-ovn_dvr/2/undercloud-0/var/lib/tripleo-config/container-startup-config/step_5/

Version-Release number of selected component (if applicable):

How reproducible:
Run CI job: https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/upgrades/view/ffu/job/DFG-upgrades-ffu-17.0-from-16.2-passed_phase1-3cont_2comp_3ceph_1ipa-ipv4-ovn_dvr/

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
The issue seems to be in this line:
https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/common/common-container-setup-tasks.yaml#L76

The Undercloud/docker_config.yaml (attached to this BZ, taken from the failing CI job) does not contain a step_5. Therefore, the configurations already stored in the step_5 folder are never cleaned up and remain there:
https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/ansible_plugins/modules/container_startup_config.py#L93-L102
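
To illustrate the suspected failure mode, here is a minimal Python sketch that reports step directories on disk which have no matching entry in docker_config. It is not the tripleo-ansible code. The function name find_stale_step_dirs, the "step_N" key layout of docker_config, and the one-JSON-file-per-container layout are assumptions made for illustration only.

```python
import json
import tempfile
from pathlib import Path


def find_stale_step_dirs(config_root, docker_config):
    """Return (step, [container names]) for step dirs missing from docker_config.

    Hypothetical helper: assumes docker_config is keyed by "step_N" and that
    each step directory holds one <container>.json file per container.
    """
    stale = []
    for step_dir in sorted(Path(config_root).glob("step_*")):
        if step_dir.is_dir() and step_dir.name not in docker_config:
            # Nothing in docker_config refers to this step, so a cleanup that
            # only iterates over docker_config would never visit it.
            leftovers = sorted(p.stem for p in step_dir.glob("*.json"))
            if leftovers:
                stale.append((step_dir.name, leftovers))
    return stale


def demo():
    with tempfile.TemporaryDirectory() as root:
        # Simulate startup configs left over from the previous release.
        step5 = Path(root) / "step_5"
        step5.mkdir()
        for name in ("nova_compute", "nova_wait_for_compute_service"):
            (step5 / f"{name}.json").write_text(json.dumps({"image": "placeholder"}))

        # A step that the new docker_config still defines.
        step4 = Path(root) / "step_4"
        step4.mkdir()
        (step4 / "some_container.json").write_text(json.dumps({"image": "placeholder"}))

        # The new docker_config has no step_5 at all.
        docker_config = {"step_4": {"some_container": {}}}
        return find_stale_step_dirs(root, docker_config)


if __name__ == "__main__":
    for step, names in demo():
        print(step, names)
```

On an affected Undercloud, a check along these lines could be pointed at /var/lib/tripleo-config/container-startup-config/ together with the loaded Undercloud/docker_config.yaml, assuming the layout matches the assumptions above.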
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2023:4577