Description of problem:

Overcloud deployment fails when pulling images from a remote registry (not the undercloud) because nova_wait_for_compute_service container on compute nodes exits with rc 1.

From compute:

[root@compute-0 heat-admin]# podman ps -a | grep nova_wait
6b562095fc46  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-compute:20190904.1  dumb-init --singl...  13 hours ago  Exited (1) 13 hours ago  nova_wait_for_compute_service
41a6bb77e1c6  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-compute:20190904.1  dumb-init --singl...  13 hours ago  Exited (1) 13 hours ago  nova_wait_for_placement_service

[root@compute-0 heat-admin]# podman inspect nova_wait_for_compute_service | grep StartedAt
"StartedAt": "2019-09-05T02:47:35.14452906Z",

[root@compute-0 heat-admin]# podman inspect nova_wait_for_compute_service | grep FinishedAt
"FinishedAt": "2019-09-05T02:57:37.880221163Z"

From controller:

[root@controller-0 heat-admin]# podman inspect nova_api | grep StartedAt
"StartedAt": "2019-09-05T02:58:46.315255257Z",

[root@controller-1 heat-admin]# podman inspect nova_api | grep StartedAt
"StartedAt": "2019-09-05T02:58:45.849809526Z",

[root@controller-2 heat-admin]# podman inspect nova_api | grep StartedAt
"StartedAt": "2019-09-05T02:58:56.508654288Z"

So we can see that the nova_api containers started after the nova_wait_for_compute_service container exited. This appears to be a race condition which only reproduces when using a remote registry.

A workaround for this issue is to upload the container images to the undercloud registry and pull images from it on overcloud nodes.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-10.6.1-0.20190904124632.4e2dddb.el8ost.noarch
python3-paunch-4.5.1-0.20190829080435.f9349e0.el8ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy overcloud with images pulled from a remote registry, not from the undercloud registry

Actual results:
Deployment fails because nova_wait_for_compute_service container on compute nodes fails, exiting before the nova api service container started on the controller nodes.

Expected results:
No failure.

Additional info:
Adding links to the failed jobs.
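To make the timing in the description easier to check, here is a minimal Python sketch that recomputes the intervals from the podman inspect timestamps quoted in this report. The helper parse_podman_ts and the variable names are hypothetical and exist only for this illustration; the only data used are the StartedAt and FinishedAt values shown in the description.

```python
from datetime import datetime, timezone


def parse_podman_ts(ts: str) -> datetime:
    """Parse a UTC timestamp as printed by `podman inspect` in this report.

    The fractional part can have more than 6 digits, which strptime's %f
    does not accept, so it is truncated (or padded) to microseconds.
    """
    base, _, frac = ts.rstrip("Z").partition(".")
    frac = (frac + "000000")[:6]
    parsed = datetime.strptime(f"{base}.{frac}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


# Values copied from the compute node output in this report.
wait_started = parse_podman_ts("2019-09-05T02:47:35.14452906Z")
wait_finished = parse_podman_ts("2019-09-05T02:57:37.880221163Z")

# nova_api StartedAt values copied from the three controllers.
nova_api_started = {
    "controller-0": parse_podman_ts("2019-09-05T02:58:46.315255257Z"),
    "controller-1": parse_podman_ts("2019-09-05T02:58:45.849809526Z"),
    "controller-2": parse_podman_ts("2019-09-05T02:58:56.508654288Z"),
}

# How long the wait container ran before exiting with rc 1.
wait_duration = (wait_finished - wait_started).total_seconds()

# How long after that exit the first nova_api container came up.
first_api_start = min(nova_api_started.values())
gap_after_exit = (first_api_start - wait_finished).total_seconds()

print(f"nova_wait_for_compute_service ran for {wait_duration:.1f} s")
print(f"first nova_api started {gap_after_exit:.1f} s after it exited")
```

With the values above, the wait container ran for roughly ten minutes and the earliest nova_api container (on controller-1) started a little over a minute after it had already exited. That the ten-minute run corresponds to a fixed timeout in the wait script is an assumption drawn from these numbers, not something confirmed in this report.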
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0643