Bug 1749443 - [OSP15] Overcloud deployment fails when pulling images from a remote registry (not the undercloud) because nova_wait_for_compute_service container on compute nodes exits with rc 1
Summary: [OSP15] Overcloud deployment fails when pulling images from a remote registry...
Keywords:
Status: ON_QA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 15.0 (Stein)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: z2
Target Release: 15.0 (Stein)
Assignee: Martin Schuppert
QA Contact: Archit Modi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-09-05 15:50 UTC by Marius Cornea
Modified: 2020-01-20 18:14 UTC
CC List: 12 users

Fixed In Version: openstack-tripleo-heat-templates-10.6.2-0.20191029010436.5c36542.el8ost
Doc Type: Known Issue
Doc Text:
The Compute services (nova) can fail to deploy because the nova_wait_for_compute_service script is unable to query the Nova API. If a remote container image registry is used outside of the undercloud, the Nova API service might not finish deploying in time. The workaround is to rerun the deployment command, or to use a local container image registry on the undercloud.
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Links
System            ID       Priority  Status  Summary                                                             Last Updated
Launchpad         1842948  None      None    None                                                                2019-09-05 16:20:06 UTC
OpenStack gerrit  688349   None      MERGED  Ensure nova-api is running before starting nova-compute containers  2019-12-04 15:13:48 UTC

Description Marius Cornea 2019-09-05 15:50:14 UTC
Description of problem:

Overcloud deployment fails when pulling images from a remote registry (not the undercloud) because the nova_wait_for_compute_service container on the compute nodes exits with rc 1.

From compute:

[root@compute-0 heat-admin]# podman ps -a | grep nova_wait
6b562095fc46  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-compute:20190904.1                dumb-init --singl...  13 hours ago  Exited (1) 13 hours ago         nova_wait_for_compute_service
41a6bb77e1c6  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-compute:20190904.1                dumb-init --singl...  13 hours ago  Exited (1) 13 hours ago         nova_wait_for_placement_service


[root@compute-0 heat-admin]# podman inspect nova_wait_for_compute_service | grep StartedAt
            "StartedAt": "2019-09-05T02:47:35.14452906Z",
[root@compute-0 heat-admin]# podman inspect nova_wait_for_compute_service | grep FinishedAt
            "FinishedAt": "2019-09-05T02:57:37.880221163Z"



From the controllers:

[root@controller-0 heat-admin]# podman inspect nova_api | grep StartedAt
            "StartedAt": "2019-09-05T02:58:46.315255257Z",
[root@controller-1 heat-admin]# podman inspect nova_api | grep StartedAt
            "StartedAt": "2019-09-05T02:58:45.849809526Z",
[root@controller-2 heat-admin]# podman inspect nova_api | grep StartedAt
            "StartedAt": "2019-09-05T02:58:56.508654288Z"

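The nova_wait_for_compute_service container ran from 02:47:35 to 02:57:37 UTC (roughly ten minutes) and then exited with rc 1, while the nova_api containers on the three controllers only started between 02:58:45 and 02:58:56 UTC, about a minute after the wait container had already exited.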
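The timeline above can be checked mechanically by converting the `podman inspect` timestamps to epoch seconds and subtracting them. This is a minimal sketch, assuming bash and GNU `date`; the function names `to_epoch` and `report_race` are hypothetical helpers, and the timestamps are the ones copied from the output above.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash and GNU date): measure how long the wait container ran and
# how long after it exited each nova_api container started. Helper names are hypothetical.

# to_epoch TIMESTAMP: "2019-09-05T02:57:37.880221163Z" -> seconds since the epoch (UTC).
# Fractional seconds are dropped, which is precise enough for this comparison.
to_epoch() {
  local ts="${1%%.*}"   # strip the fractional part (the trailing Z goes with it)
  ts="${ts%Z}"          # strip a trailing Z when there was no fractional part
  date -u -d "${ts/T/ }" +%s
}

# report_race WAIT_STARTED WAIT_FINISHED API_STARTED...
report_race() {
  local wait_start wait_end api delta
  wait_start=$(to_epoch "$1")
  wait_end=$(to_epoch "$2")
  shift 2
  echo "nova_wait_for_compute_service ran for $(( wait_end - wait_start ))s"
  for api in "$@"; do
    delta=$(( $(to_epoch "$api") - wait_end ))
    if (( delta > 0 )); then
      echo "nova_api started ${delta}s after the wait container exited"
    else
      echo "nova_api started $(( -delta ))s before the wait container exited"
    fi
  done
}

# Timestamps from compute-0 (wait container) and controller-0/1/2 (nova_api).
report_race "2019-09-05T02:47:35.14452906Z" "2019-09-05T02:57:37.880221163Z" \
  "2019-09-05T02:58:46.315255257Z" \
  "2019-09-05T02:58:45.849809526Z" \
  "2019-09-05T02:58:56.508654288Z"
```

With the values from this report it prints that the wait container ran for 602s and that the three nova_api containers started 69s, 68s and 79s after it exited.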

This appears to be a race condition that only reproduces when using a remote registry, presumably because pulling the images from it delays the start of the nova_api containers on the controllers. As a workaround, upload the container images to the undercloud registry and have the overcloud nodes pull them from there.
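The race can be illustrated with a bounded polling loop: a waiter that gives up after a fixed deadline fails whenever the service it waits for comes up late. The sketch below is hypothetical; `wait_for` and its arguments are illustrative names and are not taken from the actual nova_wait_for_compute_service script or from the merged fix. A file that appears after a delay stands in for nova_api becoming reachable.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real nova_wait_for_compute_service code: a bounded
# "wait until ready" loop. If the service becomes ready only after the deadline,
# the waiter returns 1, the same kind of failure as the Exited (1) status above.

# wait_for TIMEOUT_SECONDS INTERVAL_SECONDS CMD [ARGS...]
# Runs CMD every INTERVAL seconds until it succeeds (return 0) or TIMEOUT elapses (return 1).
wait_for() {
  local timeout=$1 interval=$2 deadline
  shift 2
  deadline=$(( SECONDS + timeout ))   # SECONDS is bash's built-in elapsed-time counter
  until "$@"; do
    if (( SECONDS >= deadline )); then
      echo "gave up after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
  return 0
}

# Demo: the "service" becomes ready after ~3s, but the waiter only allows 1s.
tmpdir=$(mktemp -d)
( sleep 3; touch "$tmpdir/ready" ) &
wait_for 1 1 test -e "$tmpdir/ready" || echo "waiter lost the race"
wait                                   # let the late "service" finish starting
wait_for 5 1 test -e "$tmpdir/ready" && echo "ready on retry"
rm -rf "$tmpdir"
```

A longer deadline only narrows the window; ordering the startup so the waiter runs after nova-api is up, as the title of gerrit change 688349 suggests, removes the dependency on the deadline altogether. Rerunning the deployment, the workaround given in the Doc Text, corresponds to the "ready on retry" case.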

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-10.6.1-0.20190904124632.4e2dddb.el8ost.noarch
python3-paunch-4.5.1-0.20190829080435.f9349e0.el8ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy the overcloud with container images pulled from a remote registry instead of the undercloud registry.

Actual results:
Deployment fails because the nova_wait_for_compute_service container on the compute nodes exits with rc 1 before the nova_api containers have started on the controller nodes.

Expected results:
The overcloud deployment completes without failure.

Additional info:
Adding links to the failed jobs.

