Bug 1940889
| Summary: | Installation failures in OpenStack release jobs | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Petr Muller <pmuller> |
| Component: | Installer | Assignee: | Pierre Prinetti <pprinett> |
| Installer sub component: | OpenShift on OpenStack | QA Contact: | Jon Uriarte <juriarte> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | ||
| Priority: | urgent | CC: | m.andre, pprinett |
| Version: | 4.8 | Keywords: | Triaged |
| Target Milestone: | --- | ||
| Target Release: | 4.8.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | test: operator.Run template e2e-openstack - e2e-openstack container setup |
| Last Closed: | 2021-07-27 22:54:33 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Petr Muller
2021-03-19 13:34:18 UTC
The initial investigation shows that the bootstrap node is unable to fetch its ignition file:
A start job is running for Ignition (fetch) (23min 49s / no limit)
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-serial-4.8/1373877230773473280/artifacts/e2e-openstack-serial/bootstrap/nova.log
It affects all jobs running on Vexxhost, not just the 4.8 periodics: the 4.6 and 4.7 periodics and the pre-submit jobs are affected as well. Setting the priority to urgent, as this means we're currently navigating blind without CI.
Also, deploying the master installer + latest RHCOS + latest nightly release image in a different environment works fine, confirming that the breakage is limited to Vexxhost.
They seem to have networking issues: the nova-metadata service is down. It's been reported already.

Looking at the console of a bootstrap node shows it can't talk to nova-metadata:
A start job is running for Ignition (fetch) (49s / no limit)
[ 54.004386] ignition[720]: GET http://169.254.169.254/openstack/latest/user_data: attempt #8

Vexxhost made some networking changes and they no longer automatically serve DNS from the DHCP server. We now need to specify a DNS resolver on the subnets we create via the `externalDNS` parameter of install-config.yaml. We're working on a fix for our CI jobs.

Vexxhost also fixed the failing nova-metadata service yesterday, so configuring our jobs to use an externalDNS resolver should fix them.

Seems to be fixed now, at least for the pre-submit jobs:
https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-installer-master-e2e-openstack
Let's wait a bit more to see if this also fixed the periodic jobs.

Periodic jobs work too:
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-4.7/1374662306641743872
Moving to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2021:2438
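
For readers who hit the same symptom, the `externalDNS` change discussed in the comments above could look roughly like the sketch below. It is not the actual CI fix from this bug. It only illustrates setting `platform.openstack.externalDNS` on an install-config represented as a plain Python dict. The helper name `with_external_dns`, the resolver addresses, and the `cloud` value are all placeholders chosen for illustration.

```python
import copy
import ipaddress
import json


def with_external_dns(install_config, resolvers):
    """Return a copy of install_config with platform.openstack.externalDNS set.

    Hypothetical helper: the name and shape are assumptions, not installer API.
    """
    # externalDNS takes IP addresses, so reject anything that does not parse as one.
    for resolver in resolvers:
        ipaddress.ip_address(resolver)  # raises ValueError on a bad address

    # Work on a deep copy so the caller's config is left untouched.
    updated = copy.deepcopy(install_config)
    openstack = updated.setdefault("platform", {}).setdefault("openstack", {})
    openstack["externalDNS"] = list(resolvers)
    return updated


# Placeholder config and resolver addresses, for illustration only.
base = {"platform": {"openstack": {"cloud": "example-cloud"}}}
patched = with_external_dns(base, ["1.1.1.1", "8.8.8.8"])

# JSON is also valid YAML, so this output can be merged into install-config.yaml.
print(json.dumps(patched, indent=2))
```

Which resolvers to use depends on what is reachable from the subnets the installer creates. The addresses here are public resolvers used only as examples.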