Bug 2003641

Summary: All metal ipi jobs are failing in 4.10
Product: OpenShift Container Platform Reporter: Stephen Benjamin <stbenjam>
Component: Installer Assignee: Derek Higgins <derekh>
Installer sub component: OpenShift on Bare Metal IPI QA Contact: Raviv Bar-Tal <rbartal>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: unspecified CC: derekh, rbartal, sippy
Version: 4.9 Keywords: OtherQA, Triaged
Target Milestone: ---   
Target Release: 4.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
job=periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-ipi-upgrade=all
job=periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-ipi-virtualmedia=all
job=periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-ipi-compact=all
job=periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-ipi=all
job=periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-ipi-serial-ipv4=all
job=periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-ipi-ovn-dualstack=all
job=periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-ipi-ovn-dualstack-local-gateway=all
job=periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-ipi-ovn-ipv6=all
job=periodic-ci-openshift-release-master-nightly-4.10-upgrade-from-stable-4.9-e2e-metal-ipi-upgrade=all
Last Closed: 2022-03-10 16:09:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Stephen Benjamin 2021-09-13 11:15:11 UTC
All metal IPI jobs are failing to install in 4.10. I don't see any libvirt VMs being created other than the bootstrap, and the installer is complaining about inspection:

> level=debug msg=2021/09/12 16:20:37 [DEBUG] module.masters.ironic_node_v1.openshift-master-host[1]: apply errored, but we're indicating that via the Error pointer rather than returning it: could not inspect: could not inspect node, node is currently 'inspect failed' , last error was 'Failed to inspect hardware. Reason: unable to start inspection: Version requested but version discovery document was not found and allow_version_hack was False' 


See Sippy: https://sippy.ci.openshift.org/sippy-ng/jobs/4.10/analysis?filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22metal-ipi%22%7D%5D%2C%22linkOperator%22%3A%22and%22%7D

Example job: https://amd64.ocp.releases.ci.openshift.org/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2021-09-12-134458

Comment 1 Derek Higgins 2021-09-13 17:04:33 UTC
There is an error in the ironic-conductor logs when it tries to talk to ironic-inspector:

2021-09-12 16:20:37.024 1 ERROR ironic.drivers.modules.inspector [req-c4f8ddda-709b-451a-b431-9382f152b977 bootstrap-user - - - -] Unable to start managed inspection for node ef925ac1-340f-4800-a3e4-361f1d0237b2: Version requested but version discovery document was not found and allow_version_hack was False: keystoneauth1.exceptions.discovery.DiscoveryFailure: Version requested but version discovery document was not found and allow_version_hack was False

ironic-inspector is being started but produces no logs. This appears to be because we
switched from the ironic-inspector-image to the ironic-image (which now contains the inspector package).

The ironic-image has no entry point, so one needs to be specified in the script that starts the inspector.
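
As a rough illustration of the point above (not the actual change made in the installer), the sketch below builds a `podman run` command line that passes an explicit `--entrypoint` for a container whose image defines none. The image reference, the container name, and the entrypoint path `/bin/runironic-inspector` are all assumptions made for this example; the real start script may use different values and options.

```python
import shlex
from typing import List, Optional


def build_podman_run(name: str, image: str,
                     entrypoint: Optional[str] = None) -> List[str]:
    """Build a `podman run` argument list.

    If `entrypoint` is given, it is passed via --entrypoint so the container
    starts the intended service even when the image defines no entry point.
    """
    cmd = ["podman", "run", "-d", "--network", "host", "--name", name]
    if entrypoint is not None:
        cmd += ["--entrypoint", entrypoint]
    cmd.append(image)
    return cmd


# Hypothetical values, chosen only for illustration.
IRONIC_IMAGE = "example.invalid/ironic-image:latest"
INSPECTOR_ENTRYPOINT = "/bin/runironic-inspector"

inspector_cmd = build_podman_run(
    "ironic-inspector", IRONIC_IMAGE, INSPECTOR_ENTRYPOINT
)

# Print a shell-quoted form of the command for inspection.
print(shlex.join(inspector_cmd))
```

Without the explicit entrypoint, a container started from an image that has neither an entry point nor a default command has nothing to run, which would be consistent with the inspector producing no logs.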

Comment 2 Derek Higgins 2021-09-14 16:11:32 UTC
Now fixed; nightlies are no longer blocked.
https://github.com/openshift/installer/pull/5208

Comment 7 errata-xmlrpc 2022-03-10 16:09:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056