Bug 2087213
Summary: | Spoke BMH stuck "inspecting" when deployed via ZTP in 4.11 OCP hub | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Chad Crum <ccrum> | |
Component: | Bare Metal Hardware Provisioning | Assignee: | Dmitry Tantsur <dtantsur> | |
Bare Metal Hardware Provisioning sub component: | baremetal-operator | QA Contact: | Chad Crum <ccrum> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | urgent | |||
Priority: | high | CC: | calfonso, ccrum, ercohen, imelofer, mcornea, nshidlin, oourfali, sasha, smiron, trwest, tsedovic, yfirst | |
Version: | 4.11 | Keywords: | TestBlocker, Triaged | |
Target Milestone: | --- | Flags: | calfonso: needinfo- |
Target Release: | 4.11.0 | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | If docs needed, set a value | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 2101511 2109125 | Environment: | ||
Last Closed: | 2022-08-10 11:12:53 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 2109125 |
Description
Chad Crum
2022-05-17 15:20:02 UTC
This is actually already a known issue and should be fixed with https://issues.redhat.com/browse/MGMT-10004 (MGMT-10004 [In Progress]: Add PreprovisioningImage controller to assisted-service).

Opened BZ just for tracking and will link to the Epic for the above task.

While https://issues.redhat.com/browse/MGMT-10004 should fix the issue in ACM 2.6, I think the ZTP flow should work on 4.11 with ACM 2.5 as well. I think we should fix whatever broke the metal3 compatibility with the ZTP flow in ACM 2.5.

Triaging notes: starting with 4.11, the PreprovisioningImage CR controller no longer reconciles images with the InfraEnv attached. This is to enable an integrated ZTP flow via a new controller tracked in https://issues.redhat.com/browse/MGMT-10004. The problem is that BMO expects the image to be reconciled to move the BareMetalHost forward, even if inspection and cleaning are disabled, and live ISO deployment is requested. We need to fix that.

I tested OCP version registry.ci.openshift.org/ocp/release:4.11.0-0.ci-2022-06-21-193241 on the hub, with MCE 2.0.1 (ACM 2.5.1) and MCE 2.1 (ACM 2.6), and experienced the same issue on both. The BMH gets stuck "provisioning" and the virtualmedia is never attached to the spoke node.

To reproduce on dev-scripts without any assisted components:

1) Build dev-scripts with
   export NUM_EXTRA_WORKERS=1 # or more
2) Copy any ISO as test.iso to /opt/dev-scripts/ironic/html/images/
3) Apply this manifest, replacing credentials and the System UUID with ones from dev-scripts/ocp/ostest/extra_baremetalhosts.json:

---
apiVersion: v1
kind: Secret
metadata:
  name: ostest-extraworker-0-bmc-secret
  namespace: openshift-machine-api
type: Opaque
data:
  username: YWRtaW4=
  password: ...
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: ostest-extraworker-0
  namespace: openshift-machine-api
  annotations:
    inspect.metal3.io: disabled
  labels:
    infraenvs.agent-install.openshift.io: test
spec:
  online: true
  automatedCleaningMode: disabled
  bootMACAddress: 00:d9:5c:22:74:3f
  bmc:
    address: "redfish-virtualmedia+http://192.168.111.1:8000/redfish/v1/Systems/b549a45d-ee8a-418a-bbe8-fd434a5b2658"
    credentialsName: ostest-extraworker-0-bmc-secret
  image:
    url: http://172.22.0.1/images/test.iso
    diskFormat: live-iso

Without any fixes, the BMH gets stuck in "inspecting", now in "provisioning". Check with Ironic:

$ baremetal node show openshift-machine-api~ostest-extraworker-0 --fields provision_state power_state instance_info
+-----------------+----------------------+
| Field           | Value                |
+-----------------+----------------------+
| instance_info   | {'capabilities': {}} |
| power_state     | power off            |
| provision_state | manageable           |
+-----------------+----------------------+

Correction:

- diskFormat: live-iso
+ format: live-iso

Not finished yet, the Ironic patch is still pending.
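For anyone reproducing with the manifest from the reproduction steps, here is a minimal sketch of a pre-apply check that would catch the diskFormat/format mix-up noted in the correction. It is not part of baremetal-operator or dev-scripts. The function name check_live_iso_image and the plain-dict input are assumptions made for illustration, and the only API fact it relies on is that the BareMetalHost image section uses `format` (not `diskFormat`) for the live-iso setting.

```python
from typing import List


def check_live_iso_image(bmh: dict) -> List[str]:
    """Return problems found in spec.image of a BareMetalHost given as a dict.

    Hypothetical helper for illustration only; it does not talk to a cluster.
    """
    image = bmh.get("spec", {}).get("image")
    if image is None:
        return ["spec.image is missing"]

    problems = []
    # 'diskFormat' is the key used by mistake in the first version of the manifest.
    if "diskFormat" in image:
        problems.append("spec.image.diskFormat is not recognized; use spec.image.format")
    if image.get("format") != "live-iso":
        problems.append("spec.image.format is not set to live-iso")
    if not image.get("url"):
        problems.append("spec.image.url is empty")
    return problems


# The image section as first posted, and after the correction.
original = {"spec": {"image": {"url": "http://172.22.0.1/images/test.iso",
                               "diskFormat": "live-iso"}}}
corrected = {"spec": {"image": {"url": "http://172.22.0.1/images/test.iso",
                                "format": "live-iso"}}}

for name, manifest in (("original", original), ("corrected", corrected)):
    print(name, check_live_iso_image(manifest))
```

The check only looks at the image section. It says nothing about whether BMO or Ironic will actually move the host forward, which is what this bug is about.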
Once the upstream ironic patch merges, we still need to backport it to the right branch so it can be picked up in a downstream build, and we will tag it for ironic-image.

For 4.11 we need https://review.opendev.org/c/openstack/ironic/+/847657/

If CI cooperates we will get this merged ASAP.

Adding Depends On for 2101511 since we need a 4.12 tracker, based on a Slack conversation.

https://github.com/openshift/ironic-image/pull/281 contains the RPMs for ironic tagged for 4.11 with https://review.opendev.org/c/openstack/ironic/+/847657/

No longer depends on the 4.12 BZ.

QE were able to test with cluster bot; now we just need the staff eng to add labels in the PR.

Verified on registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-07-01-065600

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-07-01-065600   True        False         4h25m   Cluster version is 4.11.0-0.nightly-2022-07-01-065600

$ oc get bmh
NAME                   STATE         CONSUMER   ONLINE   ERROR   AGE
spoke-master-0-0-bmh   provisioned              true             101m
spoke-master-0-1-bmh   provisioned              true             101m
spoke-master-0-2-bmh   provisioned              true             101m
spoke-worker-0-0-bmh   provisioned              true             101m
spoke-worker-0-1-bmh   provisioned              true             101m

$ oc get clusterdeployment
NAME      INFRAID                                PLATFORM          REGION   VERSION   CLUSTERTYPE   PROVISIONSTATUS   POWERSTATE   AGE
spoke-0   dbd36a41-8766-49da-bf3c-430e77e8f964   agent-baremetal            4.11.0                  Provisioned                    102m

$ oc get aci
NAME      CLUSTER   STATE
spoke-0   spoke-0   adding-hosts

*** Bug 2100904 has been marked as a duplicate of this bug. ***

*** Bug 2051533 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069