Bug 2100904 - SNO BMH node deployed via ZTP stuck in provisioning state
Summary: SNO BMH node deployed via ZTP stuck in provisioning state
Keywords:
Status: CLOSED DUPLICATE of bug 2087213
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Iury Gregory Melo Ferreira
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-06-24 16:02 UTC by Marius Cornea
Modified: 2022-07-07 08:55 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-07-05 14:49:45 UTC
Target Upstream Version:
Embargoed:



Description Marius Cornea 2022-06-24 16:02:34 UTC
Description of problem:

A SNO BMH node deployed via ZTP is stuck in the provisioning state:

oc -n kni-qe-1 get bmh 
NAME                                   STATE          CONSUMER   ONLINE   ERROR   AGE
sno.kni-qe-1.lab.eng.rdu2.redhat.com   provisioning              true             17m


Version-Release number of selected component (if applicable):
4.11.0-fc.3

How reproducible:
100%

Steps to Reproduce:
1. Deploy SNO node via ZTP procedure
2. Check BMH node state

Actual results:
BMH stuck in provisioning state

Expected results:
BMH is provisioned

Additional info:

Attaching must-gather. Snippet from baremetal-operator logs:

{"level":"info","ts":1656086432.6228547,"logger":"controllers.BareMetalHost","msg":"start","baremetalhost":"kni-qe-1/sno.kni-qe-1.lab.eng.rdu2.redhat.com"}
{"level":"info","ts":1656086432.6586285,"logger":"controllers.BareMetalHost","msg":"registering and validating access to management controller","baremetalhost":"kni-qe-1/sno.kni-qe-1.lab.eng.rdu2.redhat.com","provisioningState":"provisioning","credentials":{"credentials":{"name":"kni-qe-1-bmc-secret","namespace":"kni-qe-1"},"credentialsVersion":"658471"}}
{"level":"info","ts":1656086432.6586885,"logger":"controllers.BareMetalHost","msg":"pre-provisioning image architecture mismatch","baremetalhost":"kni-qe-1/sno.kni-qe-1.lab.eng.rdu2.redhat.com","provisioningState":"provisioning","wanted":"x86_64","current":""}
{"level":"info","ts":1656086432.690859,"logger":"provisioner.ironic","msg":"updating node settings in ironic","host":"kni-qe-1~sno.kni-qe-1.lab.eng.rdu2.redhat.com"}
{"level":"info","ts":1656086432.726027,"logger":"provisioner.ironic","msg":"could not update node settings in ironic, busy","host":"kni-qe-1~sno.kni-qe-1.lab.eng.rdu2.redhat.com"}
{"level":"info","ts":1656086432.7260554,"logger":"controllers.BareMetalHost","msg":"host not ready","baremetalhost":"kni-qe-1/sno.kni-qe-1.lab.eng.rdu2.redhat.com","provisioningState":"provisioning","wait":10}
{"level":"info","ts":1656086432.7260654,"logger":"controllers.BareMetalHost","msg":"done","baremetalhost":"kni-qe-1/sno.kni-qe-1.lab.eng.rdu2.redhat.com","provisioningState":"provisioning","requeue":true,"after":10}
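The snippet above shows the operator requeueing after Ironic reports the node as busy. As a rough triage aid, the sketch below counts how often each message appears in a baremetal-operator log dump, assuming one JSON object per line as in the snippet. The function names and the threshold of 3 are illustrative choices, not part of the operator or of must-gather tooling.

```python
import json
from collections import Counter

# Message text copied from the log snippet in this report.
BUSY_MSG = "could not update node settings in ironic, busy"


def count_messages(log_lines):
    """Count occurrences of each "msg" value in JSON-formatted log lines."""
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log entry
        if isinstance(entry, dict):
            counts[entry.get("msg", "")] += 1
    return counts


def looks_busy_looping(log_lines, threshold=3):
    """Heuristic (assumption): repeated 'busy' messages suggest a requeue loop."""
    return count_messages(log_lines)[BUSY_MSG] >= threshold
```

Feeding it the lines of the baremetal-operator container log from the must-gather should give a quick picture of whether the "busy" message dominates.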

Comment 2 Marius Cornea 2022-06-24 16:10:49 UTC
BMH:


apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "1"
    bmac.agent-install.openshift.io/hostname: sno.kni-qe-1.lab.eng.rdu2.redhat.com
    bmac.agent-install.openshift.io/ignition-config-overrides: '{"ignition":{"version":"3.2.0"},"storage":{"disks":[{"device":"/dev/nvme2n1","wipeTable":true,
      "partitions": []}, {"device":"/dev/nvme3n1","wipeTable":true, "partitions":
      []}]}}'
    bmac.agent-install.openshift.io/role: master
    inspect.metal3.io: disabled
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"metal3.io/v1alpha1","kind":"BareMetalHost","metadata":{"annotations":{"argocd.argoproj.io/sync-wave":"1","bmac.agent-install.openshift.io/hostname":"sno.kni-qe-1.lab.eng.rdu2.redhat.com","bmac.agent-install.openshift.io/ignition-config-overrides":"{\"ignition\":{\"version\":\"3.2.0\"},\"storage\":{\"disks\":[{\"device\":\"/dev/nvme2n1\",\"wipeTable\":true, \"partitions\": []}, {\"device\":\"/dev/nvme3n1\",\"wipeTable\":true, \"partitions\": []}]}}","bmac.agent-install.openshift.io/role":"master","inspect.metal3.io":"disabled","ran.openshift.io/ztp-gitops-generated":"{}"},"labels":{"app.kubernetes.io/instance":"clusters","infraenvs.agent-install.openshift.io":"kni-qe-1"},"name":"sno.kni-qe-1.lab.eng.rdu2.redhat.com","namespace":"kni-qe-1"},"spec":{"automatedCleaningMode":"disabled","bmc":{"address":"ilo5-virtualmedia://[2620:52:0:11D:B67A:F1FF:FEAE:98D9]/redfish/v1/Systems/1","credentialsName":"kni-qe-1-bmc-secret","disableCertificateVerification":true},"bootMACAddress":"b4:96:91:a5:7b:06","bootMode":"UEFI","online":true}}
    ran.openshift.io/ztp-gitops-generated: '{}'
  creationTimestamp: "2022-06-24T15:56:15Z"
  finalizers:
  - baremetalhost.metal3.io
  generation: 2
  labels:
    app.kubernetes.io/instance: clusters
    infraenvs.agent-install.openshift.io: kni-qe-1
  name: sno.kni-qe-1.lab.eng.rdu2.redhat.com
  namespace: kni-qe-1
  resourceVersion: "661935"
  uid: 5dec0247-153c-4379-b06b-2703d0fb211b
spec:
  automatedCleaningMode: disabled
  bmc:
    address: ilo5-virtualmedia://[2620:52:0:11D:B67A:F1FF:FEAE:98D9]/redfish/v1/Systems/1
    credentialsName: kni-qe-1-bmc-secret
    disableCertificateVerification: true
  bootMACAddress: b4:96:91:a5:7b:06
  bootMode: UEFI
  image:
    format: live-iso
    url: https://assisted-image-service-multicluster-engine.apps.kni-qe-0.lab.eng.rdu2.redhat.com/images/7da70cb1-6f2f-48f6-b5b8-96bf33589e4b?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiI3ZGE3MGNiMS02ZjJmLTQ4ZjYtYjViOC05NmJmMzM1ODllNGIifQ.mPssk4euBMxCzWOJgGWZSCvaTVv2oEUJCisAI96-jK57-T87nPpYB5GtR6XL9oZ7jdBw83IOCb7J0J-GERcsyg&arch=x86_64&type=minimal-iso&version=4.11
  online: true
status:
  errorCount: 0
  errorMessage: ""
  goodCredentials:
    credentials:
      name: kni-qe-1-bmc-secret
      namespace: kni-qe-1
    credentialsVersion: "658471"
  hardwareProfile: unknown
  lastUpdated: "2022-06-24T15:58:11Z"
  operationHistory:
    deprovision:
      end: null
      start: null
    inspect:
      end: "2022-06-24T15:56:16Z"
      start: "2022-06-24T15:56:15Z"
    provision:
      end: null
      start: "2022-06-24T15:58:11Z"
    register:
      end: "2022-06-24T15:56:15Z"
      start: "2022-06-24T15:56:15Z"
  operationalStatus: OK
  poweredOn: true
  provisioning:
    ID: 4a122a3e-28f3-440c-9c89-581aceed3ce1
    bootMode: UEFI
    image:
      url: ""
    raid:
      hardwareRAIDVolumes: null
      softwareRAIDVolumes: []
    rootDeviceHints:
      deviceName: /dev/sda
    state: provisioning
  triedCredentials:
    credentials:
      name: kni-qe-1-bmc-secret
      namespace: kni-qe-1
    credentialsVersion: "658471"
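In the status above, operationHistory.provision has a start time but no end, while provisioning.state is still "provisioning". A minimal sketch, assuming the BMH has been loaded into a Python dict (for example from the JSON output of oc get bmh), reports how long the host has been in that state. The helper name provisioning_elapsed is hypothetical.

```python
from datetime import datetime, timezone


def provisioning_elapsed(bmh, now=None):
    """Return a timedelta since provisioning started, or None if the host
    is not currently in an unfinished provisioning operation."""
    status = bmh.get("status") or {}
    if (status.get("provisioning") or {}).get("state") != "provisioning":
        return None
    provision = (status.get("operationHistory") or {}).get("provision") or {}
    start = provision.get("start")
    if not start or provision.get("end"):
        return None
    # Timestamps in the resource use the form 2022-06-24T15:58:11Z (UTC).
    started = datetime.strptime(start, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    if now is None:
        now = datetime.now(timezone.utc)
    return now - started
```

What counts as "stuck" is a judgment call; this report was filed with the host showing an age of 17m in the oc output.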

Comment 3 Iury Gregory Melo Ferreira 2022-07-01 14:21:01 UTC
Hi Marius,

This looks similar to what is described in https://bugzilla.redhat.com/show_bug.cgi?id=2087213#c7

A fix was merged yesterday (it was considered a blocker+, so it will probably be available in a nightly build very soon).
Would you be able to test with a nightly that contains https://github.com/openshift/ironic-image/pull/281 ?

I will assign the BZ to myself to keep track of it. If you don't want to wait for a nightly build, we can create a release image with cluster-bot so you can test it that way.
Feel free to ping me on Slack or here in the BZ.

Comment 4 Dmitry Tantsur 2022-07-05 14:49:45 UTC
Yeah, I assume it's a duplicate.

*** This bug has been marked as a duplicate of bug 2087213 ***

Comment 5 Marius Cornea 2022-07-07 08:55:23 UTC
Clearing the needinfo. I can confirm the issue didn't reproduce with 4.11.0-0.nightly-2022-07-06-062815.

