Created attachment 1765288 [details]
metal3-baremetal-operator.log

Description of problem:
Deployment of the BM worker nodes failed while trying to clean up the RAID configuration. The cluster was deployed with virtual media and the BMs are set to UEFI boot. We were able to deploy the same cluster with IPMI successfully.

[root@cnfdt07-installer ~]# oc get bmh -A
NAMESPACE               NAME               STATE                    CONSUMER                 ONLINE   ERROR
openshift-machine-api   cnfdt07-master-0   externally provisioned   cnfdt07-lghjm-master-0   true
openshift-machine-api   cnfdt07-master-1   externally provisioned   cnfdt07-lghjm-master-1   true
openshift-machine-api   cnfdt07-master-2   externally provisioned   cnfdt07-lghjm-master-2   true
openshift-machine-api   cnfdt07-worker-0   preparing                                         true     preparation error
openshift-machine-api   cnfdt07-worker-1   preparing                                         true     preparation error

Error message from 'oc describe bmh':
"Node failed to start the first cleaning step. Error: node does not support this clean step: {'interface': 'raid', 'step': 'delete_configuration'}"

It seems that this cleanup stage was added in this PR:
https://github.com/metal3-io/baremetal-operator/pull/292/files

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-03-21-033801

How reproducible:
Deploy a cluster with virtual media and BMs set to UEFI boot.
Actual results:
[root@cnfdt07-installer ~]# oc get node
NAME                                  STATUS   ROLES            AGE   VERSION
dhcp-55-172.lab.eng.tlv2.redhat.com   Ready    master,virtual   24h   v1.20.0+39c0afe
dhcp-55-213.lab.eng.tlv2.redhat.com   Ready    master,virtual   24h   v1.20.0+39c0afe
dhcp-55-221.lab.eng.tlv2.redhat.com   Ready    master,virtual   24h   v1.20.0+39c0afe

[root@cnfdt07-installer ~]# oc get bmh -A
NAMESPACE               NAME               STATE                    CONSUMER                 ONLINE   ERROR
openshift-machine-api   cnfdt07-master-0   externally provisioned   cnfdt07-lghjm-master-0   true
openshift-machine-api   cnfdt07-master-1   externally provisioned   cnfdt07-lghjm-master-1   true
openshift-machine-api   cnfdt07-master-2   externally provisioned   cnfdt07-lghjm-master-2   true
openshift-machine-api   cnfdt07-worker-0   preparing                                         true     preparation error
openshift-machine-api   cnfdt07-worker-1   preparing                                         true     preparation error

Expected results:
The bms should be provisioned successfully.

Additional info:
Attached baremetal-operator logs and must-gather.
Created attachment 1765290 [details] kube-rbac-proxy.log
The BM node is an R640 with a PERC H740P Mini (Embedded) RAID controller, FW version 50.9.4-3025.
It looks like the baremetal-operator now attempts that clean step whenever the BMC driver's RAIDInterface() method returns a non-empty string. In this case the driver is idrac-virtualmedia and the RAID interface is set to "no-raid", so it attempts to run the clean step. The regular iDRAC driver returns an empty string for the RAID interface, which is why it's not affected.
Zane, yeah, I've found that. I'm adding a check for "no-raid" to BMO (will push a PR for it).
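
To make the diagnosis and the proposed check concrete, here is a minimal sketch of the decision logic as described in the two comments above. It is written in Python purely for illustration (the baremetal-operator itself is Go), and every function and variable name below is hypothetical rather than taken from the BMO source. The only facts assumed from this report are the interface strings ("" for the regular iDRAC driver, "no-raid" for idrac-virtualmedia) and the clean step named in the error message.

```python
# Illustrative sketch only: names are hypothetical, not the real BMO code.

# RAID interface strings reported per driver, as stated in this bug report.
RAID_INTERFACE_BY_DRIVER = {
    "idrac": "",                      # regular iDRAC driver: empty string
    "idrac-virtualmedia": "no-raid",  # virtual media driver: "no-raid"
}

# The clean step quoted in the 'oc describe bmh' error message.
RAID_CLEAN_STEP = {"interface": "raid", "step": "delete_configuration"}


def wants_raid_cleaning_reported(raid_interface: str) -> bool:
    """Reported behaviour: any non-empty interface name triggers the step."""
    return raid_interface != ""


def wants_raid_cleaning_with_check(raid_interface: str) -> bool:
    """Proposed behaviour: also treat "no-raid" as 'no RAID support'."""
    return raid_interface not in ("", "no-raid")


def build_clean_steps(driver: str, wants_raid_cleaning) -> list:
    """Return the clean steps that would be requested for the given driver."""
    steps = []
    if wants_raid_cleaning(RAID_INTERFACE_BY_DRIVER[driver]):
        steps.append(dict(RAID_CLEAN_STEP))
    return steps


if __name__ == "__main__":
    for driver in RAID_INTERFACE_BY_DRIVER:
        print(driver,
              build_clean_steps(driver, wants_raid_cleaning_reported),
              build_clean_steps(driver, wants_raid_cleaning_with_check))
```

Under these assumptions, only the idrac-virtualmedia host is asked to run the RAID clean step with the reported logic, which matches the failing workers here, and neither driver is asked to run it once "no-raid" is excluded.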
Moving to
Could you verify the BZ, please? We don't have the same setup
The cluster was deployed successfully with 4.8.0-0.nightly-2021-04-03-092337
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438