Description of problem:
While running the deployment with openshift-baremetal-install, there is a point at which the nodes are powered on and fetch a RHEL ironic image over PXE. That image installs CoreOS on the local disk, but the installer does not change the boot order, so when the node is restarted it boots from PXE again instead of CoreOS. The installation does not progress unless you change the boot device manually during the deployment.
If you restart and select the local disk, CoreOS starts and fetches the ignition file, then after some time it reboots again, so you have to manually select the local disk a second time (the boot order is not changed at that point either).
Version-Release number of the following components:
Steps to Reproduce:
1. Run baremetal IPI with openshift-baremetal-install
2. Wait until the installer uses the BMC to boot the nodes
3. Nodes start from PXE again after reboot
Actual results:
Installer does not reconfigure the boot order
Expected results:
Installer reconfigures the boot order and the node starts from local disk when rebooting after the RHEL ironic image finishes
I used UEFI
Just an additional note. I've found that during the PXE boot performed by metal3 (booting the Workers), booting from the NICs is removed from the boot order, so this behavior is only seen while PXE booting the Master nodes from the bootstrap.
Julia, does this sound familiar - could it be because we're not setting boot_capabilities on workers yet?
Stephen, this sounds like the BMC may not be honoring the direction to boot from disk. The hardware that this was encountered on is somewhat known not to honor persistent boot commands, so the hardware should ideally be set to boot from disk by default at all times, and then ironic will signal it to boot from the network for installation. Also worth noting that UEFI is a bit of an afterthought to IPMI, and as such persistent boot commands are raw flags which are now actually blocked by some vendors' BMCs. We highly recommend using Redfish in this case.
Beyond this, there is not much I can say that can be done; this is not really a bug that can be discerned without logs from the ironic pod.
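For illustration only, the Redfish route recommended above amounts to a PATCH on the system resource that sets the standard `Boot` override properties. The sketch below only builds the request and does not send it. The BMC URL and the system path `/redfish/v1/Systems/1` are placeholders (the path differs between vendors), the helper names are hypothetical, authentication is omitted, and this is not necessarily how ironic issues the call.

```python
import json
import urllib.request


def build_boot_override(target="Hdd", persistent=True, uefi=True):
    """Build a Redfish ComputerSystem boot override payload.

    "Continuous" asks the BMC to keep the override across reboots,
    "Once" applies it to the next boot only.
    """
    boot = {
        "BootSourceOverrideTarget": target,
        "BootSourceOverrideEnabled": "Continuous" if persistent else "Once",
    }
    if uefi:
        boot["BootSourceOverrideMode"] = "UEFI"
    return {"Boot": boot}


def build_patch_request(bmc_url, system_path, payload):
    """Prepare (but do not send) the PATCH request for the system resource.

    bmc_url and system_path are assumptions supplied by the caller;
    credentials and TLS settings are deliberately left out of this sketch.
    """
    return urllib.request.Request(
        url=bmc_url.rstrip("/") + system_path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


if __name__ == "__main__":
    # Placeholder host and system path, for demonstration only.
    req = build_patch_request(
        "https://bmc.example.com",
        "/redfish/v1/Systems/1",
        build_boot_override(),
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Whether a given BMC honors a "Continuous" override is still up to the firmware, which is why setting disk as the default boot device in the BIOS remains the safer baseline.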
I am not aware that this happened to others and if it were consistent we would have seen more errors such as this. What is the differentiator? Do you use specific hardware to install on?
The server BIOS likely needs to be set to always boot from disk as opposed to always boot from network. We've seen that specific manufacturer's BMCs disregard persistent boot commands before, so I really think there is nothing code-wise that can be "fixed" here.
Stephen, I think this is a candidate to be closed/wontfix.
Closing won't fix as per last comment from Julia.