Created attachment 1789221
Grub failure console screenshot

Description of problem:
We have a cluster of 5 R740s (3 masters and 2 workers) in a Baremetal IPI setup. One of the workers failed to boot with the grub error in the attached screenshot.

Version-Release number of selected component (if applicable):
bootstrapOSImage: rhcos-48.84.202105190318-0-qemu.x86_64.qcow2.gz?sha256=84683a75c0e3d164c1d4a95448e142490a0bf91ff07076bff2b3bbc209c6c368#
clusterOSImage: rhcos-48.84.202105190318-0-openstack.x86_64.qcow2.gz?sha256=37a156f9f2b0efded45cb3cd5688aa2d42c26873a534951484e96f546a6b2c84#

How reproducible:
Occurred on 1 of 5 systems. We are retrying the deployment and will update the results here.
This isn't happening on every reboot, but when it does, it appears as though input is being sent to the grub menu screen, causing it to enter the grub console. I can then scroll through this text in the grub console history with the up arrow. I've rebooted the iDRAC to see if it is somehow responsible for sending this text to the grub menu. I haven't seen the problem occur since the reboot; I'll update here once I'm sure the problem isn't coming back.
Moving back from RHEL to OCP/Bare Metal/Ironic for now. We've discovered this issue while investigating https://bugzilla.redhat.com/show_bug.cgi?id=1966129. The workaround we plan to use (https://review.opendev.org/c/openstack/ironic-python-agent/+/795862) might resolve the issue or change the behaviour. We will take a look again once it's merged and investigate further.
Closing this; it looks likely to be an iDRAC issue. We've seen it occur on another Dell R740 (same symptoms in grub), and again restarting the iDRAC made the problem go away.
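For anyone hitting the same symptoms, here is a minimal sketch of how the iDRAC restart could be requested remotely through the Redfish Manager.Reset action, rather than from the iDRAC web UI. This is not what was run in this bug; the host, credentials, and the manager ID `iDRAC.Embedded.1` are assumptions, so check `/redfish/v1/Managers` on your own system first. The function only builds the request and does not send it.

```python
import base64
import json
import urllib.request

# Assumption: the usual Dell manager ID. Verify it under /redfish/v1/Managers.
MANAGER_ID = "iDRAC.Embedded.1"


def build_idrac_reset_request(host, user, password):
    """Build (but do not send) a Redfish request that restarts the iDRAC.

    host, user and password are placeholders supplied by the caller.
    """
    url = f"https://{host}/redfish/v1/Managers/{MANAGER_ID}/Actions/Manager.Reset"
    # GracefulRestart restarts the management controller, not the server itself.
    body = json.dumps({"ResetType": "GracefulRestart"}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
```

To actually send it, pass the result to `urllib.request.urlopen(...)`. An iDRAC with a self-signed certificate will also need a suitable `ssl` context, which is left out here on purpose.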