Bug 1968513

Summary: Booting rhcos 4.8.0-fc.5 on a Dell R740 resulted in a grub failure
Product: OpenShift Container Platform
Reporter: Bob Fournier <bfournie>
Component: Bare Metal Hardware Provisioning
Assignee: Derek Higgins <derekh>
Sub Component: ironic
QA Contact: Amit Ugol <augol>
Status: CLOSED NOTABUG
Severity: medium
Priority: low
CC: derekh, miabbott, tsedovic, ykashtan
Version: 4.8
Keywords: Triaged
Target Milestone: ---   
Target Release: 4.9.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2021-07-22 15:14:11 UTC
Type: Bug
Attachments:
Grub failure console screenshot (flags: none)

Description Bob Fournier 2021-06-07 13:24:44 UTC
Created attachment 1789221
Grub failure console screenshot

Description of problem:

We have a cluster of 5 R740s (3 masters and 2 workers) in a Baremetal IPI setup. One of the workers failed to boot with the grub error shown in the attached screenshot.

Version-Release number of selected component (if applicable):

bootstrapOSImage: rhcos-48.84.202105190318-0-qemu.x86_64.qcow2.gz?sha256=84683a75c0e3d164c1d4a95448e142490a0bf91ff07076bff2b3bbc209c6c368#
clusterOSImage: rhcos-48.84.202105190318-0-openstack.x86_64.qcow2.gz?sha256=37a156f9f2b0efded45cb3cd5688aa2d42c26873a534951484e96f546a6b2c84#

How reproducible:

Occurred on 1 of 5 systems. We are retrying the deployment and will update the results here.

Comment 1 Derek Higgins 2021-06-09 10:40:25 UTC
This isn't happening on all reboots, but when it does, it appears as though
input is being sent to the grub menu screen, causing it to enter the grub console.

I can then scroll through this text in the grub console history with the up arrow.

I've rebooted the iDRAC to see if it is somehow responsible for sending this text to the grub menu.
I haven't seen the problem occur since the reboot. I'll update here once I'm sure the problem
isn't coming back.

Comment 2 Tomas Sedovic 2021-06-11 11:47:27 UTC
Moving back from RHEL to OCP/Bare Metal/Ironic for now.

We've discovered this issue while investigating https://bugzilla.redhat.com/show_bug.cgi?id=1966129. The workaround we plan to use (https://review.opendev.org/c/openstack/ironic-python-agent/+/795862) might resolve the issue or change the behaviour.

We will take a look again once it's merged and investigate further.

Comment 3 Derek Higgins 2021-07-22 15:14:11 UTC
Closing this; it looks likely to be an iDRAC issue.
We've seen it occur on another Dell R740 (same symptoms in grub), and again restarting the iDRAC made the problem go away.
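
For reference, an iDRAC restart like the one used as the workaround can also be requested remotely over Redfish. The following is only a minimal sketch, not what was run during this investigation. It assumes the standard Redfish Manager.Reset action, Dell's usual manager ID iDRAC.Embedded.1, and a ResetType of GracefulRestart. The helper names and the BMC hostname are hypothetical, and authentication and TLS verification are deliberately left out.

```python
import json
import urllib.request

# Assumption: Dell's usual Redfish manager ID; it is not stated in this report.
MANAGER_ID = "iDRAC.Embedded.1"


def build_idrac_reset(bmc_host, manager_id=MANAGER_ID):
    """Return (url, body) for a Redfish Manager.Reset call (hypothetical helper)."""
    url = (
        f"https://{bmc_host}/redfish/v1/Managers/{manager_id}"
        "/Actions/Manager.Reset"
    )
    # Assumption: GracefulRestart is the reset type accepted by the BMC.
    body = json.dumps({"ResetType": "GracefulRestart"}).encode("utf-8")
    return url, body


def make_reset_request(bmc_host):
    """Build, but do not send, the POST request that asks the BMC to restart."""
    url, body = build_idrac_reset(bmc_host)
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical BMC address; credentials would still have to be added
    # before passing the request to urllib.request.urlopen().
    req = make_reset_request("bmc.example.com")
    print(req.get_method(), req.full_url)
```

The request is only constructed here, so the sketch can be inspected safely before it is pointed at a real BMC.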