Bug 2217966

Summary: [RHOSP16.2][shift-on-stack] RHCOS images without Byte Order Marker fail to be properly identified and loaded for boot
Product: Red Hat OpenStack
Reporter: Julia Kreger <jkreger>
Component: openstack-ironic-python-agent
Assignee: Julia Kreger <jkreger>
Status: MODIFIED ---
QA Contact: James E. LaBarre <jlabarre>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 16.2 (Train)
CC: imatza, mdemaced, sbaker
Target Milestone: z6
Keywords: AutomationBlocker, Triaged
Target Release: 16.2 (Train on RHEL 8.4)
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-ironic-python-agent-5.0.5-2.20230502215002.8330df9.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Julia Kreger 2023-06-27 16:50:22 UTC
Description of problem:

A backport rooted in https://bugzilla.redhat.com/show_bug.cgi?id=2134529 had to be modified slightly from the upstream version. That modification introduced another issue in the backport: values were not properly split into an ordered list before being used as the hint. The change was rooted in the different Python versions that have to be supported on the train branch of the upstream software. This resulted in slightly divergent logic which also could not be directly unit tested, as it was version dependent.
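
For illustration only, this class of failure can be sketched in Python. The CSV contents and the function names below are assumptions made for the example and are not taken from the actual ironic-python-agent code. The sketch shows how indexing an unsplit string returns a single character instead of a whole field, which would be consistent with the "h" label reported under "Actual results".

```python
# Hypothetical sample of a boot loader CSV line, assumed to be laid out as
# "<loader file>,<boot entry label>,<options>,<description>".
CSV_LINE = "shimx64.efi,Red Hat Enterprise Linux,,Boot entry for Red Hat Enterprise Linux"


def label_without_split(contents):
    # Bug pattern: the contents are still one string, so index 1 is the
    # second character of the loader file name, not the label field.
    return contents[1]


def label_with_split(contents):
    # Split into an ordered list of fields first. maxsplit is passed
    # positionally, a form accepted by both Python 2 and Python 3.
    fields = contents.split(",", 3)
    return fields[1]


if __name__ == "__main__":
    print(label_without_split(CSV_LINE))
    print(label_with_split(CSV_LINE))
```

With the sample line above, the unsplit lookup yields the second character of "shimx64.efi", while the split lookup yields the full label field.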

The base fix, while proposed upstream, was never merged to the upstream stable/train branch, but we did merge the fix downstream on our Train branch because the upstream branch really was no longer getting new fixes.

Version-Release number of selected component (if applicable):

RHOSP 16.2.5

Deploying:
CoreOS 413.92.202306140611-0 (Plow) (ostree:0)


How reproducible:

Always.

Steps to Reproduce:
1. Using RHOSP 16.2, deploy Shift on Stack with a baremetal machine in UEFI mode.
2. Attempt to deploy an RHCOS image on that baremetal machine.
3. The node reaches the "active" state in Ironic, but never boots.


No workaround exists for this.

Actual results:

The machine loads a UEFI NVRAM entry labeled "h", which is unable to boot.

Expected results:

The baremetal machine reboots to a working operating system.

Additional info:

Comment 4 Itay Matza 2023-06-28 16:30:45 UTC
Verified the fix together with Julia - applied the code locally on an env, and the baremetal workers booted successfully with the RHCOS image of 4.13.0-0.nightly-2023-06-20-224158 on top of RHOS-16.2-RHEL-8-20230526.n.1. [0]

Thank you, Julia!

[0] https://code.engineering.redhat.com/gerrit/c/openstack-ironic-python-agent/+/444550