Bug 2006690 - OS boot failure "x64 Exception Type 06 - Invalid Opcode Exception"
Summary: OS boot failure "x64 Exception Type 06 - Invalid Opcode Exception"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Benjamin Gilbert
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On: 2004596
Blocks: 2006962
Reported: 2021-09-22 08:04 UTC by Jatan Malde
Modified: 2022-07-20 06:12 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When the RHCOS live ISO adds a UEFI boot entry for itself, it assumes the existing UEFI boot entry IDs are consecutive. Consequence: The ISO crashes in the UEFI firmware when booting on systems with non-consecutive boot entry IDs. Fix: The RHCOS live ISO no longer adds a UEFI boot entry for itself. Result: The ISO boots successfully.
Clone Of:
: 2006962
Environment:
Last Closed: 2022-03-10 16:12:32 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github coreos coreos-assembler pull 2435 0 None Merged buildextend-live: drop shim fallback.efi from ISO; simplify EFI image 2021-09-22 18:02:56 UTC
Red Hat Product Errata RHSA-2022:0056 0 None None None 2022-03-10 16:13:04 UTC

Description Jatan Malde 2021-09-22 08:04:46 UTC
---

OCP Version at Install Time: 4.8
RHCOS Version at Install Time: 4.8
Platform: baremetal
Architecture: x86_64 HP DL360 Gen 9

What are you trying to do? What is your use case?


Booting an HP DL360 Gen 9 machine from the RHCOS 4.8.2 ISO fails before stage1 of booting:

  OS boot failure "x64 Exception Type 06 - Invalid Opcode Exception" (screenshot attached)

Attaching screenshots of the machine's hardware. Booting the CoreOS 4.7.13 ISO or the RHEL 8.4 ISO works fine and the machine boots up properly, but booting the RHCOS 4.8 ISO fails.

The message has been discussed in the HPE community forum (link below). The customer switched from the HTML5 console to the Java console but still sees the issue.

 https://community.hpe.com/t5/ProLiant-Servers-ML-DL-SL/DL360-Gen9-ESXi/m-p/7124546/highlight/true#M173951

What happened? What went wrong or what did you expect?

To reproduce:
1. Take an HP DL360 Gen9 machine and boot it from the RHCOS 4.8.2 ISO.

Attaching the screenshots of machine details and a short video of the error.

Comment 2 Timothée Ravier 2021-09-22 09:24:43 UTC
We recently fixed some booting issues with the LiveISO so that may be it. Can you test with the latest image from https://releases-rhcos-art.cloud.privileged.psi.redhat.com/?stream=releases/rhcos-4.8 ?

Comment 5 Benjamin Gilbert 2021-09-22 17:04:16 UTC
There are two recent fixes to EFI boot; one of the bugs affected 4.7 and one didn't.  Since the 4.7 ISO is working, it looks like the problem is in fallback.efi, which has now been removed from the ISO to fix bug 2004677.

Comment 6 Benjamin Gilbert 2021-09-22 17:05:44 UTC
(A private comment confirmed that a recent build of the 4.8 ISO is working correctly.)

Comment 8 RHCOS Bug Bot 2021-09-22 18:15:19 UTC
This bug has been reported fixed in a new RHCOS build.  Do not move this bug to MODIFIED until the fix has landed in a new bootimage.

Comment 12 Michael Nguyen 2021-10-12 13:42:35 UTC
@bglibert

This is the output I get from the latest installer.  The fallback.efi is removed but it looks different from the 4.9 installer (missing the EFI/BOOT/redhat, see 
https://bugzilla.redhat.com/show_bug.cgi?id=2006962#c4).  If this is expected, I can move this to Verified: Tested.

$ sudo mount -o loop,ro rhcos-410.84.202110081440-0-live.x86_64.iso  /mnt/iso
$ sudo mount -o loop,ro /mnt/iso/images/efiboot.img /mnt/efi
$ tree /mnt/efi
/mnt/efi
└── EFI
    └── BOOT
        ├── BOOTX64.EFI
        ├── fonts
        ├── grub.cfg
        ├── grubx64.efi
        └── mmx64.efi

3 directories, 4 files
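
For repeated verification, the manual check above could be scripted. This is only a minimal sketch, assuming the EFI image is already mounted (as with /mnt/efi above). The function name check_fallback is hypothetical, and fbx64.efi is matched as an assumption because shim commonly installs its fallback loader under that name on x86_64; this bug itself only mentions fallback.efi.

```shell
#!/bin/sh
# Hypothetical helper: report whether a mounted EFI image still ships
# the shim fallback loader.
# Usage: check_fallback /mnt/efi
check_fallback() {
    efi_root=$1
    # FAT is case-insensitive, so match names regardless of case.
    # 'fb*.efi' (e.g. fbx64.efi) is an assumed alternate name for the
    # fallback loader; it is not taken from this bug report.
    found=$(find "$efi_root" -type f \
        \( -iname 'fallback.efi' -o -iname 'fb*.efi' \) 2>/dev/null)
    if [ -n "$found" ]; then
        echo "fallback loader present: $found"
        return 1
    fi
    echo "fallback loader absent"
    return 0
}
```

Running `check_fallback /mnt/efi` against the tree shown above should print "fallback loader absent" and return 0, since none of the listed files match either pattern.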

Comment 13 Benjamin Gilbert 2021-10-12 14:51:02 UTC
Yup, that's correct.  The 4.10 PR did some additional simplification that wasn't included in the backports.

Comment 14 RHCOS Bug Bot 2021-10-20 17:53:45 UTC
The fix for this bug has landed in a bootimage bump, as tracked in bug 2004596 (now in status MODIFIED).  Moving this bug to MODIFIED.

Comment 16 Michael Nguyen 2021-10-25 14:19:23 UTC
Moving to VERIFIED now that the boot image bump at bug 2004596 has been verified.

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-10-25-062528   True        False         9m36s   Cluster version is 4.10.0-0.nightly-2021-10-25-062528

Comment 19 errata-xmlrpc 2022-03-10 16:12:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

