Created attachment 1829804
virtualmediaboot-2021-10-06_14.16.34.mkv

Description of problem:
SNO deployment on HPE e910 blade fails because the node does not boot from virtualmedia and falls back to booting from previous installation on the internal drive.

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-10-05-004711

How reproducible:
100%

Steps to Reproduce:
1. Deploy SNO node by following the ZTP procedure

Actual results:
When the node defined in the BMH object first boots it does not boot from the ISO image attached via Virtual Media.

Expected results:
When the node defined in the BMH object first boots it boots from the ISO image attached via Virtual Media.

Additional info:
Attaching must-gather and video recording of the boot process.
Looks like this entry in your boot table was causing the instruction to boot from CD to do the wrong thing: "PciRoot(0x0)/Pci(0x1C,0x4)/Pci(0x0,0x4)/USB(0x1,0x0)/CDROM(0x1)/\\EFI\\redhat\\shimx64.efi". After removing it, the host booted correctly from the ISO. I've hit this before: https://storyboard.openstack.org/#!/story/2008763 I'm not sure what to do about it; Ironic can't clean out the entry if it can't boot IPA to clean it. I believe RHCOS created this entry when it boots from CD.
The installation process moves forward after manually removing the boot entry so lowering the severity.
This can happen when booting an image other than RHCOS; consistently booting the RHCOS image will not result in this problem. This should be a doc change to indicate that if there are any such entries in the boot table (see comment #4), they should be removed manually.
Happened on a Dell machine as well used only for OCP deployments:

workaround:

efibootmgr -v | grep shimx64.efi | grep CDROM
Boot000A* Red Hat Enterprise Linux PciRoot(0x0)/Pci(0x14,0x0)/USB(13,0)/USB(0,0)/USB(2,0)/Unit(0)/CDROM(1,0x221,0xd2a)/File(\EFI\redhat\shimx64.efi)

efibootmgr -B -b 000A
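For anyone who wants to script the workaround above, here is a minimal Python sketch. It only parses text in the shape of `efibootmgr -v` output and prints the matching delete commands for review; it does not run anything. The helper name `stale_cdrom_shim_entries`, the regex, and the `BootCurrent`/`BootOrder` lines in the sample are illustrative assumptions; the two `Boot…` entries are copied from this bug.

```python
import re
from typing import List

# Hypothetical helper: matches "BootXXXX* <label and device path>" lines only.
# "BootCurrent:" and "BootOrder:" do not match because XXXX must be 4 hex digits.
ENTRY_RE = re.compile(r"^Boot([0-9A-Fa-f]{4})\*?\s+(.*)$")


def stale_cdrom_shim_entries(efibootmgr_output: str) -> List[str]:
    """Return boot numbers whose entry names both a CDROM node and shimx64.efi."""
    stale = []
    for line in efibootmgr_output.splitlines():
        match = ENTRY_RE.match(line.strip())
        if match is None:
            continue
        bootnum, rest = match.groups()
        if "CDROM(" in rest and "shimx64.efi" in rest:
            stale.append(bootnum)
    return stale


# Sample text; the header lines are made up, the entries are quoted from this bug.
SAMPLE = r"""BootCurrent: 0005
BootOrder: 0005,000A
Boot0005* Red Hat Enterprise Linux HD(2,GPT,1e8869d4-1225-4915-866c-9e18550a9a72,0x1000,0x3f800)/File(\EFI\redhat\shimx64.efi)
Boot000A* Red Hat Enterprise Linux PciRoot(0x0)/Pci(0x14,0x0)/USB(13,0)/USB(0,0)/USB(2,0)/Unit(0)/CDROM(1,0x221,0xd2a)/File(\EFI\redhat\shimx64.efi)
"""

for bootnum in stale_cdrom_shim_entries(SAMPLE):
    # Print the delete command for review instead of executing it.
    print(f"efibootmgr -B -b {bootnum}")
```

The disk-backed `HD(...)` shim entry is deliberately left alone; only entries that combine a `CDROM(` node with `shimx64.efi` are reported.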
I believe this may be the same issue we are also tracking/investigating under BZ 1978314
Adding another data point: the `CDROM(1,0x221,0xd2a)/File(\EFI\redhat\shimx64.efi)` boot entry gets created when deploying OCP 4.8 but is not created when deploying OCP 4.9, so this issue manifests when first deploying OCP 4.8 and then deploying OCP 4.9 on the same machine via the ZTP process.
*** Bug 1978314 has been marked as a duplicate of this bug. ***
Testing with the latest 4.9 install, which includes backport of a fix for BZ 2004449 (https://github.com/coreos/coreos-assembler/pull/2436), did not experience this issue.

Specifics of the test:

Starting with a clean UEFI boot order (deleted all entries with efibootmgr -B -b 000x)

# efibootmgr -v
BootCurrent: 0002
No BootOrder is set; firmware will attempt recovery
MirroredPercentageAbove4G: 0.00
MirrorMemoryBelow4GB: false

# OCP 4.9.10 installed (first install)

# efibootmgr -v
BootCurrent: 0005
BootOrder: 0005,0000,0001,0003,0004
Boot0000* Virtual Floppy PciRoot(0x0)/Pci(0x14,0x0)/USB(13,0)/USB(0,0)/USB(2,0)/Unit(1)
Boot0001* Virtual CD PciRoot(0x0)/Pci(0x14,0x0)/USB(13,0)/USB(0,0)/USB(2,0)/Unit(0)
Boot0003* Integrated NIC 1 Port 1 Partition 1 VenHw(3a191845-5f86-4e78-8fce-c4cff59f9daa)
Boot0004* Integrated NIC 1 Port 3 Partition 1 VenHw(d227c733-f75f-4341-b749-4d1759ec8538)
Boot0005* Red Hat Enterprise Linux HD(2,GPT,1e8869d4-1225-4915-866c-9e18550a9a72,0x1000,0x3f800)/File(\EFI\redhat\shimx64.efi)
MirroredPercentageAbove4G: 0.00
MirrorMemoryBelow4GB: false

# uname -a
Linux cnfocto1.ptp.lab.eng.bos.redhat.com 4.18.0-305.28.1.el8_4.x86_64 #1 SMP Mon Nov 8 07:45:47 EST 2021 x86_64 x86_64 x86_64 GNU/Linux

# egrep Core /etc/motd
Red Hat Enterprise Linux CoreOS 49.84.202111292103-0

Second install succeeded. System booted directly into the boot ISO.
# efibootmgr -v
BootCurrent: 0002
BootOrder: 0002,0000,0001,0003,0004
Boot0000* Virtual Floppy PciRoot(0x0)/Pci(0x14,0x0)/USB(13,0)/USB(0,0)/USB(2,0)/Unit(1)
Boot0001* Virtual CD PciRoot(0x0)/Pci(0x14,0x0)/USB(13,0)/USB(0,0)/USB(2,0)/Unit(0)
Boot0002* Red Hat Enterprise Linux HD(2,GPT,1e8869d4-1225-4915-866c-9e18550a9a72,0x1000,0x3f800)/File(\EFI\redhat\shimx64.efi)
Boot0003* Integrated NIC 1 Port 1 Partition 1 VenHw(3a191845-5f86-4e78-8fce-c4cff59f9daa)
Boot0004* Integrated NIC 1 Port 3 Partition 1 VenHw(d227c733-f75f-4341-b749-4d1759ec8538)
MirroredPercentageAbove4G: 0.00
MirrorMemoryBelow4GB: false

# uname -a
Linux cnfocto1.ptp.lab.eng.bos.redhat.com 4.18.0-305.28.1.el8_4.x86_64 #1 SMP Mon Nov 8 07:45:47 EST 2021 x86_64 x86_64 x86_64 GNU/Linux

# egrep Core /etc/motd
Red Hat Enterprise Linux CoreOS 49.84.202111292103-0
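The check done by eye on the output above (the first `BootOrder` entry is the disk-backed shim, and no entry hangs off a `CDROM` node) could be automated along these lines. This is a rough Python sketch, not an existing tool: `parse_boot_table` and `first_boot_entry` are hypothetical names, and the parsing assumes the `efibootmgr -v` text layout shown in this comment.

```python
import re
from typing import Dict, List, Tuple


def parse_boot_table(output: str) -> Tuple[List[str], Dict[str, str]]:
    """Split efibootmgr -v text into (boot order, {boot number: label and path})."""
    order: List[str] = []
    entries: Dict[str, str] = {}
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("BootOrder:"):
            # "BootOrder: 0002,0000" -> ["0002", "0000"]
            order = [n.strip() for n in line.split(":", 1)[1].split(",") if n.strip()]
            continue
        match = re.match(r"^Boot([0-9A-Fa-f]{4})\*?\s+(.*)$", line)
        if match:
            entries[match.group(1)] = match.group(2)
    return order, entries


def first_boot_entry(output: str) -> str:
    """Return the description of the first BootOrder entry, or '' if none is set."""
    order, entries = parse_boot_table(output)
    if not order:
        return ""
    return entries.get(order[0], "")


# Output of the second install, copied from the comment above (trimmed).
SECOND_INSTALL = r"""BootCurrent: 0002
BootOrder: 0002,0000,0001,0003,0004
Boot0000* Virtual Floppy PciRoot(0x0)/Pci(0x14,0x0)/USB(13,0)/USB(0,0)/USB(2,0)/Unit(1)
Boot0001* Virtual CD PciRoot(0x0)/Pci(0x14,0x0)/USB(13,0)/USB(0,0)/USB(2,0)/Unit(0)
Boot0002* Red Hat Enterprise Linux HD(2,GPT,1e8869d4-1225-4915-866c-9e18550a9a72,0x1000,0x3f800)/File(\EFI\redhat\shimx64.efi)
"""

order, entries = parse_boot_table(SECOND_INSTALL)
print("first boot entry:", first_boot_entry(SECOND_INSTALL))
print("CDROM-backed entries:", [n for n, d in entries.items() if "CDROM(" in d])
```

With a clean table, the first entry starts with `Red Hat Enterprise Linux HD(` and the CDROM list is empty; the "No BootOrder is set" case from the clean-slate output yields an empty string.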
Ian - thanks, that's great news. Marius - similar to what Ian did, could you clear out the entries that are causing the problem and try it with the recent version? If that works we could probably close this as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2004449.
(In reply to Bob Fournier from comment #29)
> Ian - thanks, that's great news.
>
> Marius - similar to what Ian did, could you clear out the entries that are
> causing the problem and try it with the recent version? If that works we
> could probably close this as a duplicate of
> https://bugzilla.redhat.com/show_bug.cgi?id=2004449.

The issue didn't reproduce with rhcos-49.84.202111292103-0 but an older image is still referenced in openshift-installer, assisted-service and openshift dependencies so I believe these need to be updated with the more recent rhcos build so we can consume the fix:

openshift-installer: https://github.com/openshift/installer/blob/release-4.9/data/data/rhcos.json#L118-L121
assisted-service: https://github.com/openshift/assisted-service/blob/master/data/default_os_images.json#L26-L28
dependencies: https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.9/4.9.0/
(In reply to Marius Cornea from comment #30)
> (In reply to Bob Fournier from comment #29)
> > Ian - thanks, that's great news.
> >
> > Marius - similar to what Ian did, could you clear out the entries that are
> > causing the problem and try it with the recent version? If that works we
> > could probably close this as a duplicate of
> > https://bugzilla.redhat.com/show_bug.cgi?id=2004449.
>
> The issue didn't reproduce with rhcos-49.84.202111292103-0 but an older
> image is still referenced in openshift-installer, assisted-service and
> openshift dependencies so I believe these need to be updated with the more
> recent rhcos build so we can consume the fix:
>
> openshift-installer:
> https://github.com/openshift/installer/blob/release-4.9/data/data/rhcos.json#L118-L121
> assisted-service:
> https://github.com/openshift/assisted-service/blob/master/data/default_os_images.json#L26-L28
> dependencies:
> https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.9/4.9.0/

Marius - great, thanks for checking that. I changed the Product+Component to reflect that the build needs to be updated and that this isn't a shim issue.
From a brief perusal of this BZ, it looks like the action that needs to be taken is to update the RHCOS version in the stream data. I am sending this BZ to the CoreOS team as they own that.
(In reply to Marius Cornea from comment #30)
> (In reply to Bob Fournier from comment #29)
> > Ian - thanks, that's great news.
> >
> > Marius - similar to what Ian did, could you clear out the entries that are
> > causing the problem and try it with the recent version? If that works we
> > could probably close this as a duplicate of
> > https://bugzilla.redhat.com/show_bug.cgi?id=2004449.
>
> The issue didn't reproduce with rhcos-49.84.202111292103-0 but an older
> image is still referenced in openshift-installer, assisted-service and
> openshift dependencies so I believe these need to be updated with the more
> recent rhcos build so we can consume the fix:
>
> openshift-installer:
> https://github.com/openshift/installer/blob/release-4.9/data/data/rhcos.json#L118-L121
> assisted-service:
> https://github.com/openshift/assisted-service/blob/master/data/default_os_images.json#L26-L28
> dependencies:
> https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.9/4.9.0/

I'm a bit confused here...we fixed BZ#2004449 with a change to `coreos-assembler` (https://github.com/coreos/coreos-assembler/pull/2435) first by dropping the `shim` fallback.efi from the live ISO. New bootimages were generated and then used as part of the update to `openshift-install` here https://github.com/openshift/installer/pull/5231. (And then updated again in https://github.com/openshift/installer/pull/5279 to use 49.84.202110081407-0)

If this problem is the same as what is reported in BZ#2004449, then no further changes should be needed. It's not clear to me if the images referenced in `openshift-install` are still experiencing a problem and if additional changes are needed.

@Marius could you please try to reproduce this issue with the latest 4.9 images found in https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.9/4.9.0/?
If the issue persists, then we need to understand what changed between 49.84.202110081407-0 (latest image referenced in openshift-installer) and 49.84.202111292103-0 (image reported fixed in comment #30)
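To make the ordering of the two builds mentioned above explicit, here is a small Python sketch. It rests on an assumption not stated in this bug: that the third dot-separated field of an RHCOS build ID (before the `-N` suffix) is a `YYYYMMDDHHMM` build timestamp. The function names are hypothetical.

```python
from datetime import datetime


def rhcos_build_time(build_id: str) -> datetime:
    """Parse the timestamp field of an ID such as '49.84.202111292103-0'.

    Assumption: the third dot-separated field, before the '-N' suffix,
    is a YYYYMMDDHHMM build timestamp.
    """
    stamp = build_id.split(".")[2].split("-")[0]
    return datetime.strptime(stamp, "%Y%m%d%H%M")


def newer_build(a: str, b: str) -> str:
    """Return whichever build ID carries the later timestamp."""
    return a if rhcos_build_time(a) >= rhcos_build_time(b) else b


installer_pin = "49.84.202110081407-0"   # latest image referenced in openshift-installer
verified_fixed = "49.84.202111292103-0"  # image reported fixed in comment #30

print("newer build:", newer_build(installer_pin, verified_fixed))
print("gap:", rhcos_build_time(verified_fixed) - rhcos_build_time(installer_pin))
```

Under that assumption, the build pinned in openshift-installer predates the one verified in comment #30, so any difference in behavior would have to come from changes landed between the two timestamps.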
I was not able to reproduce the issue when using the latest 4.9 images found in https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.9/4.9.0/ so closing this BZ