Bug 1835984 - [telco] Openshift 4.4.3 boot fails with UEFI mode
Summary: [telco] Openshift 4.4.3 boot fails with UEFI mode
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Julia Kreger
QA Contact: Raviv Bar-Tal
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-05-14 20:39 UTC by emahoney
Modified: 2023-10-06 20:02 UTC
CC: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-06-30 16:16:40 UTC
Target Upstream Version:
Embargoed:


Attachments

Description emahoney 2020-05-14 20:39:54 UTC
Description of problem:
When deploying version 4.4.3 in UEFI mode over PXE, the boot fails with "unable to connect to TFTP server" or errors out with an unknown network error, whereas in BIOS mode it picks up the image without issues.

While monitoring the console I observed something odd: it reports "NBP file downloaded successfully" on Integrated NIC 1 and then fails. It keeps retrying, and after roughly 10 attempts it picks up the image. Please see the attached screenshot.

Version-Release number of the following components:
rpm -q openshift-ansible
rpm -q ansible
ansible --version

How reproducible:
Every time

Steps to Reproduce:
1. N/a
2.
3.

Actual results:
UEFI boot fails

Expected results:
Boot succeeds

Additional info:
Please see screenshots and logs.

Comment 8 Amit Ugol 2020-05-17 17:04:37 UTC
Hi,
What is the server's brand and model?

Comment 9 Stephen Benjamin 2020-05-18 00:48:49 UTC
This sounds similar to BZ1830298. The fix for that BZ switches to an entirely different iPXE build (snponly.efi), which uses the firmware's built-in UEFI network stack instead of iPXE's own drivers.

If you do see this problem again with snponly.efi, please re-open this bug and provide more details about the hardware being used.
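
For context, a minimal sketch of the mechanism: a dnsmasq-based PXE service can tag clients by the DHCP client-arch option and hand UEFI machines snponly.efi instead of ipxe.efi. The config path, tag names, and file names below are illustrative assumptions, not the actual metal3/Ironic configuration.

~~~~
# Illustrative only: serving snponly.efi to UEFI PXE clients via dnsmasq.
# Paths and tag names are assumptions, not OpenShift's shipped config.
cat > /etc/dnsmasq.d/pxe-arch.conf <<'EOF'
enable-tftp
tftp-root=/var/lib/tftpboot

# Tag clients by the DHCP architecture option (option 93).
dhcp-match=set:efi-x86_64,option:client-arch,7
dhcp-match=set:efi-x86_64,option:client-arch,9
dhcp-match=set:bios,option:client-arch,0

# UEFI clients get the SNP build, which drives the NIC through the
# firmware's own network stack; BIOS clients keep the legacy build.
dhcp-boot=tag:efi-x86_64,snponly.efi
dhcp-boot=tag:bios,undionly.kpxe
EOF
~~~~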

*** This bug has been marked as a duplicate of bug 1830298 ***

Comment 10 Stephen Benjamin 2020-05-18 00:56:02 UTC
Hm, actually, looking more closely at the screenshots, they show a few different errors. It's worth retrying with the new firmware, but in some screenshots the failure happens before you're even in iPXE: ipxe.efi isn't loaded yet, and things fail much earlier.

Later screenshots show "Failed to alloc highmem for files." I'll leave this open for the Ironic folks to take a look.

More details about the hardware, including the types of NICs, would be helpful, as Amit asked.

Comment 11 Amit Ugol 2020-05-18 06:06:53 UTC
(In reply to Stephen Benjamin from comment #10)
> Hm, actually, looking more closely at the screenshots, they show a few
> different errors. It's worth retrying with the new firmware, but in some
> screenshots the failure happens before you're even in iPXE: ipxe.efi isn't
> loaded yet, and things fail much earlier.
> 
> Later screenshots show "Failed to alloc highmem for files." I'll leave this
> open for the Ironic folks to take a look.
> 
> More details about the hardware, including the types of NICs, would be
> helpful, as Amit asked.

So, adding another question: hardware brand, model, and firmware version, please.

Comment 12 Julia Kreger 2020-05-19 16:31:31 UTC
Specifically, we will also need the firmware and hardware versions of the network devices as reported by the BMC.
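
For anyone gathering this, a hedged example of pulling NIC inventory from a Redfish-capable BMC; the host, credentials, and exact resource paths below are placeholders and vary by vendor:

~~~~
# Placeholder host/credentials; Redfish resource paths differ by vendor.
# List the chassis, then walk to its NetworkAdapters collection, which
# carries controller firmware versions on most implementations.
curl -sk -u admin:password https://<bmc-address>/redfish/v1/Chassis/ \
  | python3 -m json.tool
curl -sk -u admin:password \
  https://<bmc-address>/redfish/v1/Chassis/<chassis-id>/NetworkAdapters \
  | python3 -m json.tool
~~~~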

Comment 13 emahoney 2020-05-19 17:25:24 UTC
Here is the latest info from the support case:

~~~~
I was able to reproduce this in a VZ environment with a Dell PowerEdge R640 using BIOS 2.4.8 and iDRAC firmware 4.10.10.10.
The provisioning NIC is internal: a Broadcom Adv. Dual 25Gb Ethernet, controller BIOS version 214.4.32.0, EFI version 214.0.275.0.

In the cluster that was failing with UEFI, I had success with snponly.efi instead of ipxe.efi, but I need to test more.
~~~~
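
A quick hedged spot-check for this kind of workaround: confirm over TFTP which NBP the provisioning host actually serves. The address below is a placeholder for the provisioning network IP.

~~~~
# Placeholder address; substitute your provisioning network IP.
curl -s tftp://172.22.0.1/snponly.efi -o /tmp/nbp.efi
file /tmp/nbp.efi
# Expected output is roughly:
#   /tmp/nbp.efi: PE32+ executable (EFI application) x86-64, for MS Windows
~~~~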

Comment 16 Honza Pokorny 2020-05-26 16:25:30 UTC
Is the new firmware working for you?

Comment 17 Angus Thomas 2020-05-28 11:54:24 UTC
Does the comment above, "In the cluster that was failing with UEFI, I had success with snponly.efi instead of ipxe.efi", mean that this bug can now be closed?
It could, of course, be reopened if a reproducible problem were to occur with snponly.efi.

Comment 19 Julia Kreger 2020-05-28 14:48:09 UTC
That does seem to be the case. I believe we fixed this in our latest 4.4 release, so all we need at this point is confirmation from the filer that it works as expected.

Comment 24 Red Hat Bugzilla 2023-09-14 06:00:09 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

