Bug 1905460 - Deploy using virtualmedia for disabled provisioning network on real BM(HPE) fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.8.0
Assignee: Derek Higgins
QA Contact: Lubov
URL:
Whiteboard:
Depends On:
Blocks: 1893648
Reported: 2020-12-08 12:10 UTC by Lubov
Modified: 2021-07-27 22:35 UTC (History)
CC List: 10 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 22:34:40 UTC
Target Upstream Version:
Embargoed:


Attachments
ironic logs (207.23 KB, application/gzip)
2020-12-08 12:10 UTC, Lubov
Dell console (61.27 KB, image/png)
2020-12-27 08:25 UTC, Lubov


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 22:35:11 UTC

Description Lubov 2020-12-08 12:10:50 UTC
Created attachment 1737584
ironic logs

Version:

$ ./openshift-baremetal-install version
./openshift-baremetal-install 4.7.0-0.nightly-2020-12-04-013308
built from commit b9701c56ece235c8a988530816aac84980a91bdd
release image registry.svc.ci.openshift.org/ocp/release@sha256:2352dfe2655dcc891e3c09b4c260b9e346e930ee4dcdc96c6a7fd003860ef100

Platform:
IPI baremetal

What happened?
Deploying with virtual media, with the provisioning network disabled, on a real bare-metal environment consistently fails with:

Error: could not inspect: could not inspect node, node is currently 'inspect failed', last error was 'timeout reached while inspecting the node'

The same problem is reported in ironic-inspector.log:
ERROR ironic_inspector.node_cache [-] Introspection for nodes ['e4ce855c-3d04-4075-bf05-395fc85e6c8a', 'd421a87c-5d79-4fd5-a67a-ece35188b1c3', '6f262b39-01c5-4215-8105-fe20b0879c87'] has timed out

On the node consoles, the machines restart and drop to the emergency shell.

What did you expect to happen?
Deployment should succeed

How to reproduce it (as minimally and precisely as possible)?
Deploy using virtual media with the provisioning network disabled on real bare-metal hardware.

Anything else we need to know?
Attaching inspector and conductor logs

Comment 1 Derek Higgins 2020-12-08 12:32:40 UTC
The hardware in question is

ProLiant DL380 Gen10
System ROM U30 v2.22 (11/13/2019)
iLO Firmware Version 2.12 Jan 17 2020

After spending some time on this environment, I can confirm that the virtual media is being attached to the node, but no attempt is made to boot from it.
I can force the node to boot IPA by manually moving "iLO Virtual USB 3 : iLO Virtual CD-ROM" to the top of the Server Boot Order,
but I don't see the same option in the One-Time Boot menu...

Ironic has set the one-time boot device to cdrom:
2020-12-08 11:17:58.005 1 DEBUG ironic.drivers.modules.redfish.boot [req-a20f7efb-c483-4400-99f2-7e40306a8e38 bootstrap-user - - - -] Node fd2218f0-639f-468d-8e0f-0c5aa7f72f20 is set to one time boot from cdrom 

The node has the following Redfish representation:
'BootSourceOverrideEnabled': 'Once', 'BootSourceOverrideMode': 'UEFI', 'BootSourceOverrideTarget': 'Cd', 'BootSourceOverrideTarget': ['None', 'Cd', 'Hdd', 'Usb', 'SDCard', 'Utilities', 'Diags', 'BiosSetup', 'Pxe', 'UefiShell', 'UefiHttp', 'UefiTarget'], 'UefiTargetBootSourceOverride': 'None'
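
The representation above can be sanity-checked with a small sketch like the following. It only confirms that the requested override is valid from the Redfish point of view; it says nothing about which physical or virtual device the BMC maps "Cd" to. The function name `check_boot_override` is hypothetical, and the sketch assumes the list of targets is exposed under the standard `BootSourceOverrideTarget@Redfish.AllowableValues` annotation (the pasted output shows that list under the plain `BootSourceOverrideTarget` key).

```python
# Assumption: the allowable targets live under the standard Redfish annotation key.
ALLOWED_KEY = "BootSourceOverrideTarget@Redfish.AllowableValues"


def check_boot_override(boot):
    """Summarise the override settings found in a Redfish 'Boot' object."""
    target = boot.get("BootSourceOverrideTarget")
    allowed = boot.get(ALLOWED_KEY, [])
    return {
        "one_time": boot.get("BootSourceOverrideEnabled") == "Once",
        "target": target,
        "target_allowed": target in allowed,
    }


# Values copied from the representation in this comment.
boot = {
    "BootSourceOverrideEnabled": "Once",
    "BootSourceOverrideMode": "UEFI",
    "BootSourceOverrideTarget": "Cd",
    ALLOWED_KEY: ["None", "Cd", "Hdd", "Usb", "SDCard", "Utilities", "Diags",
                  "BiosSetup", "Pxe", "UefiShell", "UefiHttp", "UefiTarget"],
    "UefiTargetBootSourceOverride": "None",
}

result = check_boot_override(boot)
print(result)
```

For the values above, the check reports a one-time override to an allowed target, which is consistent with Ironic having done its part and the node still not booting from the virtual media.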

If newer firmware is available for this hardware, can you retry after updating it?

Comment 2 Lubov 2020-12-08 20:48:20 UTC
Upgraded to
iLO 5 Firmware Version 2.31 Oct 13 2020
System ROM U30 v2.40 (10/26/2020)

This step passed

Comment 3 Lubov 2020-12-09 11:18:12 UTC
As Derek predicted, the second attempt (on a disk that was not clean) fails with the same error. Reopening the bug.

Comment 4 Derek Higgins 2020-12-10 11:34:37 UTC
As far as I can see, the "Cd" in the list of override targets is not the virtual CD:

    "BootSourceOverrideTarget": [
      "None",
      "Cd",
      "Hdd",
      "Usb",
      "SDCard",
      "Utilities",
      "Diags",
      "BiosSetup",
      "Pxe",
      "UefiShell",
      "UefiHttp",
      "UefiTarget"
    ],

I've tried a number of these options, and the only one that booted from the virtual CD was UefiTarget, set with:

curl -k -X PATCH  -u XXX:XXX -H "Content-Type: application/json" -H 'OData-Version: 4.0' https://10.46.61.16/redfish/v1/Systems/1 -d '{"Boot": {"UefiTargetBootSourceOverride": "PciRoot(0x0)/Pci(0x1C,0x4)/Pci(0x0,0x4)/USB(0x1,0x0)"}}'
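
For scripting the same workaround, here is a rough Python equivalent of the curl command above, using only the standard library. It is a sketch, not what Ironic does: the helper name `build_uefi_target_patch` is hypothetical, the request is only built and not sent, and the credentials are the same placeholders as in the curl command.

```python
import base64
import json
import urllib.request


def build_uefi_target_patch(host, device_path, user, password, system_id="1"):
    """Build (but do not send) the PATCH request issued by the curl command."""
    url = f"https://{host}/redfish/v1/Systems/{system_id}"
    body = json.dumps({"Boot": {"UefiTargetBootSourceOverride": device_path}})
    # HTTP basic auth, equivalent to curl's -u user:password
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body.encode(),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "OData-Version": "4.0",
            "Authorization": f"Basic {token}",
        },
    )


req = build_uefi_target_patch(
    "10.46.61.16",
    "PciRoot(0x0)/Pci(0x1C,0x4)/Pci(0x0,0x4)/USB(0x1,0x0)",
    "XXX",  # placeholder user, as in the curl command
    "XXX",  # placeholder password, as in the curl command
)
print(req.get_method(), req.full_url)
print(req.data.decode())
```

To actually send it, pass `req` to `urllib.request.urlopen` together with an `ssl` context that skips certificate verification, which mirrors curl's `-k`. The device path is specific to this machine and would need to be looked up on any other system.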

Comment 5 Lubov 2020-12-17 13:52:32 UTC
It turned out there was a problem with the pre-configuration of the provisioning network.

Comment 6 Lubov 2020-12-17 14:20:25 UTC
I was wrong; it is still failing.

Comment 7 Lubov 2020-12-27 08:25:12 UTC
Created attachment 1742216
Dell console

Comment 16 Derek Higgins 2021-01-15 12:37:57 UTC
I've tried reproducing this on another ProLiant DL380 Gen10 with
System ROM  U30 v2.36 (07/16/2020)
iLO 5   2.18 Jun 22 2020    System Board

and it booted from vmedia as expected. I then upgraded it to
iLO 5   2.18 Jun 22 2020    System Board
System ROM  U30 v2.40 (10/26/2020)  System Board

and it again booted from vmedia as expected.
I'll look at the original system again to compare the two.

Comment 19 Derek Higgins 2021-04-01 13:10:56 UTC
This should be OK to retest; it looks like the problem was old entries in the boot manager:
https://storyboard.openstack.org/#!/story/2008763

Comment 20 Lubov 2021-04-04 14:19:09 UTC
Verified on 4.8.0-0.nightly-2021-04-03-092337.
Ran the deployment twice using redfish-virtualmedia with the provisioning network disabled; both attempts completed successfully.

Comment 23 errata-xmlrpc 2021-07-27 22:34:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

