Created attachment 1737584 [details]
ironic logs

Version:
$ ./openshift-baremetal-install version
./openshift-baremetal-install 4.7.0-0.nightly-2020-12-04-013308
built from commit b9701c56ece235c8a988530816aac84980a91bdd
release image registry.svc.ci.openshift.org/ocp/release@sha256:2352dfe2655dcc891e3c09b4c260b9e346e930ee4dcdc96c6a7fd003860ef100

Platform:
IPI baremetal

What happened?
Deploying with virtual media and the provisioning network disabled on a real bare-metal environment consistently fails with:
Error: could not inspect: could not inspect node, node is currently 'inspect failed', last error was 'timeout reached while inspecting the node'

The same problem is reported in ironic-inspector.log:
ERROR ironic_inspector.node_cache [-] Introspection for nodes ['e4ce855c-3d04-4075-bf05-395fc85e6c8a', 'd421a87c-5d79-4fd5-a67a-ece35188b1c3', '6f262b39-01c5-4215-8105-fe20b0879c87'] has timed out

On the node console, the machines restart and drop to the emergency shell.

What did you expect to happen?
The deployment should succeed.

How to reproduce it (as minimally and precisely as possible)?
Deploy using virtual media with the provisioning network disabled on real bare metal.

Anything else we need to know?
Attaching inspector and conductor logs.
The hardware in question is
ProLiant DL380 Gen10
System ROM U30 v2.22 (11/13/2019)
iLO Firmware Version 2.12 Jan 17 2020

After spending some time on this environment, I can confirm that the virtual media is getting attached to the node, but no attempt is being made to boot from it. I can force the node to boot from IPA by manually setting the Server Boot Order to have "iLO Virtual USB 3 : iLO Virtual CD-ROM" at the top, but I don't see the same option in the One-Time Boot menu...

Ironic has set the one-time boot to cdrom:
2020-12-08 11:17:58.005 1 DEBUG ironic.drivers.modules.redfish.boot [req-a20f7efb-c483-4400-99f2-7e40306a8e38 bootstrap-user - - - -] Node fd2218f0-639f-468d-8e0f-0c5aa7f72f20 is set to one time boot from cdrom

And it has the Redfish representation:
'BootSourceOverrideEnabled': 'Once',
'BootSourceOverrideMode': 'UEFI',
'BootSourceOverrideTarget': 'Cd',
'BootSourceOverrideTarget': ['None', 'Cd', 'Hdd', 'Usb', 'SDCard', 'Utilities', 'Diags', 'BiosSetup', 'Pxe', 'UefiShell', 'UefiHttp', 'UefiTarget'],
'UefiTargetBootSourceOverride': 'None'

If newer firmware is available for this hardware, can you try this again after updating it?
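For readers unfamiliar with the Redfish boot override, the representation quoted above corresponds roughly to the PATCH body sketched below. This is a minimal illustration, not Ironic's or sushy's actual code: `one_time_boot_body` and `ALLOWABLE_TARGETS` are hypothetical names, and the allowable values are simply copied from the log excerpt in this comment.

```python
# Allowable targets copied from the Redfish representation quoted in this comment.
ALLOWABLE_TARGETS = [
    "None", "Cd", "Hdd", "Usb", "SDCard", "Utilities",
    "Diags", "BiosSetup", "Pxe", "UefiShell", "UefiHttp", "UefiTarget",
]


def one_time_boot_body(target, mode="UEFI"):
    """Return a Redfish 'Boot' PATCH body for a one-time override.

    Hypothetical helper for illustration only; it rejects any target the
    BMC did not advertise as allowable.
    """
    if target not in ALLOWABLE_TARGETS:
        raise ValueError(f"{target!r} is not an allowable boot target")
    return {
        "Boot": {
            "BootSourceOverrideEnabled": "Once",
            "BootSourceOverrideMode": mode,
            "BootSourceOverrideTarget": target,
        }
    }


body = one_time_boot_body("Cd")
```

Note that a body like this being accepted by the BMC does not by itself guarantee that "Cd" maps to the iLO virtual CD-ROM, which is the behaviour under investigation here.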
Upgraded to
iLO 5 Firmware Version 2.31 Oct 13 2020
System ROM U30 v2.40 (10/26/2020)

This step passed.
As Derek predicted, the second attempt (on a hard disk that was not clean) fails with the same error. Reopening the bug.
As far as I can see, the "Cd" mentioned in the list of override targets is not the virtual CD:
"BootSourceOverrideTarget": [ "None", "Cd", "Hdd", "Usb", "SDCard", "Utilities", "Diags", "BiosSetup", "Pxe", "UefiShell", "UefiHttp", "UefiTarget" ],

I've tried a number of these options, and the only way I could get the node to boot from the virtual CD was to set it to UefiTarget with:
curl -k -X PATCH -u XXX:XXX -H "Content-Type: application/json" -H 'OData-Version: 4.0' https://10.46.61.16/redfish/v1/Systems/1 -d '{"Boot": {"UefiTargetBootSourceOverride": "PciRoot(0x0)/Pci(0x1C,0x4)/Pci(0x0,0x4)/USB(0x1,0x0)"}}'
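The curl call above can also be expressed with the Python standard library, which may be handier when scripting the workaround. The sketch below only builds the request and does not send it; `uefi_target_request` is a hypothetical helper name, and the URL, device path, and placeholder credentials are the ones from the command in this comment.

```python
import base64
import json
import urllib.request


def uefi_target_request(system_url, device_path, user, password):
    """Build (but do not send) a PATCH mirroring the curl command above."""
    # Same JSON body as the -d argument of the curl call.
    body = {"Boot": {"UefiTargetBootSourceOverride": device_path}}
    # Equivalent of curl's -u user:password (HTTP basic authentication).
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        system_url,
        data=json.dumps(body).encode(),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "OData-Version": "4.0",
            "Authorization": f"Basic {token}",
        },
    )


req = uefi_target_request(
    "https://10.46.61.16/redfish/v1/Systems/1",
    "PciRoot(0x0)/Pci(0x1C,0x4)/Pci(0x0,0x4)/USB(0x1,0x0)",
    "XXX",
    "XXX",
)
```

Actually sending it would require passing `req` to `urllib.request.urlopen` together with an `ssl.SSLContext` that has certificate verification disabled, which is what curl's `-k` does; that step is left out so the sketch runs without a BMC.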
It became clear that there was a problem with the pre-configuration of the provisioning network.
I was wrong; it is still failing.
Created attachment 1742216 [details] Dell console
I've tried reproducing this on another ProLiant DL380 Gen10 with
System ROM U30 v2.36 (07/16/2020)
iLO 5 2.18 Jun 22 2020 System Board
and it booted from vmedia as expected.

I then upgraded it to
iLO 5 2.18 Jun 22 2020 System Board
System ROM U30 v2.40 (10/26/2020) System Board
and it again booted from vmedia as expected.

I'll look again at the original system to compare the two.
This should be OK for retesting. It looks like the problem was old entries in the boot manager:
https://storyboard.openstack.org/#!/story/2008763
Verified on 4.8.0-0.nightly-2021-04-03-092337.
Ran the deployment twice using redfish-virtualmedia with the provisioning network disabled; both attempts completed successfully.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438