Bug 1879034

Summary: Ironic boot_mode capabilities can end up empty when using bootMode: UEFI on baremetal host
Product: OpenShift Container Platform
Reporter: Asher Shoshan <ashoshan>
Component: Installer
Assignee: Doug Hellmann <dhellmann>
Installer sub component: OpenShift on Bare Metal IPI
QA Contact: Lubov <lshilin>
Status: CLOSED ERRATA
Severity: high
Priority: high
CC: amalykhi, bnemec, dhellmann, rbartal, stbenjam, zbitter
Version: 4.6
Keywords: Triaged, UpcomingSprint
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Last Closed: 2020-10-27 16:41:13 UTC
Type: Bug

Description Asher Shoshan 2020-09-15 09:18:06 UTC
Description of problem:
Deployed a 4.6 cluster with "provisioningNetwork: Disabled". The control plane was created, but the worker nodes failed to provision.

Version-Release number of the following components:

How reproducible:

Steps to Reproduce:
1. Deploy a 4.6 cluster with "provisioningNetwork: Disabled" and redfish-virtualmedia in the BMC address (in install-config.yaml).

Actual results:
Workers are not provisioned and remain in the "inspecting" state.

Expected results:
workers to be provisioned

Additional info:
Excerpt from the metal3-ironic-conductor container log of the metal3 pod (namespace openshift-machine-api):

2020-09-15 09:14:02.019 1 ERROR ironic.common.images [req-6fe71de1-2f0c-4980-bbff-a29f3a8cac39 ironic-user - - - -] Creating the filesystem root failed.: FileNotFoundError: [Errno 2] No such file or directory: '/usr/lib/syslinux/isolinux.bin'
2020-09-15 09:14:02.019 1 ERROR ironic.common.images Traceback (most recent call last):
2020-09-15 09:14:02.019 1 ERROR ironic.common.images   File "/usr/lib/python3.6/site-packages/ironic/common/images.py", line 269, in create_isolinux_image_for_bios
2020-09-15 09:14:02.019 1 ERROR ironic.common.images     _create_root_fs(tmpdir, files_info)
2020-09-15 09:14:02.019 1 ERROR ironic.common.images   File "/usr/lib/python3.6/site-packages/ironic/common/images.py", line 65, in _create_root_fs
2020-09-15 09:14:02.019 1 ERROR ironic.common.images     shutil.copyfile(src_file, target_file)
2020-09-15 09:14:02.019 1 ERROR ironic.common.images   File "/usr/lib64/python3.6/shutil.py", line 120, in copyfile
2020-09-15 09:14:02.019 1 ERROR ironic.common.images     with open(src, 'rb') as fsrc:
2020-09-15 09:14:02.019 1 ERROR ironic.common.images FileNotFoundError: [Errno 2] No such file or directory: '/usr/lib/syslinux/isolinux.bin'
2020-09-15 09:14:02.019 1 ERROR ironic.common.images 
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector [req-6fe71de1-2f0c-4980-bbff-a29f3a8cac39 ironic-user - - - -] Unable to start managed inspection for node f7842608-7caa-42fe-81c2-4d5c8e8c9c80: Creating iso image failed: [Errno 2] No such file or directory: '/usr/lib/syslinux/isolinux.bin': ironic.common.exception.ImageCreationFailed: Creating iso image failed: [Errno 2] No such file or directory: '/usr/lib/syslinux/isolinux.bin'
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector Traceback (most recent call last):
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector   File "/usr/lib/python3.6/site-packages/ironic/common/images.py", line 269, in create_isolinux_image_for_bios
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector     _create_root_fs(tmpdir, files_info)
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector   File "/usr/lib/python3.6/site-packages/ironic/common/images.py", line 65, in _create_root_fs
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector     shutil.copyfile(src_file, target_file)
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector   File "/usr/lib64/python3.6/shutil.py", line 120, in copyfile
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector     with open(src, 'rb') as fsrc:
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector FileNotFoundError: [Errno 2] No such file or directory: '/usr/lib/syslinux/isolinux.bin'
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector 
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector During handling of the above exception, another exception occurred:
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector 
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector Traceback (most recent call last):
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector   File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/inspector.py", line 204, in _start_managed_inspection
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector     task.driver.boot.prepare_ramdisk(task, ramdisk_params=params)
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector   File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/redfish/boot.py", line 895, in prepare_ramdisk
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector     iso_ref = _prepare_deploy_iso(task, ramdisk_params, mode)
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector   File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/redfish/boot.py", line 644, in _prepare_deploy_iso
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector     return prepare_iso_image()
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector   File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/redfish/boot.py", line 562, in _prepare_iso_image
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector     base_iso=base_iso)
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector   File "/usr/lib/python3.6/site-packages/ironic/common/images.py", line 620, in create_boot_iso
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector     kernel_params=params, configdrive=configdrive_path)
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector   File "/usr/lib/python3.6/site-packages/ironic/common/images.py", line 273, in create_isolinux_image_for_bios
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector     raise exception.ImageCreationFailed(image_type='iso', error=e)
2020-09-15 09:14:02.389 1 ERROR ironic.drivers.modules.inspector ironic.common.exception.ImageCreationFailed: Creating iso image failed: [Errno 2] No such file or directory: '/usr/lib/syslinux/isolinux.bin'

Comment 4 Stephen Benjamin 2020-09-15 11:03:06 UTC
This isn't related to the disabled provisioning network. For some reason, baremetal-operator isn't setting the capabilities correctly to tell the host that we're doing a UEFI boot.

BIOS-based virtualmedia provisioning won't work due to https://bugzilla.redhat.com/show_bug.cgi?id=1862608#
 

Here's the properties field of the Ironic host:

$ curl -g -X GET --user ironic-user:XXXXXXXX http://X.X.X.X:6385/v1/nodes/c4ccee33-20c3-471e-bf97-b20a8c730633 -H "Accept: application/json" -H "Content-Type: application/json" -H "X-Openstack-Ironic-API-Version: 1.67"  | jq ".properties"
shows:
{
  "capabilities": ""
}

However, the BMH correctly shows bootMode: UEFI.


@Doug: could there be some kind of race here and somehow capabilities ends up empty?

Comment 5 Doug Hellmann 2020-09-15 19:56:11 UTC
It looks like the problem is that we only set the boot mode in ironic before we provision, and not before we start inspection. See https://github.com/metal3-io/baremetal-operator/pull/635

Comment 6 Stephen Benjamin 2020-09-15 23:42:58 UTC
That's really odd, because the e2e-metal-ipi-virtualmedia job was passing using UEFI virtual media. Did something change in Ironic or BMO?

Comment 7 Doug Hellmann 2020-09-16 20:08:00 UTC
(In reply to Stephen Benjamin from comment #6)
> That's really odd, because the e2e-metal-ipi-virtualmedia job was passing
> using UEFI virtual media. Did something change in Ironic or BMO?

I had the impression that the boot mode setting for a VM didn't matter as much as it might for a physical host.

Regardless, the patch mentioned in comment 5 closes a gap that has been present since the original implementation.

Comment 9 Lubov 2020-09-30 15:46:53 UTC
@Doug I ran a deployment with redfish-virtualmedia, provisioningNetwork: Disabled, and bootMode UEFI. It passed with no problems.

But I understand that's not enough to verify this BZ

Could you please suggest what else should be verified?

Comment 10 Doug Hellmann 2020-09-30 16:06:17 UTC
The original issue was with the timing of when we passed the boot mode to ironic. Before the fix, we only told ironic which boot mode to use when we were provisioning. That meant the host could fail to boot properly for inspection. To verify that ironic has the correct boot mode during inspection you could look at the node settings in ironic while the host is being inspected and verify that the value of /properties/capabilities includes a boot_mode value that matches the setting in the BareMetalHost resource.
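
The check described above can be sketched in a few lines of Python. This is only an illustration, not code from baremetal-operator or Ironic: the function names and the mapping table are hypothetical, and the sketch assumes that capabilities is the usual comma-separated "key:value" string and that a BareMetalHost bootMode of "UEFI" or "legacy" corresponds to an Ironic boot_mode of "uefi" or "bios".

```python
# Hypothetical mapping from BareMetalHost spec.bootMode to Ironic's
# boot_mode capability (an assumption made for this illustration).
BMH_TO_IRONIC_BOOT_MODE = {"UEFI": "uefi", "legacy": "bios"}


def parse_capabilities(capabilities: str) -> dict:
    """Split a string such as 'boot_mode:uefi,secure_boot:true' into a dict."""
    result = {}
    for item in capabilities.split(","):
        key, sep, value = item.partition(":")
        if sep:  # skip empty or malformed entries (e.g. an empty string)
            result[key.strip()] = value.strip()
    return result


def boot_mode_matches(bmh_boot_mode: str, node_properties: dict) -> bool:
    """Return True if the node's boot_mode capability matches the BMH setting."""
    expected = BMH_TO_IRONIC_BOOT_MODE.get(bmh_boot_mode)
    capabilities = node_properties.get("capabilities") or ""
    actual = parse_capabilities(capabilities).get("boot_mode")
    return expected is not None and actual == expected


# Properties as reported in comment 4 (before the fix).
print(boot_mode_matches("UEFI", {"capabilities": ""}))
# Properties in the shape expected during inspection after the fix.
print(boot_mode_matches("UEFI", {"capabilities": "boot_mode:uefi"}))
```

The first call prints False and the second prints True. The node properties would come from the Ironic API or the baremetal CLI while the host is still in the inspecting state.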

Comment 11 Lubov 2020-10-06 09:53:53 UTC
Verified on 4.6.0-0.nightly-2020-10-05-234751

While the node is being inspected:
(openstack-cli) [kni@provisionhost-0-0 ~]$ baremetal node show openshift-worker-0-2  
| properties             | {'capabilities': 'boot_mode:uefi'}

Comment 15 errata-xmlrpc 2020-10-27 16:41:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196