Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1985483

Summary: Cleaning a BMH deployed using live ISO results in a TLS failure
Product: OpenShift Container Platform
Component: Bare Metal Hardware Provisioning
Sub component: ironic
Version: 4.9
Target Release: 4.9.0
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: medium
Keywords: Triaged
Reporter: Ian Main <imain>
Assignee: Dmitry Tantsur <dtantsur>
QA Contact: Lubov <lshilin>
CC: lshilin, rbartal, rpittau, zbitter
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2021-10-18 17:40:56 UTC
Attachments:
    Screenshot of SSL error.

Description Ian Main 2021-07-23 16:53:18 UTC
Created attachment 1804957 [details]
Screenshot of SSL error.

Description of problem:

We have been working on MPINSTALL-1, the ability to perform redfish+virtualmedia deployments outside of the provisioning network.  We've successfully provisioned hosts both on the provisioning network and outside of it.  However when performing deprovisioning we are seeing an SSL error.  See attachment for a screenshot of the error.

Version-Release number of selected component (if applicable):


How reproducible:

100%

Steps to Reproduce:
1. Set virtualMediaViaExternalNetwork = true in the Provisioning CR.
2. Provision a new bare metal host.
3. Delete the bare metal host after it has successfully started.
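Step 1 above can be sketched as a JSON merge patch against the Provisioning resource. This is a hypothetical sketch: the resource name "provisioning-configuration" and the `oc patch` invocation in the comment are assumptions, not taken from this bug; only the field name comes from the steps above.

```python
import json

# Hypothetical sketch of step 1: a JSON merge patch that enables virtual
# media over the external network. The resource name
# ("provisioning-configuration") is an assumption, not from this bug report.
patch = {"spec": {"virtualMediaViaExternalNetwork": True}}

# One might apply it with something like:
#   oc patch provisioning provisioning-configuration --type merge -p '<patch JSON>'
print(json.dumps(patch))
```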

Actual results:

Deprovisioning stalls indefinitely. The host console shows an SSL error during the cleaning phase.

Expected results:

Deprovisioning works.

Additional info:

Comment 1 Dmitry Tantsur 2021-07-26 17:00:02 UTC
> However when performing deprovisioning we are seeing an SSL error.

On all hosts, both internal and external?

Could you please attach all logs from the Metal3 pod?

Additionally, would it be possible to log into the ramdisk and fetch the complete logs? BMO should have a parameter to set an SSH key. If not, maybe make a video of booting? I lack information about what precedes the screenshot.

Comment 2 Zane Bitter 2021-07-26 17:49:38 UTC
*** Bug 1986118 has been marked as a duplicate of this bug. ***

Comment 3 Ian Main 2021-07-26 18:38:02 UTC
(In reply to Dmitry Tantsur from comment #1)
> > However when performing deprovisioning we are seeing an SSL error.
> 
> On all hosts, both internal and external?

Yes. Once the external IP option is set, it fails for both locally and externally provisioned hosts, because setting it changes external_callback_url and callback_endpoint_override in the ironic configuration, and both kinds of host use the same new callbacks. The certs generated by CBO are for the provisioning IP, not the external IP, so SSL validation fails.
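The mismatch described above can be illustrated with a toy check. This is not how IPA or the TLS stack actually validates certificates; the IP addresses and the helper function are illustrative assumptions, and real verification is done by the TLS library against the certificate's subjectAltName entries.

```python
# Toy illustration of the failure mode: the serving cert's IP SANs cover the
# provisioning IP, but the ramdisk now calls back on the external IP, so
# identity verification fails. Addresses and helper are illustrative only;
# real checking is performed by the TLS library, not by code like this.
def ip_matches_cert(target_ip: str, san_ips: list[str]) -> bool:
    """Return True if target_ip appears among the certificate's IP SANs."""
    return target_ip in san_ips

cert_san_ips = ["192.168.111.21"]  # provisioning IP baked into the CBO cert

internal_ok = ip_matches_cert("192.168.111.21", cert_san_ips)  # callback on provisioning net
external_ok = ip_matches_cert("203.0.113.5", cert_san_ips)     # callback on external net
print(internal_ok, external_ok)
```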
 
> Could you please attach all logs from the Metal3 pod?

I don't think there's anything useful in there.

> Additionally, would it be possible to log into the ramdisk and fetch the
> complete logs? BMO should have a parameter to set an SSH key. If not, maybe
> make a video of booting? I lack some information that preceds the screenshot.

None of the extra kernel params in ironic.conf seem to take effect for the cleaning operation. I edited the grub command line to add a console and managed to get a log. The boot params are:

[    0.000000] Command line: BOOT_IMAGE=/vmlinuz root=/dev/ram0 text ipa-api-url=https://192.168.111.21:6385 ipa-agent-token=igSKNii0jy2mqgcNe17_pzYvrWUpq5FQpMmdLgyho3o ipa-debug=1 boot_method=vmedia console=ttyS0 vga=normal nomodeset --

(I added console/vga/nomodeset by hand)

If I add ipa-insecure=1 to the boot via grub, it can talk to the ironic API just fine. So somehow no extra kernel params are being set for cleaning.

Comment 4 Zane Bitter 2021-07-26 22:06:00 UTC
I tracked this down to a regression in ironic caused by:

https://review.opendev.org/q/I25c28df048c706f0c5b013b4d252f09d5a7e57bd

The BMO sets the deploy_interface to "ramdisk" whenever the image format is "live-iso". After that patch, ironic returns "ramdisk" from get_boot_option(). And when that happens, ironic ignores the kernel arguments in the config file and substitutes hard-coded ones consistent with what we are seeing ("root=/dev/ram0 text"):

https://opendev.org/openstack/ironic/src/branch/master/ironic/drivers/modules/image_utils.py#L466-L469


So this issue affects all deployments where the image format is "live-iso", which in practice I think means ZTP.
The patch in Ironic was backported to stable/wallaby and bugfix/18.0 branches, so some previous releases might be affected.
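The regression described in this comment can be sketched in simplified form. The function and constant names below are hypothetical, not ironic's actual code; ironic's real logic lives in image_utils.py at the link above. The sketch only captures the observed behaviour: with the "ramdisk" boot option, configured kernel args are discarded in favour of hard-coded ones.

```python
# Simplified sketch (not ironic's actual code) of the regression: when
# get_boot_option() returns "ramdisk", the kernel arguments configured in
# ironic.conf are ignored and hard-coded ones are substituted, matching the
# "root=/dev/ram0 text" seen on the host's grub command line.
HARDCODED_RAMDISK_PARAMS = "root=/dev/ram0 text"

def effective_kernel_params(boot_option: str, configured_params: str) -> str:
    if boot_option == "ramdisk":
        # Configured params (e.g. ipa-insecure=1) never reach the ramdisk here.
        return HARDCODED_RAMDISK_PARAMS
    return configured_params

print(effective_kernel_params("ramdisk", "ipa-insecure=1"))
print(effective_kernel_params("local", "ipa-insecure=1"))
```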

Comment 5 Dmitry Tantsur 2021-07-27 08:26:07 UTC
The live ISO workflow does not use cleaning (or IPA at all), and this bug is not about the live ISO (or so I was told).

I will double-check how cleaning works with the ramdisk deploy.

Comment 6 Dmitry Tantsur 2021-07-27 08:47:09 UTC
The regression is hopefully fixed by https://review.opendev.org/c/openstack/ironic/+/802437. However, I need to understand what you're trying to do. The live ISO workflow is reserved for the assisted installer, which doesn't use cleaning or inspection. If you are not using the live ISO workflow, I'll still need the ironic-conductor logs for investigation.

Comment 7 Zane Bitter 2021-07-27 14:53:55 UTC
It may be that there are no customer scenarios affected. Ian found the issue by hand-testing with a live-iso image (which is a thing that ought to work upstream at least) for convenience, on the (evidently mistaken) assumption that it would work the same for those purposes.

Comment 8 Dmitry Tantsur 2021-07-27 15:38:00 UTC
Okay, I will fix it, but with a lower priority.

Comment 9 Dmitry Tantsur 2021-09-07 12:34:39 UTC
The fixed package is available in 4.9.

Comment 13 Lubov 2021-09-30 20:01:41 UTC
verified on 4.9.0-0.nightly-2021-09-29-172320

Comment 15 errata-xmlrpc 2021-10-18 17:40:56 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759