Description of problem: When using a BMH to boot a live ISO from an HTTPS source served by an OpenShift route, the image is never attached and the machine does not boot from it.
Version-Release number of selected component (if applicable): 4.8
How reproducible: 100%
Steps to Reproduce:
1. Create a Deployment with a webserver serving a bootable ISO
2. Create a Service for it
3. Expose the Service with a Route
4. Put the https URL in the image section of a BMH object as a live iso
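For step 4, a minimal sketch of the kind of BMH object involved, built as a Python dict and printed as JSON. The field names follow the metal3 BareMetalHost API; every value (names, namespace, route URL, BMC address, MAC) is a made-up placeholder, not taken from the affected cluster.

```python
import json


def live_iso_bmh(name, namespace, iso_url, bmc_address, credentials_secret, boot_mac):
    """Build a BareMetalHost manifest that boots a live ISO (illustrative values only)."""
    return {
        "apiVersion": "metal3.io/v1alpha1",
        "kind": "BareMetalHost",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "online": True,
            "bootMACAddress": boot_mac,
            "bmc": {"address": bmc_address, "credentialsName": credentials_secret},
            # The HTTPS route URL goes here; "live-iso" marks it as a live ISO.
            "image": {"url": iso_url, "format": "live-iso"},
        },
    }


# Placeholder values, assumed for illustration only.
manifest = live_iso_bmh(
    name="worker-0",
    namespace="openshift-machine-api",
    iso_url="https://iso-server.apps.example.com/live.iso",
    bmc_address="redfish-virtualmedia://192.0.2.10/redfish/v1/Systems/1",
    credentials_secret="worker-0-bmc-secret",
    boot_mac="52:54:00:00:00:01",
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to `oc apply -f -` once the placeholders are replaced.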
Actual results: Ironic fails to download the image for caching because it cannot verify the OpenShift default ingress certificate.
Expected results: The image is attached to the machine represented by the BMH object on which the ISO was set.
From earlier discussion, there seems to be a way to prevent Ironic from downloading the image (passthrough), though it may not currently be available in the "ramdisk" deploy that the Bare Metal Operator live-iso functionality uses.
Ironic currently defaults to caching for two reasons:
* Wanting to ensure image reachability
* Ensuring the image is available until the machine is undeployed, since some BMCs access the image in chunks as needed. In the Central Hub Management case, the need is probably to ensure that the ISO remains available in the assisted service endpoint until the reboot.
Another angle, discussed at the end of last week, is for the baremetal operator to change the way it deploys Ironic so that it mounts the configmap/secret that contains the default Ingress certificate. This would allow Ironic to fetch the image via HTTPS and continue to serve it via HTTP to the BMC (since Ironic does not currently provide certificates to the BMC for HTTPS fetching).
The naive fix is https://github.com/metal3-io/ironic-image/pull/255, which just makes the image downloading code respect IRONIC_INSECURE. I haven't looked into accepting the actual certificate (no time until Friday).
We've discussed this in the bug review. Apparently CBO already mounts certificates into each container, so we only need to tell Ironic where they are. The PR above should be reworked to accept the path (or False) via a variable; then we can set it in CBO.
Added https://github.com/metal3-io/ironic-image/pull/258 to allow setting webserver_verify_ca to a path.
We still need a fix in cluster-baremetal-operator to set WEBSERVER_CACERT_FILE. Moving this back to POST.
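To make the intended semantics concrete, here is a rough Python sketch of how the two variables could map onto a webserver_verify_ca value (False, a CA path, or True for the system defaults). This is not the code from either ironic-image PR. The function name, the accepted spelling of "true", and the precedence of IRONIC_INSECURE over WEBSERVER_CACERT_FILE are assumptions made for illustration.

```python
import os
from typing import Mapping, Union


def resolve_verify_ca(env: Mapping[str, str] = os.environ) -> Union[bool, str]:
    """Return the value a webserver_verify_ca-style option would take.

    Hypothetical helper: precedence and parsing are assumptions, not the
    behaviour of ironic-image.
    """
    # Assumed: IRONIC_INSECURE=true disables verification outright.
    if env.get("IRONIC_INSECURE", "false").strip().lower() == "true":
        return False
    # Assumed: a non-empty WEBSERVER_CACERT_FILE is used as the CA bundle path.
    ca_file = env.get("WEBSERVER_CACERT_FILE", "").strip()
    if ca_file:
        return ca_file
    # Otherwise verify against the system trust store.
    return True


# The path below is a placeholder, not the file CBO actually mounts.
print(resolve_verify_ca({"WEBSERVER_CACERT_FILE": "/certs/ca/example-bundle.crt"}))
```

With this shape, CBO would only need to set one variable on the container to switch between a specific bundle and the defaults.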
Our testing for this issue failed for a few reasons.
1. The file configured in https://github.com/openshift/cluster-baremetal-operator/pull/139 doesn't exist
- The mounted configmap (cbo-trusted-ca) contains a key named "ca-bundle.crt" not "trusted-ca"
2. The CA bundle in that configmap doesn't establish trust for the cluster's default ingress cert.
- This can be tested by using the contents of the configmap as the CA bundle when running curl against an HTTPS route.
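The same check as the curl test can be scripted. The sketch below is a hypothetical helper using only the Python standard library: it trusts nothing but the given bundle and reports whether the route's certificate verifies against it. The function name, the return strings, and the example URL and path are all assumptions for illustration.

```python
import ssl
import urllib.error
import urllib.request


def check_bundle(url: str, ca_file: str, timeout: float = 10.0) -> str:
    """Report whether `url` verifies against the CA bundle in `ca_file` only."""
    try:
        # Passing cafile means only this bundle is trusted, not the system CAs.
        ctx = ssl.create_default_context(cafile=ca_file)
    except FileNotFoundError:
        return "missing-bundle"   # e.g. the configured file does not exist
    except ssl.SSLError:
        return "invalid-bundle"   # file exists but holds no usable certificates
    try:
        with urllib.request.urlopen(url, context=ctx, timeout=timeout):
            return "trusted"
    except urllib.error.HTTPError:
        # An HTTP error status still means the TLS handshake succeeded.
        return "trusted"
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, ssl.SSLCertVerificationError):
            return "untrusted"
        raise  # DNS, connection, or other non-certificate failures


if __name__ == "__main__":
    # Placeholder route and bundle path; substitute real values before running.
    print(check_bundle("https://iso-server.apps.example.com/live.iso",
                       "/tmp/ca-bundle.crt"))
```

Both failures listed above would show up here: a wrong key name gives "missing-bundle", and a bundle that lacks the ingress CA gives "untrusted".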
During the debugging session we determined that it might be easier to configure the image URL to point to the cluster-internal service. In this case the assisted service is using a service signing cert, and the CA bundle for that cert can be retrieved using the steps in https://docs.openshift.com/container-platform/4.7/security/certificate_types_descriptions/service-ca-certificates.html
Alternatively, to fix this bug as reported (accessing the route), the default ingress certificate would need to be fetched from the openshift-config-managed namespace, as described in https://docs.openshift.com/container-platform/4.7/security/certificate_types_descriptions/ingress-certificates.html#workflow
@Bob let me know which direction you want to go and we can adjust accordingly.
To close on the implementation decision: the patch to set the CA was removed from CBO, so only the ironic-image patch remains, which will disable cert verification.
This is the note from Stephen in the CBO revert:
This is the wrong cert to use for the assisted images, it uses the service ca, this needs more investigation in another release to make the service, ingress, and trusted CA bundles available to our containers, as well as investigating using TLS for the httpd hosted by Ironic.
I tested with 4.8.0-0.nightly-2021-05-10-151546 and Ironic was able to pull from an insecure HTTPS registry (in this case the assisted service pod's HTTPS API).
@Lubov - Is there any other validation you want to do or can we consider this verified? (Just asking because you are assigned QA on this one)
good for me
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.