Bug 1973724
Summary: | metal3 Pod cannot download RHCOS images using the provisioning network anymore | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Denis Ollier <dollierp> |
Component: | Bare Metal Hardware Provisioning | Assignee: | Angus Salkeld <asalkeld> |
Bare Metal Hardware Provisioning sub component: | cluster-baremetal-operator | QA Contact: | Lubov <lshilin> |
Status: | CLOSED ERRATA | Docs Contact: | Padraig O'Grady <pogrady> |
Severity: | medium | ||
Priority: | medium | CC: | aos-bugs, asalkeld, beth.white, fdeutsch, pogrady, rbartal, zbitter |
Version: | 4.8 | Keywords: | Triaged |
Target Milestone: | --- | ||
Target Release: | 4.9.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause:
A change was made to stop provisioning services once the control plane was deployed.
Consequence:
This caused the InitContainer `metal3-machine-os-downloader` of the metal3 Pod to fail to download the image.
Fix:
The order of creating InitContainers has been changed so that static-ip-set happens prior to the image download.
Result:
Image download happens as expected.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2021-10-18 17:35:50 UTC | Type: | Bug |
Description
Denis Ollier
2021-06-18 14:59:11 UTC
Given that we use the "Recreate" deployment strategy, I don't see any reason that we couldn't acquire the VIP before doing the OS download.

(In reply to Zane Bitter from comment #2)
> Given that we use the "Recreate" deployment strategy, I don't see any reason
> that we couldn't acquire the VIP before doing the OS download.

I agree probably the fix is to reorder the initContainers, so that static-ip-set happens prior to the image download.

That was previously discussed on https://bugzilla.redhat.com/show_bug.cgi?id=1847142#c2

As I mention there, just switching the order may not be enough, because we set the connection lifetime to 300s in the initContainer:

https://github.com/openshift/ironic-static-ip-manager/blob/master/set-static-ip#L37

The expectation is that the refresh-static-ip later refreshes that, but if the RHCOS download takes more than 300s it's possible the connection could be interrupted.

That said, given that the default is to download from an external URL via the controlplane network, switching the order is probably reasonable - in the cases where this is set to the provisioning network it's very likely to be referencing a locally cached image, thus the download shouldn't take more than 300s.

verified on 4.9.0-0.nightly-2021-07-04-140102

from metal3-machine-os-downloader container log:

+ curl -g --compressed -L --connect-timeout 120 -o rhcos-48.84.202105190318-0-openstack.x86_64.qcow2.gz http://172.22.0.1/rhcos/rhcos-48.84.202105190318-0-openstack.x86_64.qcow2.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  963M  100  963M    0     0   910M      0  0:00:01  0:00:01 --:--:--  910M

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
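
For readers who want the reasoning in this thread in concrete form, here is a rough Python sketch. It is not code from cluster-baremetal-operator: the helper names, the plain list of container names, and the steady-rate model are all assumptions made for illustration. It checks that static-ip-set comes before metal3-machine-os-downloader in the initContainer order, and estimates whether a download would finish within the 300s connection lifetime mentioned above.

```python
# Hypothetical sketch only. Container names are taken from this bug report;
# the helpers and the list representation are illustrative assumptions.

CONNECTION_LIFETIME_S = 300  # lifetime set in the initContainer, per the comment above


def ip_set_before_download(init_containers):
    """Return True if static-ip-set is ordered before the image downloader."""
    return init_containers.index("static-ip-set") < init_containers.index(
        "metal3-machine-os-downloader"
    )


def download_fits_lifetime(size, rate_per_s, lifetime_s=CONNECTION_LIFETIME_S):
    """Return True if `size` units at a steady `rate_per_s` finish within lifetime_s.

    A steady transfer rate is a simplifying assumption; real downloads vary.
    """
    if rate_per_s <= 0:
        raise ValueError("rate_per_s must be positive")
    return size / rate_per_s < lifetime_s


# Order before the fix (download first) and after it (static IP first).
old_order = ["metal3-machine-os-downloader", "static-ip-set"]
new_order = ["static-ip-set", "metal3-machine-os-downloader"]

print(ip_set_before_download(old_order))
print(ip_set_before_download(new_order))

# Figures from the curl log in the verification comment: 963M at 910M per second.
print(download_fits_lifetime(963, 910))
```

With the locally cached image from the verification log, the transfer takes about a second, so the 300s lifetime is not a concern there. A much slower link (for example 963M at 3M per second, roughly 321s) would exceed it, which is the caveat raised in the thread.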