Customer Contact Name:
Description of Problem:
This problem is caused by the recent merge of the Day1 networking feature into OCP.
Summary of Day1 networking
Related changes to this issue
In previous OCP releases, we used a kernel and initrd to PXE boot.
The Day1 networking feature changed this method to the CoreOS-based IPA (Ironic Python Agent).
This means a bare metal server is now booted from a CoreOS image (not from a kernel and initrd).
As a result of this change, inspection fails in an environment behind a proxy because no proxy variables are set up for podman.
podman runs on CoreOS, so this problem does not occur in previous OCP releases (we did not use CoreOS when deploying previously).
In the new method, CoreOS is booted on the bare metal server and IPA then runs as a container through podman. The proxy variables were not set for podman even though they were written in install-config.yaml, and this caused the image pull to fail.
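The failure mode described above can be illustrated with a minimal shell sketch. It assumes that podman image pulls honor the HTTP_PROXY, HTTPS_PROXY and NO_PROXY environment variables, as Go-based tools generally do. The helper name with_proxy_env, the proxy URL and the IMAGE placeholder are hypothetical and are not taken from the actual fix in the PRs referenced in this report.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only (not taken from the actual fix):
# run a single command with the proxy variables present in its environment.
# Usage: with_proxy_env HTTP_PROXY_URL HTTPS_PROXY_URL NO_PROXY_LIST COMMAND [ARGS...]
with_proxy_env() {
    wpe_http=$1
    wpe_https=$2
    wpe_no_proxy=$3
    shift 3
    # env(1) sets the variables only for the child process, so the
    # caller's environment is left unchanged.
    env HTTP_PROXY="$wpe_http" HTTPS_PROXY="$wpe_https" NO_PROXY="$wpe_no_proxy" "$@"
}

# Example with placeholder values; uncomment on a host that has podman:
# with_proxy_env http://proxy.example.com:3128 http://proxy.example.com:3128 \
#     localhost,127.0.0.1 podman pull IMAGE
```

When these variables are absent from podman's environment, the pull is attempted directly against the registry, which cannot succeed if the proxy is the only route to the outside.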
Version-Release number of selected component:
This issue was detected in the Pre-GA version.
Red Hat OpenShift Container Platform Version Number: 4.10
Release Number: 4.10.0-0.nightly-2021-12-20-231053
Kubernetes Version: 1.22.1
Cri-o Version: 1.23.0
Related Component: NONE
Related Middleware/Application: irmc
Underlying RHCOS Release Number: 4.10
Underlying RHCOS Architecture: x86_64
Underlying RHCOS Kernel Version: 4.18.0
Drivers or hardware or architecture dependency:
Steps to Reproduce:
$ openshift-install --dir ~/clusterconfigs create manifests
$ openshift-install --dir ~/clusterconfigs --log-level debug create cluster
Actual Results: IPA image pull failed
Expected Results: IPA image can be pulled successfully
Summary of actions taken to resolve issue:
Fujitsu opened issue: https://github.com/openshift/installer/issues/5552
Fujitsu sent PR: https://github.com/openshift/image-customization-controller/pull/33
Location of diagnostic data:
Model: RX2540 M4
Upon discussion with the Metal Platform team, we decided this qualifies as a blocker because it is a regression in use cases that require a proxy.
Our colleagues from Fujitsu who originally identified this issue have proposed fixes which are currently under review.
In addition to the PRs that aim to resolve the proxy issue, the Metal Team is currently working on adding a validation / CI job to ensure that the proposed fixes work as expected (this is tracked in https://github.com/openshift-metal3/dev-scripts/pull/1341).
The team has made good progress with this BZ. With regard to fixes, we currently have:
PR5569 is past reviews and hasn't merged only due to perma-failing tests. It is now waiting for a Staff Engineer to review and override CI allowing it to merge.
PR1341 (https://github.com/openshift-metal3/dev-scripts/pull/1341), which aims to add test coverage, is still WIP. However, it is not part of the fix and can be finished as a follow-up change after the 4.10 Code Freeze.
https://github.com/openshift/installer/pull/5569 has just MERGED. I removed the explicit linkage to https://github.com/openshift-metal3/dev-scripts/pull/1341 and am setting the BZ to MODIFIED.
Verified that the fix caused no regression and that deployment succeeded on an IPv6 ctrplane network.
(Note: the issue itself could not be reproduced in the QE env at that time.)
[kni@provisionhost-0-0 ~]$ more install-config.yaml
[kni@provisionhost-0-0 ~]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-01-27-104747   True        False         5h3m    Cluster version is 4.10.0-0.nightly-2022-01-27-104747
That said, real verification must wait until QE implements iptables rules that allow outside connections only via the Bastion host and via the proxy,
or until the fix has been verified in the customer environment.
Hi, could you please report if the fix was working? Our test env to reproduce the original issue is still WIP.
We are also interested in understanding the topology of your env, where the proxy is the only gateway, and the restrictions you apply on your nodes. Thanks
Thank you for your reply.
> could you please report if the fix was working?
Yes, Fujitsu verified that this fix was working correctly.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.