Description of problem:
e2e-metal-ipi-ovn-ipv6 is failing on the latest cri-o / RHCOS builds. Bootstrapping failures started after we bumped the RHEL images to 8.4 beta. Kubelet is reporting:

./journals/kubelet.log:Apr 30 17:39:26 localhost kubelet.sh[3129]: E0430 17:39:26.406727 3183 kuberuntime_sandbox.go:68] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = error creating pod sandbox with name \"k8s_etcd-bootstrap-member-localhost_openshift-etcd_505683813579901bb5fed1eab2bf616d_0\": Error initializing source docker://k8s.gcr.io/pause:3.5: error pinging docker registry k8s.gcr.io: Get \"https://k8s.gcr.io/v2/\": dial tcp [2607:f8b0:400e:c07::52]:443: i/o timeout" pod="openshift-etcd/etcd-bootstrap-member-localhost"

We noticed https://github.com/cri-o/cri-o/pull/4550, which seems related. How does this typically work for disconnected environments? Do you ship the pause image with cri-o somehow? I don't see it as part of the release payload, or in crictl images.

Version-Release number of selected component (if applicable):
cri-o-1.21.0-81.rhaos4.8.gitbc63075.el8.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install disconnected environment

Actual results:
Bootstrapping fails with kubelet unable to fetch k8s pause image

Expected results:
Bootstrap succeeds

Additional info:
This is blocking new nightlies.
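For triage across many journal files, the image and registry that the sandbox creation tried to reach can be pulled out of the kubelet error quoted in the description. This is only a minimal sketch: the function name `failed_image_pull` and the regular expression are illustrative assumptions, and the pattern is keyed to the "Error initializing source docker://" wording seen in this one log line, so other cri-o versions may phrase the error differently.

```python
import re

# Assumption: the error text contains "Error initializing source docker://<image>: ",
# as in the kubelet log line quoted in the description.
IMAGE_RE = re.compile(r"Error initializing source docker://(?P<image>\S+?): ")

# Fragment of the kubelet error from the description.
SAMPLE = (
    r'Error initializing source docker://k8s.gcr.io/pause:3.5: '
    r'error pinging docker registry k8s.gcr.io: Get \"https://k8s.gcr.io/v2/\": '
    r'dial tcp [2607:f8b0:400e:c07::52]:443: i/o timeout'
)


def failed_image_pull(line):
    """Return (image, registry) for a failed image pull, or None if the line does not match."""
    match = IMAGE_RE.search(line)
    if match is None:
        return None
    image = match.group("image")
    # The registry host is everything before the first "/".
    registry = image.split("/", 1)[0]
    return image, registry


if __name__ == "__main__":
    print(failed_image_pull(SAMPLE))
```

A hit on an external registry such as k8s.gcr.io in a disconnected install suggests that the pause image override was not picked up.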
So bootstrap overrides this with /etc/kubernetes/kubelet-pause-image-override, which uses the 'pod' image from the release payload. I believe that to make this work, you need to carry the 3.5 changes from https://github.com/kubernetes/kubernetes/pull/100292 in openshift/kubernetes.
I think this will be fixed by the attached PR (updating the bootstrap process to handle the new cri-o config behavior).
Tested with payload 4.8.0-0.nightly-2021-05-06-210840. Deployed an IPI cluster on AWS in a disconnected environment.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-05-06-210840   True        False         28m     Cluster version is 4.8.0-0.nightly-2021-05-06-210840

$ oc get nodes -o wide
NAME                                        STATUS   ROLES    AGE   VERSION                INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION          CONTAINER-RUNTIME
ip-10-0-55-209.us-east-2.compute.internal   Ready    worker   43m   v1.21.0-rc.0+291e731   10.0.55.209   <none>        Red Hat Enterprise Linux CoreOS 48.84.202105061618-0 (Ootpa)   4.18.0-293.el8.x86_64   cri-o://1.21.0-89.rhaos4.8.git3f6209a.el8
ip-10-0-58-66.us-east-2.compute.internal    Ready    master   55m   v1.21.0-rc.0+291e731   10.0.58.66    <none>        Red Hat Enterprise Linux CoreOS 48.84.202105061618-0 (Ootpa)   4.18.0-293.el8.x86_64   cri-o://1.21.0-89.rhaos4.8.git3f6209a.el8
ip-10-0-61-121.us-east-2.compute.internal   Ready    worker   43m   v1.21.0-rc.0+291e731   10.0.61.121   <none>        Red Hat Enterprise Linux CoreOS 48.84.202105061618-0 (Ootpa)   4.18.0-293.el8.x86_64   cri-o://1.21.0-89.rhaos4.8.git3f6209a.el8
ip-10-0-61-45.us-east-2.compute.internal    Ready    master   55m   v1.21.0-rc.0+291e731   10.0.61.45    <none>        Red Hat Enterprise Linux CoreOS 48.84.202105061618-0 (Ootpa)   4.18.0-293.el8.x86_64   cri-o://1.21.0-89.rhaos4.8.git3f6209a.el8
ip-10-0-68-220.us-east-2.compute.internal   Ready    master   55m   v1.21.0-rc.0+291e731   10.0.68.220   <none>        Red Hat Enterprise Linux CoreOS 48.84.202105061618-0 (Ootpa)   4.18.0-293.el8.x86_64   cri-o://1.21.0-89.rhaos4.8.git3f6209a.el8
ip-10-0-75-161.us-east-2.compute.internal   Ready    worker   43m   v1.21.0-rc.0+291e731   10.0.75.161   <none>        Red Hat Enterprise Linux CoreOS 48.84.202105061618-0 (Ootpa)   4.18.0-293.el8.x86_64   cri-o://1.21.0-89.rhaos4.8.git3f6209a.el8

sh-4.4# cat /etc/crio/crio.conf.d/00-default | grep pause_image
pause_image = "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c29777c0c62fa28843ae7be123e65d844dc48842db0ed50b7f7f3cb1c29caa15"
pause_image_auth_file = "/var/lib/kubelet/config.json"
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438
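For anyone re-checking a node after applying the advisory, the manual grep from the verification comment can be turned into a small check. This is a rough sketch under stated assumptions: it only parses `pause_image*` lines of the `key = "value"` form shown above, the helper names `pause_settings` and `looks_overridden` are hypothetical, and treating "digest-pinned and not hosted on k8s.gcr.io" as evidence of a working override is a heuristic, not an official criterion.

```python
import re

# Matches lines such as: pause_image = "registry/repo@sha256:..."
SETTING_RE = re.compile(r'^[ \t]*(pause_image\w*)[ \t]*=[ \t]*"([^"]*)"[ \t]*$', re.MULTILINE)

# Values copied from the verification comment above.
SAMPLE_CONF = '''
pause_image = "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c29777c0c62fa28843ae7be123e65d844dc48842db0ed50b7f7f3cb1c29caa15"
pause_image_auth_file = "/var/lib/kubelet/config.json"
'''


def pause_settings(conf_text):
    """Collect every pause_image* setting found in the config text into a dict."""
    return dict(SETTING_RE.findall(conf_text))


def looks_overridden(settings, upstream_registry="k8s.gcr.io"):
    """Heuristic: pause_image is set, pinned by digest, and not on the upstream registry."""
    image = settings.get("pause_image", "")
    registry = image.split("/", 1)[0]
    return bool(image) and registry != upstream_registry and "@sha256:" in image


if __name__ == "__main__":
    settings = pause_settings(SAMPLE_CONF)
    print(settings.get("pause_image_auth_file"))
    print(looks_overridden(settings))
```

On a node, the text passed to `pause_settings` would be the contents of the cri-o drop-in file rather than the embedded sample.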