Bug 1956281 - Disconnected installs are failing with kubelet trying to pull the pause image from the internet
Keywords:
Status: ON_QA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.8.0
Assignee: Peter Hunt
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-05-03 11:42 UTC by Stephen Benjamin
Modified: 2021-05-06 08:08 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift installer pull 4898 | 0 | None | open | bug 1956281: crio: fix bootstrap given new crio config behavior | 2021-05-04 16:00:09 UTC

Description Stephen Benjamin 2021-05-03 11:42:15 UTC
Description of problem:

e2e-metal-ipi-ovn-ipv6 is failing on the latest cri-o / RHCOS builds. The bootstrapping failures started after we bumped the RHEL images to 8.4 beta.

Kubelet is reporting:

./journals/kubelet.log:Apr 30 17:39:26 localhost kubelet.sh[3129]: E0430 17:39:26.406727    3183 kuberuntime_sandbox.go:68] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = error creating pod sandbox with name \"k8s_etcd-bootstrap-member-localhost_openshift-etcd_505683813579901bb5fed1eab2bf616d_0\": Error initializing source docker://k8s.gcr.io/pause:3.5: error pinging docker registry k8s.gcr.io: Get \"https://k8s.gcr.io/v2/\": dial tcp [2607:f8b0:400e:c07::52]:443: i/o timeout" pod="openshift-etcd/etcd-bootstrap-member-localhost"
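
To confirm which image the runtime tried to pull, the error above can be picked apart mechanically. The following is only a rough sketch: the regular expression and the function name `failed_pull` are illustrative assumptions, not part of kubelet or cri-o, and the pattern is written against this one message format.

```python
import re

# The kubelet error quoted above (journal prefix removed).
LOG_LINE = (
    'E0430 17:39:26.406727    3183 kuberuntime_sandbox.go:68] '
    '"Failed to create sandbox for pod" err="rpc error: code = Unknown desc = '
    'error creating pod sandbox with name '
    '\\"k8s_etcd-bootstrap-member-localhost_openshift-etcd_505683813579901bb5fed1eab2bf616d_0\\": '
    'Error initializing source docker://k8s.gcr.io/pause:3.5: '
    'error pinging docker registry k8s.gcr.io: Get \\"https://k8s.gcr.io/v2/\\": '
    'dial tcp [2607:f8b0:400e:c07::52]:443: i/o timeout" '
    'pod="openshift-etcd/etcd-bootstrap-member-localhost"'
)

# Hypothetical pattern: registry up to the first slash, then name and tag.
PULL_RE = re.compile(
    r"Error initializing source docker://"
    r"(?P<registry>[^/\s]+)/(?P<name>[^\s:]+):(?P<tag>[^\s:]+):"
)


def failed_pull(line):
    """Return (registry, name, tag) for a failed image pull, or None."""
    match = PULL_RE.search(line)
    if match is None:
        return None
    return match.group("registry"), match.group("name"), match.group("tag")


if __name__ == "__main__":
    print(failed_pull(LOG_LINE))
```

For the line above this reports the registry k8s.gcr.io and the image pause:3.5, which is an upstream image rather than anything from the release payload.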

We noticed https://github.com/cri-o/cri-o/pull/4550, which seems related. How does this typically work for disconnected environments? Do you ship the pause image with cri-o somehow? I don't see it as part of the release payload or in the crictl images output.
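
A check for whether the pause image is already in local storage could look roughly like the sketch below. It assumes the JSON shape printed by `crictl images -o json` (an `images` list whose entries carry `repoTags` and `repoDigests`); the helper name `has_image` and the sample data, including the `quay.io/example/etcd` reference, are made up for illustration.

```python
import json


def has_image(crictl_json, repository):
    """Return True if any tag or digest in the listing refers to `repository`."""
    data = json.loads(crictl_json)
    for image in data.get("images", []):
        refs = (image.get("repoTags") or []) + (image.get("repoDigests") or [])
        for ref in refs:
            # Drop an @digest suffix, then a :tag suffix on the last path part.
            name = ref.split("@", 1)[0]
            if ":" in name.rsplit("/", 1)[-1]:
                name = name.rsplit(":", 1)[0]
            if name == repository:
                return True
    return False


# Hypothetical listing with one unrelated image and no pause image.
SAMPLE = json.dumps(
    {"images": [{"id": "abc", "repoTags": ["quay.io/example/etcd:latest"], "repoDigests": []}]}
)

if __name__ == "__main__":
    print(has_image(SAMPLE, "k8s.gcr.io/pause"))
```

On a node, the input would be the output of `crictl images -o json` rather than the sample string.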

Version-Release number of selected component (if applicable):

cri-o-1.21.0-81.rhaos4.8.gitbc63075.el8.x86_64

How reproducible:

Always

Steps to Reproduce:
1. Install a disconnected environment


Actual results:

Bootstrapping fails with kubelet unable to fetch the k8s pause image

Expected results:

Bootstrap succeeds


Additional info:

Comment 1 Stephen Benjamin 2021-05-03 11:42:48 UTC
This is blocking new nightlies.

Comment 2 Stephen Benjamin 2021-05-03 11:57:25 UTC
So bootstrap overrides this with /etc/kubernetes/kubelet-pause-image-override, which uses the 'pod' image from the release payload. I believe that to make this work you need to carry the 3.5 changes from https://github.com/kubernetes/kubernetes/pull/100292 in openshift/kubernetes.
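
For reference, cri-o itself exposes a `pause_image` option in the `[crio.image]` table of its configuration, and one way to point it at a mirrored image is a drop-in file. The sketch below only renders such a fragment. Whether the bootstrap override file takes this form is not stated in this bug, and the function name and the `registry.example.com/ocp/pod:example` reference are placeholders, not values from the release payload.

```python
def render_pause_image_dropin(pause_image):
    """Render a cri-o config fragment that sets the pause image.

    Assumption: the fragment would be placed in a cri-o drop-in directory
    such as /etc/crio/crio.conf.d/; this bug does not confirm that layout.
    """
    if '"' in pause_image or "\n" in pause_image:
        raise ValueError("image reference must not contain quotes or newlines")
    return '[crio.image]\npause_image = "{}"\n'.format(pause_image)


if __name__ == "__main__":
    # Placeholder image reference, for illustration only.
    print(render_pause_image_dropin("registry.example.com/ocp/pod:example"))
```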

Comment 4 Peter Hunt 2021-05-03 15:50:34 UTC
I think this will be fixed by the attached PR, which updates the bootstrap process to handle the new crio config behavior.

