Bug 1956281 - Disconnected installs are failing with kubelet trying to pause image from the internet
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.8.0
Assignee: Peter Hunt
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-05-03 11:42 UTC by Stephen Benjamin
Modified: 2021-07-27 23:06 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:05:53 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift installer pull 4898 0 None open bug 1956281: crio: fix bootstrap given new crio config behavior 2021-05-04 16:00:09 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 23:06:24 UTC

Description Stephen Benjamin 2021-05-03 11:42:15 UTC
Description of problem:

e2e-metal-ipi-ovn-ipv6 is failing on the latest cri-o / RHCOS builds. Bootstrapping failures started after we bumped the RHEL images to 8.4 beta.

Kubelet is reporting:

./journals/kubelet.log:Apr 30 17:39:26 localhost kubelet.sh[3129]: E0430 17:39:26.406727    3183 kuberuntime_sandbox.go:68] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = error creating pod sandbox with name \"k8s_etcd-bootstrap-member-localhost_openshift-etcd_505683813579901bb5fed1eab2bf616d_0\": Error initializing source docker://k8s.gcr.io/pause:3.5: error pinging docker registry k8s.gcr.io: Get \"https://k8s.gcr.io/v2/\": dial tcp [2607:f8b0:400e:c07::52]:443: i/o timeout" pod="openshift-etcd/etcd-bootstrap-member-localhost"

We noticed https://github.com/cri-o/cri-o/pull/4550, which seems related. How does this typically work for disconnected environments? Do you ship the pause image with cri-o somehow? I don't see it as part of the release payload, or in crictl images.

Version-Release number of selected component (if applicable):

cri-o-1.21.0-81.rhaos4.8.gitbc63075.el8.x86_64

How reproducible:

Always

Steps to Reproduce:
1. Install a cluster in a disconnected environment


Actual results:

Bootstrapping fails with kubelet unable to fetch the k8s pause image

Expected results:

Bootstrap succeeds


Additional info:

Comment 1 Stephen Benjamin 2021-05-03 11:42:48 UTC
This is blocking new nightlies.

Comment 2 Stephen Benjamin 2021-05-03 11:57:25 UTC
So bootstrap overrides this with /etc/kubernetes/kubelet-pause-image-override, which uses the 'pod' image from the release payload. I believe that to make this work you need to carry the 3.5 changes from https://github.com/kubernetes/kubernetes/pull/100292 in openshift/kubernetes.

Comment 4 Peter Hunt 2021-05-03 15:50:34 UTC
I think this will be fixed by the attached PR (updating the bootstrap process to handle the new crio config behavior).

Comment 6 Sunil Choudhary 2021-05-07 11:34:40 UTC
Tested with payload 4.8.0-0.nightly-2021-05-06-210840.

Deployed an IPI cluster on AWS in a disconnected environment.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-05-06-210840   True        False         28m     Cluster version is 4.8.0-0.nightly-2021-05-06-210840

$ oc get nodes -o wide
NAME                                        STATUS   ROLES    AGE   VERSION                INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION          CONTAINER-RUNTIME
ip-10-0-55-209.us-east-2.compute.internal   Ready    worker   43m   v1.21.0-rc.0+291e731   10.0.55.209   <none>        Red Hat Enterprise Linux CoreOS 48.84.202105061618-0 (Ootpa)   4.18.0-293.el8.x86_64   cri-o://1.21.0-89.rhaos4.8.git3f6209a.el8
ip-10-0-58-66.us-east-2.compute.internal    Ready    master   55m   v1.21.0-rc.0+291e731   10.0.58.66    <none>        Red Hat Enterprise Linux CoreOS 48.84.202105061618-0 (Ootpa)   4.18.0-293.el8.x86_64   cri-o://1.21.0-89.rhaos4.8.git3f6209a.el8
ip-10-0-61-121.us-east-2.compute.internal   Ready    worker   43m   v1.21.0-rc.0+291e731   10.0.61.121   <none>        Red Hat Enterprise Linux CoreOS 48.84.202105061618-0 (Ootpa)   4.18.0-293.el8.x86_64   cri-o://1.21.0-89.rhaos4.8.git3f6209a.el8
ip-10-0-61-45.us-east-2.compute.internal    Ready    master   55m   v1.21.0-rc.0+291e731   10.0.61.45    <none>        Red Hat Enterprise Linux CoreOS 48.84.202105061618-0 (Ootpa)   4.18.0-293.el8.x86_64   cri-o://1.21.0-89.rhaos4.8.git3f6209a.el8
ip-10-0-68-220.us-east-2.compute.internal   Ready    master   55m   v1.21.0-rc.0+291e731   10.0.68.220   <none>        Red Hat Enterprise Linux CoreOS 48.84.202105061618-0 (Ootpa)   4.18.0-293.el8.x86_64   cri-o://1.21.0-89.rhaos4.8.git3f6209a.el8
ip-10-0-75-161.us-east-2.compute.internal   Ready    worker   43m   v1.21.0-rc.0+291e731   10.0.75.161   <none>        Red Hat Enterprise Linux CoreOS 48.84.202105061618-0 (Ootpa)   4.18.0-293.el8.x86_64   cri-o://1.21.0-89.rhaos4.8.git3f6209a.el8


sh-4.4# cat /etc/crio/crio.conf.d/00-default | grep pause_image
pause_image = "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c29777c0c62fa28843ae7be123e65d844dc48842db0ed50b7f7f3cb1c29caa15"
pause_image_auth_file = "/var/lib/kubelet/config.json"
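
The check above can also be scripted. The following is a minimal sketch, not part of the original verification: it assumes the drop-in uses plain `key = "value"` lines as in the grep output, and the helper names (read_pause_image, uses_public_registry) and the list of public registries are hypothetical, chosen only for illustration.

```python
import re

# Assumption for illustration: registries that are unreachable from a
# disconnected cluster. Only k8s.gcr.io appears in this bug report.
PUBLIC_REGISTRIES = ("k8s.gcr.io",)


def read_pause_image(config_text):
    """Return the pause_image value from CRI-O config text, or None if unset."""
    for line in config_text.splitlines():
        # Matches `pause_image = "..."` but not `pause_image_auth_file = "..."`
        # and not commented-out lines.
        match = re.match(r'\s*pause_image\s*=\s*"([^"]*)"', line)
        if match:
            return match.group(1)
    return None


def uses_public_registry(image):
    """Return True if the image reference's registry host is in PUBLIC_REGISTRIES."""
    host = image.split("/", 1)[0]
    return host in PUBLIC_REGISTRIES


# The two lines reported by grep in this comment.
sample = (
    'pause_image = "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:'
    'c29777c0c62fa28843ae7be123e65d844dc48842db0ed50b7f7f3cb1c29caa15"\n'
    'pause_image_auth_file = "/var/lib/kubelet/config.json"\n'
)

image = read_pause_image(sample)
print(image, "-> public registry:", uses_public_registry(image))
```

On a node, the text of /etc/crio/crio.conf.d/00-default could be passed to read_pause_image in place of the sample string. A value of None would mean the drop-in does not set pause_image at all.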

Comment 9 errata-xmlrpc 2021-07-27 23:05:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

