1770101 – Kubelet cannot pull k8s.gcr.io/pause:3.1 image on bootstrap node

Summary: Kubelet cannot pull k8s.gcr.io/pause:3.1 image on bootstrap node

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Node
Sub Component:
Version:	4.3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	4.3.0
Assignee:	Colin Walters
QA Contact:	Sunil Choudhary
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1741391

Reported:	2019-11-08 06:36 UTC by Johnny Liu
Modified:	2020-01-23 11:12 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-01-23 11:11:43 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2020:0062	0	None	None	None	2020-01-23 11:12:09 UTC

Description Johnny Liu 2019-11-08 06:36:48 UTC

Description of problem:

Version-Release number of the following components:
4.3.0-0.nightly-2019-11-07-172437

How reproducible:
Always

Steps to Reproduce:
1. Create a disconnected network env.
2. Mirror payload into local private registry
3. Run a UPI install on baremetal.

Actual results:
Bootstrap failed.
The kubelet log shows the following errors:
Nov 08 06:32:23 qe-gpei-disbz-fc969-bootstrap-0 hyperkube[1736]: E1108 06:32:23.937243    1736 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "bootstrap-machine-config-operator-qe-gpei-disbz-fc969-bootstrap-0_default(8ab21cd99e1602159ccf69d69e2bc346)" failed: rpc error: code = Unknown desc = error creating pod sandbox with name "k8s_bootstrap-machine-config-operator-qe-gpei-disbz-fc969-bootstrap-0_default_8ab21cd99e1602159ccf69d69e2bc346_0": Error initializing source docker://k8s.gcr.io/pause:3.1: pinging docker registry returned: Get https://k8s.gcr.io/v2/: dial tcp 209.85.144.82:443: i/o timeout
Nov 08 06:32:23 qe-gpei-disbz-fc969-bootstrap-0 hyperkube[1736]: E1108 06:32:23.937264    1736 kuberuntime_manager.go:710] createPodSandbox for pod "bootstrap-machine-config-operator-qe-gpei-disbz-fc969-bootstrap-0_default(8ab21cd99e1602159ccf69d69e2bc346)" failed: rpc error: code = Unknown desc = error creating pod sandbox with name "k8s_bootstrap-machine-config-operator-qe-gpei-disbz-fc969-bootstrap-0_default_8ab21cd99e1602159ccf69d69e2bc346_0": Error initializing source docker://k8s.gcr.io/pause:3.1: pinging docker registry returned: Get https://k8s.gcr.io/v2/: dial tcp 209.85.144.82:443: i/o timeout
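When scanning a long journal for this failure, it can help to pull the image reference and registry out of each sandbox error. The following is a minimal sketch that assumes the message format shown in the two log lines above; the function name `failed_pull_image` and the regular expression are illustrative and not part of any kubelet or CRI-O tooling.

```python
import re

# Matches the "Error initializing source docker://<image>: " fragment.
# The image reference ends at the first colon that is followed by a space,
# so tags ("pause:3.1") and registry ports ("host:5000/...") stay intact.
_SOURCE_RE = re.compile(r"Error initializing source docker://(\S+?): ")


def failed_pull_image(log_line):
    """Return (image, registry) for a sandbox pull error, or None if absent."""
    match = _SOURCE_RE.search(log_line)
    if match is None:
        return None
    image = match.group(1)
    # The registry is everything before the first "/" of the reference.
    registry = image.split("/", 1)[0]
    return image, registry
```

Any registry reported here that is not the local mirror points at an image reference that was not rewritten for the disconnected environment.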


Expected results:
The pause image is pulled from the private mirror registry and the installation completes.

Additional info:
A similar bug, https://bugzilla.redhat.com/show_bug.cgi?id=1711844, has already been fixed, so this is a regression.

This is blocking QE's restricted network testing.

Comment 1 Brenton Leanhardt 2019-11-08 14:33:38 UTC

Colin, was the removal of the pause image logic in https://github.com/openshift/installer/pull/1768/files intentional?

Comment 2 Colin Walters 2019-11-08 15:14:55 UTC

It wasn't removed, just moved, right?

That said, it could be broken...let me see.

Comment 3 Brenton Leanhardt 2019-11-08 15:17:57 UTC

Ahh, correct, it was technically moved to crio-configure.sh.template.  Does seem like there may be an issue.  Thanks for taking a look!

Comment 4 Colin Walters 2019-11-08 15:28:29 UTC

Just did a quick test with installer master, `systemctl status crio-configure` looks fine, and 

```
$ grep pause_image /etc/crio/crio.conf 
pause_image = "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e3f70f20ce6be55711b54bc266019d215963935779c178e52ea4bc717da58508"
```

also looks right.

Oh but...I do see `k8s.gcr.io/pause` in `podman images`...

Comment 5 Colin Walters 2019-11-08 15:31:30 UTC

And further, I *don't* see the configured pause image in `podman images`.  This looks like crio is ignoring it...another config file compat issue?

Comment 6 Colin Walters 2019-11-08 15:33:59 UTC

Possibly related to https://github.com/openshift/machine-config-operator/pull/1216 ?

Some sort of crio config file format change?

Comment 7 Colin Walters 2019-11-08 15:36:46 UTC

I did verify the cluster nodes look fine, so this is just the bootstrap.

Comment 8 Urvashi Mohnani 2019-11-08 17:40:55 UTC

CRI-O defaults to "k8s.gcr.io/pause:3.1" for the pause image when it doesn't find anything set for it in crio.conf or via the --pause-image flag. So it looks like something changed and the actual pause image value is not being set in crio.conf on the bootstrap node.
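The fallback described in this comment can be modeled in a few lines. This is a simplified sketch, not CRI-O's actual implementation: the function name `effective_pause_image` is hypothetical, the config is matched with a regular expression rather than a real TOML parser, and the assumption that the flag takes precedence over the file is mine and is not stated in this report.

```python
import re

# Default named in the comment above; used when nothing else is configured.
DEFAULT_PAUSE_IMAGE = "k8s.gcr.io/pause:3.1"

# Matches an uncommented line such as: pause_image = "registry/repo@sha256:..."
_PAUSE_RE = re.compile(r'^[ \t]*pause_image[ \t]*=[ \t]*"([^"]*)"', re.MULTILINE)


def effective_pause_image(crio_conf_text, flag_value=None):
    """Return the pause image this simplified model would select."""
    # Assumption: an explicit --pause-image value wins over the config file.
    if flag_value:
        return flag_value
    match = _PAUSE_RE.search(crio_conf_text)
    if match and match.group(1):
        return match.group(1)
    # Nothing usable was configured, so fall back to the public default.
    return DEFAULT_PAUSE_IMAGE
```

Under this model, a config that is present but whose `pause_image` setting is not picked up yields the public default, which a disconnected bootstrap node cannot reach. That would be consistent with the i/o timeout against k8s.gcr.io in the description.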

Comment 11 Johnny Liu 2019-11-12 05:16:19 UTC

Verified this bug with 4.3.0-0.nightly-2019-11-12-000306; result: PASS.

Comment 13 errata-xmlrpc 2020-01-23 11:11:43 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062
