Bug 1770101 - Kubelet cannot pull k8s.gcr.io/pause:3.1 image on bootstrap node
Summary: Kubelet cannot pull k8s.gcr.io/pause:3.1 image on bootstrap node
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.3.0
Assignee: Colin Walters
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks: 1741391
Reported: 2019-11-08 06:36 UTC by Johnny Liu
Modified: 2020-01-23 11:12 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-23 11:11:43 UTC
Target Upstream Version:
Embargoed:




Links: Red Hat Product Errata RHBA-2020:0062 (last updated 2020-01-23 11:12:09 UTC)

Description Johnny Liu 2019-11-08 06:36:48 UTC
Description of problem:

Version-Release number of the following components:
4.3.0-0.nightly-2019-11-07-172437

How reproducible:
Always

Steps to Reproduce:
1. Create a disconnected network environment.
2. Mirror the release payload into a local private registry.
3. Run a UPI install on bare metal.

Actual results:
Bootstrap failed.
The kubelet log shows:
```
Nov 08 06:32:23 qe-gpei-disbz-fc969-bootstrap-0 hyperkube[1736]: E1108 06:32:23.937243    1736 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "bootstrap-machine-config-operator-qe-gpei-disbz-fc969-bootstrap-0_default(8ab21cd99e1602159ccf69d69e2bc346)" failed: rpc error: code = Unknown desc = error creating pod sandbox with name "k8s_bootstrap-machine-config-operator-qe-gpei-disbz-fc969-bootstrap-0_default_8ab21cd99e1602159ccf69d69e2bc346_0": Error initializing source docker://k8s.gcr.io/pause:3.1: pinging docker registry returned: Get https://k8s.gcr.io/v2/: dial tcp 209.85.144.82:443: i/o timeout
Nov 08 06:32:23 qe-gpei-disbz-fc969-bootstrap-0 hyperkube[1736]: E1108 06:32:23.937264    1736 kuberuntime_manager.go:710] createPodSandbox for pod "bootstrap-machine-config-operator-qe-gpei-disbz-fc969-bootstrap-0_default(8ab21cd99e1602159ccf69d69e2bc346)" failed: rpc error: code = Unknown desc = error creating pod sandbox with name "k8s_bootstrap-machine-config-operator-qe-gpei-disbz-fc969-bootstrap-0_default_8ab21cd99e1602159ccf69d69e2bc346_0": Error initializing source docker://k8s.gcr.io/pause:3.1: pinging docker registry returned: Get https://k8s.gcr.io/v2/: dial tcp 209.85.144.82:443: i/o timeout
```
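Both log lines report the same root cause: the sandbox (pause) image was requested from k8s.gcr.io, which is unreachable from the disconnected network. A minimal Python sketch for extracting the failing image reference and its registry from such a line; the regex is an assumption based only on the message text above, not on any documented log format, and `failing_image` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Assumed message shape, taken from the log above:
#   "Error initializing source docker://k8s.gcr.io/pause:3.1: pinging ..."
# The non-greedy group stops at the first ':' that is followed by whitespace,
# so tags (":3.1") and digests ("@sha256:...") stay inside the reference.
_SOURCE_RE = re.compile(r"Error initializing source docker://(\S+?):\s")


def failing_image(log_line: str) -> Optional[Tuple[str, str]]:
    """Return (image reference, registry host) from a sandbox error line, or None."""
    match = _SOURCE_RE.search(log_line)
    if match is None:
        return None
    image = match.group(1)
    registry = image.split("/", 1)[0]  # first path component is the registry host
    return image, registry
```

Run against either log line, this yields the reference `k8s.gcr.io/pause:3.1` and the registry `k8s.gcr.io`, which points straight at the default pause image rather than the mirrored one.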


Expected results:
The images are pulled from the private mirror registry and the installation completes.
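The expected result depends on every image the bootstrap node needs being reachable through the mirror. A minimal sketch of how a source-prefix to mirror-prefix mapping redirects image references; the mapping format, the helper names, and the mirror host `registry.example.com:5000` are all hypothetical. It illustrates why a pause image referenced from the release payload's repository can be redirected while `k8s.gcr.io/pause:3.1`, which has no mirror entry here, cannot.

```python
from typing import Mapping, Optional


def _covers(source: str, image: str) -> bool:
    """True if `image` lives under repository `source` (followed by a tag, digest, or sub-path)."""
    if not image.startswith(source):
        return False
    rest = image[len(source):]
    return rest == "" or rest[0] in "/:@"


def mirrored_reference(image: str, mirrors: Mapping[str, str]) -> Optional[str]:
    """Rewrite `image` onto its mirror, or return None if no source prefix covers it."""
    # Try longer (more specific) source prefixes first.
    for source in sorted(mirrors, key=len, reverse=True):
        if _covers(source, image):
            return mirrors[source] + image[len(source):]
    return None
```

With `{"quay.io/openshift-release-dev/ocp-v4.0-art-dev": "registry.example.com:5000/ocp4/openshift4"}` as the mapping, the digest-pinned pause image from the release payload is rewritten onto the mirror, while `k8s.gcr.io/pause:3.1` returns `None` and would have to be fetched from the internet.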

Additional info:
A similar bug, https://bugzilla.redhat.com/show_bug.cgi?id=1711844, was already fixed, so this is a regression.

This is blocking QE's restricted network testing.

Comment 1 Brenton Leanhardt 2019-11-08 14:33:38 UTC
Colin, was the removal of the pause image logic in https://github.com/openshift/installer/pull/1768/files intentional?

Comment 2 Colin Walters 2019-11-08 15:14:55 UTC
It wasn't removed, just moved, right?

That said, it could be broken...let me see.

Comment 3 Brenton Leanhardt 2019-11-08 15:17:57 UTC
Ah, correct: it was technically moved to crio-configure.sh.template. It does seem like there may be an issue. Thanks for taking a look!

Comment 4 Colin Walters 2019-11-08 15:28:29 UTC
I just did a quick test with installer master: `systemctl status crio-configure` looks fine, and

```
$ grep pause_image /etc/crio/crio.conf 
pause_image = "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e3f70f20ce6be55711b54bc266019d215963935779c178e52ea4bc717da58508"
```

also looks right.

Oh, but... I do see `k8s.gcr.io/pause` in `podman images`...
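The `grep` check above can be expressed as a small function, for example to script the same verification on a bootstrap host. A minimal sketch: it is a line-oriented approximation of `grep pause_image /etc/crio/crio.conf` that skips commented-out lines, not a full TOML parser, and `configured_pause_image` is a hypothetical helper name.

```python
import re
from typing import Optional

# Matches an uncommented `pause_image = "..."` assignment on its own line.
_PAUSE_RE = re.compile(r'^\s*pause_image\s*=\s*"([^"]*)"', re.MULTILINE)


def configured_pause_image(crio_conf: str) -> Optional[str]:
    """Return the pause_image value set in crio.conf text, or None if it is absent."""
    match = _PAUSE_RE.search(crio_conf)
    return match.group(1) if match else None
```

A `None` result is exactly the situation in which CRI-O falls back to its built-in default pause image.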

Comment 5 Colin Walters 2019-11-08 15:31:30 UTC
And further, I *don't* see the configured pause image in `podman images`. This looks like CRI-O is ignoring it... another config file compatibility issue?
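The observation in this comment amounts to a simple test: the default pause image is present locally while the configured one is not. A minimal sketch, assuming the local image list has already been turned into full references in the same form as the configured value (how that list is produced from `podman images` output is left open); `pause_config_ignored` is a hypothetical name.

```python
from typing import Iterable

# CRI-O's built-in fallback pause image, as named in this bug.
DEFAULT_PAUSE = "k8s.gcr.io/pause:3.1"


def pause_config_ignored(configured: str, local_images: Iterable[str]) -> bool:
    """True when the default pause image was pulled but the configured one was not."""
    present = set(local_images)
    return DEFAULT_PAUSE in present and configured not in present
```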

Comment 6 Colin Walters 2019-11-08 15:33:59 UTC
Possibly related to https://github.com/openshift/machine-config-operator/pull/1216 ?

Some sort of CRI-O config file format change?

Comment 7 Colin Walters 2019-11-08 15:36:46 UTC
I did verify that the cluster nodes look fine, so this only affects the bootstrap node.

Comment 8 Urvashi Mohnani 2019-11-08 17:40:55 UTC
CRI-O defaults to "k8s.gcr.io/pause:3.1" for the pause image when it doesn't find a value set in crio.conf or via the --pause-image flag. So it looks like something changed, and the actual pause image value is not being set in crio.conf on the bootstrap node.
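The fallback described in this comment can be sketched as a small resolution function. This is an illustration, not CRI-O's code: the precedence of the --pause-image flag over crio.conf is an assumption (the usual command-line-over-config-file convention), and `effective_pause_image` is a hypothetical name.

```python
from typing import Optional

# Built-in default used when neither the flag nor crio.conf sets a pause image.
DEFAULT_PAUSE_IMAGE = "k8s.gcr.io/pause:3.1"


def effective_pause_image(flag_value: Optional[str], conf_value: Optional[str]) -> str:
    """Pick the pause image: flag (assumed highest precedence), then crio.conf, then default."""
    if flag_value:
        return flag_value
    if conf_value:
        return conf_value
    return DEFAULT_PAUSE_IMAGE
```

Under this model, the bootstrap failure corresponds to `effective_pause_image(None, None)`: the configured value never takes effect, so the sandbox is created from `k8s.gcr.io/pause:3.1`, which a disconnected host cannot reach.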

Comment 11 Johnny Liu 2019-11-12 05:16:19 UTC
Verified this bug with 4.3.0-0.nightly-2019-11-12-000306: PASS.

Comment 13 errata-xmlrpc 2020-01-23 11:11:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

