Description of problem:

Version-Release number of the following components:
4.3.0-0.nightly-2019-11-07-172437

How reproducible:
Always

Steps to Reproduce:
1. Create a disconnected network env.
2. Mirror the payload into a local private registry.
3. Run a UPI install on baremetal.

Actual results:
Bootstrap failed. From the kubelet log:

Nov 08 06:32:23 qe-gpei-disbz-fc969-bootstrap-0 hyperkube[1736]: E1108 06:32:23.937243 1736 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "bootstrap-machine-config-operator-qe-gpei-disbz-fc969-bootstrap-0_default(8ab21cd99e1602159ccf69d69e2bc346)" failed: rpc error: code = Unknown desc = error creating pod sandbox with name "k8s_bootstrap-machine-config-operator-qe-gpei-disbz-fc969-bootstrap-0_default_8ab21cd99e1602159ccf69d69e2bc346_0": Error initializing source docker://k8s.gcr.io/pause:3.1: pinging docker registry returned: Get https://k8s.gcr.io/v2/: dial tcp 209.85.144.82:443: i/o timeout
Nov 08 06:32:23 qe-gpei-disbz-fc969-bootstrap-0 hyperkube[1736]: E1108 06:32:23.937264 1736 kuberuntime_manager.go:710] createPodSandbox for pod "bootstrap-machine-config-operator-qe-gpei-disbz-fc969-bootstrap-0_default(8ab21cd99e1602159ccf69d69e2bc346)" failed: rpc error: code = Unknown desc = error creating pod sandbox with name "k8s_bootstrap-machine-config-operator-qe-gpei-disbz-fc969-bootstrap-0_default_8ab21cd99e1602159ccf69d69e2bc346_0": Error initializing source docker://k8s.gcr.io/pause:3.1: pinging docker registry returned: Get https://k8s.gcr.io/v2/: dial tcp 209.85.144.82:443: i/o timeout

Expected results:
Images are pulled from the private mirror registry and the installation completes.

Additional info:
A similar bug, https://bugzilla.redhat.com/show_bug.cgi?id=1711844, is already fixed, so this is a regression. This is blocking QE's restricted network testing.
Colin, was the removal of the pause image logic in https://github.com/openshift/installer/pull/1768/files intentional?
It wasn't removed, just moved, right? That said, it could be broken...let me see.
Ahh, correct, it was technically moved to crio-configure.sh.template. Does seem like there may be an issue. Thanks for taking a look!
Just did a quick test with installer master, `systemctl status crio-configure` looks fine, and
```
$ grep pause_image /etc/crio/crio.conf
pause_image = "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e3f70f20ce6be55711b54bc266019d215963935779c178e52ea4bc717da58508"
```
also looks right. Oh but...I do see `k8s.gcr.io/pause` in `podman images`...
And further, I *don't* see the configured pause image in `podman images`. This looks like crio is ignoring it...another config file compat issue?
Possibly related to https://github.com/openshift/machine-config-operator/pull/1216 ? Some sort of crio config file format change?
I did verify the cluster nodes look fine, so this is just the bootstrap.
CRI-O defaults to `k8s.gcr.io/pause:3.1` for the pause image when it doesn't find anything set for it in crio.conf or via the --pause-image flag. So it looks like something changed and the actual pause image value is not being set in crio.conf for the bootstrap node.
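For anyone checking this on a bootstrap node, here is a rough sketch of the fallback described above. It is not CRI-O's actual code: the function names are made up for illustration, the default string is simply the one quoted in this comment thread, the regex is a naive stand-in for real TOML parsing, and the assumption that the command-line flag takes precedence over crio.conf is mine.

```python
import re

# Default as quoted in this thread; not read from CRI-O's source.
DEFAULT_PAUSE_IMAGE = "k8s.gcr.io/pause:3.1"

# Naive match for an uncommented `pause_image = "..."` line.
# A commented-out line (starting with '#') will not match.
_PAUSE_RE = re.compile(r'^\s*pause_image\s*=\s*"([^"]*)"', re.MULTILINE)


def configured_pause_image(conf_text):
    """Return the pause_image value found in crio.conf text, or None if unset or empty."""
    match = _PAUSE_RE.search(conf_text)
    if match and match.group(1):
        return match.group(1)
    return None


def effective_pause_image(conf_text, flag_value=None):
    """Hypothetical resolution order: flag, then crio.conf, then the default.

    The flag-over-file precedence is an assumption made for this sketch.
    """
    return flag_value or configured_pause_image(conf_text) or DEFAULT_PAUSE_IMAGE


if __name__ == "__main__":
    # Example input only; on a node you would read /etc/crio/crio.conf instead.
    sample = '[crio.image]\npause_image = "registry.example.com/pause@sha256:abc"\n'
    print(effective_pause_image(sample))
    print(effective_pause_image(""))
```

If this reports the mirrored image for the node's /etc/crio/crio.conf while the sandbox errors still mention `k8s.gcr.io/pause:3.1`, that would suggest the value is present in the file but is not being picked up by crio, rather than missing from the file.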
Verified this bug with 4.3.0-0.nightly-2019-11-12-000306, and PASS.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062