Description of problem:
When installing a cluster in a disconnected environment, the master machines cannot pull the image from the registry. The following error occurs:

Aug 03 11:19:33 ip-10-0-66-177 machine-config-daemon[1939]: error: failed to run command oc (6 tries): timed out waiting for the condition: running oc image extract --path /:/run/mco-machine-os-content/os-content-608521709 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:21f2620684e969a963316a44b413b9743a78dc83c47df80cc9f6a6acb120c57c failed: error: unable to connect to image repository quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:21f2620684e969a963316a44b413b9743a78dc83c47df80cc9f6a6acb120c57c: Get "https://quay.io/v2/": Forbidden
Aug 03 11:19:33 ip-10-0-66-177 machine-config-daemon[1939]: : exit status 1
Aug 03 11:19:33 ip-10-0-66-177 systemd[1]: machine-config-daemon-firstboot.service: Main process exited, code=exited, status=1/FAILURE
Aug 03 11:19:33 ip-10-0-66-177 systemd[1]: machine-config-daemon-firstboot.service: Failed with result 'exit-code'.
Aug 03 11:19:33 ip-10-0-66-177 systemd[1]: Failed to start Machine Config Daemon Firstboot.
Aug 03 11:19:33 ip-10-0-66-177 systemd[1]: machine-config-daemon-firstboot.service: Consumed 857ms CPU time

As a result, the master nodes fail to provision.

Version-Release number of the following components:
OCP 4.6.0-0.nightly-2020-08-03-054919

How reproducible:
Always

Steps to Reproduce:
1. Create a disconnected cluster

Actual results:
Cluster creation fails

Expected results:
Cluster is created successfully

Additional info:
1. Reproduced the problem on GCP and vSphere
2. The above error message does not appear when creating a 4.5 disconnected cluster
Created attachment 1703284: machine-config-daemon-firstboot.service
Created attachment 1703285: bootstrap log
This bug blocks all tests against disconnected environments.
We see the same failure on all baremetal IPv6 jobs, which must be disconnected because Quay does not support IPv6:
Aug 03 12:25:32 master-0.ostest.test.metalkube.org machine-config-daemon[2451]: error: unable to connect to image repository quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:70cfcdee7fa0eac2578f32f197b410d0f50d5bb10ac56ba402eb758e50e76d04: Get "https://quay.io/v2/": dial tcp 34.198.42.182:443: connect: network is unreachable
I think this belongs against the MCO, which is the source of the problem. Either the MCD is running an `oc` command that doesn't account for disconnected installs, or `oc` itself has a problem.
Is https://github.com/openshift/enhancements/pull/334 another option to avoid _not_ using oc?
Adding the UpgradeBlocker keyword as well, since so far we have no indication that this doesn't block upgrades. Feel free to remove it if we find otherwise.
*** Bug 1862948 has been marked as a duplicate of this bug. ***
*** Bug 1863335 has been marked as a duplicate of this bug. ***
After a brainstorming session with Antonio today, we came up with another solution to fix the problem that involves only minimal changes:
- We keep the current implementation of CoreOS extensions support (i.e. we keep using `oc image extract`).
- Until the `oc` fix for mirror registry support gets in, whenever `oc image extract` fails, we fall back to copying machine-os-content onto the nodes using `podman pull osImageURL && podman create osImageURL && podman cp container_ID:/ /run/machine-os-content/os-content-XXXX`.
The fallback is applied only when `oc image extract` has failed.
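For illustration, a minimal shell sketch of the fallback logic described above. It is not the MCO's actual implementation: the function name `extract_os_content` and its arguments are hypothetical, passing `/var/lib/kubelet/config.json` to `podman pull --authfile` is an assumption mirroring the `--registry-config` flag in the failing `oc` call, and the final `podman rm` cleanup is an addition not mentioned in the comment. podman resolves images through `/etc/containers/registries.conf`, where mirror configuration lives, which is presumably why this path can succeed when `oc` cannot.

```shell
#!/bin/bash
# Hypothetical sketch: try `oc image extract` first and fall back to podman
# only if it fails. Not the MCO's real code.
#
# Usage (hypothetical):
#   extract_os_content "<osImageURL>" /run/mco-machine-os-content/os-content-XXXX
extract_os_content() {
  local image="$1"   # osImageURL (pull spec of machine-os-content)
  local dest="$2"    # directory to extract the image filesystem into
  local authfile="/var/lib/kubelet/config.json"  # assumed pull secret location
  local cid rc

  # Primary path: the existing `oc image extract` call.
  if oc image extract --path "/:${dest}" --registry-config "${authfile}" "${image}"; then
    return 0
  fi

  echo "oc image extract failed; falling back to podman" >&2

  # Fallback: podman consults /etc/containers/registries.conf, so configured
  # mirrors are assumed to be honored here.
  podman pull --authfile "${authfile}" "${image}" || return 1
  cid=$(podman create "${image}") || return 1

  # Copy the container's root filesystem to the destination directory.
  podman cp "${cid}:/" "${dest}"
  rc=$?

  # Cleanup (an addition): remove the temporary container.
  podman rm "${cid}" >/dev/null
  return "${rc}"
}
```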
This bug seems to affect proxy environments too (which are similar to disconnected ones: the image cannot be downloaded directly from Quay, and `oc image extract` doesn't take mirrors/proxies into account).
(In reply to Vadim Rutkovsky from comment #12)
> This bug seems to affect proxy environments too (which are similar to
> disconnected - image cannot be downloaded directly from quay and `oc image
> extract` doesn't take mirrors/proxies into account)

thanks Vadim, we're tackling that separately
(In reply to Antonio Murdaca from comment #16)
> (In reply to Vadim Rutkovsky from comment #12)
> > This bug seems to affect proxy environments too (which are similar to
> > disconnected - image cannot be downloaded directly from quay and `oc image
> > extract` doesn't take mirrors/proxies into account)
>
> thanks Vadim, we're tackling that separately

Perhaps we should reopen bug https://bugzilla.redhat.com/show_bug.cgi?id=1862948, which has the proxy setup.
@yunjiang Mike N. is on paternity leave and, additionally, we do not have the infrastructure to test disconnected installs. Could you retest this and indicate whether the BZ is verified?
(In reply to Sinny Kumari from comment #17)
> (In reply to Antonio Murdaca from comment #16)
> > (In reply to Vadim Rutkovsky from comment #12)
> > > This bug seems to affect proxy environments too (which are similar to
> > > disconnected - image cannot be downloaded directly from quay and `oc image
> > > extract` doesn't take mirrors/proxies into account)
> >
> > thanks Vadim, we're tackling that separately
>
> Perhaps we should reopen the bug
> https://bugzilla.redhat.com/show_bug.cgi?id=1862948 which has proxy setup.

No, the proxy issue is caused by the very same root cause (so I closed the proxy bug as a dupe).
verified. PASS. version: 4.6.0-0.nightly-2020-08-10-180431
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196
Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel this bug still needs to be a suspect, please add the keyword again.
[1]: https://github.com/openshift/enhancements/pull/475