Description of problem:
With the recent RHCOS 47.83 builds, the bootstrap node installs, but the master and worker nodes never complete the bootstrap installation. The Ignition (fetch) stage never completes successfully.

Version-Release number of selected component (if applicable):
The following RHCOS 47.83 builds exhibit the issue:
1. rhcos-47.83.202012072210-0
2. rhcos-47.83.202012101610-0
3. rhcos-47.83.202012140410-0
4. rhcos-47.83.202012141910-0
rhcos-47.83.202012071550-0 is the last RHCOS 47.83 build that does not have this issue; with it, the master and worker nodes bootstrap successfully.

How reproducible:
Easily reproducible; recreates every time.

Steps to Reproduce:
1. Attempt to install an OCP 4.7 build from 2020-12-08 or later, using an RHCOS 47.83 build from rhcos-47.83.202012072210-0 through rhcos-47.83.202012141910-0.

Actual results:
The master and worker nodes never complete the bootstrap installation.

Expected results:
The master and worker nodes should complete the bootstrap installation.

Additional info:
From the core@bootstrap node:

[core@bootstrap-0 ~]$ journalctl -b -f -u release-image.service -u bootkube.service
-- Logs begin at Tue 2020-12-15 06:08:48 UTC. --
Dec 15 06:29:09 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com bootkube.sh[2384]: Error: unhealthy cluster
Dec 15 06:29:09 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com bootkube.sh[2384]: etcdctl failed. Retrying in 5 seconds...
Dec 15 06:29:21 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com bootkube.sh[2384]: {"level":"warn","ts":"2020-12-15T06:29:21.352Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-c3d70c74-0d80-4b84-a12d-7e486b88de52/localhost:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp [::1]:2379: connect: connection refused\""}
Dec 15 06:29:21 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com bootkube.sh[2384]: https://localhost:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Dec 15 06:29:21 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com bootkube.sh[2384]: Error: unhealthy cluster
Dec 15 06:29:22 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com bootkube.sh[2384]: etcdctl failed. Retrying in 5 seconds...
Dec 15 06:29:33 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com bootkube.sh[2384]: {"level":"warn","ts":"2020-12-15T06:29:33.278Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-b15d996e-56b5-44fe-be04-f0b3b6865c6f/localhost:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp [::1]:2379: connect: connection refused\""}
Dec 15 06:29:33 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com bootkube.sh[2384]: https://localhost:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Dec 15 06:29:33 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com bootkube.sh[2384]: Error: unhealthy cluster
Dec 15 06:29:33 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com bootkube.sh[2384]: etcdctl failed. Retrying in 5 seconds...
Dec 15 06:29:45 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com bootkube.sh[2384]: {"level":"warn","ts":"2020-12-15T06:29:45.150Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-2396fc88-120e-49c0-b3bb-51dfc7509765/localhost:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp [::1]:2379: connect: connection refused\""}
Dec 15 06:29:45 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com bootkube.sh[2384]: https://localhost:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Dec 15 06:29:45 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com bootkube.sh[2384]: Error: unhealthy cluster
Dec 15 06:29:45 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com bootkube.sh[2384]: etcdctl failed. Retrying in 5 seconds...
Dec 15 06:29:56 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com bootkube.sh[2384]: {"level":"warn","ts":"2020-12-15T06:29:56.750Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-8e0db3e9-35cd-4b60-9fe7-7e8871ef27f8/localhost:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp [::1]:2379: connect: connection refused\""}
Dec 15 06:29:56 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com bootkube.sh[2384]: https://localhost:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Dec 15 06:29:56 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com bootkube.sh[2384]: Error: unhealthy cluster
Dec 15 06:29:57 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com bootkube.sh[2384]: etcdctl failed. Retrying in 5 seconds...
^C
[core@bootstrap-0 ~]$

[core@bootstrap-0 ~]$ journalctl | tail -30
Dec 15 06:30:24 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com hyperkube[2364]: I1215 06:30:24.474123 2364 kubelet_node_status.go:526] Recording NodeHasSufficientMemory event message for node bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com
Dec 15 06:30:24 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com hyperkube[2364]: I1215 06:30:24.474160 2364 kubelet_node_status.go:526] Recording NodeHasNoDiskPressure event message for node bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com
Dec 15 06:30:24 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com hyperkube[2364]: I1215 06:30:24.474169 2364 kubelet_node_status.go:526] Recording NodeHasSufficientPID event message for node bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com
Dec 15 06:30:24 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com hyperkube[2364]: I1215 06:30:24.474283 2364 kuberuntime_manager.go:425] No sandbox for pod "bootstrap-machine-config-operator-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com_default(98c4cfd884e851cfb3a66f337acbd418)" can be found. Need to start a new one
Dec 15 06:30:24 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com crio[2081]: time="2020-12-15 06:30:24.477729593Z" level=info msg="Running pod sandbox: default/bootstrap-machine-config-operator-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com/POD" id=83695628-30bf-46de-a693-9533a8143a66 name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Dec 15 06:30:25 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com crio[2081]: time="2020-12-15 06:30:25.528861289Z" level=info msg="RunSandbox: releasing container name: k8s_POD_bootstrap-machine-config-operator-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com_default_98c4cfd884e851cfb3a66f337acbd418_0" id=83695628-30bf-46de-a693-9533a8143a66 name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Dec 15 06:30:25 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com crio[2081]: time="2020-12-15 06:30:25.529386162Z" level=info msg="RunSandbox: releasing pod sandbox name: k8s_bootstrap-machine-config-operator-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com_default_98c4cfd884e851cfb3a66f337acbd418_0" id=83695628-30bf-46de-a693-9533a8143a66 name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Dec 15 06:30:25 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com hyperkube[2364]: E1215 06:30:25.530409 2364 remote_runtime.go:113] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = error creating pod sandbox with name "k8s_bootstrap-machine-config-operator-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com_default_98c4cfd884e851cfb3a66f337acbd418_0": Error initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:65f23040cd19b6eaf5d15ad9fc0526acff11acd65c9e21d97be1e010397e79d4: Error reading manifest sha256:65f23040cd19b6eaf5d15ad9fc0526acff11acd65c9e21d97be1e010397e79d4 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized
Dec 15 06:30:25 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com hyperkube[2364]: E1215 06:30:25.530488 2364 kuberuntime_sandbox.go:70] CreatePodSandbox for pod "bootstrap-machine-config-operator-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com_default(98c4cfd884e851cfb3a66f337acbd418)" failed: rpc error: code = Unknown desc = error creating pod sandbox with name "k8s_bootstrap-machine-config-operator-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com_default_98c4cfd884e851cfb3a66f337acbd418_0": Error initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:65f23040cd19b6eaf5d15ad9fc0526acff11acd65c9e21d97be1e010397e79d4: Error reading manifest sha256:65f23040cd19b6eaf5d15ad9fc0526acff11acd65c9e21d97be1e010397e79d4 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized
Dec 15 06:30:25 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com hyperkube[2364]: E1215 06:30:25.530505 2364 kuberuntime_manager.go:730] createPodSandbox for pod "bootstrap-machine-config-operator-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com_default(98c4cfd884e851cfb3a66f337acbd418)" failed: rpc error: code = Unknown desc = error creating pod sandbox with name "k8s_bootstrap-machine-config-operator-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com_default_98c4cfd884e851cfb3a66f337acbd418_0": Error initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:65f23040cd19b6eaf5d15ad9fc0526acff11acd65c9e21d97be1e010397e79d4: Error reading manifest sha256:65f23040cd19b6eaf5d15ad9fc0526acff11acd65c9e21d97be1e010397e79d4 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized
Dec 15 06:30:25 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com hyperkube[2364]: E1215 06:30:25.530577 2364 pod_workers.go:191] Error syncing pod 98c4cfd884e851cfb3a66f337acbd418 ("bootstrap-machine-config-operator-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com_default(98c4cfd884e851cfb3a66f337acbd418)"), skipping: failed to "CreatePodSandbox" for "bootstrap-machine-config-operator-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com_default(98c4cfd884e851cfb3a66f337acbd418)" with CreatePodSandboxError: "CreatePodSandbox for pod \"bootstrap-machine-config-operator-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com_default(98c4cfd884e851cfb3a66f337acbd418)\" failed: rpc error: code = Unknown desc = error creating pod sandbox with name \"k8s_bootstrap-machine-config-operator-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com_default_98c4cfd884e851cfb3a66f337acbd418_0\": Error initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:65f23040cd19b6eaf5d15ad9fc0526acff11acd65c9e21d97be1e010397e79d4: Error reading manifest sha256:65f23040cd19b6eaf5d15ad9fc0526acff11acd65c9e21d97be1e010397e79d4 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized"
Dec 15 06:30:25 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com systemd[1]: Couldn't stat device /dev/char/10:200: No such file or directory
Dec 15 06:30:25 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com systemd[1]: Started libcontainer container 00119b3612bdbcadae212c18853b356a0c04aef3ff7dd96c5c19ccff2919c15f.
Dec 15 06:30:25 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com systemd[1]: Couldn't stat device /dev/char/10:200: No such file or directory
Dec 15 06:30:26 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com hyperkube[2364]: I1215 06:30:26.451219 2364 kubelet_node_status.go:334] Setting node annotation to enable volume controller attach/detach
Dec 15 06:30:26 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com hyperkube[2364]: I1215 06:30:26.469952 2364 kubelet_node_status.go:526] Recording NodeHasSufficientMemory event message for node bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com
Dec 15 06:30:26 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com hyperkube[2364]: I1215 06:30:26.469996 2364 kubelet_node_status.go:526] Recording NodeHasNoDiskPressure event message for node bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com
Dec 15 06:30:26 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com hyperkube[2364]: I1215 06:30:26.470005 2364 kubelet_node_status.go:526] Recording NodeHasSufficientPID event message for node bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com
Dec 15 06:30:26 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com hyperkube[2364]: I1215 06:30:26.470120 2364 kuberuntime_manager.go:425] No sandbox for pod "etcd-bootstrap-member-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com_openshift-etcd(676a77c1814571d9f14f63208341374c)" can be found. Need to start a new one
Dec 15 06:30:26 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com crio[2081]: time="2020-12-15 06:30:26.471027504Z" level=info msg="Running pod sandbox: openshift-etcd/etcd-bootstrap-member-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com/POD" id=cc0e5b6c-9cbf-4b15-9967-34fe1b1cbf70 name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Dec 15 06:30:27 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com crio[2081]: time="2020-12-15 06:30:27.367015547Z" level=info msg="RunSandbox: releasing container name: k8s_POD_etcd-bootstrap-member-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com_openshift-etcd_676a77c1814571d9f14f63208341374c_0" id=cc0e5b6c-9cbf-4b15-9967-34fe1b1cbf70 name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Dec 15 06:30:27 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com crio[2081]: time="2020-12-15 06:30:27.367802991Z" level=info msg="RunSandbox: releasing pod sandbox name: k8s_etcd-bootstrap-member-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com_openshift-etcd_676a77c1814571d9f14f63208341374c_0" id=cc0e5b6c-9cbf-4b15-9967-34fe1b1cbf70 name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Dec 15 06:30:27 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com hyperkube[2364]: E1215 06:30:27.369075 2364 remote_runtime.go:113] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = error creating pod sandbox with name "k8s_etcd-bootstrap-member-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com_openshift-etcd_676a77c1814571d9f14f63208341374c_0": Error initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:65f23040cd19b6eaf5d15ad9fc0526acff11acd65c9e21d97be1e010397e79d4: Error reading manifest sha256:65f23040cd19b6eaf5d15ad9fc0526acff11acd65c9e21d97be1e010397e79d4 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized
Dec 15 06:30:27 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com hyperkube[2364]: E1215 06:30:27.369157 2364 kuberuntime_sandbox.go:70] CreatePodSandbox for pod "etcd-bootstrap-member-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com_openshift-etcd(676a77c1814571d9f14f63208341374c)" failed: rpc error: code = Unknown desc = error creating pod sandbox with name "k8s_etcd-bootstrap-member-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com_openshift-etcd_676a77c1814571d9f14f63208341374c_0": Error initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:65f23040cd19b6eaf5d15ad9fc0526acff11acd65c9e21d97be1e010397e79d4: Error reading manifest sha256:65f23040cd19b6eaf5d15ad9fc0526acff11acd65c9e21d97be1e010397e79d4 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized
Dec 15 06:30:27 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com hyperkube[2364]: E1215 06:30:27.369171 2364 kuberuntime_manager.go:730] createPodSandbox for pod "etcd-bootstrap-member-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com_openshift-etcd(676a77c1814571d9f14f63208341374c)" failed: rpc error: code = Unknown desc = error creating pod sandbox with name "k8s_etcd-bootstrap-member-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com_openshift-etcd_676a77c1814571d9f14f63208341374c_0": Error initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:65f23040cd19b6eaf5d15ad9fc0526acff11acd65c9e21d97be1e010397e79d4: Error reading manifest sha256:65f23040cd19b6eaf5d15ad9fc0526acff11acd65c9e21d97be1e010397e79d4 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized
Dec 15 06:30:27 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com hyperkube[2364]: E1215 06:30:27.369238 2364 pod_workers.go:191] Error syncing pod 676a77c1814571d9f14f63208341374c ("etcd-bootstrap-member-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com_openshift-etcd(676a77c1814571d9f14f63208341374c)"), skipping: failed to "CreatePodSandbox" for "etcd-bootstrap-member-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com_openshift-etcd(676a77c1814571d9f14f63208341374c)" with CreatePodSandboxError: "CreatePodSandbox for pod \"etcd-bootstrap-member-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com_openshift-etcd(676a77c1814571d9f14f63208341374c)\" failed: rpc error: code = Unknown desc = error creating pod sandbox with name \"k8s_etcd-bootstrap-member-bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com_openshift-etcd_676a77c1814571d9f14f63208341374c_0\": Error initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:65f23040cd19b6eaf5d15ad9fc0526acff11acd65c9e21d97be1e010397e79d4: Error reading manifest sha256:65f23040cd19b6eaf5d15ad9fc0526acff11acd65c9e21d97be1e010397e79d4 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized"
Dec 15 06:30:29 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com hyperkube[2364]: I1215 06:30:29.602405 2364 kubelet_node_status.go:334] Setting node annotation to enable volume controller attach/detach
Dec 15 06:30:29 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com hyperkube[2364]: I1215 06:30:29.627418 2364 kubelet_node_status.go:526] Recording NodeHasSufficientMemory event message for node bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com
Dec 15 06:30:29 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com hyperkube[2364]: I1215 06:30:29.627452 2364 kubelet_node_status.go:526] Recording NodeHasNoDiskPressure event message for node bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com
Dec 15 06:30:29 bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com hyperkube[2364]: I1215 06:30:29.627461 2364 kubelet_node_status.go:526] Recording NodeHasSufficientPID event message for node bootstrap-0.pok-98.ocptest.pok.stglabs.ibm.com
[core@bootstrap-0 ~]$

Thank you.
What kind of environment is this? Bare metal or cloud? What architecture? Could you please provide the complete journal from a master or worker showing the full boot and Ignition stage failing?
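For reference, one way to capture that journal (a rough sketch; the hostname and key path below are placeholders for this environment):

# From the installation host, pull the full journal for the current boot off a failing node:
$ ssh -i ~/.ssh/id_rsa core@master-0.example.com 'journalctl -b --no-pager' > master-0-journal.log

# If the node has rebooted since the Ignition failure, list prior boots and grab the relevant one:
$ ssh -i ~/.ssh/id_rsa core@master-0.example.com 'journalctl --list-boots --no-pager'
$ ssh -i ~/.ssh/id_rsa core@master-0.example.com 'journalctl -b -1 --no-pager' > master-0-prev-boot.log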
I currently have the 4.7 nightly running on s390x and I did not see this issue. The CoreOS version is one of the ones listed above.

[cbaus@rock-kvmlp-3 ~]$ oc get clusterversion
NAME      VERSION                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-s390x-2020-12-10-120744   True        False         3d18h   Cluster version is 4.7.0-0.nightly-s390x-2020-12-10-120744

[cbaus@rock-kvmlp-3 ~]$ oc describe node cbaus-ocp-zvzn6-master-0
. . .
System Info:
  Machine ID:                 22fa410fa0474cd69926781f6d0c935e
  System UUID:                22fa410fa0474cd69926781f6d0c935e
  Boot ID:                    dd8b10b0-3ff7-4cda-ad30-26a430d23298
  Kernel Version:             4.18.0-240.7.1.el8_3.s390x
  OS Image:                   Red Hat Enterprise Linux CoreOS 47.83.202012072210-0 (Ootpa)
  Operating System:           linux
  Architecture:               s390x
  Container Runtime Version:  cri-o://1.20.0-0.rhaos4.7.gitb03c34a.el8.30-dev
  Kubelet Version:            v1.19.2+ad738ba
  Kube-Proxy Version:         v1.19.2+ad738ba
. . .

I noticed this error in the log above, which points to a credentials issue with quay:

Error initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:65f23040cd19b6eaf5d15ad9fc0526acff11acd65c9e21d97be1e010397e79d4: Error reading manifest sha256:65f23040cd19b6eaf5d15ad9fc0526acff11acd65c9e21d97be1e010397e79d4 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized
Folks,

1. This issue is seen on multiple System z zVM LPARs.

2. The following RHCOS 47.83 build installs the bootstrap and then the masters and workers boot successfully:
   1. rhcos-47.83.202012071550-0

3. The following RHCOS 47.83 builds successfully install the bootstrap, but the masters and workers do not boot successfully:
   1. rhcos-47.83.202012072210-0
   2. rhcos-47.83.202012101610-0
   3. rhcos-47.83.202012140410-0
   4. rhcos-47.83.202012141910-0

4. Practically all OCP 4.7 builds since December 12th (and many previous builds) install successfully when using the RHCOS 47.83 rhcos-47.83.202012071550-0 build installed on the bootstrap.

5. Using the RHCOS 47.83 rhcos-47.83.202012071550-0 build installed on the bootstrap, the masters and workers then successfully install these RHCOS build levels as required by specific OCP 4.7 builds:
   1. rhcos-47.83.202012072210-0
   2. rhcos-47.83.202012101610-0
   3. rhcos-47.83.202012140410-0
   4. rhcos-47.83.202012141910-0

6. The issue for this defect is when using any of the above 4 RHCOS builds to install on the bootstrap node.

Thank you,
Kyle
I can confirm that I am seeing this issue in a couple of places too:

- libvirt IPI install with 47.83.202012101610-0 as the bootimage. I updated the bootimage just for a test and found that it has this same issue. The default, older image (47.83.202012030410-0) works fine.
- zVM bare-metal platform with 47.83.202012101610-0 as the bootimage.

Something seems to have changed recently (inside RHCOS?) that is producing this unauthorized request error. This doesn't seem arch-specific either; just to confirm, I will also try with an updated bootimage on x86.

@Micah - off the top of your head, can you think of any changes that could have caused this?
Is there anything like an HTTP proxy involved here?

In order to debug this, I'd ssh to the bootstrap and try e.g. reverting podman/cri-o (if indeed they changed), or request debug-level output from them.

Or, instead of ssh-to-bootstrap, start up a standalone RHCOS node with just your pull secret; that should be basically the same.

Another avenue is to look at the package-level diffs.
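For example (a rough sketch; the pull-secret path is an assumption for the bootstrap environment and may differ):

# On the bootstrap node, confirm which container-tool versions are installed:
$ rpm -q cri-o podman skopeo

# Retry the exact pull cri-o is failing on, with the same pull secret, to see whether the failure reproduces outside cri-o:
$ sudo podman pull --authfile /root/.docker/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:65f23040cd19b6eaf5d15ad9fc0526acff11acd65c9e21d97be1e010397e79d4

# skopeo with --debug shows the registry auth exchange in detail:
$ sudo skopeo --debug inspect --authfile /root/.docker/config.json docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:65f23040cd19b6eaf5d15ad9fc0526acff11acd65c9e21d97be1e010397e79d4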
OK, I see the same for x86 too after updating the bootimage to the latest (47.83.202012142132-0).

@Kyle - you mentioned that the build that started failing is 47.83.202012072210-0? In that build I see that cri-o was updated:
`cri-o 1.20.0-0.rhaos4.7.gitb5f76f7.el8.29 → 1.20.0-0.rhaos4.7.gitb03c34a.el8.30`
So this could be related to cri-o. Let me confirm with that build and the previous one to make sure.
Prashanth,

Thank you. Yes, we started seeing this issue with the RHCOS 47.83 rhcos-47.83.202012072210-0 build.

Thank you,
Kyle
(In reply to Colin Walters from comment #5)
> Is there anything like an HTTP proxy involved here?
>
> In order to debug this, I'd ssh to the bootstrap and try e.g. reverting
> podman/cri-o (if indeed they changed), or request debug-level output from them.
>
> Or, instead of ssh-to-bootstrap, start up a standalone RHCOS node with just
> your pull secret; that should be basically the same.
>
> Another avenue is to look at the package-level diffs.

Thanks for this idea @walters! I tried this, and indeed downgrading cri-o worked. The cri-o version 1.20.0-0.rhaos4.7.gitb03c34a.el8.30 seems to have this issue. I downgraded to cri-o-1.20.0-0.rhaos4.7.gitb5f76f7.el8.29.s390x and things started working again.

I will move this to the cri-o team for investigation.
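For anyone else who wants to reproduce the check, a downgrade like this can be done on an RHCOS node along the following lines (a sketch only; the RPM URL is a placeholder, not the actual location used here):

# Fetch the older cri-o RPM and lay it over the current deployment, then reboot for it to take effect:
$ curl -LO http://example.com/packages/cri-o-1.20.0-0.rhaos4.7.gitb5f76f7.el8.29.s390x.rpm
$ sudo rpm-ostree override replace ./cri-o-1.20.0-0.rhaos4.7.gitb5f76f7.el8.29.s390x.rpm
$ sudo systemctl reboot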
There were only three commits between those two versions:

b03c34aa7dd60777cac081d9aa2bb67c7d35bab5 Merge pull request #4397 from saschagrunert/k-release-0.6.0
5e80372b7f65218a9814ba077d33224755d79281 Increase release-notes run timeout to 30m
7150db5ba2877e926675fc987fe1542d653a6a0c Bump k/release to v0.6.0

The only suspicious piece is that we bumped containers/image from v5.5.2 to v5.7.0, which is a change that could cause this (not sure why yet).
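To double-check where that bump came from, the vendoring metadata can be diffed between the two commits in a local cri-o checkout (sketch, using the SHAs above):

$ git log --oneline b5f76f7..b03c34a
$ git diff b5f76f7 b03c34a -- go.mod vendor/modules.txt | grep -B2 -A2 'containers/image'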
The thing I don't understand at all is why this bug isn't happening to all of CI; is there something e.g. special about your pull secrets? Could you perhaps have multiple secrets for quay.io and we're picking the wrong one now?
(In reply to Colin Walters from comment #10)
> The thing I don't understand at all is why this bug isn't happening to all
> of CI; is there something e.g. special about your pull secrets?
> Could you perhaps have multiple secrets for quay.io and we're picking the
> wrong one now?

The pull secret is the regular pull secret from try.openshift.com; no changes to that. The thing is, this issue is only seen on the bootstrap node. Once the masters are up, they seem to work fine and pull images from quay without problems, and they are running the same cri-o version as well. Maybe it has something to do with where the config.json is stored on the bootstrap vs. the masters? Not sure.
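One quick way to compare the two (a sketch; the paths below are the usual OCP locations but should be treated as assumptions for this setup):

# On the bootstrap and on a master: where does a pull secret exist, and which auth file is cri-o configured to use?
$ for f in /root/.docker/config.json /var/lib/kubelet/config.json; do echo "== $f"; sudo test -s "$f" && echo present || echo missing; done
$ sudo grep -rn 'global_auth_file' /etc/crio/crio.conf /etc/crio/crio.conf.d/ 2>/dev/null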
cri-o PR https://github.com/cri-o/cri-o/pull/4462
Marking this as a test blocker for UPI installs as documented. This is because customers are pointed to the latest RHCOS images when they install following the docs, whereas our IPI installs (and many of our test scenarios) use a fixed RHCOS version (which is bumped when the release branch is cut). This has resulted in the up-to-date RHCOS image being broken for deployment purposes since December 7th, 2020. See https://coreos.slack.com/archives/CH76YSYSC/p1610037169365200 for the full context.
Folks,

1. This issue is also seen on the OCP 4.7 public mirror build, 4.7.0-fc.0, at:
   https://mirror.openshift.com/pub/openshift-v4/s390x/clients/ocp-dev-preview/4.7.0-fc.0/
2. This OCP 4.7 build uses the RHCOS 47.83.202012182210-0 build.

Thank you,
Kyle
Folks,

1. This issue is also seen with the RHCOS 47.83.202101041010-0 build.

Thank you,
Kyle
Folks,

1. This issue is also seen with the RHCOS 47.83.202101081310-0 build.

Thank you,
Kyle
@Kyle - https://releases-rhcos-art.cloud.privileged.psi.redhat.com/?stream=releases/rhcos-4.7-s390x&release=47.83.202101100110-0#47.83.202101100110-0 has the updated cri-o with the fix. Could you please test it?
Prashanth,

Thank you for the information. Using the RHCOS 47.83.202101100110-0 build, we have successfully tested the following OCP 4.7 builds on both zVM using ECKD and FCP storage, and on KVM using ECKD and FCP storage:

1. 4.7.0-0.nightly-s390x-2021-01-09-013834
2. 4.7.0-0.nightly-s390x-2021-01-09-083526
3. 4.7.0-0.nightly-s390x-2021-01-09-144807
4. 4.7.0-0.nightly-s390x-2021-01-09-202949
5. 4.7.0-fc.0
6. 4.7.0-fc.1
7. 4.7.0-fc.2

Thank you,
Kyle
Verified on 4.7.0-0.nightly-2021-01-19-033533 with RHCOS 47.83.202101171239-0 using a UPI install.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-01-19-033533   True        False         89m     Cluster version is 4.7.0-0.nightly-2021-01-19-033533

$ oc get nodes -o wide
NAME                                        STATUS   ROLES    AGE    VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-56-217.us-east-2.compute.internal   Ready    master   114m   v1.20.0+d9c52cc   10.0.56.217   <none>        Red Hat Enterprise Linux CoreOS 47.83.202101171239-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd9f17c8.el8.42
ip-10-0-59-181.us-east-2.compute.internal   Ready    master   114m   v1.20.0+d9c52cc   10.0.59.181   <none>        Red Hat Enterprise Linux CoreOS 47.83.202101171239-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd9f17c8.el8.42
ip-10-0-63-227.us-east-2.compute.internal   Ready    worker   104m   v1.20.0+d9c52cc   10.0.63.227   <none>        Red Hat Enterprise Linux CoreOS 47.83.202101171239-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd9f17c8.el8.42
ip-10-0-69-79.us-east-2.compute.internal    Ready    master   115m   v1.20.0+d9c52cc   10.0.69.79    <none>        Red Hat Enterprise Linux CoreOS 47.83.202101171239-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd9f17c8.el8.42
ip-10-0-70-235.us-east-2.compute.internal   Ready    worker   104m   v1.20.0+d9c52cc   10.0.70.235   <none>        Red Hat Enterprise Linux CoreOS 47.83.202101171239-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd9f17c8.el8.42
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633