Description of problem:
In the QE build watch of http://10.0.76.54/buildcorp/upgrade_CI/4169/console (you can see more details there), the upgrade got stuck with the error below:
"failed to run command oc (6 tries): timed out waiting for the condition: running oc image extract --path /:/run/mco-machine-os-content/os-content-176084702 ... Timeout exceeded while awaiting headers"

Matrix: 4.5.4-x86_64 -> 4.6.0-0.nightly-2020-08-01-172303, 15_Disconnected UPI on GCP with RHEL7.7 OVN & http_proxy & Etcd Encryption on

Running the command manually indeed took quite a while, 6m+ for the 1.4G of content:
[xxia 2020-08-03 17:13:50 CST my]$ time oc image extract --path /:os-content-176084702 quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bb8cb875ed5ef903df8f3f056a3d48eaf4cca3b34af02a9d6728125ff507bcdc
real    6m18.920s
user    0m17.649s
sys     0m9.285s
[xxia 2020-08-03 17:20:10 CST my]$
[xxia 2020-08-03 17:40:49 CST my]$ du -sh os-content-176084702/
1.4G    os-content-176084702/

Version-Release number of the following components:
4.6.0-0.nightly-2020-08-01-172303

How reproducible:
Not known yet

Steps to Reproduce:
1. The QE CI launched a 4.5.4 env with the above matrix
2. The QE CI upgraded it to 4.6.0-0.nightly-2020-08-01-172303

Actual results:
2. The CI printed the upgrade process's output with one master in SchedulingDisabled state, showing "timed out waiting for the condition: running oc image extract --path /:/run/mco-machine-os-content/os-content-176084702":
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.4     True        True          25m     Working towards 4.6.0-0.nightly-2020-08-01-172303: 84% complete

NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.4     True        True          26m     Unable to apply 4.6.0-0.nightly-2020-08-01-172303: the cluster operator machine-config has not yet successfully rolled out
...
...
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.4     True        True          119m    Unable to apply 4.6.0-0.nightly-2020-08-01-172303: the cluster operator openshift-apiserver is degraded

**************Post Action after upgrade fail****************
Post action: # oc get node:
NAME                                                 STATUS                     ROLES    AGE    VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
ugdci02033536-08011937-m-0.c.openshift-qe.internal   Ready                      master   4h6m   v1.18.3+012b3ec   10.0.0.4                    Red Hat Enterprise Linux CoreOS 45.82.202007240629-0 (Ootpa)   4.18.0-193.13.2.el8_2.x86_64   cri-o://1.18.3-5.rhaos4.5.git1c13d1d.el8
ugdci02033536-08011937-m-1.c.openshift-qe.internal   Ready,SchedulingDisabled   master   4h5m   v1.18.3+012b3ec   10.0.0.6                    Red Hat Enterprise Linux CoreOS 45.82.202007240629-0 (Ootpa)   4.18.0-193.13.2.el8_2.x86_64   cri-o://1.18.3-5.rhaos4.5.git1c13d1d.el8
ugdci02033536-08011937-m-2.c.openshift-qe.internal   Ready                      master   4h6m   v1.18.3+012b3ec   10.0.0.5                    Red Hat Enterprise Linux CoreOS 45.82.202007240629-0 (Ootpa)   4.18.0-193.13.2.el8_2.x86_64   cri-o://1.18.3-5.rhaos4.5.git1c13d1d.el8
ugdci02033536-08011937-w-a-l-rhel-0                  Ready,SchedulingDisabled   worker   157m   v1.18.3+08c38ef   10.0.32.5                   Red Hat Enterprise Linux Server 7.7 (Maipo)                    3.10.0-1127.18.2.el7.x86_64    cri-o://1.18.3-8.rhaos4.5.gitbefe37e.el7
ugdci02033536-08011937-w-a-l-rhel-1                  Ready                      worker   157m   v1.18.3+08c38ef   10.0.32.6                   Red Hat Enterprise Linux Server 7.7 (Maipo)                    3.10.0-1127.18.2.el7.x86_64    cri-o://1.18.3-8.rhaos4.5.gitbefe37e.el7

Post action: # oc get co:
NAME                  VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication        4.6.0-0.nightly-2020-08-01-172303   True        False         True       107m
...
ingress               4.6.0-0.nightly-2020-08-01-172303   True        False         True       113m
...
machine-config        4.5.4                               False       True          True       103m
...
openshift-apiserver   4.6.0-0.nightly-2020-08-01-172303   True        False         True       112m
...

print detail msg for node(SchedulingDisabled) if exist:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Abnormal node details~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Name:               ugdci02033536-08011937-m-1.c.openshift-qe.internal
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=n1-standard-4
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-central1
                    failure-domain.beta.kubernetes.io/zone=us-central1-b
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ugdci02033536-08011937-m-1.c.openshift-qe.internal
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/master=
                    node.kubernetes.io/instance-type=n1-standard-4
                    node.openshift.io/os_id=rhcos
                    topology.kubernetes.io/region=us-central1
                    topology.kubernetes.io/zone=us-central1-b
Annotations:        k8s.ovn.org/l3-gateway-config: {"default":{"mode":"local","interface-id":"br-local_ugdci02033536-08011937-m-1.c.openshift-qe.internal","mac-address":"7a:2d:9d:09:04:47",...
                    k8s.ovn.org/node-chassis-id: 7459e79d-9f10-46e8-97c6-c7deae5cdf47
                    k8s.ovn.org/node-join-subnets: {"default":"100.64.0.0/29"}
                    k8s.ovn.org/node-mgmt-port-mac-address: 1a:0c:30:4e:f8:d9
                    k8s.ovn.org/node-subnets: {"default":"10.128.0.0/23"}
                    machineconfiguration.openshift.io/currentConfig: rendered-master-9cd892e38d01a0786e750356d578fada
                    machineconfiguration.openshift.io/desiredConfig: rendered-master-eb5b061fa7a90a14a31f03301d1c9e2e
                    machineconfiguration.openshift.io/reason: failed to run command oc (6 tries): timed out waiting for the condition: running oc image extract --path /:/run/mco-machine-os-content/os-... : exit status 1
                    machineconfiguration.openshift.io/state: Degraded
                    volumes.kubernetes.io/controller-managed-attach-detach: true

print detail msg for co(AVAILABLE != True or PROGRESSING != False or DEGRADED != False or version != target_version) if exist:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Abnormal co details~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
Name:         machine-config
...
Status:
  Conditions:
    Last Transition Time:  2020-08-01T22:17:55Z
    Message:               Working towards 4.6.0-0.nightly-2020-08-01-172303
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2020-08-01T22:33:26Z
    Message:               Unable to apply 4.6.0-0.nightly-2020-08-01-172303: timed out waiting for the condition during syncRequiredMachineConfigPools: pool master has not progressed to latest configuration: controller version mismatch for rendered-master-9cd892e38d01a0786e750356d578fada expected 057d852d0d10f94120aaa91e771503baa5b3c242 has 99eb744f5094224edb60d88ca85d607ab151ebdf: pool is degraded because nodes fail with "1 nodes are reporting degraded status on sync": "Node ugdci02033536-08011937-m-1.c.openshift-qe.internal is reporting: \"failed to run command oc (6 tries): timed out waiting for the condition: running oc image extract --path /:/run/mco-machine-os-content/os-content-176084702 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bb8cb875ed5ef903df8f3f056a3d48eaf4cca3b34af02a9d6728125ff507bcdc failed: error: unable to connect to image repository quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bb8cb875ed5ef903df8f3f056a3d48eaf4cca3b34af02a9d6728125ff507bcdc: Get https://quay.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\\n: exit status 1\"", retrying
    Reason:                RequiredPoolsFailed
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2020-08-01T22:15:52Z
    Message:               Cluster not available for 4.6.0-0.nightly-2020-08-01-172303
    Status:                False
    Type:                  Available
    Last Transition Time:  2020-08-01T19:56:02Z
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:
    Last Sync Error:  pool master has not progressed to latest configuration: controller version mismatch for rendered-master-9cd892e38d01a0786e750356d578fada expected 057d852d0d10f94120aaa91e771503baa5b3c242 has 99eb744f5094224edb60d88ca85d607ab151ebdf: pool is degraded because nodes fail with "1 nodes are reporting degraded status on sync": "Node ugdci02033536-08011937-m-1.c.openshift-qe.internal is reporting: \"failed to run command oc (6 tries): timed out waiting for the condition: running oc image extract --path /:/run/mco-machine-os-content/os-content-176084702 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bb8cb875ed5ef903df8f3f056a3d48eaf4cca3b34af02a9d6728125ff507bcdc failed: error: unable to connect to image repository quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bb8cb875ed5ef903df8f3f056a3d48eaf4cca3b34af02a9d6728125ff507bcdc: Get https://quay.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\\n: exit status 1\"", retrying
    Master:           pool is degraded because nodes fail with "1 nodes are reporting degraded status on sync": "Node ugdci02033536-08011937-m-1.c.openshift-qe.internal is reporting: \"failed to run command oc (6 tries): timed out waiting for the condition: running oc image extract --path /:/run/mco-machine-os-content/os-content-176084702 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bb8cb875ed5ef903df8f3f056a3d48eaf4cca3b34af02a9d6728125ff507bcdc failed: error: unable to connect to image repository quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bb8cb875ed5ef903df8f3f056a3d48eaf4cca3b34af02a9d6728125ff507bcdc: Get https://quay.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\\n: exit status 1\""
    Worker:           pool is degraded because nodes fail with "1 nodes are reporting degraded status on sync": "Node ugdci02033536-08011937-w-a-l-rhel-0 is reporting: \"failed to drain node (5 tries): timed out waiting for the condition: [error when evicting pod \\\"router-default-b465b6d9c-bzhwx\\\": global timeout reached: 1m30s, error when evicting pod \\\"router-default-76f5cc7db9-xn2rd\\\": global timeout reached: 1m30s]\""

Expected results:
No such errors.

Additional info:
The separate ingress drain issue shown above was filed as https://bugzilla.redhat.com/show_bug.cgi?id=1862892
*** This bug has been marked as a duplicate of bug 1862979 ***
(In reply to Xingxing Xia from comment #0)
> Description of problem:
> Matrix: 4.5.4-x86_64 -> 4.6.0-0.nightly-2020-08-01-172303, 15_Disconnected UPI on GCP with RHEL7.7 OVN & http_proxy & Etcd Encryption on

This bug's env had http_proxy, so it is not a pure disconnected-env bug like bug 1862979, which is a pure disconnected env without a proxy. The error message is also different: this bug's error is "Timeout exceeded while awaiting headers", while that bug's is "Get "https://quay.io/v2/": Forbidden". So I don't think it is the same issue. Reopening therefore... Thanks.
(In reply to Xingxing Xia from comment #3)
> This bug env had http_proxy

One more point for clarity: with http_proxy, this bug's env does not use a "mirror registry" to host the images at all, unlike bug 1862979.
(In reply to Xingxing Xia from comment #3)
> (In reply to Xingxing Xia from comment #0)
> > Description of problem:
> > Matrix: 4.5.4-x86_64 -> 4.6.0-0.nightly-2020-08-01-172303, 15_Disconnected UPI on GCP with RHEL7.7 OVN & http_proxy & Etcd Encryption on
> This bug's env had http_proxy, so it is not a pure disconnected-env bug like
> bug 1862979, which is a pure disconnected env without a proxy. The error
> message is also different: this bug's error is "Timeout exceeded while
> awaiting headers", while that bug's is "Get "https://quay.io/v2/":
> Forbidden". So I don't think it is the same issue. Reopening therefore...
> Thanks.

Please attach the must-gather. So far the cause appears to be the same - the MCO now uses `oc image extract` instead of `podman run`, which means mirror and proxy settings are not respected.
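To make that difference concrete, a minimal sketch (illustrative only, not the MCD's exact invocation; the digest is the one from comment #0):

# podman resolves mirrors from the host's /etc/containers/registries.conf
# (which the MCO renders from ImageContentSourcePolicy), so a disconnected pull can succeed:
podman pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bb8cb875ed5ef903df8f3f056a3d48eaf4cca3b34af02a9d6728125ff507bcdc

# oc image extract does not consult registries.conf, so without explicit mirror/proxy
# plumbing it contacts quay.io directly -- which is exactly what times out here:
oc image extract --path /:/run/mco-machine-os-content/os-content-176084702 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bb8cb875ed5ef903df8f3f056a3d48eaf4cca3b34af02a9d6728125ff507bcdc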
Unlike other failed QE CI jobs, which produced a successful must-gather, this QE CI job's must-gather failed with "must-gather file creation fails". Will rebuild it, keep the cluster, and come back.
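For reference, once the cluster is rebuilt and kept, the must-gather can be regenerated with something like the following (the destination directory name is arbitrary):

oc adm must-gather --dest-dir=./must-gather-upgrade-4169
tar czf must-gather-upgrade-4169.tar.gz must-gather-upgrade-4169/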
Reproduced in QE build upgrade testing: http://10.0.76.54/buildcorp/upgrade_CI/4207/console
Cluster profile: Disconnected IPI on Azure & Private Cluster
From 4.5.4 to a 4.6 nightly build:
oc adm upgrade --to-image=ugdci05072549.mirror-registry.qe.azure.devcluster.openshift.com:5000/openshift-release-dev/ocp-release:4.6.0-0.nightly-2020-08-04-210224 --force=true --allow-explicit-upgrade=true

Hit the same problem:
...
Status:
  Conditions:
    Last Transition Time:  2020-08-05T02:43:01Z
    Message:               Working towards 4.6.0-0.nightly-2020-08-04-210224
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2020-08-05T02:57:45Z
    Message:               Unable to apply 4.6.0-0.nightly-2020-08-04-210224: timed out waiting for the condition during syncRequiredMachineConfigPools: pool master has not progressed to latest configuration: controller version mismatch for rendered-master-09c695ebfc8d928626a4c4b84684d89a expected ade383fc8b27be6bdc6aa7985b3154350beaec88 has 99eb744f5094224edb60d88ca85d607ab151ebdf: pool is degraded because nodes fail with "1 nodes are reporting degraded status on sync": "Node ugdci05072549-x2q48-master-2 is reporting: \"failed to run command oc (6 tries): timed out waiting for the condition: running oc image extract --path /:/run/mco-machine-os-content/os-content-424723676 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:16a8dde4b893ff0b5b4aeb05474f2f5e2ce9cac45d5d3e98b40c4309e23215a7 failed: error: unable to connect to image repository quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:16a8dde4b893ff0b5b4aeb05474f2f5e2ce9cac45d5d3e98b40c4309e23215a7: Get https://quay.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\\n: exit status 1\"", retrying
    Reason:                RequiredPoolsFailed
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2020-08-05T02:41:04Z
    Message:               Cluster not available for 4.6.0-0.nightly-2020-08-04-210224
...
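Since this profile relies on the mirror registry, a quick sanity check would be to confirm the mirror mapping exists and was rendered onto the node (a sketch; the node name is the degraded master from the status above):

# Show the mirror mappings the cluster should be applying:
oc get imagecontentsourcepolicy -o yaml | grep -B1 -A3 'mirrors:'

# Confirm the MCO rendered them into the node's registries.conf:
oc debug node/ugdci05072549-x2q48-master-2 -- chroot /host cat /etc/containers/registries.conf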
Reproduced the issue in QE CI upgrade testing. Below are the details:

Build details: 4.5.5-x86_64 -> 4.6.0-0.nightly-2020-08-05-153221
Matrix: 14_Disconnected IPI on Azure & Private Cluster
Upgrade command:
./oc adm upgrade --to-image=ugdci06022546.mirror-registry.qe.azure.devcluster.openshift.com:5000/openshift-release-dev/ocp-release:4.6.0-0.nightly-2020-08-05-153221 --force=true --allow-explicit-upgrade=true

Hit the same problem:
========================
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Abnormal node details~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Name:               ugdci06022546-lk2z9-master-2
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=Standard_D8s_v3
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=centralus
                    failure-domain.beta.kubernetes.io/zone=centralus-1
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ugdci06022546-lk2z9-master-2
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/master=
                    node.kubernetes.io/instance-type=Standard_D8s_v3
                    node.openshift.io/os_id=rhcos
                    topology.kubernetes.io/region=centralus
                    topology.kubernetes.io/zone=centralus-1
Annotations:        machine.openshift.io/machine: openshift-machine-api/ugdci06022546-lk2z9-master-2
                    machineconfiguration.openshift.io/currentConfig: rendered-master-7ff4949bb27c168d12c4ed000abfbea5
                    machineconfiguration.openshift.io/desiredConfig: rendered-master-e805edb843a99dd5f5f82c9c07789565
                    machineconfiguration.openshift.io/reason: failed to run command oc (6 tries): timed out waiting for the condition: running oc image extract --path /:/run/mco-machine-os-content/os-... : exit status 1
                    machineconfiguration.openshift.io/state: Degraded
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 05 Aug 2020 15:02:27 -0400
Taints:             node-role.kubernetes.io/master:NoSchedule
                    node.kubernetes.io/unschedulable:NoSchedule
Unschedulable:      true
Lease:
  HolderIdentity:   ugdci06022546-lk2z9-master-2

Last Sync Error:  pool master has not progressed to latest configuration: controller version mismatch for rendered-master-7ff4949bb27c168d12c4ed000abfbea5 expected ade383fc8b27be6bdc6aa7985b3154350beaec88 has 807abb900cf9976a1baad66eab17c6d76016e7b7: pool is degraded because nodes fail with "1 nodes are reporting degraded status on sync": "Node ugdci06022546-lk2z9-master-2 is reporting: \"failed to run command oc (6 tries): timed out waiting for the condition: running oc image extract --path /:/run/mco-machine-os-content/os-content-308854769 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:16a8dde4b893ff0b5b4aeb05474f2f5e2ce9cac45d5d3e98b40c4309e23215a7 failed: error: unable to connect to image repository quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:16a8dde4b893ff0b5b4aeb05474f2f5e2ce9cac45d5d3e98b40c4309e23215a7: Get https://quay.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\\n: exit status 1\"", retrying
Hit a similar issue on another matrix and upgrade path. Below are the details:

Build details: 4.5.5-x86_64 -> 4.6.0-0.nightly-2020-08-05-174122
Matrix: 26_Disconnected IPI on OSP13 with https_proxy & Etcd Encryption on
Upgrade command:
oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-08-05-174122 --force=true --allow-explicit-upgrade=true

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Abnormal node details~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Name:               ugdci06040054-dl2kr-master-0
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m1.xlarge
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=regionOne
                    failure-domain.beta.kubernetes.io/zone=nova
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ugdci06040054-dl2kr-master-0
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/master=
                    node.kubernetes.io/instance-type=m1.xlarge
                    node.openshift.io/os_id=rhcos
                    topology.kubernetes.io/region=regionOne
                    topology.kubernetes.io/zone=nova
Annotations:        machine.openshift.io/machine: openshift-machine-api/ugdci06040054-dl2kr-master-0
                    machineconfiguration.openshift.io/currentConfig: rendered-master-af74a630fea233a531a58c2184fcaa29
                    machineconfiguration.openshift.io/desiredConfig: rendered-master-c1501d842176d612c7c001810cd00069
                    machineconfiguration.openshift.io/reason: failed to run command oc (6 tries): timed out waiting for the condition: running oc image extract --path /:/run/mco-machine-os-content/os-... : exit status 1
                    machineconfiguration.openshift.io/state: Degraded
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 05 Aug 2020 16:15:35 -0400
Taints:             node-role.kubernetes.io/master:NoSchedule
                    node.kubernetes.io/unschedulable:NoSchedule
Unschedulable:      true
Lease:
  HolderIdentity:   ugdci06040054-dl2kr-master-0
  AcquireTime:      <unset>
  RenewTime:        Wed, 05 Aug 2020 20:02:29 -0400

Conditions:
    Message:  Unable to apply 4.6.0-0.nightly-2020-08-05-174122: timed out waiting for the condition during syncRequiredMachineConfigPools: pool master has not progressed to latest configuration: controller version mismatch for rendered-master-af74a630fea233a531a58c2184fcaa29 expected ade383fc8b27be6bdc6aa7985b3154350beaec88 has 807abb900cf9976a1baad66eab17c6d76016e7b7: pool is degraded because nodes fail with "1 nodes are reporting degraded status on sync": "Node ugdci06040054-dl2kr-master-0 is reporting: \"failed to run command oc (6 tries): timed out waiting for the condition: running oc image extract --path /:/run/mco-machine-os-content/os-content-507018740 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:16a8dde4b893ff0b5b4aeb05474f2f5e2ce9cac45d5d3e98b40c4309e23215a7 failed: error: unable to connect to image repository quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:16a8dde4b893ff0b5b4aeb05474f2f5e2ce9cac45d5d3e98b40c4309e23215a7: Get https://quay.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\\n: exit status 1\"", retrying
    Reason:   RequiredPoolsFailed
    Status:   True
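In this https_proxy matrix, connectivity can be sanity-checked from the degraded master (a sketch; the proxy URL is a hypothetical placeholder for the cluster's configured httpsProxy):

# A direct request to quay.io should time out in this env:
oc debug node/ugdci06040054-dl2kr-master-0 -- chroot /host curl -sS --max-time 10 https://quay.io/v2/

# The same request via the proxy should get an HTTP response back:
oc debug node/ugdci06040054-dl2kr-master-0 -- chroot /host curl -sS --max-time 10 -x http://proxy.example.com:3128 https://quay.io/v2/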
The must-gather provided in comment #9 shows the same behaviour as in bug 1862979:

2020-08-06T03:35:16.45502974Z error: unable to connect to image repository quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bb8cb875ed5ef903df8f3f056a3d48eaf4cca3b34af02a9d6728125ff507bcdc: Get https://quay.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2020-08-06T03:35:16.457232233Z W0806 03:35:16.457158  264417 run.go:44] oc failed: running oc image extract --path /:/run/mco-machine-os-content/os-content-688451146 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bb8cb875ed5ef903df8f3f056a3d48eaf4cca3b34af02a9d6728125ff507bcdc failed: error: unable to connect to image repository quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bb8cb875ed5ef903df8f3f056a3d48eaf4cca3b34af02a9d6728125ff507bcdc: Get https://quay.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Updated the bug to mention proxy environments too.

*** This bug has been marked as a duplicate of bug 1862979 ***
See https://bugzilla.redhat.com/show_bug.cgi?id=1862979#c19

*** This bug has been marked as a duplicate of bug 1862979 ***
For future reference, the proxy issue was fixed in bug https://bugzilla.redhat.com/show_bug.cgi?id=1857162