Bug 2035757 - [IPI on Alibabacloud] one master node turned NotReady, which caused the installation to fail
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.10.0
Assignee: aos-install
QA Contact: Jianli Wei
URL:
Whiteboard:
Duplicates: 2005647
Depends On:
Blocks:
 
Reported: 2021-12-27 12:11 UTC by Jianli Wei
Modified: 2023-09-15 01:18 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:36:35 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift installer pull 5535 0 None Draft Bug 2035757: cluster-bootstrap/alibaba: set tear-down-delay to wait kube-apiserver rolls out on AlibabaCloud 2022-01-17 17:54:48 UTC
Red Hat Product Errata RHSA-2022:0056 0 None None None 2022-03-10 16:36:48 UTC
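The linked PR's title suggests the fix adds a tear-down delay to cluster-bootstrap on Alibaba Cloud, so the bootstrap control plane is not torn down before the permanent kube-apiserver finishes rolling out. The following is only a rough sketch of that idea; the --tear-down-delay flag name is taken from the PR title and the value shown is made up (see PR 5535 for the actual change):

$ # hypothetical bootkube.sh invocation, illustrative only
$ cluster-bootstrap start --asset-dir=/assets --tear-down-delay=2m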

Description Jianli Wei 2021-12-27 12:11:27 UTC
Version:
$ openshift-install version
openshift-install 4.10.0-0.nightly-2021-12-23-153012
built from commit 94a3ed9cbe4db66dc50dab8b85d2abf40fb56426
release image registry.ci.openshift.org/ocp/release@sha256:39cacdae6214efce10005054fb492f02d26b59fe9d23686dc17ec8a42f428534
release architecture amd64

Platform: alibabacloud

Please specify:
* IPI (automated install with `openshift-install`. If you don't know, then it's IPI)

What happened?
One master node turned NotReady. The kubelet on that node kept reporting liveness-probe failures for the cloud-controller-manager: "Get \"http://127.0.0.1:10258/healthz\": dial tcp 127.0.0.1:10258: connect: connection refused".

$ oc get nodes
NAME                                          STATUS     ROLES    AGE    VERSION
jiwei-cc-7cgps-master-0                       Ready      master   179m   v1.22.1+6859754
jiwei-cc-7cgps-master-1                       Ready      master   179m   v1.22.1+6859754
jiwei-cc-7cgps-master-2                       NotReady   master   3h1m   v1.22.1+6859754
jiwei-cc-7cgps-worker-ap-northeast-1a-sgfhf   Ready      worker   168m   v1.22.1+6859754
jiwei-cc-7cgps-worker-ap-northeast-1b-z9wt4   Ready      worker   167m   v1.22.1+6859754
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          3h3m    Unable to apply 4.10.0-0.nightly-2021-12-23-153012: an unknown error has occurred: MultipleErrors
$ 
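(Port 10258 in the error above is the cloud-controller-manager healthz port. A quick hedged check, assuming oc debug access to the node still works and using the node/pod names from this report:)
$ oc debug node/jiwei-cc-7cgps-master-2 -- chroot /host curl -s http://127.0.0.1:10258/healthz
$ oc -n openshift-cloud-controller-manager logs alibaba-cloud-controller-manager-66cc9ff74b-8s8lr -c cloud-controller-manager --previous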

What did you expect to happen?
All nodes should be Ready and installation should succeed.

How to reproduce it (as minimally and precisely as possible)?
Not sure; we hit this issue once or twice per week.

Anything else we need to know?
>> master-2 kubelet logs around "2021-12-27 07:38:40":
Dec 27 07:38:33 jiwei-cc-7cgps-master-2 hyperkube[1351]: I1227 07:38:33.516968    1351 patch_prober.go:29] interesting pod/alibaba-cloud-controller-manager-66cc9ff74b-8s8lr container/cloud-controller-manager namespace/openshift-cloud-controller-manager: Liveness probe status=failure output="Get \"http://127.0.0.1:10258/healthz\": dial tcp 127.0.0.1:10258: connect: connection refused" start-of-body=
Dec 27 07:38:33 jiwei-cc-7cgps-master-2 hyperkube[1351]: I1227 07:38:33.517017    1351 prober.go:116] "Probe failed" probeType="Liveness" pod="openshift-cloud-controller-manager/alibaba-cloud-controller-manager-66cc9ff74b-8s8lr" podUID=2903ee36-c237-41fc-afb6-9cb069b45c41 containerName="cloud-controller-manager" probeResult=failure output="Get \"http://127.0.0.1:10258/healthz\": dial tcp 127.0.0.1:10258: connect: connection refused"
Dec 27 07:38:33 jiwei-cc-7cgps-master-2 hyperkube[1351]: E1227 07:38:33.680772    1351 kubelet_node_status.go:497] "Error updating node status, will retry" err="error getting node \"jiwei-cc-7cgps-master-2\": Get \"https://api-int.jiwei-cc.alicloud-qe.devcluster.openshift.com:6443/api/v1/nodes/jiwei-cc-7cgps-master-2?timeout=10s\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
Dec 27 07:38:33 jiwei-cc-7cgps-master-2 hyperkube[1351]: E1227 07:38:33.680798    1351 kubelet_node_status.go:484] "Unable to update node status" err="update node status exceeds retry count"
Dec 27 07:38:34 jiwei-cc-7cgps-master-2 hyperkube[1351]: E1227 07:38:34.180310    1351 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://api-int.jiwei-cc.alicloud-qe.devcluster.openshift.com:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/jiwei-cc-7cgps-master-2?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Dec 27 07:38:43 jiwei-cc-7cgps-master-2 hyperkube[1351]: I1227 07:38:43.516885    1351 patch_prober.go:29] interesting pod/alibaba-cloud-controller-manager-66cc9ff74b-8s8lr container/cloud-controller-manager namespace/openshift-cloud-controller-manager: Liveness probe status=failure output="Get \"http://127.0.0.1:10258/healthz\": dial tcp 127.0.0.1:10258: connect: connection refused" start-of-body=
Dec 27 07:38:43 jiwei-cc-7cgps-master-2 hyperkube[1351]: I1227 07:38:43.516948    1351 prober.go:116] "Probe failed" probeType="Liveness" pod="openshift-cloud-controller-manager/alibaba-cloud-controller-manager-66cc9ff74b-8s8lr" podUID=2903ee36-c237-41fc-afb6-9cb069b45c41 containerName="cloud-controller-manager" probeResult=failure output="Get \"http://127.0.0.1:10258/healthz\": dial tcp 127.0.0.1:10258: connect: connection refused"
Dec 27 07:38:45 jiwei-cc-7cgps-master-2 hyperkube[1351]: I1227 07:38:45.535993    1351 generic.go:296] "Generic (PLEG): container finished" podID=2903ee36-c237-41fc-afb6-9cb069b45c41 containerID="8176a37c73ee84b38872f1774263b4225f4ef8b678d232499a71e097ff00e08c" exitCode=1
Dec 27 07:38:45 jiwei-cc-7cgps-master-2 hyperkube[1351]: I1227 07:38:45.536014    1351 kubelet.go:2115] "SyncLoop (PLEG): event for pod" pod="openshift-cloud-controller-manager/alibaba-cloud-controller-manager-66cc9ff74b-8s8lr" event=&{ID:2903ee36-c237-41fc-afb6-9cb069b45c41 Type:ContainerDied Data:8176a37c73ee84b38872f1774263b4225f4ef8b678d232499a71e097ff00e08c}
Dec 27 07:38:45 jiwei-cc-7cgps-master-2 hyperkube[1351]: I1227 07:38:45.536050    1351 scope.go:110] "RemoveContainer" containerID="9d3363d96f7b7c3bdebdf5a52b75d7814e8ae2bfcda391aa6ee7d16ef54f7751"
Dec 27 07:38:45 jiwei-cc-7cgps-master-2 hyperkube[1351]: I1227 07:38:45.536526    1351 scope.go:110] "RemoveContainer" containerID="8176a37c73ee84b38872f1774263b4225f4ef8b678d232499a71e097ff00e08c"
Dec 27 07:38:45 jiwei-cc-7cgps-master-2 hyperkube[1351]: E1227 07:38:45.536934    1351 pod_workers.go:836] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"cloud-controller-manager\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=cloud-controller-manager pod=alibaba-cloud-controller-manager-66cc9ff74b-8s8lr_openshift-cloud-controller-manager(2903ee36-c237-41fc-afb6-9cb069b45c41)\"" pod="openshift-cloud-controller-manager/alibaba-cloud-controller-manager-66cc9ff74b-8s8lr" podUID=2903ee36-c237-41fc-afb6-9cb069b45c41
Dec 27 07:38:48 jiwei-cc-7cgps-master-2 hyperkube[1351]: E1227 07:38:48.077850    1351 event.go:273] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"revision-pruner-7-jiwei-cc-7cgps-master-2.16c48c6840bd4760", GenerateName:"", Namespace:"openshift-kube-scheduler", SelfLink:"", UID:"", ResourceVersion:"14325", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"openshift-kube-scheduler", Name:"revision-pruner-7-jiwei-cc-7cgps-master-2", UID:"3a7c704e-e07d-49c0-953e-27286b15a126", APIVersion:"v1", ResourceVersion:"13950", FieldPath:""}, Reason:"FailedKillPod", Message:"error killing pod: failed to \"KillPodSandbox\" for \"3a7c704e-e07d-49c0-953e-27286b15a126\" with KillPodSandboxError: \"rpc error: code = Unknown desc = failed to destroy network for pod sandbox k8s_revision-pruner-7-jiwei-cc-7cgps-master-2_openshift-kube-scheduler_3a7c704e-e07d-49c0-953e-27286b15a126_0(555c82ebb103d7812504c55ad2edeff2ee4890f227ac6e293b06fad4541a0bd1): error removing pod openshift-kube-scheduler_revision-pruner-7-jiwei-cc-7cgps-master-2 from CNI network \\\"multus-cni-network\\\": plugin type=\\\"multus\\\" name=\\\"multus-cni-network\\\" failed (delete): Multus: [openshift-kube-scheduler/revision-pruner-7-jiwei-cc-7cgps-master-2/3a7c704e-e07d-49c0-953e-27286b15a126]: error waiting for pod: Get \\\"https://[api-int.jiwei-cc.alicloud-qe.devcluster.openshift.com]:6443/api/v1/namespaces/openshift-kube-scheduler/pods/revision-pruner-7-jiwei-cc-7cgps-master-2?timeout=1m0s\\\": dial tcp 10.0.11.98:6443: connect: connection refused\"", Source:v1.EventSource{Component:"kubelet", Host:"jiwei-cc-7cgps-master-2"}, FirstTimestamp:time.Date(2021, time.December, 27, 7, 35, 43, 0, time.Local), LastTimestamp:time.Date(2021, time.December, 27, 7, 36, 7, 658824812, time.Local), Count:3, Type:"Warning", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Patch "https://api-int.jiwei-cc.alicloud-qe.devcluster.openshift.com:6443/api/v1/namespaces/openshift-kube-scheduler/events/revision-pruner-7-jiwei-cc-7cgps-master-2.16c48c6840bd4760": dial tcp 10.0.11.98:6443: i/o timeout'(may retry after sleeping)
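(For reference, kubelet logs like the above can also be collected through the API server proxy while the kubelet is still reachable, e.g.:)
$ oc adm node-logs jiwei-cc-7cgps-master-2 -u kubelet | grep 'Dec 27 07:38'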

>>"oc get co"/etc.
$ oc get co | grep -Ev 'True        False         False'
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.10.0-0.nightly-2021-12-23-153012   False       True          True       146m    WellKnownAvailable: The well-known endpoint is not yet available: kube-apiserver oauth endpoint https://10.0.11.100:6443/.well-known/oauth-authorization-server is not yet served and authentication operator keeps waiting (check kube-apiserver operator, and check that instances roll out successfully, which can take several minutes per instance)
console                                                                                                                      
dns                                        4.10.0-0.nightly-2021-12-23-153012   True        True          False      144m    DNS "default" reports Progressing=True: "Have 4 available node-resolver pods, want 5."
etcd                                       4.10.0-0.nightly-2021-12-23-153012   True        True          True       145m    NodeControllerDegraded: The master nodes not ready: node "jiwei-cc-7cgps-master-2" not ready since 2021-12-27 07:38:40 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
kube-apiserver                             4.10.0-0.nightly-2021-12-23-153012   False       True          True       147m    StaticPodsAvailable: 0 nodes are active; 3 nodes are at revision 0; 0 nodes have achieved new revision 8
kube-controller-manager                    4.10.0-0.nightly-2021-12-23-153012   True        True          True       145m    NodeControllerDegraded: The master nodes not ready: node "jiwei-cc-7cgps-master-2" not ready since 2021-12-27 07:38:40 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
kube-scheduler                             4.10.0-0.nightly-2021-12-23-153012   True        True          True       144m    InstallerPodContainerWaitingDegraded: Pod "installer-7-jiwei-cc-7cgps-master-2" on node "jiwei-cc-7cgps-master-2" container "installer" is waiting since 2021-12-27 07:35:31 +0000 UTC because ContainerCreating...
machine-config                             4.10.0-0.nightly-2021-12-23-153012   False       False         True       128m    Cluster not available for [{operator 4.10.0-0.nightly-2021-12-23-153012}]
network                                    4.10.0-0.nightly-2021-12-23-153012   True        True          True       146m    DaemonSet "openshift-multus/multus" rollout is not making progress - last change 2021-12-27T07:40:23Z...
openshift-apiserver                        4.10.0-0.nightly-2021-12-23-153012   True        True          False      141m    APIServerDeploymentProgressing: deployment/apiserver.openshift-apiserver: 1/3 pods have been updated to the latest generation
openshift-controller-manager               4.10.0-0.nightly-2021-12-23-153012   True        True          False      135m    Progressing: daemonset/controller-manager: updated number scheduled is 2, desired number scheduled is 3
openshift-samples
storage                                    4.10.0-0.nightly-2021-12-23-153012   True        True          False      142m    AlibabaDiskCSIDriverOperatorCRProgressing: AlibabaCloudDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods
$ 
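(The grep above filters on exact column spacing; a spacing-independent alternative, shown as a sketch, prints the Available/Degraded conditions per operator directly:)
$ oc get co -o jsonpath='{range .items[*]}{.metadata.name}{"\tAvailable="}{.status.conditions[?(@.type=="Available")].status}{"\tDegraded="}{.status.conditions[?(@.type=="Degraded")].status}{"\n"}{end}'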
$ oc describe co kube-controller-manager | grep Message
    Message:               NodeControllerDegraded: The master nodes not ready: node "jiwei-cc-7cgps-master-2" not ready since 2021-12-27 07:38:40 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
    Message:               NodeInstallerProgressing: 2 nodes are at revision 6; 1 nodes are at revision 7; 0 nodes have achieved new revision 9
    Message:               StaticPodsAvailable: 3 nodes are active; 2 nodes are at revision 6; 1 nodes are at revision 7; 0 nodes have achieved new revision 9
    Message:               All is well
$ 
$ oc describe co machine-config | grep Message
    Message:               Cluster version is 4.10.0-0.nightly-2021-12-23-153012
    Message:               Failed to resync 4.10.0-0.nightly-2021-12-23-153012 because: failed to apply machine config daemon manifests: timed out waiting for the condition during waitForDaemonsetRollout: Daemonset machine-config-daemon is not ready. status: (desired: 5, updated: 5, ready: 4, unavailable: 1)
    Message:               Cluster not available for [{operator 4.10.0-0.nightly-2021-12-23-153012}]
    Message:               One or more machine config pools are updating, please see `oc get mcp` for further details
$ 
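(Per the machine-config message above, one machine-config-daemon pod is unavailable; it can be located with standard commands, e.g.:)
$ oc -n openshift-machine-config-operator get daemonset machine-config-daemon
$ oc -n openshift-machine-config-operator get pods -o wide | grep jiwei-cc-7cgps-master-2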
$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-44cfabe16ea78c26ecf76cbe9009ba2b   False     True       False      3              2                   3                     0                      174m
worker   rendered-worker-8e4b60bb75a931a47212b087caf28333   True      False      False      2              2                   2                     0                      174m
$ oc -n openshift-kube-controller-manager get pods | grep -Ev 'Completed|Running'
NAME                                              READY   STATUS              RESTARTS         AGE
installer-7-jiwei-cc-7cgps-master-2               0/1     Terminating         0                167m
installer-8-jiwei-cc-7cgps-master-2               0/1     Terminating         0                165m
installer-9-jiwei-cc-7cgps-master-2               0/1     Pending             0                164m
revision-pruner-7-jiwei-cc-7cgps-master-2         0/1     ContainerCreating   0                169m
revision-pruner-8-jiwei-cc-7cgps-master-2         0/1     Pending             0                165m
revision-pruner-9-jiwei-cc-7cgps-master-2         0/1     Pending             0                164m
$
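(The stuck installer/revision-pruner pods usually explain themselves in their events; a sketch using the pod names above:)
$ oc -n openshift-kube-controller-manager describe pod revision-pruner-7-jiwei-cc-7cgps-master-2
$ oc -n openshift-kube-controller-manager get events --sort-by=.lastTimestamp | tail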

Comment 12 Jianli Wei 2022-01-15 08:37:52 UTC
@mrbraga FYI I retried with "build openshift/installer#5535" (https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp-modern/1482198246280400896) by launching 5 clusters: 2 of them succeeded, and the other 3 failed, but apparently not due to the node NotReady issue.

#1
QE flexy-install job https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/67219/ (SUCCESS)
debug msg=Time elapsed per stage:
debug msg=           cluster: 2m40s
debug msg=         bootstrap: 1m8s
debug msg=Bootstrap Complete: 22m34s
debug msg=               API: 2m42s
debug msg= Bootstrap Destroy: 46s
debug msg= Cluster Operators: 16m19s
info msg=Time elapsed: 43m30s

#2
QE flexy-install job https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/67240/ (SUCCESS)
debug msg=Time elapsed per stage:
debug msg=           cluster: 3m32s
debug msg=         bootstrap: 1m2s
debug msg=Bootstrap Complete: 22m17s
debug msg=               API: 3m30s
debug msg= Bootstrap Destroy: 34s
debug msg= Cluster Operators: 13m12s
info msg=Time elapsed: 40m39s

#3
QE flexy-install job https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/67241/ (FAILURE)
> All nodes are Ready, but the "console" operator does not report a VERSION.
$ oc get nodes
NAME                                      STATUS   ROLES    AGE   VERSION
jiwei-603-67zxn-master-0                  Ready    master   68m   v1.23.0+60f5a1c
jiwei-603-67zxn-master-1                  Ready    master   68m   v1.23.0+60f5a1c
jiwei-603-67zxn-master-2                  Ready    master   67m   v1.23.0+60f5a1c
jiwei-603-67zxn-worker-us-east-1a-pl58n   Ready    worker   47m   v1.23.0+60f5a1c
jiwei-603-67zxn-worker-us-east-1b-dfcr5   Ready    worker   47m   v1.23.0+60f5a1c
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          70m     Unable to apply 4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest: an unknown error has occurred: MultipleErrors
$ oc get co | grep -Ev 'True        False         False'
NAME                                       VERSION                                                   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
console                                                                                                                                           
$ 
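(When a cluster operator shows no VERSION, its operand pods and operator logs are the usual starting point; a hedged sketch for the console case:)
$ oc get co console -o yaml
$ oc -n openshift-console get pods
$ oc -n openshift-console-operator logs deployment/console-operator | tail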

#4
QE flexy-install job https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/67242/ (FAILURE)
> All nodes are Ready, but the "console" operator does not report a VERSION.
$ oc get nodes
NAME                             STATUS   ROLES    AGE   VERSION
jiwei-604-wv22c-master-0         Ready    master   74m   v1.23.0+60f5a1c
jiwei-604-wv22c-master-1         Ready    master   74m   v1.23.0+60f5a1c
jiwei-604-wv22c-master-2         Ready    master   74m   v1.23.0+60f5a1c
jiwei-604-wv22c-worker-a-b2hdn   Ready    worker   29m   v1.23.0+60f5a1c
jiwei-604-wv22c-worker-c-ldl5h   Ready    worker   49m   v1.23.0+60f5a1c
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          75m     Unable to apply 4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest: an unknown error has occurred: MultipleErrors
$ oc get co | grep -Ev 'True        False         False'
NAME                                       VERSION                                                   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
console                                                                                                                                           
$ 

#5
QE flexy-install job https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/67243/ (FAILURE)
> Only the master nodes are Ready; the worker machines are still in the Provisioned phase.
$ oc get nodes
NAME                       STATUS   ROLES    AGE   VERSION
jiwei-605-8m5g7-master-0   Ready    master   27m   v1.23.0+60f5a1c
jiwei-605-8m5g7-master-1   Ready    master   18m   v1.23.0+60f5a1c
jiwei-605-8m5g7-master-2   Ready    master   33m   v1.23.0+60f5a1c
$ oc get machines -n openshift-machine-api
NAME                             PHASE         TYPE            REGION        ZONE            AGE
jiwei-605-8m5g7-master-0         Running       ecs.g6.xlarge   cn-hangzhou   cn-hangzhou-k   41m
jiwei-605-8m5g7-master-1         Running       ecs.g6.xlarge   cn-hangzhou   cn-hangzhou-i   41m
jiwei-605-8m5g7-master-2         Running       ecs.g6.xlarge   cn-hangzhou   cn-hangzhou-j   41m
jiwei-605-8m5g7-worker-i-7stnd   Provisioned   ecs.g6.large    cn-hangzhou   cn-hangzhou-i   19m
jiwei-605-8m5g7-worker-k-7t466   Provisioned   ecs.g6.large    cn-hangzhou   cn-hangzhou-k   19m
$ oc get co
NAME                                       VERSION                                                   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   False       False         True       24m     APIServicesAvailable: "oauth.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request...
baremetal                                  4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      19m
cloud-controller-manager                   4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      32m
cloud-credential                           4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      22m
cluster-autoscaler                         4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      22m
config-operator                            4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      24m
console                                                                                                                                  
csi-snapshot-controller                    4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      23m
dns                                        4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      19m
etcd                                       4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      18m
image-registry                                                                                                                           
ingress                                                                                              False       True          True       3m40s   The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.)
insights                                   4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      18m
kube-apiserver                             4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      13m
kube-controller-manager                    4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      14m
kube-scheduler                             4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      2m26s
kube-storage-version-migrator              4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      23m
machine-api                                4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      16m
machine-approver                           4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      22m
machine-config                             4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      17m
marketplace                                4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      22m
monitoring                                                                                           False       True          True       2m24s   Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
network                                                                                              False       True          False      31m     The network is starting up
node-tuning                                4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      22m
openshift-apiserver                        4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   False       False         False      22m     APIServicesAvailable: "apps.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request...
openshift-controller-manager               4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      18m
openshift-samples                                                                                                                        
operator-lifecycle-manager                 4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      23m
operator-lifecycle-manager-catalog         4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      23m
operator-lifecycle-manager-packageserver                                                             False       True          False      23m     ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: InstallCheckFailed, message: install timeout
service-ca                                 4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      24m
storage                                    4.10.0-0.ci.test-2022-01-15-041121-ci-ln-nsz3x1k-latest   True        False         False      19m
$
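(Worker machines stuck in Provisioned that never register as nodes typically point to pending CSRs or a provisioning failure; hedged checks:)
$ oc get csr | grep -i pending
$ oc -n openshift-machine-api logs deployment/machine-api-controllers -c machine-controller | tail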

Comment 14 Matthew Staebler 2022-01-17 16:34:34 UTC
*** Bug 2005647 has been marked as a duplicate of this bug. ***

Comment 20 errata-xmlrpc 2022-03-10 16:36:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

Comment 21 Red Hat Bugzilla 2023-09-15 01:18:24 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

