Description of problem:
KCM has been in Progressing state for a long time and the message reads:

kube-controller-manager   4.10.0-0.nightly-2022-01-17-023213   True   True   False   177m   NodeInstallerProgressing: 2 nodes are at revision 7; 1 nodes are at revision 8

Also, one of the kube-controller-manager guard pods is in 0/1 state:

[knarra@knarra must-gather-logs]$ oc get pods -n openshift-kube-controller-manager
NAME                                                             READY   STATUS      RESTARTS        AGE
installer-3-knarra-alicloud1-dmlg5-master-2                      0/1     Completed   0               4h35m
installer-4-knarra-alicloud1-dmlg5-master-1                      0/1     Completed   0               4h34m
installer-4-knarra-alicloud1-dmlg5-master-2                      0/1     Completed   0               4h34m
installer-5-knarra-alicloud1-dmlg5-master-0                      0/1     Completed   0               4h33m
installer-5-knarra-alicloud1-dmlg5-master-2                      0/1     Completed   0               4h31m
installer-6-knarra-alicloud1-dmlg5-master-2                      0/1     Completed   0               4h28m
installer-7-knarra-alicloud1-dmlg5-master-0                      0/1     Completed   0               4h22m
installer-7-knarra-alicloud1-dmlg5-master-1                      0/1     Completed   0               4h25m
installer-7-knarra-alicloud1-dmlg5-master-2                      0/1     Completed   0               4h28m
installer-8-knarra-alicloud1-dmlg5-master-1                      0/1     Completed   0               4h16m
installer-8-knarra-alicloud1-dmlg5-master-2                      0/1     Completed   0               4h17m
kube-controller-manager-guard-knarra-alicloud1-dmlg5-master-0    1/1     Running     0               4h31m
kube-controller-manager-guard-knarra-alicloud1-dmlg5-master-1    0/1     Running     0               4h33m
kube-controller-manager-guard-knarra-alicloud1-dmlg5-master-2    1/1     Running     0               4h35m
kube-controller-manager-knarra-alicloud1-dmlg5-master-0          4/4     Running     2 (4h11m ago)   4h21m
kube-controller-manager-knarra-alicloud1-dmlg5-master-1          4/4     Running     1 (4h20m ago)   4h23m
kube-controller-manager-knarra-alicloud1-dmlg5-master-2          4/4     Running     3 (4h10m ago)   4h16m
revision-pruner-7-knarra-alicloud1-dmlg5-master-0                0/1     Completed   0               4h21m
revision-pruner-7-knarra-alicloud1-dmlg5-master-1                0/1     Completed   0               4h21m
revision-pruner-7-knarra-alicloud1-dmlg5-master-2                0/1     Completed   0               4h21m
revision-pruner-8-knarra-alicloud1-dmlg5-master-0                0/1     Completed   0               4h17m
revision-pruner-8-knarra-alicloud1-dmlg5-master-1                0/1     Completed   0               4h17m
revision-pruner-8-knarra-alicloud1-dmlg5-master-2                0/1     Completed   0               4h17m

Version-Release number of selected component (if applicable):
[knarra@knarra must-gather-logs]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-01-17-023213   True        False         4h20m   Cluster version is 4.10.0-0.nightly-2022-01-17-023213

How reproducible:
Hit it once

Steps to Reproduce:
1. Install a 4.10 cluster on Alibaba Cloud
2.
3.

Actual results:
The install fails because the kube-controller-manager operator stays in Progressing:

01-19 19:44:49.087 [ERROR] Cluster is NOT ready/stable after 10 attempts. Aborting execution.
01-19 19:44:50.143 Bad clusteroperators not in health state:
01-19 19:44:50.143 kube-controller-manager   4.10.0-0.nightly-2022-01-17-023213   True   True   False   35m   NodeInstallerProgressing: 2 nodes are at revision 7; 1 nodes are at revision 8
01-19 19:44:50.143 Using oc describe to check status of bad core clusteroperators ...
01-19 19:44:50.143 Name:    kube-controller-manager
01-19 19:44:51.543 Status:
01-19 19:44:51.543   Conditions:
01-19 19:44:51.543     Last Transition Time:  2022-01-19T13:41:43Z
01-19 19:44:51.543     Message:               NodeControllerDegraded: All master nodes are ready
01-19 19:44:51.543     Reason:                AsExpected
01-19 19:44:51.543     Status:                False
01-19 19:44:51.543     Type:                  Degraded
01-19 19:44:51.543     Last Transition Time:  2022-01-19T13:56:18Z
01-19 19:44:51.543     Message:               NodeInstallerProgressing: 2 nodes are at revision 7; 1 nodes are at revision 8
01-19 19:44:51.543     Reason:                NodeInstaller
01-19 19:44:51.543     Status:                True
01-19 19:44:51.543     Type:                  Progressing
01-19 19:44:51.543     Last Transition Time:  2022-01-19T13:39:18Z
01-19 19:44:51.543     Message:               StaticPodsAvailable: 3 nodes are active; 2 nodes are at revision 7; 1 nodes are at revision 8
01-19 19:44:51.543     Reason:                AsExpected
01-19 19:44:51.543     Status:                True
01-19 19:44:51.543     Type:                  Available

Expected results:
KCM should not stay in Progressing for a long time.

Additional info:
Must-gather and node logs are available at the link below:
http://virt-openshift-05.lab.eng.nay.redhat.com/knarra/2042579/
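For reference, a minimal sketch of how the stuck rollout can be narrowed down. The `nodeStatuses` jsonpath fields below come from the static pod operator status (operator.openshift.io/v1) and are an assumption here, not output captured from this cluster:

```
# Per-node current vs. target revision as tracked by the static pod operator
oc get kubecontrollermanager cluster \
  -o jsonpath='{range .status.nodeStatuses[*]}{.nodeName}{"\t"}{.currentRevision}{" -> "}{.targetRevision}{"\n"}{end}'

# Events for the guard pod that is stuck at 0/1, to see why readiness keeps failing
oc describe pod kube-controller-manager-guard-knarra-alicloud1-dmlg5-master-1 \
  -n openshift-kube-controller-manager
```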
From installer-8-knarra-alicloud1-dmlg5-master-1 (pulled locally through `oc logs` before the cluster was destroyed):

```
I0119 13:57:17.624603       1 cmd.go:494] Writing static pod manifest "/etc/kubernetes/manifests/kube-controller-manager-pod.yaml" ...
{"kind":"Pod","apiVersion":"v1","metadata":{"name":"kube-controller-manager","namespace":"openshift-kube-controller-manager","creationTimestamp":null,"labels":{"app":"kube-controller-manager","kube-controller-manager":"true","revision":"8"}, ...
```

From installer-7-knarra-alicloud1-dmlg5-master-1 (pulled from the attached must-gather):

```
2022-01-19T13:58:58.543540810Z I0119 13:58:58.543512       1 cmd.go:370] Writing a pod under "kube-scheduler-pod.yaml" key
2022-01-19T13:58:58.543540810Z {"kind":"Pod","apiVersion":"v1","metadata":{"name":"openshift-kube-scheduler","namespace":"openshift-kube-scheduler","creationTimestamp":null,"labels":{"app":"openshift-kube-scheduler","revision":"7","scheduler":"true"}
```

In other words, the revision-7 installer pod in the openshift-kube-controller-manager namespace is writing kube-scheduler content, and it runs after the revision-8 kube-controller-manager manifest had already been written on master-1.

*** This bug has been marked as a duplicate of bug 2029470 ***
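For anyone re-checking this against the attached must-gather, a rough sketch of how the per-installer writes quoted above can be pulled out of the extracted archive. The namespaces/<ns>/pods/... layout is the usual must-gather structure and is assumed here:

```
# Run from the must-gather image directory that contains namespaces/
cd namespaces/openshift-kube-controller-manager/pods

# What each installer pod reported writing
grep -r "Writing static pod manifest" installer-*/
grep -r "Writing a pod under" installer-*/

# Which revision label the written pod definitions carried
grep -rho '"revision":"[0-9]*"' installer-*/ | sort | uniq -c
```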