Bug 2042579 - KCM in progressing state for a long time and the message reads NodeInstallerProgressing: 2 nodes are at revision 7; 1 nodes are at revision 8
Summary: KCM in progressing state for a long time and the message reads NodeInstaller...
Keywords:
Status: CLOSED DUPLICATE of bug 2029470
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-controller-manager
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Maciej Szulik
QA Contact: zhou ying
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-01-19 18:16 UTC by RamaKasturi
Modified: 2022-01-21 15:28 UTC
CC: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-19 20:00:52 UTC
Target Upstream Version:
Embargoed:



Description RamaKasturi 2022-01-19 18:16:59 UTC
Description of problem:
KCM stays in the Progressing state for a long time, and the message reads: NodeInstallerProgressing: 2 nodes are at revision 7; 1 nodes are at revision 8

kube-controller-manager                    4.10.0-0.nightly-2022-01-17-023213   True        True          False      177m    NodeInstallerProgressing: 2 nodes are at revision 7; 1 nodes are at revision 8

Also I see that one of the kube-controller-manager guard pods is stuck in the 0/1 (not ready) state; a short triage sketch follows the pod listing below.

[knarra@knarra must-gather-logs]$ oc get pods -n openshift-kube-controller-manager
NAME                                                            READY   STATUS      RESTARTS        AGE
installer-3-knarra-alicloud1-dmlg5-master-2                     0/1     Completed   0               4h35m
installer-4-knarra-alicloud1-dmlg5-master-1                     0/1     Completed   0               4h34m
installer-4-knarra-alicloud1-dmlg5-master-2                     0/1     Completed   0               4h34m
installer-5-knarra-alicloud1-dmlg5-master-0                     0/1     Completed   0               4h33m
installer-5-knarra-alicloud1-dmlg5-master-2                     0/1     Completed   0               4h31m
installer-6-knarra-alicloud1-dmlg5-master-2                     0/1     Completed   0               4h28m
installer-7-knarra-alicloud1-dmlg5-master-0                     0/1     Completed   0               4h22m
installer-7-knarra-alicloud1-dmlg5-master-1                     0/1     Completed   0               4h25m
installer-7-knarra-alicloud1-dmlg5-master-2                     0/1     Completed   0               4h28m
installer-8-knarra-alicloud1-dmlg5-master-1                     0/1     Completed   0               4h16m
installer-8-knarra-alicloud1-dmlg5-master-2                     0/1     Completed   0               4h17m
kube-controller-manager-guard-knarra-alicloud1-dmlg5-master-0   1/1     Running     0               4h31m
kube-controller-manager-guard-knarra-alicloud1-dmlg5-master-1   0/1     Running     0               4h33m
kube-controller-manager-guard-knarra-alicloud1-dmlg5-master-2   1/1     Running     0               4h35m
kube-controller-manager-knarra-alicloud1-dmlg5-master-0         4/4     Running     2 (4h11m ago)   4h21m
kube-controller-manager-knarra-alicloud1-dmlg5-master-1         4/4     Running     1 (4h20m ago)   4h23m
kube-controller-manager-knarra-alicloud1-dmlg5-master-2         4/4     Running     3 (4h10m ago)   4h16m
revision-pruner-7-knarra-alicloud1-dmlg5-master-0               0/1     Completed   0               4h21m
revision-pruner-7-knarra-alicloud1-dmlg5-master-1               0/1     Completed   0               4h21m
revision-pruner-7-knarra-alicloud1-dmlg5-master-2               0/1     Completed   0               4h21m
revision-pruner-8-knarra-alicloud1-dmlg5-master-0               0/1     Completed   0               4h17m
revision-pruner-8-knarra-alicloud1-dmlg5-master-1               0/1     Completed   0               4h17m
revision-pruner-8-knarra-alicloud1-dmlg5-master-2               0/1     Completed   0               4h17m
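
To see why the master-1 guard pod is not ready, a reasonable first step is to describe it and check the probe events. A minimal sketch (pod names taken from the listing above; the guard pod's readiness is assumed to mirror the health of the co-located kube-controller-manager static pod):
```
# Describe the not-ready guard pod; the Events section should show the
# failing readiness probe.
oc describe pod kube-controller-manager-guard-knarra-alicloud1-dmlg5-master-1 \
    -n openshift-kube-controller-manager

# Since the guard tracks the co-located static pod, check that pod's
# recent logs on the same node next.
oc logs kube-controller-manager-knarra-alicloud1-dmlg5-master-1 \
    -n openshift-kube-controller-manager -c kube-controller-manager --tail=50
```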


Version-Release number of selected component (if applicable):
[knarra@knarra must-gather-logs]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-01-17-023213   True        False         4h20m   Cluster version is 4.10.0-0.nightly-2022-01-17-023213


How reproducible:
Hit it once

Steps to Reproduce:
1. Install 4.10 cluster on alibaba cloud
2.
3.

Actual results:
Install fails because the KCM cluster operator is stuck in the Progressing state.

01-19 19:44:49.087  [ERROR] Cluster is NOT ready/stable after 10 attempts. Aborting execution.
01-19 19:44:50.143  Bad clusteroperators not in health state:
01-19 19:44:50.143  kube-controller-manager                    4.10.0-0.nightly-2022-01-17-023213   True   True    False   35m   NodeInstallerProgressing: 2 nodes are at revision 7; 1 nodes are at revision 8
01-19 19:44:50.143  Using oc describe to check status of bad core clusteroperators ...
01-19 19:44:50.143  Name: kube-controller-manager
01-19 19:44:51.543  Status:
01-19 19:44:51.543    Conditions:
01-19 19:44:51.543      Last Transition Time:  2022-01-19T13:41:43Z
01-19 19:44:51.543      Message:               NodeControllerDegraded: All master nodes are ready
01-19 19:44:51.543      Reason:                AsExpected
01-19 19:44:51.543      Status:                False
01-19 19:44:51.543      Type:                  Degraded
01-19 19:44:51.543      Last Transition Time:  2022-01-19T13:56:18Z
01-19 19:44:51.543      Message:               NodeInstallerProgressing: 2 nodes are at revision 7; 1 nodes are at revision 8
01-19 19:44:51.543      Reason:                NodeInstaller
01-19 19:44:51.543      Status:                True
01-19 19:44:51.543      Type:                  Progressing
01-19 19:44:51.543      Last Transition Time:  2022-01-19T13:39:18Z
01-19 19:44:51.543      Message:               StaticPodsAvailable: 3 nodes are active; 2 nodes are at revision 7; 1 nodes are at revision 8
01-19 19:44:51.543      Reason:                AsExpected
01-19 19:44:51.543      Status:                True
01-19 19:44:51.543      Type:                  Available
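
The NodeInstallerProgressing message is derived from the per-node revision status the operator records on its CR. A minimal sketch of querying it directly, assuming the standard kubecontrollermanager/cluster singleton:
```
# Per-node current vs. target revision as recorded by the operator; the
# node(s) still at the old revision are the ones the
# NodeInstallerProgressing message is counting.
oc get kubecontrollermanager cluster \
    -o jsonpath='{range .status.nodeStatuses[*]}{.nodeName}{"\t"}{.currentRevision}{"\t"}{.targetRevision}{"\n"}{end}'
```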

Expected results:
KCM should not remain in the Progressing state for a long time.

Additional info:

Comment 1 RamaKasturi 2022-01-19 18:21:09 UTC
Must-gather and node logs are present in the link below

http://virt-openshift-05.lab.eng.nay.redhat.com/knarra/2042579/
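
For reference, the installer pod logs quoted in Comment 2 below can be pulled straight out of the must-gather tree. A minimal sketch, assuming the usual must-gather layout (namespaces/<ns>/pods/<pod>/...):
```
# Grep for the manifest each installer wrote; this shows which revision
# (and which operand) every installer pod in the namespace actually
# rendered. Both log strings appear in the installer output quoted below.
grep -rE "Writing (a pod under|static pod manifest)" \
    namespaces/openshift-kube-controller-manager/pods/installer-*/
```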

Comment 2 Jan Chaloupka 2022-01-19 20:00:52 UTC
From installer-8-knarra-alicloud1-dmlg5-master-1 (pulled locally through `oc logs` before the cluster was destroyed):
```
I0119 13:57:17.624603       1 cmd.go:494] Writing static pod manifest "/etc/kubernetes/manifests/kube-controller-manager-pod.yaml" ...
{"kind":"Pod","apiVersion":"v1","metadata":{"name":"kube-controller-manager","namespace":"openshift-kube-controller-manager","creationTimestamp":null,"labels":{"app":"kube-controller-manager","kube-controller-manager":"true","revision":"8"},
...
```

From installer-7-knarra-alicloud1-dmlg5-master-1 (pulled from the attached must-gather):
```
2022-01-19T13:58:58.543540810Z I0119 13:58:58.543512       1 cmd.go:370] Writing a pod under "kube-scheduler-pod.yaml" key
2022-01-19T13:58:58.543540810Z {"kind":"Pod","apiVersion":"v1","metadata":{"name":"openshift-kube-scheduler","namespace":"openshift-kube-scheduler","creationTimestamp":null,"labels":{"app":"openshift-kube-scheduler","revision":"7","scheduler":"true"}
```

*** This bug has been marked as a duplicate of bug 2029470 ***

