1876484 – Static pod installer controller deadlocks with non-existing installer pod, WAS: kube-apiserver cluster operator always with incorrect status due to PLEG error


Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	kube-apiserver
Version:	4.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.4.z
Assignee:	Stefan Schimanski
QA Contact:	Ke Wang
Depends On:	1874597
Blocks:	1822016 1876486 1880086

Reported:	2020-09-07 10:13 UTC by Maciej Szulik
Modified:	2021-06-16 10:51 UTC
CC List:	14 users
Clone Of:	1874597
Clones:	1876486
Last Closed:	2020-11-11 04:57:15 UTC

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift cluster-etcd-operator pull 450	None	closed	Bug 1876484: static-pod-installer: remove deadlock by recreating installers that disappear	2021-02-09 07:13:43 UTC
Github	openshift cluster-kube-apiserver-operator pull 959	None	closed	Bug 1876484: static-pod-installer: remove deadlock by recreating inst…	2021-02-09 07:13:43 UTC
Github	openshift cluster-kube-controller-manager-operator pull 457	None	closed	[release-4.4] Bug 1876484: static-pod-installer: remove deadlock by recreating inst…	2021-02-09 07:13:43 UTC
Github	openshift cluster-kube-scheduler-operator pull 284	None	closed	[release-4.4] Bug 1876484: static-pod-installer: remove deadlock by recreating installers that disappear	2021-02-09 07:13:44 UTC
Github	openshift library-go pull 881	None	closed	[release-4.4] Bug 1876484: static-pod-installer: recreate installers that disappeared	2021-02-09 07:13:44 UTC
Red Hat Product Errata	RHBA-2020:4321	None	None	None	2020-11-11 04:57:22 UTC

Comment 1 Stefan Schimanski 2020-09-11 15:33:48 UTC

We are waiting for PRs to merge and to be verified for 4.5. Adding UpcomingSprint.

Comment 2 Stefan Schimanski 2020-10-02 09:05:21 UTC

PRs are in the queue. Adding UpcomingSprint.

Comment 4 Ke Wang 2020-10-26 08:14:45 UTC

- Installed a connected IPI cluster on GCP successfully. Cluster details:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-10-25-180448   True        False         10m     Cluster version is 4.4.0-0.nightly-2020-10-25-180448


$ oc get infrastructures.config.openshift.io  -o json | jq .items[0].status.platform
"GCP"

$ oc get node
NAME                                                      STATUS   ROLES    AGE   VERSION
kewang2641-zc6tz-master-0.c.openshift-qe.internal         Ready    master   50m   v1.17.1+fd2e9f9
kewang2641-zc6tz-master-1.c.openshift-qe.internal         Ready    master   49m   v1.17.1+fd2e9f9
kewang2641-zc6tz-master-2.c.openshift-qe.internal         Ready    master   50m   v1.17.1+fd2e9f9
kewang2641-zc6tz-worker-a-mj9g5.c.openshift-qe.internal   Ready    worker   25m   v1.17.1+fd2e9f9
kewang2641-zc6tz-worker-b-5f5zk.c.openshift-qe.internal   Ready    worker   25m   v1.17.1+fd2e9f9
kewang2641-zc6tz-worker-c-h4j4f.c.openshift-qe.internal   Ready    worker   25m   v1.17.1+fd2e9f9

$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.4.0-0.nightly-2020-10-25-180448   True        False         False      18m
cloud-credential                           4.4.0-0.nightly-2020-10-25-180448   True        False         False      50m
cluster-autoscaler                         4.4.0-0.nightly-2020-10-25-180448   True        False         False      27m
console                                    4.4.0-0.nightly-2020-10-25-180448   True        False         False      20m
csi-snapshot-controller                    4.4.0-0.nightly-2020-10-25-180448   True        False         False      23m
dns                                        4.4.0-0.nightly-2020-10-25-180448   True        False         False      46m
etcd                                       4.4.0-0.nightly-2020-10-25-180448   True        False         False      46m
image-registry                             4.4.0-0.nightly-2020-10-25-180448   True        False         False      24m
ingress                                    4.4.0-0.nightly-2020-10-25-180448   True        False         False      24m
insights                                   4.4.0-0.nightly-2020-10-25-180448   True        False         False      28m
kube-apiserver                             4.4.0-0.nightly-2020-10-25-180448   True        False         False      46m
kube-controller-manager                    4.4.0-0.nightly-2020-10-25-180448   True        False         False      28m
kube-scheduler                             4.4.0-0.nightly-2020-10-25-180448   True        False         False      26m
kube-storage-version-migrator              4.4.0-0.nightly-2020-10-25-180448   True        False         False      24m
machine-api                                4.4.0-0.nightly-2020-10-25-180448   True        False         False      28m
machine-config                             4.4.0-0.nightly-2020-10-25-180448   True        False         False      28m
marketplace                                4.4.0-0.nightly-2020-10-25-180448   True        False         False      28m
monitoring                                 4.4.0-0.nightly-2020-10-25-180448   True        False         False      22m
network                                    4.4.0-0.nightly-2020-10-25-180448   True        False         False      49m
node-tuning                                4.4.0-0.nightly-2020-10-25-180448   True        False         False      48m
openshift-apiserver                        4.4.0-0.nightly-2020-10-25-180448   True        False         False      45m
openshift-controller-manager               4.4.0-0.nightly-2020-10-25-180448   True        False         False      28m
openshift-samples                          4.4.0-0.nightly-2020-10-25-180448   True        False         False      19m
operator-lifecycle-manager                 4.4.0-0.nightly-2020-10-25-180448   True        False         False      46m
operator-lifecycle-manager-catalog         4.4.0-0.nightly-2020-10-25-180448   True        False         False      46m
operator-lifecycle-manager-packageserver   4.4.0-0.nightly-2020-10-25-180448   True        False         False      45m
service-ca                                 4.4.0-0.nightly-2020-10-25-180448   True        False         False      48m
service-catalog-apiserver                  4.4.0-0.nightly-2020-10-25-180448   True        False         False      48m
service-catalog-controller-manager         4.4.0-0.nightly-2020-10-25-180448   True        False         False      49m
storage                                    4.4.0-0.nightly-2020-10-25-180448   True        False         False      28m

- Installed a disconnected UPI cluster on vSphere 6.7 successfully. Cluster details:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-10-25-180448   True        False         135m    Cluster version is 4.4.0-0.nightly-2020-10-25-180448

$ oc get infrastructures.config.openshift.io  -o json | jq .items[0].status.platform
"VSphere"

$ oc  get no
NAME              STATUS   ROLES    AGE    VERSION
compute-0         Ready    worker   146m   v1.17.1+fd2e9f9
compute-1         Ready    worker   146m   v1.17.1+fd2e9f9
control-plane-0   Ready    master   3h7m   v1.17.1+fd2e9f9
control-plane-1   Ready    master   3h8m   v1.17.1+fd2e9f9
control-plane-2   Ready    master   3h7m   v1.17.1+fd2e9f9

$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.4.0-0.nightly-2020-10-25-180448   True        False         False      139m
cloud-credential                           4.4.0-0.nightly-2020-10-25-180448   True        False         False      3h12m
cluster-autoscaler                         4.4.0-0.nightly-2020-10-25-180448   True        False         False      157m
console                                    4.4.0-0.nightly-2020-10-25-180448   True        False         False      92m
csi-snapshot-controller                    4.4.0-0.nightly-2020-10-25-180448   True        False         False      109m
dns                                        4.4.0-0.nightly-2020-10-25-180448   True        False         False      3h4m
etcd                                       4.4.0-0.nightly-2020-10-25-180448   True        False         False      3h5m
image-registry                             4.4.0-0.nightly-2020-10-25-180448   True        False         False      109m
ingress                                    4.4.0-0.nightly-2020-10-25-180448   True        False         False      109m
insights                                   4.4.0-0.nightly-2020-10-25-180448   True        False         False      160m
kube-apiserver                             4.4.0-0.nightly-2020-10-25-180448   True        False         False      3h4m
kube-controller-manager                    4.4.0-0.nightly-2020-10-25-180448   True        False         False      163m
kube-scheduler                             4.4.0-0.nightly-2020-10-25-180448   True        False         False      169m
kube-storage-version-migrator              4.4.0-0.nightly-2020-10-25-180448   True        False         False      109m
machine-api                                4.4.0-0.nightly-2020-10-25-180448   True        False         False      160m
machine-config                             4.4.0-0.nightly-2020-10-25-180448   True        False         False      90m
marketplace                                4.4.0-0.nightly-2020-10-25-180448   True        False         False      90m
monitoring                                 4.4.0-0.nightly-2020-10-25-180448   True        False         False      16m
network                                    4.4.0-0.nightly-2020-10-25-180448   True        False         False      3h2m
node-tuning                                4.4.0-0.nightly-2020-10-25-180448   True        False         False      3h7m
openshift-apiserver                        4.4.0-0.nightly-2020-10-25-180448   True        False         False      103m
openshift-controller-manager               4.4.0-0.nightly-2020-10-25-180448   True        False         False      160m
openshift-samples                          4.4.0-0.nightly-2020-10-25-180448   True        False         False      88m
operator-lifecycle-manager                 4.4.0-0.nightly-2020-10-25-180448   True        False         False      3h5m
operator-lifecycle-manager-catalog         4.4.0-0.nightly-2020-10-25-180448   True        False         False      3h5m
operator-lifecycle-manager-packageserver   4.4.0-0.nightly-2020-10-25-180448   True        False         False      92m
service-ca                                 4.4.0-0.nightly-2020-10-25-180448   True        False         False      3h7m
service-catalog-apiserver                  4.4.0-0.nightly-2020-10-25-180448   True        False         False      92m
service-catalog-controller-manager         4.4.0-0.nightly-2020-10-25-180448   True        False         False      91m
storage                                    4.4.0-0.nightly-2020-10-25-180448   True        False         False      160m

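Both the connected and the disconnected cluster work well, with every cluster operator reporting Available=True, Progressing=False, Degraded=False, so moving the bug to VERIFIED.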
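
The operator status check in the `oc get co` listings can also be scripted instead of read by eye. The following is only a minimal sketch, not part of the original verification: the function name `unhealthy_operators` is hypothetical, and it assumes the default column order of `oc get co` as shown in this comment (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE).

```shell
# Hypothetical helper (not part of oc): reads `oc get co` output on stdin and
# prints the name of every operator that is not
# Available=True, Progressing=False, Degraded=False.
# Assumes the default column order shown in the listings above.
# Returns non-zero if at least one such operator is found.
unhealthy_operators() {
  awk 'NR > 1 && NF >= 5 && ($3 != "True" || $4 != "False" || $5 != "False") {
         print $1
         bad = 1
       }
       END { exit bad + 0 }'
}

# Usage against a live cluster (requires a logged-in oc session):
#   oc get co | unhealthy_operators
```

On a healthy cluster the function prints nothing and returns 0; otherwise it lists the operators that need a closer look.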

Comment 7 errata-xmlrpc 2020-11-11 04:57:15 UTC

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.4.30 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4321
