Bug 1718061 - KCM cannot recover from deleting its lease configmap
Summary: KCM cannot recover from deleting its lease configmap
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-controller-manager
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.2.z
Assignee: Mike Dame
QA Contact: zhou ying
URL:
Whiteboard:
Depends On: 1744984 1780843
Blocks: 1781240
 
Reported: 2019-06-06 19:45 UTC by David Eads
Modified: 2020-01-22 10:47 UTC (History)
8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The Kube Controller Manager (KCM) did not have permission to recreate its lease configmap.
Consequence: The KCM could not properly re-establish its leader-election lease.
Fix: Updated the KCM permissions created by the OpenShift KCM Operator.
Result: The KCM can now recover from deletion of its lease configmap.
Clone Of:
: 1744984 1781240 (view as bug list)
Environment:
Last Closed: 2020-01-22 10:46:40 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-kube-controller-manager-operator pull 323 0 None closed Bug 1718061: Fix KCM leader election configmap deletion recovery 2020-12-07 06:56:38 UTC
Red Hat Product Errata RHBA-2020:0107 0 None None None 2020-01-22 10:47:03 UTC

Description David Eads 2019-06-06 19:45:17 UTC
You get the following in the log:

E0606 19:42:53.690619       1 leaderelection.go:310] error initially creating leader election record: configmaps is forbidden: User "system:kube-controller-manager" cannot create resource "configmaps" in API group "" in the namespace "kube-system"

We need to fix the RBAC rules to allow this so that we can auto-recover from a user deletion.
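As an illustration, a quick way to confirm the missing permission (a sketch assuming a live cluster and cluster-admin credentials; the user and namespace come from the log above):

```shell
# Check whether the KCM user may create configmaps in kube-system.
# "no" reproduces the RBAC gap described in this bug; after the fix
# it should print "yes".
oc auth can-i create configmaps \
    --as=system:kube-controller-manager \
    -n kube-system
```

Per the eventual fix, the permission change belongs in the RBAC manifests managed by cluster-kube-controller-manager-operator, not in hand-edited roles.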

Comment 1 Mike Dame 2019-07-23 19:20:10 UTC
Trying to reproduce this now. I'm not entirely clear on the details (which configmap this relates to, and where this log message comes from), but I can't see this problem anymore in either the KCM or KCM-O logs:

$ oc delete configmaps/kube-controller-manager -n kube-system ##(is this the right configmap?)
$ oc logs pod/openshift-kube-scheduler-operator-66b8c9947b-8qd6r -n openshift-kube-scheduler-operator | grep "configmaps is forbidden"
## no output
$ oc logs pod/openshift-kube-scheduler-ip-10-0-155-71.us-west-2.compute.internal -n openshift-kube-scheduler | grep "configmaps is forbidden"
## no output

Given that, I'm going to move this to ON_QA for verification; if it fails, please include any details I missed.

Comment 2 Mike Dame 2019-07-23 19:35:32 UTC
Even though the error isn't there, the configmap doesn't get recreated... taking this off QA.

Comment 4 Maciej Szulik 2019-08-23 11:13:35 UTC
Mike is it a thing we need to track for 4.2?

Comment 5 David Eads 2019-08-23 12:27:31 UTC
We need this fixed.  I recommend creating a test that deletes the configmap, then deletes the KCM pods, then creates a replicaset and verifies that its pods are created.
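A sketch of that test as oc commands (assumes a logged-in cluster-admin session; the deployment name and image are illustrative placeholders):

```shell
# 1. Delete the leader-election configmap, then the KCM pods, so a fresh
#    KCM must recreate the lease before its controllers can run.
oc delete configmap kube-controller-manager -n kube-system
oc delete pods -n openshift-kube-controller-manager --all

# 2. Create a workload whose pods only appear if the replicaset controller
#    (which runs inside the KCM) is working again.
oc create deployment lease-recovery-check \
    --image=registry.redhat.io/ubi8/ubi-minimal

# 3. Watch for pods; if the KCM failed to recover, none will be created.
oc get pods -l app=lease-recovery-check -w
```

Using a deployment rather than a bare replicaset is a shortcut: `oc create deployment` produces a replicaset under the hood, which exercises the same controller path.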

Comment 7 Mike Dame 2019-12-09 14:40:24 UTC
This has been fixed in https://github.com/openshift/cluster-kube-controller-manager-operator/pull/311 and will be backported once merged

Comment 9 zhou ying 2020-01-14 07:36:53 UTC
Confirmed with the latest version; the issue can't be reproduced:

[root@dhcp-140-138 roottest]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2020-01-13-060909   True        False         10m     Cluster version is 4.2.0-0.nightly-2020-01-13-060909
[root@dhcp-140-138 roottest]# oc  get cm -n kube-system
NAME                                 DATA   AGE
bootstrap                            1      23m
cluster-config-v1                    1      29m
extension-apiserver-authentication   6      29m
kube-controller-manager              0      29m
root-ca                              1      29m
[root@dhcp-140-138 roottest]# oc delete cm/kube-controller-manager  -n kube-system
configmap "kube-controller-manager" deleted
[root@dhcp-140-138 roottest]# oc  get cm -n kube-system
NAME                                 DATA   AGE
bootstrap                            1      24m
cluster-config-v1                    1      29m
extension-apiserver-authentication   6      29m
kube-controller-manager              0      1s
root-ca                              1      29m
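The same check can be scripted with a polling loop so it fails fast on unfixed builds (a sketch; the 60-second bound is arbitrary):

```shell
# Delete the lease configmap and poll until the KCM recreates it.
oc delete configmap kube-controller-manager -n kube-system
for i in $(seq 1 30); do
  if oc get configmap kube-controller-manager -n kube-system >/dev/null 2>&1; then
    echo "recreated after ~$((i * 2))s"
    exit 0
  fi
  sleep 2
done
echo "configmap was not recreated within 60s" >&2
exit 1
```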

Comment 11 errata-xmlrpc 2020-01-22 10:46:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0107

