Bug 1941980 - cluster-kube-descheduler operator is broken when upgraded from 4.7 to 4.8
Summary: cluster-kube-descheduler operator is broken when upgraded from 4.7 to 4.8
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-scheduler
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Jan Chaloupka
QA Contact: RamaKasturi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-03-23 10:54 UTC by RamaKasturi
Modified: 2021-07-27 22:55 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 22:55:09 UTC
Target Upstream Version:
Embargoed:


Links
System                   ID                                                     Private   Priority   Status   Summary                                                              Last Updated
Github                   openshift cluster-kube-descheduler-operator pull 174   0         None       open     bug 1941980: Add rbac rules for rendering metric related manifests   2021-03-23 11:05:42 UTC
Red Hat Product Errata   RHSA-2021:2438                                         0         None       None     None                                                                 2021-07-27 22:55:26 UTC

Description RamaKasturi 2021-03-23 10:54:57 UTC
Description of problem:
The cluster-kube-descheduler operator is broken after an upgrade from 4.7 to 4.8. The following error appears repeatedly in the operator logs:
E0323 09:30:41.391229       1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 09:35:12.639124       1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 09:36:09.085741       1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 09:45:12.636708       1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 09:52:49.099532       1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 09:55:12.636796       1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 10:05:12.636712       1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 10:09:29.113344       1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 10:15:12.636566       1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 10:25:12.636193       1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 10:26:09.127975       1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 10:35:12.637483       1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 10:42:49.141855       1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"

E0323 10:45:12.636575       1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
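
The forbidden message above states exactly which request was denied. A minimal Python sketch for pulling those fields out of such a log line follows; the regular expression and the function name parse_forbidden are illustrative assumptions, not part of the operator or of any OpenShift tooling.

```python
import re

# Pattern for the API server "forbidden" message quoted in the log above.
# The group names are chosen for this sketch only.
FORBIDDEN_RE = re.compile(
    r'(?P<qualified_resource>[\w.]+) "(?P<name>[^"]+)" is forbidden: '
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<group>[^"]*)" in the namespace "(?P<namespace>[^"]+)"'
)


def parse_forbidden(line):
    """Return the denied request as a dict, or None if the line does not match."""
    match = FORBIDDEN_RE.search(line)
    return match.groupdict() if match else None


# One of the log lines from the report, split across string literals for readability.
line = (
    'E0323 09:30:41.391229       1 target_config_reconciler.go:349] key failed with : '
    'roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: '
    'User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" '
    'cannot get resource "roles" in API group "rbac.authorization.k8s.io" '
    'in the namespace "openshift-kube-descheduler-operator"'
)

denied = parse_forbidden(line)
print(denied["user"])
print(denied["verb"], denied["resource"], denied["group"], denied["namespace"])
```

Read this way, the log says that the operator's service account lacks the "get" verb on "roles" in its own namespace.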


Version-Release number of selected component (if applicable):
[knarra@knarra ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-03-22-025559   True        False         3h52m   Cluster version is 4.7.0-0.nightly-2021-03-22-025559


How reproducible:
Always

Steps to Reproduce:
1. Install the latest 4.7 cluster.
2. Edit the subscription: change the channel to 4.8 and set the starting CSV to the one in 4.8.
3. Wait for the descheduler operator to respin.

Actual results:
The descheduler operator respins, but the errors below appear in the operator log, which indicates that the operator is broken:
E0323 10:15:12.636566       1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 10:25:12.636193       1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 10:26:09.127975       1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 10:35:12.637483       1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 10:42:49.141855       1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"

E0323 10:45:12.636575       1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"


Expected results:
No RBAC-related errors should be logged.

Additional info:
This appears to happen only on 4.8: a recent code change added support for metrics but did not add the additional RBAC rules it requires.
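
As a rough illustration of what a missing RBAC rule looks like, the Python sketch below builds a single Role rule entry covering the denied request from the error message. It is not the content of the linked pull request: the helper name role_rule is hypothetical, and only the "get" verb on "roles" is taken from the log. An operator that also creates or updates such objects would presumably need further verbs and resources, which this report does not enumerate.

```python
import json


def role_rule(api_group, resources, verbs):
    """Build one rule entry as it would appear under `rules:` in a Role."""
    return {
        "apiGroups": [api_group],
        "resources": list(resources),
        "verbs": list(verbs),
    }


# API group, resource and verb are copied from the forbidden message.
# Anything beyond that would be a guess and is deliberately left out.
rule = role_rule("rbac.authorization.k8s.io", ["roles"], ["get"])

# JSON is printed here; the same structure can be written as YAML in a manifest.
print(json.dumps({"rules": [rule]}, indent=2))
```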

Comment 2 RamaKasturi 2021-03-25 09:08:28 UTC
Verified with the build below; no errors are present in the descheduler operator logs after upgrading from 4.7 to 4.8.

[knarra@knarra ~]$ oc get csv -n openshift-kube-descheduler-operator
NAME                                                   DISPLAY                     VERSION                 REPLACES                                               PHASE
clusterkubedescheduleroperator.4.8.0-202103241237.p0   Kube Descheduler Operator   4.8.0-202103241237.p0   clusterkubedescheduleroperator.4.7.0-202103060100.p0   Succeeded

Steps followed to verify the bug:
=================================
1) Install a 4.7 cluster.
2) Edit the subscription: change the channel to 4.8, set the starting CSV to 4.8.0-202103241237.p0, and change the source to qe-app-registry.
3) The descheduler operator is upgraded to 4.8.
4) Check the descheduler operator pod and cluster pod logs to make sure that no errors are present.
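
The subscription change in step 2 can also be written down as a JSON merge patch instead of an interactive edit. The Python sketch below only builds that patch; the helper name subscription_patch is hypothetical. The fields channel, startingCSV and source are OLM Subscription spec fields, and the CSV name is the full name shown in the `oc get csv` output of this comment (the step text gives only the version string). Applying the result with something like `oc patch subscription ... --type merge -p '<json>'` is an assumption; the session below uses `oc edit` instead.

```python
import json


def subscription_patch(channel, starting_csv, source=None):
    """Build a JSON merge patch that updates an OLM Subscription's spec."""
    spec = {"channel": channel, "startingCSV": starting_csv}
    if source is not None:
        # Only switch the catalog source when one is given.
        spec["source"] = source
    return {"spec": spec}


patch = subscription_patch(
    channel="4.8",
    starting_csv="clusterkubedescheduleroperator.4.8.0-202103241237.p0",
    source="qe-app-registry",
)
print(json.dumps(patch))
```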

[knarra@knarra ~]$ oc get pods -n openshift-kube-descheduler-operator
NAME                                    READY   STATUS    RESTARTS   AGE
cluster-55789d67fd-c4hxs                1/1     Running   0          28s
descheduler-operator-5b48df6849-b68x2   1/1     Running   0          91s
[knarra@knarra ~]$ 
[knarra@knarra ~]$ oc get sub
No resources found in default namespace.
[knarra@knarra ~]$ oc get sub -n openshift-kube-descheduler-operator
NAME                                PACKAGE                             SOURCE             CHANNEL
cluster-kube-descheduler-operator   cluster-kube-descheduler-operator   redhat-operators   4.7
[knarra@knarra ~]$ oc edit sub cluster-kube-descheduler-operator -n openshift-kube-descheduler-operator
subscription.operators.coreos.com/cluster-kube-descheduler-operator edited
[knarra@knarra ~]$ oc get sub
No resources found in default namespace.
[knarra@knarra ~]$ oc get sub -n openshift-kube-descheduler-operator
NAME                                PACKAGE                             SOURCE            CHANNEL
cluster-kube-descheduler-operator   cluster-kube-descheduler-operator   qe-app-registry   4.8
[knarra@knarra ~]$ oc get ip -n openshift-kube-descheduler-operator
NAME            CSV                                                    APPROVAL    APPROVED
install-5srs7   clusterkubedescheduleroperator.4.8.0-202103241237.p0   Automatic   true
install-x5gzg   clusterkubedescheduleroperator.4.7.0-202103060100.p0   Automatic   true
[knarra@knarra ~]$ oc get csv -n openshift-kube-descheduler-operator
NAME                                                   DISPLAY                     VERSION                 REPLACES                                               PHASE
clusterkubedescheduleroperator.4.7.0-202103060100.p0   Kube Descheduler Operator   4.7.0-202103060100.p0                                                          Replacing
clusterkubedescheduleroperator.4.8.0-202103241237.p0   Kube Descheduler Operator   4.8.0-202103241237.p0   clusterkubedescheduleroperator.4.7.0-202103060100.p0   Installing
[knarra@knarra ~]$ oc get pods -n openshift-kube-descheduler-operator
NAME                                    READY   STATUS    RESTARTS   AGE
cluster-55789d67fd-c4hxs                1/1     Running   0          2m5s
descheduler-operator-79f74d86d7-ds9vl   1/1     Running   0          22s
[knarra@knarra ~]$ oc get csv -n openshift-kube-descheduler-operator
NAME                                                   DISPLAY                     VERSION                 REPLACES                                               PHASE
clusterkubedescheduleroperator.4.8.0-202103241237.p0   Kube Descheduler Operator   4.8.0-202103241237.p0   clusterkubedescheduleroperator.4.7.0-202103060100.p0   Succeeded
[knarra@knarra ~]$ oc get ip -n openshift-kube-descheduler-operator
NAME            CSV                                                    APPROVAL    APPROVED
install-5srs7   clusterkubedescheduleroperator.4.8.0-202103241237.p0   Automatic   true
install-x5gzg   clusterkubedescheduleroperator.4.7.0-202103060100.p0   Automatic   true
[knarra@knarra ~]$ oc get sub -n openshift-kube-descheduler-operator
NAME                                PACKAGE                             SOURCE            CHANNEL
cluster-kube-descheduler-operator   cluster-kube-descheduler-operator   qe-app-registry   4.8

operator logs:
===================
[knarra@knarra ~]$ oc logs -f descheduler-operator-79f74d86d7-ds9vl -n openshift-kube-descheduler-operator
W0325 07:13:47.018815       1 cmd.go:204] Using insecure, self-signed certificates
I0325 07:13:47.364362       1 observer_polling.go:159] Starting file observer
I0325 07:13:47.392110       1 builder.go:238] openshift-cluster-kube-descheduler-operator version -
W0325 07:13:47.709899       1 secure_serving.go:69] Use of insecure cipher 'TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256' detected.
W0325 07:13:47.709917       1 secure_serving.go:69] Use of insecure cipher 'TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256' detected.
I0325 07:13:47.712293       1 leaderelection.go:243] attempting to acquire leader lease openshift-kube-descheduler-operator/openshift-cluster-kube-descheduler-operator-lock...
I0325 07:13:47.714475       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0325 07:13:47.714489       1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0325 07:13:47.714511       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0325 07:13:47.714516       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0325 07:13:47.714529       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0325 07:13:47.714534       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0325 07:13:47.714803       1 secure_serving.go:197] Serving securely on [::]:8443
I0325 07:13:47.714889       1 dynamic_serving_content.go:130] Starting serving-cert::/tmp/serving-cert-123549745/tls.crt::/tmp/serving-cert-123549745/tls.key
I0325 07:13:47.714910       1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0325 07:13:47.815370       1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController 
I0325 07:13:47.815393       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file 
I0325 07:13:47.815370       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file 
I0325 07:14:08.697436       1 leaderelection.go:253] successfully acquired lease openshift-kube-descheduler-operator/openshift-cluster-kube-descheduler-operator-lock
I0325 07:14:08.697799       1 event.go:282] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"openshift-kube-descheduler-operator", Name:"openshift-cluster-kube-descheduler-operator-lock", UID:"4e114cbd-c8dc-4dfb-a65c-8ed1669b7ca5", APIVersion:"v1", ResourceVersion:"51125", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' descheduler-operator-79f74d86d7-ds9vl_2172be2a-6156-487b-96b8-f22b049e957d became leader
I0325 07:14:08.707558       1 starter.go:65] Starting informers
I0325 07:14:08.707576       1 starter.go:69] Starting log level controller
I0325 07:14:08.707583       1 starter.go:71] Starting target config reconciler
I0325 07:14:08.707598       1 target_config_reconciler.go:322] Starting TargetConfigReconciler
I0325 07:14:08.707685       1 base_controller.go:66] Waiting for caches to sync for LoggingSyncer
I0325 07:14:08.743528       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-descheduler-operator", Name:"cluster", UID:"59143a26-48dd-4abf-8efb-a77cc827b1df", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ServiceCreated' Created Service/metrics -n openshift-kube-descheduler-operator because it was missing
I0325 07:14:08.771826       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-descheduler-operator", Name:"cluster", UID:"59143a26-48dd-4abf-8efb-a77cc827b1df", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RoleCreated' Created Role.rbac.authorization.k8s.io/prometheus-k8s -n openshift-kube-descheduler-operator because it was missing
I0325 07:14:08.803744       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-descheduler-operator", Name:"cluster", UID:"59143a26-48dd-4abf-8efb-a77cc827b1df", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RoleBindingCreated' Created RoleBinding.rbac.authorization.k8s.io/prometheus-k8s -n openshift-kube-descheduler-operator because it was missing
I0325 07:14:08.807861       1 base_controller.go:72] Caches are synced for LoggingSyncer 
I0325 07:14:08.807877       1 base_controller.go:109] Starting #1 worker of LoggingSyncer controller ...
I0325 07:14:08.825304       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-descheduler-operator", Name:"cluster", UID:"59143a26-48dd-4abf-8efb-a77cc827b1df", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ServiceMonitorCreated' Created ServiceMonitor.monitoring.coreos.com/v1 because it was missing
I0325 07:14:08.859009       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-descheduler-operator", Name:"cluster", UID:"59143a26-48dd-4abf-8efb-a77cc827b1df", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'DeploymentUpdated' Updated Deployment.apps/cluster -n openshift-kube-descheduler-operator because it changed
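
The "no errors in the logs" check can be automated by filtering on the klog severity letter that starts each line (I, W, E or F). The following Python sketch is one way to do that under stated assumptions: the regular expression and the function name error_lines are illustrative, and the sample text consists of shortened lines from this report rather than live `oc logs` output.

```python
import re

# klog header: severity letter, MMDD, time with microseconds, thread id, file:line]
KLOG_RE = re.compile(
    r'^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ ([^\s:]+):(\d+)\] (.*)$'
)


def error_lines(log_text):
    """Return the log lines whose klog severity is E (error) or F (fatal)."""
    found = []
    for line in log_text.splitlines():
        match = KLOG_RE.match(line)
        if match and match.group(1) in ("E", "F"):
            found.append(line)
    return found


# Two lines from the verified 4.8 run and one shortened error line from the original report.
sample = """\
W0325 07:13:47.018815       1 cmd.go:204] Using insecure, self-signed certificates
I0325 07:14:08.707583       1 starter.go:71] Starting target config reconciler
E0323 10:45:12.636575       1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden
"""

for line in error_lines(sample):
    print(line)
```

Warnings such as the insecure cipher notices are intentionally not counted, since the verification above treats only error-level lines as a failure.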

Based on the above, moving the bug to the verified state.

Comment 5 errata-xmlrpc 2021-07-27 22:55:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

