Description of problem:
The cluster-kube-descheduler operator is broken when upgraded from 4.7 to 4.8. The error below is seen repeatedly in the operator logs:

E0323 09:30:41.391229 1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 09:35:12.639124 1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 09:36:09.085741 1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 09:45:12.636708 1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 09:52:49.099532 1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 09:55:12.636796 1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 10:05:12.636712 1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 10:09:29.113344 1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 10:15:12.636566 1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 10:25:12.636193 1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 10:26:09.127975 1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 10:35:12.637483 1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 10:42:49.141855 1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 10:45:12.636575 1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"

Version-Release number of selected component (if applicable):
[knarra@knarra ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-03-22-025559   True        False         3h52m   Cluster version is 4.7.0-0.nightly-2021-03-22-025559

How reproducible:
Always

Steps to Reproduce:
1. Install the latest 4.7 cluster
2. Edit the subscription, change the channel to 4.8 & the starting csv to the one in 4.8
3. Wait for the descheduler operator to respin

Actual results:
The descheduler operator gets respun, but the errors below appear in the operator log, which indicates that the operator is broken:

E0323 10:15:12.636566 1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 10:25:12.636193 1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 10:26:09.127975 1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 10:35:12.637483 1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 10:42:49.141855 1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"
E0323 10:45:12.636575 1 target_config_reconciler.go:349] key failed with : roles.rbac.authorization.k8s.io "prometheus-k8s" is forbidden: User "system:serviceaccount:openshift-kube-descheduler-operator:openshift-descheduler" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "openshift-kube-descheduler-operator"

Expected results:
No rbac-related errors should be thrown.

Additional info:
Looks like this happens only on 4.8, as a recent code change added support for metrics but did not add the additional rbac rules it requires.
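To see at a glance which permissions the operator is missing, the forbidden errors above can be condensed into one line per verb, resource, and API group. This is only a sketch: `summarize_forbidden` is a hypothetical helper name, not part of oc or the operator, and it assumes the log text arrives on stdin in the format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of oc or the operator).
# Reads operator log text on stdin and prints one line per distinct
# "<verb> <resource> <API group>" triple found in RBAC "forbidden" errors,
# prefixed with the number of times that triple occurred.
summarize_forbidden() {
  grep -F 'is forbidden' \
    | sed -E 's/.*cannot ([a-z]+) resource "([^"]+)" in API group "([^"]*)".*/\1 \2 \3/' \
    | sort \
    | uniq -c \
    | sed -E 's/^ +//'
}
```

It could be fed with `oc logs <operator-pod> -n openshift-kube-descheduler-operator | summarize_forbidden`, where `<operator-pod>` is a placeholder for the running operator pod. For the errors in this report the output would be a single line of the form `<count> get roles rbac.authorization.k8s.io`.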
Verified with the build below; I see no errors in the descheduler operator logs when 4.7 is upgraded to 4.8.

[knarra@knarra ~]$ oc get csv -n openshift-kube-descheduler-operator
NAME                                                   DISPLAY                     VERSION                 REPLACES                                               PHASE
clusterkubedescheduleroperator.4.8.0-202103241237.p0   Kube Descheduler Operator   4.8.0-202103241237.p0   clusterkubedescheduleroperator.4.7.0-202103060100.p0   Succeeded

Below are the steps followed to verify the bug:
================================================
1) Install a 4.7 cluster
2) Edit the subscription, change the channel to 4.8, set the starting CSV to 4.8.0-202103241237.p0 and change the source to qe-app-registry
3) Now we can see that the descheduler operator gets upgraded to 4.8
4) Verify the descheduler operator pod & cluster pod logs to make sure that there are no errors present.

[knarra@knarra ~]$ oc get pods -n openshift-kube-descheduler-operator
NAME                                    READY   STATUS    RESTARTS   AGE
cluster-55789d67fd-c4hxs                1/1     Running   0          28s
descheduler-operator-5b48df6849-b68x2   1/1     Running   0          91s
[knarra@knarra ~]$
[knarra@knarra ~]$ oc get sub
No resources found in default namespace.
[knarra@knarra ~]$ oc get sub -n openshift-kube-descheduler-operator
NAME                                PACKAGE                             SOURCE             CHANNEL
cluster-kube-descheduler-operator   cluster-kube-descheduler-operator   redhat-operators   4.7
[knarra@knarra ~]$ oc edit sub cluster-kube-descheduler-operator -n openshift-kube-descheduler-operator
subscription.operators.coreos.com/cluster-kube-descheduler-operator edited
[knarra@knarra ~]$ oc get sub
No resources found in default namespace.
[knarra@knarra ~]$ oc get sub -n openshift-kube-descheduler-operator
NAME                                PACKAGE                             SOURCE            CHANNEL
cluster-kube-descheduler-operator   cluster-kube-descheduler-operator   qe-app-registry   4.8
[knarra@knarra ~]$ oc get ip -n openshift-kube-descheduler-operator
NAME            CSV                                                    APPROVAL    APPROVED
install-5srs7   clusterkubedescheduleroperator.4.8.0-202103241237.p0   Automatic   true
install-x5gzg   clusterkubedescheduleroperator.4.7.0-202103060100.p0   Automatic   true
[knarra@knarra ~]$ oc get csv -n openshift-kube-descheduler-operator
NAME                                                   DISPLAY                     VERSION                 REPLACES                                               PHASE
clusterkubedescheduleroperator.4.7.0-202103060100.p0   Kube Descheduler Operator   4.7.0-202103060100.p0                                                          Replacing
clusterkubedescheduleroperator.4.8.0-202103241237.p0   Kube Descheduler Operator   4.8.0-202103241237.p0   clusterkubedescheduleroperator.4.7.0-202103060100.p0   Installing
[knarra@knarra ~]$ oc get pods -n openshift-kube-descheduler-operator
NAME                                    READY   STATUS    RESTARTS   AGE
cluster-55789d67fd-c4hxs                1/1     Running   0          2m5s
descheduler-operator-79f74d86d7-ds9vl   1/1     Running   0          22s
[knarra@knarra ~]$ oc get csv -n openshift-kube-descheduler-operator
NAME                                                   DISPLAY                     VERSION                 REPLACES                                               PHASE
clusterkubedescheduleroperator.4.8.0-202103241237.p0   Kube Descheduler Operator   4.8.0-202103241237.p0   clusterkubedescheduleroperator.4.7.0-202103060100.p0   Succeeded
[knarra@knarra ~]$ oc get ip -n openshift-kube-descheduler-operator
NAME            CSV                                                    APPROVAL    APPROVED
install-5srs7   clusterkubedescheduleroperator.4.8.0-202103241237.p0   Automatic   true
install-x5gzg   clusterkubedescheduleroperator.4.7.0-202103060100.p0   Automatic   true
[knarra@knarra ~]$ oc get sub -n openshift-kube-descheduler-operator
NAME                                PACKAGE                             SOURCE            CHANNEL
cluster-kube-descheduler-operator   cluster-kube-descheduler-operator   qe-app-registry   4.8

operator logs:
===================
[knarra@knarra ~]$ oc logs -f descheduler-operator-79f74d86d7-ds9vl -n openshift-kube-descheduler-operator
W0325 07:13:47.018815 1 cmd.go:204] Using insecure, self-signed certificates
I0325 07:13:47.364362 1 observer_polling.go:159] Starting file observer
I0325 07:13:47.392110 1 builder.go:238] openshift-cluster-kube-descheduler-operator version -
W0325 07:13:47.709899 1 secure_serving.go:69] Use of insecure cipher 'TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256' detected.
W0325 07:13:47.709917 1 secure_serving.go:69] Use of insecure cipher 'TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256' detected.
I0325 07:13:47.712293 1 leaderelection.go:243] attempting to acquire leader lease openshift-kube-descheduler-operator/openshift-cluster-kube-descheduler-operator-lock...
I0325 07:13:47.714475 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0325 07:13:47.714489 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0325 07:13:47.714511 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0325 07:13:47.714516 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0325 07:13:47.714529 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0325 07:13:47.714534 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0325 07:13:47.714803 1 secure_serving.go:197] Serving securely on [::]:8443
I0325 07:13:47.714889 1 dynamic_serving_content.go:130] Starting serving-cert::/tmp/serving-cert-123549745/tls.crt::/tmp/serving-cert-123549745/tls.key
I0325 07:13:47.714910 1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0325 07:13:47.815370 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
I0325 07:13:47.815393 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0325 07:13:47.815370 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0325 07:14:08.697436 1 leaderelection.go:253] successfully acquired lease openshift-kube-descheduler-operator/openshift-cluster-kube-descheduler-operator-lock
I0325 07:14:08.697799 1 event.go:282] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"openshift-kube-descheduler-operator", Name:"openshift-cluster-kube-descheduler-operator-lock", UID:"4e114cbd-c8dc-4dfb-a65c-8ed1669b7ca5", APIVersion:"v1", ResourceVersion:"51125", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' descheduler-operator-79f74d86d7-ds9vl_2172be2a-6156-487b-96b8-f22b049e957d became leader
I0325 07:14:08.707558 1 starter.go:65] Starting informers
I0325 07:14:08.707576 1 starter.go:69] Starting log level controller
I0325 07:14:08.707583 1 starter.go:71] Starting target config reconciler
I0325 07:14:08.707598 1 target_config_reconciler.go:322] Starting TargetConfigReconciler
I0325 07:14:08.707685 1 base_controller.go:66] Waiting for caches to sync for LoggingSyncer
I0325 07:14:08.743528 1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-descheduler-operator", Name:"cluster", UID:"59143a26-48dd-4abf-8efb-a77cc827b1df", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ServiceCreated' Created Service/metrics -n openshift-kube-descheduler-operator because it was missing
I0325 07:14:08.771826 1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-descheduler-operator", Name:"cluster", UID:"59143a26-48dd-4abf-8efb-a77cc827b1df", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RoleCreated' Created Role.rbac.authorization.k8s.io/prometheus-k8s -n openshift-kube-descheduler-operator because it was missing
I0325 07:14:08.803744 1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-descheduler-operator", Name:"cluster", UID:"59143a26-48dd-4abf-8efb-a77cc827b1df", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RoleBindingCreated' Created RoleBinding.rbac.authorization.k8s.io/prometheus-k8s -n openshift-kube-descheduler-operator because it was missing
I0325 07:14:08.807861 1 base_controller.go:72] Caches are synced for LoggingSyncer
I0325 07:14:08.807877 1 base_controller.go:109] Starting #1 worker of LoggingSyncer controller ...
I0325 07:14:08.825304 1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-descheduler-operator", Name:"cluster", UID:"59143a26-48dd-4abf-8efb-a77cc827b1df", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ServiceMonitorCreated' Created ServiceMonitor.monitoring.coreos.com/v1 because it was missing
I0325 07:14:08.859009 1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-descheduler-operator", Name:"cluster", UID:"59143a26-48dd-4abf-8efb-a77cc827b1df", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'DeploymentUpdated' Updated Deployment.apps/cluster -n openshift-kube-descheduler-operator because it changed

Based on the above, moving the bug to verified state.
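The log check in step 4 could also be scripted rather than read by eye. The following is a minimal sketch, assuming the standard klog line prefix seen in the logs above (severity letter, then month and day, e.g. `E0323`); `no_klog_errors` is a hypothetical helper name and not an existing tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an existing tool).
# Reads log text on stdin and succeeds (exit status 0) only when no line
# starts with a klog error-level prefix such as "E0323 ".
# Warning (W) and info (I) lines are ignored.
no_klog_errors() {
  ! grep -Eq '^E[0-9]{4} '
}
```

For example, `oc logs <operator-pod> -n openshift-kube-descheduler-operator | no_klog_errors && echo "no errors"`, where `<operator-pod>` is a placeholder for the operator pod name. Dropping `-f` matters here, since a followed log never reaches end of input when there is nothing to match.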
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438