Description of problem:
Deploy cluster monitoring; the kube-state-metrics pod/service/deployment/replicaset are not created.

# kubectl -n openshift-monitoring get all
NAME                                               READY   STATUS    RESTARTS   AGE
pod/alertmanager-main-0                            3/3     Running   0          1h
pod/alertmanager-main-1                            3/3     Running   0          1h
pod/alertmanager-main-2                            3/3     Running   0          1h
pod/cluster-monitoring-operator-6d7c9f5759-256xw   1/1     Running   0          1h
pod/grafana-7476cc5c4b-pkxpb                       2/2     Running   0          1h
pod/node-exporter-42n9x                            2/2     Running   0          1h
pod/node-exporter-9s8b9                            2/2     Running   0          1h
pod/node-exporter-wxr46                            2/2     Running   0          1h
pod/prometheus-k8s-0                               4/4     Running   1          1h
pod/prometheus-k8s-1                               4/4     Running   1          1h
pod/prometheus-operator-7cbb8d577f-zk9w4           1/1     Running   0          1h

NAME                                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/alertmanager-main             ClusterIP   172.30.187.172   <none>        9094/TCP            1h
service/alertmanager-operated         ClusterIP   None             <none>        9093/TCP,6783/TCP   1h
service/cluster-monitoring-operator   ClusterIP   None             <none>        8080/TCP            1h
service/grafana                       ClusterIP   172.30.142.91    <none>        3000/TCP            1h
service/node-exporter                 ClusterIP   None             <none>        9100/TCP            1h
service/prometheus-k8s                ClusterIP   172.30.87.55     <none>        9091/TCP            1h
service/prometheus-operated           ClusterIP   None             <none>        9090/TCP            1h
service/prometheus-operator           ClusterIP   None             <none>        8080/TCP            1h

NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
daemonset.apps/node-exporter   3         3         3       3            3           beta.kubernetes.io/os=linux   1h

NAME                                          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cluster-monitoring-operator   1         1         1            1           1h
deployment.apps/grafana                       1         1         1            1           1h
deployment.apps/prometheus-operator           1         1         1            1           1h

NAME                                                     DESIRED   CURRENT   READY   AGE
replicaset.apps/cluster-monitoring-operator-6d7c9f5759   1         1         1       1h
replicaset.apps/grafana-7476cc5c4b                       1         1         1       1h
replicaset.apps/prometheus-operator-7cbb8d577f           1         1         1       1h

NAME                                 DESIRED   CURRENT   AGE
statefulset.apps/alertmanager-main   3         3         1h
statefulset.apps/prometheus-k8s      2         2         1h

NAME                                         HOST/PORT                                                             PATH   SERVICES            PORT    TERMINATION   WILDCARD
route.route.openshift.io/alertmanager-main   alertmanager-main-openshift-monitoring.apps.0816-xud.qe.rhcloud.com          alertmanager-main   web     reencrypt     None
route.route.openshift.io/grafana             grafana-openshift-monitoring.apps.0816-xud.qe.rhcloud.com                    grafana             https   reencrypt     None
route.route.openshift.io/prometheus-k8s      prometheus-k8s-openshift-monitoring.apps.0816-xud.qe.rhcloud.com             prometheus-k8s      web     reencrypt     None

Version-Release number of selected component (if applicable):
ose-prometheus-operator:v3.11.0-0.16.0.0
# openshift version
openshift v3.11.0-0.16.0

How reproducible:
Always

Steps to Reproduce:
1. Deploy cluster monitoring
2.
3.

Actual results:
No kube-state-metrics pod is created.

Expected results:
There should be a kube-state-metrics pod.

Additional info:
# parameters
openshift_cluster_monitoring_operator_install=true
openshift_cluster_monitoring_operator_node_selector={'role': 'node'}
Created attachment 1476373 [details] No CPU/Memory/IO data
Could you share the logs of the cluster-monitoring-operator Pod? Thanks!
cluster-monitoring-operator fails the task "Updating kube-state-metrics".

// cluster-monitoring-operator logs
I0816 08:22:58.371739 1 tasks.go:37] running task Updating kube-state-metrics
I0816 08:22:58.371807 1 decoder.go:224] decoding stream as YAML
I0816 08:22:58.379803 1 decoder.go:224] decoding stream as YAML
I0816 08:22:58.467631 1 decoder.go:224] decoding stream as YAML
E0816 08:22:58.482241 1 operator.go:206] Syncing "openshift-monitoring/cluster-monitoring-config" failed
E0816 08:22:58.482266 1 operator.go:207] sync "openshift-monitoring/cluster-monitoring-config" failed: running task Updating kube-state-metrics failed: reconciling kube-state-metrics ClusterRole failed: creating ClusterRole object failed: clusterroles.rbac.authorization.k8s.io "kube-state-metrics" is forbidden: attempt to grant extra privileges: [{[list] [apps] [replicasets] [] []} {[watch] [apps] [replicasets] [] []}] user=&{system:serviceaccount:openshift-monitoring:cluster-monitoring-operator 21b2106a-a11b-11e8-8686-0eee3f410034 [system:serviceaccounts system:serviceaccounts:openshift-monitoring system:authenticated] map[]} ownerrules=[{[get] [ user.openshift.io] [users] [~] []} {[list] [ project.openshift.io] [projectrequests] [] []} {[get list] [ authorization.openshift.io] [clusterroles] [] []} {[get list watch] [rbac.authorization.k8s.io] [clusterroles] [] []} {[get list] [storage.k8s.io] [storageclasses] [] []} {[list watch] [ project.openshift.io] [projects] [] []} {[create] [ authorization.openshift.io] [selfsubjectrulesreviews] [] []} {[create] [authorization.k8s.io] [selfsubjectaccessreviews] [] []} {[create get list watch update delete] [rbac.authorization.k8s.io] [roles rolebindings clusterroles clusterrolebindings] [] []} {[create get list watch update delete] [] [serviceaccounts] [] []} {[create get list watch update delete] [apps] [deployments daemonsets] [] []} {[create get list watch update delete] [route.openshift.io] [routes] [] []} {[create get list watch update delete] [security.openshift.io] [securitycontextconstraints] [] []} {[create] [authentication.k8s.io] [tokenreviews] [] []} {[create] [authorization.k8s.io] [subjectaccessreviews] [] []} {[list watch] [] [nodes pods services resourcequotas replicationcontrollers limitranges persistentvolumeclaims persistentvolumes namespaces endpoints] [] []} {[list watch] [extensions] [daemonsets deployments replicasets] [] []} {[list watch] [apps] [statefulsets] [] []} {[list watch] [batch] [cronjobs jobs] [] []} {[list watch] [autoscaling] [horizontalpodautoscalers] [] []} {[create] [authentication.k8s.io] [tokenreviews] [] []} {[create] [authorization.k8s.io] [subjectaccessreviews] [] []} {[get] [] [pods] [] []} {[get update] [extensions] [deployments] [kube-state-metrics] []} {[create] [authentication.k8s.io] [tokenreviews] [] []} {[create] [authorization.k8s.io] [subjectaccessreviews] [] []} {[get] [] [] [] [/metrics]} {[create] [authentication.k8s.io] [tokenreviews] [] []} {[create] [authorization.k8s.io] [subjectaccessreviews] [] []} {[get] [] [namespaces nodes/metrics] [] []} {[get list watch] [] [nodes services endpoints pods] [] []} {[get] [] [configmaps] [] []} {[*] [extensions] [thirdpartyresources] [] []} {[*] [apiextensions.k8s.io] [customresourcedefinitions] [] []} {[*] [monitoring.coreos.com] [alertmanagers prometheuses prometheuses/finalizers alertmanagers/finalizers servicemonitors prometheusrules] [] []} {[*] [apps] [statefulsets] [] []} {[*] [] [configmaps secrets] [] []} {[list delete] [] [pods] [] []} {[get create update] [] [services endpoints] [] []} {[list watch] [] [nodes] [] []} {[list] [] [namespaces] [] []} {[get] [] [] [] [/healthz /healthz/*]} {[get] [] [] [] [/version /version/* /api /api/* /apis /apis/* /oapi /oapi/* /openapi/v2 /swaggerapi /swaggerapi/* /swagger.json /swagger-2.0.0.pb-v1 /osapi /osapi/ /.well-known /.well-known/* /]} {[create] [ authorization.openshift.io] [selfsubjectrulesreviews] [] []} {[create] [authorization.k8s.io] [selfsubjectaccessreviews] [] []} {[list watch get] [servicecatalog.k8s.io] [clusterserviceclasses clusterserviceplans] [] []} {[create] [authorization.k8s.io] [selfsubjectaccessreviews selfsubjectrulesreviews] [] []} {[create] [ build.openshift.io] [builds/docker builds/optimizeddocker] [] []} {[create] [ build.openshift.io] [builds/jenkinspipeline] [] []} {[create] [ build.openshift.io] [builds/source] [] []} {[get] [] [] [] [/version /version/* /api /api/* /apis /apis/* /oapi /oapi/* /openapi/v2 /swaggerapi /swaggerapi/* /swagger.json /swagger-2.0.0.pb-v1 /osapi /osapi/ /.well-known /.well-known/* /]} {[get] [] [] [] [/version /version/* /api /api/* /apis /apis/* /oapi /oapi/* /openapi/v2 /swaggerapi /swaggerapi/* /swagger.json /swagger-2.0.0.pb-v1 /osapi /osapi/ /.well-known /.well-known/* /]} {[delete] [ oauth.openshift.io] [oauthaccesstokens oauthauthorizetokens] [] []} {[impersonate] [authentication.k8s.io] [userextras/scopes.authorization.openshift.io] [] []} {[create get] [ build.openshift.io] [buildconfigs/webhooks] [] []}] ruleResolutionErrors=[]
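For reference, the denied privileges in the error are list/watch on replicasets in the apps API group. Below is a minimal sketch, reconstructed from the error message rather than the exact manifest shipped by the operator, of the rule the kube-state-metrics ClusterRole needs. Under Kubernetes RBAC escalation prevention, the operator's service account can only create a ClusterRole containing permissions it already holds itself, which is why the create is forbidden:

```yaml
# Hypothetical sketch reconstructed from the forbidden-privileges error;
# not the exact manifest the operator ships.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
  # Exactly the privileges the operator was denied permission to grant:
  - apiGroups: ["apps"]
    resources: ["replicasets"]
    verbs: ["list", "watch"]
```

The fix therefore has to land on the operator's own service account (via openshift-ansible), so that it holds apps/replicasets list+watch before it tries to grant them.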
The pull request to fix this was merged: https://github.com/openshift/openshift-ansible/pull/9626
The issue is fixed:

# oc get pod | grep kube-state-metrics
kube-state-metrics-776f9667b-dzmsz   3/3   Running   0   7m

But in the Grafana UI, no instance is listed under Nodes.
Created attachment 1476606 [details] no instance listed under Nodes and CPU/Memory/Dish data is empty
This defect can be set to ON_QA; the issues mentioned in Comment 5 - Comment 6 are tracked in Bug 1619132.
The issue is fixed.

cluster monitoring component images version: v3.11.0-0.17.0.0
# openshift version
openshift v3.11.0-0.17.0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2652