Created attachment 1528883 [details]
There is a metrics diagram for the pod on the master node

Description of problem:
Cloned from https://jira.coreos.com/browse/MON-552

Log in to the cluster console as cluster admin and click "Workloads -> Pods". In the attached picture pod_on_master.png, node-exporter-2qfxz is on a master node, and there is a metrics diagram for this pod. In the attached picture pod_on_worker.png, alertmanager-main-0 is on a worker node, and there is no metrics diagram for this pod.

Nodes:
$ oc get node
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-138-64.us-east-2.compute.internal    Ready    worker   3h53m   v1.12.4+bdfe8e3f3a
ip-10-0-152-149.us-east-2.compute.internal   Ready    worker   3h53m   v1.12.4+bdfe8e3f3a
ip-10-0-171-54.us-east-2.compute.internal    Ready    worker   3h53m   v1.12.4+bdfe8e3f3a
ip-10-0-2-30.us-east-2.compute.internal      Ready    master   4h7m    v1.12.4+bdfe8e3f3a
ip-10-0-24-53.us-east-2.compute.internal     Ready    master   4h7m    v1.12.4+bdfe8e3f3a
ip-10-0-37-21.us-east-2.compute.internal     Ready    master   4h7m    v1.12.4+bdfe8e3f3a

$ oc get pod -n openshift-monitoring -o wide
NAME                                           READY   STATUS    RESTARTS   AGE     IP             NODE                                         NOMINATED NODE
alertmanager-main-0                            3/3     Running   0          5h7m    10.131.0.8     ip-10-0-138-64.us-east-2.compute.internal    <none>
alertmanager-main-1                            3/3     Running   0          5h7m    10.129.2.7     ip-10-0-152-149.us-east-2.compute.internal   <none>
alertmanager-main-2                            3/3     Running   0          5h7m    10.128.2.8     ip-10-0-171-54.us-east-2.compute.internal    <none>
cluster-monitoring-operator-8499bf9b58-m6dqk   1/1     Running   0          5h12m   10.129.0.22    ip-10-0-37-21.us-east-2.compute.internal     <none>
grafana-78765ddcc7-p4vhn                       2/2     Running   0          5h11m   10.129.2.5     ip-10-0-152-149.us-east-2.compute.internal   <none>
kube-state-metrics-67479bfb84-dmpb9            3/3     Running   0          5h6m    10.129.2.8     ip-10-0-152-149.us-east-2.compute.internal   <none>
node-exporter-2qfxz                            2/2     Running   0          5h6m    10.0.2.30      ip-10-0-2-30.us-east-2.compute.internal      <none>
node-exporter-496nm                            2/2     Running   0          5h6m    10.0.152.149   ip-10-0-152-149.us-east-2.compute.internal   <none>
node-exporter-7v7sl                            2/2     Running   0          5h6m    10.0.24.53     ip-10-0-24-53.us-east-2.compute.internal     <none>
node-exporter-jm9sp                            2/2     Running   0          5h6m    10.0.171.54    ip-10-0-171-54.us-east-2.compute.internal    <none>
node-exporter-qrdlg                            2/2     Running   0          5h6m    10.0.138.64    ip-10-0-138-64.us-east-2.compute.internal    <none>
node-exporter-vb7f5                            2/2     Running   0          5h6m    10.0.37.21     ip-10-0-37-21.us-east-2.compute.internal     <none>
prometheus-adapter-78bd784f5d-n8xgd            1/1     Running   0          6m9s    10.128.2.11    ip-10-0-171-54.us-east-2.compute.internal    <none>
prometheus-k8s-0                               6/6     Running   1          5h9m    10.128.2.7     ip-10-0-171-54.us-east-2.compute.internal    <none>
prometheus-k8s-1                               6/6     Running   1          5h9m    10.129.2.6     ip-10-0-152-149.us-east-2.compute.internal   <none>
prometheus-operator-6cbfc9949-kcllm            1/1     Running   0          5h12m   10.129.2.3     ip-10-0-152-149.us-east-2.compute.internal   <none>
telemeter-client-6b7dd49d98-jxxrn              3/3     Running   0          5h6m    10.129.2.9     ip-10-0-152-149.us-east-2.compute.internal   <none>

Version-Release number of selected component (if applicable):
Images:
NAME                          DIGEST
cluster-monitoring-operator   sha256:dab9fb50d49b7f86f365f190051b62e00fa4f8fd95dd14e9e581b8f2a7c40bc3
configmap-reloader            sha256:34d864ec23d52c2a7079c27b1f13042aea4c28f87040e16660c6110332b66793
grafana                       sha256:3c0ddf2f88e070acdd5276d31ef39f7e4dffdb005330cdcb4cdd6992acd27dbe
k8s-prometheus-adapter        sha256:227479bffec9dca3e3406a3ffef5a01292ab27e4517a7c49569f1c32c9600d42
kube-rbac-proxy               sha256:fd602ef255d3bf8a4cdc5ae801fe165e173a6bb0a338310424b80b972bde9f20
kube-state-metrics            sha256:e244502d4b00e95f5e68bcfa08b926ced8e874b5afc6a002372f9bd53862a96f
prom-label-proxy              sha256:8e188e8623daa9bcdadd0b2b815bd7a88c8087891101a62ffbad18618a097404
prometheus                    sha256:ecfdeea05d7d005e53cbd3ff1bc9c1b543ef14becf88bbba67affef045705037
prometheus-alertmanager       sha256:c8a562dc7304a89128d47a852c96406d27c98b9eb7818b89992c022b14b08d6c
prometheus-config-reloader    sha256:0454a7e3d5bdcdaf77483e20c4776decff3dfa19a41e6b628511635c8c3c2458
prometheus-node-exporter      sha256:1e179d8f99f88247bcca8e3c0628d3e5c18878b24c7f0803a72498236694bed1
prometheus-operator           sha256:fc5aa7d371096afc4580fc5c5081868c2fcad0ec129229bc23feb54145a

How reproducible:
Always

Steps to Reproduce:
1. Log in to the cluster console as cluster admin and click "Workloads -> Pods"
2.
3.

Actual results:
There is no metrics diagram for pods on worker nodes

Expected results:
Metrics diagrams should be shown for pods on both worker and master nodes

Additional info:
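One way to confirm whether the missing diagrams stem from failed kubelet scrapes rather than a console problem is to inspect the Prometheus targets API directly. A minimal sketch (the port-forward approach and the grep filter are illustrative, not from the original report; the pod name is taken from the listing above):

$ oc -n openshift-monitoring port-forward prometheus-k8s-0 9090:9090 &
# List the distinct scrape errors Prometheus reports for its targets;
# a kubelet cert problem shows up here as an x509 error on :10250
$ curl -s http://localhost:9090/api/v1/targets | grep -o '"lastError":"[^"]*"' | sort | uniq -c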
Created attachment 1528884 [details]
There is no metrics diagram for the pod on the worker node
Validated this bug. Prometheus is not able to scrape kubelets on worker nodes (`Get https://xxx.xxx.xxx:10250/metrics/cadvisor: x509: certificate signed by unknown authority`). As a result, it cannot retrieve the `container_*` metrics of pods running on worker nodes.

> we need Prometheus to trust the rotating CA the kube-controller-manager is using to sign CSRs in the cluster (namely the kubelet server certs)
> see https://github.com/openshift/cluster-kube-controller-manager-operator/pull/132

@sjenning given the recent discussions on Slack, can you advise on next steps here?
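For reference, the issuer of the serving certificate a worker kubelet presents can be checked against what Prometheus trusts. A sketch (the node IP is taken from the node-exporter listing in the description; assumes openssl is available on the workstation):

# Print the issuer of the cert the kubelet serves on :10250; per the
# quoted note above, on worker nodes this chains to the rotating CA the
# kube-controller-manager uses, which Prometheus does not yet trust
$ openssl s_client -connect 10.0.138.64:10250 </dev/null 2>/dev/null | openssl x509 -noout -issuer -subject

# Kubelet serving certs originate from CSRs signed in-cluster
$ oc get csr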
This also affects the diagrams in Grafana: for pods on worker nodes, the same x509 error means no CPU/Memory usage is shown in the Grafana UI.
Created attachment 1533989 [details]
There is no CPU/Memory diagram for the pod on the worker node in the Grafana UI
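The gap behind both the console and Grafana diagrams can also be confirmed at the PromQL level: with the port-forward from the earlier sketch still running, a query for a pod on a worker node comes back empty, while the same query for a pod on a master node returns samples. A sketch (the metric and `pod_name` label names assume the cAdvisor conventions of this release; the pod name is taken from the listing in the description):

# alertmanager-main-0 runs on a worker node; expect an empty result set
# while the worker kubelet scrape is failing
$ curl -s -G http://localhost:9090/api/v1/query --data-urlencode 'query=container_memory_usage_bytes{namespace="openshift-monitoring",pod_name="alertmanager-main-0"}'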
*** Bug 1674361 has been marked as a duplicate of this bug. ***
*** This bug has been marked as a duplicate of bug 1674372 ***