Bug 1674368 - There is no metrics diagram for pods on worker nodes
Summary: There is no metrics diagram for pods on worker nodes
Keywords:
Status: CLOSED DUPLICATE of bug 1674372
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.1.0
Assignee: Frederic Branczyk
QA Contact: Junqi Zhao
URL:
Whiteboard:
Duplicates: 1674361 (view as bug list)
Depends On:
Blocks:
 
Reported: 2019-02-11 07:13 UTC by Junqi Zhao
Modified: 2019-03-12 14:26 UTC
CC: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-12 14:34:33 UTC
Target Upstream Version:
Embargoed:


Attachments
there is a metrics diagram for a pod on a master node (209.12 KB, image/png)
2019-02-11 07:13 UTC, Junqi Zhao
there is no metrics diagram for a pod on a worker node (181.73 KB, image/png)
2019-02-11 07:14 UTC, Junqi Zhao
there is no CPU/Memory diagram for a pod on a worker node in the Grafana UI (60.42 KB, image/png)
2019-02-12 10:33 UTC, Junqi Zhao

Description Junqi Zhao 2019-02-11 07:13:05 UTC
Created attachment 1528883 [details]
there is a metrics diagram for a pod on a master node

Description of problem:
Cloned from https://jira.coreos.com/browse/MON-552

Log in to the cluster console as cluster admin and click "Workloads -> Pods". As shown in the attached picture pod_on_master.png, node-exporter-2qfxz runs on a master node and a metrics diagram is shown for this pod; in the attached picture pod_on_worker.png, the alertmanager-main-0 pod runs on a worker node and no metrics diagram is shown for this pod.


Nodes:
$ oc get node
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-138-64.us-east-2.compute.internal    Ready    worker   3h53m   v1.12.4+bdfe8e3f3a
ip-10-0-152-149.us-east-2.compute.internal   Ready    worker   3h53m   v1.12.4+bdfe8e3f3a
ip-10-0-171-54.us-east-2.compute.internal    Ready    worker   3h53m   v1.12.4+bdfe8e3f3a
ip-10-0-2-30.us-east-2.compute.internal      Ready    master   4h7m    v1.12.4+bdfe8e3f3a
ip-10-0-24-53.us-east-2.compute.internal     Ready    master   4h7m    v1.12.4+bdfe8e3f3a
ip-10-0-37-21.us-east-2.compute.internal     Ready    master   4h7m    v1.12.4+bdfe8e3f3a

$ oc get pod -n openshift-monitoring -o wide
NAME                                           READY   STATUS    RESTARTS   AGE     IP             NODE                                         NOMINATED NODE
alertmanager-main-0                            3/3     Running   0          5h7m    10.131.0.8     ip-10-0-138-64.us-east-2.compute.internal    <none>
alertmanager-main-1                            3/3     Running   0          5h7m    10.129.2.7     ip-10-0-152-149.us-east-2.compute.internal   <none>
alertmanager-main-2                            3/3     Running   0          5h7m    10.128.2.8     ip-10-0-171-54.us-east-2.compute.internal    <none>
cluster-monitoring-operator-8499bf9b58-m6dqk   1/1     Running   0          5h12m   10.129.0.22    ip-10-0-37-21.us-east-2.compute.internal     <none>
grafana-78765ddcc7-p4vhn                       2/2     Running   0          5h11m   10.129.2.5     ip-10-0-152-149.us-east-2.compute.internal   <none>
kube-state-metrics-67479bfb84-dmpb9            3/3     Running   0          5h6m    10.129.2.8     ip-10-0-152-149.us-east-2.compute.internal   <none>
node-exporter-2qfxz                            2/2     Running   0          5h6m    10.0.2.30      ip-10-0-2-30.us-east-2.compute.internal      <none>
node-exporter-496nm                            2/2     Running   0          5h6m    10.0.152.149   ip-10-0-152-149.us-east-2.compute.internal   <none>
node-exporter-7v7sl                            2/2     Running   0          5h6m    10.0.24.53     ip-10-0-24-53.us-east-2.compute.internal     <none>
node-exporter-jm9sp                            2/2     Running   0          5h6m    10.0.171.54    ip-10-0-171-54.us-east-2.compute.internal    <none>
node-exporter-qrdlg                            2/2     Running   0          5h6m    10.0.138.64    ip-10-0-138-64.us-east-2.compute.internal    <none>
node-exporter-vb7f5                            2/2     Running   0          5h6m    10.0.37.21     ip-10-0-37-21.us-east-2.compute.internal     <none>
prometheus-adapter-78bd784f5d-n8xgd            1/1     Running   0          6m9s    10.128.2.11    ip-10-0-171-54.us-east-2.compute.internal    <none>
prometheus-k8s-0                               6/6     Running   1          5h9m    10.128.2.7     ip-10-0-171-54.us-east-2.compute.internal    <none>
prometheus-k8s-1                               6/6     Running   1          5h9m    10.129.2.6     ip-10-0-152-149.us-east-2.compute.internal   <none>
prometheus-operator-6cbfc9949-kcllm            1/1     Running   0          5h12m   10.129.2.3     ip-10-0-152-149.us-east-2.compute.internal   <none>
telemeter-client-6b7dd49d98-jxxrn              3/3     Running   0          5h6m    10.129.2.9     ip-10-0-152-149.us-east-2.compute.internal   <none>

Version-Release number of selected component (if applicable):
Images:
NAME                          DIGEST
cluster-monitoring-operator   sha256:dab9fb50d49b7f86f365f190051b62e00fa4f8fd95dd14e9e581b8f2a7c40bc3
configmap-reloader            sha256:34d864ec23d52c2a7079c27b1f13042aea4c28f87040e16660c6110332b66793
grafana                       sha256:3c0ddf2f88e070acdd5276d31ef39f7e4dffdb005330cdcb4cdd6992acd27dbe
k8s-prometheus-adapter        sha256:227479bffec9dca3e3406a3ffef5a01292ab27e4517a7c49569f1c32c9600d42
kube-rbac-proxy               sha256:fd602ef255d3bf8a4cdc5ae801fe165e173a6bb0a338310424b80b972bde9f20
kube-state-metrics            sha256:e244502d4b00e95f5e68bcfa08b926ced8e874b5afc6a002372f9bd53862a96f
prom-label-proxy              sha256:8e188e8623daa9bcdadd0b2b815bd7a88c8087891101a62ffbad18618a097404
prometheus                    sha256:ecfdeea05d7d005e53cbd3ff1bc9c1b543ef14becf88bbba67affef045705037
prometheus-alertmanager       sha256:c8a562dc7304a89128d47a852c96406d27c98b9eb7818b89992c022b14b08d6c
prometheus-config-reloader    sha256:0454a7e3d5bdcdaf77483e20c4776decff3dfa19a41e6b628511635c8c3c2458
prometheus-node-exporter      sha256:1e179d8f99f88247bcca8e3c0628d3e5c18878b24c7f0803a72498236694bed1
prometheus-operator           sha256:fc5aa7d371096afc4580fc5c5081868c2fcad0ec129229bc23feb54145a

How reproducible:
Always

Steps to Reproduce:
1. Log in to the cluster console as cluster admin and click "Workloads -> Pods"
2. Open a pod that is scheduled on a worker node and check its Metrics section

Actual results:
There is no metrics diagram for pods on worker nodes

Expected results:
A metrics diagram should be shown for pods on both worker and master nodes

Additional info:

Comment 1 Junqi Zhao 2019-02-11 07:14:11 UTC
Created attachment 1528884 [details]
there is no metrics diagram for a pod on a worker node

Comment 2 minden 2019-02-12 10:20:03 UTC
Validated this bug. Prometheus is not able to scrape kubelets on worker nodes (`Get https://xxx.xxx.xxx:10250/metrics/cadvisor: x509: certificate signed by unknown authority`). As a result, it cannot retrieve the `container_*` metrics of pods running on worker nodes.

> we need Prometheus to trust the rotating CA the kube-controller-manager is using to sign CSRs in the cluster (namely the kubelet server certs)
> see https://github.com/openshift/cluster-kube-controller-manager-operator/pull/132

@sjenning given the recent discussions on Slack, can you advise on the next steps here?
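The `x509: certificate signed by unknown authority` failure is a plain TLS trust problem: the kubelet's serving certificate is signed by a CA that is not present in the CA bundle Prometheus uses when scraping. The same failure mode can be reproduced in isolation with openssl; this is an illustrative sketch only, and the CA names and filenames below are made up, not taken from the cluster:

```shell
# Illustrative only: two unrelated CAs stand in for the cluster's
# kubelet-serving signer and the CA bundle Prometheus currently trusts.
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=kubelet-signer" \
  -keyout signer.key -out signer.crt -days 1
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=prometheus-trusted" \
  -keyout trusted.key -out trusted.crt -days 1

# Issue a "kubelet" serving certificate from the signer CA.
openssl req -newkey rsa:2048 -nodes -subj "/CN=kubelet" \
  -keyout kubelet.key -out kubelet.csr
openssl x509 -req -in kubelet.csr -CA signer.crt -CAkey signer.key \
  -CAcreateserial -out kubelet.crt -days 1

# Verification succeeds against the CA that signed the cert...
openssl verify -CAfile signer.crt kubelet.crt

# ...but fails against a trust bundle that does not contain that CA,
# which is exactly Prometheus's situation when scraping worker kubelets.
openssl verify -CAfile trusted.crt kubelet.crt || true
```

This is why the proposed fix is to get the CA that the kube-controller-manager uses to sign kubelet serving certs into Prometheus's trust bundle, rather than to change anything on the kubelets themselves.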

Comment 3 Junqi Zhao 2019-02-12 10:33:06 UTC
This also affects the diagrams in Grafana: because of the same x509 error, the Grafana UI shows no CPU/Memory usage for pods on worker nodes.
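For context, both the console pod diagrams and the Grafana pod dashboards are built on cAdvisor's `container_*` series scraped from the kubelet. A typical CPU panel boils down to a query along these lines (an illustration, not the dashboard's literal expression; label names such as `pod_name` also vary across Kubernetes versions):

```
sum(rate(container_cpu_usage_seconds_total{namespace="openshift-monitoring", pod_name="alertmanager-main-0"}[5m]))
```

Since the kubelet `/metrics/cadvisor` scrape fails on worker nodes, such queries return no data for any pod scheduled there, so both the console diagram and the Grafana panel come up empty.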

Comment 4 Junqi Zhao 2019-02-12 10:33:46 UTC
Created attachment 1533989 [details]
there is no CPU/Memory diagram for a pod on a worker node in the Grafana UI

Comment 5 minden 2019-02-12 12:39:57 UTC
*** Bug 1674361 has been marked as a duplicate of this bug. ***

Comment 6 Seth Jennings 2019-02-12 14:34:33 UTC

*** This bug has been marked as a duplicate of bug 1674372 ***

