1674047 – [RFE] - Request to add "node" metadata to "kubelet" job in Prometheus in order to scrape data based on node hostname


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Monitoring
Sub Component:
Version:	3.11.0
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	4.1.0
Assignee:	Frederic Branczyk
QA Contact:	Junqi Zhao
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-02-08 21:17 UTC by Tony Garcia
Modified:	2020-03-09 11:37 UTC (History)
CC List:	4 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-06-04 10:42:43 UTC
Target Upstream Version:
Embargoed:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2019:0758	0	None	None	None	2019-06-04 10:43:51 UTC

Description Tony Garcia 2019-02-08 21:17:38 UTC

Description of problem:
Prior to upgrading to OCP 3.11, Prometheus had a scrape job called “kubernetes-nodes”, and metrics from this scrape job had a “node” metadata entry that we used to get metrics per cluster node. After upgrading to OCP 3.11, it looks like these metrics are now being scraped by a job called “kubelet”, but that job does not have the “node” metadata.

We have a consultant onsite who was able to modify the dashboards to use the “instance” metadata instead, but this displays the IP address. The customer would prefer that node hostnames be shown.

Version-Release number of selected component (if applicable):
3.11

Additional info:
This is the query the consultant is using to check how much Docker storage is being used per cluster node. Basically, they want to use “node” instead of “instance”:


sum(container_fs_usage_bytes{id="/",device=~"rootvg-docker--pool",instance=~".*"}) by (instance) / sum (container_fs_limit_bytes{id="/",device=~"rootvg-docker--pool",instance=~".*"}) by (instance) * 100

Comment 1 minden 2019-02-11 14:56:09 UTC

This might be related to a change in the Prometheus Operator. Would you mind sharing the Prometheus configuration? You can find it in the openshift-monitoring namespace as a secret named `prometheus-k8s`. Please make sure to remove all secrets beforehand and share it privately, just in case.

In addition, joining the metric with the node metrics exposed by kube-state-metrics might help [1].

[1] https://prometheus.io/docs/prometheus/latest/querying/operators/#vector-matching

Comment 4 Frederic Branczyk 2019-02-19 16:10:26 UTC

Marking this as modified, as this is available as requested in 4.0. For the time being, vector joining in combination with the `label_replace` function is the way to go.
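
A rough sketch of what such a join could look like for the query from the description. It assumes kube-state-metrics exposes a series that carries both the node's IP address and its name; `kube_node_info` with an `internal_ip` label is used here purely as an assumption, and that label may not exist in the kube-state-metrics version shipped with 3.11. `label_replace` strips the port from `instance` into a temporary `internal_ip` label, and `group_left` copies `node` over from the info series:

```promql
# Assumption: kube_node_info has an "internal_ip" label matching the IP part
# of the kubelet "instance" label (e.g. "10.0.134.117:10250").
(
    sum by (internal_ip) (
      label_replace(
        container_fs_usage_bytes{id="/",device=~"rootvg-docker--pool"},
        "internal_ip", "$1", "instance", "(.*):.*"
      )
    )
  /
    sum by (internal_ip) (
      label_replace(
        container_fs_limit_bytes{id="/",device=~"rootvg-docker--pool"},
        "internal_ip", "$1", "instance", "(.*):.*"
      )
    )
  * 100
)
* on (internal_ip) group_left (node)
  # max by (...) collapses duplicate info series; their value is 1,
  # so the multiplication leaves the percentage unchanged.
  max by (internal_ip, node) (kube_node_info)
```

If no such mapping series is available in the cluster, the join has nothing to match on and the query returns no data.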

Comment 6 Junqi Zhao 2019-03-01 09:18:15 UTC

Searched for container_fs_usage_bytes in Prometheus: the "node" label is added, and its value is the node hostname,
e.g.

            {
                "metric": {
                    "__name__": "container_fs_usage_bytes",
                    "container_name": "kube-rbac-proxy-main",
                    "device": "/dev/xvda2",
                    "endpoint": "https-metrics",
                    "id": "/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod01a39bbd_3bc9_11e9_ba8e_06ead38c04ac.slice/crio-2686d03fcf60c44f669c31dc58333e25d71236e183b1489aec89a445ef70f310.scope",
                    "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b6de05167ecab0472279cdc430105fac4b97fb2c43d854e1c1aa470d20a36572",
                    "instance": "10.0.134.117:10250",
                    "job": "kubelet",
                    "name": "k8s_kube-rbac-proxy-main_kube-state-metrics-769cdfcd87-bhwh4_openshift-monitoring_01a39bbd-3bc9-11e9-ba8e-06ead38c04ac_0",
                    "namespace": "openshift-monitoring",
                    "node": "ip-10-0-134-117.us-east-2.compute.internal",
                    "pod_name": "kube-state-metrics-769cdfcd87-bhwh4",
                    "service": "kubelet"
                },
                "value": [
                    1551421151.549,
                    "4096"
                ]
            },

# oc describe node | grep Hostname
  Hostname:     ip-10-0-134-117.us-east-2.compute.internal
  Hostname:     ip-10-0-136-27.us-east-2.compute.internal
  Hostname:     ip-10-0-147-13.us-east-2.compute.internal
  Hostname:     ip-10-0-152-104.us-east-2.compute.internal
  Hostname:     ip-10-0-168-118.us-east-2.compute.internal
  Hostname:     ip-10-0-174-80.us-east-2.compute.internal

payload: 4.0.0-0.nightly-2019-02-27-213933
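
With the "node" label present as in the sample above, the query from the description could presumably be grouped by node directly. A sketch, keeping the reporter's original selectors (the `device` value is specific to the customer's environment; the cluster verified here reports "/dev/xvda2" instead):

```promql
# Percentage of container filesystem space used, per node hostname.
# Assumes the "node" label is set on both metrics, as in the sample above.
  sum by (node) (container_fs_usage_bytes{id="/",device=~"rootvg-docker--pool"})
/
  sum by (node) (container_fs_limit_bytes{id="/",device=~"rootvg-docker--pool"})
* 100
```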

Comment 10 errata-xmlrpc 2019-06-04 10:42:43 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758
