Bug 1674047

Summary: [RFE] - Request to add "node" metadata to "kubelet" job in Prometheus in order to scrape data based on node hostname
Product: OpenShift Container Platform Reporter: Tony Garcia <antgarci>
Component: MonitoringAssignee: Frederic Branczyk <fbranczy>
Status: CLOSED ERRATA QA Contact: Junqi Zhao <juzhao>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 3.11.0CC: antgarci, juzhao, mloibl, surbania
Target Milestone: ---   
Target Release: 4.1.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-06-04 10:42:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Tony Garcia 2019-02-08 21:17:38 UTC
Description of problem:
Prior to upgrading to OCP 3.11, Prometheus had a scrape job called “kubernetes-nodes”, and metrics from this scrape job carried a “node” label that we used to get metrics per cluster node. After upgrading to OCP 3.11, these metrics are now scraped by a job called “kubelet”, but that job does not add the “node” label.

We have a consultant onsite who was able to modify the dashboards to use the “instance” label instead, but that label contains IP addresses. The customer would prefer node hostnames to be used/shown.

Version-Release number of selected component (if applicable):
3.11

Additional info:
This is the query the consultant is using to check how much Docker storage is used per cluster node. Basically, they want to group by “node” instead of “instance”:


sum(container_fs_usage_bytes{id="/",device=~"rootvg-docker--pool",instance=~".*"}) by (instance)
/
sum(container_fs_limit_bytes{id="/",device=~"rootvg-docker--pool",instance=~".*"}) by (instance)
* 100

Comment 1 minden 2019-02-11 14:56:09 UTC
This might be related to a change in the Prometheus Operator. Would you mind sharing the Prometheus configuration? You can find it in the openshift-monitoring namespace as a secret with the name `prometheus-k8s`. Please make sure to remove all secrets beforehand and share it in private, just in case.

In addition, joining the metric with the node metrics exposed by kube-state-metrics might help [1].

[1] https://prometheus.io/docs/prometheus/latest/querying/operators/#vector-matching
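A hedged sketch of the suggested vector-matching approach (the `kube_node_info` metric and its `internal_ip` label are assumptions about the kube-state-metrics version in use, not confirmed here): strip the port from `instance` with `label_replace`, then join against the node-info metric to pick up the hostname.

```promql
# Sketch: attach the node hostname from kube-state-metrics to the cAdvisor metric.
# Assumes kube_node_info (value 1) exposes an `internal_ip` label matching the
# IP portion of the kubelet scrape target.
sum by (node) (
  label_replace(
    container_fs_usage_bytes{id="/"},
    "internal_ip", "$1", "instance", "(.*):.*"   # "10.0.134.117:10250" -> "10.0.134.117"
  )
  * on (internal_ip) group_left (node)
  kube_node_info
)
```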

Comment 4 Frederic Branczyk 2019-02-19 16:10:26 UTC
Marking this as modified, as the requested label is available in 4.0. For the time being, vector joining in combination with the `label_replace` function is the way to go.

Comment 6 Junqi Zhao 2019-03-01 09:18:15 UTC
Searched container_fs_usage_bytes in Prometheus; the "node" label is added, and its value is the node hostname.
e.g.

            {
                "metric": {
                    "__name__": "container_fs_usage_bytes",
                    "container_name": "kube-rbac-proxy-main",
                    "device": "/dev/xvda2",
                    "endpoint": "https-metrics",
                    "id": "/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod01a39bbd_3bc9_11e9_ba8e_06ead38c04ac.slice/crio-2686d03fcf60c44f669c31dc58333e25d71236e183b1489aec89a445ef70f310.scope",
                    "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b6de05167ecab0472279cdc430105fac4b97fb2c43d854e1c1aa470d20a36572",
                    "instance": "10.0.134.117:10250",
                    "job": "kubelet",
                    "name": "k8s_kube-rbac-proxy-main_kube-state-metrics-769cdfcd87-bhwh4_openshift-monitoring_01a39bbd-3bc9-11e9-ba8e-06ead38c04ac_0",
                    "namespace": "openshift-monitoring",
                    "node": "ip-10-0-134-117.us-east-2.compute.internal",
                    "pod_name": "kube-state-metrics-769cdfcd87-bhwh4",
                    "service": "kubelet"
                },
                "value": [
                    1551421151.549,
                    "4096"
                ]
            },

# oc describe node | grep Hostname
  Hostname:     ip-10-0-134-117.us-east-2.compute.internal
  Hostname:     ip-10-0-136-27.us-east-2.compute.internal
  Hostname:     ip-10-0-147-13.us-east-2.compute.internal
  Hostname:     ip-10-0-152-104.us-east-2.compute.internal
  Hostname:     ip-10-0-168-118.us-east-2.compute.internal
  Hostname:     ip-10-0-174-80.us-east-2.compute.internal

payload: 4.0.0-0.nightly-2019-02-27-213933
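With the "node" label now present, the query from the description can be grouped by hostname instead of IP. A sketch (the device regex is kept verbatim from the original report and may need adjusting for a given cluster):

```promql
# Docker storage usage percentage per node, keyed by hostname.
sum by (node) (container_fs_usage_bytes{id="/",device=~"rootvg-docker--pool"})
/
sum by (node) (container_fs_limit_bytes{id="/",device=~"rootvg-docker--pool"})
* 100
```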

Comment 10 errata-xmlrpc 2019-06-04 10:42:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758