Bug 1674047
| Summary: | [RFE] - Request to add "node" metadata to "kubelet" job in Prometheus in order to scrape data based on node hostname | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Tony Garcia <antgarci> |
| Component: | Monitoring | Assignee: | Frederic Branczyk <fbranczy> |
| Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.11.0 | CC: | antgarci, juzhao, mloibl, surbania |
| Target Milestone: | --- | | |
| Target Release: | 4.1.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-06-04 10:42:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
This might be related to a change in the Prometheus Operator. Would you mind sharing the Prometheus configuration? You can find it in the openshift-monitoring namespace as a secret named `prometheus-k8s`. Please make sure to remove all secrets beforehand, and share it in private just in case. In addition, joining the metric with the node metrics exposed by kube-state-metrics might help [1].

[1] https://prometheus.io/docs/prometheus/latest/querying/operators/#vector-matching

Marking this as modified, as this is available as requested in 4.0. For the time being, vector joining in combination with the `label_replace` function is the way to go (a sketch of such a join follows the verification output below).

Searched container_fs_usage_bytes in Prometheus: the "node" label is added, and its value is the node hostname. For example, one sample from the query result:
{
"metric": {
"__name__": "container_fs_usage_bytes",
"container_name": "kube-rbac-proxy-main",
"device": "/dev/xvda2",
"endpoint": "https-metrics",
"id": "/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod01a39bbd_3bc9_11e9_ba8e_06ead38c04ac.slice/crio-2686d03fcf60c44f669c31dc58333e25d71236e183b1489aec89a445ef70f310.scope",
"image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b6de05167ecab0472279cdc430105fac4b97fb2c43d854e1c1aa470d20a36572",
"instance": "10.0.134.117:10250",
"job": "kubelet",
"name": "k8s_kube-rbac-proxy-main_kube-state-metrics-769cdfcd87-bhwh4_openshift-monitoring_01a39bbd-3bc9-11e9-ba8e-06ead38c04ac_0",
"namespace": "openshift-monitoring",
"node": "ip-10-0-134-117.us-east-2.compute.internal",
"pod_name": "kube-state-metrics-769cdfcd87-bhwh4",
"service": "kubelet"
},
"value": [
1551421151.549,
"4096"
]
},
# oc describe node | grep Hostname
Hostname: ip-10-0-134-117.us-east-2.compute.internal
Hostname: ip-10-0-136-27.us-east-2.compute.internal
Hostname: ip-10-0-147-13.us-east-2.compute.internal
Hostname: ip-10-0-152-104.us-east-2.compute.internal
Hostname: ip-10-0-168-118.us-east-2.compute.internal
Hostname: ip-10-0-174-80.us-east-2.compute.internal
payload: 4.0.0-0.nightly-2019-02-27-213933
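For reference, the vector-join workaround mentioned above (for clusters where the kubelet job does not yet attach the `node` label) could look like the following sketch. It assumes kube-state-metrics exposes `kube_node_info` with `internal_ip` and `node` labels, and that the kubelet `instance` label has the form `IP:port`, as in the sample above; adjust the metric selector to your deployment.

```promql
# Sketch only: derive an internal_ip label from the kubelet's "IP:port"
# instance label, then join against kube_node_info (assumed to carry
# internal_ip and node labels) to pull in the node hostname.
label_replace(
  sum by (instance) (container_fs_usage_bytes{id="/"}),
  "internal_ip", "$1", "instance", "(.*):.*"
)
* on (internal_ip) group_left (node)
  kube_node_info
```

Because `kube_node_info` has a constant value of 1, the multiplication only attaches the `node` label without changing the metric values.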
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758
Description of problem:

Prior to upgrading to OCP 3.11, Prometheus had a scrape job called “kubernetes-nodes”, and metrics from that job carried a “node” label that we used to break metrics out by cluster node. After upgrading to OCP 3.11, these metrics are scraped by a job called “kubelet”, but that job does not attach the “node” label. We have a consultant onsite who was able to modify the dashboards to use the “instance” label instead, but that label contains IP addresses; the customer would prefer node hostnames to be used and shown.

Version-Release number of selected component (if applicable): 3.11

Additional info:

This is the query the consultant is using to check how much Docker storage is being used per cluster node; basically, they want to use “node” instead of “instance”:

sum(container_fs_usage_bytes{id="/",device=~"rootvg-docker--pool",instance=~".*"}) by (instance) / sum (container_fs_limit_bytes{id="/",device=~"rootvg-docker--pool",instance=~".*"}) by (instance) * 100
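With the `node` label attached to kubelet metrics (as verified above), a hedged rewrite of the consultant's query that groups by node hostname rather than by IP-based instance might look like the following; the redundant `instance=~".*"` matcher is dropped, and `rootvg-docker--pool` is the customer's device name carried over from the original query.

```promql
# Docker storage pool usage per cluster node, as a percentage,
# grouped by node hostname instead of instance IP.
  sum by (node) (container_fs_usage_bytes{id="/", device=~"rootvg-docker--pool"})
/ sum by (node) (container_fs_limit_bytes{id="/", device=~"rootvg-docker--pool"})
* 100
```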