Description of problem:
Prior to upgrading to OCP 3.11, Prometheus had a scrape job called "kubernetes-nodes", and metrics from this scrape job carried a "node" label that we use to break metrics down per cluster node. After upgrading to OCP 3.11, these metrics are now scraped by a job called "kubelet", but that job does not attach the "node" label. We have a consultant onsite who was able to modify the dashboards to use the "instance" label instead, but that label contains IP addresses. The customer would prefer node hostnames to be used/shown.

Version-Release number of selected component (if applicable):
3.11

Additional info:
This is the query the consultant is using to check how much docker storage is used per cluster node. They want to aggregate by "node" instead of "instance":

    sum(container_fs_usage_bytes{id="/",device=~"rootvg-docker--pool",instance=~".*"}) by (instance) / sum (container_fs_limit_bytes{id="/",device=~"rootvg-docker--pool",instance=~".*"}) by (instance) * 100
This might be related to a change in the Prometheus Operator. Would you mind sharing the Prometheus configuration? You can find it in the openshift-monitoring namespace as a secret with the name `prometheus-k8s`. Please make sure to remove all secrets beforehand, and share it in private just in case. In addition, joining the metric with the node metrics exposed by kube-state-metrics might help [1].

[1] https://prometheus.io/docs/prometheus/latest/querying/operators/#vector-matching
Marking this as MODIFIED, as this is available as requested in 4.0. For the time being, a vector join in combination with the `label_replace` function is the way to go; a sketch follows below.
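For reference, a minimal sketch of such a join on 3.11, assuming the deployed kube-state-metrics exposes an `internal_ip` label on `kube_node_info` (if it does not, a different shared label has to be used for the match). The `label_replace` call strips the port from the kubelet `instance` label so it can be matched against that IP, and `group_left(node)` copies the node name onto the result:

    # Hypothetical adaptation of the reporter's query; aggregates by node name
    # instead of instance. kube_node_info has a value of 1, so the multiplication
    # only transfers the "node" label.
    sum(
        label_replace(container_fs_usage_bytes{id="/",device=~"rootvg-docker--pool"},
                      "internal_ip", "$1", "instance", "(.*):.*")
        * on(internal_ip) group_left(node) kube_node_info
    ) by (node)
    /
    sum(
        label_replace(container_fs_limit_bytes{id="/",device=~"rootvg-docker--pool"},
                      "internal_ip", "$1", "instance", "(.*):.*")
        * on(internal_ip) group_left(node) kube_node_info
    ) by (node)
    * 100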
Searched container_fs_usage_bytes from Prometheus; the "node" label is added, and its value is the node hostname, e.g.:

    {
      "metric": {
        "__name__": "container_fs_usage_bytes",
        "container_name": "kube-rbac-proxy-main",
        "device": "/dev/xvda2",
        "endpoint": "https-metrics",
        "id": "/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod01a39bbd_3bc9_11e9_ba8e_06ead38c04ac.slice/crio-2686d03fcf60c44f669c31dc58333e25d71236e183b1489aec89a445ef70f310.scope",
        "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b6de05167ecab0472279cdc430105fac4b97fb2c43d854e1c1aa470d20a36572",
        "instance": "10.0.134.117:10250",
        "job": "kubelet",
        "name": "k8s_kube-rbac-proxy-main_kube-state-metrics-769cdfcd87-bhwh4_openshift-monitoring_01a39bbd-3bc9-11e9-ba8e-06ead38c04ac_0",
        "namespace": "openshift-monitoring",
        "node": "ip-10-0-134-117.us-east-2.compute.internal",
        "pod_name": "kube-state-metrics-769cdfcd87-bhwh4",
        "service": "kubelet"
      },
      "value": [1551421151.549, "4096"]
    }

    # oc describe node | grep Hostname
      Hostname:    ip-10-0-134-117.us-east-2.compute.internal
      Hostname:    ip-10-0-136-27.us-east-2.compute.internal
      Hostname:    ip-10-0-147-13.us-east-2.compute.internal
      Hostname:    ip-10-0-152-104.us-east-2.compute.internal
      Hostname:    ip-10-0-168-118.us-east-2.compute.internal
      Hostname:    ip-10-0-174-80.us-east-2.compute.internal

payload: 4.0.0-0.nightly-2019-02-27-213933
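With the `node` label now attached to the kubelet metrics in 4.0, the reporter's query can simply aggregate by `node` instead of `instance`; a sketch reusing the device selector from the original report:

    sum(container_fs_usage_bytes{id="/",device=~"rootvg-docker--pool"}) by (node)
    / sum(container_fs_limit_bytes{id="/",device=~"rootvg-docker--pool"}) by (node)
    * 100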
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758