Bug 1913618 - Completed pods skew the Quota metrics
Summary: Completed pods skew the Quota metrics
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.9.0
Assignee: Philip Gough
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-01-07 09:15 UTC by iwatson
Modified: 2023-09-15 00:57 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Prometheus stores metrics for Pods/Jobs that have Failed/Completed. These Pods/Jobs may have had associated resources (requests and/or limits). Consequence: Although the Failed/Completed Pods/Jobs were no longer requesting resources, the dashboards did not take that into account and skewed the values for CPU and memory. Fix: Rewrite the PromQL expression to filter out irrelevant data by ensuring we only account for running and pending containers. Result: Dashboards display correct memory and CPU resource requests and limits.
Clone Of:
Environment:
Last Closed: 2021-10-18 17:28:58 UTC
Target Upstream Version:
Embargoed:


Attachments
Screenshot of Kube/Compute Resources / Cluster dashboard memory requests (185.92 KB, image/png)
2021-06-18 10:16 UTC, Philip Gough


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kubernetes-monitoring kubernetes-mixin pull 639 | 0 | None | open | CPU and Memory Quota should not include series from workloads that have released resources | 2021-07-07 09:14:46 UTC
Github | openshift cluster-monitoring-operator pull 1236 | 0 | None | closed | Sync with kube-prometheus | 2021-06-23 09:05:05 UTC
Github | openshift cluster-monitoring-operator pull 1291 | 0 | None | open | jsonnet: Sync with kube-prometheus | 2021-07-20 14:14:14 UTC
Red Hat Product Errata | RHSA-2021:3759 | 0 | None | None | None | 2021-10-18 17:29:10 UTC

Description iwatson 2021-01-07 09:15:15 UTC
Description of problem:

Team, with the out-of-the-box dashboard for Compute Resources / Cluster, the CPU and memory quota do not take into account Completed pods.

i.e. it can show a project with a very high CPU request and a low CPU utilisation, whereas the truth is that it has several completed builds and is not using any resources at all.

As a cluster admin, we use this dashboard to identify namespaces with the largest difference between utilisation and requests in order to reduce the unused capacity.  

Are you aware of this problem / plans to address it?

Version-Release number of selected component (if applicable):

All versions from 3.x up to the latest

How reproducible:

Set a resource request on a build to 1 core and then perform 10 builds. Go to the monitoring dashboard and note that the namespace is shown to have 10 CPU requests and 0 utilisation.

Comment 3 peter ducai 2021-04-06 08:00:14 UTC
I got info from the customer:

The values were incorrect both in Grafana and on the console.

Comment 5 Jayapriya Pai 2021-05-21 06:50:04 UTC
@Peter

The CPU and memory quota consider pods in the Running/Pending state only, as specified in [1] and [2], which is the expected behaviour.

[1]: https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/2785a9f0addd11c77c82a0c3e8580b556621049d/rules/apps.libsonnet#L69
[2]: https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/2785a9f0addd11c77c82a0c3e8580b556621049d/rules/apps.libsonnet#L83
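
For reference, the filter used by those rules is roughly of this shape (a sketch, not the verbatim expression from [1]/[2]); it yields a series with value 1 only for pods currently in the Pending or Running phase:

max by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Running"} == 1)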

Can you please attach a screenshot of the dashboard you are referring to and let us know what the exact issue is?

Comment 6 Jayapriya Pai 2021-05-25 16:31:34 UTC
Reviewed in sprint; awaiting a response from Peter to proceed further.

Comment 7 Jayapriya Pai 2021-06-17 12:16:28 UTC
Closing this bug since there is insufficient data to proceed further. Please re-open the bug if this is still needed.

Comment 8 iwatson 2021-06-17 12:19:02 UTC
Reopening as it's still a bug.

Comment 9 iwatson 2021-06-17 12:19:52 UTC
Please read the initial description of the bug, which has all the information required.

Comment 10 Jayapriya Pai 2021-06-17 14:49:18 UTC
Can you help me with a screenshot of the dashboard you are referring to?

I had posted this comment https://bugzilla.redhat.com/show_bug.cgi?id=1913618#c5 in private mode; I have changed it to public in case it was not visible to you.

Comment 11 iwatson 2021-06-17 15:01:15 UTC
The query in question is

sum(kube_pod_container_resource_requests_cpu_cores{cluster="$cluster"}) by (namespace)

This is on the Default / Kubernetes / Compute Resources / Cluster dashboard under "CPU Quota". This is an incorrect metric as it does not take into account completed pods.

The CPU Quota is made up of several queries, the query I have picked out is for "CPU Requests".

The same issue applies to the Memory Quota.

The correct query should be sum(kube_pod_container_resource_requests_cpu_cores{cluster="$cluster"} join <query to determine running pods>) by (namespace)
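
For illustration, a join along the following lines would restrict the sum to pods in the Pending or Running phase (a sketch only; the exact label sets may vary between kube-state-metrics versions):

sum by (namespace) (
  kube_pod_container_resource_requests_cpu_cores{cluster="$cluster"}
  * on (namespace, pod) group_left()
  max by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Running"} == 1)
)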

Comment 12 Philip Gough 2021-06-17 15:07:21 UTC
@iwatson I believe this is the correct behaviour, as Kubernetes will release resource requests held by Completed/Failed jobs and builds.

Comment 13 Philip Gough 2021-06-17 15:52:12 UTC
@ianwatson If you can, would you mind dropping in a screenshot of the dashboard that is causing you issues? Thanks

Comment 14 iwatson 2021-06-17 16:57:57 UTC
Kubernetes will indeed release the resources when the job/build completes. This is exactly the issue. The dashboard does not show this release as it accounts for all pods regardless of their status.

i.e. create a new project and start a job with a 1 CPU request/limit.

Look at the difference between oc describe quota and your graph/Prometheus metric to see the difference. oc describe quota will show you at 0 CPU requested. The dashboard will show you at 1 CPU requested.
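
For example (the namespace below is just a placeholder), querying the raw metric in Prometheus still returns the 1-core request for the completed pod, even though oc describe quota reports 0:

sum by (namespace) (kube_pod_container_resource_requests_cpu_cores{namespace="my-project"})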

I don't see how the screenshot will help; I've identified the exact Prometheus query that is behind the dashboard and have now provided exact steps to see this difference.

Comment 15 Philip Gough 2021-06-18 10:16:29 UTC
Created attachment 1792025 [details]
Screenshot of Kube/Compute Resources / Cluster dashboard memory requests

Comment 16 Philip Gough 2021-06-18 10:21:16 UTC
Firstly, I take your point and can see that it could potentially lead to some confusion in the dashboards, and we can extend the query to take into account the Pod phase.

The metric you mention, `kube_pod_container_resource_requests_cpu_cores`, has been deprecated in kube-state-metrics and is not present in the latest version, see https://github.com/kubernetes/kube-state-metrics/pull/1224.

The metric is behaving correctly (see https://github.com/kubernetes/kube-state-metrics/issues/458 for an explanation), as is its replacement, https://github.com/kubernetes/kube-state-metrics/issues/1051.
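
For reference, the rename looks roughly like this (label values are illustrative):

# per-resource metric, removed in newer kube-state-metrics releases (see the PR above):
kube_pod_container_resource_requests_cpu_cores

# replacement, a single metric carrying resource/unit labels:
kube_pod_container_resource_requests{resource="cpu", unit="core"}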

Secondly, the reason you were asked for a screenshot is so that we could determine exactly which dashboard and section you felt was misleading, since the metric you mention is used in several places. Let me make sure we are on the same page.

In the screenshot I have attached of the latest Kube/Compute Resources / Cluster Memory requests, I have requested 1Gi memory for 6 jobs and ran them to completion. The requests still show as 6 and there is no utilisation. Is the ask to ensure that this value should be 0 since the Pods owned by the Job are in the Completed phase? Besides doing the same for CPU requests, are there any other particular dashboards that you feel are misleading?

Thanks

Comment 17 iwatson 2021-06-18 10:31:46 UTC
Thanks Philip

Yes that is the ask, that in your scenario the CPU Request Quota / Memory Request Quota on the Kube/Compute Resources / Cluster should be 0.

The use case is for cluster administrators to identify users who are requesting a lot of resources but have little to no utilization. This is now tricky given the current behaviour, as the top projects are in general not the worst offenders, due to completed pods being present.

The other graphs under Kube/Compute Resources/* also display the CPU Request / Memory Request quota at a finer level of detail; I would argue that the behaviour should be made consistent across all graphs.

Comment 19 Junqi Zhao 2021-07-05 07:35:03 UTC
used the following manifest to deploy in a test namespace:
***********************
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
        resources:
          requests:
            memory: "200Mi"
            cpu: "1000m"
          limits:
            memory: "200Mi"
            cpu: "1000m"
      restartPolicy: Never
*********************** 
checked with 4.9.0-0.nightly-2021-07-04-140102; only the Kubernetes / Compute Resources / Cluster dashboard shows no value for Memory/CPU limit/request.
The other three dashboards
Kubernetes / Compute Resources / Pod
Kubernetes / Compute Resources / Namespace (Pods)
Kubernetes / Compute Resources / Node (Pods)
still show a value for Memory/CPU limit/request.

Comment 25 Junqi Zhao 2021-07-26 03:12:22 UTC
tested with 4.9.0-0.nightly-2021-07-25-125326; checked the following dashboards, and no value is shown for Memory/CPU limit/request:
Kubernetes / Compute Resources / Cluster dashboard
Kubernetes / Compute Resources / Pod
Kubernetes / Compute Resources / Namespace (Pods)
Kubernetes / Compute Resources / Node (Pods)

Comment 32 errata-xmlrpc 2021-10-18 17:28:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

Comment 33 Red Hat Bugzilla 2023-09-15 00:57:49 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

