Bug 1948926
| Field | Value |
|---|---|
| Summary: | Memory Usage panel of dashboard 'Kubernetes / Compute Resources / Pod' contains wrong CPU query |
| Product: | OpenShift Container Platform |
| Component: | Monitoring |
| Status: | CLOSED ERRATA |
| Severity: | medium |
| Priority: | unspecified |
| Version: | 4.8 |
| Target Milestone: | --- |
| Target Release: | 4.8.0 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Reporter: | hongyan li <hongyli> |
| Assignee: | Jan Fajerski <jfajersk> |
| QA Contact: | hongyan li <hongyli> |
| CC: | alegrand, anpicker, dgrisonn, erooth, juzhao, kakkoyun, lcosic, pkrupa, spasquie |
| Keywords: | Regression, Reopened |
| Doc Type: | No Doc Update |
| Type: | Bug |
| Last Closed: | 2021-07-27 22:59:54 UTC |
Description
hongyan li
2021-04-13 04:27:57 UTC
Created attachment 1771497 [details]
console screenshot
The "requests" value shown in the Memory Usage panel comes from a CPU query:

```
sum(
  kube_pod_container_resource_requests{cluster="", namespace="openshift-monitoring", pod="prometheus-k8s-0", resource="cpu"}
)
```

Please see the screenshot.

Created attachment 1771505 [details]
CPU request
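The symptom above (a Memory Usage panel backed by a `resource="cpu"` query) can be caught mechanically. The following is a minimal illustrative sketch, not part of the actual fix: the function name `find_mismatched_exprs` and the simplified `rows`/`panels`/`targets` layout are assumptions; a real Grafana dashboard JSON may nest panels differently.

```python
import json
import re

def find_mismatched_exprs(dashboard_json: str):
    """Return (panel_title, expr) pairs where a memory panel queries resource="cpu".

    Walks every panel target in a (simplified) Grafana dashboard JSON blob
    and flags expressions that select the CPU resource inside a panel whose
    title mentions memory -- the symptom reported in this bug.
    """
    dashboard = json.loads(dashboard_json)
    mismatches = []
    for row in dashboard.get("rows", []):
        for panel in row.get("panels", []):
            title = panel.get("title", "")
            if "memory" not in title.lower():
                continue
            for target in panel.get("targets", []):
                expr = target.get("expr", "")
                if re.search(r'resource="cpu"', expr):
                    mismatches.append((title, expr))
    return mismatches

# Hypothetical minimal dashboard resembling the broken 4.8 panel.
broken = json.dumps({
    "rows": [{"panels": [{
        "title": "Memory Usage",
        "targets": [
            {"expr": 'sum(kube_pod_container_resource_requests{resource="cpu"})'},
            {"expr": 'sum(container_memory_working_set_bytes{}) by (container)'},
        ],
    }]}]
})
print(find_mismatched_exprs(broken))
```

Running this against the dashboard JSON stored in the `grafana-dashboard-k8s-resources-pod` ConfigMap would surface the mislabeled expression without opening the UI.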
The "requests" and "limits" series of the Memory Usage panel are backed by:

```
sum(
  kube_pod_container_resource_requests{cluster="", namespace="openshift-monitoring", pod="prometheus-k8s-0", resource="cpu"}
)
```

and

```
sum(
  kube_pod_container_resource_limits{cluster="", namespace="openshift-monitoring", pod="prometheus-k8s-0", resource="cpu"}
)
```

`resource="cpu"` should be changed to `resource="memory"`, or the `resource="cpu"` matcher removed.

---

"Kubernetes / Compute Resources / Pod" dashboard: configmap is grafana-dashboard-k8s-resources-pod

"Kubernetes / Compute Resources / Namespace (Pods)" dashboard: configmap is grafana-dashboard-k8s-resources-namespace

"Kubernetes / Compute Resources / Namespace (Workloads)" dashboard: configmap is grafana-dashboard-k8s-resources-workloads-namespace

---

(In reply to Junqi Zhao from comment #5)
> "Kubernetes / Compute Resources / Pod" dashboard
> configmap is grafana-dashboard-k8s-resources-pod
>
> "Kubernetes / Compute Resources / Namespace (Pods)" dashboard
> configmap is grafana-dashboard-k8s-resources-namespace
>
> "Kubernetes / Compute Resources / Namespace (Workloads)" dashboard
> configmap is grafana-dashboard-k8s-resources-workloads-namespace

Only the "Kubernetes / Compute Resources / Pod" dashboard has this issue and needs the fix.

---

The Memory Usage panels of the 'Kubernetes / Compute Resources / Namespace (Pods)' and 'Kubernetes / Compute Resources / Namespace (Workloads)' dashboards don't need the following queries, which return no data:

```
scalar(kube_resourcequota{cluster="", namespace="openshift-monitoring", type="hard", resource="requests.memory"})
```

and

```
scalar(kube_resourcequota{cluster="", namespace="openshift-monitoring", type="hard", resource="limits.memory"})
```

---

On 4.7, the queries for the 'Kubernetes / Compute Resources / Pod' dashboard are correct:

```json
[
  {
    "expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\", image!=\"\"}) by (container)",
    "format": "time_series",
    "intervalFactor": 2,
    "legendFormat": "{{container}}",
    "legendLink": null,
    "step": 10
  },
  {
    "expr": "sum(\n  kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"})\n",
    "format": "time_series",
    "intervalFactor": 2,
    "legendFormat": "requests",
    "legendLink": null,
    "step": 10
  },
  {
    "expr": "sum(\n  kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"})\n",
    "format": "time_series",
    "intervalFactor": 2,
    "legendFormat": "limits",
    "legendLink": null,
    "step": 10
  }
]
```

---

(In reply to hongyan li from comment #7)
> Memory usage for DB 'Kubernetes / Compute Resources / Namespace (Pods)' and
> DB 'Kubernetes / Compute Resources / Namespace (Workloads)' don't need the
> following queries which return no data

For these two dashboards, 4.7 includes the same queries, named "quota - limits" and "quota - request".

---

For the issues related to comment #6, #7 and #9, filed a new bug, https://bugzilla.redhat.com/show_bug.cgi?id=1948972, which exists on both 4.7 and 4.8.

---

This doesn't seem to be a bug; it's normal for the query to return no data, since we don't define resource limits for the monitoring stack pods.

---

This is a bug: CPU data is shown in the Memory Usage panel. Refer to https://bugzilla.redhat.com/show_bug.cgi?id=1948926#c4

---

You are very right, there is definitely a bug with the memory limits/requests query, for which we set `resource="cpu"` instead of `resource="memory"`.
This issue seems to come from upstream, as we don't replace the resource value here: https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/dashboards/resources/pod.libsonnet#L50-L58

---

Tested with 4.8.0-0.nightly-2021-05-21-233425. On the "Kubernetes / Compute Resources / Pod" dashboard, select any pod under any project, and in the "Memory Usage" section click "Inspect" to check the expression. The wrong expression from comment #4 has been updated to the correct value; note `resource="memory"` in the expr:

```
sum(
  kube_pod_container_resource_requests{cluster="", namespace="openshift-monitoring", pod="alertmanager-main-0", resource="memory"}
)
```

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
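For completeness, the intent of the fix is a one-token label substitution in the panel's PromQL. The sketch below illustrates that substitution in standalone Python; it is not the actual jsonnet change that landed upstream in kubernetes-mixin, and the function name `fix_memory_expr` is an assumption for illustration.

```python
import re

def fix_memory_expr(expr: str) -> str:
    """Rewrite a memory-panel query that mistakenly selects the CPU resource.

    Only the resource label matcher is touched; everything else in the
    PromQL string is left as-is. This mirrors the intent of the upstream
    kubernetes-mixin fix, not its actual jsonnet implementation.
    """
    return re.sub(r'resource="cpu"', 'resource="memory"', expr)

# The broken expression from comment #4.
broken = ('sum(kube_pod_container_resource_requests{cluster="", '
          'namespace="openshift-monitoring", pod="prometheus-k8s-0", '
          'resource="cpu"})')
print(fix_memory_expr(broken))
```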