Bug 1948926
| Field | Value |
|---|---|
| Summary: | Memory Usage panel of dashboard 'Kubernetes / Compute Resources / Pod' contains wrong CPU query |
| Product: | OpenShift Container Platform |
| Component: | Monitoring |
| Status: | CLOSED ERRATA |
| Severity: | medium |
| Priority: | unspecified |
| Version: | 4.8 |
| Target Milestone: | --- |
| Target Release: | 4.8.0 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Reporter: | hongyan li <hongyli> |
| Assignee: | Jan Fajerski <jfajersk> |
| QA Contact: | hongyan li <hongyli> |
| CC: | alegrand, anpicker, dgrisonn, erooth, juzhao, kakkoyun, lcosic, pkrupa, spasquie |
| Keywords: | Regression, Reopened |
| Doc Type: | No Doc Update |
| Type: | Bug |
| Last Closed: | 2021-07-27 22:59:54 UTC |
Description
hongyan li
2021-04-13 04:27:57 UTC
Created attachment 1771497 [details]
console screenshot
The "requests" value shown in the Memory Usage panel comes from a CPU query:

```
sum(
  kube_pod_container_resource_requests{cluster="", namespace="openshift-monitoring", pod="prometheus-k8s-0", resource="cpu"}
)
```

Please see the screenshot.

Created attachment 1771505 [details]
CPU request
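The symptom above (a Memory Usage panel backed by a `resource="cpu"` query) can be caught mechanically. The following is a minimal illustrative sketch, not part of the actual fix: the function name `find_mismatched_exprs` and the simplified `rows`/`panels`/`targets` layout are assumptions; a real Grafana dashboard JSON may nest panels differently.

```python
import json
import re

def find_mismatched_exprs(dashboard_json: str):
    """Return (panel_title, expr) pairs where a memory panel queries resource="cpu".

    Walks every panel target in a (simplified) Grafana dashboard JSON blob
    and flags expressions that select the CPU resource inside a panel whose
    title mentions memory -- the symptom reported in this bug.
    """
    dashboard = json.loads(dashboard_json)
    mismatches = []
    for row in dashboard.get("rows", []):
        for panel in row.get("panels", []):
            title = panel.get("title", "")
            if "memory" not in title.lower():
                continue
            for target in panel.get("targets", []):
                expr = target.get("expr", "")
                if re.search(r'resource="cpu"', expr):
                    mismatches.append((title, expr))
    return mismatches

# Hypothetical minimal dashboard resembling the broken 4.8 panel.
broken = json.dumps({
    "rows": [{"panels": [{
        "title": "Memory Usage",
        "targets": [
            {"expr": 'sum(kube_pod_container_resource_requests{resource="cpu"})'},
            {"expr": 'sum(container_memory_working_set_bytes{}) by (container)'},
        ],
    }]}]
})
print(find_mismatched_exprs(broken))
```

Running this against the dashboard JSON stored in the `grafana-dashboard-k8s-resources-pod` ConfigMap would surface the mislabeled expression without opening the UI.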
The "requests" and "limits" series of the Memory Usage panel are backed by:

```
sum(
  kube_pod_container_resource_requests{cluster="", namespace="openshift-monitoring", pod="prometheus-k8s-0", resource="cpu"}
)
```

and

```
sum(
  kube_pod_container_resource_limits{cluster="", namespace="openshift-monitoring", pod="prometheus-k8s-0", resource="cpu"}
)
```

`resource="cpu"` should be changed to `resource="memory"`, or the `resource="cpu"` matcher removed.

---

"Kubernetes / Compute Resources / Pod" dashboard: configmap is grafana-dashboard-k8s-resources-pod

"Kubernetes / Compute Resources / Namespace (Pods)" dashboard: configmap is grafana-dashboard-k8s-resources-namespace

"Kubernetes / Compute Resources / Namespace (Workloads)" dashboard: configmap is grafana-dashboard-k8s-resources-workloads-namespace

---

(In reply to Junqi Zhao from comment #5)
> "Kubernetes / Compute Resources / Pod" dashboard
> configmap is grafana-dashboard-k8s-resources-pod
>
> "Kubernetes / Compute Resources / Namespace (Pods)" dashboard
> configmap is grafana-dashboard-k8s-resources-namespace
>
> "Kubernetes / Compute Resources / Namespace (Workloads)" dashboard
> configmap is grafana-dashboard-k8s-resources-workloads-namespace

Only the "Kubernetes / Compute Resources / Pod" dashboard has this issue and needs the fix.

---

The Memory Usage panels of the 'Kubernetes / Compute Resources / Namespace (Pods)' and 'Kubernetes / Compute Resources / Namespace (Workloads)' dashboards don't need the following queries, which return no data:

```
scalar(kube_resourcequota{cluster="", namespace="openshift-monitoring", type="hard", resource="requests.memory"})
```

and

```
scalar(kube_resourcequota{cluster="", namespace="openshift-monitoring", type="hard", resource="limits.memory"})
```

---

On 4.7, the queries for the 'Kubernetes / Compute Resources / Pod' dashboard are correct:

```json
[
  {
    "expr": "sum(container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"POD\", container!=\"\", image!=\"\"}) by (container)",
    "format": "time_series",
    "intervalFactor": 2,
    "legendFormat": "{{container}}",
    "legendLink": null,
    "step": 10
  },
  {
    "expr": "sum(\n  kube_pod_container_resource_requests_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"})\n",
    "format": "time_series",
    "intervalFactor": 2,
    "legendFormat": "requests",
    "legendLink": null,
    "step": 10
  },
  {
    "expr": "sum(\n  kube_pod_container_resource_limits_memory_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\"})\n",
    "format": "time_series",
    "intervalFactor": 2,
    "legendFormat": "limits",
    "legendLink": null,
    "step": 10
  }
]
```

---

(In reply to hongyan li from comment #7)
> Memory usage for DB 'Kubernetes / Compute Resources / Namespace (Pods)' and
> DB 'Kubernetes / Compute Resources / Namespace (Workloads)' don't need the
> following queries which return no data

For these two dashboards, 4.7 includes the same queries, named "quota - limits" and "quota - request".

---

For the issues related to comment #6, #7 and #9, filed a new bug, https://bugzilla.redhat.com/show_bug.cgi?id=1948972, which exists on both 4.7 and 4.8.

---

This doesn't seem to be a bug; it's normal for the query to return no data, since we don't define resource limits for the monitoring stack pods.

---

This is a bug: CPU data is shown in the Memory Usage panel. Refer to https://bugzilla.redhat.com/show_bug.cgi?id=1948926#c4

---

You are very right, there is definitely a bug with the memory limits/requests query, for which we set `resource="cpu"` instead of `resource="memory"`.
This issue seems to come from upstream, as we don't replace the resource value here: https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/dashboards/resources/pod.libsonnet#L50-L58

---

Tested with 4.8.0-0.nightly-2021-05-21-233425. On the "Kubernetes / Compute Resources / Pod" dashboard, select any pod under any project, and in the "Memory Usage" section click "Inspect" to check the expression. The wrong expression from comment #4 has been updated to the correct value; note `resource="memory"` in the expr:

```
sum(
  kube_pod_container_resource_requests{cluster="", namespace="openshift-monitoring", pod="alertmanager-main-0", resource="memory"}
)
```

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
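For completeness, the intent of the fix is a one-token label substitution in the panel's PromQL. The sketch below illustrates that substitution in standalone Python; it is not the actual jsonnet change that landed upstream in kubernetes-mixin, and the function name `fix_memory_expr` is an assumption for illustration.

```python
import re

def fix_memory_expr(expr: str) -> str:
    """Rewrite a memory-panel query that mistakenly selects the CPU resource.

    Only the resource label matcher is touched; everything else in the
    PromQL string is left as-is. This mirrors the intent of the upstream
    kubernetes-mixin fix, not its actual jsonnet implementation.
    """
    return re.sub(r'resource="cpu"', 'resource="memory"', expr)

# The broken expression from comment #4.
broken = ('sum(kube_pod_container_resource_requests{cluster="", '
          'namespace="openshift-monitoring", pod="prometheus-k8s-0", '
          'resource="cpu"})')
print(fix_memory_expr(broken))
```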