Bug 2018880
Summary: | Get 'No datapoints found.' when query metrics about alert rule KubeCPUQuotaOvercommit and KubeMemoryQuotaOvercommit | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | hongyan li <hongyli> |
Component: | Monitoring | Assignee: | Simon Pasquier <spasquie> |
Status: | CLOSED ERRATA | QA Contact: | hongyan li <hongyli> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4.10 | CC: | amuller, anpicker, aos-bugs, arajkuma, erooth, jiewu, pgough |
Target Milestone: | --- | Flags: | hongyli: needinfo- |
Target Release: | 4.10.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Last Closed: | 2022-03-10 16:23:41 UTC | Type: | Bug |
Description
hongyan li
2021-11-01 06:13:53 UTC
PromQL query: kube_resourcequota{resource=~".*cpu.*"}

With 4 results:
kube_resourcequota{container="kube-rbac-proxy-main",endpoint="https-main",instance="10.129.2.7:8443",job="kube-state-metrics",namespace="quotapj",pod="kube-state-metrics-6d766d775-qtl5d",resource="limits.cpu",resourcequota="compute-resources",service="kube-state-metrics",type="hard"} 2
kube_resourcequota{container="kube-rbac-proxy-main",endpoint="https-main",instance="10.129.2.7:8443",job="kube-state-metrics",namespace="quotapj",pod="kube-state-metrics-6d766d775-qtl5d",resource="limits.cpu",resourcequota="compute-resources",service="kube-state-metrics",type="used"} 0
kube_resourcequota{container="kube-rbac-proxy-main",endpoint="https-main",instance="10.129.2.7:8443",job="kube-state-metrics",namespace="quotapj",pod="kube-state-metrics-6d766d775-qtl5d",resource="requests.cpu",resourcequota="compute-resources",service="kube-state-metrics",type="hard"} 1
kube_resourcequota{container="kube-rbac-proxy-main",endpoint="https-main",instance="10.129.2.7:8443",job="kube-state-metrics",namespace="quotapj",pod="kube-state-metrics-6d766d775-qtl5d",resource="requests.cpu",resourcequota="compute-resources",service="kube-state-metrics",type="used"} 0

Only "requests.cpu" and "limits.cpu" appear in the 'resource' field; resource="cpu" does not show any results.
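To illustrate why resource="cpu" comes back empty while the regex query returns data, here is a minimal Python sketch of the two matcher kinds applied to the four series above. The helper names `select_exact` and `select_regex` are made up for this illustration and are not part of any Prometheus client; the sketch only assumes that PromQL regex matchers are anchored at both ends, which `re.fullmatch` models.

```python
import re

# Label sets reduced from the query output above (only the labels that matter here).
series = [
    {"namespace": "quotapj", "resource": "limits.cpu", "type": "hard", "value": 2},
    {"namespace": "quotapj", "resource": "limits.cpu", "type": "used", "value": 0},
    {"namespace": "quotapj", "resource": "requests.cpu", "type": "hard", "value": 1},
    {"namespace": "quotapj", "resource": "requests.cpu", "type": "used", "value": 0},
]


def select_exact(samples, label, wanted):
    """Hypothetical model of an equality matcher such as resource="cpu"."""
    return [s for s in samples if s.get(label) == wanted]


def select_regex(samples, label, pattern):
    """Hypothetical model of a regex matcher such as resource=~"...".

    fullmatch is used because PromQL anchors the pattern at both ends.
    """
    compiled = re.compile(pattern)
    return [s for s in samples if compiled.fullmatch(s.get(label, ""))]


exact = select_exact(series, "resource", "cpu")                 # what the 4.10 alert used
broad = select_regex(series, "resource", ".*cpu.*")             # the query from this comment
narrow = select_regex(series, "resource", "(cpu|requests.cpu)")  # matches cpu or requests.cpu only

print(len(exact), len(broad), len(narrow))  # prints: 0 4 2
```

The equality matcher finds nothing because no series carries the literal label value "cpu", whereas a pattern that also accepts "requests.cpu" picks up the request quota without pulling in "limits.cpu".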
PromQL query: kube_resourcequota{resource=~".*memory.*"}

With 4 results:
Element Value
kube_resourcequota{container="kube-rbac-proxy-main",endpoint="https-main",instance="10.129.2.7:8443",job="kube-state-metrics",namespace="quotapj",pod="kube-state-metrics-6d766d775-qtl5d",resource="limits.memory",resourcequota="compute-resources",service="kube-state-metrics",type="hard"} 2147483648
kube_resourcequota{container="kube-rbac-proxy-main",endpoint="https-main",instance="10.129.2.7:8443",job="kube-state-metrics",namespace="quotapj",pod="kube-state-metrics-6d766d775-qtl5d",resource="limits.memory",resourcequota="compute-resources",service="kube-state-metrics",type="used"} 0
kube_resourcequota{container="kube-rbac-proxy-main",endpoint="https-main",instance="10.129.2.7:8443",job="kube-state-metrics",namespace="quotapj",pod="kube-state-metrics-6d766d775-qtl5d",resource="requests.memory",resourcequota="compute-resources",service="kube-state-metrics",type="hard"} 1073741824
kube_resourcequota{container="kube-rbac-proxy-main",endpoint="https-main",instance="10.129.2.7:8443",job="kube-state-metrics",namespace="quotapj",pod="kube-state-metrics-6d766d775-qtl5d",resource="requests.memory",resourcequota="compute-resources",service="kube-state-metrics",type="used"}

Only "limits.memory" and "requests.memory" appear in the 'resource' field; resource="memory" does not show any results.

These are the alerts from 4.10.0-0.nightly-2021-10-31-133814. Comment 1 is from 4.9, not 4.10.
****************************
- alert: KubeCPUQuotaOvercommit
  annotations:
    description: Cluster has overcommitted CPU resource requests for Namespaces.
    summary: Cluster has overcommitted CPU resource requests.
  expr: |
    sum(kube_resourcequota{namespace=~"(openshift-.*|kube-.*|default)",job="kube-state-metrics", type="hard", resource="cpu"})
      /
    sum(kube_node_status_allocatable{resource="cpu"})
      > 1.5
  for: 5m
  labels:
    severity: warning
- alert: KubeMemoryQuotaOvercommit
  annotations:
    description: Cluster has overcommitted memory resource requests for Namespaces.
    summary: Cluster has overcommitted memory resource requests.
  expr: |
    sum(kube_resourcequota{namespace=~"(openshift-.*|kube-.*|default)",job="kube-state-metrics", type="hard", resource="memory"})
      /
    sum(kube_node_status_allocatable{resource="memory",job="kube-state-metrics"})
      > 1.5
  for: 5m
  labels:
    severity: warning
****************************
The reason you get "No datapoints found" is that there is no kube_resourcequota series with labels {type="hard", resource="cpu"} or {type="hard", resource="memory"}, as can be seen from:
********************************
count(kube_resourcequota) by (namespace, job, type, resource)
{job="kube-state-metrics", namespace="openshift-host-network", resource="count/daemonsets.apps", type="hard"} 1
{job="kube-state-metrics", namespace="openshift-host-network", resource="count/deployments.apps", type="hard"} 1
{job="kube-state-metrics", namespace="openshift-host-network", resource="limits.cpu", type="hard"} 1
{job="kube-state-metrics", namespace="openshift-host-network", resource="limits.cpu", type="used"} 1
{job="kube-state-metrics", namespace="openshift-host-network", resource="limits.memory", type="hard"} 1
{job="kube-state-metrics", namespace="openshift-host-network", resource="limits.memory", type="used"} 1
{job="kube-state-metrics", namespace="openshift-host-network", resource="pods", type="used"} 1
{job="kube-state-metrics", namespace="openshift-host-network", resource="count/daemonsets.apps", type="used"} 1
{job="kube-state-metrics", namespace="openshift-host-network", resource="count/deployments.apps", type="used"} 1
{job="kube-state-metrics", namespace="openshift-host-network", resource="pods", type="hard"} 1
********************************

(In reply to Junqi Zhao from comment #3)
> reason why you get "No datapoints found" is there is not kube_resourcequota with labels {type="hard", resource="cpu"} and {type="hard", resource="memory"}

Correction: the reason you get "No datapoints found" is that your cluster has no kube_resourcequota series with labels {type="hard", resource="cpu"} or {type="hard", resource="memory"}.

From #c1 and #c2 we can see that, in the environment where we hit the issue, both requests.cpu and requests.memory have data, but our alerts use cpu and memory in the expr and therefore show 'No datapoints found'.

(In reply to hongyan li from comment #9)
> From the #c1 and #c2, we can know the environment on which we face the
> issue, both requests.cpu and requests.memory have data, but our alert use
> cpu and memory in the expr and show 'No datapoint found'

yes, indeed

According to https://kubernetes.io/docs/concepts/policy/resource-quotas/#compute-resource-quota, cpu is the same as requests.cpu and memory is the same as requests.memory. IMHO, the expressions must be modified to kube_resourcequota{resource=~"(requests.cpu|cpu)"} and kube_resourcequota{resource=~"(requests.memory|memory)"}. I can raise an upstream PR to fix this.

@arajkuma Yes, you're right. The query expression needs to be adjusted. Do you want to raise the PR or should I do it?

@fpetkovs https://github.com/openshift/cluster-monitoring-operator/pull/1491 should fix this issue.

Verified in payload 4.10.0-0.nightly-2021-12-01-072705. The alert exprs are changed as follows and work well.
alert: KubeCPUQuotaOvercommit
expr: sum(min without(resource) (kube_resourcequota{job="kube-state-metrics",namespace=~"(openshift-.*|kube-.*|default)",resource=~"(cpu|requests.cpu)",type="hard"})) / sum(kube_node_status_allocatable{job="kube-state-metrics",resource="cpu"}) > 1.5

alert: KubeMemoryQuotaOvercommit
expr: sum(min without(resource) (kube_resourcequota{job="kube-state-metrics",namespace=~"(openshift-.*|kube-.*|default)",resource=~"(memory|requests.memory)",type="hard"})) / sum(kube_node_status_allocatable{job="kube-state-metrics",resource="memory"}) > 1.5

need more test

% oc label ns default openshift.io/cluster-monitoring="true"
% oc project default
Now using project "default" on server "https://api.hongyli-1202.qe.devcluster.openshift.com:6443".
% oc apply -f - <<EOF
heredoc> apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
spec:
  hard:
    pods: "4"
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi
heredoc> EOF
resourcequota/compute-resources created

% oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=sum(min without(resource) (kube_resourcequota{job="kube-state-metrics",namespace=~"(openshift-.*|kube-.*|default)",resource=~"(memory|requests.memory)",type="hard"})) / sum(kube_node_status_allocatable{job="kube-state-metrics",resource="memory"})'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1638496559.589,"0.011685650023197984"]}]}}
100   473  100   125  100   348   8333  23200 --:--:-- --:--:-- --:--:-- 31533

% oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=sum(min without(resource) (kube_resourcequota{job="kube-state-metrics",namespace=~"(openshift-.*|kube-.*|default)",resource=~"(cpu|requests.cpu)",type="hard"})) / sum(kube_node_status_allocatable{job="kube-state-metrics",resource="cpu"})'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   464  100   125  100   339   6944  18833 --:--:-- --:--:-- --:--:-- 27294
{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1638496645.636,"0.047619047619047616"]}]}}

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:0056

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days
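For readers wondering what the `min without(resource)` step in the fixed expressions does, the following Python sketch models it on made-up data. It is only an illustration, not the operator's implementation: `min_without` is a hypothetical helper, the "kube-example" namespace, the "both" quota, and the node sizes are invented, and the point about double counting is an interpretation based on the Kubernetes documentation cited above (cpu and requests.cpu describe the same quota).

```python
from collections import defaultdict


def min_without(samples, dropped_label):
    """Hypothetical model of PromQL `min without(<label>) (...)`.

    Samples are grouped by every label except `dropped_label`,
    and the minimum value of each group is kept.
    """
    groups = defaultdict(list)
    for labels, value in samples:
        key = tuple(sorted((k, v) for k, v in labels.items() if k != dropped_label))
        groups[key].append(value)
    return {key: min(values) for key, values in groups.items()}


# Invented type="hard" samples. The second quota sets both cpu and requests.cpu.
hard_cpu = [
    ({"namespace": "default", "resourcequota": "compute-resources", "resource": "requests.cpu"}, 1.0),
    ({"namespace": "kube-example", "resourcequota": "both", "resource": "cpu"}, 2.0),
    ({"namespace": "kube-example", "resourcequota": "both", "resource": "requests.cpu"}, 2.0),
]
allocatable_cpu = [4.0, 4.0, 4.0]  # invented per-node allocatable cores

per_quota = min_without(hard_cpu, "resource")  # one value per quota object
ratio = sum(per_quota.values()) / sum(allocatable_cpu)
naive_ratio = sum(value for _, value in hard_cpu) / sum(allocatable_cpu)

print(ratio, ratio > 1.5)  # prints: 0.25 False
print(naive_ratio)
```

With these numbers the quota that sets both keys contributes 2 cores once rather than twice, so the ratio compared against the 1.5 threshold is 3/12 instead of 5/12.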