Bug 2018880 - Get 'No datapoints found.' when querying metrics for the alert rules KubeCPUQuotaOvercommit and KubeMemoryQuotaOvercommit
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Simon Pasquier
QA Contact: hongyan li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-11-01 06:13 UTC by hongyan li
Modified: 2023-09-15 01:16 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:23:41 UTC
Target Upstream Version:
Embargoed:
hongyli: needinfo-




Links
- GitHub kubernetes-monitoring/kubernetes-mixin pull 694 (open): fix: Consider `requests.(cpu|memory)` for quota overcommit alerts (last updated 2021-11-10 13:27:16 UTC)
- GitHub openshift/cluster-monitoring-operator pull 1491 (open): [bot] Automated jsonnet dependencies update (last updated 2021-11-30 03:38:34 UTC)
- Red Hat Product Errata RHSA-2022:0056 (last updated 2022-03-10 16:24:03 UTC)

Description hongyan li 2021-11-01 06:13:53 UTC
Description of problem:

alert: KubeCPUQuotaOvercommit
expr: sum(kube_resourcequota{job="kube-state-metrics",namespace=~"(openshift-.*|kube-.*|default)",resource="cpu",type="hard"}) / sum(kube_node_status_allocatable{resource="cpu"}) > 1.5

for: 5m
labels:
  severity: warning
annotations:
  message: Cluster has overcommitted CPU resource requests for Namespaces.

Querying the metric sum(kube_resourcequota{job="kube-state-metrics",namespace=~"(openshift-.*|kube-.*|default|logging)",resource="cpu",type="hard"}) returns 'No datapoints found.'.

alert: KubeMemoryQuotaOvercommit
expr: sum(kube_resourcequota{job="kube-state-metrics",namespace=~"(openshift-.*|kube-.*|default)",resource="memory",type="hard"}) / sum(kube_node_status_allocatable{job="kube-state-metrics",resource="memory"}) > 1.5
for: 5m
labels:
  severity: warning
annotations:
  message: Cluster has overcommitted memory resource requests for Namespaces.

Querying the metric sum(kube_resourcequota{job="kube-state-metrics",namespace=~"(openshift-.*|kube-.*|default)",resource="memory",type="hard"}) returns 'No datapoints found.'.
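A quick way to confirm the mismatch (a sketch; any PromQL console against the affected cluster works) is to count the series that the alert's selector actually matches:

# Returns an empty result on an affected cluster, because the quotas there
# only create series with resource="requests.cpu" / resource="limits.cpu".
count(kube_resourcequota{job="kube-state-metrics", type="hard", resource="cpu"})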

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2021-10-31-133814
A 4.8 customer hit this issue; presumably 4.9 is affected as well.

How reproducible:
always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Our query exprs are wrong: a ResourceQuota defines compute resources with the requests.*/limits.* keys, for example:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
spec:
  hard:
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi
    requests.nvidia.com/gpu: 4

4.8 alert rules:
alert: KubeCPUQuotaOvercommit
expr: sum(kube_resourcequota{job="kube-state-metrics",namespace=~"(openshift-.*|kube-.*|default|logging)",resource="cpu",type="hard"}) / sum(kube_node_status_allocatable{resource="cpu"}) > 1.5
for: 5m
labels:
  severity: warning
annotations:
  message: Cluster has overcommitted CPU resource requests for Namespaces.
----
alert: KubeMemoryQuotaOvercommit
expr: sum(kube_resourcequota{job="kube-state-metrics",namespace=~"(openshift-.*|kube-.*|default|logging)",resource="memory",type="hard"}) / sum(kube_node_status_allocatable{job="kube-state-metrics",resource="memory"}) > 1.5
for: 5m
labels:
  severity: warning
annotations:
  message: Cluster has overcommitted memory resource requests for Namespaces.
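Illustration (based on the example quota above, not output from the affected cluster): with only requests.*/limits.* keys in the quota, the alerts' resource="cpu" matcher selects nothing:

kube_resourcequota{resource="cpu", type="hard"}           # no series, hence 'No datapoints found.'
kube_resourcequota{resource="requests.cpu", type="hard"}  # matches the quota's CPU request (value 1)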

Comment 1 Jie Wu 2021-11-01 07:10:47 UTC
PromQL query results:
kube_resourcequota{resource=~".*cpu.*"}

With 4 results:
kube_resourcequota{container="kube-rbac-proxy-main",endpoint="https-main",instance="10.129.2.7:8443",job="kube-state-metrics",namespace="quotapj",pod="kube-state-metrics-6d766d775-qtl5d",resource="limits.cpu",resourcequota="compute-resources",service="kube-state-metrics",type="hard"}	2

kube_resourcequota{container="kube-rbac-proxy-main",endpoint="https-main",instance="10.129.2.7:8443",job="kube-state-metrics",namespace="quotapj",pod="kube-state-metrics-6d766d775-qtl5d",resource="limits.cpu",resourcequota="compute-resources",service="kube-state-metrics",type="used"}	0

kube_resourcequota{container="kube-rbac-proxy-main",endpoint="https-main",instance="10.129.2.7:8443",job="kube-state-metrics",namespace="quotapj",pod="kube-state-metrics-6d766d775-qtl5d",resource="requests.cpu",resourcequota="compute-resources",service="kube-state-metrics",type="hard"}	1

kube_resourcequota{container="kube-rbac-proxy-main",endpoint="https-main",instance="10.129.2.7:8443",job="kube-state-metrics",namespace="quotapj",pod="kube-state-metrics-6d766d775-qtl5d",resource="requests.cpu",resourcequota="compute-resources",service="kube-state-metrics",type="used"}	0

Only "requests.cpu" and "limits.cpu" appear in the resource label; resource="cpu" does not match any series.

Comment 2 Jie Wu 2021-11-01 07:18:54 UTC
PromQL query:
kube_resourcequota{resource=~".*memory.*"}

With 4 results:

Element	Value
kube_resourcequota{container="kube-rbac-proxy-main",endpoint="https-main",instance="10.129.2.7:8443",job="kube-state-metrics",namespace="quotapj",pod="kube-state-metrics-6d766d775-qtl5d",resource="limits.memory",resourcequota="compute-resources",service="kube-state-metrics",type="hard"}	2147483648
kube_resourcequota{container="kube-rbac-proxy-main",endpoint="https-main",instance="10.129.2.7:8443",job="kube-state-metrics",namespace="quotapj",pod="kube-state-metrics-6d766d775-qtl5d",resource="limits.memory",resourcequota="compute-resources",service="kube-state-metrics",type="used"}	0
kube_resourcequota{container="kube-rbac-proxy-main",endpoint="https-main",instance="10.129.2.7:8443",job="kube-state-metrics",namespace="quotapj",pod="kube-state-metrics-6d766d775-qtl5d",resource="requests.memory",resourcequota="compute-resources",service="kube-state-metrics",type="hard"}	1073741824
kube_resourcequota{container="kube-rbac-proxy-main",endpoint="https-main",instance="10.129.2.7:8443",job="kube-state-metrics",namespace="quotapj",pod="kube-state-metrics-6d766d775-qtl5d",resource="requests.memory",resourcequota="compute-resources",service="kube-state-metrics",type="used"}

Only "limits.memory" and "requests.memory" appear in the resource label; resource="memory" does not match any series.
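A compact way to enumerate which resource values exist at all (a sketch; plain PromQL grouping, which comment 3 below applies with more labels):

count by (resource) (kube_resourcequota)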

Comment 3 Junqi Zhao 2021-11-01 07:22:00 UTC
These are the alerts from 4.10.0-0.nightly-2021-10-31-133814; comment 1 is from 4.9, not 4.10.
****************************
        - alert: KubeCPUQuotaOvercommit
          annotations:
            description: Cluster has overcommitted CPU resource requests for Namespaces.
            summary: Cluster has overcommitted CPU resource requests.
          expr: |
            sum(kube_resourcequota{namespace=~"(openshift-.*|kube-.*|default)",job="kube-state-metrics", type="hard", resource="cpu"})
              /
            sum(kube_node_status_allocatable{resource="cpu"})
              > 1.5
          for: 5m
          labels:
            severity: warning
        - alert: KubeMemoryQuotaOvercommit
          annotations:
            description: Cluster has overcommitted memory resource requests for Namespaces.
            summary: Cluster has overcommitted memory resource requests.
          expr: |
            sum(kube_resourcequota{namespace=~"(openshift-.*|kube-.*|default)",job="kube-state-metrics", type="hard", resource="memory"})
              /
            sum(kube_node_status_allocatable{resource="memory",job="kube-state-metrics"})
              > 1.5
          for: 5m
          labels:
            severity: warning
****************************
The reason you get "No datapoints found" is that there is no kube_resourcequota series with labels {type="hard", resource="cpu"} or {type="hard", resource="memory"}, as seen from:
********************************
count(kube_resourcequota) by (namespace, job, type, resource)
{job="kube-state-metrics", namespace="openshift-host-network", resource="count/daemonsets.apps", type="hard"} 1
{job="kube-state-metrics", namespace="openshift-host-network", resource="count/deployments.apps", type="hard"} 1
{job="kube-state-metrics", namespace="openshift-host-network", resource="limits.cpu", type="hard"} 1
{job="kube-state-metrics", namespace="openshift-host-network", resource="limits.cpu", type="used"} 1
{job="kube-state-metrics", namespace="openshift-host-network", resource="limits.memory", type="hard"} 1
{job="kube-state-metrics", namespace="openshift-host-network", resource="limits.memory", type="used"} 1
{job="kube-state-metrics", namespace="openshift-host-network", resource="pods", type="used"} 1
{job="kube-state-metrics", namespace="openshift-host-network", resource="count/daemonsets.apps", type="used"} 1
{job="kube-state-metrics", namespace="openshift-host-network", resource="count/deployments.apps", type="used"} 1
{job="kube-state-metrics", namespace="openshift-host-network", resource="pods", type="hard"} 1
********************************

Comment 4 Junqi Zhao 2021-11-01 07:23:16 UTC
(In reply to Junqi Zhao from comment #3)
> The reason you get "No datapoints found" is that there is no kube_resourcequota series with labels {type="hard", resource="cpu"} or {type="hard", resource="memory"}

Change to:
The reason you get "No datapoints found" is that there is no kube_resourcequota series with labels {type="hard", resource="cpu"} or {type="hard", resource="memory"} on your cluster.

Comment 9 hongyan li 2021-11-08 05:46:47 UTC
From #c1 and #c2 we know that, on the environment where we hit the issue, both requests.cpu and requests.memory have data, but our alerts use cpu and memory in the expr, so the queries return 'No datapoints found'.

Comment 10 Junqi Zhao 2021-11-08 06:35:53 UTC
(In reply to hongyan li from comment #9)
> From #c1 and #c2 we know that, on the environment where we hit the issue,
> both requests.cpu and requests.memory have data, but our alerts use cpu and
> memory in the expr, so the queries return 'No datapoints found'

Yes, indeed.

Comment 11 Arunprasad Rajkumar 2021-11-10 12:22:16 UTC
According to https://kubernetes.io/docs/concepts/policy/resource-quotas/#compute-resource-quota, cpu is the same as requests.cpu and memory is the same as requests.memory.

IMHO, the expressions must be modified to kube_resourcequota{resource=~"(requests.cpu|cpu)"} and kube_resourcequota{resource=~"(requests.memory|memory)"}. I can raise an upstream PR to fix this.
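For illustration, a quota that uses the bare keys (a hypothetical object, not taken from the affected cluster) is what the original matchers assumed:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: bare-compute-resources   # hypothetical name
spec:
  hard:
    cpu: "1"      # same meaning as requests.cpu
    memory: 1Gi   # same meaning as requests.memory

With such a quota, kube-state-metrics exports resource="cpu" / resource="memory" series, which is why the alerts need to match both spellings.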

Comment 12 Filip Petkovski 2021-11-22 05:59:28 UTC
@arajkuma Yes, you're right. The query expression needs to be adjusted. Do you want to raise the PR or should I do it?

Comment 13 Arunprasad Rajkumar 2021-11-30 03:38:35 UTC
@fpetkovs https://github.com/openshift/cluster-monitoring-operator/pull/1491 should fix this issue.

Comment 16 hongyan li 2021-12-01 09:29:35 UTC
Verified in payload 4.10.0-0.nightly-2021-12-01-072705.

The alert exprs have been changed as follows and work as expected.

alert: KubeCPUQuotaOvercommit
expr: sum(min without(resource) (kube_resourcequota{job="kube-state-metrics",namespace=~"(openshift-.*|kube-.*|default)",resource=~"(cpu|requests.cpu)",type="hard"})) / sum(kube_node_status_allocatable{job="kube-state-metrics",resource="cpu"}) > 1.5

 

alert: KubeMemoryQuotaOvercommit
expr: sum(min without(resource) (kube_resourcequota{job="kube-state-metrics",namespace=~"(openshift-.*|kube-.*|default)",resource=~"(memory|requests.memory)",type="hard"})) / sum(kube_node_status_allocatable{job="kube-state-metrics",resource="memory"}) > 1.5
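A note on the shape of the fix (my reading of the expression; the duplicated-spelling scenario below is hypothetical): min without(resource) drops the resource label and takes the minimum across otherwise-identical series, so a quota exposing both spellings would be counted once instead of twice:

# Hypothetical series for a single quota:
#   kube_resourcequota{namespace="ns1", resourcequota="rq", resource="cpu", type="hard"}          1
#   kube_resourcequota{namespace="ns1", resourcequota="rq", resource="requests.cpu", type="hard"} 1
# A plain sum over resource=~"(cpu|requests.cpu)" would yield 2 (double counting);
# min without(resource) first collapses the pair into one series with value 1.
sum(min without(resource) (kube_resourcequota{type="hard", resource=~"(cpu|requests.cpu)"}))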

Comment 17 hongyan li 2021-12-02 02:07:00 UTC
Needs more testing.

Comment 18 hongyan li 2021-12-03 01:58:36 UTC
% oc label ns default openshift.io/cluster-monitoring="true"
% oc project default
Now using project "default" on server "https://api.hongyli-1202.qe.devcluster.openshift.com:6443".
% oc apply -f - <<EOF
heredoc> apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
spec:
  hard:
    pods: "4"
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi
heredoc> EOF
resourcequota/compute-resources created

% oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=sum(min without(resource) (kube_resourcequota{job="kube-state-metrics",namespace=~"(openshift-.*|kube-.*|default)",resource=~"(memory|requests.memory)",type="hard"})) / sum(kube_node_status_allocatable{job="kube-state-metrics",resource="memory"})' 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1638496559.589,"0.011685650023197984"]}]}}
100   473  100   125  100   348   8333  23200 --:--:-- --:--:-- --:--:-- 31533
% oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=sum(min without(resource) (kube_resourcequota{job="kube-state-metrics",namespace=~"(openshift-.*|kube-.*|default)",resource=~"(cpu|requests.cpu)",type="hard"})) / sum(kube_node_status_allocatable{job="kube-state-metrics",resource="cpu"})'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   464  100   125  100   339   6944  18833 --:--:-- --:--:-- --:--:-- 27294
{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1638496645.636,"0.047619047619047616"]}]}}

Comment 22 errata-xmlrpc 2022-03-10 16:23:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

Comment 23 Red Hat Bugzilla 2023-09-15 01:16:52 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

