Bug 1857248 - KubeQuotaExceeded fires even if quota is not _exceeded_
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.5.z
Assignee: Rick Rackow
QA Contact: Junqi Zhao
Depends On: 1846707
Reported: 2020-07-15 14:26 UTC by Rick Rackow
Modified: 2020-08-10 13:50 UTC

Fixed In Version:
Doc Text:
Clone Of: 1846707
Last Closed: 2020-08-10 13:50:20 UTC
Target Upstream Version:


System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-monitoring-operator pull 857 | 0 | None | closed | Bug 1857248: changing KubeQuotaExceeded to KubeQuotaFullyUsed | 2020-08-11 05:14:12 UTC
Red Hat Product Errata | RHBA-2020:3188 | 0 | None | None | None | 2020-08-10 13:50:39 UTC

Description Rick Rackow 2020-07-15 14:26:52 UTC
+++ This bug was initially created as a clone of Bug #1846707 +++

Description of problem:
On OSD we create a ResourceQuota [1] for logging that is at 100% of what we allow allocating for logging PVCs.  The KubeQuotaExceeded alert fires if quota is at or above 90% allocated.  This means the alert is always firing.  The name of the alert implies quota is exceeded and we are getting concerned queries from customers regarding this alert firing in the cluster.  We don't think allocating less than 90% of the quota is the right answer and request the expr for this alert be adjusted.

We are tracking a workaround for our alerting [2] that will fire if quota is actually exceeded.  The expr would become:

kube_resourcequota{job="kube-state-metrics",namespace=~"(openshift-.*|kube-.*|default|logging)",type="used"} / ignoring(instance, job, type) (kube_resourcequota{job="kube-state-metrics",namespace=~"(openshift-.*|kube-.*|default|logging)",type="hard"} > 0) > 1
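
To make the difference between the two thresholds concrete, here is a minimal Python sketch. It is not part of the alerting stack; the function names and the 100-of-100 quota figures are hypothetical and only illustrate the comparison described above (fire at or above 90% versus fire only when usage is strictly above the hard limit).

```python
def fires_at_90_percent(used: float, hard: float) -> bool:
    """Behaviour described in this report: fire at or above 90% of quota."""
    return hard > 0 and used / hard >= 0.9


def fires_when_exceeded(used: float, hard: float) -> bool:
    """Proposed workaround: fire only when usage is strictly above the hard limit."""
    return hard > 0 and used / hard > 1


# Hypothetical quota that is fully allocated: 100 of 100 units.
used, hard = 100.0, 100.0
print(fires_at_90_percent(used, hard))  # True: the alert is always firing
print(fires_when_exceeded(used, hard))  # False: quota is full but not exceeded
```

With a quota that is deliberately allocated at 100%, the first check always fires while the second stays quiet until usage actually goes past the limit.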

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create quota in an "openshift" namespace
2. Allocate 100% of quota in that namespace

Actual results:
KubeQuotaExceeded fires.

Expected results:
KubeQuotaExceeded does not fire.

Additional info:
Expect KubeQuotaExceeded to fire only if storage quota allocated is above 100% (actually exceeded).

[1] https://github.com/openshift/managed-cluster-config/blob/master/deploy/osd-logging/03-storage-quota.yaml
[2] https://issues.redhat.com/browse/OSD-4017

Comment 3 Junqi Zhao 2020-07-30 03:29:01 UTC
Tested with 4.5.0-0.nightly-2020-07-30-020337; KubeQuotaExceeded is changed to KubeQuotaFullyUsed:
  - alert: KubeQuotaFullyUsed
    annotations:
      message: Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage
        }} of its {{ $labels.resource }} quota.
    expr: |
      kube_resourcequota{namespace=~"(openshift-.*|kube-.*|default|logging)",job="kube-state-metrics", type="used"}
        / ignoring(instance, job, type)
      (kube_resourcequota{namespace=~"(openshift-.*|kube-.*|default|logging)",job="kube-state-metrics", type="hard"} > 0)
        >= 1
    for: 15m
    labels:
      severity: info
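
For readers less familiar with PromQL vector matching, the following Python sketch approximates what the expression above computes. It is a simplified illustration, not how Prometheus evaluates rules: the sample data, the `hypothetical-quota` name, and all helper names are assumptions, and it ignores the `for: 15m` hold time and PromQL's one-to-one matching errors.

```python
import re

# PromQL regex matchers are fully anchored, so re.fullmatch mirrors namespace=~"...".
NAMESPACE_RE = re.compile(r"(openshift-.*|kube-.*|default|logging)")
IGNORED = {"instance", "job", "type"}  # labels dropped by ignoring(...)


def match_key(labels):
    """Label set used to pair a 'used' sample with its 'hard' sample."""
    return frozenset((k, v) for k, v in labels.items() if k not in IGNORED)


def selected(labels, quota_type):
    """Mimic the selector: job, type, and namespace regex must all match."""
    return (
        labels.get("job") == "kube-state-metrics"
        and labels.get("type") == quota_type
        and NAMESPACE_RE.fullmatch(labels.get("namespace", "")) is not None
    )


def quota_fully_used(samples):
    """Return (labels, ratio) pairs where used / hard >= 1 and hard > 0."""
    hard = {match_key(l): v for l, v in samples if selected(l, "hard") and v > 0}
    firing = []
    for labels, used in samples:
        key = match_key(labels)
        if selected(labels, "used") and key in hard:
            ratio = used / hard[key]
            if ratio >= 1:
                firing.append((dict(key), ratio))
    return firing


def sample(namespace, resource, quota_type, value):
    """Build one hypothetical kube_resourcequota sample."""
    labels = {
        "job": "kube-state-metrics",
        "namespace": namespace,
        "resourcequota": "hypothetical-quota",
        "resource": resource,
        "type": quota_type,
    }
    return labels, value


SAMPLES = [
    sample("openshift-logging", "requests.storage", "used", 100.0),
    sample("openshift-logging", "requests.storage", "hard", 100.0),  # 100%: fires
    sample("openshift-logging", "persistentvolumeclaims", "used", 3.0),
    sample("openshift-logging", "persistentvolumeclaims", "hard", 10.0),  # 30%: quiet
    sample("my-app", "requests.storage", "used", 20.0),
    sample("my-app", "requests.storage", "hard", 10.0),  # namespace not selected
    sample("default", "pods", "used", 0.0),
    sample("default", "pods", "hard", 0.0),  # dropped by the "> 0" filter
]

for labels, ratio in quota_fully_used(SAMPLES):
    print(labels["namespace"], labels["resource"], ratio)
```

The `> 0` filter on the hard limit avoids dividing by zero, and the `>= 1` comparison is what makes the renamed alert report a fully used quota rather than an exceeded one.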

Comment 5 errata-xmlrpc 2020-08-10 13:50:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.5 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

