Bug 1846707

Summary: KubeQuotaExceeded fires even if quota is not _exceeded_
Product: OpenShift Container Platform Reporter: Naveen Malik <nmalik>
Component: Monitoring Assignee: Rick Rackow <rrackow>
Status: CLOSED ERRATA QA Contact: Junqi Zhao <juzhao>
Severity: low Docs Contact:
Priority: medium    
Version: 4.6 CC: alegrand, anpicker, erooth, kakkoyun, lcosic, mloibl, pkrupa, rrackow, surbania
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1857248 Environment:
Last Closed: 2020-10-27 16:06:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1857248    

Description Naveen Malik 2020-06-12 20:06:15 UTC
Description of problem:
On OSD we create a ResourceQuota [1] for logging that is at 100% of what we allow to be allocated for logging PVCs. The KubeQuotaExceeded alert fires if quota is at or above 90% allocated, so the alert is always firing. The name of the alert implies that quota is exceeded, and we are getting concerned queries from customers about this alert firing in the cluster. We don't think allocating less than 90% of the quota is the right answer, and we request that the expr for this alert be adjusted.

We are tracking a workaround for our alerting [2] that will fire if quota is actually exceeded.  The expr would become:

kube_resourcequota{job="kube-state-metrics",namespace=~"(openshift-.*|kube-.*|default|logging)",type="used"} / ignoring(instance, job, type) (kube_resourcequota{job="kube-state-metrics",namespace=~"(openshift-.*|kube-.*|default|logging)",type="hard"} > 0) > 1
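
To illustrate why the threshold matters, here is a minimal Python sketch (not Prometheus code) that compares a used/hard ratio against the "at or above 90%" behaviour described above, the strict `> 1` proposed here, and the `>= 1` variant that appears later in this report. The helper names `quota_ratio` and `fires` are hypothetical and exist only for this illustration.

```python
# Illustrative only: these helpers are hypothetical and are not part of
# Prometheus or kube-state-metrics.

def quota_ratio(used, hard):
    """Return used / hard, or None when hard is not positive.

    Returning None mirrors the `> 0` filter on the "hard" series in the expr,
    which drops quotas with a zero limit before the division.
    """
    if hard <= 0:
        return None
    return used / hard


def fires(ratio, op, threshold):
    """Evaluate `ratio <op> threshold`; a dropped series (None) never fires."""
    if ratio is None:
        return False
    if op == ">=":
        return ratio >= threshold
    if op == ">":
        return ratio > threshold
    raise ValueError(f"unsupported operator: {op}")


# A quota that is exactly 100% allocated, as in the OSD logging case.
ratio = quota_ratio(used=100, hard=100)

at_or_above_90 = fires(ratio, ">=", 0.9)  # behaviour described in this report
strictly_above_100 = fires(ratio, ">", 1)  # workaround proposed here
at_or_above_100 = fires(ratio, ">=", 1)    # variant shown in Comment 6
```

Under these assumptions, a fully allocated quota satisfies the 90% condition and the `>= 1` condition, but not the strict `> 1` condition.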





Version-Release number of selected component (if applicable):
4.3.18


How reproducible:
Always

Steps to Reproduce:
1. Create quota in an "openshift" namespace
2. Allocate 100% of quota in that namespace
3.

Actual results:
KubeQuotaExceeded fires.


Expected results:
KubeQuotaExceeded does not fire.

Additional info:
Expect KubeQuotaExceeded to fire only if storage quota allocated is above 100% (actually exceeded).


[1] https://github.com/openshift/managed-cluster-config/blob/master/deploy/osd-logging/03-storage-quota.yaml
[2] https://issues.redhat.com/browse/OSD-4017

Comment 6 Junqi Zhao 2020-07-17 05:33:35 UTC
Tested with 4.6.0-0.nightly-2020-07-16-162619; KubeQuotaExceeded has been changed to KubeQuotaFullyUsed:
**********************
  - alert: KubeQuotaFullyUsed
    annotations:
      message: Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage
        }} of its {{ $labels.resource }} quota.
    expr: |
      kube_resourcequota{namespace=~"(openshift-.*|kube-.*|default|logging)",job="kube-state-metrics", type="used"}
        / ignoring(instance, job, type)
      (kube_resourcequota{namespace=~"(openshift-.*|kube-.*|default|logging)",job="kube-state-metrics", type="hard"} > 0)
        >= 1
    for: 15m
    labels:
      severity: info
**********************
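
As a rough illustration of how the expr above pairs series, the following Python sketch mimics the `ignoring(instance, job, type)` matching: "used" and "hard" samples are joined on all remaining labels, "hard" samples that are not greater than 0 are dropped, and only ratios of at least 1 are kept. The sample data and the function names `match_key` and `fully_used` are assumptions made for this example, and the namespace regex filter is not modelled.

```python
# Illustrative only: a simplified model of one-to-one vector matching.
# Sample values and helper names are hypothetical.

IGNORED = {"instance", "job", "type"}


def match_key(labels):
    """Build a hashable key from every label except the ignored ones."""
    return tuple(sorted((k, v) for k, v in labels.items() if k not in IGNORED))


def fully_used(samples):
    """Return (labels, ratio) pairs where used / hard >= 1.

    `samples` is a list of (labels dict, value) tuples.
    """
    # Keep only "hard" samples with a positive value, like `... > 0`.
    hard = {
        match_key(labels): value
        for labels, value in samples
        if labels.get("type") == "hard" and value > 0
    }
    result = []
    for labels, value in samples:
        if labels.get("type") != "used":
            continue
        key = match_key(labels)
        if key in hard and value / hard[key] >= 1:
            result.append((dict(key), value / hard[key]))
    return result


samples = [
    ({"namespace": "openshift-logging", "resource": "requests.storage",
      "job": "kube-state-metrics", "type": "used"}, 100),
    ({"namespace": "openshift-logging", "resource": "requests.storage",
      "job": "kube-state-metrics", "type": "hard"}, 100),
    ({"namespace": "default", "resource": "pods",
      "job": "kube-state-metrics", "type": "used"}, 3),
    ({"namespace": "default", "resource": "pods",
      "job": "kube-state-metrics", "type": "hard"}, 10),
    # A zero hard limit is filtered out, so no division by zero occurs.
    ({"namespace": "kube-system", "resource": "pods",
      "job": "kube-state-metrics", "type": "used"}, 0),
    ({"namespace": "kube-system", "resource": "pods",
      "job": "kube-state-metrics", "type": "hard"}, 0),
]

alerts = fully_used(samples)
```

With this sample data, only the fully allocated `requests.storage` quota in `openshift-logging` is returned; the result keeps the labels that survive the matching, which is why the alert message can reference `$labels.namespace` and `$labels.resource`.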

Comment 8 errata-xmlrpc 2020-10-27 16:06:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196