Description of problem:
On OSD we create a ResourceQuota [1] for logging that is at 100% of what we allow allocating for logging PVCs. The KubeQuotaExceeded alert fires if quota is at or above 90% allocated, so for this quota the alert is always firing. The name of the alert implies that quota is exceeded, and we are getting concerned queries from customers about this alert firing in the cluster. We don't think allocating less than 90% of the quota is the right answer, and we request that the expr for this alert be adjusted. We are tracking a workaround for our alerting [2] that will fire only if quota is actually exceeded.

The expr would become:

kube_resourcequota{job="kube-state-metrics",namespace=~"(openshift-.*|kube-.*|default|logging)",type="used"}
  / ignoring(instance, job, type)
(kube_resourcequota{job="kube-state-metrics",namespace=~"(openshift-.*|kube-.*|default|logging)",type="hard"} > 0)
  > 1

Version-Release number of selected component (if applicable):
4.3.18

How reproducible:
Always

Steps to Reproduce:
1. Create a quota in an "openshift" namespace
2. Allocate 100% of the quota in that namespace

Actual results:
KubeQuotaExceeded fires.

Expected results:
KubeQuotaExceeded does not fire.

Additional info:
Expect KubeQuotaExceeded to fire only if the storage quota allocated is above 100% (actually exceeded).

[1] https://github.com/openshift/managed-cluster-config/blob/master/deploy/osd-logging/03-storage-quota.yaml
[2] https://issues.redhat.com/browse/OSD-4017
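To make the threshold difference concrete, here is a minimal Python sketch of the comparison, not the actual rule evaluation. It assumes the existing alert compares used/hard with ">= 0.9" (taken from the "at or above 90%" wording above; the shipped expression is not quoted in this report) and that the proposed rule uses "> 1". The function names and the quota values are hypothetical.

```python
from typing import Optional


def quota_ratio(used: float, hard: float) -> Optional[float]:
    """Return used / hard, or None when hard <= 0.

    This mirrors the `(hard > 0)` filter in the expression: a quota with
    a non-positive hard limit produces no sample, so it cannot alert.
    """
    if hard <= 0:
        return None
    return used / hard


def alert_fires(ratio: Optional[float], threshold: float, inclusive: bool) -> bool:
    """Compare a ratio against a threshold with `>=` (inclusive) or `>`."""
    if ratio is None:
        return False
    return ratio >= threshold if inclusive else ratio > threshold


# Hypothetical values: a quota that is exactly 100% allocated.
fully_allocated = quota_ratio(used=100, hard=100)

# Assumed current behaviour: fires at or above 90% allocated.
fires_today = alert_fires(fully_allocated, threshold=0.9, inclusive=True)

# Proposed behaviour: fires only when the quota is actually exceeded.
fires_proposed = alert_fires(fully_allocated, threshold=1.0, inclusive=False)

# Hypothetical over-allocated quota, which the proposed rule should still catch.
over_allocated = quota_ratio(used=110, hard=100)
fires_when_over = alert_fires(over_allocated, threshold=1.0, inclusive=False)
```

Under these assumptions a fully allocated quota trips the 90% comparison but not the strict "> 1" comparison, which is the behaviour requested here.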
Tested with 4.6.0-0.nightly-2020-07-16-162619. KubeQuotaExceeded is changed to KubeQuotaFullyUsed:
**********************
- alert: KubeQuotaFullyUsed
  annotations:
    message: Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage }} of its {{ $labels.resource }} quota.
  expr: |
    kube_resourcequota{namespace=~"(openshift-.*|kube-.*|default|logging)",job="kube-state-metrics", type="used"}
      / ignoring(instance, job, type)
    (kube_resourcequota{namespace=~"(openshift-.*|kube-.*|default|logging)",job="kube-state-metrics", type="hard"} > 0)
      >= 1
  for: 15m
  labels:
    severity: info
**********************
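For anyone checking which namespaces the rule covers: Prometheus regex matchers (=~) are fully anchored, so the selector must match the whole namespace name. A rough Python sketch of that behaviour follows, using re.fullmatch as a stand-in for the anchored match. The helper name is hypothetical, and Python's re module is not the RE2 engine Prometheus uses, although the two agree for a pattern this simple.

```python
import re

# The namespace selector copied from the rule above.
NAMESPACE_PATTERN = r"(openshift-.*|kube-.*|default|logging)"


def namespace_selected(namespace: str) -> bool:
    """Return True if the whole namespace name matches the selector.

    re.fullmatch stands in for the implicit ^...$ anchoring that
    Prometheus applies to =~ label matchers.
    """
    return re.fullmatch(NAMESPACE_PATTERN, namespace) is not None


# Example namespace names (chosen for illustration only).
candidates = ["openshift-logging", "kube-system", "default", "logging",
              "openshift", "my-logging", "defaults"]
selected = [ns for ns in candidates if namespace_selected(ns)]
```

Note that a namespace named exactly "openshift" (no trailing "-suffix") is not selected, and neither are names that merely contain "logging" or "default".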
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196