Bug 1846707
| Summary: | KubeQuotaExceeded fires even if quota is not _exceeded_ | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Naveen Malik <nmalik> | |
| Component: | Monitoring | Assignee: | Rick Rackow <rrackow> | |
| Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> | |
| Severity: | low | Docs Contact: | ||
| Priority: | medium | |||
| Version: | 4.6 | CC: | alegrand, anpicker, erooth, kakkoyun, lcosic, mloibl, pkrupa, rrackow, surbania | |
| Target Milestone: | --- | |||
| Target Release: | 4.6.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | | Doc Type: | No Doc Update | |
| Doc Text: | | Story Points: | --- | |
| Clone Of: | ||||
| : | 1857248 | Environment: | ||
| Last Closed: | 2020-10-27 16:06:58 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1857248 | |||
Tested with 4.6.0-0.nightly-2020-07-16-162619: KubeQuotaExceeded has been renamed to KubeQuotaFullyUsed.
```yaml
- alert: KubeQuotaFullyUsed
  annotations:
    message: Namespace {{ $labels.namespace }} is using {{ $value | humanizePercentage }} of its {{ $labels.resource }} quota.
  expr: |
    kube_resourcequota{namespace=~"(openshift-.*|kube-.*|default|logging)",job="kube-state-metrics", type="used"}
      / ignoring(instance, job, type)
    (kube_resourcequota{namespace=~"(openshift-.*|kube-.*|default|logging)",job="kube-state-metrics", type="hard"} > 0)
      >= 1
  for: 15m
  labels:
    severity: info
```
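As a rough illustration of how the verified `KubeQuotaFullyUsed` expression evaluates, the Python sketch below mimics it on an in-memory list of samples. It is a simplification, not Prometheus's evaluation engine: `QuotaSample`, `fully_used`, and the sample quota names are hypothetical, only the `namespace`, `resourcequota`, `resource`, and `type` labels of `kube_resourcequota` are modeled, and the `for: 15m` pending period is ignored.

```python
import re
from dataclasses import dataclass

# Namespace matcher from the rule. Prometheus regex matchers are fully
# anchored, hence re.fullmatch rather than re.search.
NAMESPACE_RE = re.compile(r"(openshift-.*|kube-.*|default|logging)")

Key = tuple[str, str, str]


@dataclass(frozen=True)
class QuotaSample:
    """Hypothetical stand-in for one kube_resourcequota series."""
    namespace: str
    resourcequota: str
    resource: str
    type: str  # "used" or "hard"
    value: float


def fully_used(samples: list[QuotaSample]) -> dict[Key, float]:
    """Approximate `used / ignoring(instance, job, type) (hard > 0) >= 1`.

    Returns the used/hard ratio for every (namespace, resourcequota, resource)
    for which the alert condition would be true.
    """
    used: dict[Key, float] = {}
    hard: dict[Key, float] = {}
    for s in samples:
        if not NAMESPACE_RE.fullmatch(s.namespace):
            continue  # outside the rule's namespace selector
        # instance, job and type are ignored, so the join key is what remains.
        key = (s.namespace, s.resourcequota, s.resource)
        if s.type == "used":
            used[key] = s.value
        elif s.type == "hard" and s.value > 0:
            hard[key] = s.value  # "> 0" drops zero limits, avoiding division by zero
    firing: dict[Key, float] = {}
    for key, u in used.items():
        if key in hard:  # series without a matching non-zero hard limit drop out
            ratio = u / hard[key]
            if ratio >= 1:  # shipped threshold: a fully used quota already matches
                firing[key] = ratio
    return firing


samples = [
    QuotaSample("openshift-logging", "storage-quota", "requests.storage", "used", 100.0),
    QuotaSample("openshift-logging", "storage-quota", "requests.storage", "hard", 100.0),
    QuotaSample("kube-system", "pod-quota", "pods", "used", 5.0),
    QuotaSample("kube-system", "pod-quota", "pods", "hard", 10.0),
    QuotaSample("default", "zero-quota", "pods", "used", 3.0),
    QuotaSample("default", "zero-quota", "pods", "hard", 0.0),
    QuotaSample("my-app", "app-quota", "pods", "used", 20.0),
    QuotaSample("my-app", "app-quota", "pods", "hard", 10.0),
]
print(fully_used(samples))
# prints {('openshift-logging', 'storage-quota', 'requests.storage'): 1.0}
```

In this sketch the quota at exactly 100% matches, the 50% quota does not, the zero-limit quota is filtered out by `> 0`, and `my-app` is skipped by the namespace selector even though its usage is above its limit.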
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images) and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
Description of problem:

On OSD we create a ResourceQuota [1] for logging that is set to 100% of what we allow to be allocated for logging PVCs. The KubeQuotaExceeded alert fires when quota usage is at or above 90%, so this alert is always firing. The alert's name implies that the quota has been exceeded, and we are getting concerned queries from customers about this alert firing in their clusters. We don't think allocating less than 90% of the quota is the right answer, and we request that the alert's expr be adjusted. We are tracking a workaround for our alerting [2] that fires only if the quota is actually exceeded. The expr would become:

```promql
kube_resourcequota{job="kube-state-metrics",namespace=~"(openshift-.*|kube-.*|default|logging)",type="used"}
  / ignoring(instance, job, type)
(kube_resourcequota{job="kube-state-metrics",namespace=~"(openshift-.*|kube-.*|default|logging)",type="hard"} > 0)
  > 1
```

Version-Release number of selected component (if applicable): 4.3.18

How reproducible: Always

Steps to Reproduce:

1. Create a quota in an "openshift" namespace.
2. Allocate 100% of the quota in that namespace.

Actual results: KubeQuotaExceeded fires.

Expected results: KubeQuotaExceeded does not fire.

Additional info: KubeQuotaExceeded is expected to fire only if the allocated storage quota is above 100% (that is, actually exceeded).

[1] https://github.com/openshift/managed-cluster-config/blob/master/deploy/osd-logging/03-storage-quota.yaml

[2] https://issues.redhat.com/browse/OSD-4017
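To make the threshold discussion concrete, the following Python sketch compares three conditions on the same used/hard ratio: the pre-fix KubeQuotaExceeded behavior as this report describes it (at or above 90%), the shipped KubeQuotaFullyUsed rule (`>= 1`), and the reporter's proposed expression (`> 1`). The rule keys and the 95/100/110 sample values are hypothetical; the 110% case assumes usage can end up above the hard limit, for example when a limit is lowered after resources were already allocated.

```python
# Alert conditions discussed in this bug, expressed on the used/hard ratio.
# The 90% line follows the report's wording ("at or above 90%").
RULES = {
    "before_90pct": lambda ratio: ratio >= 0.9,      # KubeQuotaExceeded, as reported
    "fully_used": lambda ratio: ratio >= 1,          # shipped KubeQuotaFullyUsed
    "proposed_exceeded": lambda ratio: ratio > 1,    # reporter's proposed expr
}


def evaluate(used: float, hard: float) -> dict[str, bool]:
    """Return which conditions hold for one quota.

    A hard limit of 0 is removed by the `> 0` filter in every expression,
    so no condition can hold for it.
    """
    if hard <= 0:
        return {name: False for name in RULES}
    ratio = used / hard
    return {name: cond(ratio) for name, cond in RULES.items()}


for used, hard in [(95.0, 100.0), (100.0, 100.0), (110.0, 100.0)]:
    print(f"{used / hard:.0%}", evaluate(used, hard))
```

Under these assumptions, a quota allocated at exactly 100%, like the OSD logging quota, still satisfies the shipped `>= 1` condition, now under the more accurate `KubeQuotaFullyUsed` name with `severity: info`, while only the proposed `> 1` expression stays silent until usage actually exceeds the limit.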