Description of problem:
In the publicly accessible multi-tenant starter environments, users are intentionally restricted to 1 PVC, 2 CPUs, etc. The current KubeQuotaExceeded alert expression is:

    100 * kube_resourcequota{job="kube-state-metrics",type="used"} / ignoring(instance, job, type) kube_resourcequota{job="kube-state-metrics",type="hard"} > 90

This alert fires whenever a user consumes their single PVC, which can result in thousands of warnings like the following:

    alertname="KubeQuotaExceeded" endpoint="https-main" namespace="jmp-test15" pod="kube-state-metrics-b44488686-p54bf" resource="persistentvolumeclaims" resourcequota="object-counts" service="kube-state-metrics" severity="warning"

CPU seems to be another culprit: in a project limited to 2 CPUs, a user will frequently explicitly allocate the CPUs among different pods.

Version-Release number of selected component (if applicable):
v3.10

How reproducible:
100%

Steps to Reproduce:
1. Set up an object limit of 1 for PVCs and use that PVC in a project

Actual results:
The alert is impractical for low integer values.

Expected results:
This 90% alert should only be a threshold for directly measured / large float values. Perhaps restrict it to resourcequota="object-counts" quotas with hard limits > 10?

Additional info:
I realize there are ways to quiet this alert in the starter environment, but since this alert is general purpose, I wanted to point out that it is not generally applicable in its current form.
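To illustrate the arithmetic behind the report above, here is a minimal Python sketch of the 90% comparison applied to small integer quotas. It is not the alerting rule itself; the function names are hypothetical, and it assumes quota usage and hard limits are whole numbers, as they are for object counts.

```python
def usage_percent(used: int, hard: int) -> float:
    """Percentage of a quota that is consumed (100 * used / hard)."""
    return 100 * used / hard


def alert_fires(used: int, hard: int, threshold: int = 90) -> bool:
    """True when usage is strictly above the threshold, mirroring `> 90`.

    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    return 100 * used > threshold * hard


def smallest_hard_with_early_warning(threshold: int = 90, search_limit: int = 1000):
    """Smallest hard limit at which the alert can fire before the quota is full."""
    for hard in range(1, search_limit + 1):
        # Only consider usage strictly below the hard limit.
        if any(alert_fires(used, hard, threshold) for used in range(hard)):
            return hard
    return None


if __name__ == "__main__":
    for used, hard in [(0, 1), (1, 1), (1, 2), (2, 2), (9, 10), (10, 11)]:
        pct = usage_percent(used, hard)
        print(f"used={used} hard={hard} -> {pct:.1f}% fires={alert_fires(used, hard)}")
    print("smallest hard limit with an early warning:",
          smallest_hard_with_early_warning())
```

With a hard limit of 1 or 2, the comparison is only true once the quota is completely consumed, so the alert cannot act as an early warning. Under these assumptions the hard limit must be at least 11 before usage short of the limit can exceed 90%, which lines up with the "hard limits > 10" suggestion in the report.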
We should make this alert apply only to OpenShift components by default; in those cases we do want to know when we are approaching 100% of quota (if any quotas are set in the first place). We need to evaluate whether we can still accomplish this for 3.11. I'm targeting 3.11 for now, but with low priority compared to other issues, since Alertmanager routes can be configured to avoid the noise (not great, but a workaround for the time being).
This has been fixed in this PR: https://github.com/openshift/cluster-monitoring-operator/pull/88
The PR has been merged. Please verify.
Tested with ose-cluster-monitoring-operator:v3.11.7. No KubeQuotaExceeded alert fires for user projects in the oso starter env; the alert now applies only to the openshift.*|kube.*|default|logging projects.
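As a rough check of which project names the pattern quoted in the verification above would cover, here is a small Python sketch. It assumes the pattern is applied as a fully anchored regular expression on the namespace name, which is how Prometheus label regex matchers behave; the exact rule expression from the PR is not reproduced here, and the function name is hypothetical.

```python
import re

# Pattern copied from the verification comment. Using it as a namespace
# filter in this way is an assumption made for illustration.
SYSTEM_NAMESPACE_PATTERN = re.compile(r"openshift.*|kube.*|default|logging")


def alert_applies(namespace: str) -> bool:
    """True if the whole namespace name matches the pattern (anchored match)."""
    return SYSTEM_NAMESPACE_PATTERN.fullmatch(namespace) is not None


if __name__ == "__main__":
    for ns in ["openshift-monitoring", "kube-system", "default",
               "logging", "jmp-test15", "my-logging"]:
        print(f"{ns}: {alert_applies(ns)}")
```

Because the match is anchored, a user project such as jmp-test15 from the original report is excluded, and so is a name that merely contains one of the alternatives, such as my-logging.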
Closing bugs that were verified and targeted for GA but for some reason were not picked up by errata. This bug fix should be present in current 3.11 release content.