Description of problem:
The alert KubeAPIErrorBudgetBurn is missing a `namespace` label. All alerts should carry that label. Given that this is the highest-priority alert on the cluster, it is crucial to OSD that the label exists.

Version-Release number of selected component (if applicable):
4.7.4

How reproducible:
Always

Steps to Reproduce:
1. Create a cluster
2. Cause an API failure condition
3. Observe the missing label

Actual results:
The label `namespace` does not exist on the alert.

Expected results:
The label `namespace` exists on the alert, with the value of the originating namespace.

Additional info:
Additional info in follow-up comments.
Moving to the API team since they own the alert starting with 4.8.
** A NOTE ABOUT USING URGENT ** This BZ has been set to urgent severity and priority. When a BZ is marked urgent priority Engineers are asked to stop whatever they are doing, putting everything else on hold. Please be prepared to have reasonable justification ready to discuss, and ensure your own and engineering management are aware and agree this BZ is urgent. Keep in mind, urgent bugs are very expensive and have maximal management visibility. NOTE: This bug was automatically assigned to an engineering manager with the severity reset to *unspecified* until the emergency is vetted and confirmed. Please do not manually override the severity.
High severity/priority is probably correct for this one. The fix is straightforward, so I'm going to open a PR for it. Once it is fixed in HEAD, we would like to request that this be cloned/backported all the way to 4.7 to correct the functionality of this alert.
I tried several ways to make the KubeAPIErrorBudgetBurn alert fire, e.g. bringing kube-apiserver down, but did not see it fire. In any case, the PR is simple: running `oc get PrometheusRule -n openshift-kube-apiserver kube-apiserver-slos -o yaml` against the latest payload shows the change, so this bug could be moved directly to VERIFIED. Before doing that, I would like to confirm one thing: `oc get PrometheusRule -n openshift-kube-apiserver` shows that the other alerts do not have this label. Aren't they crucial too? Might you want to add this label to them at some point in the future? If they are crucial as well, should they get the same label?
Assigning back for above confirmation before moving to VERIFIED. Thanks!
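For anyone repeating the check above, here is a minimal sketch of how the comparison could be automated. It assumes the output of `oc get PrometheusRule -n openshift-kube-apiserver -o json` has been saved and is passed in as a string. The function name `alerts_missing_label` and the sample rule data are hypothetical and are not taken from the actual payload. It only inspects the static `labels` block of each alerting rule, so a label that comes from the alert expression itself would not be detected.

```python
import json
from typing import List


def alerts_missing_label(rule_json: str, label: str = "namespace") -> List[str]:
    """Return the names of alerting rules whose static labels lack `label`."""
    doc = json.loads(rule_json)
    # `oc get ... -o json` returns a single object when a name is given,
    # and a wrapper of kind "List" otherwise; handle both shapes.
    items = doc.get("items", []) if doc.get("kind") == "List" else [doc]
    missing = []
    for item in items:
        for group in item.get("spec", {}).get("groups", []):
            for rule in group.get("rules", []):
                # Recording rules use "record" instead of "alert"; skip them.
                if "alert" not in rule:
                    continue
                if label not in rule.get("labels", {}):
                    missing.append(rule["alert"])
    return missing


# Hypothetical sample data for illustration only, not real cluster output.
SAMPLE = json.dumps({
    "kind": "PrometheusRule",
    "metadata": {"name": "example-rules"},
    "spec": {"groups": [{
        "name": "example-group",
        "rules": [
            {"alert": "KubeAPIErrorBudgetBurn", "expr": "vector(1)",
             "labels": {"severity": "critical",
                        "namespace": "openshift-kube-apiserver"}},
            {"alert": "ExampleAlertWithoutNamespace", "expr": "vector(1)",
             "labels": {"severity": "warning"}},
            {"record": "example:recording:rule", "expr": "vector(1)"},
        ],
    }]},
})

print(alerts_missing_label(SAMPLE))
```

With the sample above, only the hypothetical alert that has no `namespace` entry in its labels is reported. Against a real cluster, an empty result would suggest that every alert in the namespace carries the label statically.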
We cannot add unbounded labels to metrics and hence to alerts. If you need this info, use audit logs.
Sorry, misunderstood the bug. Moving back.
Continuing comment 7
I've opened https://github.com/openshift/cluster-kube-apiserver-operator/pull/1220 for the other alerts
LG.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759