Bug 1985447 - KubeAPIErrorBudgetBurn Missing namespace label
Summary: KubeAPIErrorBudgetBurn Missing namespace label
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.9.0
Assignee: Christoph Blecker
QA Contact: Ke Wang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-07-23 15:02 UTC by Rick Rackow
Modified: 2021-10-18 17:41 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 17:40:56 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-kube-apiserver-operator pull 1185 | 0 | None | Merged | Bug 1985447: Add namespace labels to kube-apiserver-operator alerts | 2022-03-21 18:43:25 UTC
Github | openshift cluster-kube-apiserver-operator pull 1220 | 0 | None | Merged | Bug 1985447: Add namespace label to remaining apiserver alerts | 2022-03-21 18:43:27 UTC
Red Hat Product Errata | RHSA-2021:3759 | 0 | None | None | None | 2021-10-18 17:41:15 UTC

Description Rick Rackow 2021-07-23 15:02:17 UTC
Description of problem:
The KubeAPIErrorBudgetBurn alert is missing a namespace label. All alerts should carry that label. Because this is the highest-priority alert on the cluster, having the label is crucial for OSD.

Version-Release number of selected component (if applicable):
4.7.4

How reproducible:
always

Steps to Reproduce:
1. Create Cluster
2. Cause API failure condition
3. Observe the missing label

Actual results:
label `namespace` does not exist

Expected results:
label `namespace` exists on the alert with the value of the originating namespace

Additional info:
Additional info in follow up comments

Comment 2 Simon Pasquier 2021-07-23 15:09:05 UTC
Moving to the API team since they own the alert starting with 4.8.

Comment 3 Michal Fojtik 2021-07-23 15:23:02 UTC
** A NOTE ABOUT USING URGENT **

This BZ has been set to urgent severity and priority. When a BZ is marked urgent priority Engineers are asked to stop whatever they are doing, putting everything else on hold.
Please be prepared to have reasonable justification ready to discuss, and ensure your own and engineering management are aware and agree this BZ is urgent. Keep in mind, urgent bugs are very expensive and have maximal management visibility.

NOTE: This bug was automatically assigned to an engineering manager with the severity reset to *unspecified* until the emergency is vetted and confirmed. Please do not manually override the severity.

Comment 4 Christoph Blecker 2021-07-23 21:22:16 UTC
High severity/priority is probably correct for this one. The fix is straightforward, so I'm going to open a PR for it.

Once fixed in HEAD, we would like to request that this be cloned/backported all the way to 4.7, so as to correct the functionality of this alert.

Comment 6 Xingxing Xia 2021-08-27 10:39:37 UTC
Tried several ways to make the KubeAPIErrorBudgetBurn alert fire, e.g. taking kube-apiserver down, but did not see it fire.
Anyway, the PR is simple, and `oc get PrometheusRule -n openshift-kube-apiserver kube-apiserver-slos -o yaml` on the latest payload shows the change; based on this, the bug could be moved directly to VERIFIED.
But before doing so, I'd like to confirm: `oc get PrometheusRule -n openshift-kube-apiserver` shows that the other alerts do not have this label. Aren't they crucial too? Might you want to add this label to them at some point in the future? If they are crucial as well, should the same label be added to them?
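
For anyone repeating this check, the comparison across alerts can be scripted rather than read off the YAML by eye. The following is only a minimal sketch, assuming the rule object has been exported as JSON (for example with `-o json` instead of `-o yaml`) and follows the usual PrometheusRule layout of `spec.groups[].rules[]`. The function name `alerts_missing_label` and the sample data, including the `ExampleAlertWithoutNamespace` alert and the label values, are hypothetical and not taken from the actual payload.

```python
def alerts_missing_label(prometheus_rule, label="namespace"):
    """Return the names of alerting rules whose static labels lack `label`.

    `prometheus_rule` is a single PrometheusRule object as a dict,
    e.g. the result of json.load() on `oc get ... -o json` output.
    """
    missing = []
    spec = prometheus_rule.get("spec") or {}
    for group in spec.get("groups") or []:
        for rule in group.get("rules") or []:
            # Recording rules carry a "record" key instead of "alert"; skip them.
            if "alert" not in rule:
                continue
            labels = rule.get("labels") or {}
            if label not in labels:
                missing.append(rule["alert"])
    return missing


# Hypothetical, trimmed-down object for illustration only.
sample = {
    "kind": "PrometheusRule",
    "spec": {
        "groups": [
            {
                "name": "example-group",
                "rules": [
                    {
                        "alert": "KubeAPIErrorBudgetBurn",
                        "expr": "vector(1)",
                        "labels": {
                            "severity": "critical",
                            "namespace": "openshift-kube-apiserver",
                        },
                    },
                    {
                        "alert": "ExampleAlertWithoutNamespace",
                        "expr": "vector(1)",
                        "labels": {"severity": "warning"},
                    },
                    {"record": "example:recording_rule", "expr": "vector(1)"},
                ],
            }
        ]
    },
}

print(alerts_missing_label(sample))
```

When the command returns several PrometheusRule objects at once, the JSON output is a list wrapper, so the function would need to be called once per entry under `items`.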

Comment 7 Xingxing Xia 2021-08-30 12:38:58 UTC
Assigning back for the above confirmation before moving to VERIFIED. Thanks!

Comment 8 Stefan Schimanski 2021-08-30 13:32:50 UTC
We cannot add unbounded labels to metrics and hence to alerts. If you need this info, use audit logs.

Comment 9 Stefan Schimanski 2021-08-30 13:33:54 UTC
Sorry, misunderstood the bug. Moving back.

Comment 10 Xingxing Xia 2021-08-31 02:08:25 UTC
Continuing comment 7

Comment 11 Christoph Blecker 2021-08-31 17:30:01 UTC
I've opened https://github.com/openshift/cluster-kube-apiserver-operator/pull/1220 for the other alerts

Comment 16 Xingxing Xia 2021-09-01 12:33:49 UTC
LG.

Comment 21 errata-xmlrpc 2021-10-18 17:40:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

