Bug 1985447

Summary:	KubeAPIErrorBudgetBurn Missing namespace label
Product:	OpenShift Container Platform	Reporter:	Rick Rackow <rrackow>
Component:	kube-apiserver	Assignee:	Christoph Blecker <cblecker>
Status:	CLOSED ERRATA	QA Contact:	Ke Wang <kewang>
Severity:	high	Docs Contact:
Priority:	high
Version:	4.7	CC:	amuller, anpicker, aos-bugs, cblecker, erooth, mfojtik, sttts, xxia
Target Milestone:	---	Keywords:	Reopened, ServiceDeliveryBlocker
Target Release:	4.9.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	No Doc Update
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-10-18 17:40:56 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Rick Rackow 2021-07-23 15:02:17 UTC

Description of problem:
The KubeAPIErrorBudgetBurn alert is missing a `namespace` label. All alerts should carry that label. Because this is the highest-priority alert on the cluster, having the label on it is crucial for OSD.

Version-Release number of selected component (if applicable):
4.7.4

How reproducible:
always

Steps to Reproduce:
1. Create a cluster
2. Cause an API failure condition
3. Observe that the label is missing on the alert

Actual results:
label `namespace` does not exist

Expected results:
label `namespace` exists on the alert with the value of the originating namespace
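
One way to check the actual and expected results above is to scan the alerts returned by the Prometheus HTTP API (`/api/v1/alerts`) for entries that lack a `namespace` label. The following is a minimal sketch, not the method used in this bug: the function name `alerts_missing_label` and the sample payload (including its label values) are hypothetical, and fetching the payload from a real cluster is left out.

```python
import json


def alerts_missing_label(payload: str, label: str = "namespace") -> list:
    """Return the alertname of every alert in a Prometheus
    /api/v1/alerts JSON response that does not carry `label`."""
    data = json.loads(payload)
    missing = []
    for alert in data.get("data", {}).get("alerts", []):
        labels = alert.get("labels") or {}
        if label not in labels:
            missing.append(labels.get("alertname", "<unnamed>"))
    return missing


# Hypothetical sample response; the label values are illustrative only.
SAMPLE_ALERTS = json.dumps({
    "status": "success",
    "data": {
        "alerts": [
            {
                "labels": {"alertname": "KubeAPIErrorBudgetBurn",
                           "severity": "critical"},
                "state": "firing",
            },
            {
                "labels": {"alertname": "ExampleAlert",
                           "namespace": "openshift-example",
                           "severity": "warning"},
                "state": "firing",
            },
        ]
    },
})

if __name__ == "__main__":
    # Prints the names of alerts that have no `namespace` label.
    print(alerts_missing_label(SAMPLE_ALERTS))
```

With the fix in place, the same check against a firing KubeAPIErrorBudgetBurn alert would be expected to return an empty list.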

Additional info:
Additional info is provided in the follow-up comments.

Comment 2 Simon Pasquier 2021-07-23 15:09:05 UTC

Moving to the API team since they own the alert starting with 4.8.

Comment 3 Michal Fojtik 2021-07-23 15:23:02 UTC

** A NOTE ABOUT USING URGENT **

This BZ has been set to urgent severity and priority. When a BZ is marked urgent priority Engineers are asked to stop whatever they are doing, putting everything else on hold.
Please be prepared to have reasonable justification ready to discuss, and ensure your own and engineering management are aware and agree this BZ is urgent. Keep in mind, urgent bugs are very expensive and have maximal management visibility.

NOTE: This bug was automatically assigned to an engineering manager with the severity reset to *unspecified* until the emergency is vetted and confirmed. Please do not manually override the severity.

Comment 4 Christoph Blecker 2021-07-23 21:22:16 UTC

High severity/priority is probably correct for this one. The fix is straightforward, so I'm going to open a PR for it.

Once it is fixed in HEAD, we would like to request that this be cloned/backported all the way to 4.7, to correct the functionality of this alert.

Comment 6 Xingxing Xia 2021-08-27 10:39:37 UTC

I tried several ways to make the KubeAPIErrorBudgetBurn alert fire, e.g. taking kube-apiserver down, but the alert did not fire.
In any case, the PR is simple, and `oc get PrometheusRule -n openshift-kube-apiserver kube-apiserver-slos -o yaml` on the latest payload shows the change; based on this, the bug could be moved directly to VERIFIED.
Before doing so, I'd like to confirm one thing: `oc get PrometheusRule -n openshift-kube-apiserver` shows that the other alerts do not have this label. Aren't they crucial as well? Might you want to add this label to them at some point in the future? If they are crucial too, should they get the same label?
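
The manual inspection described above could also be scripted. The sketch below is an illustration only: it assumes the `oc get PrometheusRule ... -o json` output has already been parsed into a Python dict, and it relies on the PrometheusRule layout `spec.groups[].rules[]`, where alerting rules have an `alert` key. The function name `rules_missing_label`, the `example-rules` object, and the rule contents in the sample are hypothetical and do not reproduce the real rules.

```python
def rules_missing_label(manifest: dict, label: str = "namespace") -> list:
    """Return (PrometheusRule name, alert name) pairs for alerting rules
    that do not define `label`. Accepts a single object or a List."""
    objects = manifest.get("items", [manifest])
    missing = []
    for obj in objects:
        name = obj.get("metadata", {}).get("name", "<unknown>")
        for group in obj.get("spec", {}).get("groups", []):
            for rule in group.get("rules", []):
                if "alert" not in rule:
                    continue  # recording rules have no alert name
                if label not in (rule.get("labels") or {}):
                    missing.append((name, rule["alert"]))
    return missing


# Hypothetical sample shaped like a List of PrometheusRule objects.
SAMPLE_RULES = {
    "kind": "List",
    "items": [
        {
            "kind": "PrometheusRule",
            "metadata": {"name": "kube-apiserver-slos"},
            "spec": {"groups": [{
                "name": "example-group",
                "rules": [
                    {"alert": "KubeAPIErrorBudgetBurn",
                     "expr": "vector(1)",
                     "labels": {"namespace": "openshift-kube-apiserver",
                                "severity": "critical"}},
                    {"record": "example:recording:rule",
                     "expr": "vector(1)"},
                ],
            }]},
        },
        {
            "kind": "PrometheusRule",
            "metadata": {"name": "example-rules"},
            "spec": {"groups": [{
                "name": "example-group",
                "rules": [
                    {"alert": "ExampleAlert",
                     "expr": "vector(1)",
                     "labels": {"severity": "warning"}},
                ],
            }]},
        },
    ],
}

if __name__ == "__main__":
    for rule_object, alert in rules_missing_label(SAMPLE_RULES):
        print(f"{rule_object}: {alert} has no namespace label")
```

Against a real cluster, the dict could be loaded with `json.load` from the JSON output of the `oc` command; any pair the function returns is an alert that still lacks the label.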

Comment 7 Xingxing Xia 2021-08-30 12:38:58 UTC

Assigning back for the confirmation requested above before moving to VERIFIED. Thanks!

Comment 8 Stefan Schimanski 2021-08-30 13:32:50 UTC

We cannot add unbounded labels to metrics and hence to alerts. If you need this info, use audit logs.

Comment 9 Stefan Schimanski 2021-08-30 13:33:54 UTC

Sorry, I misunderstood the bug. Moving it back.

Comment 10 Xingxing Xia 2021-08-31 02:08:25 UTC

Continuing comment 7

Comment 11 Christoph Blecker 2021-08-31 17:30:01 UTC

I've opened https://github.com/openshift/cluster-kube-apiserver-operator/pull/1220 for the other alerts

Comment 16 Xingxing Xia 2021-09-01 12:33:49 UTC

Looks good.

Comment 21 errata-xmlrpc 2021-10-18 17:40:56 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759