Bug 1985447

Summary: KubeAPIErrorBudgetBurn Missing namespace label
Product: OpenShift Container Platform Reporter: Rick Rackow <rrackow>
Component: kube-apiserver Assignee: Christoph Blecker <cblecker>
Status: CLOSED ERRATA QA Contact: Ke Wang <kewang>
Severity: high Docs Contact:
Priority: high    
Version: 4.7 CC: amuller, anpicker, aos-bugs, cblecker, erooth, mfojtik, sttts, xxia
Target Milestone: --- Keywords: Reopened, ServiceDeliveryBlocker
Target Release: 4.9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-10-18 17:40:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Rick Rackow 2021-07-23 15:02:17 UTC
Description of problem:
The alert KubeAPIErrorBudgetBurn is missing a namespace label. All alerts should carry that label. Because this is the highest-priority alert on the cluster, having the label on it is crucial for OSD.

Version-Release number of selected component (if applicable):
4.7.4

How reproducible:
always

Steps to Reproduce:
1. Create a cluster
2. Cause an API failure condition
3. Observe that the alert is missing the label

Actual results:
label `namespace` does not exist

Expected results:
label `namespace` exists on the alert with the value of the originating namespace

Additional info:
Additional info in follow up comments

Comment 2 Simon Pasquier 2021-07-23 15:09:05 UTC
Moving to the API team since they own the alert starting with 4.8.

Comment 3 Michal Fojtik 2021-07-23 15:23:02 UTC
** A NOTE ABOUT USING URGENT **

This BZ has been set to urgent severity and priority. When a BZ is marked urgent priority Engineers are asked to stop whatever they are doing, putting everything else on hold.
Please be prepared to have reasonable justification ready to discuss, and ensure your own and engineering management are aware and agree this BZ is urgent. Keep in mind, urgent bugs are very expensive and have maximal management visibility.

NOTE: This bug was automatically assigned to an engineering manager with the severity reset to *unspecified* until the emergency is vetted and confirmed. Please do not manually override the severity.

Comment 4 Christoph Blecker 2021-07-23 21:22:16 UTC
High severity/priority is probably correct for this one. The fix is straightforward, so I'm going to open a PR for it.

Once it is fixed in HEAD, we would like to request that this be cloned/backported all the way to 4.7, to correct the functionality of this alert.

Comment 6 Xingxing Xia 2021-08-27 10:39:37 UTC
I tried several ways to make the KubeAPIErrorBudgetBurn alert fire, e.g. bringing kube-apiserver down, but did not see it fire.
Anyway, the PR is simple, and `oc get PrometheusRule -n openshift-kube-apiserver kube-apiserver-slos -o yaml` on the latest payload shows the change; on that basis, this bug could be moved directly to VERIFIED.
But before doing so, I would like to confirm: `oc get PrometheusRule -n openshift-kube-apiserver` shows that the other alerts do not have this label. Aren't they crucial as well? Might you want to add this label to them at some point in the future? If they are crucial too, should the same label be added to them?
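
For illustration only, the check described above (which alerts in a PrometheusRule lack the `namespace` label) could be scripted roughly as follows. This is a minimal sketch, not part of the fix: the function name `alerts_missing_label` and the sample manifest are hypothetical, the label value shown for KubeAPIErrorBudgetBurn is an assumption, and the check only looks at the static `labels` block of each rule, not at labels that might come from the rule expression.

```python
import json


def alerts_missing_label(manifest, label="namespace"):
    """Return the names of alerting rules whose static labels lack `label`."""
    # A listing (no resource name given) is a List object with "items";
    # a single named object has no "items" key, so treat it as a one-element list.
    objects = manifest.get("items", [manifest])
    missing = []
    for obj in objects:
        for group in obj.get("spec", {}).get("groups", []):
            for rule in group.get("rules", []):
                # Recording rules use "record" instead of "alert"; skip them.
                if "alert" in rule and label not in rule.get("labels", {}):
                    missing.append(rule["alert"])
    return missing


# Hypothetical, trimmed-down manifest for demonstration; the second alert name,
# the expressions, and the label values are assumptions, not real cluster data.
SAMPLE = json.loads("""
{
  "kind": "PrometheusRule",
  "metadata": {"name": "kube-apiserver-slos",
               "namespace": "openshift-kube-apiserver"},
  "spec": {"groups": [{"name": "example", "rules": [
    {"alert": "KubeAPIErrorBudgetBurn", "expr": "vector(1)",
     "labels": {"severity": "critical",
                "namespace": "openshift-kube-apiserver"}},
    {"alert": "ExampleAlertWithoutNamespace", "expr": "vector(1)",
     "labels": {"severity": "warning"}},
    {"record": "example:recording_rule", "expr": "vector(1)"}
  ]}]}
}
""")

print(alerts_missing_label(SAMPLE))
```

Against a real cluster, the same function could presumably be fed the output of `oc get PrometheusRule -n openshift-kube-apiserver -o json` parsed with `json.load`.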

Comment 7 Xingxing Xia 2021-08-30 12:38:58 UTC
Assigning back for the confirmation requested above before moving to VERIFIED. Thanks!

Comment 8 Stefan Schimanski 2021-08-30 13:32:50 UTC
We cannot add unbounded labels to metrics and hence to alerts. If you need this info, use audit logs.

Comment 9 Stefan Schimanski 2021-08-30 13:33:54 UTC
Sorry, misunderstood the bug. Moving back.

Comment 10 Xingxing Xia 2021-08-31 02:08:25 UTC
Continuing comment 7

Comment 11 Christoph Blecker 2021-08-31 17:30:01 UTC
I've opened https://github.com/openshift/cluster-kube-apiserver-operator/pull/1220 for the other alerts

Comment 16 Xingxing Xia 2021-09-01 12:33:49 UTC
LG.

Comment 21 errata-xmlrpc 2021-10-18 17:40:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759