Bug 1943442

Summary: Alert KubeAPIErrorBudgetBurn occurs on single node cluster some time after installation
Product: OpenShift Container Platform Reporter: hongyan li <hongyli>
Component: kube-apiserver Assignee: Stefan Schimanski <sttts>
Status: CLOSED WONTFIX QA Contact: Rahul Gangwar <rgangwar>
Severity: high Docs Contact:
Priority: medium    
Version: 4.8 CC: kewang, mfojtik, wking, xxia
Target Milestone: --- Flags: mfojtik: needinfo?
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: LifecycleStale
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-25 12:30:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                        Flags
alert displays about 30 minutes    none
prometheus snapshot                none

Description hongyan li 2021-03-26 04:35:26 UTC
Created attachment 1766488 [details]
alert displays about 30 minutes

Description of problem:
The KubeAPIErrorBudgetBurn alert fires on a single-node cluster some time after installation.

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-03-25-191436

How reproducible:
Most of the time: installed 4 times and saw the alert 3 times.

Steps to Reproduce:
1. Install an SNO cluster with cluster bot by running 'launch 4.8 aws,single-node'
2. Open Console --> Monitoring --> Alerts

Actual results:
The KubeAPIErrorBudgetBurn alert fires.

Expected results:
The KubeAPIErrorBudgetBurn alert should not fire.

Additional info:
The alert is displayed for some time and then disappears.

Comment 1 Stefan Schimanski 2021-03-26 08:18:17 UTC
Please provide must-gather and attach a prometheus snapshot.

This is most probably due to kube-/openshift-/oauth-apiserver redeployments. A serious improvement won't happen before 4.9. This was known to happen.
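
As a rough illustration of why API server redeployments could trip a burn-rate alert on a single node, the sketch below computes a burn rate from failed and total request counts and applies a two-window check. The 99% availability target, the 14.4 threshold, the request counts, and the function names are all assumptions chosen for illustration; none of them are taken from this bug or confirmed to match the rule shipped in the cluster.

```python
# Illustrative only: the SLO target, threshold, and request counts are
# assumptions, not values taken from this bug.
SLO_TARGET = 0.99              # assumed availability objective
ERROR_BUDGET = 1 - SLO_TARGET  # fraction of requests allowed to fail


def burn_rate(failed_requests: int, total_requests: int) -> float:
    """Return how many times faster than 'exactly on budget' errors accrue."""
    if total_requests == 0:
        return 0.0
    error_ratio = failed_requests / total_requests
    return error_ratio / ERROR_BUDGET


def alert_fires(long_window_rate: float, short_window_rate: float,
                threshold: float) -> bool:
    """Multi-window check: both windows must exceed the threshold."""
    return long_window_rate > threshold and short_window_rate > threshold


# Hypothetical single-node rollout: the only API server instance restarts,
# so 20% of requests in the long window and 50% in the short window fail.
long_rate = burn_rate(failed_requests=200, total_requests=1000)
short_rate = burn_rate(failed_requests=500, total_requests=1000)
print(alert_fires(long_rate, short_rate, threshold=14.4))
```

With a single API server instance there is no second replica to absorb requests during a restart, so under these assumed numbers even a short rollout pushes both windows well above the threshold.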

Comment 2 hongyan li 2021-03-28 03:23:32 UTC
Created attachment 1767038 [details]
prometheus snapshot

Comment 4 hongyan li 2021-03-28 07:35:35 UTC
For must-gather.log, see comment 3: https://bugzilla.redhat.com/show_bug.cgi?id=1943442#c3

Comment 5 Stefan Schimanski 2021-03-30 07:53:21 UTC
Permission denied to gdrive.

Comment 6 hongyan li 2021-03-31 01:11:43 UTC
Granted you access to the log.

Comment 7 Stefan Schimanski 2021-04-12 07:49:21 UTC
*** Bug 1948089 has been marked as a duplicate of this bug. ***

Comment 8 W. Trevor King 2021-04-12 19:48:46 UTC
Linking the test-case for Sippy, based on [1].  As described in bug 1948089, this is happening in basically all 4.7 -> 4.8 updates, so it would be really great to fix it before 4.8 GAs.  I see this bug is currently blocker?.  Thoughts about making it blocker+?

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1948089#c1

Comment 9 Stefan Schimanski 2021-04-14 12:36:06 UTC
> Thoughts about making it blocker+?

It was known, and has been communicated all along, that the APIs will have hiccups on single-node. There is no capacity to fix this. Don't turn an explicit precondition for the feasibility of the single-node approach into a blocker+. It doesn't make sense.

Comment 10 W. Trevor King 2021-04-14 15:07:39 UTC
Dropping the test-case now that bug 1948089 has been re-opened, so this bug is back to just being about single-node.

Comment 14 Michal Fojtik 2021-12-25 03:22:28 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.