1943442 – Alert KubeAPIErrorBudgetBurn occurs on single node cluster some time after installation

Bug 1943442 - Alert KubeAPIErrorBudgetBurn occurs on single node cluster some time after installation [NEEDINFO]

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	kube-apiserver
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Stefan Schimanski
QA Contact:	Rahul Gangwar
Docs Contact:
URL:
Whiteboard:	LifecycleStale
Depends On:
Blocks:

Reported:	2021-03-26 04:35 UTC by hongyan li
Modified:	2022-08-25 12:30 UTC (History)
CC List:	4 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-08-25 12:30:43 UTC
Target Upstream Version:
Embargoed:
Flags:	mfojtik: needinfo?

Attachments
alert displays about 30 minutes (96.56 KB, image/png) 2021-03-26 04:35 UTC, hongyan li
prometheus snapshot (108.01 KB, image/png) 2021-03-28 03:23 UTC, hongyan li

Description hongyan li 2021-03-26 04:35:26 UTC

Created attachment 1766488 [details]
alert displays about 30 minutes

Description of problem:
Alert KubeAPIErrorBudgetBurn occurs on single node cluster some time after installation

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-03-25-191436

How reproducible:
Most of the time: installed 4 times and saw the alert 3 times

Steps to Reproduce:
1. Install SNO cluster with cluster bot by running 'launch 4.8 aws,single-node'
2. Open console-->monitoring-->alerts
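
As an alternative to checking the console in step 2, the same check can be sketched against the Prometheus HTTP API, whose `GET /api/v1/alerts` endpoint returns the active alerts as JSON. The snippet below is only a minimal sketch: the helper name `firing_alerts` and the `SAMPLE` payload are hypothetical illustrations, not output captured from this cluster, and how the endpoint is reached and authenticated on an SNO cluster is left out as an assumption.

```python
import json
from typing import Any


def firing_alerts(payload: dict[str, Any], alert_name: str) -> list[dict[str, Any]]:
    """Return the alerts named `alert_name` that are in the `firing` state.

    `payload` is the parsed JSON body of a Prometheus `/api/v1/alerts` response.
    """
    alerts = payload.get("data", {}).get("alerts", [])
    return [
        alert
        for alert in alerts
        if alert.get("labels", {}).get("alertname") == alert_name
        and alert.get("state") == "firing"
    ]


# Hypothetical sample response, trimmed to the fields used above.
SAMPLE = json.loads(
    """
    {
      "status": "success",
      "data": {
        "alerts": [
          {"labels": {"alertname": "KubeAPIErrorBudgetBurn", "severity": "critical"},
           "state": "firing"},
          {"labels": {"alertname": "KubeAPIErrorBudgetBurn", "severity": "warning"},
           "state": "pending"},
          {"labels": {"alertname": "Watchdog"}, "state": "firing"}
        ]
      }
    }
    """
)

if __name__ == "__main__":
    for alert in firing_alerts(SAMPLE, "KubeAPIErrorBudgetBurn"):
        print(alert["labels"])
```

An empty result means the alert is not currently firing. Because the alert disappears again after some time, a single check can miss it.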

Actual results:
Alert KubeAPIErrorBudgetBurn occurs

Expected results:
There should be no alert KubeAPIErrorBudgetBurn

Additional info:
The alert is displayed for some time and then disappears

Comment 1 Stefan Schimanski 2021-03-26 08:18:17 UTC

Please provide must-gather and attach a prometheus snapshot.

This is most probably due to kube-/openshift-/oauth-apiserver redeployments. Serious improvement won't happen before 4.9. This was known to happen.

Comment 2 hongyan li 2021-03-28 03:23:32 UTC

Created attachment 1767038 [details]
prometheus snapshot

Comment 3 hongyan li 2021-03-28 07:34:35 UTC

https://drive.google.com/file/d/19spZEuFoBvzF6_raYy3C-nrYLPmdoI-W/view?usp=sharing

Comment 4 hongyan li 2021-03-28 07:35:35 UTC

For must-gather.log, see comment 3: https://bugzilla.redhat.com/show_bug.cgi?id=1943442#c3

Comment 5 Stefan Schimanski 2021-03-30 07:53:21 UTC

Permission denied to gdrive.

Comment 6 hongyan li 2021-03-31 01:11:43 UTC

Granted you access to the log.

Comment 7 Stefan Schimanski 2021-04-12 07:49:21 UTC

*** Bug 1948089 has been marked as a duplicate of this bug. ***

Comment 8 W. Trevor King 2021-04-12 19:48:46 UTC

Linking the test-case for Sippy, based on [1].  As described in bug 1948089, this is happening in basically all 4.7 -> 4.8 updates, so it would be really great to fix it before 4.8 GAs.  I see this bug is currently blocker?.  Thoughts about making it blocker+?

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1948089#c1

Comment 9 Stefan Schimanski 2021-04-14 12:36:06 UTC

> Thoughts about making it blocker+?

It was known, and communicated continuously, that the APIs will have hiccups on single-node. There is no capacity to fix this. Don't turn an explicit precondition for the feasibility of the single-node approach into a blocker+. It doesn't make sense.

Comment 10 W. Trevor King 2021-04-14 15:07:39 UTC

Dropping the test-case now that bug 1948089 has been re-opened, so this bug is back to just being about single-node.

Comment 14 Michal Fojtik 2021-12-25 03:22:28 UTC

This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.
