Bug 1943442

Summary:

Alert KubeAPIErrorBudgetBurn occurs on single node cluster some time after installation

Product:

OpenShift Container Platform

Reporter:

hongyan li <hongyli>

Component:

kube-apiserver

Assignee:

Stefan Schimanski <sttts>

Status:

CLOSED WONTFIX

QA Contact:

Rahul Gangwar <rgangwar>

Severity:

high

Docs Contact:

Priority:

medium

Version:

4.8

CC:

kewang, mfojtik, wking, xxia

Target Milestone:

---

Flags:

mfojtik: needinfo?

Target Release:

---

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

LifecycleStale

Fixed In Version:

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2022-08-25 12:30:43 UTC

Type:

Bug

Attachments:

Description	Flags
alert displays about 30 minutes	none
prometheus snapshot	none

Description hongyan li 2021-03-26 04:35:26 UTC

Created attachment 1766488 [details]
alert displays about 30 minutes

Description of problem:
The KubeAPIErrorBudgetBurn alert occurs on a single-node cluster some time after installation.

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-03-25-191436

How reproducible:
Most of the time: installed 4 times and saw the alert 3 times.

Steps to Reproduce:
1. Install SNO cluster with cluster bot by running 'launch 4.8 aws,single-node'
2. Open console-->monitoring-->alerts

Actual results:
Alert KubeAPIErrorBudgetBurn occurs

Expected results:
There should be no alert KubeAPIErrorBudgetBurn

Additional info:
The alert is displayed for some time (about 30 minutes in the attached screenshot) and then disappears.
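
For anyone who wants to check step 2 without the console, here is a minimal sketch that filters an alerts payload for a firing KubeAPIErrorBudgetBurn. It assumes the payload has the shape returned by the Prometheus HTTP API endpoint /api/v1/alerts; the helper name firing_alerts and the sample payload are hypothetical and were not taken from this cluster.

```python
from typing import Any


def firing_alerts(payload: dict[str, Any], alert_name: str) -> list[dict[str, Any]]:
    """Return the alerts in the payload that match alert_name and are firing."""
    alerts = payload.get("data", {}).get("alerts", [])
    return [
        alert
        for alert in alerts
        if alert.get("labels", {}).get("alertname") == alert_name
        and alert.get("state") == "firing"
    ]


# Hypothetical sample payload, shaped like a /api/v1/alerts response.
sample = {
    "status": "success",
    "data": {
        "alerts": [
            {"labels": {"alertname": "KubeAPIErrorBudgetBurn"}, "state": "firing"},
            {"labels": {"alertname": "Watchdog"}, "state": "firing"},
            {"labels": {"alertname": "KubeAPIErrorBudgetBurn"}, "state": "pending"},
        ]
    },
}

matches = firing_alerts(sample, "KubeAPIErrorBudgetBurn")
print(len(matches))
```

An empty list would correspond to the expected result in this report (no KubeAPIErrorBudgetBurn alert).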

Comment 1 Stefan Schimanski 2021-03-26 08:18:17 UTC

Please provide must-gather and attach a prometheus snapshot.

This is most probably due to kube-/openshift-/oauth-apiserver redeployments. Serious improvement won't happen before 4.9. This was known to happen.
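
To illustrate why short API server outages during redeployments can trip this kind of alert, here is a rough sketch of a multi-window burn-rate check. The 99% availability target, the 14.4 threshold, and the error ratios are all assumptions chosen for illustration; they are not read from the alert rule shipped in 4.8, and the function names are hypothetical.

```python
# Hypothetical sketch of a multi-window burn-rate check; not the shipped alert rule.

SLO_TARGET = 0.99  # assumed availability target


def burn_rate(error_ratio: float, slo_target: float = SLO_TARGET) -> float:
    """How many times faster than allowed the error budget is being spent."""
    budget = 1.0 - slo_target
    return error_ratio / budget


def should_alert(long_ratio: float, short_ratio: float, threshold: float) -> bool:
    """Fire only when both the long and the short window exceed the threshold."""
    return burn_rate(long_ratio) > threshold and burn_rate(short_ratio) > threshold


# Illustrative numbers only: 20% of requests failing in both windows,
# as might happen while the only API server instance is restarting.
print(should_alert(long_ratio=0.20, short_ratio=0.20, threshold=14.4))
```

On a single node there is no second API server instance to absorb requests during a rollout, so under these assumptions the error ratio in both windows can rise together.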

Comment 2 hongyan li 2021-03-28 03:23:32 UTC

Created attachment 1767038 [details]
prometheus snapshot

Comment 3 hongyan li 2021-03-28 07:34:35 UTC

https://drive.google.com/file/d/19spZEuFoBvzF6_raYy3C-nrYLPmdoI-W/view?usp=sharing

Comment 4 hongyan li 2021-03-28 07:35:35 UTC

For must-gather.log, see comment 3: https://bugzilla.redhat.com/show_bug.cgi?id=1943442#c3

Comment 5 Stefan Schimanski 2021-03-30 07:53:21 UTC

Permission denied to gdrive.

Comment 6 hongyan li 2021-03-31 01:11:43 UTC

Granted you access to the log.

Comment 7 Stefan Schimanski 2021-04-12 07:49:21 UTC

*** Bug 1948089 has been marked as a duplicate of this bug. ***

Comment 8 W. Trevor King 2021-04-12 19:48:46 UTC

Linking the test-case for Sippy, based on [1].  As described in bug 1948089, this is happening in basically all 4.7 -> 4.8 updates, so it would be really great to fix it before 4.8 GAs.  I see this bug is currently blocker?.  Thoughts about making it blocker+?

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1948089#c1

Comment 9 Stefan Schimanski 2021-04-14 12:36:06 UTC

> Thoughts about making it blocker+?

It was known, and communicated repeatedly, that the APIs will have hiccups on single-node. There is no capacity to fix this. Don't turn an explicit precondition for the feasibility of the single-node approach into a blocker+. It doesn't make sense.

Comment 10 W. Trevor King 2021-04-14 15:07:39 UTC

Dropping the test-case now that bug 1948089 has been re-opened, so this bug is back to just being about single-node.

Comment 14 Michal Fojtik 2021-12-25 03:22:28 UTC

This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.