Created attachment 1766488: alert displayed for about 30 minutes

Description of problem:
Alert KubeAPIErrorBudgetBurn fires on a single-node cluster some time after installation.

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-03-25-191436

How reproducible:
Most of the time; out of 4 installations, the alert appeared 3 times.

Steps to Reproduce:
1. Install an SNO cluster with cluster bot by running 'launch 4.8 aws,single-node'.
2. Open console --> Monitoring --> Alerts.

Actual results:
Alert KubeAPIErrorBudgetBurn fires.

Expected results:
Alert KubeAPIErrorBudgetBurn should not fire.

Additional info:
The alert is displayed for a while and then disappears.
Please provide a must-gather and attach a Prometheus snapshot. This is most likely caused by kube-/openshift-/oauth-apiserver redeployments. Serious improvement won't happen before 4.9. This was known to happen.
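For context on why short apiserver disruptions can trip this alert, here is a minimal sketch of a multiwindow burn-rate check. It assumes the upstream kubernetes-mixin definition of KubeAPIErrorBudgetBurn as I understand it: a 99% availability SLO, with the alert firing when both a long and a short window consume the error budget faster than a given factor. The rule OpenShift actually ships may differ, and the names `SLO`, `WINDOWS`, `burn_rate` and `firing` are hypothetical. Error ratios are taken as given inputs rather than derived from apiserver metrics.

```python
# Hypothetical sketch of a multiwindow burn-rate check, ASSUMING the upstream
# kubernetes-mixin thresholds for KubeAPIErrorBudgetBurn (99% SLO); the rule
# actually shipped in OpenShift may differ.
SLO = 0.99
ERROR_BUDGET = 1 - SLO  # 1% of requests may count against the SLO

# (long window, short window, burn-rate factor, severity), as assumed above.
WINDOWS = [
    ("1h", "5m", 14.4, "critical"),
    ("6h", "30m", 6.0, "critical"),
    ("1d", "2h", 3.0, "warning"),
    ("3d", "6h", 1.0, "warning"),
]


def burn_rate(error_ratio: float) -> float:
    """How many times faster than 'exactly on budget' the budget is being spent."""
    return error_ratio / ERROR_BUDGET


def firing(error_ratios: dict[str, float]) -> list[str]:
    """Return the window pairs whose long AND short windows both exceed the factor."""
    result = []
    for long_w, short_w, factor, severity in WINDOWS:
        if (burn_rate(error_ratios[long_w]) > factor
                and burn_rate(error_ratios[short_w]) > factor):
            result.append(f"{severity} ({long_w}/{short_w})")
    return result
```

For example, hypothetical error ratios of 20% over 5m and 16% over 1h, with low ratios in the other windows, trip only the critical 1h/5m pair. Because both windows of a pair must exceed the factor, the short window falls below the threshold soon after errors stop, which is one plausible reason such an alert clears on its own.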
Created attachment 1767038: Prometheus snapshot
https://drive.google.com/file/d/19spZEuFoBvzF6_raYy3C-nrYLPmdoI-W/view?usp=sharing
must-gather.log: see comment 3, https://bugzilla.redhat.com/show_bug.cgi?id=1943442#c3
Permission denied on the Google Drive link.
Granted you access to the log.
*** Bug 1948089 has been marked as a duplicate of this bug. ***
Linking the test case for Sippy, based on [1]. As described in bug 1948089, this happens in basically all 4.7 -> 4.8 updates, so it would be really great to fix it before 4.8 goes GA. I see this bug is currently marked blocker?; thoughts about making it blocker+?
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1948089#c1
> Thoughts about making it blocker+?
It was known, and stated repeatedly, that the APIs will have hiccups on single-node. There is no capacity to fix this. Don't turn an explicit precondition for the feasibility of the single-node approach into a blocker+. That doesn't make sense.
Dropping the test-case now that bug 1948089 has been re-opened, so this bug is back to just being about single-node.
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.