Bug 2001409
Summary: | All critical alerts should have links to a runbook | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | hongyan li <hongyli> |
Component: | kube-controller-manager | Assignee: | Filip Krepinsky <fkrepins> |
Status: | CLOSED ERRATA | QA Contact: | zhou ying <yinzhou> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | 4.9 | CC: | maszulik, mfojtik, spasquie, stevsmit |
Target Milestone: | --- | Flags: | mfojtik: needinfo? |
Target Release: | 4.12.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | LifecycleStale | ||
Fixed In Version: | Doc Type: | Enhancement | |
Doc Text: |
Feature:
kube-controller-manager alerts (KubeControllerManagerDown, PodDisruptionBudgetAtLimit, PodDisruptionBudgetLimit, GarbageCollectorSyncFailed) now have links to GitHub runbooks.
Reason:
The runbooks help with understanding and debugging these alerts.
* With this update, `kube-controller-manager` alerts (`KubeControllerManagerDown`, `PodDisruptionBudgetAtLimit`, `PodDisruptionBudgetLimit`, and `GarbageCollectorSyncFailed`) have links to GitHub runbooks. The runbooks help users understand and debug these alerts. (link:https://bugzilla.redhat.com/show_bug.cgi?id=2001409[*BZ#2001409*])
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2023-01-17 19:46:45 UTC | Type: | Bug |
Bug Blocks: | 2114580 |
Description
hongyan li
2021-09-06 03:11:59 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

I’m adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

    oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.12.0-0.nightly-2022-08-15-150248   True        False         24m     Cluster version is 4.12.0-0.nightly-2022-08-15-150248

    oc -n openshift-kube-controller-manager-operator get prometheusrules kube-controller-manager-operator -oyaml|grep -B10 critical
    annotations:
      description: KubeControllerManager has disappeared from Prometheus target discovery.
      runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-kube-controller-manager-operator/KubeControllerManagerDown.md
      summary: Target disappeared from Prometheus target discovery.
    expr: |
      absent(up{job="kube-controller-manager"} == 1)
    for: 15m
    labels:
      namespace: openshift-kube-controller-manager
      severity: critical
    --
    annotations:
      description: The pod disruption budget is below the minimum disruptions allowed level and is not satisfied. The number of current healthy pods is less than the desired healthy pods.
      runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-kube-controller-manager-operator/PodDisruptionBudgetLimit.md
      summary: The pod disruption budget registers insufficient amount of pods.
    expr: |
      max by (namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_current_healthy < kube_poddisruptionbudget_status_desired_healthy)
    for: 15m
    labels:
      severity: critical

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:7399
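The verification in this report relies on reading `grep -B10 critical` output by eye. As a rough sketch only (not part of the original verification), the same check could be scripted against the JSON form of a PrometheusRule object. The function name, the sample data, and the two `Example...` alerts are hypothetical; only the `spec.groups[].rules[]` layout and the `KubeControllerManagerDown` entry are taken from the resource shown in this report.

```python
def critical_alerts_missing_runbook(prometheus_rule):
    """Return names of critical alerting rules that have no runbook_url annotation.

    `prometheus_rule` is a dict parsed from the JSON form of a PrometheusRule object.
    """
    missing = []
    spec = prometheus_rule.get("spec") or {}
    for group in spec.get("groups") or []:
        for rule in group.get("rules") or []:
            # Recording rules carry "record" instead of "alert"; skip them.
            if "alert" not in rule:
                continue
            labels = rule.get("labels") or {}
            if labels.get("severity") != "critical":
                continue
            annotations = rule.get("annotations") or {}
            if not annotations.get("runbook_url"):
                missing.append(rule["alert"])
    return missing


# Hypothetical sample shaped like a PrometheusRule object.
sample = {
    "spec": {
        "groups": [
            {
                "name": "example-group",  # hypothetical group name
                "rules": [
                    {
                        # Alert from this report, with its runbook link.
                        "alert": "KubeControllerManagerDown",
                        "expr": 'absent(up{job="kube-controller-manager"} == 1)',
                        "for": "15m",
                        "labels": {"severity": "critical"},
                        "annotations": {
                            "runbook_url": "https://github.com/openshift/runbooks/blob/master/alerts/cluster-kube-controller-manager-operator/KubeControllerManagerDown.md",
                        },
                    },
                    {
                        # Hypothetical critical alert without a runbook link.
                        "alert": "ExampleCriticalAlert",
                        "expr": "vector(1)",
                        "labels": {"severity": "critical"},
                        "annotations": {"summary": "Hypothetical alert."},
                    },
                    {
                        # Hypothetical non-critical alert; not reported.
                        "alert": "ExampleWarningAlert",
                        "expr": "vector(1)",
                        "labels": {"severity": "warning"},
                    },
                    # Hypothetical recording rule; ignored.
                    {"record": "example:recording_rule", "expr": "vector(1)"},
                ],
            }
        ]
    }
}

print(critical_alerts_missing_runbook(sample))
```

Against a live cluster, one option would be to run the `oc ... get prometheusrules` command above with `-o json` instead of `-oyaml`, parse the output with `json.load(sys.stdin)`, and pass the result to this function. An empty list would mean every critical alert in that object has a `runbook_url`.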