Bug 1881246
| Summary: | Overlapping, divergent PrometheusRule manifests | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | W. Trevor King <wking> |
| Component: | kube-controller-manager | Assignee: | Maciej Szulik <maszulik> |
| Status: | CLOSED ERRATA | QA Contact: | RamaKasturi <knarra> |
| Severity: | low | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.6 | CC: | aos-bugs, knarra, mfojtik |
| Target Milestone: | --- | | |
| Target Release: | 4.6.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-10-27 16:43:44 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Verified with the payload below. After running oc adm release extract, I see only a single file, and it contains all of the contents from the PR.
[ramakasturinarra@dhcp35-60 openshift-client-linux-4.6.0-0.nightly-2020-09-24-015627]$ ./oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.6.0-0.nightly-2020-09-24-015627 True False 5h18m Cluster version is 4.6.0-0.nightly-2020-09-24-015627
[ramakasturinarra@dhcp35-60 openshift-client-linux-4.6.0-0.nightly-2020-09-24-015627]$ ./oc version
Client Version: 4.6.0-0.nightly-2020-09-24-015627
Server Version: 4.6.0-0.nightly-2020-09-24-015627
Kubernetes Version: v1.19.0+fff8183
[ramakasturinarra@dhcp35-60 manifests]$ cat 0000_90_kube-controller-manager-operator_05_alerts.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kube-controller-manager-operator
  namespace: openshift-kube-controller-manager-operator
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
    exclude.release.openshift.io/internal-openshift-hosted: "true"
spec:
  groups:
    - name: cluster-version
      rules:
        - alert: KubeControllerManagerDown
          annotations:
            message: KubeControllerManager has disappeared from Prometheus target discovery.
          expr: |
            absent(up{job="kube-controller-manager"} == 1)
          for: 15m
          labels:
            severity: critical
        - alert: PodDisruptionBudgetAtLimit
          annotations:
            message: The pod disruption budget is preventing further disruption to pods because it is at the minimum allowed level.
          expr: |
            max by(namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods == kube_poddisruptionbudget_status_desired_healthy)
          for: 15m
          labels:
            severity: warning
        - alert: PodDisruptionBudgetLimit
          annotations:
            message: The pod disruption budget is below the minimum number allowed pods.
          expr: |
            max by (namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods < kube_poddisruptionbudget_status_desired_healthy)
          for: 15m
          labels:
            severity: critical
Based on the above, moving the bug to the verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:4196
Description of problem:

From [1]:

$ oc adm release extract --to manifests quay.io/openshift-release-dev/ocp-release:4.6.0-fc.6-x86_64
Extracted release payload from digest sha256:933f3d6f61ddec9f3b88a0932b47c438d7dfc15ff1873ab176284b66c9cff76e created at 2020-09-14T21:50:05Z
$ diff -u manifests/0000_90_kube-controller-manager-operator_05_alert-pdb.yaml manifests/0000_90_kube-controller-manager-operator_05_alert-kcm-down.yaml
--- manifests/0000_90_kube-controller-manager-operator_05_alert-pdb.yaml 2020-09-12 05:33:59.000000000 -0700
+++ manifests/0000_90_kube-controller-manager-operator_05_alert-kcm-down.yaml 2020-09-12 05:33:59.000000000 -0700
@@ -9,19 +9,11 @@
   groups:
     - name: cluster-version
       rules:
-        - alert: PodDisruptionBudgetAtLimit
+        - alert: KubeControllerManagerDown
           annotations:
-            message: The pod disruption budget is preventing further disruption to pods because it is at the minimum allowed level.
+            message: KubeControllerManager has disappeared from Prometheus target discovery.
           expr: |
-            max by(namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods == kube_poddisruptionbudget_status_desired_healthy)
-          for: 15m
-          labels:
-            severity: warning
-        - alert: PodDisruptionBudgetLimit
-          annotations:
-            message: The pod disruption budget is below the minimum number allowed pods.
-          expr: |
-            max by (namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods < kube_poddisruptionbudget_status_desired_healthy)
+            absent(up{job="kube-controller-manager"} == 1)
           for: 15m
           labels:
             severity: critical

I don't understand why [2,3] are using the same kind/namespace/name with different spec.groups; maybe that's OK for PrometheusRule? We've had the two separate files since [4], and the two separate YAML entries since [5]. Is the overlapping kind/namespace/name intentional? Or can we collapse to a single kind/namespace/name entry with multiple groups?
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1879184#c2
[2]: https://github.com/openshift/cluster-kube-controller-manager-operator/blob/9773980cbca12bfb0d5e719c13fb81b0de352efb/manifests/0000_90_kube-controller-manager-operator_05_alert-kcm-down.yaml
[3]: https://github.com/openshift/cluster-kube-controller-manager-operator/blob/9773980cbca12bfb0d5e719c13fb81b0de352efb/manifests/0000_90_kube-controller-manager-operator_05_alert-pdb.yaml
[4]: https://github.com/openshift/cluster-kube-controller-manager-operator/commit/326750ade37b48ae282074ee3cf05aef71ea5cd6
[5]: https://github.com/openshift/cluster-kube-controller-manager-operator/commit/f072caf44eb237f61c4de157bf8fe39f093f681b
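
The kind of overlap reported here (two manifest files declaring the same kind/namespace/name) could be caught mechanically. The following is only a minimal sketch, not part of the fix or of any OpenShift tooling: the helper names resource_keys and find_overlaps are hypothetical, the sample manifests are abbreviated from the two files in the description, and the parser is a simplistic line scan that assumes block-style YAML with name and namespace directly under a top-level metadata key.

```python
import re
from collections import defaultdict


def resource_keys(text):
    """Yield (kind, namespace, name) for each YAML document in text.

    Assumption: this is a naive line scan, not a real YAML parser. It only
    understands block-style manifests with a top-level `kind:` and with
    `name:` / `namespace:` as direct children of a top-level `metadata:`.
    """
    for doc in re.split(r"^---\s*$", text, flags=re.MULTILINE):
        kind = namespace = name = None
        in_metadata = False
        meta_indent = None
        for line in doc.splitlines():
            stripped = line.strip()
            if not stripped or stripped.startswith("#"):
                continue
            indent = len(line) - len(line.lstrip())
            if indent == 0:
                # A new top-level key ends any metadata block we were in.
                in_metadata = stripped == "metadata:"
                meta_indent = None
                if stripped.startswith("kind:"):
                    kind = stripped.split(":", 1)[1].strip()
                continue
            if in_metadata:
                if meta_indent is None:
                    meta_indent = indent  # indentation of direct children
                if indent == meta_indent:
                    key, _, value = stripped.partition(":")
                    value = value.strip().strip("\"'")
                    if key == "name":
                        name = value
                    elif key == "namespace":
                        namespace = value
        if kind and name:
            yield (kind, namespace, name)


def find_overlaps(manifests):
    """Map each (kind, namespace, name) seen more than once to its files.

    `manifests` is a dict of filename -> file contents.
    """
    seen = defaultdict(list)
    for filename, text in sorted(manifests.items()):
        for key in resource_keys(text):
            seen[key].append(filename)
    return {key: files for key, files in seen.items() if len(files) > 1}


# Abbreviated stand-ins for the two files compared in the description.
_HEADER = """\
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kube-controller-manager-operator
  namespace: openshift-kube-controller-manager-operator
spec:
  groups:
    - name: cluster-version
      rules:
"""
SAMPLE_MANIFESTS = {
    "0000_90_kube-controller-manager-operator_05_alert-pdb.yaml":
        _HEADER + "        - alert: PodDisruptionBudgetAtLimit\n",
    "0000_90_kube-controller-manager-operator_05_alert-kcm-down.yaml":
        _HEADER + "        - alert: KubeControllerManagerDown\n",
}

if __name__ == "__main__":
    for key, files in find_overlaps(SAMPLE_MANIFESTS).items():
        print("/".join(str(part) for part in key), "declared in:")
        for filename in files:
            print("  " + filename)
```

To run this against a real payload, one would read each file under the extracted manifests directory into the dict passed to find_overlaps; for anything beyond a quick check, a proper YAML parser would be a safer choice than the line scan used here.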