Description of problem:

From [1]:

$ oc adm release extract --to manifests quay.io/openshift-release-dev/ocp-release:4.6.0-fc.6-x86_64
Extracted release payload from digest sha256:933f3d6f61ddec9f3b88a0932b47c438d7dfc15ff1873ab176284b66c9cff76e created at 2020-09-14T21:50:05Z

$ diff -u manifests/0000_90_kube-controller-manager-operator_05_alert-pdb.yaml manifests/0000_90_kube-controller-manager-operator_05_alert-kcm-down.yaml
--- manifests/0000_90_kube-controller-manager-operator_05_alert-pdb.yaml	2020-09-12 05:33:59.000000000 -0700
+++ manifests/0000_90_kube-controller-manager-operator_05_alert-kcm-down.yaml	2020-09-12 05:33:59.000000000 -0700
@@ -9,19 +9,11 @@
   groups:
   - name: cluster-version
     rules:
-    - alert: PodDisruptionBudgetAtLimit
+    - alert: KubeControllerManagerDown
       annotations:
-        message: The pod disruption budget is preventing further disruption to pods because it is at the minimum allowed level.
+        message: KubeControllerManager has disappeared from Prometheus target discovery.
       expr: |
-        max by(namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods == kube_poddisruptionbudget_status_desired_healthy)
-      for: 15m
-      labels:
-        severity: warning
-    - alert: PodDisruptionBudgetLimit
-      annotations:
-        message: The pod disruption budget is below the minimum number allowed pods.
-      expr: |
-        max by (namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods < kube_poddisruptionbudget_status_desired_healthy)
+        absent(up{job="kube-controller-manager"} == 1)
       for: 15m
       labels:
         severity: critical

I don't understand why [2,3] use the same kind/namespace/name with different spec.groups; maybe that's ok for PrometheusRule? We've had the two separate files since [4], and the two separate YAML entries since [5]. Is the overlapping kind/namespace/name intentional? Or can we collapse to a single kind/namespace/name entry with multiple groups?
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1879184#c2
[2]: https://github.com/openshift/cluster-kube-controller-manager-operator/blob/9773980cbca12bfb0d5e719c13fb81b0de352efb/manifests/0000_90_kube-controller-manager-operator_05_alert-kcm-down.yaml
[3]: https://github.com/openshift/cluster-kube-controller-manager-operator/blob/9773980cbca12bfb0d5e719c13fb81b0de352efb/manifests/0000_90_kube-controller-manager-operator_05_alert-pdb.yaml
[4]: https://github.com/openshift/cluster-kube-controller-manager-operator/commit/326750ade37b48ae282074ee3cf05aef71ea5cd6
[5]: https://github.com/openshift/cluster-kube-controller-manager-operator/commit/f072caf44eb237f61c4de157bf8fe39f093f681b
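The ambiguity above boils down to two manifest files declaring the same object identity (kind/namespace/name). As a rough illustration only (this is a hypothetical helper, not part of the operator's tooling, and it uses a naive top-level field scan instead of a real YAML parser), a script like the following could flag overlapping identities across extracted payload manifests:

```python
import re
from collections import defaultdict

def manifest_identity(text):
    """Best-effort (kind, namespace, name) from a single-document manifest.

    Naive sketch: grabs the top-level 'kind:' plus the first indented
    'name:'/'namespace:' (assumed to be under metadata:).
    """
    kind = re.search(r"^kind:\s*(\S+)", text, re.M)
    ns = re.search(r"^\s+namespace:\s*(\S+)", text, re.M)
    name = re.search(r"^\s+name:\s*(\S+)", text, re.M)
    return (
        kind.group(1) if kind else None,
        ns.group(1) if ns else None,
        name.group(1) if name else None,
    )

def find_overlaps(manifests):
    """Map each identity declared by more than one file to those files.

    `manifests` is {filename: yaml_text}.
    """
    seen = defaultdict(list)
    for fname, text in manifests.items():
        seen[manifest_identity(text)].append(fname)
    return {ident: files for ident, files in seen.items() if len(files) > 1}
```

Run over the extracted manifests/ directory, this would report the two alert files [2,3] under the single identity (PrometheusRule, openshift-kube-controller-manager-operator, kube-controller-manager-operator), which is exactly the overlap being asked about.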
Verified with the payload below; after running `oc adm release extract`, I only see one single file containing all of the contents from the PR.

[ramakasturinarra@dhcp35-60 openshift-client-linux-4.6.0-0.nightly-2020-09-24-015627]$ ./oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-09-24-015627   True        False         5h18m   Cluster version is 4.6.0-0.nightly-2020-09-24-015627

[ramakasturinarra@dhcp35-60 openshift-client-linux-4.6.0-0.nightly-2020-09-24-015627]$ ./oc version
Client Version: 4.6.0-0.nightly-2020-09-24-015627
Server Version: 4.6.0-0.nightly-2020-09-24-015627
Kubernetes Version: v1.19.0+fff8183

[ramakasturinarra@dhcp35-60 manifests]$ cat 0000_90_kube-controller-manager-operator_05_alerts.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kube-controller-manager-operator
  namespace: openshift-kube-controller-manager-operator
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
    exclude.release.openshift.io/internal-openshift-hosted: "true"
spec:
  groups:
  - name: cluster-version
    rules:
    - alert: KubeControllerManagerDown
      annotations:
        message: KubeControllerManager has disappeared from Prometheus target discovery.
      expr: |
        absent(up{job="kube-controller-manager"} == 1)
      for: 15m
      labels:
        severity: critical
    - alert: PodDisruptionBudgetAtLimit
      annotations:
        message: The pod disruption budget is preventing further disruption to pods because it is at the minimum allowed level.
      expr: |
        max by(namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods == kube_poddisruptionbudget_status_desired_healthy)
      for: 15m
      labels:
        severity: warning
    - alert: PodDisruptionBudgetLimit
      annotations:
        message: The pod disruption budget is below the minimum number allowed pods.
      expr: |
        max by (namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods < kube_poddisruptionbudget_status_desired_healthy)
      for: 15m
      labels:
        severity: critical

Based on the above, moving the bug to the verified state.
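For reference, the KubeControllerManagerDown expression leans on PromQL absent() semantics: the inner `up{job="kube-controller-manager"} == 1` comparison drops every series whose value is not 1, and absent() returns a (firing) one-element vector only when that filtered result is empty. So the alert fires both when the target is scraped but down (up == 0) and when the target has vanished from discovery entirely. A minimal Python sketch of that behavior (a hypothetical illustration, not Prometheus code):

```python
def kcm_down(up_samples):
    """Mimic absent(up{job="kube-controller-manager"} == 1).

    `up_samples` stands in for the current values of the matching 'up'
    series. Return 1 (alert fires) when no series has value 1, i.e. the
    filtered inner vector is empty; return None (empty result, alert
    inactive) otherwise.
    """
    # Filter step: keep only samples equal to 1, like the `== 1` comparison.
    healthy = [v for v in up_samples if v == 1]
    # absent() yields a value only when its input vector is empty.
    return 1 if not healthy else None
```

Note that `kcm_down([])` and `kcm_down([0])` both fire, matching why absent() is used here rather than a plain `up == 0` check, which would return nothing if the series disappeared.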
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196