Bug 1881246 - Overlapping, divergent PrometheusRule manifests
Summary: Overlapping, divergent PrometheusRule manifests
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-controller-manager
Version: 4.6
Hardware: Unspecified
OS: Unspecified
medium
low
Target Milestone: ---
: 4.6.0
Assignee: Maciej Szulik
QA Contact: RamaKasturi
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-09-21 23:50 UTC by W. Trevor King
Modified: 2020-10-27 16:43 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:43:44 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-kube-controller-manager-operator pull 454 0 None closed Bug 1881246: squash alerts into a single file 2020-10-13 17:20:30 UTC
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 16:43:58 UTC

Description W. Trevor King 2020-09-21 23:50:54 UTC
Description of problem:

From [1]:

  $ oc adm release extract --to manifests quay.io/openshift-release-dev/ocp-release:4.6.0-fc.6-x86_64
  Extracted release payload from digest sha256:933f3d6f61ddec9f3b88a0932b47c438d7dfc15ff1873ab176284b66c9cff76e created at 2020-09-14T21:50:05Z
  $ diff -u manifests/0000_90_kube-controller-manager-operator_05_alert-pdb.yaml manifests/0000_90_kube-controller-manager-operator_05_alert-kcm-down.yaml
  --- manifests/0000_90_kube-controller-manager-operator_05_alert-pdb.yaml	2020-09-12 05:33:59.000000000 -0700
  +++ manifests/0000_90_kube-controller-manager-operator_05_alert-kcm-down.yaml	2020-09-12 05:33:59.000000000 -0700
  @@ -9,19 +9,11 @@
     groups:
       - name: cluster-version
         rules:
  -        - alert: PodDisruptionBudgetAtLimit
  +        - alert: KubeControllerManagerDown
             annotations:
  -            message: The pod disruption budget is preventing further disruption to pods because it is at the minimum allowed level.
  +            message: KubeControllerManager has disappeared from Prometheus target discovery.
             expr: |
  -            max by(namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods == kube_poddisruptionbudget_status_desired_healthy)
  -          for: 15m
  -          labels:
  -            severity: warning
  -        - alert: PodDisruptionBudgetLimit
  -          annotations:
  -            message: The pod disruption budget is below the minimum number allowed pods.
  -          expr: |
  -            max by (namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods < kube_poddisruptionbudget_status_desired_healthy)
  +            absent(up{job="kube-controller-manager"} == 1)
             for: 15m
             labels:
               severity: critical

I don't understand why [2,3] are using the same kind/namespace/name with different spec.groups; maybe that's ok for PrometheusRule?  We've had the two separate files since [4], and the two separate YAML entries since [5].  Is the overlapping kind/namespace/name intentional?  Or can we collapse to a single kind/namespace/name entries with multiple groups?

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1879184#c2
[2]: https://github.com/openshift/cluster-kube-controller-manager-operator/blob/9773980cbca12bfb0d5e719c13fb81b0de352efb/manifests/0000_90_kube-controller-manager-operator_05_alert-kcm-down.yaml
[3]: https://github.com/openshift/cluster-kube-controller-manager-operator/blob/9773980cbca12bfb0d5e719c13fb81b0de352efb/manifests/0000_90_kube-controller-manager-operator_05_alert-pdb.yaml
[4]: https://github.com/openshift/cluster-kube-controller-manager-operator/commit/326750ade37b48ae282074ee3cf05aef71ea5cd6
[5]: https://github.com/openshift/cluster-kube-controller-manager-operator/commit/f072caf44eb237f61c4de157bf8fe39f093f681b

Comment 2 RamaKasturi 2020-09-24 12:55:27 UTC
Verified with the payload below and i only see one single file with all the contents in the PR present after doing an oc adm release extract.

[ramakasturinarra@dhcp35-60 openshift-client-linux-4.6.0-0.nightly-2020-09-24-015627]$ ./oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-09-24-015627   True        False         5h18m   Cluster version is 4.6.0-0.nightly-2020-09-24-015627
[ramakasturinarra@dhcp35-60 openshift-client-linux-4.6.0-0.nightly-2020-09-24-015627]$ ./oc version
Client Version: 4.6.0-0.nightly-2020-09-24-015627
Server Version: 4.6.0-0.nightly-2020-09-24-015627
Kubernetes Version: v1.19.0+fff8183


[ramakasturinarra@dhcp35-60 manifests]$ cat 0000_90_kube-controller-manager-operator_05_alerts.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kube-controller-manager-operator
  namespace: openshift-kube-controller-manager-operator
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
    exclude.release.openshift.io/internal-openshift-hosted: "true"
spec:
  groups:
    - name: cluster-version
      rules:
        - alert: KubeControllerManagerDown
          annotations:
            message: KubeControllerManager has disappeared from Prometheus target discovery.
          expr: |
            absent(up{job="kube-controller-manager"} == 1)
          for: 15m
          labels:
            severity: critical
        - alert: PodDisruptionBudgetAtLimit
          annotations:
            message: The pod disruption budget is preventing further disruption to pods because it is at the minimum allowed level.
          expr: |
            max by(namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods == kube_poddisruptionbudget_status_desired_healthy)
          for: 15m
          labels:
            severity: warning
        - alert: PodDisruptionBudgetLimit
          annotations:
            message: The pod disruption budget is below the minimum number allowed pods.
          expr: |
            max by (namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods < kube_poddisruptionbudget_status_desired_healthy)
          for: 15m
          labels:
            severity: critical


Based on the above moving the bug to verified state.

Comment 5 errata-xmlrpc 2020-10-27 16:43:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196


Note You need to log in before you can comment on or make changes to this bug.