1881246 – Overlapping, divergent PrometheusRule manifests

Bug 1881246 - Overlapping, divergent PrometheusRule manifests

Summary: Overlapping, divergent PrometheusRule manifests

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	kube-controller-manager
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	low
Target Milestone:	---
Target Release:	4.6.0
Assignee:	Maciej Szulik
QA Contact:	RamaKasturi
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2020-09-21 23:50 UTC by W. Trevor King
Modified:	2020-10-27 16:43 UTC (History)
CC List:	3 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-10-27 16:43:44 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-kube-controller-manager-operator pull 454	0	None	closed	Bug 1881246: squash alerts into a single file	2020-10-13 17:20:30 UTC
Red Hat Product Errata	RHBA-2020:4196	0	None	None	None	2020-10-27 16:43:58 UTC

Description W. Trevor King 2020-09-21 23:50:54 UTC

Description of problem:

From [1]:

  $ oc adm release extract --to manifests quay.io/openshift-release-dev/ocp-release:4.6.0-fc.6-x86_64
  Extracted release payload from digest sha256:933f3d6f61ddec9f3b88a0932b47c438d7dfc15ff1873ab176284b66c9cff76e created at 2020-09-14T21:50:05Z
  $ diff -u manifests/0000_90_kube-controller-manager-operator_05_alert-pdb.yaml manifests/0000_90_kube-controller-manager-operator_05_alert-kcm-down.yaml
  --- manifests/0000_90_kube-controller-manager-operator_05_alert-pdb.yaml	2020-09-12 05:33:59.000000000 -0700
  +++ manifests/0000_90_kube-controller-manager-operator_05_alert-kcm-down.yaml	2020-09-12 05:33:59.000000000 -0700
  @@ -9,19 +9,11 @@
     groups:
       - name: cluster-version
         rules:
  -        - alert: PodDisruptionBudgetAtLimit
  +        - alert: KubeControllerManagerDown
             annotations:
  -            message: The pod disruption budget is preventing further disruption to pods because it is at the minimum allowed level.
  +            message: KubeControllerManager has disappeared from Prometheus target discovery.
             expr: |
  -            max by(namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods == kube_poddisruptionbudget_status_desired_healthy)
  -          for: 15m
  -          labels:
  -            severity: warning
  -        - alert: PodDisruptionBudgetLimit
  -          annotations:
  -            message: The pod disruption budget is below the minimum number allowed pods.
  -          expr: |
  -            max by (namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods < kube_poddisruptionbudget_status_desired_healthy)
  +            absent(up{job="kube-controller-manager"} == 1)
             for: 15m
             labels:
               severity: critical

I don't understand why [2,3] are using the same kind/namespace/name with different spec.groups; maybe that's ok for PrometheusRule?  We've had the two separate files since [4], and the two separate YAML entries since [5].  Is the overlapping kind/namespace/name intentional?  Or can we collapse to a single kind/namespace/name entries with multiple groups?

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1879184#c2
[2]: https://github.com/openshift/cluster-kube-controller-manager-operator/blob/9773980cbca12bfb0d5e719c13fb81b0de352efb/manifests/0000_90_kube-controller-manager-operator_05_alert-kcm-down.yaml
[3]: https://github.com/openshift/cluster-kube-controller-manager-operator/blob/9773980cbca12bfb0d5e719c13fb81b0de352efb/manifests/0000_90_kube-controller-manager-operator_05_alert-pdb.yaml
[4]: https://github.com/openshift/cluster-kube-controller-manager-operator/commit/326750ade37b48ae282074ee3cf05aef71ea5cd6
[5]: https://github.com/openshift/cluster-kube-controller-manager-operator/commit/f072caf44eb237f61c4de157bf8fe39f093f681b

Comment 2 RamaKasturi 2020-09-24 12:55:27 UTC

Verified with the payload below and i only see one single file with all the contents in the PR present after doing an oc adm release extract.

[ramakasturinarra@dhcp35-60 openshift-client-linux-4.6.0-0.nightly-2020-09-24-015627]$ ./oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-09-24-015627   True        False         5h18m   Cluster version is 4.6.0-0.nightly-2020-09-24-015627
[ramakasturinarra@dhcp35-60 openshift-client-linux-4.6.0-0.nightly-2020-09-24-015627]$ ./oc version
Client Version: 4.6.0-0.nightly-2020-09-24-015627
Server Version: 4.6.0-0.nightly-2020-09-24-015627
Kubernetes Version: v1.19.0+fff8183


[ramakasturinarra@dhcp35-60 manifests]$ cat 0000_90_kube-controller-manager-operator_05_alerts.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kube-controller-manager-operator
  namespace: openshift-kube-controller-manager-operator
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
    exclude.release.openshift.io/internal-openshift-hosted: "true"
spec:
  groups:
    - name: cluster-version
      rules:
        - alert: KubeControllerManagerDown
          annotations:
            message: KubeControllerManager has disappeared from Prometheus target discovery.
          expr: |
            absent(up{job="kube-controller-manager"} == 1)
          for: 15m
          labels:
            severity: critical
        - alert: PodDisruptionBudgetAtLimit
          annotations:
            message: The pod disruption budget is preventing further disruption to pods because it is at the minimum allowed level.
          expr: |
            max by(namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods == kube_poddisruptionbudget_status_desired_healthy)
          for: 15m
          labels:
            severity: warning
        - alert: PodDisruptionBudgetLimit
          annotations:
            message: The pod disruption budget is below the minimum number allowed pods.
          expr: |
            max by (namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_expected_pods < kube_poddisruptionbudget_status_desired_healthy)
          for: 15m
          labels:
            severity: critical


Based on the above moving the bug to verified state.

Comment 5 errata-xmlrpc 2020-10-27 16:43:44 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Note You need to log in before you can comment on or make changes to this bug.