Bug 1944974 - Duplicate KubeControllerManagerDown/KubeSchedulerDown alerts
Summary: Duplicate KubeControllerManagerDown/KubeSchedulerDown alerts
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.8.0
Assignee: Simon Pasquier
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-03-31 06:28 UTC by Junqi Zhao
Modified: 2021-07-27 22:57 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 22:56:48 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-monitoring-operator pull 1098 | 0 | None | open | Bug 1944974: remove KubeControllerManagerDown and KubeSchedulerDown alerts | 2021-03-31 07:37:37 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 22:57:03 UTC

Description Junqi Zhao 2021-03-31 06:28:15 UTC
Description of problem:
There is a KubeControllerManagerDown alert rule in the kube-controller-manager-operator prometheusrules of the openshift-kube-controller-manager-operator project,
and a KubeSchedulerDown alert rule in the kube-scheduler-operator prometheusrules of the openshift-kube-scheduler-operator project.

Both alert rules also exist in the kubernetes-monitoring-rules prometheusrules of the openshift-monitoring project.
The only difference between the copies is the annotations.

# oc -n openshift-kube-controller-manager-operator get prometheusrules kube-controller-manager-operator -oyaml
...
spec:
  groups:
  - name: cluster-version
    rules:
    - alert: KubeControllerManagerDown
      annotations:
        message: KubeControllerManager has disappeared from Prometheus target discovery.
      expr: |
        absent(up{job="kube-controller-manager"} == 1)
      for: 15m
      labels:
        severity: critical
...

# oc -n openshift-monitoring get prometheusrules kubernetes-monitoring-rules -oyaml
...
  - name: kubernetes-system-controller-manager
    rules:
    - alert: KubeControllerManagerDown
      annotations:
        description: KubeControllerManager has disappeared from Prometheus target discovery.
        summary: Target disappeared from Prometheus target discovery.
      expr: |
        absent(up{job="kube-controller-manager"} == 1)
      for: 15m
      labels:
        severity: critical
*********************************************************

# oc -n openshift-kube-scheduler-operator get prometheusrules kube-scheduler-operator -oyaml
...
spec:
  groups:
  - name: cluster-version
    rules:
    - alert: KubeSchedulerDown
      annotations:
        message: KubeScheduler has disappeared from Prometheus target discovery.
      expr: |
        absent(up{job="scheduler"} == 1)
      for: 15m
      labels:
        severity: critical
...

# oc -n openshift-monitoring get prometheusrules kubernetes-monitoring-rules -oyaml
...
  - name: kubernetes-system-scheduler
    rules:
    - alert: KubeSchedulerDown
      annotations:
        description: KubeScheduler has disappeared from Prometheus target discovery.
        summary: Target disappeared from Prometheus target discovery.
      expr: |
        absent(up{job="scheduler"} == 1)
      for: 15m
      labels:
        severity: critical
...

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-03-30-160509

How reproducible:
always

Steps to Reproduce:
1. See the description.

Actual results:
Duplicate KubeControllerManagerDown/KubeSchedulerDown alerts

Expected results:
Should not have duplicate alerts

Additional info:
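One possible way to spot this kind of duplication is sketched below. It is not taken from the fix or from any OpenShift tooling. It groups alert rules by name and expression across PrometheusRule objects and ignores annotations, since those are the only part that differs here. The function name find_duplicate_alerts, the helper _down_rule, and the SAMPLE_RULES data are hypothetical; the sample simply mirrors the rules quoted in the description.

```python
from collections import defaultdict


def find_duplicate_alerts(prometheus_rules):
    """Map (alert name, expr) to the PrometheusRule objects defining it,
    keeping only alerts that are defined in more than one object."""
    seen = defaultdict(list)
    for obj in prometheus_rules:
        meta = obj.get("metadata", {})
        source = f'{meta.get("namespace")}/{meta.get("name")}'
        for group in obj.get("spec", {}).get("groups", []):
            for rule in group.get("rules", []):
                if "alert" not in rule:
                    continue  # recording rules have no alert name
                # Collapse whitespace so YAML block scalars compare equal.
                expr = " ".join(str(rule.get("expr", "")).split())
                # Annotations are deliberately left out of the key.
                key = (rule["alert"], expr)
                if source not in seen[key]:
                    seen[key].append(source)
    return {key: sources for key, sources in seen.items() if len(sources) > 1}


def _down_rule(alert, job, annotations):
    # Helper for the sample data only (hypothetical, not part of any API).
    return {
        "alert": alert,
        "annotations": annotations,
        "expr": f'absent(up{{job="{job}"}} == 1)\n',
        "for": "15m",
        "labels": {"severity": "critical"},
    }


# Sample input mirroring the three prometheusrules objects in this report.
SAMPLE_RULES = [
    {
        "metadata": {"namespace": "openshift-kube-controller-manager-operator",
                     "name": "kube-controller-manager-operator"},
        "spec": {"groups": [{"name": "cluster-version", "rules": [
            _down_rule("KubeControllerManagerDown", "kube-controller-manager",
                       {"message": "KubeControllerManager has disappeared."}),
        ]}]},
    },
    {
        "metadata": {"namespace": "openshift-kube-scheduler-operator",
                     "name": "kube-scheduler-operator"},
        "spec": {"groups": [{"name": "cluster-version", "rules": [
            _down_rule("KubeSchedulerDown", "scheduler",
                       {"message": "KubeScheduler has disappeared."}),
        ]}]},
    },
    {
        "metadata": {"namespace": "openshift-monitoring",
                     "name": "kubernetes-monitoring-rules"},
        "spec": {"groups": [
            {"name": "kubernetes-system-controller-manager", "rules": [
                _down_rule("KubeControllerManagerDown", "kube-controller-manager",
                           {"summary": "Target disappeared."}),
            ]},
            {"name": "kubernetes-system-scheduler", "rules": [
                _down_rule("KubeSchedulerDown", "scheduler",
                           {"summary": "Target disappeared."}),
            ]},
        ]},
    },
]

if __name__ == "__main__":
    for (alert, expr), sources in find_duplicate_alerts(SAMPLE_RULES).items():
        print(f"{alert} ({expr}) is defined in: {', '.join(sources)}")
```

On a live cluster the input could presumably be the "items" list from the JSON output of oc get prometheusrules -A -o json, loaded with json.load; that wiring is an assumption and is not shown here.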

Comment 2 Junqi Zhao 2021-04-02 06:07:34 UTC
Tested with 4.8.0-0.nightly-2021-04-01-213116: KubeControllerManagerDown and KubeSchedulerDown have been removed from the openshift-monitoring prometheusrules file.

Comment 5 errata-xmlrpc 2021-07-27 22:56:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

