Bug 1889689 - AggregatedAPIErrors alert may never fire
Summary: AggregatedAPIErrors alert may never fire
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.7.0
Assignee: Sergiusz Urbaniak
QA Contact: Junqi Zhao
Depends On:
Reported: 2020-10-20 11:39 UTC by Sergiusz Urbaniak
Modified: 2020-11-17 00:31 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed:
Target Upstream Version:


Description Sergiusz Urbaniak 2020-10-20 11:39:13 UTC
While investigating metrics changes in k8s 1.19, we found that the aggregator_unavailable_apiservice_count metric was renamed to aggregator_unavailable_apiservice_total. Our stack still uses the old name in the "AggregatedAPIErrors" alert, so the alert expression may no longer match any series: https://github.com/openshift/cluster-monitoring-operator/blob/57a33cb45dc97d23f0b77885c2acd10fd8b60717/assets/prometheus-k8s/rules.yaml#L1680-L1687
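
A rename like this can be caught mechanically by scanning alert expressions for metric names that no longer exist. The following is only a minimal sketch, not part of the monitoring stack: the helper find_stale_metrics and the sample expression are hypothetical and written for illustration, and only the old and new metric names come from this report.

```python
import re

# Rename reported in this bug (Kubernetes 1.19): old name -> new name.
RENAMED_METRICS = {
    "aggregator_unavailable_apiservice_count": "aggregator_unavailable_apiservice_total",
}


def find_stale_metrics(expr, renamed=RENAMED_METRICS):
    """Return (old, new) pairs for every renamed metric referenced in expr."""
    hits = []
    for old, new in renamed.items():
        # Match the metric name as a whole identifier, not as a substring
        # of a longer metric name.
        pattern = r"(?<![A-Za-z0-9_:])" + re.escape(old) + r"(?![A-Za-z0-9_:])"
        if re.search(pattern, expr):
            hits.append((old, new))
    return hits


# Hypothetical expression for illustration only; it is NOT copied from rules.yaml.
example_expr = (
    "sum by (name, namespace) "
    "(increase(aggregator_unavailable_apiservice_count[5m])) > 2"
)

for old, new in find_stale_metrics(example_expr):
    print(f"expression references {old}; renamed to {new}")
```

Running such a check over every rule expression during a Kubernetes rebase would flag alerts that silently stop matching, assuming the rename map is kept up to date by hand.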

We need to fix this upstream, vendor the fix downstream, and backport it to 4.6.

Because this is a symptom-based alert with severity "warning", I am setting the BZ severity to low (not release-blocking).

Comment 2 Sergiusz Urbaniak 2020-11-13 09:03:43 UTC
UpcomingSprint: We don't have enough capacity to tackle this one in the next sprint (193).
