Bug 1845444 - KubeApiLatency alert firing even though not all conditions are matched
Summary: KubeApiLatency alert firing even though not all conditions are matched
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.4.z
Assignee: Rick Rackow
QA Contact: Junqi Zhao
URL:
Whiteboard: wip
Depends On: 1845445
Blocks: 1845443
Reported: 2020-06-09 09:15 UTC by Rick Rackow
Modified: 2020-07-28 12:37 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1845443
Cloned To: 1845445
Environment:
Last Closed: 2020-07-28 12:37:03 UTC
Target Upstream Version:
Embargoed:



Links
- GitHub openshift/cluster-monitoring-operator pull 809 (closed): "Bug 1845444: KubeApiLatency alert firing even though not all conditions are matched", last updated 2021-02-09 03:15:57 UTC
- Red Hat Product Errata RHBA-2020:3075, last updated 2020-07-28 12:37:28 UTC

Description Rick Rackow 2020-06-09 09:15:53 UTC
+++ This bug was initially created as a clone of Bug #1845443 +++

Description of problem:

The KubeAPILatencyHigh warning alert should only fire when all of its conditions are met, including the latency being over 1s.
However, we have seen it fire with the following message:

```
The API server has an abnormal latency of 0.05685404799999992 seconds for PUT namespace
```

Version-Release number of selected component (if applicable):
OpenShift Dedicated 4.3.18

How reproducible:
Partially

Steps to Reproduce:
1. Run the alerting rule's expression in the Prometheus Graph view.
2. Widen the graph's time range until you find an occurrence.
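As an alternative to widening the graph by hand, the occurrences can be listed through the Prometheus HTTP API. The following is a minimal Python sketch, not part of the original report: it runs a `/api/v1/query_range` query over the `ALERTS` series, which Prometheus keeps for pending and firing alerts. The helper names `query_range_url` and `firing_windows` are hypothetical, and the sketch assumes the Prometheus endpoint is reachable without authentication (for example through a port-forward); an OpenShift route would also need a bearer token.

```python
import json
import time
from typing import Dict, List, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

# Prometheus exposes pending and firing alerts as the synthetic ALERTS series,
# so a range query over it shows every time the alert fired.
QUERY = 'ALERTS{alertname="KubeAPILatencyHigh", alertstate="firing"}'


def query_range_url(base_url: str, query: str, start: float, end: float, step: str) -> str:
    """Build a GET URL for the Prometheus HTTP API endpoint /api/v1/query_range."""
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


def firing_windows(base_url: str, days: int = 7, step: str = "5m") -> List[Tuple[Dict[str, str], float, float]]:
    """Return (labels, first sample time, last sample time) per firing series.

    Hypothetical helper; assumes the endpoint needs no authentication.
    """
    end = time.time()
    start = end - days * 86400
    with urlopen(query_range_url(base_url, QUERY, start, end, step)) as resp:
        body = json.load(resp)
    windows = []
    for series in body["data"]["result"]:  # resultType is "matrix" for range queries
        stamps = [float(ts) for ts, _value in series["values"]]
        windows.append((series["metric"], min(stamps), max(stamps)))
    return windows
```

For example, `firing_windows("http://localhost:9090", days=14)` would return one entry per label set (and therefore per verb and resource) for which the alert fired in the last two weeks.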

Actual results:
The alert fired, reporting a latency of 0.05685404799999992 seconds.

Expected results:
The alert only fires when the reported latency is greater than 1 second.

Additional info:
Alternatively, this could be fixed by changing the alert message to report something more meaningful.
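The alert annotation prints `{{ $value }}`, and PromQL's `and` operator returns the samples of its left-hand operand, so the number in the message comes from whichever expression sits to the left of `and`. The Python sketch below models that behaviour in simplified form. One hedged reading of this report, assuming (not confirmed here) that the affected rule placed the mean-latency comparison to the left of `and` and the `p99 > 1` check to its right, is that the p99 condition was met while the message showed the mean latency. The helper names, the label values, and the p99 value of 1.3 are hypothetical; only 0.05685404799999992 comes from the report, and the standard-deviation conditions are left out.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]
Vector = List[Tuple[Labels, float]]  # an instant vector: (labels, sample value) pairs


def match_key(labels: Labels, on: Tuple[str, ...]) -> Tuple[str, ...]:
    """Label values used for `on (...)` matching; a missing label counts as empty."""
    return tuple(labels.get(name, "") for name in on)


def gt(vec: Vector, threshold: float) -> Vector:
    """Filtering comparison `vec > threshold` (no `bool` modifier): keeps samples as-is."""
    return [(labels, v) for labels, v in vec if v > threshold]


def and_on(lhs: Vector, rhs: Vector, on: Tuple[str, ...]) -> Vector:
    """`lhs and on (...) rhs`: keep LEFT samples whose `on` labels also occur on the right."""
    right_keys = {match_key(labels, on) for labels, _ in rhs}
    return [(labels, v) for labels, v in lhs if match_key(labels, on) in right_keys]


put_ns = {"verb": "PUT", "resource": "namespace"}  # illustrative labels
mean5m = [(put_ns, 0.05685404799999992)]  # value quoted in the report
p99 = [(put_ns, 1.3)]  # hypothetical p99 above the 1s threshold

# Assumed pre-fix ordering: mean latency on the left, so $value is the mean.
mean_on_left = and_on(mean5m, gt(p99, 1), ("verb", "resource"))
# Ordering shown in Comment 6: p99 on the left, so $value is the p99.
p99_on_left = and_on(gt(p99, 1), mean5m, ("verb", "resource"))

print(mean_on_left[0][1])  # the mean latency from the report
print(p99_on_left[0][1])   # the hypothetical p99
```

In both orderings the same series passes the filter; only the value carried into the message differs.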

Comment 1 Rick Rackow 2020-07-09 11:58:58 UTC
Blocked because we're waiting for https://bugzilla.redhat.com/show_bug.cgi?id=1845445

Comment 6 Junqi Zhao 2020-07-20 03:15:19 UTC
Tested with 4.4.0-0.nightly-2020-07-18-033102. The KubeAPILatencyHigh warning alert definition is shown below, and no such alert is firing in the cluster.
*************************************************
  - alert: KubeAPILatencyHigh
    annotations:
      message: The API server has an abnormal latency of {{ $value }} seconds for
        {{ $labels.verb }} {{ $labels.resource }}.
    expr: |
      cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{job="apiserver",quantile="0.99"}
      >
      1
      and on (verb,resource)
      (
        cluster:apiserver_request_duration_seconds:mean5m{job="apiserver"}
        >
        on (verb) group_left()
        (
          avg by (verb) (cluster:apiserver_request_duration_seconds:mean5m{job="apiserver"} >= 0)
          +
          2*stddev by (verb) (cluster:apiserver_request_duration_seconds:mean5m{job="apiserver"} >= 0)
        )
      ) > on (verb) group_left()
      1.2 * avg by (verb) (cluster:apiserver_request_duration_seconds:mean5m{job="apiserver"} >= 0)
    for: 5m
    labels:
      severity: warning
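The expression above fires for a (verb, resource) pair only when its p99 latency exceeds 1s and its 5-minute mean latency is both more than two standard deviations above, and more than 1.2 times, the average mean latency for that verb. The following minimal Python sketch evaluates that logic once. It assumes plain dictionaries keyed by (verb, resource) stand in for the recording-rule series, ignores the `job`/`quantile` selectors and the `for: 5m` pending period, and uses a hypothetical function name and made-up sample values.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Tuple

Key = Tuple[str, str]  # (verb, resource)


def kube_api_latency_high(p99: Dict[Key, float], mean5m: Dict[Key, float]) -> Dict[Key, float]:
    """Hypothetical single evaluation of the KubeAPILatencyHigh expression.

    Returns {(verb, resource): value} for the pairs that would match; `for: 5m` is ignored.
    """
    # `... >= 0` drops negative and NaN samples before aggregating by verb.
    by_verb = defaultdict(list)
    for (verb, _resource), v in mean5m.items():
        if v >= 0:
            by_verb[verb].append(v)
    # avg by (verb) and stddev by (verb); PromQL's stddev is the population standard deviation.
    avg = {verb: mean(vs) for verb, vs in by_verb.items()}
    sd = {verb: pstdev(vs) for verb, vs in by_verb.items()}

    # Right-hand side of `and`: the mean is an outlier for its verb by both tests.
    outliers = {
        key
        for key, v in mean5m.items()
        if key[0] in avg and v > avg[key[0]] + 2 * sd[key[0]] and v > 1.2 * avg[key[0]]
    }
    # Left-hand side `p99 > 1`, kept only where `on (verb, resource)` matches an outlier.
    # `and` keeps the left samples, so the reported value is the p99 latency.
    return {key: v for key, v in p99.items() if v > 1 and key in outliers}


# Made-up data: six GET resources, one with a much higher mean latency.
mean5m = {("GET", f"r{i}"): 0.01 for i in range(5)}
mean5m[("GET", "r5")] = 0.5
p99 = {key: 2.0 for key in mean5m}  # every p99 is above 1s
p99[("GET", "r5")] = 1.5

print(kube_api_latency_high(p99, mean5m))  # {('GET', 'r5'): 1.5}
```

Because `and` keeps the left operand's values, a firing pair reports its p99 latency, which in this ordering is always above 1. Note also that, with the population standard deviation, no single value can lie more than sqrt(n - 1) standard deviations above the mean of n values, so the strict "more than two standard deviations" test can only match when a verb has at least six series.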

Comment 8 errata-xmlrpc 2020-07-28 12:37:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3075

