Bug 1845445 - KubeApiLatency alert firing even though not all conditions are matched
Summary: KubeApiLatency alert firing even though not all conditions are matched
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.5.z
Assignee: Rick Rackow
QA Contact: Junqi Zhao
URL:
Whiteboard: wip
Depends On: 1845446
Blocks: 1845444
Reported: 2020-06-09 09:16 UTC by Rick Rackow
Modified: 2020-07-16 16:12 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of: 1845444
Clones: 1845446
Environment:
Last Closed: 2020-07-16 16:12:24 UTC
Target Upstream Version:



Links
System ID Priority Status Summary Last Updated
Github openshift cluster-monitoring-operator pull 805 None closed Bug 1845445: KubeApiLatency alert firing even though not all conditions are matched 2020-09-14 13:29:31 UTC
Red Hat Product Errata RHBA-2020:2909 None None None 2020-07-16 16:12:45 UTC

Description Rick Rackow 2020-06-09 09:16:29 UTC
+++ This bug was initially created as a clone of Bug #1845444 +++

+++ This bug was initially created as a clone of Bug #1845443 +++

Description of problem:

The KubeAPILatencyHigh warning should fire only if all conditions are met AND the latency is over 1s.
However, we have seen it fire with:

```
The API server has an abnormal latency of 0.05685404799999992 seconds for PUT namespace
```

Version-Release number of selected component (if applicable):
OpenShift Dedicated 4.3.18

How reproducible:
Partially

Steps to Reproduce:
1. Execute the alerting rule's expression in the Prometheus graph view
2. Zoom out until you find an occurrence
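
The same search can be scripted instead of done by hand in the graph view. The sketch below only builds a request URL for the Prometheus HTTP API range-query endpoint (`/api/v1/query_range`), using the built-in `ALERTS` series to look for past firings. The host name, time window, and step are placeholders, not values taken from this report.

```python
from urllib.parse import urlencode


def build_query_range_url(base_url, expr, start, end, step="60s"):
    """Return a Prometheus /api/v1/query_range URL for the given expression."""
    params = urlencode({"query": expr, "start": start, "end": end, "step": step})
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


# Hypothetical host and time window (Unix timestamps); adjust for a real cluster.
url = build_query_range_url(
    "https://prometheus.example.com",
    'ALERTS{alertname="KubeAPILatencyHigh"}',
    start=1591600000,
    end=1591700000,
)
print(url)
```

Fetching that URL with an authenticated HTTP client would return the time ranges in which the alert was pending or firing, which narrows down where to look in the graph.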

Actual results:
0.05685404799999992

Expected results:
>1

Additional info:
This could also be fixed by changing the alert message to something more meaningful.

Comment 8 Junqi Zhao 2020-07-14 06:17:30 UTC
Tested with 4.5.0-0.nightly-2020-07-14-022827. The KubeAPILatencyHigh alert definition is shown below, and no such alert is firing in the cluster.
```
  - alert: KubeAPILatencyHigh
    annotations:
      message: The API server has an abnormal latency of {{ $value }} seconds for
        {{ $labels.verb }} {{ $labels.resource }}.
    expr: |
      cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{job="apiserver",quantile="0.99"}
      >
      1
      and on (verb,resource)
      (
        cluster:apiserver_request_duration_seconds:mean5m{job="apiserver"}
        >
        on (verb) group_left()
        (
          avg by (verb) (cluster:apiserver_request_duration_seconds:mean5m{job="apiserver"} >= 0)
          +
          2*stddev by (verb) (cluster:apiserver_request_duration_seconds:mean5m{job="apiserver"} >= 0)
        )
      ) > on (verb) group_left()
      1.2 * avg by (verb) (cluster:apiserver_request_duration_seconds:mean5m{job="apiserver"} >= 0)
    for: 5m
    labels:
      severity: warning
```
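
When reading this rule, it helps to remember that in PromQL both a filtering comparison (without `bool`) and the `and` operator return samples from the left-hand operand, so `{{ $value }}` in the message comes from whichever series is leftmost. The toy model below illustrates that behaviour in Python with made-up sample values. It is a simplified sketch of the operator semantics only, not the operator's code, and it does not claim to reproduce the rule as it was before the fix.

```python
def gt_scalar(vector, threshold):
    """Filter comparison: keep samples above the threshold, values unchanged."""
    return {labels: v for labels, v in vector.items() if v > threshold}


def and_on(lhs, rhs):
    """Intersection on matching label sets; the result keeps left-hand values."""
    return {labels: v for labels, v in lhs.items() if labels in rhs}


# Made-up samples keyed by (verb, resource).
p99 = {("PUT", "namespaces"): 1.7, ("GET", "pods"): 0.4}
mean5m_outliers = {("PUT", "namespaces"): 0.0568, ("GET", "pods"): 0.02}

# Quantile on the left: the reported value is the p99 latency, always > 1.
firing = and_on(gt_scalar(p99, 1), mean5m_outliers)

# Mean on the left: the same series is selected, but the reported value is the mean.
swapped = and_on(mean5m_outliers, gt_scalar(p99, 1))

print(firing)
print(swapped)
```

Under this model, the same series fires in both orderings, but only the first ordering guarantees that the value shown in the message is above the 1s threshold. That would be consistent with a sub-second value appearing in the message even when every condition is met.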

Comment 10 errata-xmlrpc 2020-07-16 16:12:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2909

