Bug 1845444

Summary: KubeApiLatency alert firing even though not all conditions are matched
Product: OpenShift Container Platform Reporter: Rick Rackow <rrackow>
Component: Monitoring Assignee: Rick Rackow <rrackow>
Status: CLOSED ERRATA QA Contact: Junqi Zhao <juzhao>
Severity: low Docs Contact:
Priority: low    
Version: 4.4 CC: alegrand, anpicker, erooth, juzhao, kakkoyun, lcosic, mloibl, pkrupa, surbania, travi, wking
Target Milestone: ---   
Target Release: 4.4.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: wip
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1845443
: 1845445 Environment:
Last Closed: 2020-07-28 12:37:03 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1845445    
Bug Blocks: 1845443    

Description Rick Rackow 2020-06-09 09:15:53 UTC
+++ This bug was initially created as a clone of Bug #1845443 +++

Description of problem:

The KubeAPILatencyHigh warning should only fire if all conditions are met AND the latency is over 1s.
However, we have seen it fire with:

```
The API server has an abnormal latency of 0.05685404799999992 seconds for PUT namespace
```

Version-Release number of selected component (if applicable):
OpenShift Dedicated 4.3.18

How reproducible:
Partially

Steps to Reproduce:
1. Execute the alerting rule in Prometheus and graph the result
2. Scroll out until you find an occurrence

Actual results:
0.05685404799999992

Expected results:
>1

Additional info:
This could also be fixed by adjusting the message to something more meaningful.

Comment 1 Rick Rackow 2020-07-09 11:58:58 UTC
Blocked because we're waiting for https://bugzilla.redhat.com/show_bug.cgi?id=1845445

Comment 6 Junqi Zhao 2020-07-20 03:15:19 UTC
Tested with 4.4.0-0.nightly-2020-07-18-033102. The KubeAPILatencyHigh warning alert details are shown below, and no such alert is firing in the cluster.
*************************************************
  - alert: KubeAPILatencyHigh
    annotations:
      message: The API server has an abnormal latency of {{ $value }} seconds for
        {{ $labels.verb }} {{ $labels.resource }}.
    expr: |
      cluster_quantile:apiserver_request_duration_seconds:histogram_quantile{job="apiserver",quantile="0.99"}
      >
      1
      and on (verb,resource)
      (
        cluster:apiserver_request_duration_seconds:mean5m{job="apiserver"}
        >
        on (verb) group_left()
        (
          avg by (verb) (cluster:apiserver_request_duration_seconds:mean5m{job="apiserver"} >= 0)
          +
          2*stddev by (verb) (cluster:apiserver_request_duration_seconds:mean5m{job="apiserver"} >= 0)
        )
      ) > on (verb) group_left()
      1.2 * avg by (verb) (cluster:apiserver_request_duration_seconds:mean5m{job="apiserver"} >= 0)
    for: 5m
    labels:
      severity: warning
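
As a rough illustration of how the rule above combines its conditions, the following Python sketch evaluates the same three checks on made-up data. The sample values, the `firing` helper, and the dictionary layout are all hypothetical; only the thresholds (1 second, 2 standard deviations, 1.2 times the average) come from the rule. It relies on two PromQL behaviors: comparison operators bind more tightly than `and`, and `and` keeps the sample value of its left-hand operand, so `{{ $value }}` would be the p99 latency.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical samples keyed by (verb, resource); all values are in seconds.
P99 = {
    ("PUT", "namespaces"): 1.8,
    ("PUT", "pods"): 0.3,
    ("GET", "pods"): 1.5,
}
MEAN5M = {
    ("PUT", "namespaces"): 0.6,
    ("PUT", "pods"): 0.02,
    ("PUT", "secrets"): 0.02,
    ("PUT", "configmaps"): 0.02,
    ("PUT", "services"): 0.02,
    ("PUT", "nodes"): 0.02,
    ("GET", "pods"): 0.5,
    ("GET", "nodes"): 0.5,
}


def firing(p99, mean5m):
    """Return {(verb, resource): p99 value} for series that pass every check."""
    # Group the 5m means by verb, mirroring `avg by (verb) (... >= 0)`.
    by_verb = defaultdict(list)
    for (verb, _resource), value in mean5m.items():
        if value >= 0:
            by_verb[verb].append(value)

    result = {}
    for key, p99_value in p99.items():
        verb = key[0]
        series_mean = mean5m.get(key)
        if series_mean is None or verb not in by_verb:
            continue  # `and on (verb,resource)` needs a match on both sides
        avg = mean(by_verb[verb])
        std = pstdev(by_verb[verb])  # PromQL stddev is a population stddev
        if (
            p99_value > 1                      # p99 latency above 1s
            and series_mean > avg + 2 * std    # outlier among resources for this verb
            and series_mean > 1.2 * avg        # at least 20% above the verb average
        ):
            # `and` keeps the left-hand sample, so the value is the p99.
            result[key] = p99_value
    return result


print(firing(P99, MEAN5M))
```

With this data only `PUT namespaces` passes all three checks, and the reported value is its p99 latency (1.8) rather than its 5-minute mean (0.6). `GET pods` has a p99 above 1s but is not an outlier among the `GET` means, so it is dropped.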

Comment 8 errata-xmlrpc 2020-07-28 12:37:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3075