Bug 1811834

Summary: Alerts on DELETECOLLECTION latency spam and are not useful
Product: OpenShift Container Platform
Reporter: Steve Kuznetsov <skuznets>
Component: Monitoring
Assignee: Pawel Krupa <pkrupa>
Status: CLOSED ERRATA
QA Contact: hongyan li <hongyli>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.4
CC: alegrand, anpicker, aos-bugs, erich, erooth, juzhao, kakkoyun, lcosic, mfojtik, mloibl, pkrupa, surbania
Target Milestone: ---   
Target Release: 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-07-13 17:19:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Steve Kuznetsov 2020-03-09 21:23:58 UTC
Description of problem:

Latency alerts on DELETECOLLECTION are too sensitive. The latency of a DELETECOLLECTION call scales with the number of items being deleted, so a long latency is not necessarily a problem unless the size of the deletion set is known. A 40ms response when deleting 100 things should not fire anything. These alerts fire so often that ops teams will get alert fatigue.
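
To illustrate the scaling argument, here is a minimal sketch that normalizes a DELETECOLLECTION latency by the number of items deleted. The helper name `per_item_latency_ms` is hypothetical and not part of any OpenShift or Prometheus API, and the sketch assumes the size of the deletion set is known, which is exactly the information the alert lacks.

```python
def per_item_latency_ms(total_latency_ms: float, items_deleted: int) -> float:
    """Return the average latency per deleted item, in milliseconds.

    Hypothetical helper for illustration only; it assumes the caller
    knows how many items the DELETECOLLECTION call removed.
    """
    if items_deleted <= 0:
        raise ValueError("items_deleted must be positive")
    return total_latency_ms / items_deleted


# The figure from the description: a 40 ms response for deleting 100 items.
print(per_item_latency_ms(40, 100))
```

Seen per item, the 40 ms call works out to well under a millisecond per deletion, which is why the raw call latency alone says little about API server health.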

Version:
$ KUBECONFIG=~/.kube/build01 oc get clusterversion version
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2020-03-04-222846   True        False         4d5h    Cluster version is 4.3.0-0.nightly-2020-03-04-222846


Additional info:

See our alerts channel for more spam:
https://coreos.slack.com/archives/CV1UZU53R/p1583786540029600
https://coreos.slack.com/archives/CB48XQ4KZ/p1583520575237600

Comment 1 Stefan Schimanski 2020-03-10 09:41:27 UTC
For reference, this is the alert:

[FIRING:1] KubeAPILatencyHigh apiserver (apiserver https events.k8s.io default openshift-monitoring/k8s events namespace kubernetes warning DELETECOLLECTION v1beta1)
The API server has an abnormal latency of 0.057229939575753015 seconds for DELETECOLLECTION events.

Comment 11 errata-xmlrpc 2020-07-13 17:19:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409