Created attachment 1487567 [details]
latencies which cannot be cleared

Description of problem:

KubeAPILatencyHigh is a warning rule:

cluster_quantile:apiserver_request_latencies:histogram_quantile{job="apiserver",quantile="0.99",subresource!="log",verb!~"^(?:LIST|WATCH|WATCHLIST|PROXY|CONNECT)$"} > 1

As I understand histograms in Prometheus, the buckets are cumulative and do not necessarily reset. This alert will therefore continue to fire for an indeterminate period, with no way to "clear" it.

Version-Release number of selected component (if applicable):
v3.11.16

How reproducible:
By design.

Actual results:
If there is a single high-latency request (e.g. a master is temporarily troubled), Prometheus raises this warning after 15m and an operations team has no way of clearing it, even if the master is healthy again.

Expected results:
I would like to know whether there are *recent* high-latency requests. A full history is useful information, but it should not be a warning.
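For context, a warning rule like the one described is typically declared in a Prometheus rule file roughly as follows. This is a hypothetical reconstruction from the expression and the 15m delay mentioned above; the group name, labels, and annotations in the shipped rule file may differ.

```yaml
groups:
- name: kubernetes-apiserver
  rules:
  - alert: KubeAPILatencyHigh
    expr: |
      cluster_quantile:apiserver_request_latencies:histogram_quantile{job="apiserver",quantile="0.99",subresource!="log",verb!~"^(?:LIST|WATCH|WATCHLIST|PROXY|CONNECT)$"} > 1
    # "for: 15m" matches the observed behaviour that the warning fires
    # only after the condition has held for 15 minutes.
    for: 15m
    labels:
      severity: warning
```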
The use of colons in the name indicates that this is a recording rule. The underlying PromQL query of this particular rule is:

```
histogram_quantile(0.99, sum without(instance, pod) (rate(apiserver_request_latencies_bucket{job="apiserver"}[5m]))) / 1e+06
```

Because rate() is used here, the reported latency should recover within a few minutes, provided the latency has actually improved. The rule only reflects latency spikes over the last five minutes.
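To illustrate why rate() makes the quantile recover: a minimal Python sketch of deriving a windowed quantile from cumulative bucket counters. The bucket bounds and counts below are made up, and the interpolation only approximates what Prometheus's histogram_quantile() actually does.

```python
import math

def histogram_quantile(q, buckets):
    """Estimate the q-quantile from (upper_bound, count) pairs of a
    cumulative histogram, interpolating linearly within the matched
    bucket, roughly as Prometheus's histogram_quantile() does."""
    buckets = sorted(buckets)
    total = buckets[-1][1]
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Quantile falls in the +Inf bucket; Prometheus returns
                # the upper bound of the last finite bucket instead.
                return lower_bound
            return lower_bound + (upper_bound - lower_bound) * (
                (rank - lower_count) / (count - lower_count)
            )
        lower_bound, lower_count = upper_bound, count
    return lower_bound

# Hypothetical cumulative bucket counters (le -> cumulative count of
# requests at or below that latency, in seconds), sampled at the start
# and end of a 5-minute window.
window_seconds = 300
start = {0.1: 100, 0.5: 180, 1.0: 195, float("inf"): 200}
end   = {0.1: 160, 0.5: 250, 1.0: 394, float("inf"): 400}

# rate() is the per-second increase over the window; observations older
# than the window no longer contribute, which is why an old latency
# spike ages out of the quantile after a few minutes.
rates = [(le, (end[le] - start[le]) / window_seconds) for le in start]

p99 = histogram_quantile(0.99, rates)  # ~0.996 s for this window
```

The key point is that the quantile is computed from per-second bucket *increases*, not from the raw cumulative counters, so the recording rule answers "how slow were requests in the last five minutes" rather than "how slow has any request ever been".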