Description of problem:
The KubeAPIErrorBudgetBurn alert is often pending in CI. The alert fires because a small percentage of requests have a latency above the 1-second threshold. Most of the requests breaking the SLO have been tracked down to POST requests to the pods/exec subresource.

Version-Release number of selected component (if applicable):
4.9 (probably the same for previous releases)

How reproducible:
Often seen in CI, for instance:
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-monitoring-operator/1273/pull-ci-openshift-cluster-monitoring-operator-master-e2e-aws-single-node/1415089024816648192

Steps to Reproduce:
1. I believe it would be possible to reproduce by launching "oc exec" commands at a steady rate.

Actual results:
The alert is pending.

Expected results:
The alert shouldn't be pending.

Additional info:
The current recording rules computing burn rates for write requests [1] don't exclude the "exec|proxy|logs" subresources, unlike the recording rules for read requests [2]. The alert has been identified as a flake and is ignored by the origin e2e test suite [3].

[1] https://github.com/openshift/cluster-kube-apiserver-operator/blob/005a95607cf9f8db490e962b549811d8bc0c5eaf/bindata/assets/alerts/kube-apiserver-slos.yaml#L302-L413
[2] https://github.com/openshift/cluster-kube-apiserver-operator/blob/005a95607cf9f8db490e962b549811d8bc0c5eaf/bindata/assets/alerts/kube-apiserver-slos.yaml#L64-L301
[3] https://github.com/openshift/origin/blob/4f99a10d9b0f2f47f17e50961aac7e39af065ab4/test/extended/prometheus/prometheus.go#L82-L87
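For illustration, below is a minimal sketch of what adding the same exclusion to a write burn-rate rule could look like. The record name, job label, and window are assumptions made up for the example, and the real rules in [1] also fold 5xx errors into the burn rate, which is omitted here:

  # Hypothetical simplified rule: fraction of write requests slower than
  # the 1s SLO threshold over the last hour, with the streaming
  # subresources excluded the same way the read rules do.
  - record: apiserver_request:burnrate1h
    expr: |
      (
        sum(rate(apiserver_request_duration_seconds_count{job="apiserver",verb=~"POST|PUT|PATCH|DELETE",subresource!~"exec|proxy|logs"}[1h]))
        -
        sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver",verb=~"POST|PUT|PATCH|DELETE",subresource!~"exec|proxy|logs",le="1"}[1h]))
      )
      /
      sum(rate(apiserver_request_duration_seconds_count{job="apiserver",verb=~"POST|PUT|PATCH|DELETE",subresource!~"exec|proxy|logs"}[1h]))

Applying the exclusion to the denominator as well keeps the result a true fraction of the non-streaming write requests.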
To be clear: this is about the exec (and other) subresources missing from the exclusions in certain recording rules behind the alert. It's not about hunting down generic alert occurrences in CI.
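As a sanity check, a query along these lines (hypothetical, not part of the linked rules) shows which write (resource, subresource) pairs actually exceed the 1s threshold; in the CI runs above the result is dominated by POST requests to pods/exec:

  # Per-(resource, subresource, verb) rate of write requests slower than 1s.
  sum by (resource, subresource, verb) (
    rate(apiserver_request_duration_seconds_count{job="apiserver",verb=~"POST|PUT|PATCH|DELETE"}[1h])
  )
  -
  sum by (resource, subresource, verb) (
    rate(apiserver_request_duration_seconds_bucket{job="apiserver",verb=~"POST|PUT|PATCH|DELETE",le="1"}[1h])
  )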
Closing this BZ, as the issue appears to be solved by https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/740