This bug was initially created as a copy of Bug #1872786. I am copying this bug because: this bug still exists in 4.5.

Description of problem: the rule:

==========================================
- interval: 3m
  name: kube-apiserver-availability.rules
  rules:
  - expr: |
      1 - (
        (
          # write too slow
          sum(increase(apiserver_request_duration_seconds_count{verb=~"POST|PUT|PATCH|DELETE"}[30d]))
          -
          sum(increase(apiserver_request_duration_seconds_bucket{verb=~"POST|PUT|PATCH|DELETE",le="1"}[30d]))
        ) +
        (
          # read too slow
          sum(increase(apiserver_request_duration_seconds_count{verb=~"LIST|GET"}[30d]))
          -
          (
            sum(increase(apiserver_request_duration_seconds_bucket{verb=~"LIST|GET",scope=~"resource|",le="0.1"}[30d]))
            +
            sum(increase(apiserver_request_duration_seconds_bucket{verb=~"LIST|GET",scope="namespace",le="0.5"}[30d]))
            +
            sum(increase(apiserver_request_duration_seconds_bucket{verb=~"LIST|GET",scope="cluster",le="5"}[30d]))
          )
        ) +
        # errors
        sum(code:apiserver_request_total:increase30d{code=~"5.."} or vector(0))
      )
      /
      sum(code:apiserver_request_total:increase30d)
    labels:
      verb: all
    record: apiserver_request:availability30d
===========================================

loads far too much data, since every subquery covers a 30-day range. The customer is seeing this error message all the time:

"query processing would load too many samples into memory in query execution"

This has already been reported upstream here: https://github.com/prometheus/prometheus/issues/7281 and here: https://github.com/kubernetes-monitoring/kubernetes-mixin/issues/411

The current limit is 500m, set by query.max-samples. This cannot be changed in OpenShift (it is managed by the operator), so what we probably need to change is the rule itself, so that the query does not have to evaluate the full 30-day range of raw samples.
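The mitigation discussed in the upstream kubernetes-mixin issue is to pre-aggregate the expensive 30-day increases into cheaper intermediate recording rules, so the availability rule no longer scans 30 days of raw counter/histogram samples. Below is a minimal sketch of that approach; the rule and record names are illustrative, not necessarily the exact names shipped in the mixin:

==========================================
- interval: 3m
  name: kube-apiserver-availability-precompute.rules
  rules:
  # Record the 1h increase once; the recorded series is far smaller
  # than the raw counter series it is derived from.
  - record: apiserver_request:increase1h
    expr: |
      sum by (verb, code) (increase(apiserver_request_total[1h]))
  # Approximate the 30d increase from the recorded 1h series:
  # average hourly increase over 30 days, times 24 * 30 hours.
  - record: apiserver_request:increase30d
    expr: |
      avg_over_time(apiserver_request:increase1h[30d]) * 24 * 30
==========================================

The availability rule can then be expressed in terms of the pre-aggregated :increase30d series, which makes each evaluation far cheaper than querying the raw series over the full 30-day window.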
Both the 4.6.z and 4.7.0 bugs have blocker- set on them, and this is not a regression as far as anyone knows, so it's not valid to set it blocker+ here.
Clarifying my poorly worded previous comment: both the 4.6.z and 4.7.0 bugs have blocker- set on them, and this is not a regression as far as anyone knows, so it's not valid to set it blocker+ here. Please use blocker? to indicate that you'd like engineering to evaluate whether this should block a z-stream release.
I agree with @sdodson here. Even though this bug has a huge impact on our clients and OSD, it is not a regression, so we should not set the blocker+ flag on its z-stream fix. That said, this is still critical and very urgent.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.5.27 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:0033