The job release-openshift-origin-installer-old-rhcos-e2e-aws-4.7 is always failing in CI. See the testgrid results:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.7-informing#release-openshift-origin-installer-old-rhcos-e2e-aws-4.7

One of the consistently failing tests is:

[sig-instrumentation][Late] Alerts shouldn't exceed the 500 series limit of total series sent via telemetry from each cluster [Suite:openshift/conformance/parallel]

fail [github.com/openshift/origin/test/extended/util/prometheus/helpers.go:174]: Expected
    <map[string]error | len:1>: {
        "max_over_time(cluster:telemetry_selected_series:count[2h]) >= 500": {
            s: "promQL query: max_over_time(cluster:telemetry_selected_series:count[2h]) >= 500 had reported incorrect results:\n[{\"metric\":{},\"value\":[1624748035.001,\"514\"]}]",
        },
    }
to be empty

Sample job:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-old-rhcos-e2e-aws-4.7/1408892345029496832

Either the test needs to raise the limit or we need to reduce our metric time series count. (It would also be useful to understand why this fails with old RHCOS but presumably not with current RHCOS.)
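For anyone reading the failure output, here is a minimal Python sketch of how the result payload quoted in the error can be decoded. It is not the test's actual implementation (that lives in the Go helper referenced above); `parse_samples` is a hypothetical helper name, and the payload is copied from the failure message with the `\"` escapes removed.

```python
import json

# Result payload copied from the failure message (escapes removed).
raw_result = '[{"metric":{},"value":[1624748035.001,"514"]}]'


def parse_samples(payload):
    """Return (timestamp, value) pairs from a Prometheus instant-vector result.

    Hypothetical helper for illustration only.
    """
    samples = []
    for entry in json.loads(payload):
        timestamp, value = entry["value"]
        # Prometheus encodes sample values as strings in its JSON output.
        samples.append((float(timestamp), float(value)))
    return samples


samples = parse_samples(raw_result)
for timestamp, value in samples:
    print(f"series count {value:.0f} at unix time {timestamp:.3f}")
```

The single returned sample carries the value 514, which is the maximum telemetry series count seen over the 2h window. The test expects the query to return nothing at all.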
Hmm, this is weird because the limit was increased to 600 series in 4.7 [1], while 4.6 still has the 500 limit [2]. Does that mean the release-openshift-origin-installer-old-rhcos-e2e-aws-4.7 job uses the release-4.6 branch of openshift/origin? Either way, we need to fix the title of the test, because "... the 500 series limit ..." isn't accurate.

[1] https://github.com/openshift/origin/blob/5013124a4cb27df4b199aabb5812ec0fc1184196/test/extended/prometheus/prometheus.go#L111-L118
[2] https://github.com/openshift/origin/blob/f629c90891c0c7e49dbcc2a5fb44a177712fcfd8/test/extended/prometheus/prometheus.go#L103-L107
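To make the branch mismatch concrete, the sketch below mimics the `>= limit` comparison in the PromQL query using the observed count of 514 and the two limits mentioned above (500 on release-4.6, 600 on release-4.7). `exceeds_limit` is a hypothetical helper, not a function from openshift/origin.

```python
def exceeds_limit(max_series_count, limit):
    """Mimic the PromQL filter `<expr> >= limit`.

    True means the query would return a sample, so the test would fail.
    Hypothetical helper for illustration only.
    """
    return max_series_count >= limit


observed = 514  # value reported in the failing job
limits = {"release-4.6": 500, "release-4.7": 600}

# Evaluate the same observed count against each branch's limit.
outcome = {branch: exceeds_limit(observed, limit) for branch, limit in limits.items()}

for branch, failed in outcome.items():
    print(f"{branch}: limit {limits[branch]} -> {'FAIL' if failed else 'pass'}")
```

Under these numbers, the same cluster fails the check with the 4.6 limit and passes with the 4.7 limit, which is consistent with the job running the 4.6 version of the test.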
The job appears to use the 4.6 tests deliberately:
https://github.com/openshift/release/blob/f572056645f7536ac91857204edfaef8088f1766/ci-operator/jobs/openshift/release/openshift-release-release-4.7-periodics.yaml#L570

So you'll need to backport the change to 4.6, if that's appropriate.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6.40 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2767