Description of problem: The CannotRetrieveUpdates alert is firing with severity `critical`. Given the nature of the alert, it should be a warning. No one should be paged for it in the middle of the night, but we still want cluster owners to fix it eventually.
Reproduced it with a 4.7 build:

# oc version
Client Version: 4.7.0-0.nightly-2021-02-05-161444
Server Version: 4.7.0-0.nightly-2021-02-09-024347
Kubernetes Version: v1.20.0+ba45583

# curl -s -k -H "Authorization: Bearer $token" https://prometheus-k8s-openshift-monitoring.apps.storage-mitm11.qe.gcp.devcluster.openshift.com/api/v1/alerts | jq '.data.alerts[]|select(.labels.alertname == "CannotRetrieveUpdates")'
{
  "labels": {
    "alertname": "CannotRetrieveUpdates",
    "endpoint": "metrics",
    "instance": "10.0.0.7:9099",
    "job": "cluster-version-operator",
    "namespace": "openshift-cluster-version",
    "pod": "cluster-version-operator-84db878675-gv77f",
    "service": "cluster-version-operator",
    "severity": "critical"
  },
  "annotations": {
    "message": "Cluster version operator has not retrieved updates in 20h 28m 44s. Failure reason VersionNotFound . For more information refer to https://console-openshift-console.apps.storage-mitm11.qe.gcp.devcluster.openshift.com/settings/cluster/."
  },
  "state": "firing",
  "activeAt": "2021-02-09T07:50:46.457044785Z",
  "value": "7.372445700001717e+04"
}

Waiting for an available 4.8 nightly build to verify it.
Verified it with 4.8.0-0.nightly-2021-02-09-225546.

Steps to verify:
1) Install a cluster with 4.8.0-0.nightly-2021-02-09-225546
2) After 1 hour, check the CannotRetrieveUpdates alert

# curl -s -k -H "Authorization: Bearer $token" https://prometheus-k8s-openshift-monitoring.apps.yangyangbz.qe.gcp.devcluster.openshift.com/api/v1/alerts | jq '.data.alerts[]|select(.labels.alertname == "CannotRetrieveUpdates")'
{
  "labels": {
    "alertname": "CannotRetrieveUpdates",
    "endpoint": "metrics",
    "instance": "10.0.0.5:9099",
    "job": "cluster-version-operator",
    "namespace": "openshift-cluster-version",
    "pod": "cluster-version-operator-6cfff74f5b-hkdjp",
    "service": "cluster-version-operator",
    "severity": "warning"    <----- severity is changed to warning
  },
  "annotations": {
    "message": "Cluster version operator has not retrieved updates in 1h 13m 29s. Failure reason VersionNotFound . For more information refer to https://console-openshift-console.apps.yangyangbz.qe.gcp.devcluster.openshift.com/settings/cluster/."
  },
  "state": "firing",
  "activeAt": "2021-02-10T08:25:46.457044785Z",
  "value": "4.409457000017166e+03"
}

Moving it to verified.
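For anyone who wants to script this check rather than eyeball the jq output, here is a rough Python sketch of the same filter. It is not part of the fix and not how QE verified it; the helper names (find_alert, format_duration) are made up for illustration, and the sample payload is a trimmed copy of the 4.8 response above wrapped in the data.alerts envelope that the jq filter implies. It also assumes the alert "value" is the elapsed time in seconds, which matches the durations quoted in the alert messages in this report.

```python
import json

ALERT_NAME = "CannotRetrieveUpdates"


def find_alert(payload, name=ALERT_NAME):
    """Return all alerts in an /api/v1/alerts response whose alertname matches."""
    alerts = payload.get("data", {}).get("alerts", [])
    return [a for a in alerts if a.get("labels", {}).get("alertname") == name]


def format_duration(seconds):
    """Render a number of seconds as 'Xh Ym Zs' (fractional seconds dropped)."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes}m {secs}s"


# Trimmed copy of the 4.8 response from this report (assumed envelope: data.alerts).
sample = json.loads("""
{
  "data": {
    "alerts": [
      {
        "labels": {
          "alertname": "CannotRetrieveUpdates",
          "job": "cluster-version-operator",
          "namespace": "openshift-cluster-version",
          "severity": "warning"
        },
        "state": "firing",
        "activeAt": "2021-02-10T08:25:46.457044785Z",
        "value": "4.409457000017166e+03"
      }
    ]
  }
}
""")

for alert in find_alert(sample):
    severity = alert["labels"]["severity"]
    elapsed = format_duration(float(alert["value"]))
    # The expected severity after the fix is "warning".
    status = "OK" if severity == "warning" else "UNEXPECTED"
    print(f"{status}: severity={severity}, no updates retrieved for {elapsed}")
```

With the sample above this prints "OK: severity=warning, no updates retrieved for 1h 13m 29s", which lines up with the message annotation in the 4.8 output. Feeding it the 4.7 response instead would report severity=critical.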
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438