Bug 1926310

Summary: CannotRetrieveUpdates alerts on Critical severity
Product: OpenShift Container Platform Reporter: Rick Rackow <rrackow>
Component: Cluster Version Operator Assignee: Over the Air Updates <aos-team-ota>
Status: CLOSED ERRATA QA Contact: Yang Yang <yanyang>
Severity: high Docs Contact:
Priority: medium    
Version: 4.6.z CC: aos-bugs, jokerman, lmohanty, yanyang
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-07-27 22:42:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1926795    

Description Rick Rackow 2021-02-08 16:02:42 UTC
Description of problem:

The CannotRetrieveUpdates alert fires with `critical` severity. Given the nature of the alert, it should be a `warning`: no one should be paged for it in the middle of the night, but we still want cluster owners to fix it eventually.

Comment 2 Yang Yang 2021-02-10 03:21:49 UTC
Reproduced it with a 4.7 build:
# oc version
Client Version: 4.7.0-0.nightly-2021-02-05-161444
Server Version: 4.7.0-0.nightly-2021-02-09-024347
Kubernetes Version: v1.20.0+ba45583

# curl -s -k -H "Authorization: Bearer $token" https://prometheus-k8s-openshift-monitoring.apps.storage-mitm11.qe.gcp.devcluster.openshift.com/api/v1/alerts | jq '.data.alerts[]|select(.labels.alertname == "CannotRetrieveUpdates")'
{
  "labels": {
    "alertname": "CannotRetrieveUpdates",
    "endpoint": "metrics",
    "instance": "10.0.0.7:9099",
    "job": "cluster-version-operator",
    "namespace": "openshift-cluster-version",
    "pod": "cluster-version-operator-84db878675-gv77f",
    "service": "cluster-version-operator",
    "severity": "critical"
  },
  "annotations": {
    "message": "Cluster version operator has not retrieved updates in 20h 28m 44s. Failure reason VersionNotFound .  For more information refer to https://console-openshift-console.apps.storage-mitm11.qe.gcp.devcluster.openshift.com/settings/cluster/."
  },
  "state": "firing",
  "activeAt": "2021-02-09T07:50:46.457044785Z",
  "value": "7.372445700001717e+04"
}
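
Note: the "value" field above appears to be the time since the last successful update retrieval, in seconds, and it lines up with the duration quoted in the alert message. A minimal Python sketch of that arithmetic follows; the helper name `format_duration` is hypothetical and not part of any OpenShift tooling.

```python
def format_duration(seconds: float) -> str:
    """Render a number of seconds as 'Hh Mm Ss', dropping the fractional part."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes}m {secs}s"


# "value" string copied from the alert payload above (assumed to be seconds)
print(format_duration(float("7.372445700001717e+04")))  # prints "20h 28m 44s"
```

This matches the "20h 28m 44s" in the alert message.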

Waiting for an available 4.8 nightly build to verify the fix.

Comment 3 Yang Yang 2021-02-10 08:41:14 UTC
Verified it with 4.8.0-0.nightly-2021-02-09-225546

Steps to verify:

1) Install a cluster with 4.8.0-0.nightly-2021-02-09-225546
2) After 1 hour, check CannotRetrieveUpdates alert

# curl -s -k -H "Authorization: Bearer $token" https://prometheus-k8s-openshift-monitoring.apps.yangyangbz.qe.gcp.devcluster.openshift.com/api/v1/alerts | jq '.data.alerts[]|select(.labels.alertname == "CannotRetrieveUpdates")'
{
  "labels": {
    "alertname": "CannotRetrieveUpdates",
    "endpoint": "metrics",
    "instance": "10.0.0.5:9099",
    "job": "cluster-version-operator",
    "namespace": "openshift-cluster-version",
    "pod": "cluster-version-operator-6cfff74f5b-hkdjp",
    "service": "cluster-version-operator",
    "severity": "warning"    <----- severity is changed to warning
  },
  "annotations": {
    "message": "Cluster version operator has not retrieved updates in 1h 13m 29s. Failure reason VersionNotFound .  For more information refer to https://console-openshift-console.apps.yangyangbz.qe.gcp.devcluster.openshift.com/settings/cluster/."
  },
  "state": "firing",
  "activeAt": "2021-02-10T08:25:46.457044785Z",
  "value": "4.409457000017166e+03"
}

Moving it to verified.
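
For repeat runs, the severity check in the steps above could be scripted rather than read by eye. A rough Python sketch, assuming the payload has the standard Prometheus /api/v1/alerts shape; the helper `alerts_named` is hypothetical, and the sample is a trimmed, hand-written stand-in for the real response (label values taken from the output above), so no cluster access is needed to run it.

```python
import json


def alerts_named(payload: dict, name: str) -> list:
    """Return the alerts whose alertname label equals `name` (mirrors the jq select)."""
    alerts = payload.get("data", {}).get("alerts", [])
    return [a for a in alerts if a.get("labels", {}).get("alertname") == name]


# Trimmed stand-in for the /api/v1/alerts response (assumption: real payload has more fields)
sample = json.loads("""
{"status": "success",
 "data": {"alerts": [
   {"labels": {"alertname": "CannotRetrieveUpdates", "severity": "warning"},
    "state": "firing"}
 ]}}
""")

# Collect the distinct severities of all matching alerts
severities = {a["labels"]["severity"] for a in alerts_named(sample, "CannotRetrieveUpdates")}
print(severities)  # prints {'warning'}
```

On a 4.7 build the same check would be expected to show "critical" instead, as in comment 2.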

Comment 6 errata-xmlrpc 2021-07-27 22:42:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438