Bug 1926310 - CannotRetrieveUpdates alerts on Critical severity
Summary: CannotRetrieveUpdates alerts on Critical severity
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.6.z
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Over the Air Updates
QA Contact: Yang Yang
URL:
Whiteboard:
Depends On:
Blocks: 1926795
Reported: 2021-02-08 16:02 UTC by Rick Rackow
Modified: 2022-05-06 12:29 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 22:42:26 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-version-operator pull 509 | 0 | None | closed | Bug 1926310: install/0000_90_cluster-version-operator_02_servicemonitor.yaml: adjust "CannotRetrieveUpdates" to "warning... | 2021-02-16 17:48:23 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 22:42:59 UTC

Description Rick Rackow 2021-02-08 16:02:42 UTC
Description of problem:

CannotRetrieveUpdates fires with `critical` severity. Given the nature of the alert, it should be a `warning`: no one should be paged for it in the middle of the night, but we still want cluster owners to fix the underlying problem eventually.

Comment 2 Yang Yang 2021-02-10 03:21:49 UTC
Reproduced it with a 4.7 build:
# oc version
Client Version: 4.7.0-0.nightly-2021-02-05-161444
Server Version: 4.7.0-0.nightly-2021-02-09-024347
Kubernetes Version: v1.20.0+ba45583

# curl -s -k -H "Authorization: Bearer $token" https://prometheus-k8s-openshift-monitoring.apps.storage-mitm11.qe.gcp.devcluster.openshift.com/api/v1/alerts | jq '.data.alerts[]|select(.labels.alertname == "CannotRetrieveUpdates")'
{
  "labels": {
    "alertname": "CannotRetrieveUpdates",
    "endpoint": "metrics",
    "instance": "10.0.0.7:9099",
    "job": "cluster-version-operator",
    "namespace": "openshift-cluster-version",
    "pod": "cluster-version-operator-84db878675-gv77f",
    "service": "cluster-version-operator",
    "severity": "critical"
  },
  "annotations": {
    "message": "Cluster version operator has not retrieved updates in 20h 28m 44s. Failure reason VersionNotFound .  For more information refer to https://console-openshift-console.apps.storage-mitm11.qe.gcp.devcluster.openshift.com/settings/cluster/."
  },
  "state": "firing",
  "activeAt": "2021-02-09T07:50:46.457044785Z",
  "value": "7.372445700001717e+04"
}
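
For reference, the "value" field lines up with the duration in the alert message if it is read as seconds: 73724 s is 20h 28m 44s. A minimal sketch of that arithmetic follows. The helper name format_duration is hypothetical, and truncating the fractional seconds is an assumption about how the message is rendered, not something taken from the alert rule.

```python
def format_duration(seconds: float) -> str:
    """Render a number of seconds as 'Hh Mm Ss' (hypothetical helper).

    Assumption: fractional seconds are truncated, which matches the
    durations shown in the alert messages in this bug.
    """
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes}m {secs}s"


# "value" string copied from the alert output above.
print(format_duration(float("7.372445700001717e+04")))  # prints "20h 28m 44s"
```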

Waiting for an available 4.8 nightly build to verify it.

Comment 3 Yang Yang 2021-02-10 08:41:14 UTC
Verified it with 4.8.0-0.nightly-2021-02-09-225546

Steps to verify:

1) Install a cluster with 4.8.0-0.nightly-2021-02-09-225546
2) After 1 hour, check the CannotRetrieveUpdates alert

# curl -s -k -H "Authorization: Bearer $token" https://prometheus-k8s-openshift-monitoring.apps.yangyangbz.qe.gcp.devcluster.openshift.com/api/v1/alerts | jq '.data.alerts[]|select(.labels.alertname == "CannotRetrieveUpdates")'
{
  "labels": {
    "alertname": "CannotRetrieveUpdates",
    "endpoint": "metrics",
    "instance": "10.0.0.5:9099",
    "job": "cluster-version-operator",
    "namespace": "openshift-cluster-version",
    "pod": "cluster-version-operator-6cfff74f5b-hkdjp",
    "service": "cluster-version-operator",
    "severity": "warning"    <----- severity is changed to warning
  },
  "annotations": {
    "message": "Cluster version operator has not retrieved updates in 1h 13m 29s. Failure reason VersionNotFound .  For more information refer to https://console-openshift-console.apps.yangyangbz.qe.gcp.devcluster.openshift.com/settings/cluster/."
  },
  "state": "firing",
  "activeAt": "2021-02-10T08:25:46.457044785Z",
  "value": "4.409457000017166e+03"
}
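
The same check could be scripted instead of read by eye. The sketch below assumes the response of /api/v1/alerts has already been fetched and parsed (the payload is nested under data.alerts, as in the jq filter above). The function name alert_severities is hypothetical, and the sample payload is trimmed from the output in this comment.

```python
import json
from typing import List


def alert_severities(payload: dict, alertname: str) -> List[str]:
    """Return the severity label of every alert named `alertname`.

    `payload` is the parsed JSON body of Prometheus' /api/v1/alerts.
    (Hypothetical helper, not part of any OpenShift tooling.)
    """
    alerts = payload.get("data", {}).get("alerts", [])
    return [
        alert["labels"].get("severity", "")
        for alert in alerts
        if alert.get("labels", {}).get("alertname") == alertname
    ]


# Sample payload trimmed from the verification output above.
sample = json.loads("""
{
  "status": "success",
  "data": {
    "alerts": [
      {
        "labels": {
          "alertname": "CannotRetrieveUpdates",
          "namespace": "openshift-cluster-version",
          "severity": "warning"
        },
        "state": "firing"
      }
    ]
  }
}
""")

severities = alert_severities(sample, "CannotRetrieveUpdates")
# The fix is in place if the alert never fires as "critical".
assert "critical" not in severities
print(severities)  # prints ['warning']
```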

Moving it to verified.

Comment 6 errata-xmlrpc 2021-07-27 22:42:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
