Bug 1958792 - Alert pending time is not consistent with the alert message
Summary: Alert pending time is not consistent with the alert message
Keywords:
Status: CLOSED DUPLICATE of bug 1957991
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Assignee: Over the Air Updates
QA Contact: liujia
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-05-10 07:28 UTC by liujia
Modified: 2022-05-06 12:29 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-10 22:27:56 UTC
Target Upstream Version:
Embargoed:



Description liujia 2021-05-10 07:28:49 UTC
Description of problem:
Degraded one operator to trigger the ClusterOperatorDegraded alert. It took more than 30 minutes for the alert to go from pending to firing, but the alert message says the operator has been degraded for 10 minutes.

# curl -s -k -H "Authorization: Bearer $token"  https://prometheus-k8s-openshift-monitoring.apps.jliu-48.qe.gcp.devcluster.openshift.com/api/v1/alerts | jq -r '.data.alerts[]| select(.labels.alertname == "ClusterOperatorDegraded")'
{
  "labels": {
    "alertname": "ClusterOperatorDegraded",
    "condition": "Degraded",
    "endpoint": "metrics",
    "instance": "10.0.0.7:9099",
    "job": "cluster-version-operator",
    "name": "authentication",
    "namespace": "openshift-cluster-version",
    "pod": "cluster-version-operator-5bcbddcc86-lqlk7",
    "reason": "OAuthServerConfigObservation_Error",
    "service": "cluster-version-operator",
    "severity": "warning"
  },
  "annotations": {
    "message": "Cluster operator authentication has been degraded for 10 minutes. Operator is degraded because OAuthServerConfigObservation_Error and cluster upgrades will be unstable."
  },
  "state": "firing",
  "activeAt": "2021-05-10T03:04:29.21303266Z",
  "value": "1e+00"
}
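
To quantify the mismatch, the alert JSON above can be checked offline. The following is a minimal Python sketch, not part of the original report. The helper names `parse_prometheus_time`, `stated_minutes`, and `earliest_firing` are hypothetical, the 30-minute `for` value is an assumption based on the observed 30min+ pending time, and the alert dict is trimmed to the fields the sketch uses.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_prometheus_time(ts: str) -> datetime:
    """Parse a UTC RFC 3339 timestamp such as the activeAt field above."""
    # The API can return more than 6 fractional digits, which strptime's %f
    # rejects, so the fraction is truncated to microseconds by hand.
    # Assumes the timestamp ends in "Z" (UTC), as in the report.
    base, _, frac = ts.rstrip("Z").partition(".")
    micro = int(frac[:6].ljust(6, "0")) if frac else 0
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micro, tzinfo=timezone.utc)


def stated_minutes(message: str):
    """Return the number of minutes quoted in the alert message, or None."""
    match = re.search(r"degraded for (\d+) minutes", message)
    return int(match.group(1)) if match else None


def earliest_firing(alert: dict, for_duration: timedelta) -> datetime:
    """An alert can fire no earlier than activeAt plus the rule's `for`."""
    return parse_prometheus_time(alert["activeAt"]) + for_duration


# Trimmed copy of the alert from the report.
alert = {
    "annotations": {
        "message": "Cluster operator authentication has been degraded for "
                   "10 minutes. Operator is degraded because "
                   "OAuthServerConfigObservation_Error and cluster upgrades "
                   "will be unstable."
    },
    "state": "firing",
    "activeAt": "2021-05-10T03:04:29.21303266Z",
}

# Assumption: the rule now waits 30 minutes before firing.
assumed_for = timedelta(minutes=30)

print("message claims (minutes):", stated_minutes(alert["annotations"]["message"]))
print("earliest firing time:", earliest_firing(alert, assumed_for).isoformat())
```

Under that assumption, an alert that became active at 03:04:29 UTC could not fire before 03:34:29 UTC, while its message still reports 10 minutes.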

The wait time was recently changed in [1]; it was previously 10 minutes.

[1] https://github.com/openshift/cluster-version-operator/commit/fb5257d4be8e1b18a80a171a24ba6e8386026b94#diff-fabad9e1d73a4f70c3d47836ed62e1982b1c6fbb947fce9a633b9cb0a98ecb24

Version-Release number of the following components:
4.8.0-0.nightly-2021-05-08-025039

How reproducible:
always

Steps to Reproduce:
1. Degrade a cluster operator and check that ClusterOperatorDegraded fires correctly and on time.

Actual results:


Expected results:
The alert message should be consistent with the wait time.
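
One way to guard against this kind of drift is a check that compares a rule's `for` duration with the duration quoted in its message. The sketch below is illustrative only. The rule dicts are simplified, hypothetical stand-ins for the actual rule in the cluster-version-operator manifests, and `for_to_minutes` and `message_matches_for` are made-up helper names. It handles only the `s`, `m`, and `h` duration units.

```python
import re

# Seconds per unit for the subset of Prometheus duration units handled here.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def for_to_minutes(duration: str) -> float:
    """Convert a duration such as "30m" or "1h30m" to minutes."""
    parts = re.findall(r"(\d+)([smh])", duration)
    # Reject input that is empty or contains anything besides number+unit pairs.
    if not parts or "".join(n + u for n, u in parts) != duration:
        raise ValueError(f"unsupported duration: {duration!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in parts) / 60


def message_matches_for(rule: dict) -> bool:
    """True if the minutes quoted in the message equal the rule's `for`."""
    match = re.search(r"for (\d+) minutes", rule["annotations"]["message"])
    if match is None:
        return True  # the message quotes no duration, nothing to compare
    return int(match.group(1)) == for_to_minutes(rule["for"])


# Hypothetical rule showing the inconsistency described in this bug.
inconsistent_rule = {
    "alert": "ClusterOperatorDegraded",
    "for": "30m",
    "annotations": {
        "message": "Cluster operator authentication has been degraded "
                   "for 10 minutes."
    },
}

# The same hypothetical rule with the message brought in line with `for`.
consistent_rule = {
    "alert": "ClusterOperatorDegraded",
    "for": "30m",
    "annotations": {
        "message": "Cluster operator authentication has been degraded "
                   "for 30 minutes."
    },
}

print(message_matches_for(inconsistent_rule))
print(message_matches_for(consistent_rule))
```

A check like this could run against the rule manifest in a unit test, so that a change to `for` fails until the message text is updated too.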

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 W. Trevor King 2021-05-10 22:27:56 UTC
I'd just attached the fix for this to bug 1957991, so there is no need for a separate bug.

*** This bug has been marked as a duplicate of bug 1957991 ***

