1958792 – Alert pending time is not consistent with the alert message

Summary: Alert pending time is not consistent with the alert message

Keywords:
Status:	CLOSED DUPLICATE of bug 1957991
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cluster Version Operator
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	low
Target Milestone:	---
Target Release:	---
Assignee:	Over the Air Updates
QA Contact:	liujia
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-05-10 07:28 UTC by liujia
Modified:	2022-05-06 12:29 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-05-10 22:27:56 UTC
Target Upstream Version:
Embargoed:

Description liujia 2021-05-10 07:28:49 UTC

Description of problem:
I degraded one operator to trigger the ClusterOperatorDegraded alert. The alert took more than 30 minutes to go from pending to firing, but its message says the operator has been degraded for 10 minutes.

# curl -s -k -H "Authorization: Bearer $token"  https://prometheus-k8s-openshift-monitoring.apps.jliu-48.qe.gcp.devcluster.openshift.com/api/v1/alerts | jq -r '.data.alerts[]| select(.labels.alertname == "ClusterOperatorDegraded")'
{
  "labels": {
    "alertname": "ClusterOperatorDegraded",
    "condition": "Degraded",
    "endpoint": "metrics",
    "instance": "10.0.0.7:9099",
    "job": "cluster-version-operator",
    "name": "authentication",
    "namespace": "openshift-cluster-version",
    "pod": "cluster-version-operator-5bcbddcc86-lqlk7",
    "reason": "OAuthServerConfigObservation_Error",
    "service": "cluster-version-operator",
    "severity": "warning"
  },
  "annotations": {
    "message": "Cluster operator authentication has been degraded for 10 minutes. Operator is degraded because OAuthServerConfigObservation_Error and cluster upgrades will be unstable."
  },
  "state": "firing",
  "activeAt": "2021-05-10T03:04:29.21303266Z",
  "value": "1e+00"
}
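
For reference, the earliest time the alert above could move from pending to firing can be worked out from its activeAt timestamp and the rule's wait time. The following is a minimal Python sketch, not taken from the cluster-version-operator source: the helper names parse_rfc3339 and earliest_firing are hypothetical, and the 30-minute and 10-minute wait values are assumptions based on the timings described in this report.

```python
from datetime import datetime, timedelta, timezone


def parse_rfc3339(ts: str) -> datetime:
    """Parse a UTC timestamp such as the alert's activeAt field.

    Hypothetical helper: the fraction may carry more than six digits,
    so it is truncated to microseconds before building the datetime.
    """
    if not ts.endswith("Z"):
        raise ValueError("expected a UTC timestamp ending in 'Z'")
    whole, _, frac = ts[:-1].partition(".")
    micros = int((frac + "000000")[:6]) if frac else 0
    base = datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S")
    return base.replace(microsecond=micros, tzinfo=timezone.utc)


def earliest_firing(active_at: str, wait_minutes: int) -> datetime:
    """Return activeAt plus the wait time, i.e. the earliest firing time."""
    return parse_rfc3339(active_at) + timedelta(minutes=wait_minutes)


# activeAt value copied from the alert output in this report.
active_at = "2021-05-10T03:04:29.21303266Z"

# Assumed wait times: 30 minutes (observed) versus 10 minutes (per the message).
print(earliest_firing(active_at, 30).isoformat())
print(earliest_firing(active_at, 10).isoformat())
```

Because rules are evaluated periodically, the real transition to firing can happen slightly later than the computed time.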

The wait time was changed recently in [1]; before that change it was 10 minutes.

[1] https://github.com/openshift/cluster-version-operator/commit/fb5257d4be8e1b18a80a171a24ba6e8386026b94#diff-fabad9e1d73a4f70c3d47836ed62e1982b1c6fbb947fce9a633b9cb0a98ecb24

Version-Release number of the following components:
4.8.0-0.nightly-2021-05-08-025039

How reproducible:
always

Steps to Reproduce:
1. Degrade a cluster operator and check whether the ClusterOperatorDegraded alert fires correctly and on time.

Actual results:


Expected results:
The alert message should be consistent with the wait time.
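
One way to check this expectation automatically is to compare a rule's `for` duration with the duration stated in its message. The sketch below is only an illustration under stated assumptions: the rule is represented as a plain dict with `for` and `annotations.message` keys, the helper names (duration_seconds, message_minutes, is_consistent) are hypothetical, the "30m" value is assumed from the observed pending time, and the message is the rendered text from the alert output rather than the rule's actual template.

```python
import re

# Seconds per unit for simple Prometheus-style durations such as "30m" or "1h30m".
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600,
                 "d": 86400, "w": 604800, "y": 31536000}


def duration_seconds(text: str) -> float:
    """Convert a duration string like "30m" or "1h30m" to seconds."""
    parts = re.findall(r"(\d+)(ms|s|m|h|d|w|y)", text)
    # Reject input with anything left over after the number/unit pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in parts)


def message_minutes(message: str):
    """Extract N from "... for N minutes ...", or None if it is absent."""
    match = re.search(r"for (\d+) minutes", message)
    return int(match.group(1)) if match else None


def is_consistent(rule: dict) -> bool:
    """True when the message's stated minutes equal the rule's wait time."""
    claimed = message_minutes(rule["annotations"]["message"])
    return claimed is not None and claimed * 60 == duration_seconds(rule["for"])


# Hypothetical rule: assumed 30m wait, message text as reported above.
rule = {
    "for": "30m",
    "annotations": {
        "message": "Cluster operator authentication has been degraded "
                   "for 10 minutes. Operator is degraded because "
                   "OAuthServerConfigObservation_Error and cluster "
                   "upgrades will be unstable."
    },
}

print(is_consistent(rule))                   # mismatch between 30m and 10 minutes
print(is_consistent({**rule, "for": "10m"}))  # matches the message
```

A check like this only covers messages that state the duration in minutes; other wordings would need their own pattern.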

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 W. Trevor King 2021-05-10 22:27:56 UTC

I just attached the fix for this to bug 1957991, so there is no need for a separate bug.

*** This bug has been marked as a duplicate of bug 1957991 ***
