Description of problem:
Degraded one cluster operator to trigger the ClusterOperatorDegraded alert. The alert took more than 30 minutes to go from pending to firing, but its message says the operator has been degraded for 10 minutes.

# curl -s -k -H "Authorization: Bearer $token" https://prometheus-k8s-openshift-monitoring.apps.jliu-48.qe.gcp.devcluster.openshift.com/api/v1/alerts | jq -r '.data.alerts[]| select(.labels.alertname == "ClusterOperatorDegraded")'
{
  "labels": {
    "alertname": "ClusterOperatorDegraded",
    "condition": "Degraded",
    "endpoint": "metrics",
    "instance": "10.0.0.7:9099",
    "job": "cluster-version-operator",
    "name": "authentication",
    "namespace": "openshift-cluster-version",
    "pod": "cluster-version-operator-5bcbddcc86-lqlk7",
    "reason": "OAuthServerConfigObservation_Error",
    "service": "cluster-version-operator",
    "severity": "warning"
  },
  "annotations": {
    "message": "Cluster operator authentication has been degraded for 10 minutes. Operator is degraded because OAuthServerConfigObservation_Error and cluster upgrades will be unstable."
  },
  "state": "firing",
  "activeAt": "2021-05-10T03:04:29.21303266Z",
  "value": "1e+00"
}

The wait time was changed recently in [1]; it was 10 minutes before.

[1] https://github.com/openshift/cluster-version-operator/commit/fb5257d4be8e1b18a80a171a24ba6e8386026b94#diff-fabad9e1d73a4f70c3d47836ed62e1982b1c6fbb947fce9a633b9cb0a98ecb24

Version-Release number of the following components:
4.8.0-0.nightly-2021-05-08-025039

How reproducible:
Always

Steps to Reproduce:
1. Degrade a cluster operator and check whether ClusterOperatorDegraded fires correctly and in a timely manner.

Actual results:
The alert fires only after 30+ minutes in the pending state, while its message reports that the operator has been degraded for 10 minutes.

Expected results:
The alert message should be consistent with the wait time.
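A minimal sketch of how this kind of mismatch could be checked automatically. It assumes the rule object comes from the Prometheus /api/v1/rules endpoint, where an alerting rule's `for` value is reported in seconds under `duration`. The function name, the sample rule, and the 1800-second wait are illustrative assumptions and were not read from the cluster.

```python
import re


def message_matches_wait(rule: dict) -> bool:
    """Return True when the minutes quoted in the message equal the rule's wait time."""
    # The rules API reports the 'for' clause in seconds.
    wait_minutes = rule["duration"] / 60
    match = re.search(r"for (\d+) minutes", rule["annotations"]["message"])
    if match is None:
        # The message names no duration, so there is nothing to contradict.
        return True
    return int(match.group(1)) == wait_minutes


# Hypothetical rule object: the message is the one from this report,
# the 1800 s wait is an assumption based on the 30min+ observed above.
sample_rule = {
    "name": "ClusterOperatorDegraded",
    "type": "alerting",
    "duration": 1800,
    "annotations": {
        "message": (
            "Cluster operator authentication has been degraded for 10 minutes. "
            "Operator is degraded because OAuthServerConfigObservation_Error "
            "and cluster upgrades will be unstable."
        )
    },
}

print(message_matches_wait(sample_rule))
```

With the assumed 1800-second wait the check reports a mismatch; with a 600-second wait the same message would pass.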
I just attached the fix for this to bug 1957991, so there is no need for a separate bug.

*** This bug has been marked as a duplicate of bug 1957991 ***