Created attachment 1639715 [details]
cloud-credential-operator pod logs

Description of problem:
4.3.0-0.nightly-2019-11-25-153929 fresh cluster, the CCOProvisioningFailed alert is firing.

# oc -n openshift-monitoring get ep | grep alertmanager-main
NAME                ENDPOINTS                                             AGE
alertmanager-main   10.128.2.10:9095,10.129.2.13:9095,10.131.0.12:9095   6h31m

# token=`oc -n openshift-monitoring sa get-token prometheus-k8s`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-1 -- curl -k -H "Authorization: Bearer $token" 'https://10.128.2.10:9095/api/v1/alerts' | jq
...
  {
    "labels": {
      "alertname": "CCOProvisioningFailed",
      "condition": "CredentialsProvisionFailure",
      "endpoint": "cco-metrics",
      "instance": "10.130.0.2:2112",
      "job": "cco-metrics",
      "namespace": "openshift-cloud-credential-operator",
      "pod": "cloud-credential-operator-7b4fd65dc5-z5z5q",
      "prometheus": "openshift-monitoring/k8s",
      "service": "cco-metrics",
      "severity": "warning"
    },
    "annotations": {
      "summary": "CredentialsRequest(s) unable to be fulfilled"
    },
    "startsAt": "2019-11-26T01:13:42.851606264Z",
    "endsAt": "2019-11-26T07:40:42.851606264Z",
    "generatorURL": "https://prometheus-k8s-openshift-monitoring.apps.juzhao-11-26.qe.devcluster.openshift.com/graph?g0.expr=cco_credentials_requests_conditions%7Bcondition%3D%22CredentialsProvisionFailure%22%7D+%3E+0&g0.tab=1",
    "status": {
      "state": "active",
      "silencedBy": [],
      "inhibitedBy": []
    },
    "receivers": [
      "null"
    ],
    "fingerprint": "554807430686d598"
  }
...

CCOProvisioningFailed detail
*******************************
alert: CCOProvisioningFailed
expr: cco_credentials_requests_conditions{condition="CredentialsProvisionFailure"} > 0
for: 5m
labels:
  severity: warning
annotations:
  summary: CredentialsRequest(s) unable to be fulfilled
*******************************

Query: cco_credentials_requests_conditions{condition="CredentialsProvisionFailure"} > 0

Element:
cco_credentials_requests_conditions{condition="CredentialsProvisionFailure",endpoint="cco-metrics",instance="10.130.0.2:2112",job="cco-metrics",namespace="openshift-cloud-credential-operator",pod="cloud-credential-operator-7b4fd65dc5-z5z5q",service="cco-metrics"}
Value: 1

Logs: see the attached file.

Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2019-11-25-153929

How reproducible:
Intermittent; seen recently

Steps to Reproduce:
1. Install a fresh cluster and check the active alerts as shown in the description.

Actual results:
The CCOProvisioningFailed alert fires on a fresh cluster.

Expected results:
No such alert.

Additional info:
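For reference, a quick way to see which CredentialsRequest is reporting the failing condition behind the metric. This is only a sketch: it assumes the default openshift-cloud-credential-operator namespace and that jq is available on the host running oc.

# Sketch: list each CredentialsRequest and its status conditions
# (assumes the default openshift-cloud-credential-operator namespace)
oc -n openshift-cloud-credential-operator get credentialsrequests -o json \
  | jq '.items[] | {name: .metadata.name, conditions: (.status.conditions // [])}'

Any entry with a CredentialsProvisionFailure condition set to True corresponds to the series reported by cco_credentials_requests_conditions above.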
This is the same issue as Bug 1781109; setting up a dependency on that one as the 4.4 bug.
The issue is intermittent. Fundamentally, once the alert fires (which does not happen on every installation), it never clears.

You can force the alert by creating a CredentialsRequest object that points to a namespace that doesn't exist:

apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: my-cred-request
  namespace: openshift-cloud-credential-operator
spec:
  secretRef:
    name: my-cred-request-secret
    namespace: namespace-does-not-exist
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: AWSProviderSpec
    statementEntries:
    - effect: Allow
      action:
      - s3:CreateBucket
      - s3:DeleteBucket
      resource: "*"

After a few minutes the alert should fire. At that point you can either create the missing namespace so the request can be fulfilled, or delete the CredentialsRequest so that no CredentialsRequest is left in a bad state. Either way you would expect the alert to clear, but it never does (at least not without the changes in the PR). A sketch of both clean-up paths is shown below.
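For completeness, a rough sketch of the two clean-up paths described above. The namespace and CredentialsRequest names match the example manifest; with the fix applied, the metric driving the alert should drop back to 0 a few minutes after either step.

# Option 1: create the missing namespace so the CredentialsRequest can be fulfilled
oc create namespace namespace-does-not-exist

# Option 2: remove the CredentialsRequest so nothing is left in a failed state
oc -n openshift-cloud-credential-operator delete credentialsrequest my-cred-request

# Then re-check the expression the alert is based on; it should no longer return a series:
#   cco_credentials_requests_conditions{condition="CredentialsProvisionFailure"} > 0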
*** Bug 1783963 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062