Description of problem:

I disabled the CCO via the configmap:

$ oc edit cm -n openshift-cloud-credential-operator cloud-credential-operator-config
data:
  disabled: "true"

The operator is reporting Available=True and Degraded=False:

$ oc get clusteroperator cloud-credential
NAME               VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
cloud-credential   4.3.0     True        False         False      3h14m

However, the CloudCredentialOperatorProvisioningFailed alert is firing. The alerting rule is:

cco_credentials_requests_conditions{condition="CredentialsProvisionFailure"} > 0

This alert should only fire when the CCO is enabled.

Version-Release number of selected component (if applicable):
4.3.0

How reproducible:
Always

Steps to Reproduce:
1. Install a cluster with kube-system/aws-creds removed
2. Disable the CCO via the configmap
3. Observe the alert

Actual results:

cco_credentials_requests_conditions{condition="CredentialsProvisionFailure"} = 3 (ingress, image-registry, machine-api)

status:
  conditions:
  - lastProbeTime: "2020-01-23T15:59:12Z"
    lastTransitionTime: "2020-01-23T15:59:12Z"
    message: 'failed to grant creds: unable to fetch root cloud cred secret: Secret "aws-creds" not found'
    reason: CredentialsProvisionFailure
    status: "True"
    type: CredentialsProvisionFailure
  lastSyncGeneration: 0
  provisioned: false

Expected results:
The alert should be conditioned on whether the CCO is enabled or not.

Additional info:
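For reference, the firing rule amounts to roughly the following sketch. Only the alert name and expression are taken from the description above; the PrometheusRule wrapper, metadata name, "for" duration, and severity label are assumptions about how the rule is packaged:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cloud-credential-operator-alerts        # hypothetical name
  namespace: openshift-cloud-credential-operator
spec:
  groups:
  - name: CloudCredentialOperator
    rules:
    - alert: CloudCredentialOperatorProvisioningFailed
      # Fires whenever any CredentialsRequest reports CredentialsProvisionFailure,
      # regardless of whether the CCO itself has been disabled via the configmap.
      expr: cco_credentials_requests_conditions{condition="CredentialsProvisionFailure"} > 0
      for: 5m                # duration is an assumption
      labels:
        severity: warning    # severity is an assumption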
Joel, I have a question for you. In step 1 of the reproduce steps, is kube-system/aws-creds removed before installing the cluster or after? I tried to reproduce this following your steps, but could not get it to work; the alert did not fire. Could you please share more detailed steps?
Sorry, I asked the wrong person. Seth, could you please help answer the question above? Thanks in advance.
The way I did it was to remove the secret from the manifests that the installer uses: https://gist.github.com/sjenning/b22468a02a7fce57a914b09569409ee0#modify-manifests

The key is that there have to be CredentialsRequest CRs that the CCO cannot process because the secret is missing.
Hello Seth, sorry to disturb you again. I understand what you pointed out, but I did not know how to carry it out. I am a new hire and do not know much about this area yet.

----------------------------------------------------------------------------------------------------

I tried the following steps:

1. Generate manifests and remove the aws-creds resources:

$ ./openshift-install create manifests
$ rm openshift/99_cloud-creds-secret.yaml 99_role-cloud-creds-secret-reader.yaml

2. Deploy the cluster:

$ ./openshift-install create cluster

But the cluster could not be deployed successfully.

-----------------------------------------------------------------------------------------------------

So I then tried the following steps:

1. Generate manifests and remove the aws-creds resources:

$ ./openshift-install create manifests
$ rm openshift/99_cloud-creds-secret.yaml 99_role-cloud-creds-secret-reader.yaml

2. Create the ingress, machine-api, and image-registry secret files in the openshift dir.

3. Deploy the cluster:

$ ./openshift-install create cluster

This time the cluster deployed successfully, but I do not know what commands I should use to get the status you wrote in the description. I tried

$ oc get clusteroperator cloud-credential -o yaml

and the openshift-cloud-credential-operator commands, but I could not get the status you mentioned:

status:
  conditions:
  - lastProbeTime: "2020-01-23T15:59:12Z"
    lastTransitionTime: "2020-01-23T15:59:12Z"
    message: 'failed to grant creds: unable to fetch root cloud cred secret: Secret "aws-creds" not found'
    reason: CredentialsProvisionFailure
    status: "True"
    type: CredentialsProvisionFailure
  lastSyncGeneration: 0
  provisioned: false

I can only see the alert through the prometheus-k8s web console.

Then I tried to apply a CredentialsRequest CR via $ oc create -f test.yaml, but I still could not get the CredentialsProvisionFailure info. The yaml file is below:

apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: test1
  namespace: openshift-cloud-credential-operator
spec:
  secretRef:
    name: test1
    namespace: default
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: AWSProviderSpec
    statementEntries:
    - effect: Allow
      action:
      - s3:CreateBucket
      - s3:DeleteBucket
      resource: "*"

-----------------------------------------------------------------------------------------

I am not sure whether my test steps are right, so I would sincerely like to ask for your help. Thanks in advance.
Let me suggest an alternative way to repro this. Perform a regular cluster installation, then add a bad CredentialsRequest (this CredentialsRequest will have the NamespaceMissing condition):

---
apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: my-cred-request
  namespace: openshift-cloud-credential-operator
spec:
  secretRef:
    name: my-cred-request-secret
    namespace: not-a-real-namespace
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: AWSProviderSpec
    statementEntries:
    - effect: Allow
      action:
      - s3:CreateBucket
      - s3:DeleteBucket
      resource: "*"

Give it a moment so that Prometheus is showing the alert. Now edit the cloud-credential-operator configmap so that 'disabled' is set to 'true'. The next time the cloud-credential-operator calculates metrics (after 2 minutes) and Prometheus scrapes the updated metrics, the NamespaceMissing alert should no longer be firing.
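A rough verification sequence for the above, assuming the CredentialsRequest is saved as my-cred-request.yaml; the oc patch is just the non-interactive equivalent of the oc edit shown in the description:

# Create the bad CredentialsRequest; its status conditions should soon show NamespaceMissing
$ oc apply -f my-cred-request.yaml
$ oc get credentialsrequest my-cred-request -n openshift-cloud-credential-operator -o yaml

# Disable the CCO (same effect as editing the configmap by hand)
$ oc patch configmap cloud-credential-operator-config \
    -n openshift-cloud-credential-operator \
    --type merge -p '{"data":{"disabled":"true"}}'

# After ~2 minutes plus a Prometheus scrape interval, query
#   cco_credentials_requests_conditions{condition="NamespaceMissing"}
# in the prometheus-k8s console; the alert should no longer be firing.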
Thanks Joel, it is all clear to me now.
The bug has been fixed. Test payload: registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-02-07-001901
sounds like you got it figured out
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581