
Bug 2019001

Summary: AWS: Operator degraded (CredentialsFailing): 1 of 6 credentials requests are failing to sync.
Product: OpenShift Container Platform Reporter: Martin Kennelly <mkennell>
Component: Networking Assignee: Miheer Salunke <misalunk>
Networking sub component: DNS QA Contact: Hongan Li <hongli>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: aos-bugs, dgoodwin, misalunk, mmasters, wking
Version: 4.10 Keywords: Reopened
Target Milestone: ---   
Target Release: 4.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-03-10 16:23:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Martin Kennelly 2021-11-01 13:11:17 UTC
Description of problem:
While working on a PR, I noticed that CCO is failing with the Operator message:
"1 of 6 credentials requests are failing to sync."

Here is the PR and associated artifacts (inc. must-gather): https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_sdn/358/pull-ci-openshift-sdn-master-e2e-aws/1455124266080014336

I don't believe my PR has anything to do with it failing.

I looked at the CCO operator logs and saw nothing abnormal.
I looked at pod-identity-webhook logs and see this message repeating:
"2021-11-01T11:36:15.053166052Z E1101 11:36:15.053133       1 certificate_manager.go:437] Failed while requesting a signed certificate from the master: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:pod-identity-webhook" cannot create resource "certificatesigningrequests" in API group "certificates.k8s.io" at the cluster scope"


Version-Release number of selected component (if applicable):
4.10

How reproducible:
Unknown. I have only seen it once.

Steps to Reproduce:
1. Unknown


Actual results:
CCO is degraded.


Expected results:
CCO deploys successfully during cluster rollout.

Additional info:

Comment 1 Joel Diaz 2021-11-01 13:58:16 UTC
That message has nothing to do with why 1 of 6 CredentialsRequest CRs are failing.

After looking through the must-gather bundle, I see the CredentialsRequest in openshift-cloud-credential-operator/external-dns is the one that is in the failed state. The status for the object shows:

status:
  conditions:
  - lastProbeTime: "2021-11-01T11:09:24Z"
    lastTransitionTime: "2021-11-01T11:09:24Z"
    message: target namespace external-dns-operator not found
    reason: NamespaceMissing
    status: "True"
    type: MissingTargetNamespace

Whoever/whatever created that CredentialsRequest has not created the Namespace to store the Secret specified in .spec.secretRef (namely external-dns-operator/external-dns-credentials).

Once the Namespace 'external-dns-operator' exists, cloud-cred-operator will be unstuck and should finish reconciling the CredentialsRequest.
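
The check described in this comment can be scripted against CredentialsRequest objects taken from a must-gather. The following is a minimal Python sketch, not cloud-credential-operator code. The helper name failing_conditions and the FAILURE_TYPES set are hypothetical, and the set lists only the MissingTargetNamespace condition type quoted above. It assumes the object has already been loaded into a dict.

```python
# Hypothetical helper; not part of cloud-credential-operator.
# Assumption: only the condition type seen in this bug counts as a failure.
FAILURE_TYPES = {"MissingTargetNamespace"}


def failing_conditions(cred_request):
    """Return (type, reason, message) for each failure condition set to "True"."""
    conditions = cred_request.get("status", {}).get("conditions", [])
    return [
        (c.get("type"), c.get("reason"), c.get("message"))
        for c in conditions
        if c.get("type") in FAILURE_TYPES and c.get("status") == "True"
    ]


# The object from this bug, reduced to the fields quoted in the comment.
external_dns = {
    "metadata": {
        "namespace": "openshift-cloud-credential-operator",
        "name": "external-dns",
    },
    "status": {
        "conditions": [
            {
                "lastProbeTime": "2021-11-01T11:09:24Z",
                "lastTransitionTime": "2021-11-01T11:09:24Z",
                "message": "target namespace external-dns-operator not found",
                "reason": "NamespaceMissing",
                "status": "True",
                "type": "MissingTargetNamespace",
            }
        ]
    },
}

for ctype, reason, message in failing_conditions(external_dns):
    print(f"{ctype} ({reason}): {message}")
```

Run over all six CredentialsRequest objects, a loop like this would single out the one that is stuck, provided FAILURE_TYPES is extended to whatever condition types the operator actually reports.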

Comment 2 Devan Goodwin 2021-11-01 15:55:54 UTC
https://github.com/openshift/external-dns-operator/commit/ba13b6f49f4304aaa535b086e298447ad7e61616#diff-f7f53c249caae2054edb14689cd60ca6a553a3b75373345fb0a4f67cfe88071a added this credentials request 6 hours ago. Something is going wrong, and we're seeing this in other places in CI. This is definitely not a cloud-credential-operator bug.

Cred request is in but it seems nothing is creating the required namespace.
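
A gap of this kind (a CredentialsRequest whose target namespace nothing creates) could in principle be caught by a check over a set of manifests. A rough Python sketch follows. The function name missing_target_namespaces is hypothetical, the manifests are assumed to be already parsed into dicts, and the sample list is illustrative, not the actual release payload contents.

```python
def missing_target_namespaces(manifests):
    """Map each CredentialsRequest name to a target namespace that no
    Namespace manifest in the same set creates."""
    created = {
        m["metadata"]["name"] for m in manifests if m.get("kind") == "Namespace"
    }
    missing = {}
    for m in manifests:
        if m.get("kind") != "CredentialsRequest":
            continue
        # The Secret is written to .spec.secretRef.namespace.
        target = m.get("spec", {}).get("secretRef", {}).get("namespace")
        if target and target not in created:
            missing[m["metadata"]["name"]] = target
    return missing


# Illustrative input only; not the real payload.
sample = [
    {"kind": "Namespace", "metadata": {"name": "openshift-cloud-credential-operator"}},
    {
        "kind": "CredentialsRequest",
        "metadata": {
            "name": "external-dns",
            "namespace": "openshift-cloud-credential-operator",
        },
        "spec": {
            "secretRef": {
                "namespace": "external-dns-operator",
                "name": "external-dns-credentials",
            }
        },
    },
]

print(missing_target_namespaces(sample))
# prints {'external-dns': 'external-dns-operator'}
```

Namespaces that come into existence by other means (for example, ones created at install time outside the manifest set) would show up as false positives, so the result is a hint rather than a verdict.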

Comment 4 Devan Goodwin 2021-11-01 16:26:45 UTC
The change has been reverted, but the team working on that project is not sure how or why it was present in the release image at all. https://github.com/openshift/external-dns-operator/pull/49

Comment 5 Miheer Salunke 2021-11-01 16:41:00 UTC
The OCP release process is for cluster operators and contains manifests for starting an OCP cluster. It should not have any manifests of external-dns-operator. So I'm wondering how this CredentialsRequest ended up in the release payload.

When the operator is installed through OLM, the namespace will be created by OLM. So the change/PR I wrote should be fine.

Comment 6 Miciah Dashiel Butler Masters 2021-11-02 16:37:56 UTC
Setting blocker+ because this broke CI for the entire product.  

Miheer, can you look into why the BZ hasn't progressed to the ON_QA state?  Andrey might be able to help if it has something to do with CPaaS.

Comment 10 errata-xmlrpc 2022-03-10 16:23:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056