Bug 2019001 - AWS: Operator degraded (CredentialsFailing): 1 of 6 credentials requests are failing to sync.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Miheer Salunke
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-11-01 13:11 UTC by Martin Kennelly
Modified: 2022-08-04 22:39 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:23:41 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift external-dns-operator pull 51 | 0 | None | Merged | Bug 2019001: OLM operators should not be included in the release payload | 2021-11-02 16:35:17 UTC
Red Hat Product Errata | RHSA-2022:0056 | 0 | None | None | None | 2022-03-10 16:24:03 UTC

Description Martin Kennelly 2021-11-01 13:11:17 UTC
Description of problem:
While working on a PR, I noticed that CCO is degraded with the operator message:
"1 of 6 credentials requests are failing to sync."

Here is the PR and associated artifacts (inc. must-gather): https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_sdn/358/pull-ci-openshift-sdn-master-e2e-aws/1455124266080014336

I don't believe my PR has anything to do with it failing.

I looked at the CCO logs and saw nothing abnormal.
I looked at the pod-identity-webhook logs and saw this message repeating:
"2021-11-01T11:36:15.053166052Z E1101 11:36:15.053133       1 certificate_manager.go:437] Failed while requesting a signed certificate from the master: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:pod-identity-webhook" cannot create resource "certificatesigningrequests" in API group "certificates.k8s.io" at the cluster scope"


Version-Release number of selected component (if applicable):
4.10

How reproducible:
Unknown. I have only seen it once.

Steps to Reproduce:
1. Unknown


Actual results:
CCO is degraded.


Expected results:
CCO deploys during cluster rollout

Additional info:

Comment 1 Joel Diaz 2021-11-01 13:58:16 UTC
That message has nothing to do with why 1 of 6 CredentialsRequest CRs are failing.

After looking through the must-gather bundle, I see the CredentialsRequest in openshift-cloud-credential-operator/external-dns is the one that is in the failed state. The status for the object shows:

status:
  conditions:
  - lastProbeTime: "2021-11-01T11:09:24Z"
    lastTransitionTime: "2021-11-01T11:09:24Z"
    message: target namespace external-dns-operator not found
    reason: NamespaceMissing
    status: "True"
    type: MissingTargetNamespace

Whoever/whatever created that CredentialsRequest has not created the Namespace to store the Secret specified in .spec.secretRef (namely external-dns-operator/external-dns-credentials).

Once the Namespace 'external-dns-operator' exists, cloud-cred-operator will be unstuck and should proceed to finish reconciling the CredentialsRequest.
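
For anyone triaging a similar must-gather, here is a minimal sketch of how the stuck CredentialsRequest could be picked out from the status conditions. It assumes the CredentialsRequest objects have already been parsed from YAML into plain Python dicts. The function name missing_namespace_requests is hypothetical and not part of any OpenShift tooling, and the sample object is rebuilt only from the fields quoted in this comment.

```python
from typing import Any, Dict, List, Tuple


def missing_namespace_requests(
    requests: List[Dict[str, Any]],
) -> List[Tuple[str, str, str]]:
    """Return (request, target namespace, message) for every CredentialsRequest
    whose MissingTargetNamespace condition currently has status "True"."""
    stuck = []
    for cr in requests:
        meta = cr.get("metadata", {})
        ident = f'{meta.get("namespace", "")}/{meta.get("name", "")}'
        # The Secret is written to .spec.secretRef, so that namespace must exist.
        target_ns = cr.get("spec", {}).get("secretRef", {}).get("namespace", "")
        for cond in cr.get("status", {}).get("conditions") or []:
            if (
                cond.get("type") == "MissingTargetNamespace"
                and cond.get("status") == "True"
            ):
                stuck.append((ident, target_ns, cond.get("message", "")))
    return stuck


# Sample object rebuilt from the status quoted above (other fields omitted).
sample = {
    "kind": "CredentialsRequest",
    "metadata": {
        "namespace": "openshift-cloud-credential-operator",
        "name": "external-dns",
    },
    "spec": {
        "secretRef": {
            "namespace": "external-dns-operator",
            "name": "external-dns-credentials",
        }
    },
    "status": {
        "conditions": [
            {
                "type": "MissingTargetNamespace",
                "status": "True",
                "reason": "NamespaceMissing",
                "message": "target namespace external-dns-operator not found",
            }
        ]
    },
}

for ident, target_ns, message in missing_namespace_requests([sample]):
    print(f"{ident}: waiting on namespace {target_ns} ({message})")
```

The sketch only looks at the one condition type seen in this bug; other failure conditions would need their own handling.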

Comment 2 Devan Goodwin 2021-11-01 15:55:54 UTC
https://github.com/openshift/external-dns-operator/commit/ba13b6f49f4304aaa535b086e298447ad7e61616#diff-f7f53c249caae2054edb14689cd60ca6a553a3b75373345fb0a4f67cfe88071a added this credentials request 6 hours ago. Something is going wrong, and we're seeing this in other places in CI. This is definitely not a cloud-credential-operator bug.

The CredentialsRequest is in, but it seems nothing is creating the required namespace.
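
To illustrate the mismatch, here is a rough sketch of a check that flags any CredentialsRequest in a set of manifests whose target namespace is not created by a Namespace manifest in the same set. This is a hypothetical lint, not something the release tooling is known to run. It assumes the manifests are already parsed into plain Python dicts, and the function name requests_without_namespace and the sample payload are made up for illustration.

```python
from typing import Any, Dict, Iterable, List


def requests_without_namespace(manifests: Iterable[Dict[str, Any]]) -> List[str]:
    """Return the names of CredentialsRequests whose .spec.secretRef.namespace
    is not created by any Namespace manifest in the same set."""
    manifests = list(manifests)
    # Collect every namespace that the manifest set itself creates.
    namespaces = {
        m.get("metadata", {}).get("name")
        for m in manifests
        if m.get("kind") == "Namespace"
    }
    orphans = []
    for m in manifests:
        if m.get("kind") != "CredentialsRequest":
            continue
        target = m.get("spec", {}).get("secretRef", {}).get("namespace")
        if target not in namespaces:
            orphans.append(m.get("metadata", {}).get("name", ""))
    return orphans


# Hypothetical payload: the CredentialsRequest is present, its target namespace is not.
payload = [
    {"kind": "Namespace", "metadata": {"name": "openshift-cloud-credential-operator"}},
    {
        "kind": "CredentialsRequest",
        "metadata": {
            "namespace": "openshift-cloud-credential-operator",
            "name": "external-dns",
        },
        "spec": {
            "secretRef": {
                "namespace": "external-dns-operator",
                "name": "external-dns-credentials",
            }
        },
    },
]

print(requests_without_namespace(payload))
```

A check like this would only make sense for manifest sets that are expected to be self-contained; it would not apply where something else creates the namespace at install time.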

Comment 4 Devan Goodwin 2021-11-01 16:26:45 UTC
The change has been reverted, but the team working on that project is not clear on how or why it was present in the release image at all. https://github.com/openshift/external-dns-operator/pull/49

Comment 5 Miheer Salunke 2021-11-01 16:41:00 UTC
The OCP release process is for cluster operators and contains manifests for starting an OCP cluster. It should not have any manifests of external-dns-operator. So I'm wondering how this CredentialsRequest ended up in the release payload.

When the operator is installed through OLM, the namespace will be created by OLM. So the change/PR I wrote should be fine.

Comment 6 Miciah Dashiel Butler Masters 2021-11-02 16:37:56 UTC
Setting blocker+ because this broke CI for the entire product.  

Miheer, can you look into why the BZ hasn't progressed to the ON_QA state?  Andrey might be able to help if it has something to do with CPaaS.

Comment 10 errata-xmlrpc 2022-03-10 16:23:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

