Bug 1996624 - 100% of the cco-metrics/cco-metrics targets in openshift-cloud-credential-operator namespace are down
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Credential Operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Akhil Rane
QA Contact: wang lin
URL:
Whiteboard:
Depends On:
Blocks: 2015989
 
Reported: 2021-08-23 10:32 UTC by Mani
Modified: 2022-10-10 04:43 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: AWS was the default secret annotator implementation for OpenStack.
Consequence: The cloud-credential-operator pod restarted continuously with an error.
Fix: Do not set up an AWS client for non-AWS platforms.
Result: The CCO pod no longer crashes.
Clone Of:
Environment:
Last Closed: 2022-03-10 16:05:43 UTC
Target Upstream Version:
Embargoed:




Links:
Github openshift cloud-credential-operator pull 399 (open): Bug 1996624: Check for aws status in infra platform status field before client setup (last updated 2021-10-11 17:11:31 UTC)
Red Hat Product Errata RHSA-2022:0056 (last updated 2022-03-10 16:06:24 UTC)
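
For orientation, here is a minimal sketch of the kind of guard the PR title describes: consult the cluster Infrastructure platform status before doing any AWS client setup. This is not the actual cloud-credential-operator code; the function name and surrounding wiring are illustrative assumptions.

~~~
package secretannotator

import (
	"context"
	"fmt"

	configv1 "github.com/openshift/api/config/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// setupAWSClientIfNeeded is an illustrative helper: it reads the cluster
// Infrastructure object and only proceeds with AWS client setup when the
// platform reported in status is actually AWS. On other platforms (e.g.
// OpenStack) a stray aws-creds secret should not drive AWS reconciliation.
func setupAWSClientIfNeeded(ctx context.Context, c client.Client) error {
	infra := &configv1.Infrastructure{}
	if err := c.Get(ctx, client.ObjectKey{Name: "cluster"}, infra); err != nil {
		return fmt.Errorf("failed to get infrastructure resource: %w", err)
	}

	if infra.Status.PlatformStatus == nil ||
		infra.Status.PlatformStatus.Type != configv1.AWSPlatformType {
		// Not an AWS cluster: skip AWS client setup entirely instead of
		// panicking on an unexpected aws-creds secret.
		return nil
	}

	// On a real AWS cluster, the AWS client/credential setup would go here.
	return nil
}
~~~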

Description Mani 2021-08-23 10:32:13 UTC
Description of problem:
The cloud-credential-operator pod is restarting continuously with an error.

$ omg get pods
NAME                                        READY  STATUS   RESTARTS  AGE
cloud-credential-operator-54bd6754d9-k7gzz  2/2    Running  5         49m


~~~
containerID: cri-o://4eb38ca0eeb4df340bb008a4b4c5c24bc2a71b0be4166b28468650ea934acd61
exitCode: 2
finishedAt: '2021-08-10T12:43:07Z'
message: "aws.(*ReconcileCloudCredSecret).Reconcile(0xc0006d81b0, 0xc004fc2510,\
  \ 0xb, 0xc004fc24f0, 0x9, 0xc010bf2500, 0x0, 0x0, 0x0)\n\t/go/src/github.com/openshift/cloud-credential-operator/pkg/operator/secretannotator/aws/reconciler.go:172\
  \ +0x5d7\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000ab4990,\
  \ 0x208a620, 0xc010bf2380, 0x0)\n\t/go/src/github.com/openshift/cloud-credential-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235\
  \ +0x2a9\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000ab4990,\
  \ 0x203000)\n\t/go/src/github.com/openshift/cloud-credential-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:209\
  \ +0xb0\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc000ab4990)\n\
  \t/go/src/github.com/openshift/cloud-credential-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:188\
  \ +0x2b\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc000eb31b0)\n\
  \t/go/src/github.com/openshift/cloud-credential-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\
  \ +0x5f\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000eb31b0, 0x2606fa0,\
  \ 0xc010f9e0c0, 0x1, 0xc0004fdd40)\n\t/go/src/github.com/openshift/cloud-credential-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\
  \ +0xad\nk8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000eb31b0, 0x3b9aca00,\
  \ 0x0, 0xc00101c401, 0xc0004fdd40)\n\t/go/src/github.com/openshift/cloud-credential-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\
  \ +0x98\nk8s.io/apimachinery/pkg/util/wait.Until(0xc000eb31b0, 0x3b9aca00,\
  \ 0xc0004fdd40)\n\t/go/src/github.com/openshift/cloud-credential-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90\
  \ +0x4d\ncreated by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\
  \t/go/src/github.com/openshift/cloud-credential-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:170\
  \ +0x3fa\n"
reason: Error
startedAt: '2021-08-10T12:34:16Z'
~~~

Comment 2 wang lin 2021-08-24 03:10:27 UTC
Hi Mani,

I checked the CCO log from the must-gather you attached and found the information below. It shows that CCO also detected an AWS root secret in the kube-system namespace. The customer's cluster is installed on OpenStack, so the AWS secret is not required, and its presence is what causes the panic. As a workaround, could you remove that AWS root secret ($ oc delete secret aws-creds -n kube-system)? I tested this and it resolves the panic.

################
2021-08-10T12:43:07.086582668Z time="2021-08-10T12:43:07Z" level=info msg="observed admin cloud credential secret event" namespace=kube-system secret=openstack-credentials
2021-08-10T12:43:07.086582668Z time="2021-08-10T12:43:07Z" level=info msg="requeueing all CredentialsRequests"
2021-08-10T12:43:07.086582668Z time="2021-08-10T12:43:07Z" level=info msg="observed admin cloud credential secret event" namespace=kube-system secret=aws-creds
2021-08-10T12:43:07.086582668Z time="2021-08-10T12:43:07Z" level=info msg="requeueing all CredentialsRequests"
################

Comment 3 Mani 2021-08-25 10:29:41 UTC
@wang Lin

I suggested the workaround to the customer and it worked.


Does this affect all RHOCP 4.6 OpenStack clusters?

Comment 4 wang lin 2021-08-25 10:47:56 UTC
I launched a regular installation on OpenStack using the same version as the customer; it does not create such an AWS root secret in the kube-system namespace by default, so I am not sure why the customer's cluster had this secret.

To reproduce the issue, I created an AWS secret manually in kube-system and then hit the same panic as the customer.
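
For completeness, a hedged sketch of that manual reproduction step using client-go; in practice a single command such as (oc create secret generic aws-creds -n kube-system --from-literal=aws_access_key_id=dummy --from-literal=aws_secret_access_key=dummy) achieves the same. The key names here mirror the usual AWS root secret layout; the values are placeholders.

~~~
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (path resolution is an assumption).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	// Dummy aws-creds secret in kube-system, mirroring the manual
	// reproduction step from this comment; values are placeholders.
	secret := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{Name: "aws-creds", Namespace: "kube-system"},
		StringData: map[string]string{
			"aws_access_key_id":     "dummy",
			"aws_secret_access_key": "dummy",
		},
	}
	if _, err := cs.CoreV1().Secrets("kube-system").Create(
		context.TODO(), secret, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
	fmt.Println("created dummy aws-creds secret in kube-system")
}
~~~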

Comment 7 wang lin 2021-10-21 02:16:55 UTC
The verification steps are pasted on the PR: https://github.com/openshift/cloud-credential-operator/pull/399#issuecomment-940787856

Moving this one to Verified.

Comment 10 errata-xmlrpc 2022-03-10 16:05:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

