Bug 2028647
| Summary: | Clusters are in 'Degraded' status with upgrade env due to obs-controller not working properly | ||
|---|---|---|---|
| Product: | Red Hat Advanced Cluster Management for Kubernetes | Reporter: | Xiang Yin <xiyin> |
| Component: | Core Services / Observability | Assignee: | Chunlin Yang <chuyang> |
| Status: | CLOSED ERRATA | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rhacm-2.4.z | CC: | brandencaufield, cqu, juhsu, jwakely, llan, robertsonldspj11, rorygwgehman, thuongchodoisl024 |
| Target Milestone: | --- | Flags: | bot-tracker-sync: rhacm-2.5+ |
| Target Release: | rhacm-2.5 | ||
| Hardware: | All | ||
| OS: | All | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-06-09 02:07:01 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Xiang Yin
2021-12-02 21:18:00 UTC
I checked the environment. The metrics-collector in some clusters cannot push metrics to the hub server because the server side cannot verify the client CA cert that was used to sign the client certs.
On the server side, I found that the client CA cert secret was removed recently. Ideally, the newly generated client CA secret should include the old CA cert, so that the server side can verify requests that use client certs signed by either the old or the new CA cert. However, the MCO operator logs an error saying that it failed to update that secret to include the old cert.
The error message is as follows:
```
2021-12-01T13:37:46.631Z ERROR controller_certificates Failed to update secret for ca certificate {"name": "observability-client-ca-certs", "error": "Operation cannot be fulfilled on secrets \"observability-client-ca-certs\": StorageError: invalid object, Code: 4, Key: /kubernetes.io/secrets/open-cluster-management-observability/observability-client-ca-certs, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 4b11225a-6288-43e9-b1c9-e6a30bd42db9, UID in object meta: "}
github.com/open-cluster-management/multicluster-observability-operator/operators/multiclusterobservability/pkg/certificates.onDelete.func1
/remote-source/app/operators/multiclusterobservability/pkg/certificates/cert_controller.go:184
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnDelete
/remote-source/app/vendor/k8s.io/client-go/tools/cache/controller.go:245
k8s.io/client-go/tools/cache.newInformer.func1
/remote-source/app/vendor/k8s.io/client-go/tools/cache/controller.go:413
k8s.io/client-go/tools/cache.(*DeltaFIFO).Pop
/remote-source/app/vendor/k8s.io/client-go/tools/cache/delta_fifo.go:544
k8s.io/client-go/tools/cache.(*controller).processLoop
/remote-source/app/vendor/k8s.io/client-go/tools/cache/controller.go:183
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90
k8s.io/client-go/tools/cache.(*controller).Run
/remote-source/app/vendor/k8s.io/client-go/tools/cache/controller.go:154
```
This seems to be a problem introduced by Kubernetes code (https://github.com/kubernetes/kubernetes/issues/82130).
In the MCO operator, we can add retry logic to work around this problem.
Users who run into this problem can delete the secret observability-controller-open-cluster-management.io-observability-signer-client-cert in the open-cluster-management-addon-observability namespace on the managed cluster. The client cert will then be regenerated and signed by the new CA cert, and metrics can be pushed successfully with the new cert.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat Advanced Cluster Management 2.5 security updates, images, and bug fixes), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:4956