Description of problem:
During an upgrade from OCP 4.6.17 to OCP 4.6.19, the Insights Operator goes into a degraded state due to the error (The server has asked for the client to provide credentials pods/log sdn-xxxxx).
Version-Release number of selected component (if applicable):
Actual results:
The upgrade is stuck and the Insights Operator is in a degraded state.
Expected results:
The upgrade completes without any issues.
Additional info:
The MCP is not degraded. Tried to regenerate the kubelet CSR for the affected node. There are a lot of errors: "Unable to authenticate the request due to an error: x509: certificate signed by unknown authority"
I created a PR fixing the IO issue. I would suggest creating a separate issue for Node-auth and MCO.
Started at this commit https://github.com/openshift/insights-operator/commit/5b8e5dce854bfc96a5c1b53a1e2d25346f476639
Changed IO code of CSR gatherer to always return an error
built & replaced IO on cluster
insights-operator-688645c897-f9zs6 1/1 Running 0 42s
added csr resource
checked IO log - it contains the nonsense error I have added to the code
I0324 13:45:50.047785 1 status.go:248] The operator has some internal errors: Source clusterconfig could not be retrieved: Too many requests: brrrrrr
I0324 13:45:50.047998 1 status.go:300] The operator has some internal errors: Source clusterconfig could not be retrieved: Too many requests: brrrrrr
IO is not degraded
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6.25 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:1153
If the cluster is stuck between two 4.6.z versions, i.e. the Insights Operator is in a degraded state because of this bug, then the safest way to recover is to update to 4.6.25 or a later version. You can update directly, e.g. from 4.6.19 to 4.6.25, if that is a recommended edge available in the channel (fast or stable) you are subscribed to. Please note that you do not need to force the update.
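One way to confirm that the target is a recommended edge before starting is to inspect the ClusterVersion object. The following is only a minimal sketch in Python, assuming the JSON comes from `oc get clusterversion version -o json` and that recommended targets are listed under `status.availableUpdates`. The function name and the sample data are hypothetical and only illustrate the check.

```python
import json


def is_recommended_target(cluster_version: dict, target: str) -> bool:
    """Return True if `target` appears in status.availableUpdates."""
    # availableUpdates can be null or absent when no edges are recommended.
    updates = cluster_version.get("status", {}).get("availableUpdates") or []
    return any(update.get("version") == target for update in updates)


# Hypothetical sample, trimmed to the fields this sketch reads.
sample = json.loads("""
{
  "status": {
    "desired": {"version": "4.6.19"},
    "availableUpdates": [
      {"version": "4.6.21", "image": "example.invalid/release:4.6.21"},
      {"version": "4.6.25", "image": "example.invalid/release:4.6.25"}
    ]
  }
}
""")

print(is_recommended_target(sample, "4.6.25"))
```

If the target is not listed, the edge is not recommended in the current channel and the update should not be forced.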
oc has a client-side guard against starting a new update when the cluster is already in the middle of an update. So when you run:
$ oc adm upgrade --to 4.6.25
(or any version later than 4.6.25) and it cannot trigger the update, it will list any warnings applicable to your cluster. Review the warnings listed. If the warnings are only because of the current bug, i.e. cluster operator insights is degraded, then you can add --allow-upgrade-with-warnings to the command to trigger the update:
$ oc adm upgrade --allow-upgrade-with-warnings --to 4.6.25
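Before adding --allow-upgrade-with-warnings, it may help to double-check that insights is the only degraded cluster operator. The following is a rough sketch in Python, assuming the JSON comes from `oc get clusteroperators -o json`. The function name and sample data are hypothetical, and the check covers only the Degraded condition, not every warning oc may report.

```python
def degraded_operators(operator_list: dict) -> list:
    """Return names of cluster operators whose Degraded condition is True."""
    names = []
    for item in operator_list.get("items", []):
        conditions = item.get("status", {}).get("conditions") or []
        for cond in conditions:
            if cond.get("type") == "Degraded" and cond.get("status") == "True":
                names.append(item["metadata"]["name"])
    return names


# Hypothetical sample, trimmed to the fields this sketch reads.
sample_operators = {
    "items": [
        {
            "metadata": {"name": "insights"},
            "status": {"conditions": [
                {"type": "Available", "status": "True"},
                {"type": "Degraded", "status": "True"},
            ]},
        },
        {
            "metadata": {"name": "network"},
            "status": {"conditions": [
                {"type": "Available", "status": "True"},
                {"type": "Degraded", "status": "False"},
            ]},
        },
    ]
}

degraded = degraded_operators(sample_operators)
# Proceed only if the single degraded operator is insights.
print(degraded == ["insights"])
```

If any operator other than insights shows up in the list, investigate it separately rather than overriding the warnings.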
Lalatendu already provided the information. Thanks.
4.6.25 went into stable channels at 2021-04-22 20:36Z [1]. [1]: https://github.com/openshift/cincinnati-graph-data/pull/758#event-4633317668