Description of problem:
During the upgrade from OCP 4.6.17 to OCP 4.6.19, the Insights Operator goes into a degraded state due to the error (The server has asked for the client to provide credentials pods/log sdn-xxxxx).
Version-Release number of selected component (if applicable):
The upgrade should complete without any issues.
The upgrade is stuck and the Insights Operator is in a degraded state.
The MCP is not degraded. I tried to regenerate the kubelet CSR for the affected node. There are a lot of errors: "Unable to authenticate the request due to an error: x509: certificate signed by unknown authority".
I created a PR fixing the IO (Insights Operator) issue. I would suggest creating a separate issue for the Node-auth / MCO problem.
Started at this commit: https://github.com/openshift/insights-operator/commit/5b8e5dce854bfc96a5c1b53a1e2d25346f476639
Changed the IO code of the CSR gatherer to always return an error.
Built and replaced IO on the cluster:
insights-operator-688645c897-f9zs6 1/1 Running 0 42s
Added a CSR resource.
Checked the IO log; it contains the nonsense error I added to the code:
I0324 13:45:50.047785 1 status.go:248] The operator has some internal errors: Source clusterconfig could not be retrieved: Too many requests: brrrrrr
I0324 13:45:50.047998 1 status.go:300] The operator has some internal errors: Source clusterconfig could not be retrieved: Too many requests: brrrrrr
IO is not degraded.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (OpenShift Container Platform 4.6.25 bug fix update) and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
If the cluster is stuck between two 4.6.z versions, i.e. the Insights Operator is in a degraded state because of this bug, the safest way to recover is to update to 4.6.25 or a later version. You can update directly, e.g. from 4.6.19 to 4.6.25, if that is a recommended edge available in the channel (fast or stable) you are subscribed to. Note that you do not need to force the update.
oc has a client-side guard against starting a new update when the cluster is already in the middle of an update. So when you run:
$ oc adm upgrade --to 4.6.25
(or any version later than 4.6.25).
If it cannot trigger the update, it lists the warnings that apply to your cluster, if there are any.
Review the listed warnings. If the only warnings are caused by this bug, i.e. the insights cluster operator is degraded, add --allow-upgrade-with-warnings to the command to trigger the update:
$ oc adm upgrade --allow-upgrade-with-warnings --to 4.6.25
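As a rough, hypothetical aid for the review step above, the following bash sketch checks whether `insights` is the only cluster operator reporting Degraded=True. The function name `only_insights_degraded` and its input format (one "<operator-name> <Degraded status>" pair per line) are assumptions made for illustration. It is not part of oc, and it is only a proxy: it does not replace reading the warnings that `oc adm upgrade` actually prints.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of oc): succeeds only when exactly one
# cluster operator is Degraded=True and that operator is "insights".
# Expected stdin: one "<operator-name> <Degraded-status>" pair per line.
only_insights_degraded() {
  awk '
    # Count every operator whose Degraded condition status is "True".
    $2 == "True" { degraded[$1] = 1; n++ }
    END {
      # Exit 0 only if "insights" is the single degraded operator.
      if (n == 1 && ("insights" in degraded)) exit 0
      exit 1
    }'
}
```

Usage, assuming you are logged in to the cluster with oc (the JSONPath query extracts each operator's name and the status of its Degraded condition):
$ oc get clusteroperators -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.conditions[?(@.type=="Degraded")].status}{"\n"}{end}' | only_insights_degraded && echo "only insights is degraded"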
Lalatendu already provided the information. Thanks.
4.6.25 went into the stable channels at 2021-04-22 20:36Z.