Description of problem:
Unable to run oc logs/rsh/exec against any pod. The error that occurs is:
~~
oc rsh -n openshift-authentication oauth-openshift-6dfddc87cf-7dn7q cat /run/secrets/kubernetes.io/serviceaccount/ca.crt > ingress-ca.crt
error: unable to upgrade connection: Unauthorized
~~
Sometimes the following error occurs instead:
~~
# oc logs ibm-block-csi-operator-76749b9685-g4fpd
error: You must be logged in to the server (the server has asked for the client to provide credentials ( pods/log ibm-block-csi-operator-76749b9685-g4fpd))
~~
I checked the openshift-authentication pod logs and found the following messages:
~~
I1114 07:11:35.830694 1 log.go:172] http: TLS handshake error from 10.131.0.1:44736: remote error: tls: bad certificate
I1114 07:49:12.538067 1 log.go:172] http: TLS handshake error from 10.128.0.1:55808: remote error: tls: unknown certificate
I1114 07:49:12.538202 1 log.go:172] http: TLS handshake error from 10.131.0.1:37778: remote error: tls: unknown certificate
I1114 07:49:12.546383 1 log.go:172] http: TLS handshake error from 10.128.0.1:55810: EOF
I1114 08:23:22.165048 1 log.go:172] http: TLS handshake error from 10.128.0.1:52108: remote error: tls: bad certificate
I1114 08:23:22.174039 1 log.go:172] http: TLS handshake error from 10.131.0.1:57172: remote error: tls: bad certificate
I1114 18:01:24.772893 1 log.go:172] http: TLS handshake error from 10.131.0.1:45426: EOF
I1115 07:53:10.787558 1 log.go:172] http: TLS handshake error from 10.128.0.1:45482: remote error: tls: unknown certificate
I1115 07:53:10.800616 1 log.go:172] http: TLS handshake error from 10.131.0.1:36036: EOF
~~
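To make the pattern in these messages easier to see, here is a minimal sketch that counts handshake errors per source IP and reason. The function name `summarize_tls_errors` is hypothetical, and the parsing assumes the exact line layout shown above with IPv4 source addresses.

```shell
# Summarize "TLS handshake error" lines by source IP and reason.
# Reads log lines on stdin; prints "<count> <source-ip> <reason>", sorted.
# Assumes the layout "... TLS handshake error from <ipv4>:<port>: <reason>".
summarize_tls_errors() {
  awk '
    /TLS handshake error from/ {
      # Keep only "<ip>:<port>: <reason>"
      rest = $0
      sub(/.*TLS handshake error from /, "", rest)
      # Source IP is everything before the first colon
      ip = rest
      sub(/:.*/, "", ip)
      # Reason is everything after the "<ip>:<port>: " token
      reason = rest
      sub(/^[^ ]+ /, "", reason)
      count[ip " " reason]++
    }
    END { for (k in count) print count[k], k }
  ' | sort
}
```

It can be fed from a saved copy of the pod log, for example `summarize_tls_errors < oauth-openshift.log`, where the file name is only a placeholder.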
All the COs are in the Available state.
Checked the kubelet certs; they seem to be good.
There are no pending CSRs either:
~~
$ oc get csr
No resources found.
$ oc get nodes
NAME                  STATUS   ROLES    AGE    VERSION
master0.ocp.lou.com   Ready    master   106d   v1.14.6+7e13ab9a7
master1.ocp.lou.com   Ready    master   106d   v1.14.6+7e13ab9a7
master2.ocp.lou.com   Ready    master   106d   v1.14.6+7e13ab9a7
worker0.ocp.lou.com   Ready    worker   106d   v1.14.6+7e13ab9a7
worker1.ocp.lou.com   Ready    worker   106d   v1.14.6+7e13ab9a7
~~
There is no proxy in the environment.
This happened after upgrading from 4.1 to 4.2.
Please do let me know if anything is required.
Moving to the API team, since the kubelet seems to be Ready. Could you attach the must-gather logs?
I searched 4.3 CI builds for the error, and did not find any matches.
I can see that they are unable to run must-gather successfully, but they somehow managed to get logs for certain pods. Would it be possible to get the openshift-apiserver pod logs, too?
OK, it looks like there's no problem with either API server. I would like you to check that the kubelets are actually capable of connecting to the API servers. I am going to move this BZ back to the Node team so that they can tell you which CA file to use when attempting either `openssl s_client -connect <url> -CAfile <kubelet_ca_here>` or `curl --cacert <kubelet_ca_here>`.
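In case it helps, here is a rough sketch of the openssl check as a reusable shell function. The names `verify_ok` and `check_api_tls` are hypothetical, and the endpoint and CA file remain placeholders until the Node team confirms which CA file to use. It assumes `openssl s_client` prints its usual `Verify return code: 0 (ok)` line when the certificate chain validates against the given CA.

```shell
# Returns 0 when openssl s_client output on stdin reports a verified chain.
verify_ok() {
  grep -q 'Verify return code: 0 (ok)'
}

# check_api_tls <host:port> <ca-file>
# Both arguments are placeholders to be filled in for the cluster at hand.
check_api_tls() {
  # </dev/null closes stdin so s_client exits once the handshake is done.
  openssl s_client -connect "$1" -CAfile "$2" </dev/null 2>&1 | verify_ok
}
```

Run on a node, something like `check_api_tls "$API_HOSTPORT" "$KUBELET_CA" && echo trusted || echo "not trusted"` would show whether the API server certificate chains to that CA; both variables are illustrative and not taken from this cluster.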
Could you also please share when each kubelet last reported Ready?
Do we know what the cause of this bug was and what the solution is? I see it is marked as closed, but there are no specific details about the solution. Is there a KCS article for this issue?