Description of problem:
Anything which requires websockets fails with an internal server error.

Version-Release number of selected component (if applicable):
4.1.4

How reproducible:
Seen in 2 unrelated clusters; no reproducer steps known.

Actual results:
In the web console:

WebSocket connection to 'wss://console-openshift-console.apps.example.com/api/kubernetes/api/v1/namespaces/openshift-console/pods/console-79b6c7bb87-gt2ck/log?container=console&follow=true&tailLines=1000&x-csrf-token=ESx4l2bhkAyUQ8nx9f0%2FmA3qThlJEI6IOptYX2N%2FSPBDwcQuQ1K91DDjT0I3J99QYF4rogNwgleVtq6FV%2BkL7Q%3D%3D' failed: Error during WebSocket handshake: Unexpected response code: 500

$ oc logs console-79b6c7bb87-gt2ck
Error from server: Get https://master0.example.com:10250/containerLogs/openshift-console/console-79b6c7bb87-gt2ck/console: remote error: tls: internal error

The must-gather command fails as well:

$ oc adm must-gather
namespace/openshift-must-gather-pp7qd created
clusterrolebinding.rbac.authorization.k8s.io/must-gather-zhlt9 created
container logs unavailable: Get https://master0.example.com:10250/containerLogs/openshift-must-gather-pp7qd/must-gather-4zsd8/gather?follow=true: remote error: tls: internal error

Expected results:
No error

Additional info:
https://tools.ietf.org/html/rfc5246#page-33
(In reply to Steven Walter from comment #1)
> https://tools.ietf.org/html/rfc5246#page-33

This is a TLS spec issue, and something the kubelet is not handling properly (or, at the very least, should be logging about). I don't see any logs from kubelet.service when these issues happen.
I also see this with 4.1.6. Examples of the issue:

$ oc get pods -n openshift-marketplace
NAME                                   READY   STATUS    RESTARTS   AGE
marketplace-operator-768b99959-9pftm   1/1     Running   1          128m

$ oc rsh marketplace-operator-768b99959-9pftm -n openshift-marketplace
Error from server (NotFound): pods "marketplace-operator-768b99959-9pftm" not found

$ oc exec marketplace-operator-768b99959-9pftm -n openshift-marketplace -- echo foo
Error from server: error dialing backend: remote error: tls: internal error

$ oc logs marketplace-operator-768b99959-9pftm -n openshift-marketplace
Error from server: Get https://master:10250/containerLogs/openshift-marketplace/marketplace-operator-768b99959-9pftm/marketplace-operator: remote error: tls: internal error

---

$ sudo crictl ps | grep kube-api
239ec13eeaf4e   beaf65fce4dc16947c5bd5d1ca7e16313234c393e8ca1c4251ac9b85094972bb   About an hour ago   Running   kube-apiserver-operator        3   bd197ceb6f882
6f2bdcab072ca   beaf65fce4dc16947c5bd5d1ca7e16313234c393e8ca1c4251ac9b85094972bb   About an hour ago   Running   kube-apiserver-cert-syncer-8   1   6938a6ebc2c3d
e6b9db2994d07   0d8dcfc307048a0f0400e644fcd1c9929018103b15d0f9b23b4841f1e71937bc   About an hour ago   Running   kube-apiserver-8               1   6938a6ebc2c3d

$ sudo crictl logs e6b9db2994d07
...
E0725 17:38:54.707552       1 status.go:64] apiserver received an error that is not an metav1.Status: &url.Error{Op:"Get", URL:"https://master:10250/containerLogs/openshift-kube-apiserver/kube-apiserver-master/kube-apiserver-8", Err:(*net.OpError)(0xc01ec89270)}
...

No other relevant logs outside of issues like this.
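The "remote error: tls: internal error" comes from the kubelet, which aborts the handshake because it has no serving certificate to present. A minimal diagnostic sketch (the port is the kubelet's standard 10250; the certificate path assumes a default RHCOS/OpenShift 4.x kubelet configuration and has not been verified on these clusters):

```shell
# Probe the kubelet's serving port directly; on an affected node the
# handshake fails before any certificate is sent.
openssl s_client -connect master0.example.com:10250 </dev/null

# On the node itself, check whether a serving certificate was ever issued.
# A healthy node has kubelet-server-current.pem here; an affected node
# typically has only the client certificate files.
ls -l /var/lib/kubelet/pki/
```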
This is fixed by approving pending CSRs. I wrote a solution to document this: https://access.redhat.com/solutions/4307511
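For reference, the workaround in the linked solution amounts to approving the pending kubelet serving CSRs. A minimal sketch; the go-template filter (select CSRs with no status, i.e. still pending) matches the one in the OpenShift 4.1 installation docs:

```shell
# List CSRs; affected clusters show kubelet serving requests stuck in Pending.
oc get csr

# Approve every CSR that has no status yet (i.e. is still pending).
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' \
  | xargs --no-run-if-empty oc adm certificate approve
```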
(In reply to Steven Walter from comment #4)
> This is fixed by approving pending CSRs. I wrote a solution to document
> this: https://access.redhat.com/solutions/4307511

We need better error handling, or at least some clue that the issue hit here stems from a lack of approved certificates.
The API server doesn't have an opinion on CSRs. In general, errors about kubelet certificates cannot be fixed by the user who sees the message. There needs to be better feedback about the relative health of kubelets and the state of their credentials. I would expect this to come either from something managing kubelets (this comes up a lot; perhaps something should be built) or from the agent handling CSR approval, when it decides not to approve open ones. Reassigning to the node team to get it closer to a solution.
https://bugzilla.redhat.com/show_bug.cgi?id=1733331#c4
https://docs.openshift.com/container-platform/4.1/installing/installing_bare_metal/installing-bare-metal.html#installation-approve-csrs_installing-bare-metal

This is not a bug. For UPI installs, the customer must provide the method for approving kubelet serving CSRs (client CSRs are approved by the kube-controller-manager). An agent on the node that monitored the status of the kubelet serving certificate and surfaced it to the cluster admin in some way would be an RFE.
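Until such an agent exists, a UPI admin can at least poll for pending CSRs out of band. A hypothetical sketch, not a supported tool; the loop interval and the plain-echo "alerting" are placeholders:

```shell
# Hypothetical watchdog: warn whenever CSRs sit unapproved, so the
# "tls: internal error" symptoms can be traced back to their cause.
while true; do
  pending=$(oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}')
  if [ -n "$pending" ]; then
    echo "WARNING: CSRs awaiting manual approval:"
    echo "$pending"
  fi
  sleep 300
done
```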