Description of problem: Failed to approve the kubelet client certificate requests for the UPI installation cluster after run some times. Version-Release number of selected component (if applicable): payload: 4.7.0-0.nightly-2021-01-18-000316 How reproducible: Sometimes Steps to Reproduce: 1) UPI install OCP cluster on vsphere; 2) After run some time, the node will "NotReady", check the CSR, will see pending; 3) Check the logs of KCM Actual results: 2) [root@dhcp-140-138 ~]# oc get csr NAME AGE SIGNERNAME REQUESTOR CONDITION csr-5wcnj 3h54m kubernetes.io/kube-apiserver-client-kubelet system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending csr-b4q2b 3h39m kubernetes.io/kube-apiserver-client-kubelet system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending 3) See error from the KCM: oc logs -f po/kube-controller-manager-control-plane-2 ...... I0119 02:58:10.961653 1 cert_rotation.go:88] certificate rotation detected, shutting down client connections to start using new credentials I0119 02:58:35.097411 1 cert_rotation.go:88] certificate rotation detected, shutting down client connections to start using new credentials E0119 02:58:35.099744 1 webhook.go:116] Failed to make webhook authenticator request: Post "https://api-int.xxia18shared.qe.devcluster.openshift.com:6443/apis/authentication.k8s.io/v1/tokenreviews?timeout=10s": write tcp 172.31.248.29:58906->172.31.248.37:6443: use of closed network connection E0119 02:58:35.099815 1 authentication.go:53] Unable to authenticate the request due to an error: [invalid bearer token, Post "https://api-int.xxia18shared.qe.devcluster.openshift.com:6443/apis/authentication.k8s.io/v1/tokenreviews?timeout=10s": write tcp 172.31.248.29:58906->172.31.248.37:6443: use of closed network connection] Expected results: 2) No pending CSR; 3) No error, node client CSRs are automatically approved by the cluster kube-controller-manager. Additional info:
Jan if in doubt sync with Zhoe, she might be able to provide you with more background information.
> Expected results: > 2) No pending CSR; > 3) No error, node client CSRs are automatically approved by the cluster kube-controller-manager. OpenShift uses cluster-machine-approver to automatically approve CSRs: https://github.com/openshift/cluster-machine-approver#node-client-csr-approval-workflow. Checking logs of must-gather.local.4215719760916163428/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-ea0726ef5ce36297001001454c6c56b13a5f7a2cb201b7cab31a42d86bf5648a/namespaces/openshift-cluster-machine-approver/pods/machine-approver-7f9d4c7d59-4gx4k/machine-approver-controller/machine-approver-controller/logs/current.log (CSRs are getting approved): ``` ... 2021-01-26T03:27:24.595130087Z I0126 03:27:24.595086 1 main.go:147] CSR csr-z72xq added 2021-01-26T03:27:24.608409139Z I0126 03:27:24.608362 1 csr_check.go:419] retrieving serving cert from xiuwang-shared-w9nc5-worker-bnnfq (172.31.247.239:10250) 2021-01-26T03:27:24.611066248Z W0126 03:27:24.611027 1 csr_check.go:178] Failed to retrieve current serving cert: remote error: tls: internal error 2021-01-26T03:27:24.611128342Z I0126 03:27:24.611118 1 csr_check.go:183] Falling back to machine-api authorization for xiuwang-shared-w9nc5-worker-bnnfq 2021-01-26T03:27:24.611161741Z I0126 03:27:24.611153 1 main.go:182] CSR csr-z72xq not authorized: IP address '172.31.247.239' not in machine addresses: 2021-01-26T03:27:24.611184159Z I0126 03:27:24.611176 1 main.go:218] Error syncing csr csr-z72xq: IP address '172.31.247.239' not in machine addresses: ... 2021-01-26T03:27:26.476956626Z I0126 03:27:26.476920 1 csr_check.go:419] retrieving serving cert from xiuwang-shared-w9nc5-worker-bnnfq (172.31.247.239:10250) 2021-01-26T03:27:26.478468707Z W0126 03:27:26.477991 1 csr_check.go:178] Failed to retrieve current serving cert: remote error: tls: internal error 2021-01-26T03:27:26.478468707Z I0126 03:27:26.478016 1 csr_check.go:183] Falling back to machine-api authorization for xiuwang-shared-w9nc5-worker-bnnfq 2021-01-26T03:27:26.489758522Z W0126 03:27:26.489724 1 warnings.go:70] certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest 2021-01-26T03:27:26.489915769Z I0126 03:27:26.489905 1 main.go:197] CSR csr-z72xq approved ... ``` Also /must-gather.local.4215719760916163428/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-ea0726ef5ce36297001001454c6c56b13a5f7a2cb201b7cab31a42d86bf5648a/namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-control-plane-1/kube-controller-manager/kube-controller-manager/logs/current.log contains: ``` 2021-01-26T01:05:47.786069064Z I0126 01:05:47.785984 1 cleaner.go:180] Cleaning CSR "csr-js5wb" as it is more than 1h0m0s old and approved. 2021-01-26T01:05:47.792841774Z I0126 01:05:47.792795 1 cleaner.go:180] Cleaning CSR "csr-knfk7" as it is more than 1h0m0s old and approved. 2021-01-26T01:05:47.800169927Z I0126 01:05:47.800126 1 cleaner.go:180] Cleaning CSR "csr-rq6mz" as it is more than 1h0m0s old and approved. 2021-01-26T01:05:47.806977519Z I0126 01:05:47.806923 1 cleaner.go:180] Cleaning CSR "csr-mc5lx" as it is more than 1h0m0s old and approved. 2021-01-26T03:05:47.814419036Z I0126 03:05:47.814254 1 cleaner.go:180] Cleaning CSR "csr-2ch8x" as it is more than 1h0m0s old and approved. 2021-01-26T03:05:47.826164361Z I0126 03:05:47.826051 1 cleaner.go:180] Cleaning CSR "csr-stddj" as it is more than 1h0m0s old and approved. 2021-01-26T05:05:47.837761737Z I0126 05:05:47.837667 1 cleaner.go:180] Cleaning CSR "csr-pkw9x" as it is more than 1h0m0s old and approved. 2021-01-26T05:05:47.852182999Z I0126 05:05:47.852138 1 cleaner.go:180] Cleaning CSR "csr-zrtr4" as it is more than 1h0m0s old and approved. 2021-01-26T05:05:47.860966303Z I0126 05:05:47.860909 1 cleaner.go:180] Cleaning CSR "csr-z72xq" as it is more than 1h0m0s old and approved. 2021-01-26T05:05:47.868394170Z I0126 05:05:47.868341 1 cleaner.go:180] Cleaning CSR "csr-hrflf" as it is more than 1h0m0s old and approved. 2021-01-26T05:05:47.876019229Z I0126 05:05:47.875970 1 cleaner.go:180] Cleaning CSR "csr-zbwhm" as it is more than 1h0m0s old and approved. 2021-01-26T05:05:47.884147927Z I0126 05:05:47.884070 1 cleaner.go:180] Cleaning CSR "csr-pd8tv" as it is more than 1h0m0s old and approved. ``` So KCM is acknowledging approved CSRs. From quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-ea0726ef5ce36297001001454c6c56b13a5f7a2cb201b7cab31a42d86bf5648a/namespaces/openshift-cluster-machine-approver/pods/machine-approver-7f9d4c7d59-4gx4k/machine-approver-controller/machine-approver-controller/logs/current.log CRS added: ``` $ cat current.log | grep "CSR .* added": 2021-01-25T03:12:45.576681774Z I0125 03:12:45.576619 1 main.go:147] CSR csr-2pnzb added 2021-01-25T03:12:45.576681774Z I0125 03:12:45.576646 1 main.go:147] CSR csr-6nr27 added 2021-01-25T03:12:45.576681774Z I0125 03:12:45.576656 1 main.go:147] CSR csr-g8nr7 added 2021-01-25T03:12:45.576714915Z I0125 03:12:45.576687 1 main.go:147] CSR csr-jhnx9 added 2021-01-25T03:12:45.576714915Z I0125 03:12:45.576696 1 main.go:147] CSR csr-v7qnn added 2021-01-25T03:12:45.576714915Z I0125 03:12:45.576705 1 main.go:147] CSR csr-vmx9h added 2021-01-25T03:12:45.576714915Z I0125 03:12:45.576711 1 main.go:147] CSR csr-wm77f added 2021-01-25T03:12:45.576724730Z I0125 03:12:45.576718 1 main.go:147] CSR csr-8qjnq added 2021-01-25T03:12:45.576732752Z I0125 03:12:45.576725 1 main.go:147] CSR csr-kqzgk added 2021-01-25T03:12:45.576739414Z I0125 03:12:45.576732 1 main.go:147] CSR csr-rxlc9 added 2021-01-25T20:38:24.835444834Z I0125 20:38:24.835391 1 main.go:147] CSR csr-5n6kw added 2021-01-25T20:38:24.860634962Z I0125 20:38:24.860591 1 main.go:147] CSR csr-5n6kw added 2021-01-25T21:26:00.756217511Z I0125 21:26:00.756157 1 main.go:147] CSR csr-qvm45 added 2021-01-25T22:16:59.256412130Z I0125 22:16:59.256354 1 main.go:147] CSR csr-tbs7h added 2021-01-25T22:16:59.267781829Z I0125 22:16:59.267745 1 main.go:147] CSR csr-tbs7h added 2021-01-25T22:16:59.283727703Z I0125 22:16:59.283690 1 main.go:147] CSR csr-tbs7h added 2021-01-25T22:25:41.854052553Z I0125 22:25:41.853968 1 main.go:147] CSR csr-jlxxt added 2021-01-25T22:37:17.628433076Z I0125 22:37:17.628374 1 main.go:147] CSR csr-d7mrj added 2021-01-25T22:37:17.646049944Z I0125 22:37:17.646009 1 main.go:147] CSR csr-d7mrj added 2021-01-25T22:43:04.990894945Z I0125 22:43:04.990825 1 main.go:147] CSR csr-l69rp added 2021-01-25T23:03:53.076274391Z I0125 23:03:53.076180 1 main.go:147] CSR csr-knfk7 added 2021-01-25T23:03:53.089004332Z I0125 23:03:53.088964 1 main.go:147] CSR csr-knfk7 added 2021-01-25T23:03:53.105461282Z I0125 23:03:53.105412 1 main.go:147] CSR csr-knfk7 added 2021-01-25T23:17:39.516617444Z I0125 23:17:39.516552 1 main.go:147] CSR csr-rq6mz added 2021-01-25T23:17:39.529278097Z I0125 23:17:39.529191 1 main.go:147] CSR csr-rq6mz added 2021-01-25T23:25:26.858634167Z I0125 23:25:26.858574 1 main.go:147] CSR csr-mc5lx added 2021-01-25T23:30:07.345276514Z I0125 23:30:07.345217 1 main.go:147] CSR csr-js5wb added 2021-01-26T01:25:20.526586583Z I0126 01:25:20.526526 1 main.go:147] CSR csr-2ch8x added 2021-01-26T01:31:00.161514200Z I0126 01:31:00.161440 1 main.go:147] CSR csr-stddj added 2021-01-26T01:31:00.175973528Z I0126 01:31:00.175932 1 main.go:147] CSR csr-stddj added 2021-01-26T03:20:37.839047884Z I0126 03:20:37.838972 1 main.go:147] CSR csr-hrflf added 2021-01-26T03:20:50.157354677Z I0126 03:20:50.157000 1 main.go:147] CSR csr-zbwhm added 2021-01-26T03:20:50.177332159Z I0126 03:20:50.177238 1 main.go:147] CSR csr-zbwhm added 2021-01-26T03:20:50.209774493Z I0126 03:20:50.209691 1 main.go:147] CSR csr-zbwhm added 2021-01-26T03:20:50.244169208Z I0126 03:20:50.244110 1 main.go:147] CSR csr-zbwhm added 2021-01-26T03:20:50.299331231Z I0126 03:20:50.299239 1 main.go:147] CSR csr-zbwhm added 2021-01-26T03:20:50.392126861Z I0126 03:20:50.392067 1 main.go:147] CSR csr-zbwhm added 2021-01-26T03:20:50.727348557Z I0126 03:20:50.727259 1 main.go:147] CSR csr-zbwhm added 2021-01-26T03:20:51.287633657Z I0126 03:20:51.287575 1 main.go:147] CSR csr-zbwhm added 2021-01-26T03:20:52.008331687Z I0126 03:20:52.008274 1 main.go:147] CSR csr-zbwhm added 2021-01-26T03:20:53.301388843Z I0126 03:20:53.301319 1 main.go:147] CSR csr-zbwhm added 2021-01-26T03:20:55.878502244Z I0126 03:20:55.878434 1 main.go:147] CSR csr-zbwhm added 2021-01-26T03:21:01.010779346Z I0126 03:21:01.010700 1 main.go:147] CSR csr-zbwhm added 2021-01-26T03:21:11.274376435Z I0126 03:21:11.274307 1 main.go:147] CSR csr-zbwhm added 2021-01-26T03:21:31.775219228Z I0126 03:21:31.775151 1 main.go:147] CSR csr-zbwhm added 2021-01-26T03:27:10.946750711Z I0126 03:27:10.946665 1 main.go:147] CSR csr-pd8tv added 2021-01-26T03:27:12.275271394Z I0126 03:27:12.275006 1 main.go:147] CSR csr-pkw9x added 2021-01-26T03:27:23.256632353Z I0126 03:27:23.256589 1 main.go:147] CSR csr-zrtr4 added 2021-01-26T03:27:24.595130087Z I0126 03:27:24.595086 1 main.go:147] CSR csr-z72xq added 2021-01-26T03:27:24.616369939Z I0126 03:27:24.616335 1 main.go:147] CSR csr-z72xq added 2021-01-26T03:27:24.646100674Z I0126 03:27:24.646061 1 main.go:147] CSR csr-z72xq added 2021-01-26T03:27:24.678818101Z I0126 03:27:24.678761 1 main.go:147] CSR csr-z72xq added 2021-01-26T03:27:24.735043858Z I0126 03:27:24.734994 1 main.go:147] CSR csr-z72xq added 2021-01-26T03:27:24.829525276Z I0126 03:27:24.829473 1 main.go:147] CSR csr-z72xq added 2021-01-26T03:27:25.166647306Z I0126 03:27:25.166609 1 main.go:147] CSR csr-z72xq added 2021-01-26T03:27:25.726858201Z I0126 03:27:25.726796 1 main.go:147] CSR csr-z72xq added 2021-01-26T03:27:26.448596834Z I0126 03:27:26.448553 1 main.go:147] CSR csr-z72xq added ``` Giving 28 different CSRs: ``` $ cat current.log | grep "CSR .* added" | cut -dC -f2 | sort -u: SR csr-2ch8x added SR csr-2pnzb added SR csr-5n6kw added SR csr-6nr27 added SR csr-8qjnq added SR csr-d7mrj added SR csr-g8nr7 added SR csr-hrflf added SR csr-jhnx9 added SR csr-jlxxt added SR csr-js5wb added SR csr-knfk7 added SR csr-kqzgk added SR csr-l69rp added SR csr-mc5lx added SR csr-pd8tv added SR csr-pkw9x added SR csr-qvm45 added SR csr-rq6mz added SR csr-rxlc9 added SR csr-stddj added SR csr-tbs7h added SR csr-v7qnn added SR csr-vmx9h added SR csr-wm77f added SR csr-z72xq added SR csr-zbwhm added SR csr-zrtr4 added ``` From which the following CSRs are approved (28): ``` $ cat current.log | grep "CSR .* approved" | sed "s/is already approved/approved/g" | cut -dC -f2 | sort -u SR csr-2ch8x approved SR csr-2pnzb approved SR csr-5n6kw approved SR csr-6nr27 approved SR csr-8qjnq approved SR csr-d7mrj approved SR csr-g8nr7 approved SR csr-hrflf approved SR csr-jhnx9 approved SR csr-jlxxt approved SR csr-js5wb approved SR csr-knfk7 approved SR csr-kqzgk approved SR csr-l69rp approved SR csr-mc5lx approved SR csr-pd8tv approved SR csr-pkw9x approved SR csr-qvm45 approved SR csr-rq6mz approved SR csr-rxlc9 approved SR csr-stddj approved SR csr-tbs7h approved SR csr-v7qnn approved SR csr-vmx9h approved SR csr-wm77f approved SR csr-z72xq approved SR csr-zbwhm approved SR csr-zrtr4 approved ```
I could not find any CSRs added but not approved. Zhou, how hard is to reproduce the issue? Next time it happens, can you check cluster-machine-approver if there are any suspicious logs? Also, alongside must-gather, can you also share output of `oc get csr -o yaml` to see which CSRs are still pending?
sure , if reproduce , I'll share the output of `oc get csr -o yaml`
Thanks for the update. I will close the issue until then. Feel free to reopen. Thanks.
In a UPI cluster, unless we can retrieve an existing certificate and prove that the CSR is requesting an identical certificate (ie no IP addresses have changed), we do not have a mechanism to securely say that the certificate is valid, as such, manual approval is required. There is nothing we can do here to automate this that would be considered to be safe and as such, this is a customer responsibility in this case. In this case, why has the customers certificate expired already? Did they scale the cluster down for some period of time perhaps?
*** This bug has been marked as a duplicate of bug 1860774 ***