Bug 1917690 - [UPI vsphere]Failed to approve the kubelet client certificate requests
Summary: [UPI vsphere]Failed to approve the kubelet client certificate requests
Keywords:
Status: CLOSED DUPLICATE of bug 1860774
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.6
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: 4.8.0
Assignee: Joel Speed
QA Contact: sunzhaohua
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-01-19 07:11 UTC by zhou ying
Modified: 2024-12-20 19:31 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-25 13:27:45 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description zhou ying 2021-01-19 07:11:22 UTC
Description of problem:
Failed to approve the kubelet client certificate requests for the UPI installation cluster after run some times.

Version-Release number of selected component (if applicable):
payload: 4.7.0-0.nightly-2021-01-18-000316

How reproducible:
Sometimes

Steps to Reproduce:
1) UPI install OCP cluster on vsphere;
2) After run some time, the node will "NotReady", check the CSR, will see pending;
3) Check the logs of KCM

Actual results:
2) [root@dhcp-140-138 ~]# oc get csr
NAME        AGE     SIGNERNAME                                    REQUESTOR                                                                   CONDITION
csr-5wcnj   3h54m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-b4q2b   3h39m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending


3) See error from the KCM:
oc logs -f po/kube-controller-manager-control-plane-2
......
I0119 02:58:10.961653       1 cert_rotation.go:88] certificate rotation detected, shutting down client connections to start using new credentials
I0119 02:58:35.097411       1 cert_rotation.go:88] certificate rotation detected, shutting down client connections to start using new credentials
E0119 02:58:35.099744       1 webhook.go:116] Failed to make webhook authenticator request: Post "https://api-int.xxia18shared.qe.devcluster.openshift.com:6443/apis/authentication.k8s.io/v1/tokenreviews?timeout=10s": write tcp 172.31.248.29:58906->172.31.248.37:6443: use of closed network connection
E0119 02:58:35.099815       1 authentication.go:53] Unable to authenticate the request due to an error: [invalid bearer token, Post "https://api-int.xxia18shared.qe.devcluster.openshift.com:6443/apis/authentication.k8s.io/v1/tokenreviews?timeout=10s": write tcp 172.31.248.29:58906->172.31.248.37:6443: use of closed network connection]

Expected results:
2) No pending CSR;
3) No error, node client CSRs are automatically approved by the cluster kube-controller-manager.


Additional info:

Comment 5 Maciej Szulik 2021-02-02 17:18:14 UTC
Jan if in doubt sync with Zhoe, she might be able to provide you with more background information.

Comment 6 Jan Chaloupka 2021-02-04 11:54:10 UTC
> Expected results:
> 2) No pending CSR;
> 3) No error, node client CSRs are automatically approved by the cluster kube-controller-manager.

OpenShift uses cluster-machine-approver to automatically approve CSRs: https://github.com/openshift/cluster-machine-approver#node-client-csr-approval-workflow.

Checking logs of must-gather.local.4215719760916163428/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-ea0726ef5ce36297001001454c6c56b13a5f7a2cb201b7cab31a42d86bf5648a/namespaces/openshift-cluster-machine-approver/pods/machine-approver-7f9d4c7d59-4gx4k/machine-approver-controller/machine-approver-controller/logs/current.log (CSRs are getting approved):
```
...
2021-01-26T03:27:24.595130087Z I0126 03:27:24.595086       1 main.go:147] CSR csr-z72xq added
2021-01-26T03:27:24.608409139Z I0126 03:27:24.608362       1 csr_check.go:419] retrieving serving cert from xiuwang-shared-w9nc5-worker-bnnfq (172.31.247.239:10250)
2021-01-26T03:27:24.611066248Z W0126 03:27:24.611027       1 csr_check.go:178] Failed to retrieve current serving cert: remote error: tls: internal error
2021-01-26T03:27:24.611128342Z I0126 03:27:24.611118       1 csr_check.go:183] Falling back to machine-api authorization for xiuwang-shared-w9nc5-worker-bnnfq
2021-01-26T03:27:24.611161741Z I0126 03:27:24.611153       1 main.go:182] CSR csr-z72xq not authorized: IP address '172.31.247.239' not in machine addresses: 
2021-01-26T03:27:24.611184159Z I0126 03:27:24.611176       1 main.go:218] Error syncing csr csr-z72xq: IP address '172.31.247.239' not in machine addresses: 
...
2021-01-26T03:27:26.476956626Z I0126 03:27:26.476920       1 csr_check.go:419] retrieving serving cert from xiuwang-shared-w9nc5-worker-bnnfq (172.31.247.239:10250)
2021-01-26T03:27:26.478468707Z W0126 03:27:26.477991       1 csr_check.go:178] Failed to retrieve current serving cert: remote error: tls: internal error
2021-01-26T03:27:26.478468707Z I0126 03:27:26.478016       1 csr_check.go:183] Falling back to machine-api authorization for xiuwang-shared-w9nc5-worker-bnnfq
2021-01-26T03:27:26.489758522Z W0126 03:27:26.489724       1 warnings.go:70] certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest
2021-01-26T03:27:26.489915769Z I0126 03:27:26.489905       1 main.go:197] CSR csr-z72xq approved
...
```

Also /must-gather.local.4215719760916163428/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-ea0726ef5ce36297001001454c6c56b13a5f7a2cb201b7cab31a42d86bf5648a/namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-control-plane-1/kube-controller-manager/kube-controller-manager/logs/current.log contains:
```
2021-01-26T01:05:47.786069064Z I0126 01:05:47.785984       1 cleaner.go:180] Cleaning CSR "csr-js5wb" as it is more than 1h0m0s old and approved.
2021-01-26T01:05:47.792841774Z I0126 01:05:47.792795       1 cleaner.go:180] Cleaning CSR "csr-knfk7" as it is more than 1h0m0s old and approved.
2021-01-26T01:05:47.800169927Z I0126 01:05:47.800126       1 cleaner.go:180] Cleaning CSR "csr-rq6mz" as it is more than 1h0m0s old and approved.
2021-01-26T01:05:47.806977519Z I0126 01:05:47.806923       1 cleaner.go:180] Cleaning CSR "csr-mc5lx" as it is more than 1h0m0s old and approved.
2021-01-26T03:05:47.814419036Z I0126 03:05:47.814254       1 cleaner.go:180] Cleaning CSR "csr-2ch8x" as it is more than 1h0m0s old and approved.
2021-01-26T03:05:47.826164361Z I0126 03:05:47.826051       1 cleaner.go:180] Cleaning CSR "csr-stddj" as it is more than 1h0m0s old and approved.
2021-01-26T05:05:47.837761737Z I0126 05:05:47.837667       1 cleaner.go:180] Cleaning CSR "csr-pkw9x" as it is more than 1h0m0s old and approved.
2021-01-26T05:05:47.852182999Z I0126 05:05:47.852138       1 cleaner.go:180] Cleaning CSR "csr-zrtr4" as it is more than 1h0m0s old and approved.
2021-01-26T05:05:47.860966303Z I0126 05:05:47.860909       1 cleaner.go:180] Cleaning CSR "csr-z72xq" as it is more than 1h0m0s old and approved.
2021-01-26T05:05:47.868394170Z I0126 05:05:47.868341       1 cleaner.go:180] Cleaning CSR "csr-hrflf" as it is more than 1h0m0s old and approved.
2021-01-26T05:05:47.876019229Z I0126 05:05:47.875970       1 cleaner.go:180] Cleaning CSR "csr-zbwhm" as it is more than 1h0m0s old and approved.
2021-01-26T05:05:47.884147927Z I0126 05:05:47.884070       1 cleaner.go:180] Cleaning CSR "csr-pd8tv" as it is more than 1h0m0s old and approved.
```

So KCM is acknowledging approved CSRs.

From quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-ea0726ef5ce36297001001454c6c56b13a5f7a2cb201b7cab31a42d86bf5648a/namespaces/openshift-cluster-machine-approver/pods/machine-approver-7f9d4c7d59-4gx4k/machine-approver-controller/machine-approver-controller/logs/current.log

CRS added:
```
$ cat current.log | grep "CSR .* added":
2021-01-25T03:12:45.576681774Z I0125 03:12:45.576619       1 main.go:147] CSR csr-2pnzb added
2021-01-25T03:12:45.576681774Z I0125 03:12:45.576646       1 main.go:147] CSR csr-6nr27 added
2021-01-25T03:12:45.576681774Z I0125 03:12:45.576656       1 main.go:147] CSR csr-g8nr7 added
2021-01-25T03:12:45.576714915Z I0125 03:12:45.576687       1 main.go:147] CSR csr-jhnx9 added
2021-01-25T03:12:45.576714915Z I0125 03:12:45.576696       1 main.go:147] CSR csr-v7qnn added
2021-01-25T03:12:45.576714915Z I0125 03:12:45.576705       1 main.go:147] CSR csr-vmx9h added
2021-01-25T03:12:45.576714915Z I0125 03:12:45.576711       1 main.go:147] CSR csr-wm77f added
2021-01-25T03:12:45.576724730Z I0125 03:12:45.576718       1 main.go:147] CSR csr-8qjnq added
2021-01-25T03:12:45.576732752Z I0125 03:12:45.576725       1 main.go:147] CSR csr-kqzgk added
2021-01-25T03:12:45.576739414Z I0125 03:12:45.576732       1 main.go:147] CSR csr-rxlc9 added
2021-01-25T20:38:24.835444834Z I0125 20:38:24.835391       1 main.go:147] CSR csr-5n6kw added
2021-01-25T20:38:24.860634962Z I0125 20:38:24.860591       1 main.go:147] CSR csr-5n6kw added
2021-01-25T21:26:00.756217511Z I0125 21:26:00.756157       1 main.go:147] CSR csr-qvm45 added
2021-01-25T22:16:59.256412130Z I0125 22:16:59.256354       1 main.go:147] CSR csr-tbs7h added
2021-01-25T22:16:59.267781829Z I0125 22:16:59.267745       1 main.go:147] CSR csr-tbs7h added
2021-01-25T22:16:59.283727703Z I0125 22:16:59.283690       1 main.go:147] CSR csr-tbs7h added
2021-01-25T22:25:41.854052553Z I0125 22:25:41.853968       1 main.go:147] CSR csr-jlxxt added
2021-01-25T22:37:17.628433076Z I0125 22:37:17.628374       1 main.go:147] CSR csr-d7mrj added
2021-01-25T22:37:17.646049944Z I0125 22:37:17.646009       1 main.go:147] CSR csr-d7mrj added
2021-01-25T22:43:04.990894945Z I0125 22:43:04.990825       1 main.go:147] CSR csr-l69rp added
2021-01-25T23:03:53.076274391Z I0125 23:03:53.076180       1 main.go:147] CSR csr-knfk7 added
2021-01-25T23:03:53.089004332Z I0125 23:03:53.088964       1 main.go:147] CSR csr-knfk7 added
2021-01-25T23:03:53.105461282Z I0125 23:03:53.105412       1 main.go:147] CSR csr-knfk7 added
2021-01-25T23:17:39.516617444Z I0125 23:17:39.516552       1 main.go:147] CSR csr-rq6mz added
2021-01-25T23:17:39.529278097Z I0125 23:17:39.529191       1 main.go:147] CSR csr-rq6mz added
2021-01-25T23:25:26.858634167Z I0125 23:25:26.858574       1 main.go:147] CSR csr-mc5lx added
2021-01-25T23:30:07.345276514Z I0125 23:30:07.345217       1 main.go:147] CSR csr-js5wb added
2021-01-26T01:25:20.526586583Z I0126 01:25:20.526526       1 main.go:147] CSR csr-2ch8x added
2021-01-26T01:31:00.161514200Z I0126 01:31:00.161440       1 main.go:147] CSR csr-stddj added
2021-01-26T01:31:00.175973528Z I0126 01:31:00.175932       1 main.go:147] CSR csr-stddj added
2021-01-26T03:20:37.839047884Z I0126 03:20:37.838972       1 main.go:147] CSR csr-hrflf added
2021-01-26T03:20:50.157354677Z I0126 03:20:50.157000       1 main.go:147] CSR csr-zbwhm added
2021-01-26T03:20:50.177332159Z I0126 03:20:50.177238       1 main.go:147] CSR csr-zbwhm added
2021-01-26T03:20:50.209774493Z I0126 03:20:50.209691       1 main.go:147] CSR csr-zbwhm added
2021-01-26T03:20:50.244169208Z I0126 03:20:50.244110       1 main.go:147] CSR csr-zbwhm added
2021-01-26T03:20:50.299331231Z I0126 03:20:50.299239       1 main.go:147] CSR csr-zbwhm added
2021-01-26T03:20:50.392126861Z I0126 03:20:50.392067       1 main.go:147] CSR csr-zbwhm added
2021-01-26T03:20:50.727348557Z I0126 03:20:50.727259       1 main.go:147] CSR csr-zbwhm added
2021-01-26T03:20:51.287633657Z I0126 03:20:51.287575       1 main.go:147] CSR csr-zbwhm added
2021-01-26T03:20:52.008331687Z I0126 03:20:52.008274       1 main.go:147] CSR csr-zbwhm added
2021-01-26T03:20:53.301388843Z I0126 03:20:53.301319       1 main.go:147] CSR csr-zbwhm added
2021-01-26T03:20:55.878502244Z I0126 03:20:55.878434       1 main.go:147] CSR csr-zbwhm added
2021-01-26T03:21:01.010779346Z I0126 03:21:01.010700       1 main.go:147] CSR csr-zbwhm added
2021-01-26T03:21:11.274376435Z I0126 03:21:11.274307       1 main.go:147] CSR csr-zbwhm added
2021-01-26T03:21:31.775219228Z I0126 03:21:31.775151       1 main.go:147] CSR csr-zbwhm added
2021-01-26T03:27:10.946750711Z I0126 03:27:10.946665       1 main.go:147] CSR csr-pd8tv added
2021-01-26T03:27:12.275271394Z I0126 03:27:12.275006       1 main.go:147] CSR csr-pkw9x added
2021-01-26T03:27:23.256632353Z I0126 03:27:23.256589       1 main.go:147] CSR csr-zrtr4 added
2021-01-26T03:27:24.595130087Z I0126 03:27:24.595086       1 main.go:147] CSR csr-z72xq added
2021-01-26T03:27:24.616369939Z I0126 03:27:24.616335       1 main.go:147] CSR csr-z72xq added
2021-01-26T03:27:24.646100674Z I0126 03:27:24.646061       1 main.go:147] CSR csr-z72xq added
2021-01-26T03:27:24.678818101Z I0126 03:27:24.678761       1 main.go:147] CSR csr-z72xq added
2021-01-26T03:27:24.735043858Z I0126 03:27:24.734994       1 main.go:147] CSR csr-z72xq added
2021-01-26T03:27:24.829525276Z I0126 03:27:24.829473       1 main.go:147] CSR csr-z72xq added
2021-01-26T03:27:25.166647306Z I0126 03:27:25.166609       1 main.go:147] CSR csr-z72xq added
2021-01-26T03:27:25.726858201Z I0126 03:27:25.726796       1 main.go:147] CSR csr-z72xq added
2021-01-26T03:27:26.448596834Z I0126 03:27:26.448553       1 main.go:147] CSR csr-z72xq added
```

Giving 28 different CSRs:
```
$ cat current.log | grep "CSR .* added" | cut -dC -f2 | sort -u:
SR csr-2ch8x added
SR csr-2pnzb added
SR csr-5n6kw added
SR csr-6nr27 added
SR csr-8qjnq added
SR csr-d7mrj added
SR csr-g8nr7 added
SR csr-hrflf added
SR csr-jhnx9 added
SR csr-jlxxt added
SR csr-js5wb added
SR csr-knfk7 added
SR csr-kqzgk added
SR csr-l69rp added
SR csr-mc5lx added
SR csr-pd8tv added
SR csr-pkw9x added
SR csr-qvm45 added
SR csr-rq6mz added
SR csr-rxlc9 added
SR csr-stddj added
SR csr-tbs7h added
SR csr-v7qnn added
SR csr-vmx9h added
SR csr-wm77f added
SR csr-z72xq added
SR csr-zbwhm added
SR csr-zrtr4 added
```

From which the following CSRs are approved (28):
```
$ cat current.log | grep "CSR .* approved" | sed "s/is already approved/approved/g" | cut -dC -f2 | sort -u
SR csr-2ch8x approved
SR csr-2pnzb approved
SR csr-5n6kw approved
SR csr-6nr27 approved
SR csr-8qjnq approved
SR csr-d7mrj approved
SR csr-g8nr7 approved
SR csr-hrflf approved
SR csr-jhnx9 approved
SR csr-jlxxt approved
SR csr-js5wb approved
SR csr-knfk7 approved
SR csr-kqzgk approved
SR csr-l69rp approved
SR csr-mc5lx approved
SR csr-pd8tv approved
SR csr-pkw9x approved
SR csr-qvm45 approved
SR csr-rq6mz approved
SR csr-rxlc9 approved
SR csr-stddj approved
SR csr-tbs7h approved
SR csr-v7qnn approved
SR csr-vmx9h approved
SR csr-wm77f approved
SR csr-z72xq approved
SR csr-zbwhm approved
SR csr-zrtr4 approved
```

Comment 7 Jan Chaloupka 2021-02-04 12:09:49 UTC
I could not find any CSRs added but not approved. Zhou, how hard is to reproduce the issue? Next time it happens, can you check cluster-machine-approver if there are any suspicious logs? Also, alongside must-gather, can you also share output of `oc get csr -o yaml` to see which CSRs are still pending?

Comment 8 zhou ying 2021-02-22 02:44:04 UTC
sure , if reproduce , I'll share the output of `oc get csr -o yaml`

Comment 9 Jan Chaloupka 2021-02-22 09:55:01 UTC
Thanks for the update. I will close the issue until then. Feel free to reopen. Thanks.

Comment 12 Joel Speed 2021-06-22 16:17:33 UTC
In a UPI cluster, unless we can retrieve an existing certificate and prove that the CSR is requesting an identical certificate (ie no IP addresses have changed), we do not have a mechanism to securely say that the certificate is valid, as such, manual approval is required.

There is nothing we can do here to automate this that would be considered to be safe and as such, this is a customer responsibility in this case.

In this case, why has the customers certificate expired already? Did they scale the cluster down for some period of time perhaps?

Comment 16 Joel Speed 2021-06-25 13:27:45 UTC

*** This bug has been marked as a duplicate of bug 1860774 ***


Note You need to log in before you can comment on or make changes to this bug.