Description of problem:
After a fresh installation of 4.1.8, I left the cluster for 25 hours and then checked node CSR approval. The CSRs do not appear to have been auto-approved by the controller.

Version-Release number of selected component (if applicable):
4.1.8 UPI on Libvirt bare metal.

How reproducible:
Install 4.1.8, leave it for 25 hours, then check the CSRs and the node's journalctl output.

CSR:
I have more than 100 Pending CSRs.

Journalctl:
[root@worker02 ~]# journalctl | grep expire
Aug 06 13:10:43 localhost.localdomain NetworkManager[873]: <info> [1565097043.0931] dhcp4 (enp1s0): expires in 582386604 seconds
Aug 06 13:15:53 worker02 systemd[1]: kubelet.service: Service RestartSec=10s expired, scheduling restart.
Aug 07 07:00:11 localhost.localdomain NetworkManager[857]: <info> [1565161211.8523] dhcp4 (enp1s0): expires in 582322436 seconds
Aug 07 13:00:20 worker02 hyperkube[870]: I0807 13:00:20.154747 870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:00:21 worker02 hyperkube[870]: I0807 13:00:21.850513 870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:00:22 worker02 hyperkube[870]: I0807 13:00:22.418064 870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:00:22 worker02 hyperkube[870]: I0807 13:00:22.816861 870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:00:23 worker02 hyperkube[870]: I0807 13:00:23.926010 870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:00:25 worker02 hyperkube[870]: I0807 13:00:25.077777 870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:00:38 worker02 hyperkube[870]: I0807 13:00:38.196123 870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:02:04 worker02 hyperkube[870]: I0807 13:02:04.828882 870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:02:09 worker02 hyperkube[870]: I0807 13:02:09.321297 870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:02:10 worker02 hyperkube[870]: I0807 13:02:10.468860 870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:02:11 worker02 hyperkube[870]: I0807 13:02:11.669977 870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:02:12 worker02 hyperkube[870]: I0807 13:02:12.116674 870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:02:12 worker02 hyperkube[870]: I0807 13:02:12.299825 870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:02:12 worker02 hyperkube[870]: I0807 13:02:12.449696 870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:02:12 worker02 hyperkube[870]: I0807 13:02:12.614234 870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:02:12 worker02 hyperkube[870]: I0807 13:02:12.768592 870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:03:26 worker02 hyperkube[870]: I0807 13:03:26.020208 870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:03:26 worker02 hyperkube[870]: I0807 13:03:26.035054 870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:03:42 worker02 hyperkube[870]: I0807 13:03:42.506574 870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:03:42 worker02 hyperkube[870]: I0807 13:03:42.521517 870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:03:56 worker02 hyperkube[870]: I0807 13:03:56.020177 870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:03:56 worker02 hyperkube[870]: I0807 13:03:56.035186 870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:04:12 worker02 hyperkube[870]: I0807 13:04:12.506578 870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:04:12 worker02 hyperkube[870]: I0807 13:04:12.521522 870 certificate_manager.go:213] Current certificate is expired.

Steps to Reproduce:
1.
2.
3.

Actual results:
Node CSRs did not get auto-approved, which caused TLS errors once the bootstrapping certificate expired after 24 hours.

Expected results:
The first rotation CSR should be auto-approved by the controller; the customer should not need to approve it manually.

Additional info:
'oc adm must-gather' output uploaded to dropbox.redhat.com.
/incoming/must-gather.local.7787674465236119942.tar.gz

[root@worker02 ~]# rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
● pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ac44808ab4dd33b4a01f20102e2ab6af3fc649ef78c91a3bd8bd1e94e8bf072a
    CustomOrigin: Managed by pivot tool
    Version: 410.8.20190724.0 (2019-07-24T20:02:52Z)

  pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:53389c9b4a00d7afebb98f7bd9d20348deb1d77ca4baf194f0ae1b582b7e965b
    CustomOrigin: Provisioned from oscontainer
    Version: 410.8.20190520.0 (2019-05-20T22:55:04Z)

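To see which requesters the pending CSRs come from, the output of `oc get csr -o json` can be summarized. The following is a minimal sketch, not part of the original report: the function name `pending_by_requester` is hypothetical, and it assumes a CSR counts as Pending when it carries neither an Approved nor a Denied condition.

```python
from collections import Counter


def pending_by_requester(csr_list: dict) -> Counter:
    """Count undecided CSRs, grouped by the requesting user.

    `csr_list` is the parsed JSON of `oc get csr -o json`.
    Assumption: a CSR is pending when it has no Approved or Denied condition.
    """
    counts = Counter()
    for item in csr_list.get("items", []):
        conditions = item.get("status", {}).get("conditions") or []
        decided = any(c.get("type") in ("Approved", "Denied") for c in conditions)
        if not decided:
            requester = item.get("spec", {}).get("username", "<unknown>")
            counts[requester] += 1
    return counts
```

Feeding it `json.load(sys.stdin)` from `oc get csr -o json` would show whether the backlog comes from node identities (`system:node:<name>`) or from another requester.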
Referring to this URL[1]:

"After you approve the initial CSRs, the subsequent node client CSRs are automatically approved by the cluster kube-controller-manager."

I am not sure whether this statement applies only when adding a RHEL compute node to the cluster, but I was also unable to find any documentation stating that CSRs must be manually approved for the first rotation. (Or I might have missed or overlooked it in our docs.)

[1]: https://access.redhat.com/documentation/en-us/openshift_container_platform/4.1/html/machine_management/adding-rhel-compute#installation-approve-csrs_adding-rhel-compute

After going through our documentation[1] again and re-reading these lines, it seems I may have confused the kubelet client certificate, which is auto-approved by the controller, with the node serving certificate, which is handled by the machine-approver. However, for a better experience, shouldn't this be auto-approved, since the node is already part of the cluster?

"3.1.2.4. Certificate signing requests management

Because your cluster has limited access to automatic machine management when you use infrastructure that you provision, you must provide a mechanism for approving cluster certificate signing requests (CSRs) after installation. The kube-controller-manager only approves the kubelet client CSRs. The machine-approver cannot guarantee the validity of a serving certificate that is requested by using kubelet credentials because it cannot confirm that the correct machine issued the request. You must determine and implement a method of verifying the validity of the kubelet serving certificate requests and approving them."

[1]: https://access.redhat.com/documentation/en-us/openshift_container_platform/4.1/html/installing/installing-on-bare-metal#installing-bare-metal

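The distinction between the two request types can be read off the CSR itself. As a rough sketch (the helper name `csr_kind` is hypothetical, and it assumes the standard Kubernetes key usages, where kubelet client CSRs request "client auth" and kubelet serving CSRs request "server auth"):

```python
def csr_kind(csr: dict) -> str:
    """Classify one CSR object (parsed JSON) as 'client', 'serving', or 'other'.

    Assumption: the classification relies only on spec.usages, which is a
    simplification; it does not inspect the embedded certificate request.
    """
    usages = set(csr.get("spec", {}).get("usages", []))
    if "server auth" in usages:
        return "serving"
    if "client auth" in usages:
        return "client"
    return "other"
```

Under this reading, the documentation quoted above covers only the "client" kind; requests classified as "serving" are the ones that need a separate approval mechanism on user-provisioned infrastructure.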
> However for better experience shouldn't this auto approve since the node already part of the cluster?

Agreed. This is so dumb.

(In reply to Ryan Sawhill from comment #3)
> > However for better experience shouldn't this auto approve since the node already part of the cluster?
>
> Agreed. This is so dumb.

I believe this is a fundamental feature and shouldn't be skipped even in an MVP. As a workaround, I need to create a cron job that approves serving-certificate rotation requests from existing nodes and skips bootstrap node CSR approval requests[1].

[1]: https://github.com/aizuddin85/openshift4/tree/master/serving-cert-approver-workaround

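The selection logic of such a cron job could look roughly like the sketch below. This is not the code from the linked repository; the function name `csrs_to_approve` and the `known_nodes` parameter are hypothetical. It assumes that rotation requests from existing nodes are submitted as `system:node:<node name>` and that anything from another requester (for example a bootstrap identity) should be left for manual review.

```python
NODE_PREFIX = "system:node:"


def csrs_to_approve(csr_list: dict, known_nodes: set) -> list:
    """Return names of pending serving CSRs requested by already-known nodes.

    `csr_list` is the parsed JSON of `oc get csr -o json`; `known_nodes` is
    the set of node names already in the cluster (e.g. from `oc get nodes`).
    """
    names = []
    for item in csr_list.get("items", []):
        spec = item.get("spec", {})
        if item.get("status", {}).get("conditions"):
            continue  # already approved or denied
        username = spec.get("username", "")
        if not username.startswith(NODE_PREFIX):
            continue  # not a node identity; skipped, not approved
        if username[len(NODE_PREFIX):] not in known_nodes:
            continue  # requester is not a node we already know about
        if "server auth" not in spec.get("usages", []):
            continue  # only serving-certificate requests are considered
        names.append(item["metadata"]["name"])
    return names
```

Each returned name could then be passed to `oc adm certificate approve <name>`. Note that this sketch does not inspect the certificate request itself (for example its subject alternative names), so it does not address the verification concern raised in the documentation.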
Can you provide me with the full output from oc adm must-gather from your cluster?
(In reply to Maciej Szulik from comment #5)
> Can you provide me with the full output from oc adm must-gather from your
> cluster?

Due to size constraints, I have already uploaded it to our dropbox:

Additional info:
'oc adm must-gather' output uploaded to dropbox.redhat.com.
/incoming/must-gather.local.7787674465236119942.tar.gz

The cloud team owns the auto-approver.
Muhammad Aizuddin Zali, can you attach the must gather tar file (/incoming/must-gather.local.7787674465236119942.tar.gz ) into this issue?
This is a known, documented limitation of UPI installs. The cluster-machine-approver relies on data from the machine-api to authorize CSRs. When that data is not available, it does not perform the authorization. However, we're exploring ways of handling renewals without the need for the machine-api. See: https://bugzilla.redhat.com/show_bug.cgi?id=1737611

*** This bug has been marked as a duplicate of bug 1737611 ***