Bug 1738568 - Rotating node serving CSR did not get auto-approved by operator.
Summary: Rotating node serving CSR did not get auto-approved by operator.
Keywords:
Status: CLOSED DUPLICATE of bug 1737611
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.1.z
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.2.0
Assignee: Jan Chaloupka
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-08-07 13:34 UTC by Muhammad Aizuddin Zali
Modified: 2019-08-27 10:38 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-27 10:38:12 UTC
Target Upstream Version:
Embargoed:


Description Muhammad Aizuddin Zali 2019-08-07 13:34:51 UTC
Description of problem:

After a fresh installation of 4.1.8, I left the cluster for 25 hours and then checked the node CSRs. It seems the CSRs did not get auto-approved by the controller.



Version-Release number of selected component (if applicable):
4.1.8 UPI on Libvirt baremetal.

How reproducible:
Install 4.1.8, leave it for 25 hours and check CSR and node journalctl.

CSR:
I have more than 100 pending CSRs.

Journalctl:

[root@worker02 ~]# journalctl  | grep expire
Aug 06 13:10:43 localhost.localdomain NetworkManager[873]: <info>  [1565097043.0931] dhcp4 (enp1s0):   expires in 582386604 seconds
Aug 06 13:15:53 worker02 systemd[1]: kubelet.service: Service RestartSec=10s expired, scheduling restart.
Aug 07 07:00:11 localhost.localdomain NetworkManager[857]: <info>  [1565161211.8523] dhcp4 (enp1s0):   expires in 582322436 seconds
Aug 07 13:00:20 worker02 hyperkube[870]: I0807 13:00:20.154747     870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:00:21 worker02 hyperkube[870]: I0807 13:00:21.850513     870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:00:22 worker02 hyperkube[870]: I0807 13:00:22.418064     870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:00:22 worker02 hyperkube[870]: I0807 13:00:22.816861     870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:00:23 worker02 hyperkube[870]: I0807 13:00:23.926010     870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:00:25 worker02 hyperkube[870]: I0807 13:00:25.077777     870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:00:38 worker02 hyperkube[870]: I0807 13:00:38.196123     870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:02:04 worker02 hyperkube[870]: I0807 13:02:04.828882     870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:02:09 worker02 hyperkube[870]: I0807 13:02:09.321297     870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:02:10 worker02 hyperkube[870]: I0807 13:02:10.468860     870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:02:11 worker02 hyperkube[870]: I0807 13:02:11.669977     870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:02:12 worker02 hyperkube[870]: I0807 13:02:12.116674     870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:02:12 worker02 hyperkube[870]: I0807 13:02:12.299825     870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:02:12 worker02 hyperkube[870]: I0807 13:02:12.449696     870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:02:12 worker02 hyperkube[870]: I0807 13:02:12.614234     870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:02:12 worker02 hyperkube[870]: I0807 13:02:12.768592     870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:03:26 worker02 hyperkube[870]: I0807 13:03:26.020208     870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:03:26 worker02 hyperkube[870]: I0807 13:03:26.035054     870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:03:42 worker02 hyperkube[870]: I0807 13:03:42.506574     870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:03:42 worker02 hyperkube[870]: I0807 13:03:42.521517     870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:03:56 worker02 hyperkube[870]: I0807 13:03:56.020177     870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:03:56 worker02 hyperkube[870]: I0807 13:03:56.035186     870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:04:12 worker02 hyperkube[870]: I0807 13:04:12.506578     870 certificate_manager.go:213] Current certificate is expired.
Aug 07 13:04:12 worker02 hyperkube[870]: I0807 13:04:12.521522     870 certificate_manager.go:213] Current certificate is expired.



Actual results:
The node CSRs did not get auto-approved, which caused TLS errors once the bootstrap certificate expired after 24 hours.


Expected results:
The first rotation CSR should be auto-approved by the controller, and the customer should not need to approve it manually.

Additional info:
'oc adm must-gather' output uploaded to dropbox.redhat.com. /incoming/must-gather.local.7787674465236119942.tar.gz


[root@worker02 ~]# rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
● pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ac44808ab4dd33b4a01f20102e2ab6af3fc649ef78c91a3bd8bd1e94e8bf072a
              CustomOrigin: Managed by pivot tool
                   Version: 410.8.20190724.0 (2019-07-24T20:02:52Z)

  pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:53389c9b4a00d7afebb98f7bd9d20348deb1d77ca4baf194f0ae1b582b7e965b
              CustomOrigin: Provisioned from oscontainer
                   Version: 410.8.20190520.0 (2019-05-20T22:55:04Z)

Comment 1 Muhammad Aizuddin Zali 2019-08-07 13:40:02 UTC
The documentation at [1] says: "After you approve the initial CSRs, the subsequent node client CSRs are automatically approved by the cluster kube-controller-manager."

I am not sure whether this statement applies only when adding a RHEL compute node to the cluster, but I was also unable to find any information saying that the CSR must be manually approved for the first rotation. (Or I might have missed or overlooked this in our docs.)





[1]: https://access.redhat.com/documentation/en-us/openshift_container_platform/4.1/html/machine_management/adding-rhel-compute#installation-approve-csrs_adding-rhel-compute.

Comment 2 Muhammad Aizuddin Zali 2019-08-08 06:48:52 UTC
After going through our documentation[1] again and re-reading the lines below, it seems I confused the kubelet client certificate, which is auto-approved by the controller, with the node serving certificate, which is handled by the machine-approver. However, for a better experience, shouldn't this be auto-approved, since the node is already part of the cluster?


"3.1.2.4. Certificate signing requests management
Because your cluster has limited access to automatic machine management when you use infrastructure that you provision, you must provide a mechanism for approving cluster certificate signing requests (CSRs) after installation. The kube-controller-manager only approves the kubelet client CSRs. The machine-approver cannot guarantee the validity of a serving certificate that is requested by using kubelet credentials because it cannot confirm that the correct machine issued the request. You must determine and implement a method of verifying the validity of the kubelet serving certificate requests and approving them."



[1]:https://access.redhat.com/documentation/en-us/openshift_container_platform/4.1/html/installing/installing-on-bare-metal#installing-bare-metal

Comment 3 Ryan Sawhill 2019-08-08 19:23:29 UTC
> However, for a better experience, shouldn't this be auto-approved, since the node is already part of the cluster?

Agreed. This is so dumb.

Comment 4 Muhammad Aizuddin Zali 2019-08-08 19:32:58 UTC
(In reply to Ryan Sawhill from comment #3)
> > However, for a better experience, shouldn't this be auto-approved, since the node is already part of the cluster?
> 
> Agreed. This is so dumb.

I believe this is a fundamental feature that shouldn't be skipped, even in an MVP.

As a workaround, I had to create a cron job that approves the serving certificate rotation requests from existing nodes and skips the bootstrap node CSR approval requests[1].

[1]:https://github.com/aizuddin85/openshift4/tree/master/serving-cert-approver-workaround
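
The selection logic that such a cron job needs can be sketched roughly as follows. This is not the code from the linked repository; it is a minimal illustration that assumes serving CSRs from existing nodes are requested as `system:node:<node name>` in the `system:nodes` group with the `server auth` usage, while bootstrap CSRs come from a service account. The function names are hypothetical, and the check is weaker than the verification the documentation asks for, since it does not inspect the names requested inside the certificate.

```python
import json
import subprocess

NODE_PREFIX = "system:node:"
NODES_GROUP = "system:nodes"


def is_pending(csr):
    # A CSR without any status conditions has been neither approved nor denied.
    return not (csr.get("status") or {}).get("conditions")


def is_node_serving_csr(csr, known_nodes):
    # Assumption: a rotation request for a serving certificate is made by the
    # kubelet's own identity, not by the node-bootstrapper service account.
    spec = csr.get("spec", {})
    username = spec.get("username", "")
    if not username.startswith(NODE_PREFIX):
        return False
    if NODES_GROUP not in spec.get("groups", []):
        return False
    if "server auth" not in spec.get("usages", []):
        return False
    # Only accept requests from nodes that already belong to the cluster.
    return username[len(NODE_PREFIX):] in known_nodes


def select_for_approval(csr_list, known_nodes):
    # Return the names of pending serving CSRs that come from known nodes.
    return [
        csr["metadata"]["name"]
        for csr in csr_list.get("items", [])
        if is_pending(csr) and is_node_serving_csr(csr, known_nodes)
    ]


def oc_json(*args):
    # Run an oc command with JSON output and parse the result.
    result = subprocess.run(
        ["oc", *args, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def main():
    nodes = {n["metadata"]["name"] for n in oc_json("get", "nodes")["items"]}
    for name in select_for_approval(oc_json("get", "csr"), nodes):
        # Dry run: print instead of approving. To approve, replace with
        # subprocess.run(["oc", "adm", "certificate", "approve", name], check=True)
        print("would approve:", name)
```

Calling `main()` on a host with a logged-in `oc` client lists the CSRs the filter would approve. It is left as a dry run on purpose, so the selection can be reviewed before any approval is automated.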

Comment 5 Maciej Szulik 2019-08-19 14:00:33 UTC
Can you provide me with the full output from oc adm must-gather from your cluster?

Comment 6 Muhammad Aizuddin Zali 2019-08-19 14:31:19 UTC
(In reply to Maciej Szulik from comment #5)
> Can you provide me with the full output from oc adm must-gather from your
> cluster?
Due to the size constraint, I already uploaded it to our dropbox.

Additional info:
'oc adm must-gather' output uploaded to dropbox.redhat.com. /incoming/must-gather.local.7787674465236119942.tar.gz

Comment 9 Michal Fojtik 2019-08-26 12:45:57 UTC
The cloud team owns the auto-approver.

Comment 10 Jan Chaloupka 2019-08-26 12:48:15 UTC
Muhammad Aizuddin Zali, can you attach the must-gather tar file (/incoming/must-gather.local.7787674465236119942.tar.gz) to this issue?

Comment 12 Brad Ison 2019-08-27 10:38:12 UTC
This is a known, documented limitation on UPI installs. The cluster-machine-approver relies on data from the machine-api to authorize CSRs. When that data is not available, it doesn't perform the authorization. However, we're exploring ways of handling renewals without the need for the machine-api.

See: https://bugzilla.redhat.com/show_bug.cgi?id=1737611

*** This bug has been marked as a duplicate of bug 1737611 ***

