Bug 1693951

Summary: TLS errors due to expired kubelet certificates after node was shutdown
Product: OpenShift Container Platform
Reporter: Gerard Braad (Red Hat) <gbraad>
Component: Node
Assignee: Ryan Phillips <rphillips>
Status: CLOSED ERRATA
QA Contact: Sunil Choudhary <schoudha>
Severity: low
Docs Contact:
Priority: medium
Version: 4.2.0
CC: ablum, anjan, aos-bugs, cfergeau, dconsoli, eparis, erich, fbrychta, jokerman, jrosenta, lbednar, lmohanty, maszulik, mfojtik, mfuruta, mmccomas, prkumar, rh-container, rphillips, sapandit, scuppett, tnozicka, veillard, vlaad, wking, yinzhou
Target Milestone: ---
Keywords: Reopened
Target Release: 4.4.0
Hardware: All
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-05-21 19:16:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1735180
Bug Blocks:

Description Gerard Braad (Red Hat) 2019-03-29 06:52:28 UTC
Description of problem:

After the VM has been shut down for a period of time, it is unable to communicate with the API server and logs TLS errors.

Version-Release number of the following components:
openshift 4.x, installer 0.15.0

How reproducible:

Steps to Reproduce:
1. Install OpenShift, preferably with a single master and a worker.
2. After install, shut down the VM for a long enough period (perhaps 3+ hours).
3. Restart the cluster.


Actual results:
[root@test1-svtfv-master-0 ~]# cd /var/lib/kubelet/pki/
[root@test1-svtfv-master-0 pki]# openssl x509 -in kubelet-client-current.pem -noout -text > kubelet-client-current.txt
[root@test1-svtfv-master-0 pki]# openssl x509 -in kubelet-server-current.pem -noout -text > kubelet-server-current.txt
[root@test1-svtfv-master-0 pki]# cat kubelet-client-current.txt | grep Not
            Not Before: Mar 27 06:57:00 2019 GMT
            Not After : Mar 28 06:39:47 2019 GMT
[root@test1-svtfv-master-0 pki]# cat kubelet-server-current.txt | grep Not
            Not Before: Mar 27 07:02:00 2019 GMT
            Not After : Mar 28 06:39:39 2019 GMT
[root@test1-svtfv-master-0 pki]# date
vr 29 mrt 2019  5:21:18 UTC
[root@test1-svtfv-master-0 pki]# 

This causes "tls: internal error" to be logged.
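For reference, the validity window in the output above can be worked out with a few lines of shell. This is only a sketch: it assumes GNU date (for `-d` parsing of openssl-style timestamps), and the function names `to_epoch`, `validity_seconds`, and `expired_at` are made up for illustration.

```shell
#!/usr/bin/env bash
# Assumes GNU date, which can parse timestamps such as "Mar 28 06:39:47 2019 GMT".

# Convert an openssl-style timestamp to epoch seconds.
to_epoch() {
  date -u -d "$1" +%s
}

# Print the length of the validity window (Not Before .. Not After) in seconds.
validity_seconds() {
  echo $(( $(to_epoch "$2") - $(to_epoch "$1") ))
}

# Succeed if a certificate with Not After = $1 is already expired at time $2.
expired_at() {
  [ "$(to_epoch "$2")" -gt "$(to_epoch "$1")" ]
}

# Values copied from the kubelet-client-current.pem output and the `date` call above.
not_before="Mar 27 06:57:00 2019 GMT"
not_after="Mar 28 06:39:47 2019 GMT"
now="Mar 29 05:21:18 2019 UTC"

echo "validity: $(validity_seconds "$not_before" "$not_after") seconds"
if expired_at "$not_after" "$now"; then
  echo "certificate was already expired at: $now"
fi
```

With these values the window comes out a little under 24 hours, and the certificate had expired roughly a day before the node was started again.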


Expected results:
No TLS error

Comment 1 Gerard Braad (Red Hat) 2019-03-29 06:52:57 UTC
Also filed as: https://github.com/openshift/installer/issues/1494

Comment 2 W. Trevor King 2019-03-29 11:29:03 UTC
So it looks like those were one-day certs.  I don't know how often they are rotated.  What installer and release image were you using?  The 0.15.0 installer sets up the kubelet client with a one-day cert [1] (and probably the kubelet server too, although I haven't tracked down a link).  Then the cluster rotates the certs with... something.  For example, see [2] for the Kubernetes API server and related rotations.  But nothing in [3] is jumping out at me as the kubelet certs you're having issues with.  Moving to the auth team, since they'll probably know, although the code itself may live in a master-team repo.

But if you want to shut down nodes, you'll certainly want to wait after the initial install, for a whole day or however long it takes for the first in-cluster rotations to go through, to get certs with longer validity times before shutting down nodes.  The auth/master teams may also have some advice for monitoring those rotations, even if it's just "grep the kube-apiserver-operator logs".  Maybe there are Kubernetes Events you can watch for?  I dunno.
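One way to check for "certs with longer validity" before shutting a node down is openssl's `-checkend` option, which exits 0 only if the certificate is still valid the given number of seconds from now. The following is a rough sketch, not an official procedure: the function name `cert_outlasts` and the 3-day downtime are illustrative assumptions, and the path is the one used in the description.

```shell
#!/usr/bin/env bash
# Sketch: only shut the node down if the kubelet client certificate
# stays valid for the whole planned downtime.

# Succeeds if the certificate in file $1 is still valid $2 seconds from now.
# `openssl x509 -checkend N` exits 0 when the cert does not expire within N seconds.
cert_outlasts() {
  local cert_file=$1 downtime_seconds=$2
  openssl x509 -in "$cert_file" -noout -checkend "$downtime_seconds" >/dev/null
}

# Path taken from the description; the 3-day downtime is an assumption.
cert=/var/lib/kubelet/pki/kubelet-client-current.pem
planned_downtime=$(( 3 * 24 * 3600 ))

if [ -r "$cert" ]; then
  if cert_outlasts "$cert" "$planned_downtime"; then
    echo "ok: certificate outlasts the planned downtime"
  else
    echo "wait: certificate would expire while the node is down"
  fi
fi
```

This only inspects the certificate on disk; it says nothing about whether or when the cluster will rotate it.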

Alternatively, you can just let the certs expire, and when the cluster comes back up, use SSH (which we don't expire/rotate) to go through and rebuild the x.509 chains.  I know that sort of thing has been discussed before, but don't have a reference handy at the moment.  I'll see if I can dig up a link later.

[1]: https://github.com/openshift/installer/blob/v0.15.0/pkg/asset/tls/kubelet.go#L184
[2]: https://github.com/openshift/cluster-kube-apiserver-operator/pull/342
[3]: https://github.com/openshift/cluster-kube-apiserver-operator/blob/0b686ff00295c382f245b0b4103a566d672498c8/pkg/operator/certrotationcontroller/certrotationcontroller.go

Comment 3 Gerard Braad (Red Hat) 2019-03-29 11:52:15 UTC
>  But if you want to shut down nodes, you'll certainly want to wait after the initial install, for a whole day or however long it takes for the first in-cluster rotations to go through,

That is unacceptable, as this would be part of a delivery pipeline.


> use SSH (which we don't expire/rotate) to go through and rebuild the x.509 chains.

This pre-provisioning of the certificates is what we prefer, as it also allows us to create certs that are valid for a longer period. So far we have not seen or received any instructions about this.

Comment 4 W. Trevor King 2019-03-29 12:21:07 UTC
For recovery, I may have been remembering this ask [1], although there was no further discussion there.  There's another ask for a manual rotation trigger in [2].  I'm not aware of procedures for either, but there may be more-specific trackers somewhere that I just haven't turned up (or maybe not :p).

[1]: https://github.com/openshift/installer/blob/v0.15.0/docs/user/troubleshooting.md#unable-to-ssh-into-master-nodes
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1684547#c27

Comment 5 W. Trevor King 2019-03-29 12:22:50 UTC
Oops, stale paste.  [1] above should have been:

https://github.com/openshift/api/pull/199#discussion_r261689426

Comment 6 W. Trevor King 2019-03-29 12:56:25 UTC
I hear kubelet certs are the Pod team, so reassigning to see if they can link the rotation code and/or have ideas about triggering, monitoring, or recovering cert rotation.

Comment 7 W. Trevor King 2019-03-29 13:09:12 UTC
Recovery tool now has a tracker in bug 1694079.

Comment 8 Praveen Kumar 2019-03-29 13:50:55 UTC
In my case I shut down the VM for less than 2 hours and started it again, and it was working. But when I checked the kubelet cert, it was valid for only about 3 hours. Does that mean that during the first 24 hours, until the kubelet gets a proper 30-day cert, it rotates every 2-3 hours?

```
$ oc get nodes
NAME                   STATUS   ROLES           AGE   VERSION
test1-svtfv-master-0   Ready    master,worker   47h   v1.12.4+30e6a0f55

# cat kubelet-client-current.txt | grep Not
            Not Before: Mar 29 05:58:00 2019 GMT
            Not After : Mar 29 08:43:34 2019 GMT

# date
Fri Mar 29 06:27:12 UTC 2019
```

Comment 9 Seth Jennings 2019-03-29 14:14:35 UTC
This bug is strange because you can't have a single master setup in OCP 4.x.  HA is required; 3 master minimum.

This is a known issue with rapidly rotating the kubelet client/server certs.  If the kubelet is down during the time it would normally do the rotation and doesn't come back up before the existing certs expire, then the kubelet is unable to connect to the apiserver after that.

I think I'm going to dup this to the recovery tool tracker because it does take external intervention to resolve this situation.

*** This bug has been marked as a duplicate of bug 1694079 ***

Comment 11 Seth Jennings 2019-04-01 19:20:48 UTC
This was changed 5 days ago to a 60 day validity time and 30 day rotation time
https://github.com/openshift/cluster-kube-controller-manager-operator/pull/203

Comment 14 W. Trevor King 2019-04-02 03:05:36 UTC
I dunno if the commit is available in a nightly yet, but PR 203 has certainly landed, so this should be at least MODIFIED.

Comment 15 Praveen Kumar 2019-04-02 05:47:31 UTC
So I tried the installer from master, which has this PR in the payload for the controller manager operator, but I still had to wait around 24 hours until the kubelet client/server certs were actually rotated to one-month validity.


```
$ openshift-install version
openshift-install unreleased-master-663-g086a88534ad03776c97e31a843658e53e0088e78
built from commit 086a88534ad03776c97e31a843658e53e0088e78

[root@test1-sbgjm-master-0 pki]# uptime -p
up 14 hours, 32 minutes

[root@test1-sbgjm-master-0 pki] # openssl x509 -in kubelet-client-2019-03-31-13-57-00.pem -noout -text | grep Not
            Not Before: Mar 31 13:52:00 2019 GMT
            Not After : Apr  1 13:34:52 2019 GMT
            
[root@test1-sbgjm-master-0 pki]# uptime -p
up 1 day, 1 hours, 51 minutes

[root@test1-sbgjm-master-0 pki] # openssl x509 -in kubelet-client-2019-04-01-11-04-04.pem -noout -text | grep Not
            Not Before: Apr  1 10:59:00 2019 GMT
            Not After : May  1 08:50:40 2019 GMT
			
[root@test1-sbgjm-master-0 pki] # openssl x509 -in kubelet-server-2019-04-01-09-03-42.pem  -noout -text | grep Not
            Not Before: Apr  1 08:59:00 2019 GMT
            Not After : May  1 08:51:02 2019 GMT

```

So I am still wondering about my question in https://bugzilla.redhat.com/show_bug.cgi?id=1693951#c8: why was the kubelet cert valid for only about 3 hours, and does it rotate every 2-3 hours during the first 24 hours until the kubelet gets a proper 30-day cert?

Comment 16 Praveen Kumar 2019-04-02 05:48:58 UTC
(In reply to Praveen Kumar from comment #15)

Forgot to add the payload info for this installer binary.

```
$ oc adm release info --commits | grep cluster-kube-controller-manager-operator
  cluster-kube-controller-manager-operator      https://github.com/openshift/cluster-kube-controller-manager-operator      4e5073f837b1262db0c390c30b275b293db0b469
```

Comment 17 Maciej Szulik 2019-04-09 14:27:20 UTC
The solution will be provided in this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1694079

*** This bug has been marked as a duplicate of bug 1694079 ***

Comment 18 Daniel Veillard 2019-06-05 07:20:05 UTC
This is reopened. It was closed as a duplicate of bug 1694079.

Bug 1694079 is now closed with fixes, but when we tried to run the resulting script it failed in both of the tests we ran, which were based on CodeReady Containers. In addition, the instructions ask to wait
15 minutes plus 20 minutes, so the operation takes at least 35 minutes. That may be fine for an
online cluster, but it is absolutely not adequate for a developer waiting for his environment to start.

https://docs.google.com/document/d/1ONkxdDmQVLBNJrSJymfKPrndo7b4vgCA2zwL9xHYx6A/edit

So reopened; the solution for 1694079 is not adequate for this bug.

Daniel Veillard

Comment 20 Daniel Veillard 2019-06-05 12:31:49 UTC
The problem related to bug 1694079 was pasted there; the fix didn't work for us, but the bug was automatically closed nonetheless.
I think we have provided all the required information at this point.

Comment 21 Tomáš Nožička 2019-07-01 06:35:20 UTC
I think this bug is now outdated. If a machine is kept turned off long enough for its certs to expire, the recommended path is to run the certificate recovery steps:

  https://docs.openshift.com/container-platform/4.1/disaster_recovery/scenario-3-expired-certs.html

Comment 22 Daniel Veillard 2019-07-19 14:50:10 UTC
Tomáš, the document indicates at least 35 minutes for the recovery process to succeed. We can't expect developers to wait that long for their cluster to show up.

Comment 23 Anjan 2019-08-06 15:33:51 UTC
So I tried the forced certificate rotation steps from the above gdoc. I'm not sure what I am doing wrong, but it does not rotate the certs in the cluster.
I am using the libvirt build for testing this, and these are the steps that I followed:

```
# validity is 30 times the base (30*9000s = 270000s)
oc create -n openshift-config configmap unsupported-cert-rotation-config --from-literal='base=9000s'

# forcing rotation
oc get secret -A -o json | jq -r '.items[] | select(.metadata.annotations."auth.openshift.io/certificate-not-after" | .!=null and fromdateiso8601<='$( date --date='+1year' +%s )') | "-n \(.metadata.namespace) \(.metadata.name)"' | xargs -n3 oc patch secret -p='{"metadata": {"annotations": {"auth.openshift.io/certificate-not-after": null}}}'

# Wait ~ 5-10 minutes

# Make sure at least the apiserver serving cert has 15 min validity (change your cluster name based on your kubeconfig)
openssl s_client -connect api.tnozicka-1.devcluster.openshift.com:6443 | openssl x509 -noout -dates
```
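To make the selection logic in the "forcing rotation" step easier to follow, here is the same jq filter run against a small hand-written sample instead of live cluster data. This is a sketch under assumptions: the sample JSON and the function name `select_rotatable` are hypothetical, and the cutoff is passed in explicitly rather than computed with `date --date='+1year'`.

```shell
#!/usr/bin/env bash
# Print "-n <namespace> <name>" for every secret whose not-after annotation
# is set and falls on or before the cutoff (epoch seconds, passed as $1).
select_rotatable() {
  local cutoff_epoch=$1
  jq -r --argjson cutoff "$cutoff_epoch" '
    .items[]
    | select(.metadata.annotations."auth.openshift.io/certificate-not-after"
             | . != null and fromdateiso8601 <= $cutoff)
    | "-n \(.metadata.namespace) \(.metadata.name)"'
}

# Hypothetical sample standing in for the output of `oc get secret -A -o json`.
sample='{"items":[
  {"metadata":{"namespace":"openshift-kube-apiserver","name":"kubelet-client",
    "annotations":{"auth.openshift.io/certificate-not-after":"2019-07-22T08:54:06Z"}}},
  {"metadata":{"namespace":"default","name":"no-annotation"}}
]}'

# 1577836800 is 2020-01-01T00:00:00Z.
printf '%s' "$sample" | select_rotatable 1577836800
```

Secrets without the annotation are skipped, and each selected secret is printed as three words, which is why the original pipeline feeds the result to `xargs -n3`.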

Actual output I got:
--------------------
```
$ ./oc create -n openshift-config configmap unsupported-cert-rotation-config --from-literal='base=9000s'
configmap/unsupported-cert-rotation-config created

$ ./oc get secret -A -o json | jq -r '.items[] | select(.metadata.annotations."auth.openshift.io/certificate-not-after" | .!=null and fromdateiso8601<='$( date --date='+1year' +%s )') | "-n \(.metadata.namespace) \(.metadata.name)"' | xargs -n3 ./oc patch secret -p='{"metadata": {"annotations": {"auth.openshift.io/certificate-not-after": null}}}'
secret/kube-controller-manager-client-cert-key patched
secret/kube-scheduler-client-cert-key patched
secret/aggregator-client-signer patched
secret/kube-apiserver-to-kubelet-signer patched
secret/kube-control-plane-signer patched
secret/aggregator-client patched
secret/external-loadbalancer-serving-certkey patched
secret/internal-loadbalancer-serving-certkey patched
secret/kube-apiserver-cert-syncer-client-cert-key patched
secret/kube-apiserver-cert-syncer-client-cert-key-2 patched
secret/kube-apiserver-cert-syncer-client-cert-key-3 patched
secret/kube-apiserver-cert-syncer-client-cert-key-4 patched
secret/kube-apiserver-cert-syncer-client-cert-key-5 patched
secret/kube-apiserver-cert-syncer-client-cert-key-6 patched
secret/kubelet-client patched
secret/kubelet-client-2 patched
secret/kubelet-client-3 patched
secret/kubelet-client-4 patched
secret/kubelet-client-5 patched
secret/kubelet-client-6 patched
secret/localhost-serving-cert-certkey patched
secret/service-network-serving-certkey patched
secret/csr-signer patched
secret/csr-signer-signer patched
secret/kube-controller-manager-client-cert-key patched
secret/kube-controller-manager-client-cert-key-2 patched
secret/kube-controller-manager-client-cert-key-3 patched
secret/kube-controller-manager-client-cert-key-4 patched
secret/kube-controller-manager-client-cert-key-5 patched
secret/kube-scheduler-client-cert-key patched
secret/kube-scheduler-client-cert-key-2 patched
secret/kube-scheduler-client-cert-key-3 patched
secret/kube-scheduler-client-cert-key-4 patched
secret/kube-scheduler-client-cert-key-5 patched


# Inside the VM that the installer created, certs are only valid for 1 day
[core@crc-kmrrq-master-0 ~]$ sudo su
[root@crc-kmrrq-master-0 core]# openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -dates
notBefore=May 31 11:14:00 2019 GMT
notAfter=Jun  1 11:05:06 2019 GMT

```
Moreover, the process described in the document https://docs.google.com/document/d/1ONkxdDmQVLBNJrSJymfKPrndo7b4vgCA2zwL9xHYx6A/edit# deals with recovering a cluster whose certificates have expired.
For the CRC use case we want to force the rotation, so that we don't have to wait 20-24 hours for the certs to be rotated and become valid for 30 days, and so that we can automate our bundle generation process.

Comment 24 Anjan 2019-08-08 11:56:09 UTC
Please ignore the previous comment.

After following the docs at https://docs.openshift.com/container-platform/4.1/disaster_recovery/scenario-3-expired-certs.html#dr-scenario-3-recovering-expired-certs_dr-recovering-expired-certs
I am not able to recover the cluster. The `kubelet` is still not able to find the node; below you can see the logs from the kubelet:

```
Aug 08 07:18:27 crc-cvgnz-master-0 hyperkube[8455]: E0808 07:18:27.293620    8455 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://api.crc.testing:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dcrc-cvgnz-master-0&limit=500&resourceVersion=0: x509: certificate has expired or is not yet valid
Aug 08 07:18:27 crc-cvgnz-master-0 hyperkube[8455]: E0808 07:18:27.299748    8455 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get https://api.crc.testing:6443/api/v1/services?limit=500&resourceVersion=0: x509: certificate has expired or is not yet valid
Aug 08 07:18:27 crc-cvgnz-master-0 hyperkube[8455]: E0808 07:18:27.302810    8455 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:453: Failed to list *v1.Node: Get https://api.crc.testing:6443/api/v1/nodes?fieldSelector=metadata.name%3Dcrc-cvgnz-master-0&limit=500&resourceVersion=0: x509: certificate has expired or is not yet valid
Aug 08 07:18:27 crc-cvgnz-master-0 hyperkube[8455]: E0808 07:18:27.365414    8455 kubelet.go:2274] node "crc-cvgnz-master-0" not found
Aug 08 07:18:27 crc-cvgnz-master-0 hyperkube[8455]: E0808 07:18:27.465569    8455 kubelet.go:2274] node "crc-cvgnz-master-0" not found
Aug 08 07:18:27 crc-cvgnz-master-0 hyperkube[8455]: E0808 07:18:27.565772    8455 kubelet.go:2274] node "crc-cvgnz-master-0" not found
Aug 08 07:18:27 crc-cvgnz-master-0 hyperkube[8455]: E0808 07:18:27.665910    8455 kubelet.go:2274] node "crc-cvgnz-master-0" not found
Aug 08 07:18:27 crc-cvgnz-master-0 hyperkube[8455]: E0808 07:18:27.766073    8455 kubelet.go:2274] node "crc-cvgnz-master-0" not found
Aug 08 07:18:27 crc-cvgnz-master-0 hyperkube[8455]: E0808 07:18:27.866251    8455 kubelet.go:2274] node "crc-cvgnz-master-0" not found
Aug 08 07:18:27 crc-cvgnz-master-0 hyperkube[8455]: E0808 07:18:27.966425    8455 kubelet.go:2274] node "crc-cvgnz-master-0" not found
```

Details of the steps I followed:
--------------------------------
RELEASE_IMAGE=quay.io/openshift-release-dev/ocp-release:4.1.3
KAO_IMAGE=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7c93e979f8b062841393470e3710c58245e47bf9cf0685ba0c6f95912c6d7882

```
[root@crc-cvgnz-master-0 core]# podman run -it --network=host -v /etc/kubernetes/:/etc/kubernetes/:Z --entrypoint=/usr/bin/cluster-kube-apiserver-operator "${KAO_IMAGE}" recovery-apiserver create
I0808 08:42:49.569561       1 apiserver.go:215] Recovery apiserver certificates will be valid for 168h0m0s
I0808 08:42:50.089333       1 create.go:82] To access the server.
I0808 08:42:50.089413       1 create.go:83]     export KUBECONFIG=/etc/kubernetes/static-pod-resources/recovery-kube-apiserver-pod/admin.kubeconfig

[root@crc-cvgnz-master-0 core]# export KUBECONFIG=/etc/kubernetes/static-pod-resources/recovery-kube-apiserver-pod/admin.kubeconfig

[root@crc-cvgnz-master-0 core]# podman run -it --network=host -v /etc/kubernetes/:/etc/kubernetes/:Z --entrypoint=/usr/bin/cluster-kube-apiserver-operator "${KAO_IMAGE}" regenerate-certificates
I0808 09:05:33.378239       1 certrotationcontroller.go:452] Waiting for CertRotation
I0808 09:05:33.478761       1 client_cert_rotation_controller.go:117] Waiting for CertRotationController - "AggregatorProxyClientCert"
I0808 09:05:33.579077       1 client_cert_rotation_controller.go:124] Finished waiting for CertRotationController - "AggregatorProxyClientCert"
I0808 09:05:33.579183       1 client_cert_rotation_controller.go:117] Waiting for CertRotationController - "KubeAPIServerToKubeletClientCert"
I0808 09:05:33.679653       1 client_cert_rotation_controller.go:124] Finished waiting for CertRotationController - "KubeAPIServerToKubeletClientCert"
I0808 09:05:33.679713       1 client_cert_rotation_controller.go:117] Waiting for CertRotationController - "LocalhostServing"
I0808 09:05:33.780310       1 client_cert_rotation_controller.go:124] Finished waiting for CertRotationController - "LocalhostServing"
I0808 09:05:33.780377       1 client_cert_rotation_controller.go:117] Waiting for CertRotationController - "ServiceNetworkServing"
I0808 09:05:33.881138       1 client_cert_rotation_controller.go:124] Finished waiting for CertRotationController - "ServiceNetworkServing"
I0808 09:05:33.881299       1 client_cert_rotation_controller.go:117] Waiting for CertRotationController - "ExternalLoadBalancerServing"
I0808 09:05:33.981788       1 client_cert_rotation_controller.go:124] Finished waiting for CertRotationController - "ExternalLoadBalancerServing"
I0808 09:05:33.982000       1 client_cert_rotation_controller.go:117] Waiting for CertRotationController - "InternalLoadBalancerServing"
I0808 09:05:34.082485       1 client_cert_rotation_controller.go:124] Finished waiting for CertRotationController - "InternalLoadBalancerServing"
I0808 09:05:34.082592       1 client_cert_rotation_controller.go:117] Waiting for CertRotationController - "KubeControllerManagerClient"
I0808 09:05:34.182878       1 client_cert_rotation_controller.go:124] Finished waiting for CertRotationController - "KubeControllerManagerClient"
I0808 09:05:34.182948       1 client_cert_rotation_controller.go:117] Waiting for CertRotationController - "KubeSchedulerClient"
I0808 09:05:34.283259       1 client_cert_rotation_controller.go:124] Finished waiting for CertRotationController - "KubeSchedulerClient"
I0808 09:05:34.283334       1 client_cert_rotation_controller.go:117] Waiting for CertRotationController - "KubeAPIServerCertSyncer"
I0808 09:05:34.383636       1 client_cert_rotation_controller.go:124] Finished waiting for CertRotationController - "KubeAPIServerCertSyncer"
I0808 09:05:34.383702       1 certrotationcontroller.go:474] Finished waiting for CertRotation
I0808 09:05:34.383729       1 kubecontrollermanagercertrotation.go:84] Waiting for CertRotation
I0808 09:05:34.383746       1 client_cert_rotation_controller.go:117] Waiting for CertRotationController - "CSRSigningCert"
I0808 09:05:34.484094       1 client_cert_rotation_controller.go:124] Finished waiting for CertRotationController - "CSRSigningCert"
I0808 09:05:34.484162       1 kubecontrollermanagercertrotation.go:90] Finished waiting for CertRotation
I0808 09:05:34.484193       1 regenerate_certificates.go:196] Refreshing certificates.
I0808 09:05:34.485152       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SignerUpdateRequired' "aggregator-client-signer" in "openshift-kube-apiserver-operator" requires a new signing cert/key pair: past its latest possible time 2019-07-23 03:52:22.8 +0000 UTC
I0808 09:05:34.924741       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'CABundleUpdateRequired' "kube-apiserver-aggregator-client-ca" in "openshift-config-managed" requires a new cert
I0808 09:05:34.924882       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/aggregator-client-signer -n openshift-kube-apiserver-operator because it changed
I0808 09:05:35.014446       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ConfigMapUpdated' Updated ConfigMap/kube-apiserver-aggregator-client-ca -n openshift-config-managed: cause by changes in data.ca-bundle.crt
I0808 09:05:35.015462       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TargetUpdateRequired' "aggregator-client" in "openshift-kube-apiserver" requires a new target cert/key pair: past its latest possible time 2019-07-23 03:52:58.8 +0000 UTC
I0808 09:05:35.142765       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/aggregator-client -n openshift-kube-apiserver because it changed
I0808 09:05:35.143530       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TargetUpdateRequired' "kubelet-client" in "openshift-kube-apiserver" requires a new target cert/key pair: past its latest possible time 2019-07-22 08:54:06.8 +0000 UTC
I0808 09:05:35.791868       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/kubelet-client -n openshift-kube-apiserver because it changed
I0808 09:05:35.793313       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TargetUpdateRequired' "localhost-serving-cert-certkey" in "openshift-kube-apiserver" requires a new target cert/key pair: past its latest possible time 2019-07-22 08:54:07.8 +0000 UTC
I0808 09:05:36.590911       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/localhost-serving-cert-certkey -n openshift-kube-apiserver because it changed
I0808 09:05:36.593276       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TargetUpdateRequired' "service-network-serving-certkey" in "openshift-kube-apiserver" requires a new target cert/key pair: past its latest possible time 2019-07-22 08:54:07.8 +0000 UTC
I0808 09:05:37.393286       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/service-network-serving-certkey -n openshift-kube-apiserver because it changed
I0808 09:05:37.394794       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TargetUpdateRequired' "external-loadbalancer-serving-certkey" in "openshift-kube-apiserver" requires a new target cert/key pair: past its latest possible time 2019-07-22 08:54:06.8 +0000 UTC
I0808 09:05:38.190468       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/external-loadbalancer-serving-certkey -n openshift-kube-apiserver because it changed
I0808 09:05:38.192619       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TargetUpdateRequired' "internal-loadbalancer-serving-certkey" in "openshift-kube-apiserver" requires a new target cert/key pair: past its latest possible time 2019-07-22 08:54:07.8 +0000 UTC
I0808 09:05:38.990541       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/internal-loadbalancer-serving-certkey -n openshift-kube-apiserver because it changed
I0808 09:05:38.990782       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SignerUpdateRequired' "kube-control-plane-signer" in "openshift-kube-apiserver-operator" requires a new signing cert/key pair: past its refresh time 2019-07-28 08:40:05 +0000 UTC
I0808 09:05:39.790036       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/kube-control-plane-signer -n openshift-kube-apiserver-operator because it changed
I0808 09:05:39.790267       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'CABundleUpdateRequired' "kube-control-plane-signer-ca" in "openshift-kube-apiserver-operator" requires a new cert
I0808 09:05:40.394509       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ConfigMapUpdated' Updated ConfigMap/kube-control-plane-signer-ca -n openshift-kube-apiserver-operator: cause by changes in data.ca-bundle.crt
I0808 09:05:40.395546       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TargetUpdateRequired' "kube-controller-manager-client-cert-key" in "openshift-config-managed" requires a new target cert/key pair: past its latest possible time 2019-07-22 08:54:08.8 +0000 UTC
I0808 09:05:41.399186       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/kube-controller-manager-client-cert-key -n openshift-config-managed because it changed
I0808 09:05:41.403527       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TargetUpdateRequired' "kube-scheduler-client-cert-key" in "openshift-config-managed" requires a new target cert/key pair: past its latest possible time 2019-07-22 08:54:06.8 +0000 UTC
I0808 09:05:42.189084       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/kube-scheduler-client-cert-key -n openshift-config-managed because it changed
I0808 09:05:42.190652       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TargetUpdateRequired' "kube-apiserver-cert-syncer-client-cert-key" in "openshift-kube-apiserver" requires a new target cert/key pair: past its latest possible time 2019-07-22 08:54:08.8 +0000 UTC
I0808 09:05:42.992317       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/kube-apiserver-cert-syncer-client-cert-key -n openshift-kube-apiserver because it changed
I0808 09:05:42.992967       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SignerUpdateRequired' "csr-signer-signer" in "openshift-kube-controller-manager-operator" requires a new signing cert/key pair: past its refresh time 2019-07-29 03:52:03 +0000 UTC
I0808 09:05:43.592810       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'CABundleUpdateRequired' "csr-controller-signer-ca" in "openshift-kube-controller-manager-operator" requires a new cert
I0808 09:05:43.592926       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/csr-signer-signer -n openshift-kube-controller-manager-operator because it changed
I0808 09:05:44.193495       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ConfigMapUpdated' Updated ConfigMap/csr-controller-signer-ca -n openshift-kube-controller-manager-operator: cause by changes in data.ca-bundle.crt
I0808 09:05:44.194495       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TargetUpdateRequired' "csr-signer" in "openshift-kube-controller-manager-operator" requires a new target cert/key pair: past its latest possible time 2019-07-23 03:55:02.8 +0000 UTC
I0808 09:05:45.193108       1 regenerate_certificates.go:203] Certificates refreshed.
I0808 09:05:45.193185       1 regenerate_certificates.go:205] Refreshing derivative resources.
I0808 09:05:45.193352       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/csr-signer -n openshift-kube-controller-manager-operator because it changed
I0808 09:05:47.222137       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ConfigMapUpdated' Updated ConfigMap/csr-signer-ca -n openshift-kube-controller-manager-operator: cause by changes in data.ca-bundle.crt
I0808 09:05:49.232076       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ConfigMapUpdated' Updated ConfigMap/csr-controller-ca -n openshift-kube-controller-manager-operator: cause by changes in data.ca-bundle.crt
I0808 09:05:51.248541       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/csr-signer -n openshift-kube-controller-manager because it changed
I0808 09:05:51.296868       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ConfigMapUpdated' Updated ConfigMap/csr-controller-ca -n openshift-config-managed: cause by changes in data.ca-bundle.crt
I0808 09:05:53.318022       1 regenerate_certificates.go:233] Derivative resources refreshed.
I0808 09:05:53.318120       1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ConfigMapUpdated' Updated ConfigMap/client-ca -n openshift-kube-apiserver: cause by changes in data.ca-bundle.crt
I0808 09:05:53.366974       1 helpers.go:121] Wrote new content to file "/etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/service-network-serving-certkey/tls.crt"
```
Then force a redeployment, generate the kubeconfig using `recover-kubeconfig.sh`, and move it to `/etc/kubernetes/kubeconfig`. After that, replace `/etc/kubernetes/ca.crt` with the CA bundle from the `kube-apiserver-to-kubelet-client-ca` configmap.
Next, stop kubelet, remove `/var/lib/kubelet/pki` and `/var/lib/kubelet/kubeconfig`, then start kubelet.
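
The final kubelet reset step can be sketched roughly as below. This is only an illustration, not a supported procedure. It assumes the kubelet runs as a systemd unit named `kubelet`. The `run` helper, the `reset_kubelet` function and the `DRY_RUN` variable are hypothetical names introduced here. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the kubelet reset: stop kubelet, drop its stale client/server
# certificates and kubeconfig, then start it again so it re-bootstraps.
# Assumption: kubelet is managed by systemd under the unit name "kubelet".

# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reset_kubelet() {
    run systemctl stop kubelet
    # Paths taken from the comment above.
    run rm -rf /var/lib/kubelet/pki /var/lib/kubelet/kubeconfig
    run systemctl start kubelet
}

reset_kubelet
```

Setting `DRY_RUN=0` and running it as root on the affected node executes the commands instead of printing them.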

Comment 25 Anjan 2019-08-09 04:11:48 UTC
One thing I missed in the previous comment: this is a single-node cluster, with master and worker running on the same machine.

Comment 28 Eric Rich 2019-09-09 20:44:40 UTC
Is this connected to https://bugzilla.redhat.com/show_bug.cgi?id=1735180

Comment 29 Ryan Phillips 2019-09-09 21:05:48 UTC
I think this ticket is old and has been duplicated by a number of bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1724189
https://bugzilla.redhat.com/show_bug.cgi?id=1747608
https://bugzilla.redhat.com/show_bug.cgi?id=1741817



Is this issue still reproducible on a 4.2 nightly build?

Comment 32 Stephen Cuppett 2019-12-02 12:47:03 UTC
I've marked BZ1735180 as a blocker to this one. It has moved to 4.4.0 and this will need to as well.

Comment 36 Red Hat Bugzilla 2023-09-14 05:26:09 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days