Bug 1693951
Summary: | TLS errors due to expired kubelet certificates after node was shutdown | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Gerard Braad (Red Hat) <gbraad> |
Component: | Node | Assignee: | Ryan Phillips <rphillips> |
Status: | CLOSED ERRATA | QA Contact: | Sunil Choudhary <schoudha> |
Severity: | low | Docs Contact: | |
Priority: | medium | ||
Version: | 4.2.0 | CC: | ablum, anjan, aos-bugs, cfergeau, dconsoli, eparis, erich, fbrychta, jokerman, jrosenta, lbednar, lmohanty, maszulik, mfojtik, mfuruta, mmccomas, prkumar, rh-container, rphillips, sapandit, scuppett, tnozicka, veillard, vlaad, wking, yinzhou |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | 4.4.0 | ||
Hardware: | All | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-05-21 19:16:07 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1735180 | ||
Bug Blocks: |
Description
Gerard Braad (Red Hat)
2019-03-29 06:52:28 UTC
Also filed as: https://github.com/openshift/installer/issues/1494

So it looks like those were one-day certs. I don't know how often they are rotated. What installer and release image were you using? The 0.15.0 installer sets up the kubelet client with a one-day cert [1] (and probably the kubelet server too, although I haven't tracked down a link). Then the cluster rotates the certs with... something. For example, see [2] for the Kubernetes API server and related rotations. But nothing in [3] is jumping out at me as the kubelet certs you're having issues with. Moving to the auth team, since they'll probably know, although the code itself may live in a master-team repo.

But if you want to shut down nodes, you'll certainly want to wait after the initial install for a whole day, or however long it takes for the first in-cluster rotations to go through, to get certs with longer validity times before shutting down nodes. The auth/master teams may also have some advice for monitoring those rotations, even if it's just "grep the kube-apiserver-operator logs". Maybe there are Kubernetes Events you can watch for? I dunno. Alternatively, you can just let the certs expire, and when the cluster comes back up, use SSH (which we don't expire/rotate) to go through and rebuild the x.509 chains. I know that sort of thing has been discussed before, but don't have a reference handy at the moment. I'll see if I can dig up a link later.

[1]: https://github.com/openshift/installer/blob/v0.15.0/pkg/asset/tls/kubelet.go#L184
[2]: https://github.com/openshift/cluster-kube-apiserver-operator/pull/342
[3]: https://github.com/openshift/cluster-kube-apiserver-operator/blob/0b686ff00295c382f245b0b4103a566d672498c8/pkg/operator/certrotationcontroller/certrotationcontroller.go

> But if you want to shut down nodes, you'll certainly want to wait after the initial install, for a whole day or however long it takes for the first in-cluster rotations to go through,

That is unacceptable, as this would be part of a delivery pipeline.

> use SSH (which we don't expire/rotate) to go through and rebuild the x.509 chains.

Pre-provisioning of the certificates is what we prefer, as it also allows us to create certs that are valid for a longer period. So far we have not seen/received any instructions about this.

For recovery, I may have been remembering this ask [1], although there was no further discussion there. There's another ask for a manual rotation trigger in [2]. I'm not aware of procedures for either, but there may be more-specific trackers somewhere that I just haven't turned up (or maybe not :p).

[1]: https://github.com/openshift/installer/blob/v0.15.0/docs/user/troubleshooting.md#unable-to-ssh-into-master-nodes
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1684547#c27

Oops, stale paste. [1] above should have been: https://github.com/openshift/api/pull/199#discussion_r261689426

I hear kubelet certs are the Pod team, so reassigning to see if they can link the rotation code and/or have ideas about triggering, monitoring, or recovering cert rotation.

Recovery tool now has a tracker in bug 1694079.

In my case I shut the VM down for less than 2 hours and started it again, which worked, but when I checked the kubelet cert it was only valid for 3 hours. So does that mean that, within the first 24 hours, until the kubelet gets a proper 30-day cert, it rotates every 2-3 hours?
```
$ oc get nodes
NAME STATUS ROLES AGE VERSION
test1-svtfv-master-0 Ready master,worker 47h v1.12.4+30e6a0f55

# cat kubelet-client-current.txt | grep Not
Not Before: Mar 29 05:58:00 2019 GMT
Not After : Mar 29 08:43:34 2019 GMT

# date
Fri Mar 29 06:27:12 UTC 2019
```

This bug is strange because you can have a single-master setup in OCP 4.x.

HA is required; 3 masters minimum.

This is a known issue with rapidly rotating the kubelet client/server certs. If the kubelet is down during the time it would normally do the rotation and doesn't come back up before the existing certs expire, then the kubelet is unable to connect to the apiserver after that. I think I'm going to dup this to the recovery tool tracker because it does take external intervention to resolve this situation.

*** This bug has been marked as a duplicate of bug 1694079 ***

This was changed 5 days ago to a 60-day validity time and a 30-day rotation time: https://github.com/openshift/cluster-kube-controller-manager-operator/pull/203 I dunno if the commit is available in a nightly yet, but PR 203 has certainly landed, so this should be at least MODIFIED.

So I tried the installer master, which has this PR in as a payload for the controller operator, but even then I had to wait around 24 hours until the kubelet client/server certs actually rotated to a month's validity.
```
$ openshift-install version
openshift-install unreleased-master-663-g086a88534ad03776c97e31a843658e53e0088e78
built from commit 086a88534ad03776c97e31a843658e53e0088e78

[root@test1-sbgjm-master-0 pki]# uptime -p
up 14 hours, 32 minutes

[root@test1-sbgjm-master-0 pki]# openssl x509 -in kubelet-client-2019-03-31-13-57-00.pem -noout -text | grep Not
Not Before: Mar 31 13:52:00 2019 GMT
Not After : Apr 1 13:34:52 2019 GMT

[root@test1-sbgjm-master-0 pki]# uptime -p
up 1 day, 1 hours, 51 minutes

[root@test1-sbgjm-master-0 pki]# openssl x509 -in kubelet-client-2019-04-01-11-04-04.pem -noout -text | grep Not
Not Before: Apr 1 10:59:00 2019 GMT
Not After : May 1 08:50:40 2019 GMT

[root@test1-sbgjm-master-0 pki]# openssl x509 -in kubelet-server-2019-04-01-09-03-42.pem -noout -text | grep Not
Not Before: Apr 1 08:59:00 2019 GMT
Not After : May 1 08:51:02 2019 GMT
```

So I am still wondering about https://bugzilla.redhat.com/show_bug.cgi?id=1693951#c8: does the kubelet cert rotate every 2-3 hours during the first 24 hours, until it gets a proper 30-day cert?

(In reply to Praveen Kumar from comment #15)
> So I tried the installer master, which has this PR in as a payload for the
> controller operator, but even then I had to wait around 24 hours until the
> kubelet client/server certs actually rotated to a month's validity.
> [...]

Forgot to add the payload info for this installer binary.

```
$ oc adm release info --commits | grep cluster-kube-controller-manager-operator
cluster-kube-controller-manager-operator https://github.com/openshift/cluster-kube-controller-manager-operator 4e5073f837b1262db0c390c30b275b293db0b469
```

The solution will be provided in this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1694079

*** This bug has been marked as a duplicate of bug 1694079 ***

This is reopened.
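A quick way to answer this kind of "is it safe to shut the node down yet?" question is `openssl x509 -checkend`, which exits non-zero when a cert will expire within a given window. A minimal sketch, not part of any official procedure; the cert path and the 24-hour threshold are assumptions based on the pastes above:

```shell
#!/bin/sh
# Sketch: decide whether the kubelet client cert survives a planned shutdown.
# /var/lib/kubelet/pki/kubelet-client-current.pem is the path seen in the
# pastes above; the 24h window is an assumed shutdown duration.

cert_ok_for() {
  # $1 = cert file, $2 = required remaining validity in seconds.
  # -checkend exits 0 if the cert is still valid $2 seconds from now.
  openssl x509 -in "$1" -noout -checkend "$2" >/dev/null
}

if cert_ok_for /var/lib/kubelet/pki/kubelet-client-current.pem $((24 * 3600)); then
  echo "kubelet client cert valid for at least 24h; shutdown should be safe"
else
  echo "kubelet client cert expires within 24h; wait for the next rotation first"
fi
```

Run on the node itself (as root), this avoids guessing whether the short-lived install-time cert has already been replaced by a longer-lived one.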
This was closed as a duplicate of bug 1694079. Bug 1694079 is now closed with fixes, but if you try to run the resulting script, it fails for us in the 2 tests we tried based on CodeReady Containers. In addition, the instructions ask to wait 15 minutes plus 20 minutes, which means the operation takes at least 35 minutes; that may be fine for an online cluster, but it is absolutely not adequate for a developer waiting for his environment to start. https://docs.google.com/document/d/1ONkxdDmQVLBNJrSJymfKPrndo7b4vgCA2zwL9xHYx6A/edit

So reopened; the solution for bug 1694079 is not adequate for this bug.

Daniel Veillard

Related bugs and recovery-verifications:
https://bugzilla.redhat.com/show_bug.cgi?id=1713999#c8
https://bugzilla.redhat.com/show_bug.cgi?id=1711910#c28
https://bugzilla.redhat.com/show_bug.cgi?id=1715454#c5

The problem related to bug 1694079 was pasted there; it didn't work for us, but it was automatically closed nonetheless. I think we provided all the required info at this point.

I think this bug is now outdated. If a machine is kept turned off for a period of time long enough for certs to expire, the recommended path is to run the certificate recovery steps: https://docs.openshift.com/container-platform/4.1/disaster_recovery/scenario-3-expired-certs.html

Tomáš, the document indicates at least 35 min for the recovery process to succeed. We can't expect developers to wait that long for their cluster to show up.

So I tried the force certificate-rotation steps from the above gdoc; not sure what I am doing wrong, but it does not rotate certs in the cluster.
I am using the libvirt build for testing this, and the following are the steps that I followed:

```
# validity is 30 times the base (30*9000s = 270000s)
oc create -n openshift-config configmap unsupported-cert-rotation-config --from-literal='base=9000s'

# forcing rotation
oc get secret -A -o json | jq -r '.items[] | select(.metadata.annotations."auth.openshift.io/certificate-not-after" | .!=null and fromdateiso8601<='$( date --date='+1year' +%s )') | "-n \(.metadata.namespace) \(.metadata.name)"' | xargs -n3 oc patch secret -p='{"metadata": {"annotations": {"auth.openshift.io/certificate-not-after": null}}}'

# Wait ~ 5-10 minutes

# Make sure at least the apiserver serving cert has 15 min validity (change your cluster name based on your kubeconfig)
openssl s_client -connect api.tnozicka-1.devcluster.openshift.com:6443 | openssl x509 -noout -dates
```

Actual output I got:
--------------------

```
$ ./oc create -n openshift-config configmap unsupported-cert-rotation-config --from-literal='base=9000s'
configmap/unsupported-cert-rotation-config created

$ ./oc get secret -A -o json | jq -r '.items[] | select(.metadata.annotations."auth.openshift.io/certificate-not-after" | .!=null and fromdateiso8601<='$( date --date='+1year' +%s )') | "-n \(.metadata.namespace) \(.metadata.name)"' | xargs -n3 ./oc patch secret -p='{"metadata": {"annotations": {"auth.openshift.io/certificate-not-after": null}}}'
secret/kube-controller-manager-client-cert-key patched
secret/kube-scheduler-client-cert-key patched
secret/aggregator-client-signer patched
secret/kube-apiserver-to-kubelet-signer patched
secret/kube-control-plane-signer patched
secret/aggregator-client patched
secret/external-loadbalancer-serving-certkey patched
secret/internal-loadbalancer-serving-certkey patched
secret/kube-apiserver-cert-syncer-client-cert-key patched
secret/kube-apiserver-cert-syncer-client-cert-key-2 patched
secret/kube-apiserver-cert-syncer-client-cert-key-3 patched
secret/kube-apiserver-cert-syncer-client-cert-key-4 patched
secret/kube-apiserver-cert-syncer-client-cert-key-5 patched
secret/kube-apiserver-cert-syncer-client-cert-key-6 patched
secret/kubelet-client patched
secret/kubelet-client-2 patched
secret/kubelet-client-3 patched
secret/kubelet-client-4 patched
secret/kubelet-client-5 patched
secret/kubelet-client-6 patched
secret/localhost-serving-cert-certkey patched
secret/service-network-serving-certkey patched
secret/csr-signer patched
secret/csr-signer-signer patched
secret/kube-controller-manager-client-cert-key patched
secret/kube-controller-manager-client-cert-key-2 patched
secret/kube-controller-manager-client-cert-key-3 patched
secret/kube-controller-manager-client-cert-key-4 patched
secret/kube-controller-manager-client-cert-key-5 patched
secret/kube-scheduler-client-cert-key patched
secret/kube-scheduler-client-cert-key-2 patched
secret/kube-scheduler-client-cert-key-3 patched
secret/kube-scheduler-client-cert-key-4 patched
secret/kube-scheduler-client-cert-key-5 patched

# inside the VM that the installer created, certs are only valid for 1 day
[core@crc-kmrrq-master-0 ~]$ sudo su
[root@crc-kmrrq-master-0 core]# openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -dates
notBefore=May 31 11:14:00 2019 GMT
notAfter=Jun 1 11:05:06 2019 GMT
```

Moreover, the process described in the document https://docs.google.com/document/d/1ONkxdDmQVLBNJrSJymfKPrndo7b4vgCA2zwL9xHYx6A/edit# deals with recovery of a cluster whose certificates have expired. For the CRC use case we want to force the rotation, so that we don't have to wait 20-24 hours for the certs to be rotated and become valid for 30 days, and so that we can automate our bundle generation process.

Please ignore the previous comment.
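For the bundle-generation automation described above, one option is to poll the cert's remaining validity instead of sleeping a fixed 20-24 hours. This is a sketch under stated assumptions, not an official CRC procedure: `cert_seconds_left` is a hypothetical helper, the cert path is taken from the pastes above, and GNU `date -d` is assumed to be available:

```shell
#!/bin/sh
# Sketch (assumption): wait until the kubelet client cert has been replaced
# by the long-lived (30-day) cert before snapshotting a CRC bundle.

cert_seconds_left() {
  # Seconds until the cert's notAfter date (GNU date assumed).
  end=$(openssl x509 -in "$1" -noout -enddate | cut -d= -f2)
  echo $(( $(date -d "$end" +%s) - $(date +%s) ))
}

CERT=/var/lib/kubelet/pki/kubelet-client-current.pem   # path from the pastes above
want=$((25 * 24 * 3600))                               # require ~25 days remaining

if [ -r "$CERT" ]; then
  until [ "$(cert_seconds_left "$CERT")" -gt "$want" ]; do
    echo "kubelet client cert still short-lived; sleeping 10 minutes"
    sleep 600
  done
  echo "kubelet client cert is long-lived; safe to generate the bundle"
fi
```

The polling avoids both the fixed 24-hour sleep and the risk of snapshotting while the initial short-lived cert is still in place.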
After following the docs at https://docs.openshift.com/container-platform/4.1/disaster_recovery/scenario-3-expired-certs.html#dr-scenario-3-recovering-expired-certs_dr-recovering-expired-certs I am not able to recover the cluster; the `kubelet` is still not able to find the node. Below you can see the logs from the kubelet:

```
Aug 08 07:18:27 crc-cvgnz-master-0 hyperkube[8455]: E0808 07:18:27.293620 8455 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://api.crc.testing:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dcrc-cvgnz-master-0&limit=500&resourceVersion=0: x509: certificate has expired or is not yet valid
Aug 08 07:18:27 crc-cvgnz-master-0 hyperkube[8455]: E0808 07:18:27.299748 8455 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get https://api.crc.testing:6443/api/v1/services?limit=500&resourceVersion=0: x509: certificate has expired or is not yet valid
Aug 08 07:18:27 crc-cvgnz-master-0 hyperkube[8455]: E0808 07:18:27.302810 8455 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:453: Failed to list *v1.Node: Get https://api.crc.testing:6443/api/v1/nodes?fieldSelector=metadata.name%3Dcrc-cvgnz-master-0&limit=500&resourceVersion=0: x509: certificate has expired or is not yet valid
Aug 08 07:18:27 crc-cvgnz-master-0 hyperkube[8455]: E0808 07:18:27.365414 8455 kubelet.go:2274] node "crc-cvgnz-master-0" not found
Aug 08 07:18:27 crc-cvgnz-master-0 hyperkube[8455]: E0808 07:18:27.465569 8455 kubelet.go:2274] node "crc-cvgnz-master-0" not found
Aug 08 07:18:27 crc-cvgnz-master-0 hyperkube[8455]: E0808 07:18:27.565772 8455 kubelet.go:2274] node "crc-cvgnz-master-0" not found
Aug 08 07:18:27 crc-cvgnz-master-0 hyperkube[8455]: E0808 07:18:27.665910 8455 kubelet.go:2274] node "crc-cvgnz-master-0" not found
Aug 08 07:18:27 crc-cvgnz-master-0 hyperkube[8455]: E0808 07:18:27.766073 8455 kubelet.go:2274] node "crc-cvgnz-master-0" not found
Aug 08 07:18:27 crc-cvgnz-master-0 hyperkube[8455]: E0808 07:18:27.866251 8455 kubelet.go:2274] node "crc-cvgnz-master-0" not found
Aug 08 07:18:27 crc-cvgnz-master-0 hyperkube[8455]: E0808 07:18:27.966425 8455 kubelet.go:2274] node "crc-cvgnz-master-0" not found
```

Details of the steps I followed:
--------------------------------

```
RELEASE_IMAGE=quay.io/openshift-release-dev/ocp-release:4.1.3
KAO_IMAGE=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7c93e979f8b062841393470e3710c58245e47bf9cf0685ba0c6f95912c6d7882

[root@crc-cvgnz-master-0 core]# podman run -it --network=host -v /etc/kubernetes/:/etc/kubernetes/:Z --entrypoint=/usr/bin/cluster-kube-apiserver-operator "${KAO_IMAGE}" recovery-apiserver create
I0808 08:42:49.569561 1 apiserver.go:215] Recovery apiserver certificates will be valid for 168h0m0s
I0808 08:42:50.089333 1 create.go:82] To access the server.
I0808 08:42:50.089413 1 create.go:83] export KUBECONFIG=/etc/kubernetes/static-pod-resources/recovery-kube-apiserver-pod/admin.kubeconfig

[root@crc-cvgnz-master-0 core]# export KUBECONFIG=/etc/kubernetes/static-pod-resources/recovery-kube-apiserver-pod/admin.kubeconfig

[root@crc-cvgnz-master-0 core]# podman run -it --network=host -v /etc/kubernetes/:/etc/kubernetes/:Z --entrypoint=/usr/bin/cluster-kube-apiserver-operator "${KAO_IMAGE}" regenerate-certificates
I0808 09:05:33.378239 1 certrotationcontroller.go:452] Waiting for CertRotation
I0808 09:05:33.478761 1 client_cert_rotation_controller.go:117] Waiting for CertRotationController - "AggregatorProxyClientCert"
I0808 09:05:33.579077 1 client_cert_rotation_controller.go:124] Finished waiting for CertRotationController - "AggregatorProxyClientCert"
I0808 09:05:33.579183 1 client_cert_rotation_controller.go:117] Waiting for CertRotationController - "KubeAPIServerToKubeletClientCert"
I0808 09:05:33.679653 1 client_cert_rotation_controller.go:124] Finished waiting for CertRotationController - "KubeAPIServerToKubeletClientCert"
I0808 09:05:33.679713 1 client_cert_rotation_controller.go:117] Waiting for CertRotationController - "LocalhostServing"
I0808 09:05:33.780310 1 client_cert_rotation_controller.go:124] Finished waiting for CertRotationController - "LocalhostServing"
I0808 09:05:33.780377 1 client_cert_rotation_controller.go:117] Waiting for CertRotationController - "ServiceNetworkServing"
I0808 09:05:33.881138 1 client_cert_rotation_controller.go:124] Finished waiting for CertRotationController - "ServiceNetworkServing"
I0808 09:05:33.881299 1 client_cert_rotation_controller.go:117] Waiting for CertRotationController - "ExternalLoadBalancerServing"
I0808 09:05:33.981788 1 client_cert_rotation_controller.go:124] Finished waiting for CertRotationController - "ExternalLoadBalancerServing"
I0808 09:05:33.982000 1 client_cert_rotation_controller.go:117] Waiting for CertRotationController - "InternalLoadBalancerServing"
I0808 09:05:34.082485 1 client_cert_rotation_controller.go:124] Finished waiting for CertRotationController - "InternalLoadBalancerServing"
I0808 09:05:34.082592 1 client_cert_rotation_controller.go:117] Waiting for CertRotationController - "KubeControllerManagerClient"
I0808 09:05:34.182878 1 client_cert_rotation_controller.go:124] Finished waiting for CertRotationController - "KubeControllerManagerClient"
I0808 09:05:34.182948 1 client_cert_rotation_controller.go:117] Waiting for CertRotationController - "KubeSchedulerClient"
I0808 09:05:34.283259 1 client_cert_rotation_controller.go:124] Finished waiting for CertRotationController - "KubeSchedulerClient"
I0808 09:05:34.283334 1 client_cert_rotation_controller.go:117] Waiting for CertRotationController - "KubeAPIServerCertSyncer"
I0808 09:05:34.383636 1 client_cert_rotation_controller.go:124] Finished waiting for CertRotationController - "KubeAPIServerCertSyncer"
I0808 09:05:34.383702 1 certrotationcontroller.go:474] Finished waiting for CertRotation
I0808 09:05:34.383729 1 kubecontrollermanagercertrotation.go:84] Waiting for CertRotation
I0808 09:05:34.383746 1 client_cert_rotation_controller.go:117] Waiting for CertRotationController - "CSRSigningCert"
I0808 09:05:34.484094 1 client_cert_rotation_controller.go:124] Finished waiting for CertRotationController - "CSRSigningCert"
I0808 09:05:34.484162 1 kubecontrollermanagercertrotation.go:90] Finished waiting for CertRotation
I0808 09:05:34.484193 1 regenerate_certificates.go:196] Refreshing certificates.
I0808 09:05:34.485152 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SignerUpdateRequired' "aggregator-client-signer" in "openshift-kube-apiserver-operator" requires a new signing cert/key pair: past its latest possible time 2019-07-23 03:52:22.8 +0000 UTC
I0808 09:05:34.924741 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'CABundleUpdateRequired' "kube-apiserver-aggregator-client-ca" in "openshift-config-managed" requires a new cert
I0808 09:05:34.924882 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/aggregator-client-signer -n openshift-kube-apiserver-operator because it changed
I0808 09:05:35.014446 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ConfigMapUpdated' Updated ConfigMap/kube-apiserver-aggregator-client-ca -n openshift-config-managed: cause by changes in data.ca-bundle.crt
I0808 09:05:35.015462 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TargetUpdateRequired' "aggregator-client" in "openshift-kube-apiserver" requires a new target cert/key pair: past its latest possible time 2019-07-23 03:52:58.8 +0000 UTC
I0808 09:05:35.142765 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/aggregator-client -n openshift-kube-apiserver because it changed
I0808 09:05:35.143530 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TargetUpdateRequired' "kubelet-client" in "openshift-kube-apiserver" requires a new target cert/key pair: past its latest possible time 2019-07-22 08:54:06.8 +0000 UTC
I0808 09:05:35.791868 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/kubelet-client -n openshift-kube-apiserver because it changed
I0808 09:05:35.793313 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TargetUpdateRequired' "localhost-serving-cert-certkey" in "openshift-kube-apiserver" requires a new target cert/key pair: past its latest possible time 2019-07-22 08:54:07.8 +0000 UTC
I0808 09:05:36.590911 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/localhost-serving-cert-certkey -n openshift-kube-apiserver because it changed
I0808 09:05:36.593276 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TargetUpdateRequired' "service-network-serving-certkey" in "openshift-kube-apiserver" requires a new target cert/key pair: past its latest possible time 2019-07-22 08:54:07.8 +0000 UTC
I0808 09:05:37.393286 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/service-network-serving-certkey -n openshift-kube-apiserver because it changed
I0808 09:05:37.394794 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TargetUpdateRequired' "external-loadbalancer-serving-certkey" in "openshift-kube-apiserver" requires a new target cert/key pair: past its latest possible time 2019-07-22 08:54:06.8 +0000 UTC
I0808 09:05:38.190468 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/external-loadbalancer-serving-certkey -n openshift-kube-apiserver because it changed
I0808 09:05:38.192619 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TargetUpdateRequired' "internal-loadbalancer-serving-certkey" in "openshift-kube-apiserver" requires a new target cert/key pair: past its latest possible time 2019-07-22 08:54:07.8 +0000 UTC
I0808 09:05:38.990541 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/internal-loadbalancer-serving-certkey -n openshift-kube-apiserver because it changed
I0808 09:05:38.990782 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SignerUpdateRequired' "kube-control-plane-signer" in "openshift-kube-apiserver-operator" requires a new signing cert/key pair: past its refresh time 2019-07-28 08:40:05 +0000 UTC
I0808 09:05:39.790036 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/kube-control-plane-signer -n openshift-kube-apiserver-operator because it changed
I0808 09:05:39.790267 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'CABundleUpdateRequired' "kube-control-plane-signer-ca" in "openshift-kube-apiserver-operator" requires a new cert
I0808 09:05:40.394509 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ConfigMapUpdated' Updated ConfigMap/kube-control-plane-signer-ca -n openshift-kube-apiserver-operator: cause by changes in data.ca-bundle.crt
I0808 09:05:40.395546 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TargetUpdateRequired' "kube-controller-manager-client-cert-key" in "openshift-config-managed" requires a new target cert/key pair: past its latest possible time 2019-07-22 08:54:08.8 +0000 UTC
I0808 09:05:41.399186 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/kube-controller-manager-client-cert-key -n openshift-config-managed because it changed
I0808 09:05:41.403527 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TargetUpdateRequired' "kube-scheduler-client-cert-key" in "openshift-config-managed" requires a new target cert/key pair: past its latest possible time 2019-07-22 08:54:06.8 +0000 UTC
I0808 09:05:42.189084 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/kube-scheduler-client-cert-key -n openshift-config-managed because it changed
I0808 09:05:42.190652 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TargetUpdateRequired' "kube-apiserver-cert-syncer-client-cert-key" in "openshift-kube-apiserver" requires a new target cert/key pair: past its latest possible time 2019-07-22 08:54:08.8 +0000 UTC
I0808 09:05:42.992317 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/kube-apiserver-cert-syncer-client-cert-key -n openshift-kube-apiserver because it changed
I0808 09:05:42.992967 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SignerUpdateRequired' "csr-signer-signer" in "openshift-kube-controller-manager-operator" requires a new signing cert/key pair: past its refresh time 2019-07-29 03:52:03 +0000 UTC
I0808 09:05:43.592810 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'CABundleUpdateRequired' "csr-controller-signer-ca" in "openshift-kube-controller-manager-operator" requires a new cert
I0808 09:05:43.592926 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/csr-signer-signer -n openshift-kube-controller-manager-operator because it changed
I0808 09:05:44.193495 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ConfigMapUpdated' Updated ConfigMap/csr-controller-signer-ca -n openshift-kube-controller-manager-operator: cause by changes in data.ca-bundle.crt
I0808 09:05:44.194495 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'TargetUpdateRequired' "csr-signer" in "openshift-kube-controller-manager-operator" requires a new target cert/key pair: past its latest possible time 2019-07-23 03:55:02.8 +0000 UTC
I0808 09:05:45.193108 1 regenerate_certificates.go:203] Certificates refreshed.
I0808 09:05:45.193185 1 regenerate_certificates.go:205] Refreshing derivative resources.
I0808 09:05:45.193352 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/csr-signer -n openshift-kube-controller-manager-operator because it changed
I0808 09:05:47.222137 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ConfigMapUpdated' Updated ConfigMap/csr-signer-ca -n openshift-kube-controller-manager-operator: cause by changes in data.ca-bundle.crt
I0808 09:05:49.232076 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ConfigMapUpdated' Updated ConfigMap/csr-controller-ca -n openshift-kube-controller-manager-operator: cause by changes in data.ca-bundle.crt
I0808 09:05:51.248541 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/csr-signer -n openshift-kube-controller-manager because it changed
I0808 09:05:51.296868 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ConfigMapUpdated' Updated ConfigMap/csr-controller-ca -n openshift-config-managed: cause by changes in data.ca-bundle.crt
I0808 09:05:53.318022 1 regenerate_certificates.go:233] Derivative resources refreshed.
```
I0808 09:05:53.318120 1 event.go:221] Event(v1.ObjectReference{Kind:"namespace", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ConfigMapUpdated' Updated ConfigMap/client-ca -n openshift-kube-apiserver: cause by changes in data.ca-bundle.crt
I0808 09:05:53.366974 1 helpers.go:121] Wrote new content to file "/etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/service-network-serving-certkey/tls.crt"
```

Then force a re-deployment, generate the kubeconfig using `recover-kubeconfig.sh` and move it to `/etc/kubernetes/kubeconfig`, then replace `/etc/kubernetes/ca.crt` with the CA from the `kube-apiserver-to-kubelet-client-ca` configmap. Next, stop the kubelet, remove `/var/lib/kubelet/pki` and `/var/lib/kubelet/kubeconfig`, then start the kubelet again.

One thing I missed in the previous comment: this is a single-node cluster, with master and worker running on the same machine.

Is this connected to https://bugzilla.redhat.com/show_bug.cgi?id=1735180 ?

I think this ticket is old, and has been duplicated by a number of bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1724189
https://bugzilla.redhat.com/show_bug.cgi?id=1747608
https://bugzilla.redhat.com/show_bug.cgi?id=1741817

Is this issue still reproducible on a 4.2 nightly build?

I've marked BZ1735180 as a blocker to this one. It has moved to 4.4.0 and this bug will need to move as well.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
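For reference, a rough sketch of the kubelet-side recovery described above, run as root on the affected node. This assumes the `recover-kubeconfig.sh` helper from the cluster disaster-recovery procedure is available; the configmap namespace and data key are assumptions based on where the operator publishes the kubelet client CA and may differ between releases, so these commands are illustrative rather than an exact procedure:

```shell
# Optional sanity check: confirm the kubelet client cert really has expired.
openssl x509 -noout -enddate -in /var/lib/kubelet/pki/kubelet-client-current.pem

# Regenerate a kubeconfig that trusts the newly rotated certificates.
./recover-kubeconfig.sh > /etc/kubernetes/kubeconfig

# Replace the kubelet's client CA with the current one from the cluster
# (namespace and data key here are assumptions; check your release).
oc get configmap kube-apiserver-to-kubelet-client-ca \
    -n openshift-kube-apiserver-operator \
    -o jsonpath='{.data.ca-bundle\.crt}' > /etc/kubernetes/ca.crt

# Drop the kubelet's stale certificates and bootstrap state, then restart
# so it re-bootstraps against the new CA and kubeconfig.
systemctl stop kubelet
rm -rf /var/lib/kubelet/pki /var/lib/kubelet/kubeconfig
systemctl start kubelet
```

After the restart, the kubelet should submit a fresh CSR; on a healthy cluster it gets approved and signed by the regenerated `csr-signer`, which is why the signer refresh in the log above has to happen first.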