Bug 1882083

Summary: OCP 4.5 Certificates not renewed
Product: OpenShift Container Platform
Reporter: Neil Girard <ngirard>
Component: kube-apiserver
Assignee: Tomáš Nožička <tnozicka>
Status: CLOSED DUPLICATE
QA Contact: Ke Wang <kewang>
Severity: high
Priority: unspecified
Version: 4.5
CC: aos-bugs, mfojtik, xxia
Keywords: UpcomingSprint
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-10-02 11:40:23 UTC
Type: Bug

Description Neil Girard 2020-09-23 18:40:49 UTC
Description of problem:
An OpenShift 4.5 cluster has expired certificates, and it is not possible to log in with oc to perform any updates. According to the documentation, certificates should not expire, because the cluster rotates them automatically. The cluster is 4 months old and failed within the past week.
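For reference, a quick way to confirm whether a certificate on disk has already expired is `openssl x509 -checkend`. The sketch below generates a throwaway self-signed certificate so it is self-contained; on an affected master you would point openssl at the real cert files instead (for example, the paths listed in the workaround in comment 7). The file names here are illustrative, not cluster paths.

```shell
# Create a throwaway self-signed certificate valid for 1 day
# (illustrative; on a cluster, substitute the real cert path).
openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout /tmp/demo.key -out /tmp/demo.crt \
    -days 1 -subj "/CN=demo" 2>/dev/null

# Print the certificate's expiry date.
openssl x509 -enddate -noout -in /tmp/demo.crt

# -checkend 0 exits non-zero if the cert has already expired.
if openssl x509 -checkend 0 -noout -in /tmp/demo.crt; then
    echo "cert still valid"
else
    echo "cert expired"
fi
```

Running the same `-checkend` test against each cert under /etc/kubernetes/static-pod-resources narrows down which files have lapsed.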


Version-Release number of selected component (if applicable):
OpenShift 4.5


How reproducible:
I am unable to reproduce; a customer has hit this.

Steps to Reproduce:
N/A

Actual results:
Unable to log in with oc.

Expected results:
Login succeeds.


Additional info:

# openshift-oauth 

Sep 22 16:07:28 mssocp4uat-665rv-master-0 hyperkube[3048240]: F0821 15:07:43.642697       1 cmd.go:125] unable to load configmap based request-header-client-ca-file: Get https://172.28.160.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 172.28.160.1:443: connect: connection refused


# Kubelet

Sep 23 11:45:34 mssocp4uat-665rv-master-0 hyperkube[3048240]: I0923 11:45:34.821237 3048240 log.go:172] http: TLS handshake error from 172.28.134.75:58794: remote error: tls: bad certificate

Sep 22 12:28:56 mssocp4uat-665rv-master-2 hyperkube[2608267]: I0922 12:28:56.997260 2608267 certificate_manager.go:409] Rotating certificates
Sep 22 12:28:57 mssocp4uat-665rv-master-2 hyperkube[2608267]: I0922 12:28:57.015068 2608267 reflector.go:175] Starting reflector *v1beta1.CertificateSigningRequest (0s) from k8s.io/client-go/tools/watch/informerwatcher.go:146
Sep 22 12:28:57 mssocp4uat-665rv-master-2 hyperkube[2608267]: I0922 12:28:57.015202 2608267 reflector.go:211] Listing and watching *v1beta1.CertificateSigningRequest from k8s.io/client-go/tools/watch/informerwatcher.go:146
Sep 22 12:28:57 mssocp4uat-665rv-master-2 hyperkube[2608267]: E0922 12:28:57.028705 2608267 kubelet.go:2285] node "mssocp4uat-665rv-master-2" not found
Sep 22 12:28:57 mssocp4uat-665rv-master-2 hyperkube[2608267]: I0922 12:28:57.099052 2608267 kubelet.go:1989] SyncLoop (SYNC): 1 pods; recyler-pod-mssocp4uat-665rv-master-2_openshift-infra(62c8dda59ddf0f65e57a5f320e1fa11d)
Sep 22 12:28:57 mssocp4uat-665rv-master-2 hyperkube[2608267]: I0922 12:28:57.099126 2608267 kubelet.go:2034] Pod "recyler-pod-mssocp4uat-665rv-master-2_openshift-infra(62c8dda59ddf0f65e57a5f320e1fa11d)" has completed, ignoring remaining sync work: sync
Sep 22 12:28:57 mssocp4uat-665rv-master-2 hyperkube[2608267]: I0922 12:28:57.101203 2608267 log.go:172] http: TLS handshake error from 172.28.134.71:60206: no serving certificate available for the kubelet
Sep 22 12:28:57 mssocp4uat-665rv-master-2 hyperkube[2608267]: I0922 12:28:57.116289 2608267 csr.go:124] certificate signing request csr-vzskl is approved, waiting to be issued
Sep 22 12:28:57 mssocp4uat-665rv-master-2 hyperkube[2608267]: E0922 12:28:57.129089 2608267 kubelet.go:2285] node "mssocp4uat-665rv-master-2" not found
Sep 22 12:28:57 mssocp4uat-665rv-master-2 hyperkube[2608267]: I0922 12:28:57.147923 2608267 csr.go:121] certificate signing request csr-vzskl is issued
Sep 22 12:28:57 mssocp4uat-665rv-master-2 hyperkube[2608267]: I0922 12:28:57.148157 2608267 reflector.go:181] Stopping reflector *v1beta1.CertificateSigningRequest (0s) from k8s.io/client-go/tools/watch/informerwatcher.go:146
Sep 22 12:28:57 mssocp4uat-665rv-master-2 hyperkube[2608267]: I0922 12:28:57.220813 2608267 log.go:172] http: TLS handshake error from 172.28.134.78:40186: remote error: tls: bad certificate

# Kube-apiserver

2020-09-23T12:15:17.627542636-04:00 stderr F E0923 16:15:17.627259       1 authentication.go:53] Unable to authenticate the request due to an error: x509: certificate has expired or is not yet valid
2020-09-23T12:15:18.551619650-04:00 stderr F E0923 16:15:18.551482       1 authentication.go:53] Unable to authenticate the request due to an error: x509: certificate has expired or is not yet valid

Comment 7 Neil Girard 2020-10-01 21:17:15 UTC
We worked around the issue by doing the following:

1.) Generated new CSRs for each node.

2.) Regenerated the expired certs by hand in the following locations:

/etc/kubernetes/static-pod-resources/kube-apiserver-pod-x/secrets/kubelet-client
/etc/kubernetes/static-pod-resources/kube-scheduler-certs/secrets/kube-scheduler-client-cert-key
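The manual regeneration in step 2 amounts to issuing a fresh client certificate from a signing CA. A minimal self-contained sketch of that flow is below; the local CA, subject CN, and file names are illustrative stand-ins, not the cluster's actual signer or secrets.

```shell
# Illustrative local CA (the real signer lives in cluster secrets;
# names and subjects here are made up for the sketch).
openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout /tmp/ca.key -out /tmp/ca.crt \
    -days 365 -subj "/CN=demo-ca" 2>/dev/null

# 1.) Generate a new key and CSR for the client.
openssl req -newkey rsa:2048 -nodes \
    -keyout /tmp/client.key -out /tmp/client.csr \
    -subj "/CN=demo-client" 2>/dev/null

# 2.) Sign the CSR with the CA to produce a fresh client cert.
openssl x509 -req -in /tmp/client.csr \
    -CA /tmp/ca.crt -CAkey /tmp/ca.key -CAcreateserial \
    -days 30 -out /tmp/client.crt 2>/dev/null

# Verify the new cert chains to the CA and has a future expiry.
openssl verify -CAfile /tmp/ca.crt /tmp/client.crt
openssl x509 -enddate -noout -in /tmp/client.crt
```

On the cluster, the resulting cert/key pair would replace the expired files in the static-pod-resources paths above, after which the affected static pods need to pick up the new material.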


We are not sure why these certificates had expired and are still looking into probable causes. Automatic regeneration of the two client certificates failed and should be investigated further.

Comment 8 Tomáš Nožička 2020-10-02 11:40:23 UTC

*** This bug has been marked as a duplicate of bug 1881322 ***