Bug 1882083 - OCP 4.5 Certificates not renewed
Summary: OCP 4.5 Certificates not renewed
Keywords:
Status: CLOSED DUPLICATE of bug 1881322
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Tomáš Nožička
QA Contact: Ke Wang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-09-23 18:40 UTC by Neil Girard
Modified: 2023-12-15 19:29 UTC
CC: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-02 11:40:23 UTC
Target Upstream Version:
Embargoed:



Description Neil Girard 2020-09-23 18:40:49 UTC
Description of problem:
The OpenShift 4.5 cluster has expired certificates and we are unable to log in with oc to perform any updates.  According to the documentation, certificates should not expire because the cluster handles renewals automatically.  The cluster is four months old and failed within the past week.
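As a first check when oc login fails like this, the expiry of the kubelet's rotated certificates on a control-plane node can be inspected directly; a minimal sketch, assuming the standard kubelet PKI paths on RHCOS (not confirmed from this cluster):

# On an affected master node, print the subject and expiry of the kubelet's
# rotated client and serving certificates (default kubelet paths assumed):
openssl x509 -noout -subject -enddate -in /var/lib/kubelet/pki/kubelet-client-current.pem
openssl x509 -noout -subject -enddate -in /var/lib/kubelet/pki/kubelet-server-current.pem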


Version-Release number of selected component (if applicable):
OpenShift 4.5


How reproducible:
I am unable to reproduce this; a customer has hit it.

Steps to Reproduce:
N/A

Actual results:
Unable to log in with oc.

Expected results:
Login succeeds.


Additional info:

# openshift-oauth 

Sep 22 16:07:28 mssocp4uat-665rv-master-0 hyperkube[3048240]: F0821 15:07:43.642697       1 cmd.go:125] unable to load configmap based request-header-client-ca-file: Get https://172.28.160.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 172.28.160.1:443: connect: connection refused


# Kubelet

Sep 23 11:45:34 mssocp4uat-665rv-master-0 hyperkube[3048240]: I0923 11:45:34.821237 3048240 log.go:172] http: TLS handshake error from 172.28.134.75:58794: remote error: tls: bad certificate

Sep 22 12:28:56 mssocp4uat-665rv-master-2 hyperkube[2608267]: I0922 12:28:56.997260 2608267 certificate_manager.go:409] Rotating certificates
Sep 22 12:28:57 mssocp4uat-665rv-master-2 hyperkube[2608267]: I0922 12:28:57.015068 2608267 reflector.go:175] Starting reflector *v1beta1.CertificateSigningRequest (0s) from k8s.io/client-go/tools/watch/informerwatcher.go:146
Sep 22 12:28:57 mssocp4uat-665rv-master-2 hyperkube[2608267]: I0922 12:28:57.015202 2608267 reflector.go:211] Listing and watching *v1beta1.CertificateSigningRequest from k8s.io/client-go/tools/watch/informerwatcher.go:146
Sep 22 12:28:57 mssocp4uat-665rv-master-2 hyperkube[2608267]: E0922 12:28:57.028705 2608267 kubelet.go:2285] node "mssocp4uat-665rv-master-2" not found
Sep 22 12:28:57 mssocp4uat-665rv-master-2 hyperkube[2608267]: I0922 12:28:57.099052 2608267 kubelet.go:1989] SyncLoop (SYNC): 1 pods; recyler-pod-mssocp4uat-665rv-master-2_openshift-infra(62c8dda59ddf0f65e57a5f320e1fa11d)
Sep 22 12:28:57 mssocp4uat-665rv-master-2 hyperkube[2608267]: I0922 12:28:57.099126 2608267 kubelet.go:2034] Pod "recyler-pod-mssocp4uat-665rv-master-2_openshift-infra(62c8dda59ddf0f65e57a5f320e1fa11d)" has completed, ignoring remaining sync work: sync
Sep 22 12:28:57 mssocp4uat-665rv-master-2 hyperkube[2608267]: I0922 12:28:57.101203 2608267 log.go:172] http: TLS handshake error from 172.28.134.71:60206: no serving certificate available for the kubelet
Sep 22 12:28:57 mssocp4uat-665rv-master-2 hyperkube[2608267]: I0922 12:28:57.116289 2608267 csr.go:124] certificate signing request csr-vzskl is approved, waiting to be issued
Sep 22 12:28:57 mssocp4uat-665rv-master-2 hyperkube[2608267]: E0922 12:28:57.129089 2608267 kubelet.go:2285] node "mssocp4uat-665rv-master-2" not found
Sep 22 12:28:57 mssocp4uat-665rv-master-2 hyperkube[2608267]: I0922 12:28:57.147923 2608267 csr.go:121] certificate signing request csr-vzskl is issued
Sep 22 12:28:57 mssocp4uat-665rv-master-2 hyperkube[2608267]: I0922 12:28:57.148157 2608267 reflector.go:181] Stopping reflector *v1beta1.CertificateSigningRequest (0s) from k8s.io/client-go/tools/watch/informerwatcher.go:146
Sep 22 12:28:57 mssocp4uat-665rv-master-2 hyperkube[2608267]: I0922 12:28:57.220813 2608267 log.go:172] http: TLS handshake error from 172.28.134.78:40186: remote error: tls: bad certificate
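The "no serving certificate available for the kubelet" messages normally clear once the pending CSRs are approved. If approval stalls, this is a minimal sketch of the usual manual approval, assuming a kubeconfig whose client certificate is still valid (e.g. a recovery kubeconfig on a master node):

# List pending CSRs, then approve them all in one pass
oc get csr
oc get csr -o name | xargs oc adm certificate approve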

# Kube-apiserver

2020-09-23T12:15:17.627542636-04:00 stderr F E0923 16:15:17.627259       1 authentication.go:53] Unable to authenticate the request due to an error: x509: certificate has expired or is not yet valid
2020-09-23T12:15:18.551619650-04:00 stderr F E0923 16:15:18.551482       1 authentication.go:53] Unable to authenticate the request due to an error: x509: certificate has expired or is not yet valid
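To narrow down which side is presenting an expired certificate, the API server's serving certificate can be checked remotely; a minimal sketch, reusing the service IP seen in the oauth log above:

# Print the validity window of the certificate served on the internal API address
echo | openssl s_client -connect 172.28.160.1:443 2>/dev/null | openssl x509 -noout -dates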

Comment 7 Neil Girard 2020-10-01 21:17:15 UTC
We worked around the issue by doing the following:

1.) Generated new CSRs for each node.

2.) Regenerated new certificates by hand in the following locations (the certificates there had expired):

/etc/kubernetes/static-pod-resources/kube-apiserver-pod-x/secrets/kubelet-client
/etc/kubernetes/static-pod-resources/kube-scheduler-certs/secrets/kube-scheduler-client-cert-key


We are not sure why these certificates had expired and are still looking into probable causes.  Automatic regeneration of these two client certificates failed and should be investigated further.
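For completeness, a minimal sketch of confirming the expiry of the two certificates above, assuming each secrets directory holds a tls.crt (the "x" in the kube-apiserver path is the current static pod revision and is left as-is here):

# Print NotAfter for the two manually regenerated client certificates
openssl x509 -noout -enddate -in /etc/kubernetes/static-pod-resources/kube-apiserver-pod-x/secrets/kubelet-client/tls.crt
openssl x509 -noout -enddate -in /etc/kubernetes/static-pod-resources/kube-scheduler-certs/secrets/kube-scheduler-client-cert-key/tls.crt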

Comment 8 Tomáš Nožička 2020-10-02 11:40:23 UTC

*** This bug has been marked as a duplicate of bug 1881322 ***

