Bug 1694079

Summary: Add tool that can restore expired certificates in case a cluster was suspended for a longer period of time
Product: OpenShift Container Platform
Reporter: Michal Fojtik <mfojtik>
Component: Master
Assignee: Tomáš Nožička <tnozicka>
Status: CLOSED ERRATA
QA Contact: zhou ying <yinzhou>
Severity: medium
Priority: urgent
Version: 4.1.0
CC: anjan, aos-bugs, cfergeau, gblomqui, gbraad, gklein, jokerman, lmohanty, mifiedle, mmccomas, prkumar, rcyriac, schituku, sdodson, sponnaga, trking, yzamir
Target Milestone: ---
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-06-04 10:46:34 UTC
Type: Bug

Description Michal Fojtik 2019-03-29 13:06:16 UTC
Description of problem:

This tool will allow users to recover a cluster that was suspended for longer
than the validity period of our certificates.
The tool can be baked into a VM boot process, so when somebody wants to package
an OpenShift cluster into a VM, it will start with valid certificates.
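For context, whether a certificate has already expired can be checked with openssl's -checkend option. A minimal self-contained sketch (the throwaway cert and /tmp paths are illustrative only; on a real node you would point at an existing file such as /var/lib/kubelet/pki/kubelet-client-current.pem):

```shell
# Generate a throwaway 1-day certificate just to have something to inspect
# (illustrative; a real check would target the cluster's own cert files).
openssl req -x509 -newkey rsa:2048 -nodes -subj '/CN=demo' \
  -keyout /tmp/demo.key -out /tmp/demo.crt -days 1 2>/dev/null

# -checkend N exits 0 if the cert is still valid N seconds from now,
# non-zero if it will have expired by then.
if openssl x509 -in /tmp/demo.crt -noout -checkend 3600; then
  echo "certificate still valid"
else
  echo "certificate expired or expiring soon"
fi
```

A cluster suspended past expiry would take the second branch for its internal certs, which is the situation this tool is meant to repair.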

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

Comment 1 Seth Jennings 2019-03-29 14:14:35 UTC
*** Bug 1693951 has been marked as a duplicate of this bug. ***

Comment 2 Maciej Szulik 2019-04-09 14:27:20 UTC
*** Bug 1693951 has been marked as a duplicate of this bug. ***

Comment 3 Seth Jennings 2019-04-18 20:17:27 UTC
*** Bug 1699470 has been marked as a duplicate of this bug. ***

Comment 4 Xingxing Xia 2019-04-19 06:04:40 UTC
Adding the keyword, considering the issue in bug 1699470 and the MSTR-363 label. Feel free to remove it if you disagree. Thanks.

Comment 6 Tomáš Nožička 2019-04-30 08:02:54 UTC
WIP is tracked here https://github.com/openshift/cluster-kube-apiserver-operator/pull/444

Comment 9 Michal Fojtik 2019-05-07 07:50:29 UTC
https://github.com/openshift/cluster-kube-apiserver-operator/pull/460 merged, moving to QA.

@Tomas can you please work with QA on explaining how they can test the recovery procedure?

Comment 10 zhou ying 2019-05-08 09:36:26 UTC
Following this doc https://docs.google.com/document/d/1ONkxdDmQVLBNJrSJymfKPrndo7b4vgCA2zwL9xHYx6A/edit, I can recover the cluster; no issue found, will verify.

Comment 11 zhou ying 2019-05-08 09:36:56 UTC
Payload 4.1.0-0.nightly-2019-05-08-012425

Comment 12 Anjan 2019-05-31 12:55:00 UTC
So I tried the forced certificate rotation steps from the above gdoc; I'm not sure what I am doing wrong, but it does not rotate certs in the cluster.
I am using the libvirt build for testing this, and the following are the steps that I followed:

# validity is 30 times the base (30*9000s = 270000s)
oc create -n openshift-config configmap unsupported-cert-rotation-config --from-literal='base=9000s'

# forcing rotation
oc get secret -A -o json | jq -r '.items[] | select(.metadata.annotations."auth.openshift.io/certificate-not-after" | .!=null and fromdateiso8601<='$( date --date='+1year' +%s )') | "-n \(.metadata.namespace) \(.metadata.name)"' | xargs -n3 oc patch secret -p='{"metadata": {"annotations": {"auth.openshift.io/certificate-not-after": null}}}'

# Wait ~ 5-10 minutes

# Make sure at least the apiserver serving cert has 15 min validity (change your cluster name based on your kubeconfig)
openssl s_client -connect api.tnozicka-1.devcluster.openshift.com:6443 | openssl x509 -noout -dates
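For readers puzzling over the jq filter in the rotation step above: it selects every secret whose auth.openshift.io/certificate-not-after annotation is set and falls before one year from now, then emits "-n <namespace> <name>" pairs for xargs to feed into oc patch. A minimal sketch of the same select/format logic on synthetic data (the namespaces and secret names here are made up; GNU date and jq are assumed):

```shell
# Synthetic secret list mimicking the shape of `oc get secret -A -o json`.
cat > /tmp/secrets.json <<'EOF'
{"items":[
 {"metadata":{"namespace":"ns1","name":"old-cert","annotations":{"auth.openshift.io/certificate-not-after":"2019-06-01T00:00:00Z"}}},
 {"metadata":{"namespace":"ns2","name":"no-annotation","annotations":{}}}
]}
EOF

# Same filter as the real command: jq's short-circuiting `and` skips
# fromdateiso8601 when the annotation is null, so only the annotated,
# soon-to-expire secret survives.
jq -r '.items[]
  | select(.metadata.annotations."auth.openshift.io/certificate-not-after"
           | .!=null and fromdateiso8601 <= '"$(date --date='+1year' +%s)"')
  | "-n \(.metadata.namespace) \(.metadata.name)"' /tmp/secrets.json
# prints: -n ns1 old-cert
```

In the real command these pairs then go to `xargs -n3 oc patch secret ...`, which nulls out the annotation to force the operator to regenerate each cert.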

Actual output I got:
$ ./oc create -n openshift-config configmap unsupported-cert-rotation-config --from-literal='base=9000s'
configmap/unsupported-cert-rotation-config created

$ ./oc get secret -A -o json | jq -r '.items[] | select(.metadata.annotations."auth.openshift.io/certificate-not-after" | .!=null and fromdateiso8601<='$( date --date='+1year' +%s )') | "-n \(.metadata.namespace) \(.metadata.name)"' | xargs -n3 ./oc patch secret -p='{"metadata": {"annotations": {"auth.openshift.io/certificate-not-after": null}}}'
secret/kube-controller-manager-client-cert-key patched
secret/kube-scheduler-client-cert-key patched
secret/aggregator-client-signer patched
secret/kube-apiserver-to-kubelet-signer patched
secret/kube-control-plane-signer patched
secret/aggregator-client patched
secret/external-loadbalancer-serving-certkey patched
secret/internal-loadbalancer-serving-certkey patched
secret/kube-apiserver-cert-syncer-client-cert-key patched
secret/kube-apiserver-cert-syncer-client-cert-key-2 patched
secret/kube-apiserver-cert-syncer-client-cert-key-3 patched
secret/kube-apiserver-cert-syncer-client-cert-key-4 patched
secret/kube-apiserver-cert-syncer-client-cert-key-5 patched
secret/kube-apiserver-cert-syncer-client-cert-key-6 patched
secret/kubelet-client patched
secret/kubelet-client-2 patched
secret/kubelet-client-3 patched
secret/kubelet-client-4 patched
secret/kubelet-client-5 patched
secret/kubelet-client-6 patched
secret/localhost-serving-cert-certkey patched
secret/service-network-serving-certkey patched
secret/csr-signer patched
secret/csr-signer-signer patched
secret/kube-controller-manager-client-cert-key patched
secret/kube-controller-manager-client-cert-key-2 patched
secret/kube-controller-manager-client-cert-key-3 patched
secret/kube-controller-manager-client-cert-key-4 patched
secret/kube-controller-manager-client-cert-key-5 patched
secret/kube-scheduler-client-cert-key patched
secret/kube-scheduler-client-cert-key-2 patched
secret/kube-scheduler-client-cert-key-3 patched
secret/kube-scheduler-client-cert-key-4 patched
secret/kube-scheduler-client-cert-key-5 patched

# Inside the VM that the installer created, certs are only valid for 1 day
[core@crc-kmrrq-master-0 ~]$ sudo su
[root@crc-kmrrq-master-0 core]# openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -dates
notBefore=May 31 11:14:00 2019 GMT
notAfter=Jun  1 11:05:06 2019 GMT
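One way to check a notAfter date like the one above from a script, rather than by eye, is to parse it into epoch seconds and compare against now. A sketch assuming GNU date, reusing the date string from the output above:

```shell
# Parse the notAfter line from the kubelet cert output into epoch seconds
# and compare against the current time (GNU date assumed).
not_after="Jun  1 11:05:06 2019 GMT"   # value from the output above
expiry=$(date --date="$not_after" +%s)
now=$(date +%s)
if [ "$expiry" -gt "$now" ]; then
  echo "cert still valid for $((expiry - now)) seconds"
else
  echo "cert already expired"
fi
```

A freshly rotated cert should put notAfter well in the future; a stale one, like the 1-day cert shown here, fails this check as soon as the day is over.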


Comment 14 errata-xmlrpc 2019-06-04 10:46:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.