Description of problem: Before beta 3, we need to increase the certification rotation interval to allow customers to shutdown instances. Looking at the default duration for 3.11 for nodes, it looks like we set --experimental-cluster-signing-duration for 1 year. This bug is a tracker to ensure we increase the period before beta.
PR addressing the problem is https://github.com/openshift/cluster-kube-apiserver-operator/pull/338
The PR landed.
Didn't hit the related bug 1688820 today with the latest build. Will continue checking tomorrow.
After reading the above PR, I collected all the certs that should have 30-day validity. Each cert's namespace and secret go in a file:

$ cat certs.txt
openshift-kube-apiserver-operator aggregator-client-signer
openshift-kube-apiserver aggregator-client
openshift-kube-apiserver localhost-serving-cert-certkey
openshift-kube-apiserver service-network-serving-certkey
openshift-kube-apiserver loadbalancer-serving-certkey
openshift-config-managed kube-controller-manager-client-cert-key
openshift-kube-controller-manager kube-controller-manager-client-cert-key
openshift-config-managed kube-scheduler-client-cert-key
openshift-kube-scheduler kube-scheduler-client-cert-key
openshift-kube-apiserver kube-apiserver-cert-syncer-client-cert-key

Then check their dates in today's latest build, 4.0.0-0.nightly-2019-03-19-004004, with:

# Split certs.txt on newlines only, so each loop item is one "namespace secret" pair.
export IFS=$'\n'
for i in `cat certs.txt`
do
  NS=`echo $i | cut -d ' ' -f 1`
  SECRET=`echo $i | cut -d ' ' -f 2`
  # Remove the previous cert, then extract the secret's files into the current directory.
  rm -f tls.crt
  oc extract secret/$SECRET -n $NS --confirm > /dev/null
  echo "Check cert dates of $SECRET in project $NS:"
  echo "openssl x509 -noout --dates -in tls.crt"
  openssl x509 -noout --dates -in tls.crt
  echo
done

Got:

Check cert dates of aggregator-client-signer in project openshift-kube-apiserver-operator:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 04:44:46 2019 GMT
notAfter=Mar 20 04:44:46 2019 GMT

Check cert dates of aggregator-client in project openshift-kube-apiserver:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:03 2019 GMT
notAfter=Mar 20 04:44:46 2019 GMT

Check cert dates of localhost-serving-cert-certkey in project openshift-kube-apiserver:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:03 2019 GMT
notAfter=Apr 18 05:02:04 2019 GMT

Check cert dates of service-network-serving-certkey in project openshift-kube-apiserver:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:03 2019 GMT
notAfter=Apr 18 05:02:04 2019 GMT

Check cert dates of loadbalancer-serving-certkey in project openshift-kube-apiserver:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:06 2019 GMT
notAfter=Apr 18 05:02:07 2019 GMT

Check cert dates of kube-controller-manager-client-cert-key in project openshift-config-managed:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:04 2019 GMT
notAfter=Apr 18 05:02:05 2019 GMT

Check cert dates of kube-controller-manager-client-cert-key in project openshift-kube-controller-manager:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:04 2019 GMT
notAfter=Apr 18 05:02:05 2019 GMT

Check cert dates of kube-scheduler-client-cert-key in project openshift-config-managed:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:02 2019 GMT
notAfter=Apr 18 05:02:03 2019 GMT

Check cert dates of kube-scheduler-client-cert-key in project openshift-kube-scheduler:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:02 2019 GMT
notAfter=Apr 18 05:02:03 2019 GMT

Check cert dates of kube-apiserver-cert-syncer-client-cert-key in project openshift-kube-apiserver:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:08 2019 GMT
notAfter=Apr 18 05:02:09 2019 GMT

Most of these certs have 30-day validity, except two: aggregator-client-signer and aggregator-client, which only have 1-day validity. Maciej, does this mean these two still have the bug?
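Reading the validity period off each notBefore/notAfter pair by eye is error-prone, so a small helper can do the arithmetic. The following is only a sketch: validity_hours is a hypothetical name (not part of oc or openssl), and it assumes GNU date, whose -d option parses the date strings that openssl prints.

```shell
#!/usr/bin/env bash
# Sketch: convert a notBefore/notAfter pair, as printed by
# `openssl x509 -noout --dates`, into a validity period in whole hours.
# Assumption: GNU date is available (`date -d` parses the openssl format).

# validity_hours is a hypothetical helper name used for illustration only.
validity_hours() {
    local not_before="$1" not_after="$2"
    local start end
    # Convert both timestamps to seconds since the epoch (UTC).
    start=$(date -u -d "$not_before" +%s) || return 1
    end=$(date -u -d "$not_after" +%s) || return 1
    # Integer division: partial hours are dropped.
    echo $(( (end - start) / 3600 ))
}

# Dates copied from the output above.
validity_hours "Mar 19 04:44:46 2019 GMT" "Mar 20 04:44:46 2019 GMT"  # aggregator-client-signer: 24
validity_hours "Mar 19 05:02:03 2019 GMT" "Apr 18 05:02:04 2019 GMT"  # localhost-serving-cert-certkey: 720
```

To feed it from a real cert, the two dates could be taken from `openssl x509 -noout -startdate -in tls.crt | cut -d= -f2` and the matching `-enddate` call. Hours are used rather than days because the aggregator-client cert above is valid for slightly less than 24 hours and would round down to 0 days.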
This is not a bug. The initial cert validity is 1 day; only after that period will we get the default 30-day certs. You need to keep the cluster running for a longer period to verify the certs.
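To judge how long the cluster has to stay up before re-checking, one option is to compute the time left until the initial cert's notAfter. This is a rough sketch, not how the operator decides when to rotate (this thread does not state the exact rotation time): seconds_until is a hypothetical helper, and GNU date is assumed.

```shell
#!/usr/bin/env bash
# Sketch: seconds remaining until a cert's notAfter timestamp.
# Assumption: GNU date (`date -d`). The optional second argument lets the
# caller pass a fixed "now" (epoch seconds); it defaults to the current time.

# seconds_until is a hypothetical helper name used for illustration only.
seconds_until() {
    local not_after="$1"
    local now="${2:-$(date -u +%s)}"
    local end
    end=$(date -u -d "$not_after" +%s) || return 1
    # Negative result means the notAfter time has already passed.
    echo $(( end - now ))
}

# Example with the aggregator-client dates reported earlier in this bug,
# taking the cert's notBefore as "now".
now=$(date -u -d "Mar 19 05:02:03 2019 GMT" +%s)
seconds_until "Mar 20 04:44:46 2019 GMT" "$now"  # 85363 (23h 42m 43s)
```

Once that time has passed, re-running the date check on the same secrets should show whether the replacement certs carry the 30-day validity.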
I still hit the issue in https://bugzilla.redhat.com/show_bug.cgi?id=1688820 on 4.0.0-0.nightly-2019-03-19-004004 after leaving the cluster up overnight. It has not been up for 24h yet.
Yeah, while testing the ability to shorten the cert rotation I've noticed that openshift apiserver died with errors from https://bugzilla.redhat.com/show_bug.cgi?id=1688820. I'll continue debugging that tomorrow.
(In reply to Maciej Szulik from comment #5)
> This is not a bug, the initial cert validity is 1 day, only after that
> period we'll be getting the default 30 days certs. You need to keep the
> cluster running for a longer period to verify the certs.

Yes, now these two show 30 days too:

Check cert dates of aggregator-client-signer in project openshift-kube-apiserver-operator:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 20 14:57:38 2019 GMT
notAfter=Apr 19 14:57:39 2019 GMT

Check cert dates of aggregator-client in project openshift-kube-apiserver:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 20 14:57:37 2019 GMT
notAfter=Apr 18 23:56:56 2019 GMT
I'm confused. The original comment said "Before beta 3, we need to increase the certification rotation interval to allow customers to shutdown instances. Looking at the default duration for 3.11 for nodes, it looks like we set --experimental-cluster-signing-duration for 1 year" ... but everything I see in this bz and the attached PR suggests that all we did was bump it to 30 days. Are there any more plans to address the original issue? For those of us in the training space (creating courses and exams for the product), 30 days is simply not enough. In case it's not clear, we need to be able to shut down and snapshot environments that will then be copied and spun up for many many many months to come.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758