Bug 1688610 - Increase certificate rotation period
Summary: Increase certificate rotation period
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Master
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.1.0
Assignee: Maciej Szulik
QA Contact: Xingxing Xia
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-03-14 05:26 UTC by Derek Carr
Modified: 2019-06-04 10:45 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:45:49 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2019:0758 | 0 | None | None | None | 2019-06-04 10:45:56 UTC

Description Derek Carr 2019-03-14 05:26:31 UTC
Description of problem:

Before beta 3, we need to increase the certificate rotation interval so that customers can shut down instances.

Looking at the default duration for 3.11 for nodes, it looks like we set --experimental-cluster-signing-duration to 1 year. This bug is a tracker to ensure we increase the period before beta.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Maciej Szulik 2019-03-18 11:43:17 UTC
PR addressing the problem is https://github.com/openshift/cluster-kube-apiserver-operator/pull/338

Comment 2 Michal Fojtik 2019-03-18 15:16:32 UTC
The PR landed.

Comment 3 Xingxing Xia 2019-03-19 10:54:45 UTC
I didn't hit the related bug 1688820 today with the latest build. I will continue checking tomorrow.

Comment 4 Xingxing Xia 2019-03-19 13:53:21 UTC
From the PR above, I got the list of all certs with 30-day validity and collected each cert's namespace and secret name in a file:
$ cat certs.txt
openshift-kube-apiserver-operator aggregator-client-signer
openshift-kube-apiserver aggregator-client
openshift-kube-apiserver localhost-serving-cert-certkey
openshift-kube-apiserver service-network-serving-certkey
openshift-kube-apiserver loadbalancer-serving-certkey
openshift-config-managed kube-controller-manager-client-cert-key
openshift-kube-controller-manager kube-controller-manager-client-cert-key
openshift-config-managed kube-scheduler-client-cert-key
openshift-kube-scheduler kube-scheduler-client-cert-key
openshift-kube-apiserver kube-apiserver-cert-syncer-client-cert-key

Then I checked their dates in today's latest build, 4.0.0-0.nightly-2019-03-19-004004, with:
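# Split certs.txt on newlines only, so each $i is one "<namespace> <secret>" line.
export IFS=$'\n'
for i in $(cat certs.txt)
do
  NS=$(echo "$i" | cut -d ' ' -f 1)
  SECRET=$(echo "$i" | cut -d ' ' -f 2)
  rm -f tls.crt
  # Write the secret's files (including tls.crt) to the current directory.
  oc extract "secret/$SECRET" -n "$NS" --confirm > /dev/null
  echo "Check cert dates of $SECRET in project $NS:"
  echo "openssl x509 -noout --dates -in tls.crt"
  openssl x509 -noout --dates -in tls.crt
  echo
done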

Got:
Check cert dates of aggregator-client-signer in project openshift-kube-apiserver-operator:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 04:44:46 2019 GMT
notAfter=Mar 20 04:44:46 2019 GMT

Check cert dates of aggregator-client in project openshift-kube-apiserver:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:03 2019 GMT
notAfter=Mar 20 04:44:46 2019 GMT

Check cert dates of localhost-serving-cert-certkey in project openshift-kube-apiserver:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:03 2019 GMT
notAfter=Apr 18 05:02:04 2019 GMT

Check cert dates of service-network-serving-certkey in project openshift-kube-apiserver:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:03 2019 GMT
notAfter=Apr 18 05:02:04 2019 GMT

Check cert dates of loadbalancer-serving-certkey in project openshift-kube-apiserver:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:06 2019 GMT
notAfter=Apr 18 05:02:07 2019 GMT

Check cert dates of kube-controller-manager-client-cert-key in project openshift-config-managed:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:04 2019 GMT
notAfter=Apr 18 05:02:05 2019 GMT

Check cert dates of kube-controller-manager-client-cert-key in project openshift-kube-controller-manager:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:04 2019 GMT
notAfter=Apr 18 05:02:05 2019 GMT

Check cert dates of kube-scheduler-client-cert-key in project openshift-config-managed:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:02 2019 GMT
notAfter=Apr 18 05:02:03 2019 GMT

Check cert dates of kube-scheduler-client-cert-key in project openshift-kube-scheduler:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:02 2019 GMT
notAfter=Apr 18 05:02:03 2019 GMT

Check cert dates of kube-apiserver-cert-syncer-client-cert-key in project openshift-kube-apiserver:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:08 2019 GMT
notAfter=Apr 18 05:02:09 2019 GMT


Most of these certs have 30 days of validity, except two, aggregator-client-signer and aggregator-client, which have only 1 day of validity.
Maciej, does this mean these two still have the bug?
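
The validity windows above can also be computed rather than read off by eye. The following is a minimal sketch, not part of the original verification: validity_days and cert_validity_days are hypothetical helper names, and the date parsing assumes GNU date (`date -d`).

```bash
# Sketch: compute a certificate's validity window in whole days.
# Assumes GNU date; both function names are hypothetical helpers.

# validity_days NOT_BEFORE NOT_AFTER
# Takes the date strings printed after "notBefore=" and "notAfter=".
validity_days() {
  local not_before="$1" not_after="$2"
  local start end
  start=$(date -u -d "$not_before" +%s) || return 1
  end=$(date -u -d "$not_after" +%s) || return 1
  # Integer division drops any partial day.
  echo $(( (end - start) / 86400 ))
}

# cert_validity_days CERT
# Reads both dates from a PEM file such as the extracted tls.crt.
cert_validity_days() {
  local cert="$1" nb na
  nb=$(openssl x509 -noout -startdate -in "$cert" | cut -d= -f2)
  na=$(openssl x509 -noout -enddate -in "$cert" | cut -d= -f2)
  validity_days "$nb" "$na"
}
```

For example, validity_days 'Mar 19 05:02:03 2019 GMT' 'Apr 18 05:02:04 2019 GMT' prints 30, while the aggregator-client-signer dates above give 1.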

Comment 5 Maciej Szulik 2019-03-20 13:40:41 UTC
This is not a bug. The initial cert validity is 1 day; only after that period will we get the default 30-day certs. You need to keep the cluster running for a longer period to verify the certs.

Comment 6 Mike Fiedler 2019-03-20 14:46:17 UTC
I still hit the issue in https://bugzilla.redhat.com/show_bug.cgi?id=1688820 on 4.0.0-0.nightly-2019-03-19-004004 after leaving the cluster up overnight. It has not been up for 24h yet.

Comment 7 Maciej Szulik 2019-03-20 20:37:27 UTC
Yeah, while testing the ability to shorten the cert rotation, I noticed that the openshift apiserver died with the errors from https://bugzilla.redhat.com/show_bug.cgi?id=1688820.
I'll continue debugging that tomorrow.

Comment 8 Xingxing Xia 2019-03-21 01:09:56 UTC
(In reply to Maciej Szulik from comment #5)
> This is not a bug. The initial cert validity is 1 day; only after that
> period will we get the default 30-day certs. You need to keep the
> cluster running for a longer period to verify the certs.

Yes, now these two show 30 days too:
Check cert dates of aggregator-client-signer in project openshift-kube-apiserver-operator:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 20 14:57:38 2019 GMT
notAfter=Apr 19 14:57:39 2019 GMT

Check cert dates of aggregator-client in project openshift-kube-apiserver:
openssl x509 -noout --dates -in tls.crt                                                                                                
notBefore=Mar 20 14:57:37 2019 GMT                                                                                                     
notAfter=Apr 18 23:56:56 2019 GMT

Comment 9 Ryan Sawhill 2019-05-30 16:38:59 UTC
I'm confused. The original comment said "Before beta 3, we need to increase the certificate rotation interval so that customers can shut down instances. Looking at the default duration for 3.11 for nodes, it looks like we set --experimental-cluster-signing-duration to 1 year" ... but then everything I see in this bz and the attached PR seems to suggest that all we did was bump it to 30 days.

Are there any more plans to address the original issue? For those of us in the training space (creating courses and exams for the product), 30 days is simply not enough. In case it's not clear, we need to be able to shut down and snapshot environments that will then be copied and spun up for many, many months to come.
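
For the snapshot scenario, one rough way to check ahead of time whether an extracted certificate would still be valid after a planned idle period is openssl's -checkend option. This is only a sketch under stated assumptions: valid_after_days is a hypothetical helper name, and it checks a single PEM file, not the cluster's rotation behavior.

```bash
# Sketch: would this certificate still be valid DAYS days from now?
# valid_after_days is a hypothetical helper, not an oc or openssl command.

# valid_after_days CERT DAYS
# Exits 0 if CERT is still valid DAYS days from now, non-zero otherwise.
valid_after_days() {
  local cert="$1" days="$2"
  # -checkend N exits 0 when the cert does NOT expire within N seconds.
  openssl x509 -noout -checkend "$(( days * 86400 ))" -in "$cert" > /dev/null
}
```

With a 30-day cert extracted as tls.crt, `valid_after_days tls.crt 90` would fail, which matches the concern that a snapshot resumed months later would come up with expired certs.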

Comment 11 errata-xmlrpc 2019-06-04 10:45:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

