Bug 1688610

Summary: Increase certificate rotation period
Product: OpenShift Container Platform
Reporter: Derek Carr <decarr>
Component: Master
Assignee: Maciej Szulik <maszulik>
Status: CLOSED ERRATA
QA Contact: Xingxing Xia <xxia>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.1.0
CC: aos-bugs, jokerman, maszulik, mfojtik, mifiedle, mmccomas, rsawhill, wjiang, yinzhou
Target Milestone: ---
Keywords: BetaBlocker
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-04 10:45:49 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Derek Carr 2019-03-14 05:26:31 UTC
Description of problem:

Before beta 3, we need to increase the certificate rotation interval to allow customers to shut down instances.

Looking at the default duration for 3.11 for nodes, it looks like we set --experimental-cluster-signing-duration to 1 year. This bug is a tracker to ensure we increase the period before beta.

Comment 1 Maciej Szulik 2019-03-18 11:43:17 UTC
PR addressing the problem is https://github.com/openshift/cluster-kube-apiserver-operator/pull/338

Comment 2 Michal Fojtik 2019-03-18 15:16:32 UTC
The PR landed.

Comment 3 Xingxing Xia 2019-03-19 10:54:45 UTC
I didn't hit the related bug 1688820 today with the latest build. I will continue checking tomorrow.

Comment 4 Xingxing Xia 2019-03-19 13:53:21 UTC
From the PR above, I gathered all the certs that should have 30-day validity and listed each cert's namespace and secret name in a file:
$ cat certs.txt
openshift-kube-apiserver-operator aggregator-client-signer
openshift-kube-apiserver aggregator-client
openshift-kube-apiserver localhost-serving-cert-certkey
openshift-kube-apiserver service-network-serving-certkey
openshift-kube-apiserver loadbalancer-serving-certkey
openshift-config-managed kube-controller-manager-client-cert-key
openshift-kube-controller-manager kube-controller-manager-client-cert-key
openshift-config-managed kube-scheduler-client-cert-key
openshift-kube-scheduler kube-scheduler-client-cert-key
openshift-kube-apiserver kube-apiserver-cert-syncer-client-cert-key

Then I checked their dates in today's latest build, 4.0.0-0.nightly-2019-03-19-004004, with:
# certs.txt holds one "<namespace> <secret>" pair per line
while read -r NS SECRET
do
  rm -f tls.crt
  # --confirm overwrites files left over from the previous iteration
  oc extract "secret/$SECRET" -n "$NS" --confirm > /dev/null < /dev/null
  echo "Check cert dates of $SECRET in project $NS:"
  echo "openssl x509 -noout --dates -in tls.crt"
  openssl x509 -noout --dates -in tls.crt
  echo
done < certs.txt

Got:
Check cert dates of aggregator-client-signer in project openshift-kube-apiserver-operator:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 04:44:46 2019 GMT
notAfter=Mar 20 04:44:46 2019 GMT

Check cert dates of aggregator-client in project openshift-kube-apiserver:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:03 2019 GMT
notAfter=Mar 20 04:44:46 2019 GMT

Check cert dates of localhost-serving-cert-certkey in project openshift-kube-apiserver:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:03 2019 GMT
notAfter=Apr 18 05:02:04 2019 GMT

Check cert dates of service-network-serving-certkey in project openshift-kube-apiserver:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:03 2019 GMT
notAfter=Apr 18 05:02:04 2019 GMT

Check cert dates of loadbalancer-serving-certkey in project openshift-kube-apiserver:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:06 2019 GMT
notAfter=Apr 18 05:02:07 2019 GMT

Check cert dates of kube-controller-manager-client-cert-key in project openshift-config-managed:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:04 2019 GMT
notAfter=Apr 18 05:02:05 2019 GMT

Check cert dates of kube-controller-manager-client-cert-key in project openshift-kube-controller-manager:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:04 2019 GMT
notAfter=Apr 18 05:02:05 2019 GMT

Check cert dates of kube-scheduler-client-cert-key in project openshift-config-managed:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:02 2019 GMT
notAfter=Apr 18 05:02:03 2019 GMT

Check cert dates of kube-scheduler-client-cert-key in project openshift-kube-scheduler:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:02 2019 GMT
notAfter=Apr 18 05:02:03 2019 GMT

Check cert dates of kube-apiserver-cert-syncer-client-cert-key in project openshift-kube-apiserver:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 19 05:02:08 2019 GMT
notAfter=Apr 18 05:02:09 2019 GMT


Most of these certs have 30 days of validity, except two: aggregator-client-signer and aggregator-client, which have only 1 day of validity.
Maciej, does this mean these two still have the bug?
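
The validity periods quoted above can be double-checked with simple date arithmetic. The following is only a sketch: validity_days is a hypothetical helper name, and it assumes GNU date, whose -d option can parse the timestamps printed by openssl.

```shell
# Assumption: GNU date is available (its -d option parses openssl-style timestamps).

# validity_days NOT_BEFORE NOT_AFTER
# Prints the number of whole days between the two timestamps.
validity_days() {
  local start end
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $(( (end - start) / 86400 ))
}

# Timestamps copied from the output above.
validity_days "Mar 19 04:44:46 2019 GMT" "Mar 20 04:44:46 2019 GMT"   # aggregator-client-signer: prints 1
validity_days "Mar 19 05:02:03 2019 GMT" "Apr 18 05:02:04 2019 GMT"   # localhost-serving-cert-certkey: prints 30
```

The division truncates, so a cert that is valid for 30 days plus one second is reported as 30.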

Comment 5 Maciej Szulik 2019-03-20 13:40:41 UTC
This is not a bug. The initial cert validity is 1 day; only after that period will we get the default 30-day certs. You need to keep the cluster running for a longer period to verify the certs.

Comment 6 Mike Fiedler 2019-03-20 14:46:17 UTC
I still hit the issue in https://bugzilla.redhat.com/show_bug.cgi?id=1688820 on 4.0.0-0.nightly-2019-03-19-004004 after leaving the cluster up overnight. It has not been up for 24 hours yet.

Comment 7 Maciej Szulik 2019-03-20 20:37:27 UTC
Yeah, while testing the ability to shorten the cert rotation, I noticed that the OpenShift API server died with the errors from https://bugzilla.redhat.com/show_bug.cgi?id=1688820.
I'll continue debugging that tomorrow.

Comment 8 Xingxing Xia 2019-03-21 01:09:56 UTC
(In reply to Maciej Szulik from comment #5)
> This is not a bug, the initial cert validity is 1 day, only after that
> period we'll be getting the default 30 days certs. You need to keep the
> cluster running for a longer period to verify the certs.

Yes, now these two show 30 days too:
Check cert dates of aggregator-client-signer in project openshift-kube-apiserver-operator:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 20 14:57:38 2019 GMT
notAfter=Apr 19 14:57:39 2019 GMT

Check cert dates of aggregator-client in project openshift-kube-apiserver:
openssl x509 -noout --dates -in tls.crt
notBefore=Mar 20 14:57:37 2019 GMT
notAfter=Apr 18 23:56:56 2019 GMT

Comment 9 Ryan Sawhill 2019-05-30 16:38:59 UTC
I'm confused. The original comment said "Before beta 3, we need to increase the certification rotation interval to allow customers to shutdown instances. Looking at the default duration for 3.11 for nodes, it looks like we set --experimental-cluster-signing-duration for 1 year" ... but then everything I see in this bz and the attached PR seems to suggest that all we did was bump it to 30 days.

Are there any more plans to address the original issue? For those of us in the training space (creating courses and exams for the product), 30 days is simply not enough. In case it's not clear, we need to be able to shut down and snapshot environments that will then be copied and spun up for many, many months to come.
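
To illustrate the concern, a snapshot's certs can be checked against a planned resume date. This is a minimal sketch, not a supported procedure: expired_by is a hypothetical helper name, the resume date is made up for the example, and GNU date is assumed. It only compares timestamps and says nothing about how the cluster behaves once a cert has lapsed.

```shell
# Assumption: GNU date is available (its -d option parses openssl-style timestamps).

# expired_by NOT_AFTER RESUME_TIME
# Succeeds (exit 0) if a cert with the given notAfter is already expired at RESUME_TIME.
expired_by() {
  local not_after resume
  not_after=$(date -u -d "$1" +%s) || return 2
  resume=$(date -u -d "$2" +%s) || return 2
  [ "$resume" -gt "$not_after" ]
}

# notAfter is taken from comment 4; the resume date is a hypothetical example.
if expired_by "Apr 18 05:02:04 2019 GMT" "Jun 1 00:00:00 2019 GMT"; then
  echo "cert would already be expired when the snapshot is resumed"
fi
```

With a 30-day cert, any snapshot resumed more than 30 days after issuance falls on the expired side of this check.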

Comment 11 errata-xmlrpc 2019-06-04 10:45:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758