Bug 1716622
Summary: | Updating the kube-apiserver certificate with a new certificate fails to reload the kube-apiserver certificate | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Greg Blomquist <gblomqui>
Component: | Master | Assignee: | Luis Sanchez <sanchezl>
Status: | CLOSED ERRATA | QA Contact: | Xingxing Xia <xxia>
Severity: | urgent | Docs Contact: |
Priority: | unspecified | |
Version: | 4.1.0 | CC: | aos-bugs, deads, gblomqui, jokerman, jupierce, mfojtik, mifiedle, mmccomas, mwoodson, sanchezl, sponnaga, wking, xtian, xxia
Target Milestone: | --- | Keywords: | OSE41z_next
Target Release: | 4.1.z | |
Hardware: | All | |
OS: | All | |
Whiteboard: | 4.1.2 | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | 1714771 | Environment: |
Last Closed: | 2019-06-19 06:45:34 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1714771 | |
Bug Blocks: | 1718956 | |
Comment 1
Greg Blomquist
2019-06-03 19:18:53 UTC
There are two other PRs still required for this BZ:
* https://github.com/openshift/cluster-kube-apiserver-operator/pull/492
* https://github.com/openshift/cluster-kube-controller-manager-operator/pull/257

Both PRs merged. https://openshift-release.svc.ci.openshift.org/releasestream/4.1.0-0.nightly/release/4.1.0-0.nightly-2019-06-05-223716?from=4.1.0 has both of the new PRs.

Tested 4.1.0-0.nightly-2019-06-05-233256: updating the kube-apiserver certificate with a new certificate still fails to reload/roll out.

First, add the certificate by following https://bugzilla.redhat.com/show_bug.cgi?id=1685704#c26 :

$ openssl genrsa -out custom2.key 1024
$ openssl req -new -key custom2.key -out custom2.csr
...skipped...
Common Name (eg, your name or your server's hostname) []:api.xxia-test.qe.devcluster.openshift.com
...skipped...
$ openssl x509 -req -days 1 -in custom2.csr -signkey custom2.key -out custom2.crt
$ oc create secret tls api-certs --cert=custom2.crt --key=custom2.key -n openshift-config
$ oc edit apiserver cluster
...
spec:
  servingCerts:
    namedCertificates:
    - names:
      - api.xxia-test.qe.devcluster.openshift.com
      servingCertificate:
        name: api-certs

Then new installer-7-ip-* pods run --> the kube-apiserver pods restart.
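The certificate-generation steps above can be reproduced locally as a self-contained sketch. This assumes nothing about the cluster: api.example.com is a hypothetical hostname standing in for the real API hostname, and the `oc` step that loads the result into the cluster appears only as a comment because it needs a live cluster.

```shell
# Generate a key and a self-signed serving certificate for a hypothetical
# hostname (api.example.com stands in for the real API hostname).
openssl genrsa -out custom.key 2048
openssl req -new -key custom.key -out custom.csr -subj "/CN=api.example.com"
openssl x509 -req -days 1 -in custom.csr -signkey custom.key -out custom.crt

# On a live cluster the pair would then be loaded as a TLS secret and
# referenced from the apiserver CR, e.g.:
#   oc create secret tls api-certs --cert=custom.crt --key=custom.key -n openshift-config

# Sanity-check the subject of the freshly signed certificate.
openssl x509 -in custom.crt -noout -subject
```

Note the sketch uses a 2048-bit key; the 1024-bit key in the reproduction above works for testing but is below current minimum-strength recommendations.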
Second, update the certificate with a new .crt:

$ openssl x509 -req -days 1 -in custom2.csr -signkey custom2.key -out custom2-2.crt
$ oc create secret tls api-certs --cert=custom2-2.crt --key=custom2.key -n openshift-config --dry-run -o yaml | oc apply -f -

Watch the pods: no new installer-8-ip-* pods appear, and the kube-apiserver pods never restart accordingly:

$ watch oc get po -n openshift-kube-apiserver
Every 2.0s: oc get po -n openshift-kube-apiserver                  fedora29: Thu Jun  6 12:51

NAME                                                        READY   STATUS      RESTARTS   AGE
installer-2-ip-10-0-133-233.us-east-2.compute.internal      0/1     Completed   0          3h15m
installer-2-ip-10-0-159-216.us-east-2.compute.internal      0/1     Completed   0          3h13m
installer-2-ip-10-0-172-171.us-east-2.compute.internal      0/1     Completed   0          3h15m
installer-3-ip-10-0-159-216.us-east-2.compute.internal      0/1     Completed   0          3h13m
installer-4-ip-10-0-159-216.us-east-2.compute.internal      0/1     Completed   0          3h12m
installer-5-ip-10-0-159-216.us-east-2.compute.internal      0/1     Completed   0          3h11m
installer-6-ip-10-0-133-233.us-east-2.compute.internal      0/1     Completed   0          3h7m
installer-6-ip-10-0-159-216.us-east-2.compute.internal      0/1     Completed   0          3h11m
installer-6-ip-10-0-172-171.us-east-2.compute.internal      0/1     Completed   0          3h9m
installer-7-ip-10-0-133-233.us-east-2.compute.internal      0/1     Completed   0          18m
installer-7-ip-10-0-159-216.us-east-2.compute.internal      0/1     Completed   0          16m
installer-7-ip-10-0-172-171.us-east-2.compute.internal      0/1     Completed   0          19m
kube-apiserver-ip-10-0-133-233.us-east-2.compute.internal   2/2     Running     0          17m
kube-apiserver-ip-10-0-159-216.us-east-2.compute.internal   2/2     Running     0          16m
kube-apiserver-ip-10-0-172-171.us-east-2.compute.internal   2/2     Running     0          19m

The API server doesn't restart when certificates change, only when the configuration changes. Check whether you are actually serving with the new certificates. Also, as a reminder, please include the `oc adm must-gather` report so we can avoid ping-ponging back and forth.
Accessed the cluster in question above; the rollouts were still on the same generation:

$ oc get pods -n openshift-kube-apiserver
NAME                                                        READY   STATUS      RESTARTS   AGE
installer-2-ip-10-0-133-233.us-east-2.compute.internal      0/1     Completed   0          14h
installer-2-ip-10-0-159-216.us-east-2.compute.internal      0/1     Completed   0          14h
installer-2-ip-10-0-172-171.us-east-2.compute.internal      0/1     Completed   0          14h
installer-3-ip-10-0-159-216.us-east-2.compute.internal      0/1     Completed   0          14h
installer-4-ip-10-0-159-216.us-east-2.compute.internal      0/1     Completed   0          14h
installer-5-ip-10-0-159-216.us-east-2.compute.internal      0/1     Completed   0          14h
installer-6-ip-10-0-133-233.us-east-2.compute.internal      0/1     Completed   0          14h
installer-6-ip-10-0-159-216.us-east-2.compute.internal      0/1     Completed   0          14h
installer-6-ip-10-0-172-171.us-east-2.compute.internal      0/1     Completed   0          14h
installer-7-ip-10-0-133-233.us-east-2.compute.internal      0/1     Completed   0          11h
installer-7-ip-10-0-159-216.us-east-2.compute.internal      0/1     Completed   0          11h
installer-7-ip-10-0-172-171.us-east-2.compute.internal      0/1     Completed   0          11h
kube-apiserver-ip-10-0-133-233.us-east-2.compute.internal   2/2     Running     0          11h
kube-apiserver-ip-10-0-159-216.us-east-2.compute.internal   2/2     Running     0          11h
kube-apiserver-ip-10-0-172-171.us-east-2.compute.internal   2/2     Running     0          11h

Retrieved the cert being served with:

openssl s_client -showcerts -servername api.xxia-0606.qe.devcluster.openshift.com -connect api.xxia-0606.qe.devcluster.openshift.com:6443 </dev/null | tee -a showcert.out

and compared it to tls.crt in the api-certs secret in the openshift-config namespace. The certificates did not match. Verified in the apiserver CR that api-certs is still the servingCertificate. I will include the openssl output and the secret output in the must-gather zip I will be linking shortly. Moving this back to ON_QA.
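The comparison above (served certificate vs. tls.crt in the api-certs secret) comes down to checking whether two PEM certificates are the same. A minimal local sketch, re-signing one CSR twice the way custom2.crt and custom2-2.crt were produced, shows that the two certificates get distinct SHA-256 fingerprints even though subject, key, and issuer are identical; against a live cluster, one side would come from `openssl s_client` and the other from the secret. All file names here are hypothetical.

```shell
# Re-sign the same CSR twice, mirroring the custom2.crt -> custom2-2.crt update.
openssl genrsa -out demo.key 2048
openssl req -new -key demo.key -out demo.csr -subj "/CN=api.example.com"
# Explicit serials make the two certificates byte-different, as a real
# re-signing would (modern OpenSSL randomizes the serial by default).
openssl x509 -req -days 1 -set_serial 01 -in demo.csr -signkey demo.key -out demo-1.crt
openssl x509 -req -days 1 -set_serial 02 -in demo.csr -signkey demo.key -out demo-2.crt

# Comparing fingerprints detects a stale served certificate: a server that
# still presents the old fingerprint has not reloaded the new one.
f1=$(openssl x509 -in demo-1.crt -noout -fingerprint -sha256)
f2=$(openssl x509 -in demo-2.crt -noout -fingerprint -sha256)
if [ "$f1" != "$f2" ]; then
  echo "fingerprints differ"
fi
```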
Using the correct hostname for the SNI (-servername) version of the openssl showcerts command does show that we are serving the correct cert:

openssl s_client -showcerts -servername api.xxia-test.qe.devcluster.openshift.com -connect api.xxia-0606.qe.devcluster.openshift.com:6443
CONNECTED(00000003)
---
Certificate chain
 0 s:C = US, ST = test, L = Default City, O = Default Company Ltd, CN = api.xxia-test.qe.devcluster.openshift.com
   i:C = US, ST = test, L = Default City, O = Default Company Ltd, CN = api.xxia-test.qe.devcluster.openshift.com
-----BEGIN CERTIFICATE-----
MIICjjCCAfcCFBWHWgPnkHlHbTENgY9DZA7NFkWIMA0GCSqGSIb3DQEBCwUAMIGF
MQswCQYDVQQGEwJVUzENMAsGA1UECAwEdGVzdDEVMBMGA1UEBwwMRGVmYXVsdCBD
aXR5MRwwGgYDVQQKDBNEZWZhdWx0IENvbXBhbnkgTHRkMTIwMAYDVQQDDClhcGku
eHhpYS10ZXN0LnFlLmRldmNsdXN0ZXIub3BlbnNoaWZ0LmNvbTAeFw0xOT.......

@deads verified the subject and signer from the cert, and I was able to verify the expiration:

echo | openssl s_client -showcerts -servername api.xxia-test.qe.devcluster.openshift.com -connect api.xxia-0606.qe.devcluster.openshift.com:6443 2>/dev/null | openssl x509 -text
Certificate:
    Data:
        Version: 1 (0x0)
        Serial Number:
            15:87:5a:03:e7:90:79:47:6d:31:0d:81:8f:43:64:0e:cd:16:45:88
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = US, ST = test, L = Default City, O = Default Company Ltd, CN = api.xxia-test.qe.devcluster.openshift.com
        Validity
            Not Before: Jun  6 04:39:18 2019 GMT
            Not After : Jun  7 04:39:18 2019 GMT
        Subject: C = US, ST = test, L = Default City, O = Default Company Ltd, CN = api.xxia-test.qe.devcluster.openshift.com

Marking this VERIFIED on 4.1.0-0.nightly-2019-06-05-223716. The new certificate is served from disk without a new rollout.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:1382
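As a local illustration of the expiration check in the verification comment above: the Not After date can be pulled out with `openssl x509 -enddate` instead of reading the full `-text` dump, and `-checkend` gives a pass/fail answer via exit status. This sketch signs a throwaway 1-day certificate with a hypothetical hostname rather than piping from `openssl s_client`, which needs a live endpoint.

```shell
# Create a throwaway 1-day self-signed certificate to inspect.
openssl req -x509 -newkey rsa:2048 -nodes -keyout t.key -out t.crt \
  -days 1 -subj "/CN=api.example.com"

# Print only the expiry (the same Not After value shown in the -text dump).
openssl x509 -in t.crt -noout -enddate

# Exit status 0 means the certificate will still be valid 0 seconds from now,
# i.e. it is not expired right now.
openssl x509 -in t.crt -noout -checkend 0 && echo "certificate currently valid"
```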