Bug 1714771

Summary: Updating the kube-apiserver certificate with a new certificate fails to reload the kube-apiserver certificate

Product: OpenShift Container Platform
Component: kube-apiserver
Version: 4.1.0
Target Release: 4.2.0
Hardware: All
OS: All
Severity: medium
Priority: unspecified
Status: CLOSED ERRATA
Type: Bug
Reporter: Matt Woodson <mwoodson>
Assignee: Luis Sanchez <sanchezl>
QA Contact: Xingxing Xia <xxia>
CC: aos-bugs, calfonso, christoph.obexer, deads, gblomqui, jokerman, jupierce, mfojtik, mfuruta, mmccomas
Doc Type: If docs needed, set a value
Clones: 1716622
Bug Blocks: 1716622
Last Closed: 2019-10-16 06:29:26 UTC

Description Matt Woodson 2019-05-28 19:14:22 UTC
Description of problem:

Somewhat related to https://bugzilla.redhat.com/show_bug.cgi?id=1711431

We are attempting to rotate certificates for the kube-apiserver. We have already applied a certificate by specifying the following:

  servingCerts:
    namedCertificates:
    - names:
      - api.cluster_name.basedomain
      servingCertificate:
        name: my_secret_name

Everything is applied as desired.
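
For reference, the snippet above is set on the cluster-scoped APIServer resource (apiserver/cluster) under spec.servingCerts.namedCertificates; a minimal sketch of applying it with oc, keeping the placeholder hostname and secret name from this report:

# A sketch, assuming the namedCertificates entry is configured on apiserver/cluster;
# "api.cluster_name.basedomain" and "my_secret_name" are placeholders from this report.
oc patch apiserver cluster --type=merge -p '
{"spec":{"servingCerts":{"namedCertificates":[
  {"names":["api.cluster_name.basedomain"],
   "servingCertificate":{"name":"my_secret_name"}}]}}}'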

Now, when this certificate expires, we need to renew it. We do the renewal and replace the contents of the secret "my_secret_name" with the updated certificate, but the kube-apiserver never restarts and the new certificate is never applied.
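
A minimal sketch of that renewal step, using the same form of command that comment 5 below uses; cert.pem and privkey.pem stand in for the renewed certificate and key:

# Replace the contents of the secret in openshift-config with the renewed cert and key.
# cert.pem and privkey.pem are placeholders for the renewed files.
oc -n openshift-config create secret tls my_secret_name \
  --cert=cert.pem --key=privkey.pem \
  --dry-run -o yaml | oc apply -f -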

Workaround: Delete the kube-apiserver pods; they start back up with the new certificate.



Version-Release number of selected component (if applicable):

version   4.1.0-rc.7   True        False         4h49m     Cluster version is 4.1.0-rc.7




Steps to Reproduce:

Steps are already described above.


Actual results:
Certs don't get applied


Expected results:

New cert starts being served

Comment 1 David Eads 2019-05-28 19:53:42 UTC
@sanchezl  We may be trying to auto-reload. Check the certs on disk.
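
For reference, one way to spot-check the synced cert on a control-plane node; the on-disk path and the secret directory name (taken from the user-serving-cert-000 event in comment 5) are assumptions, not confirmed here:

# A sketch; <master-node> is a placeholder and the static-pod-resources path is an assumption.
oc debug node/<master-node> -- chroot /host \
  openssl x509 -noout -subject -enddate \
  -in /etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/user-serving-cert-000/tls.crt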

Comment 2 David Eads 2019-05-28 19:56:49 UTC
Which pods did you delete?  Deleting a static pod doesn't do anything.

Also, you'll want to attach the must-gather output.
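
A sketch of collecting it; with no extra arguments, oc adm must-gather captures cluster-wide data (including the kube-apiserver operator namespace) into a local must-gather directory:

# A sketch; run against the affected cluster and attach the resulting directory.
oc adm must-gather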

Comment 3 Matt Woodson 2019-05-28 20:59:15 UTC
I deleted the pods in the openshift-kube-apiserver namespace. The pods I deleted were 'kube-apiserver-ip-???'.

I have the must-gather script downloaded, but which operator does this need to be called on? I tried openshift-kube-apiserver-operator and kube-apiserver-operator. Please advise, and I will get it added.

Comment 5 Luis Sanchez 2019-05-29 17:20:48 UTC
Recreation attempt:

Updated user cert in openshift-config namespace:

oc -n openshift-config create secret tls my_secret_name --cert cert.pem  --key privkey.pem --dry-run -o yaml  | oc --insecure-skip-tls-verify apply -f -

I noticed this event:

I0529 15:15:14.298881       1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"c025d5a1-8158-11e9-9b3d-0ab2223433b6", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/user-serving-cert-000 -n openshift-kube-apiserver because it changed

I confirmed {Secret/user-serving-cert-000 -n openshift-kube-apiserver} matched {Secret/my_secret_name -n openshift-config}. 
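
One way to make that comparison (a sketch using the secret names from this bug; requires bash for process substitution):

# Compare the tls.crt data of the source secret and the synced secret.
diff \
  <(oc -n openshift-config get secret my_secret_name -o jsonpath='{.data.tls\.crt}') \
  <(oc -n openshift-kube-apiserver get secret user-serving-cert-000 -o jsonpath='{.data.tls\.crt}')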

I was watching the returned certs with this command:

watch -n 0.1 bash -c 'echo | openssl s_client -showcerts -connect api.sanchezl.devcluster.openshift.com:6443 2>/dev/null | openssl x509 -inform pem -noout -text | grep -E "(Not After\s*:|Issuer\s*:|Subject\s*:|DNS\s*:|Not Before\s*:|Serial Number\s*:)"'

I was getting either the old or the new cert returned, varying at random between retries.

When I checked the files on disk, only 1 out of 3 kube-apiserver pods had the updated certificate on disk. The cert-syncer container logs in the pods which did not update were very terse:

I0529 14:53:11.501705       1 observer_polling.go:106] Starting file observer
I0529 14:53:11.503146       1 certsync_controller.go:161] Starting CertSyncer

while the cert-syncer container logs on the "working" pod were verbose.
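
A sketch for pulling those logs from each kube-apiserver pod; the 'apiserver' label key and the container name kube-apiserver-cert-syncer are assumptions, not confirmed in this bug:

# A sketch; the label selector and container name are assumptions.
for p in $(oc -n openshift-kube-apiserver get pods -l apiserver -o name); do
  echo "== $p"
  oc -n openshift-kube-apiserver logs "$p" -c kube-apiserver-cert-syncer --tail=20
done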

Must-gather logs captured (https://drive.google.com/file/d/1eJl2WBvkOS8ZtC2OqFMzKz_9x81cEADx/view?usp=sharing).

Comment 6 Luis Sanchez 2019-05-29 17:42:37 UTC
Forcing a redeployment ensures the new certs are used:

oc patch kubeapiserver/cluster --type=json -p '[ {"op": "replace", "path": "/spec/forceRedeploymentReason", "value": "pickup new certs" } ]'
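
The rollout of the new revision across the control-plane nodes can then be followed; a sketch assuming the operator resource reports per-node revisions in status.nodeStatuses:

# A sketch; prints nodeName, currentRevision and targetRevision for each control-plane node.
oc get kubeapiserver cluster -o jsonpath='{range .status.nodeStatuses[*]}{.nodeName}{"\t"}{.currentRevision}{" -> "}{.targetRevision}{"\n"}{end}'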

Comment 7 Luis Sanchez 2019-06-03 18:39:58 UTC
PR https://github.com/openshift/library-go/pull/430

Comment 11 errata-xmlrpc 2019-10-16 06:29:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922