|Summary:||Template router metrics listener is potentially broken by service CA rotation|
|Product:||OpenShift Container Platform||Reporter:||Maru Newby <mnewby>|
|Component:||Routing||Assignee:||Dan Mace <dmace>|
|Status:||CLOSED ERRATA||QA Contact:||Hongan Li <hongli>|
|Fixed In Version:||Doc Type:||If docs needed, set a value|
|Doc Text:||Story Points:||---|
|:||1809398 (view as bug list)||Environment:|
|Last Closed:||2020-01-23 11:12:22 UTC||Type:||Bug|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
|Cloudforms Team:||---||Target Upstream Version:|
|Bug Depends On:|
Description Maru Newby 2019-11-12 23:42:10 UTC
The template router controller is supplied with the names of the serving cert and key files via env var: ROUTER_METRICS_TLS_CERT_FILE, ROUTER_METRICS_TLS_KEY_FILE The key material is loaded when the router runs and used to configure the metrics listener: https://github.com/openshift/router/blob/master/pkg/cmd/infra/router/template.go#L420 These vars are set in the deployment to a serving cert and key injected by the service ca controller: https://github.com/openshift/cluster-ingress-operator/blob/master/pkg/operator/controller/ingress/deployment.go#L241 In the event of the serving cert and key changing after service CA rotation, the operator (or router) should be capable of automatically detecting the change and refreshing to use the latest key material. The 'Refresh Strategies' section of the linked compatibility doc catalogs potential strategies for responding to changes in key material supplied by the service ca operator. Note that CA rotation can be manually triggered in any 4.x release by removing the service CA signing secret. Automated rotation is likely to be introduced in a future z-stream release. Reference: Enhancement for automated service CA rotation: https://github.com/openshift/enhancements/blob/master/enhancements/automated-service-ca-rotation.md Operator compatibility with service ca rotation: https://docs.google.com/document/d/1NB2wUf9e8XScfVM6jFBl8VuLYG6-3uV63eUpqmYE8Ts/edit
Comment 1 Dan Mace 2019-11-18 17:55:16 UTC
Miciah wants to try a kube-rbac-proxy approach to replace my implementation in https://github.com/openshift/router/pull/61, which I think is a good idea. I'm going to re-assign and Miciah will supersede my PR with a new one.
Comment 2 Dan Mace 2019-11-18 20:12:12 UTC
Further discussion revealed that the kube-rbac-proxy approach is incompatible with HostNetwork-exposed ingresscontrollers, so we're going back to my solution for now.
Comment 4 Hongan Li 2019-11-22 06:16:09 UTC
verified with 4.3.0-0.nightly-2019-11-19-122017 and issue has been fixed. delete the secret and router can reload the new one. # oc delete secret router-metrics-certs-default # oc logs router-default-xx-xx I1122 05:07:51.514632 1 template.go:630] router "level"=0 "msg"="reloaded metrics certificate" "cert"="/etc/pki/tls/metrics-certs/tls.crt" "key"="/etc/pki/tls/metrics-certs/tls.key"
Comment 6 errata-xmlrpc 2020-01-23 11:12:22 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062
Comment 7 Maru Newby 2020-02-17 20:44:48 UTC
I'm in the process of backporting service CA rotation to 4.2 and 4.3. Would it make sense to for this fix to similarly be backported to preclude failure of metrics collection in the event that the ingress operator is not restarted after CA rotation and before expiry of the pre-rotation CA?
Comment 8 Dan Mace 2020-02-17 21:14:22 UTC
(In reply to Maru Newby from comment #7) > I'm in the process of backporting service CA rotation to 4.2 and 4.3. Would > it make sense to for this fix to similarly be backported to preclude failure > of metrics collection in the event that the ingress operator is not > restarted after CA rotation and before expiry of the pre-rotation CA? I think the router reload function would be a nice addition to the rotation backport. I'll bet the patch applies cleanly through the cherry-pick automation. Could be an easy win.
Comment 9 Maru Newby 2020-03-03 04:20:07 UTC
Updating the target release to satisfy the bugzilla bot. Not sure why it should care what the target release is for something that is long fixed.