The template router controller is supplied with the names of the serving cert and key files via environment variables: ROUTER_METRICS_TLS_CERT_FILE and ROUTER_METRICS_TLS_KEY_FILE.

The key material is loaded when the router runs and used to configure the metrics listener:
https://github.com/openshift/router/blob/master/pkg/cmd/infra/router/template.go#L420

These vars are set in the deployment to a serving cert and key injected by the service CA controller:
https://github.com/openshift/cluster-ingress-operator/blob/master/pkg/operator/controller/ingress/deployment.go#L241

If the serving cert and key change after service CA rotation, the operator (or router) should automatically detect the change and switch to the latest key material. The 'Refresh Strategies' section of the linked compatibility doc catalogs potential strategies for responding to changes in key material supplied by the service CA operator.

Note that CA rotation can be manually triggered in any 4.x release by removing the service CA signing secret. Automated rotation is likely to be introduced in a future z-stream release.

Reference:
Enhancement for automated service CA rotation: https://github.com/openshift/enhancements/blob/master/enhancements/automated-service-ca-rotation.md
Operator compatibility with service CA rotation: https://docs.google.com/document/d/1NB2wUf9e8XScfVM6jFBl8VuLYG6-3uV63eUpqmYE8Ts/edit
Miciah wants to try a kube-rbac-proxy approach to replace my implementation in https://github.com/openshift/router/pull/61, which I think is a good idea. I'm going to re-assign and Miciah will supersede my PR with a new one.
Further discussion revealed that the kube-rbac-proxy approach is incompatible with HostNetwork-exposed ingresscontrollers, so we're going back to my solution for now.
Verified with 4.3.0-0.nightly-2019-11-19-122017; the issue has been fixed. After the secret is deleted, the router reloads the new one.

# oc delete secret router-metrics-certs-default
# oc logs router-default-xx-xx
I1122 05:07:51.514632 1 template.go:630] router "level"=0 "msg"="reloaded metrics certificate" "cert"="/etc/pki/tls/metrics-certs/tls.crt" "key"="/etc/pki/tls/metrics-certs/tls.key"
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062
I'm in the process of backporting service CA rotation to 4.2 and 4.3. Would it make sense for this fix to be backported as well, to preclude failure of metrics collection in the event that the ingress operator is not restarted after CA rotation and before expiry of the pre-rotation CA?
(In reply to Maru Newby from comment #7)
> I'm in the process of backporting service CA rotation to 4.2 and 4.3. Would
> it make sense for this fix to similarly be backported to preclude failure
> of metrics collection in the event that the ingress operator is not
> restarted after CA rotation and before expiry of the pre-rotation CA?

I think the router reload function would be a nice addition to the rotation backport. I'll bet the patch applies cleanly through the cherry-pick automation. Could be an easy win.
Updating the target release to satisfy the bugzilla bot. Not sure why it should care what the target release is for something that is long fixed.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days.