Bug 1915079

Summary: Canary controller should not periodically rotate the canary route endpoint for performance reasons
Product: OpenShift Container Platform Reporter: Stephen Greene <sgreene>
Component: Networking Assignee: Stephen Greene <sgreene>
Networking sub component: router QA Contact: Arvind iyengar <aiyengar>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: aiyengar, aos-bugs, hongli
Version: 4.7   
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-02-24 15:51:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Stephen Greene 2021-01-11 21:58:06 UTC
Description of problem:
Currently in 4.7, the new canary route is periodically modified by the canary controller so that the canary route switches which endpoint it hits. This is great for verifying whether or not the router has wedged, but since router reloads have a performance cost, this should not be enabled by default (especially for larger clusters already impacted by router reload performance issues). 

To resolve this BZ, add an annotation option for the default ingress controller that makes the canary route rotation functionality opt-in (disabled by default).
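The opt-in behavior requested here can be sketched in Go as follows. This is only an illustration, not the ingress operator's actual code: the helper name `rotationEnabled` is hypothetical, the annotation key is the one that appears in the verification output later in this report, and treating a missing or unparsable value as "disabled" is an assumption based on the expected results.

```go
package main

import (
	"fmt"
	"strconv"
)

// rotateAnnotation is the annotation key shown in the verification output
// of this report.
const rotateAnnotation = "ingress.operator.openshift.io/rotate-canary-route"

// rotationEnabled is a hypothetical helper (not operator code). It reports
// whether canary route rotation has been opted in. A missing annotation or
// a value that does not parse as a boolean is treated as disabled.
func rotationEnabled(annotations map[string]string) bool {
	value, ok := annotations[rotateAnnotation]
	if !ok {
		return false
	}
	enabled, err := strconv.ParseBool(value)
	if err != nil {
		return false
	}
	return enabled
}

func main() {
	// Annotation set to "true": rotation is opted in.
	fmt.Println(rotationEnabled(map[string]string{rotateAnnotation: "true"}))
	// Annotation set to "False": rotation stays off.
	fmt.Println(rotationEnabled(map[string]string{rotateAnnotation: "False"}))
	// No annotations at all: rotation stays off (the default).
	fmt.Println(rotationEnabled(nil))
}
```

With a check of this shape, the controller would only rotate the canary route endpoint when the annotation is explicitly set to a true value, so clusters that never touch the annotation avoid the extra router reloads.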


Version-Release number of selected component (if applicable):

4.7
How reproducible:
100%

Steps to Reproduce:
1. View ingress-operator logs on a 4.7 cluster


Actual results:
Observe that the canary route is periodically rotated every 5 minutes by default.

Expected results:
The canary route is not rotated once it is created, unless the canary-route-rotation annotation on the default ingress controller is set to true.


Additional info:

Comment 2 Arvind iyengar 2021-01-19 11:16:22 UTC
Verified in the "4.7.0-0.nightly-2021-01-18-214951" release payload. With this release, when route rotation is enabled through the new annotation option, the rotation is triggered periodically:
-----
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-01-18-214951   True        False         142m    Cluster version is 4.7.0-0.nightly-2021-01-18-214951

With the annotation value set to "true" on the default ingress controller:
$ oc -n openshift-ingress-operator get  ingresscontroller default -o yaml 
         apiVersion: operator.openshift.io/v1
         kind: IngressController
         metadata:
           annotations:
             ingress.operator.openshift.io/rotate-canary-route: "true" <---
           creationTimestamp: "2021-01-19T08:19:33Z"
           finalizers:
           - ingresscontroller.operator.openshift.io/finalizer-ingresscontroller
           generation: 1
           managedFields:

$ oc -n openshift-ingress logs deployment.apps/router-default  --tail 20
I0119 10:12:29.201234       1 router.go:578] template "msg"="router reloaded"  "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I0119 10:19:29.344200       1 router.go:578] template "msg"="router reloaded"  "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I0119 10:26:29.503635       1 router.go:578] template "msg"="router reloaded"  "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I0119 10:33:29.707338       1 router.go:578] template "msg"="router reloaded"  "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I0119 10:40:29.857865       1 router.go:578] template "msg"="router reloaded"  "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I0119 10:47:30.045171       1 router.go:578] template "msg"="router reloaded"  "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"

$ oc -n openshift-ingress-operator logs deployment.apps/ingress-operator -c ingress-operator --tail 20
2021-01-19T10:12:28.962Z	INFO	operator.canary_controller	canary/controller.go:363	updated canary route	{"namespace": "openshift-ingress-canary", "name": "canary"}
2021-01-19T10:19:29.108Z	INFO	operator.canary_controller	canary/controller.go:363	updated canary route	{"namespace": "openshift-ingress-canary", "name": "canary"}
2021-01-19T10:26:29.267Z	INFO	operator.canary_controller	canary/controller.go:363	updated canary route	{"namespace": "openshift-ingress-canary", "name": "canary"}
2021-01-19T10:33:29.473Z	INFO	operator.canary_controller	canary/controller.go:363	updated canary route	{"namespace": "openshift-ingress-canary", "name": "canary"}
2021-01-19T10:40:29.622Z	INFO	operator.canary_controller	canary/controller.go:363	updated canary route	{"namespace": "openshift-ingress-canary", "name": "canary"}
2021-01-19T10:47:29.809Z	INFO	operator.canary_controller	canary/controller.go:363	updated canary route	{"namespace": "openshift-ingress-canary", "name": "canary"}   
-----

When the annotation is set to "False" or removed from the ingress controller, the periodic reload no longer occurs.

Comment 5 errata-xmlrpc 2021-02-24 15:51:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633