Description of problem:
With the newly introduced "tlsInspectDelay" parameter, applied through the "TuningOptions" setting in the ingresscontroller, a negative value causes the router pod to go into a crash loop.

OpenShift release version:
Release version: 4.9.0-0.nightly-2021-07-27-125952

Cluster Platform:
OCP

How reproducible:
Always

Steps to Reproduce (in detail):
1. Deploy a cluster with the said release version or above.
2. Deploy an ingresscontroller, or modify any existing one, to have "tlsInspectDelay" set to a negative value:
----
If all the values are set to something negative, they are ignored and the proxy is configured with default timer values:
spec:
  tuningOptions:
    tlsInspectDelay: -10s
----
3. Check the router pod status or the logs of the pod.

Actual results:
The pod fails to reload with the below error:
-----
[NOTICE] 209/044252 (21) : haproxy version is 2.2.15-5e8f49d
[NOTICE] 209/044252 (21) : path to executable is /usr/sbin/haproxy
[ALERT] 209/044252 (21) : parsing [/var/lib/haproxy/conf/haproxy.config:67] : 'tcp-request inspect-delay' expects a positive delay in milliseconds, in frontend 'public' (unexpected character '-')
[ALERT] 209/044252 (21) : parsing [/var/lib/haproxy/conf/haproxy.config:93] : 'tcp-request inspect-delay' expects a positive delay in milliseconds, in frontend 'public_ssl' (unexpected character '-')
[ALERT] 209/044252 (21) : Error(s) found in configuration file : /var/lib/haproxy/conf/haproxy.config
[ALERT] 209/044252 (21) : Fatal errors found in configuration.
-----

Expected results:
The negative value should not be parsed or applied in the router configuration. Ideally it should be discarded, and the router should load with the default value instead of failing.

Impact of the problem:
The "TuningOptions" parameters introduce many options to tune the haproxy performance and functions.
There is a good chance that a negative value applied by human error will lead to router crashes, which is not desirable.

Additional info:
The other tuning options, such as the tcp server/client and tunnel timers applied via the "TuningOptions" section, appear to work correctly: the negative value is discarded and the router gets loaded with the default value.
------
Ingresscontroller configuration:
  tuningOptions:
    clientFinTimeout: -2s
    clientTimeout: -32s
    serverFinTimeout: -2s
    serverTimeout: -25s
    tunnelTimeout: -2h

Router status post the change:
oc -n openshift-ingress get pods -o wide
NAME                                   READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
router-default-d8f4b6d59-f9bpq         1/1     Running   0          27h     10.131.0.12   ip-10-0-167-125.us-east-2.compute.internal   <none>           <none>
router-default-d8f4b6d59-qh4ff         1/1     Running   0          27h     10.128.2.7    ip-10-0-220-179.us-east-2.compute.internal   <none>           <none>
router-internalapps-6b5fdb4c89-4tjv8   2/2     Running   0          3m35s   10.131.0.35   ip-10-0-167-125.us-east-2.compute.internal   <none>           <none>

Router pod environment variables:
sh-4.4$ env | grep -i timeout
ROUTER_CLIENT_FIN_TIMEOUT=-2s
ROUTER_DEFAULT_CLIENT_TIMEOUT=-32s
ROUTER_DEFAULT_SERVER_TIMEOUT=-25s
ROUTER_DEFAULT_SERVER_FIN_TIMEOUT=-2s
ROUTER_DEFAULT_TUNNEL_TIMEOUT=-2h

haproxy.config:
  timeout client 30s
  timeout client-fin 1s
  timeout server 30s
  timeout server-fin 1s
  # Long timeout for WebSocket connections.
  timeout tunnel 1h
-------
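The expected behaviour described above (discard a negative value and fall back to the default) could be sketched roughly as follows. This is only an illustration, not the router's actual implementation: the function name `sanitize_duration`, the accepted unit suffixes, and the choice to also fall back on unparsable input are all assumptions made for the example.

```python
import re

# Assumed duration syntax for this sketch: optional sign, integer, unit suffix.
_DURATION_RE = re.compile(r"^(-?)(\d+)(us|ms|s|m|h|d)$")


def sanitize_duration(value, default):
    """Return `value` unless it is negative or unparsable, else `default`.

    Hypothetical helper: mirrors the behaviour requested in this report,
    where a negative timer is discarded instead of being written to
    haproxy.config.
    """
    if value is None:
        return default
    candidate = value.strip()
    match = _DURATION_RE.match(candidate)
    if match is None:
        # Not a recognisable duration at all: fall back to the default.
        return default
    if match.group(1) == "-":
        # Negative durations are discarded rather than passed to haproxy.
        return default
    return candidate


if __name__ == "__main__":
    # Values taken from the report; the defaults are the ones shown in haproxy.config.
    print(sanitize_duration("-32s", "30s"))
    print(sanitize_duration("-2h", "1h"))
    print(sanitize_duration("45s", "30s"))
```

With a check of this kind applied before the template is rendered, a value such as `-32s` would never reach the `timeout client` line, which matches what the timeout options above already appear to do.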
Verified in "4.9.0-0.ci.test-2021-07-30-084757-ci-ln-06q9z1b-latest" release version. With this release, the router no longer appears to crash when a negative value is specified for the "tlsInspectDelay" parameter; the value gets discarded and the router loads with the default value:
-------
oc get clusterversion
NAME      VERSION                                                  AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.ci.test-2021-07-30-084757-ci-ln-06q9z1b-latest   True        False         2m3s    Cluster version is 4.9.0-0.ci.test-2021-07-30-084757-ci-ln-06q9z1b-latest

Post the change:
oc -n openshift-ingress-operator get ingresscontroller internalapps -o yaml | grep -i tuning -A1
  tuningOptions:
    tlsInspectDelay: -10s

oc -n openshift-ingress get pods -o wide
NAME                                   READY   STATUS    RESTARTS   AGE   IP            NODE                                        NOMINATED NODE   READINESS GATES
router-default-575d9dc464-7nx55        1/1     Running   0          28m   10.131.0.5    ip-10-0-165-4.us-east-2.compute.internal    <none>           <none>
router-default-575d9dc464-m9cqh        1/1     Running   0          28m   10.128.2.8    ip-10-0-250-44.us-east-2.compute.internal   <none>           <none>
router-internalapps-6747cd588d-tt6cj   2/2     Running   0          5s    10.131.0.27   ip-10-0-165-4.us-east-2.compute.internal    <none>           <none>

sh-4.4$ env | grep -i inspect
ROUTER_INSPECT_DELAY=-10s

sh-4.4$ cat haproxy.config
frontend public
  bind :80 accept-proxy
  mode http
  tcp-request inspect-delay 5s    <----
-------
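The manual check above (looking at the rendered `tcp-request inspect-delay` line) could also be scripted. The following is a minimal sketch under the assumption that the contents of haproxy.config are available as a string; the helper names `inspect_delays` and `has_negative_delay` are hypothetical and are not part of any OpenShift or HAProxy tooling.

```python
import re

# Matches lines such as "  tcp-request inspect-delay 5s" and captures the value.
_INSPECT_RE = re.compile(r"^\s*tcp-request\s+inspect-delay\s+(\S+)", re.MULTILINE)


def inspect_delays(config_text):
    """Return every 'tcp-request inspect-delay' value found in the config text."""
    return _INSPECT_RE.findall(config_text)


def has_negative_delay(config_text):
    """Return True if any inspect-delay value starts with a minus sign."""
    return any(value.startswith("-") for value in inspect_delays(config_text))


if __name__ == "__main__":
    # Fragment copied from the verified haproxy.config shown in this comment.
    sample = (
        "frontend public\n"
        "  bind :80 accept-proxy\n"
        "  mode http\n"
        "  tcp-request inspect-delay 5s\n"
    )
    print(inspect_delays(sample))
    print(has_negative_delay(sample))
```

Running the same check against a config rendered by the affected nightly build would flag the `-10s` value that haproxy rejected at lines 67 and 93.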
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759