Bug 1920421
Summary: | Too many haproxy processes in default-router pod causing high load average | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | OpenShift BugZilla Robot <openshift-bugzilla-robot> |
Component: | Networking | Assignee: | Andrew McDermott <amcdermo> |
Networking sub component: | router | QA Contact: | Arvind Iyengar <aiyengar>
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | urgent | ||
Priority: | urgent | CC: | aiyengar, amcdermo, aos-bugs, bperkins, ddelcian, dgautam, hongli, kpelc, ltitov, mrobson, obockows, sthakare, wking |
Version: | 4.5 | ||
Target Milestone: | --- | ||
Target Release: | 4.5.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-03-03 04:40:35 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1918371 | ||
Bug Blocks: |
Comment 1
Andrew McDermott
2021-01-26 10:14:55 UTC
Verified in '4.5.0-0.nightly-2021-01-30-093850' release payload. With this version, the "hard-stop-after" option appears to work as intended: the option gets applied globally when the annotation is added to "ingresses.config/cluster", and it can also be applied on a per-ingresscontroller basis:

------
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2021-01-30-093850    True        False         87m     Cluster version is 4.5.0-0.nightly-2021-01-30-093850

$ oc annotate ingresses.config/cluster ingress.operator.openshift.io/hard-stop-after=30m
ingress.config.openshift.io/cluster annotated

$ oc -n openshift-ingress get pods router-default-6c5bbf6476-qn8lv -o yaml | grep -i HARD -A1 | grep -iv "\{"
      k:{"name":"ROUTER_HARD_STOP_AFTER"}:
        .: {}
--
    - name: ROUTER_HARD_STOP_AFTER
      value: 30m

$ oc -n openshift-ingress get pods router-internalapps-574c9c47c5-bv2gw -o yaml | grep -i HARD -A1 | grep -iv "\{"
      k:{"name":"ROUTER_HARD_STOP_AFTER"}:
        .: {}
--
    - name: ROUTER_HARD_STOP_AFTER
      value: 30m
------

When applied on a per-ingresscontroller basis:

------
$ oc -n openshift-ingress-operator annotate ingresscontrollers/internalapps ingress.operator.openshift.io/hard-stop-after=15m
ingresscontroller.operator.openshift.io/default annotated

$ oc -n openshift-ingress get pods router-default-6c5bbf6476-qn8lv -o yaml | grep -i HARD -A1 | grep -iv "\{"
--
    - name: ROUTER_HARD_STOP_AFTER
      value: 30m

$ oc -n openshift-ingress get pods router-internalapps-574c9c47c5-bv2gw -o yaml | grep -i HARD -A1 | grep -iv "\{"
--
    - name: ROUTER_HARD_STOP_AFTER
      value: 15m
------

Moving this back to POST as it needs to include https://github.com/openshift/router/pull/250.

Verified in '4.5.0-0.ci.test-2021-01-02-031712-ci-ln-dplk5kt' release payload.
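The precedence verified above (a per-ingresscontroller "ingress.operator.openshift.io/hard-stop-after" annotation overriding the cluster-wide one on "ingresses.config/cluster") can be sketched roughly as follows. The helper name and dict-based inputs are illustrative only, not the ingress operator's actual code:

```python
def effective_hard_stop_after(cluster_annotations, ic_annotations):
    """Illustrative sketch: the per-ingresscontroller annotation wins,
    otherwise the cluster-wide value is inherited. Mirrors the behaviour
    observed in the verification output, not the operator's implementation."""
    key = "ingress.operator.openshift.io/hard-stop-after"
    # Fall back to the cluster-wide value when the ingresscontroller has none.
    return ic_annotations.get(key) or cluster_annotations.get(key)

cluster = {"ingress.operator.openshift.io/hard-stop-after": "30m"}
default_ic = {}   # no per-controller annotation, so it inherits the cluster value
internal_ic = {"ingress.operator.openshift.io/hard-stop-after": "15m"}

print(effective_hard_stop_after(cluster, default_ic))   # 30m, as on router-default
print(effective_hard_stop_after(cluster, internal_ic))  # 15m, as on router-internalapps
```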
With this version, the "timeout-tunnel" option appears to work as intended: when the "haproxy.router.openshift.io/timeout-tunnel" annotation is applied together with "haproxy.router.openshift.io/timeout", both values are preserved in the haproxy configuration for clear/edge/re-encrypt routes:

-----
$ oc get route -o wide
NAME               HOST/PORT                                                                              PATH   SERVICES            PORT    TERMINATION   WILDCARD
edge-route         edge-route-test1.apps.ci-ln-dplk5kt-f76d1.origin-ci-int-gce.dev.openshift.com                 service-unsecure2   http    edge          None
reen-route         reen-route-test1.apps.ci-ln-dplk5kt-f76d1.origin-ci-int-gce.dev.openshift.com                 service-secure      https   reencrypt     None
service-unsecure   service-unsecure-test1.apps.ci-ln-dplk5kt-f76d1.origin-ci-int-gce.dev.openshift.com          service-unsecure    http                  None

$ oc annotate route edge-route haproxy.router.openshift.io/timeout-tunnel=5s
route.route.openshift.io/edge-route annotated
$ oc annotate route edge-route haproxy.router.openshift.io/timeout=15s
route.route.openshift.io/edge-route annotated
$ oc annotate route reen-route haproxy.router.openshift.io/timeout=15s
route.route.openshift.io/reen-route annotated
$ oc annotate route reen-route haproxy.router.openshift.io/timeout-tunnel=5s
route.route.openshift.io/reen-route annotated
$ oc annotate route service-unsecure haproxy.router.openshift.io/timeout-tunnel=15s
route.route.openshift.io/service-unsecure annotated
$ oc annotate route service-unsecure haproxy.router.openshift.io/timeout=5s
route.route.openshift.io/service-unsecure annotated

$ oc -n openshift-ingress exec router-default-864d8b5b76-4brsr -- grep "test1:reen-route" haproxy.config -A8
backend be_secure:test1:reen-route
  mode http
  option redispatch
  option forwardfor
  balance leastconn
  timeout server 15s
  timeout tunnel 5s

$ oc -n openshift-ingress exec router-default-864d8b5b76-4brsr -- grep "test1:edge-route" haproxy.config -A8
backend be_edge_http:test1:edge-route
  mode http
  option redispatch
  option forwardfor
  balance leastconn
  timeout server 15s
  timeout tunnel 5s

$ oc -n openshift-ingress exec router-default-864d8b5b76-4brsr -- grep "test1:service-unsecure" haproxy.config -A8
backend be_http:test1:service-unsecure
  mode http
  option redispatch
  option forwardfor
  balance leastconn
  timeout server 5s
  timeout tunnel 15s
-----

* For passthrough routes, by contrast, the "timeout-tunnel" value supersedes the "timeout" value:

-----
$ oc get route -o wide
NAME           HOST/PORT                                                                          PATH   SERVICES          PORT    TERMINATION   WILDCARD
route-passth   route-passth-test1.apps.ci-ln-dplk5kt-f76d1.origin-ci-int-gce.dev.openshift.com          service-secure2   https   passthrough   None

$ oc annotate route route-passth haproxy.router.openshift.io/timeout-tunnel=15s
route.route.openshift.io/route-passth annotated
$ oc annotate route route-passth haproxy.router.openshift.io/timeout=5s
route.route.openshift.io/route-passth annotated

backend be_tcp:test1:route-passth
  balance source
  timeout tunnel 15s
-----

Re-verified in the latest "4.5.0-0.nightly-2021-02-05-192721" release version. The "haproxy.router.openshift.io/timeout-tunnel" and "hard-stop-after" annotations are fully functional.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.5.33 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0428
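The timeout precedence demonstrated in the verification above can be sketched as follows. This is a hypothetical helper modelling only the observed behaviour (both timeouts emitted for http-mode backends, tunnel superseding timeout for passthrough), not the router's actual template logic:

```python
def backend_timeouts(termination, annotations):
    """Hypothetical sketch of the observed annotation-to-config mapping.

    For edge/re-encrypt/clear routes both "timeout" and "timeout-tunnel"
    are preserved; for passthrough routes only a tunnel timeout is emitted,
    with "timeout-tunnel" taking precedence over "timeout"."""
    timeout = annotations.get("haproxy.router.openshift.io/timeout")
    tunnel = annotations.get("haproxy.router.openshift.io/timeout-tunnel")
    lines = []
    if termination == "passthrough":
        # Passthrough backends carry raw TCP, so only a tunnel timeout applies.
        if tunnel:
            lines.append(f"timeout tunnel {tunnel}")
        elif timeout:
            lines.append(f"timeout tunnel {timeout}")
    else:
        if timeout:
            lines.append(f"timeout server {timeout}")
        if tunnel:
            lines.append(f"timeout tunnel {tunnel}")
    return lines

# Matches the edge-route and route-passth backends shown above.
print(backend_timeouts("edge", {
    "haproxy.router.openshift.io/timeout": "15s",
    "haproxy.router.openshift.io/timeout-tunnel": "5s",
}))
print(backend_timeouts("passthrough", {
    "haproxy.router.openshift.io/timeout": "5s",
    "haproxy.router.openshift.io/timeout-tunnel": "15s",
}))
```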