Bug 1920421 - Too many haproxy processes in default-router pod causing high load average
Summary: Too many haproxy processes in default-router pod causing high load average
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.5.z
Assignee: Andrew McDermott
QA Contact: Arvind iyengar
URL:
Whiteboard:
Duplicates: 1920423
Depends On: 1918371
Blocks:
 
Reported: 2021-01-26 09:50 UTC by OpenShift BugZilla Robot
Modified: 2021-03-03 04:40 UTC
CC: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-03 04:40:35 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-ingress-operator pull 538 0 None closed Bug 1920421: Add "ingress.operator.openshift.io/hard-stop-after" annotation 2021-02-15 10:41:00 UTC
Github openshift router pull 250 0 None closed [release-4.5] Bug 1920421: Add tunnel-timeout and hard-stop-after options to haproxy template 2021-02-15 10:41:00 UTC
Red Hat Product Errata RHSA-2021:0428 0 None None None 2021-03-03 04:40:55 UTC

Comment 1 Andrew McDermott 2021-01-26 10:14:55 UTC
*** Bug 1920423 has been marked as a duplicate of this bug. ***

Comment 4 Arvind iyengar 2021-02-01 09:59:01 UTC
Verified in the '4.5.0-0.nightly-2021-01-30-093850' release payload. With this version, the "hard-stop-after" option appears to work as intended: it gets applied globally when the annotation is added to "ingresses.config/cluster", and it can also be applied on a per-ingresscontroller basis:
------
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2021-01-30-093850   True        False         87m     Cluster version is 4.5.0-0.nightly-2021-01-30-093850

$ oc annotate ingresses.config/cluster ingress.operator.openshift.io/hard-stop-after=30m     
ingress.config.openshift.io/cluster annotated

$ oc -n openshift-ingress get pods router-default-6c5bbf6476-qn8lv -o yaml | grep -i HARD -A1 | grep -iv "\{"
              k:{"name":"ROUTER_HARD_STOP_AFTER"}:
                .: {}
--
    - name: ROUTER_HARD_STOP_AFTER
      value: 30m


$ oc -n openshift-ingress get pods router-internalapps-574c9c47c5-bv2gw -o yaml | grep -i HARD -A1 | grep -iv "\{"
              k:{"name":"ROUTER_HARD_STOP_AFTER"}:
                .: {}
--
    - name: ROUTER_HARD_STOP_AFTER
      value: 30m
------

When applied on per ingresscontroller basis:
------
$ oc -n openshift-ingress-operator annotate ingresscontrollers/internalapps ingress.operator.openshift.io/hard-stop-after=15m
ingresscontroller.operator.openshift.io/internalapps annotated

$ oc -n openshift-ingress get pods router-default-6c5bbf6476-qn8lv -o yaml  | grep -i HARD -A1 | grep -iv  "\{"
--
    - name: ROUTER_HARD_STOP_AFTER
      value: 30m

$ oc -n openshift-ingress get pods router-internalapps-574c9c47c5-bv2gw -o yaml  | grep -i HARD -A1 | grep -iv  "\{"         
--
    - name: ROUTER_HARD_STOP_AFTER
      value: 15m
------
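For context, the ROUTER_HARD_STOP_AFTER environment variable shown above is consumed by the router's haproxy config template and is expected to surface as a `hard-stop-after` directive in the global section of haproxy.config. That directive bounds how long a soft-stopped (old) haproxy process may linger after a reload, which is what prevents the process pile-up described in this bug. An illustrative fragment (a sketch, not captured output; the surrounding global directives will differ):

```
global
  # rendered by the template when ROUTER_HARD_STOP_AFTER is set
  hard-stop-after 30m
```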

Comment 5 Andrew McDermott 2021-02-01 11:40:52 UTC
Moving this back to POST as it needs to include https://github.com/openshift/router/pull/250.

Comment 6 Arvind iyengar 2021-02-02 10:23:10 UTC
Verified in the '4.5.0-0.ci.test-2021-01-02-031712-ci-ln-dplk5kt' release payload. With this version, the "timeout-tunnel" option appears to work as intended: when the "haproxy.router.openshift.io/timeout-tunnel" annotation is applied along with "haproxy.router.openshift.io/timeout", both values get preserved in the haproxy configuration for clear/edge/re-encrypt routes:
-----
$ oc get route -o wide
NAME               HOST/PORT                                                                             PATH   SERVICES            PORT    TERMINATION   WILDCARD
edge-route         edge-route-test1.apps.ci-ln-dplk5kt-f76d1.origin-ci-int-gce.dev.openshift.com                service-unsecure2   http    edge          None
reen-route         reen-route-test1.apps.ci-ln-dplk5kt-f76d1.origin-ci-int-gce.dev.openshift.com                service-secure      https   reencrypt     None
service-unsecure   service-unsecure-test1.apps.ci-ln-dplk5kt-f76d1.origin-ci-int-gce.dev.openshift.com          service-unsecure    http                  None


$  oc annotate route  edge-route  haproxy.router.openshift.io/timeout-tunnel=5s
route.route.openshift.io/edge-route annotated

$ oc annotate route  edge-route  haproxy.router.openshift.io/timeout=15s 
route.route.openshift.io/edge-route annotated

$ oc annotate route  reen-route  haproxy.router.openshift.io/timeout=15s
route.route.openshift.io/reen-route annotated

$ oc annotate route  reen-route  haproxy.router.openshift.io/timeout-tunnel=5s 
route.route.openshift.io/reen-route annotated

$ oc annotate route  service-unsecure  haproxy.router.openshift.io/timeout-tunnel=15s
route.route.openshift.io/service-unsecure annotated

$ oc annotate route  service-unsecure  haproxy.router.openshift.io/timeout=5s  
route.route.openshift.io/service-unsecure annotated


$ oc -n openshift-ingress exec router-default-864d8b5b76-4brsr --  grep "test1:reen-route" haproxy.config  -A8
backend be_secure:test1:reen-route
  mode http
  option redispatch
  option forwardfor
  balance leastconn
  timeout server  15s
  timeout tunnel  5s

$ oc -n openshift-ingress exec router-default-864d8b5b76-4brsr --  grep "test1:edge-route" haproxy.config  -A8
backend be_edge_http:test1:edge-route
  mode http
  option redispatch
  option forwardfor
  balance leastconn
  timeout server  15s
  timeout tunnel  5s

$ oc -n openshift-ingress exec router-default-864d8b5b76-4brsr --  grep "test1:service-unsecure" haproxy.config  -A8  
backend be_http:test1:service-unsecure
  mode http
  option redispatch
  option forwardfor
  balance leastconn
  timeout server  5s
  timeout tunnel  15s

-----

* Whereas for passthrough routes, which are proxied as TCP-mode backends, the "timeout-tunnel" value supersedes the "timeout" value:
-----

$ oc get route -o wide
NAME               HOST/PORT                                                                             PATH   SERVICES            PORT    TERMINATION   WILDCARD
route-passth       route-passth-test1.apps.ci-ln-dplk5kt-f76d1.origin-ci-int-gce.dev.openshift.com              service-secure2     https   passthrough   None

$ oc annotate route  route-passth  haproxy.router.openshift.io/timeout-tunnel=15s                               
route.route.openshift.io/route-passth annotated

$ oc annotate route  route-passth  haproxy.router.openshift.io/timeout=5s         
route.route.openshift.io/route-passth annotated


backend be_tcp:test1:route-passth
  balance source
  timeout tunnel  15s
-----

Comment 10 Arvind iyengar 2021-02-08 08:04:32 UTC
Re-verified in the latest "4.5.0-0.nightly-2021-02-05-192721" release version. The "haproxy.router.openshift.io/timeout-tunnel" and "hard-stop-after" annotations are fully functional.

Comment 12 errata-xmlrpc 2021-03-03 04:40:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.5.33 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0428

