Bug 2015829 - Too many haproxy processes in default-router pod causing high load average after upgrade from v4.8.3 to v4.8.10
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.9.z
Assignee: Andrew McDermott
QA Contact: Arvind iyengar
URL:
Whiteboard:
Depends On: 2007581
Blocks: 2017708
 
Reported: 2021-10-20 08:19 UTC by OpenShift BugZilla Robot
Modified: 2022-08-04 22:35 UTC (History)

Fixed In Version:
Doc Type: Release Note
Doc Text:
Prior to 4.8 the default balancing algorithm was "leastconn". The default was changed to "random" in OpenShift 4.8.0 for non-passthrough routes. Switching to "random" significantly increases the memory consumption of every HAProxy process, and the cumulative memory usage can be significant, particularly if you have a lot of websocket connections. To mitigate this memory consumption, the default balancing algorithm has now been reverted to "leastconn" for OpenShift 4.8 and 4.9. Once we have a solution that does not incur such significant memory requirements we would default to "random" again, but that would be for a future OpenShift release (i.e., 4.10 or later).

You can check the default setting via:

  $ oc get deployment -n openshift-ingress router-default -o yaml | grep -A 2 ROUTER_LOAD_BALANCE_ALGORITHM
  - name: ROUTER_LOAD_BALANCE_ALGORITHM
    value: leastconn

The "random" option is still available, but every route that would benefit from this algorithm must now request it explicitly by setting the following annotation on a per-route basis:

  $ oc annotate -n <NAMESPACE> route/<ROUTE-NAME> "haproxy.router.openshift.io/balance=random"
Clone Of:
Environment:
Last Closed: 2021-11-10 21:02:40 UTC
Target Upstream Version:
Embargoed:
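
The per-route annotation described in the Doc Text has to be applied to each route individually. As a minimal sketch, assuming several routes in one namespace should opt back in to "random", the helper below only prints the annotate commands so they can be reviewed before being run. The function name print_balance_annotations and the example namespace and route names are hypothetical; the command it prints is the one given in the Doc Text.

```shell
# Hypothetical helper: print (do not run) the annotate command from the
# Doc Text for each route. $1 is the namespace, the remaining arguments
# are route names.
print_balance_annotations() {
  namespace="$1"
  shift
  for route in "$@"; do
    printf 'oc annotate -n %s route/%s "haproxy.router.openshift.io/balance=random"\n' \
      "$namespace" "$route"
  done
}

# Example with placeholder names:
print_balance_annotations my-namespace route-a route-b
```

Printing rather than executing keeps the sketch safe to try on any machine; the output could be piped to sh once it has been checked.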


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-ingress-operator pull 667 | 0 | None | open | [release-4.9] Bug 2015829: Change default balancing algorithm to "leastconn" | 2021-10-25 12:16:44 UTC
Red Hat Product Errata | RHBA-2021:4119 | 0 | None | None | None | 2021-11-10 21:02:54 UTC

Comment 1 Arvind iyengar 2021-10-26 05:21:43 UTC
Verified in the "4.9.0-0.ci.test-2021-10-26-041049-ci-ln-3jsdntt-latest" release. With this payload, the load-balancing algorithm now defaults to "leastconn" instead of "random". The "random" algorithm can still be enabled using the "unsupportedConfigOverrides" operator option.
-----
oc get clusterversion                     
NAME      VERSION                                                  AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.ci.test-2021-10-26-041049-ci-ln-3jsdntt-latest   True        False         3m33s   Cluster version is 4.9.0-0.ci.test-2021-10-26-041049-ci-ln-3jsdntt-latest

From inside the default router:
env | grep -i ROUTER_LOAD_BALANCE_ALGORITHM
ROUTER_LOAD_BALANCE_ALGORITHM=leastconn


HAProxy configuration after deploying a test route:
backend be_http:test1:service-unsecure
  mode http
  option redispatch
  option forwardfor
  balance leastconn <-----

  timeout check 5000ms
  http-request add-header X-Forwarded-Host %[req.hdr(host)]
  http-request add-header X-Forwarded-Port %[dst_port]
  http-request add-header X-Forwarded-Proto http if !{ ssl_fc }
  http-request add-header X-Forwarded-Proto https if { ssl_fc }
  http-request add-header X-Forwarded-Proto-Version h2 if { ssl_fc_alpn -i h2 }
  http-request add-header Forwarded for=%[src];host=%[req.hdr(host)];proto=%[req.hdr(X-Forwarded-Proto)]
  cookie e96c07fa08f2609cadf847f019750244 insert indirect nocache httponly
  server pod:web-server-rc-v5pfs:service-unsecure:http:10.129.2.12:8080 10.129.2.12:8080 cookie 54da8f055054764759dc581cc5af53d7 weight 256 check inter 5000ms
  server pod:web-server-rc-vffmf:service-unsecure:http:10.131.0.22:8080 10.131.0.22:8080 cookie 228dac422c1912e77ef56e94b8caf1de weight 256 check inter 5000ms


Changes after applying the "unsupportedConfigOverrides" option to enable the "random" algorithm:
oc -n openshift-ingress-operator patch ingresscontroller/internalapps --type=merge --patch='{"spec":{"unsupportedConfigOverrides":{"loadBalancingAlgorithm":"random"}}}'
ingresscontroller.operator.openshift.io/internalapps patched


sh-4.4$ env | grep -i ROUTER_LOAD_BALANCE_ALGORITHM
ROUTER_LOAD_BALANCE_ALGORITHM=random


backend be_http:test1:service-unsecure
  mode http
  option redispatch
  option forwardfor
  balance random <-----

  timeout check 5000ms
  http-request add-header X-Forwarded-Host %[req.hdr(host)]
  http-request add-header X-Forwarded-Port %[dst_port]
  http-request add-header X-Forwarded-Proto http if !{ ssl_fc }
  http-request add-header X-Forwarded-Proto https if { ssl_fc }
  http-request add-header X-Forwarded-Proto-Version h2 if { ssl_fc_alpn -i h2 }
  http-request add-header Forwarded for=%[src];host=%[req.hdr(host)];proto=%[req.hdr(X-Forwarded-Proto)]
  cookie e96c07fa08f2609cadf847f019750244 insert indirect nocache httponly
  server pod:web-server-rc-v5pfs:service-unsecure:http:10.129.2.12:8080 10.129.2.12:8080 cookie 54da8f055054764759dc581cc5af53d7 weight 256 check inter 5000ms
  server pod:web-server-rc-vffmf:service-unsecure:http:10.131.0.22:8080 10.131.0.22:8080 cookie 228dac422c1912e77ef56e94b8caf1de weight 256 check inter 5000ms
-----
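
The check above compares the "balance" line of a single backend by eye. As a rough sketch, assuming the generated HAProxy configuration has been saved to a file, the awk filter below lists each backend together with its balance algorithm. The function name list_backend_balance is hypothetical, and the location of the configuration file inside the router pod is not given in this report.

```shell
# Hypothetical helper: read an HAProxy configuration on stdin and print
# "<backend name> <balance algorithm>" for every backend that sets one.
# Only the second field of the "balance" line is printed, so trailing
# notes such as "<-----" are ignored.
list_backend_balance() {
  awk '
    $1 == "backend" { name = $2 }
    $1 == "balance" && name != "" { print name, $2 }
  '
}

# Example using a fragment of the backend shown above:
printf 'backend be_http:test1:service-unsecure\n  mode http\n  balance leastconn\n' \
  | list_backend_balance
```

Running it before and after a change to the default gives a quick per-backend view of which routes use "leastconn" and which use "random".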

Comment 2 Miciah Dashiel Butler Masters 2021-10-26 16:08:49 UTC
This is a high-severity issue, but I'm marking as blocker- because (1) there is a workaround and (2) the issue in 4.9 also exists in 4.8, and therefore this isn't a regression.

Comment 7 errata-xmlrpc 2021-11-10 21:02:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.6 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4119

