Bug 1845522 - haproxy temporarily leaks resources when reloading
Summary: haproxy temporarily leaks resources when reloading
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: haproxy
Version: 8.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 8.0
Assignee: Ryan O'Hara
QA Contact: Brandon Perkins
URL:
Whiteboard:
Depends On:
Blocks: 1845406
 
Reported: 2020-06-09 12:43 UTC by Gregory Thiemonge
Modified: 2020-07-06 19:21 UTC
CC List: 0 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-06 19:21:07 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+



Description Gregory Thiemonge 2020-06-09 12:43:12 UTC
Description of problem:
Originally reported for the Red Hat OpenStack Platform Octavia component (BZ 1845406):
haproxy 1.8.23 consumes twice as much memory for more than 2 minutes when reloading a configuration that uses stick tables.

Version-Release number of selected component (if applicable):
haproxy-1.8.23-3.el8 (RHEL 8.2)

How reproducible:
100%

Steps to Reproduce:
1. We have a configuration file that combines "peers" and "stick-table" sections:

global
    daemon
    user nobody
    log /run/rsyslog/octavia/log local0
    log /run/rsyslog/octavia/log local1 notice
    stats socket /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd.sock mode 0666 level user
    maxconn 1000000

defaults
    log global
    retries 3
    option redispatch
    option splice-request
    option splice-response
    option http-keep-alive

peers cbbef8ef65504a559c2f19f1a63bdbdd_peers
    peer G4SXZTP8batgIQ4iAoBGkWJMeeg 10.0.0.28:1025
    peer eTOcOhwbysoktiQ1CxhjxbKu2BU 10.0.0.54:1025


frontend 73f33f7d-b8b7-4c79-bb92-de2e97ba08f6
    maxconn 1000000
    bind 10.0.0.11:80
    mode http
    default_backend a69bee16-7b23-44b9-bd63-602a108cf477:73f33f7d-b8b7-4c79-bb92-de2e97ba08f6
    timeout client 50000
    log-format ec4fc6e9845d44ada460fab5c985adf9\ cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd\ %f\ %ci\ %cp\ %t\ %{+Q}r\ %ST\ %B\ %U\ %[ssl_c_verify]\ %{+Q}[ssl_c_s_dn]\ %b\ %s\ %Tt\ %tsc

backend a69bee16-7b23-44b9-bd63-602a108cf477:73f33f7d-b8b7-4c79-bb92-de2e97ba08f6
    mode http
    http-reuse safe
    balance roundrobin
    stick-table type string len 64 size 10k peers cbbef8ef65504a559c2f19f1a63bdbdd_peers
    stick store-response res.cook(my_cookie)
    stick match req.cook(my_cookie)
    fullconn 1000000
    option allbackups
    timeout connect 5000
    timeout server 50000


2. When haproxy is running, it consumes ~150 MB of memory in a 1 GB VM:

[root@amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1 cloud-user]# ps axu | grep haproxy
root        5760  0.2  1.0  80712  8744 ?        Ss   03:58   0:00 /usr/sbin/haproxy -Ws -f /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd/haproxy.cfg -f /var/lib/octavia/haproxy-default-user-group.conf -p /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd.pid -L G4SXZTP8batgIQ4iAoBGkWJMeeg -sf 6130
nobody      6199  0.0 18.9 295568 158812 ?       Ss   04:01   0:00 /usr/sbin/haproxy -Ws -f /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd/haproxy.cfg -f /var/lib/octavia/haproxy-default-user-group.conf -p /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd.pid -L G4SXZTP8batgIQ4iAoBGkWJMeeg -sf 6130
root        6223  0.0  0.1  12108  1060 pts/1    S+   04:02   0:00 grep --color=auto haproxy


3. When haproxy is reloaded (Octavia reloads it after any configuration change), a new worker is created, but the previous worker is not cleaned up for more than 2 minutes.

[root@amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1 cloud-user]# systemctl reload haproxy-cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd.service

Jun 09 04:03:40 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal systemd[1]: Reloading HAProxy Load Balancer.
Jun 09 04:03:40 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal haproxy[6277]: Configuration file is valid
Jun 09 04:03:40 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: [WARNING] 160/040152 (5760) : Reexecuting Master process
Jun 09 04:03:40 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: [WARNING] 160/040340 (5760) : [/usr/sbin/haproxy.main()] Cannot raise FD limit to 2500040, limit is 2097152.
Jun 09 04:03:40 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: [WARNING] 160/040340 (5760) : [/usr/sbin/haproxy.main()] FD limit (2097152) too low for maxconn=1000000/maxsock=2500040. Please raise 'ulimit-n' to 2500040 or more to avoid any trouble.
Jun 09 04:03:40 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal systemd[1]: Reloaded HAProxy Load Balancer.


[root@amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1 cloud-user]# ps axu | grep haproxy
root        5760  0.1  1.0  80712  8584 ?        Ss   03:58   0:00 /usr/sbin/haproxy -Ws -f /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd/haproxy.cfg -f /var/lib/octavia/haproxy-default-user-group.conf -p /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd.pid -L G4SXZTP8batgIQ4iAoBGkWJMeeg -sf 6199
nobody      6199  0.0 18.9 295568 158812 ?       Ss   04:01   0:00 /usr/sbin/haproxy -Ws -f /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd/haproxy.cfg -f /var/lib/octavia/haproxy-default-user-group.conf -p /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd.pid -L G4SXZTP8batgIQ4iAoBGkWJMeeg -sf 6130
nobody      6279  0.0 18.9 295568 158808 ?       Ss   04:03   0:00 /usr/sbin/haproxy -Ws -f /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd/haproxy.cfg -f /var/lib/octavia/haproxy-default-user-group.conf -p /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd.pid -L G4SXZTP8batgIQ4iAoBGkWJMeeg -sf 6199
root        6281  0.0  0.1  12108  1112 pts/1    S+   04:03   0:00 grep --color=auto haproxy
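
Note the two "nobody" workers above: each holds ~155 MB RSS (18.9% of the 1 GB VM), so memory consumption roughly doubles for as long as the former worker lingers.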



Actual results:

During that period, any new configuration change spawns yet another worker, and the additional memory demand triggers allocation failures (haproxy cannot fork):


Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal systemd[1]: Reloading HAProxy Load Balancer.
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal haproxy[6335]: Configuration file is valid
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: [WARNING] 160/040340 (5760) : Reexecuting Master process
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: [WARNING] 160/040503 (5760) : [/usr/sbin/haproxy.main()] Cannot raise FD limit to 2500040, limit is 2097152.
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: [WARNING] 160/040503 (5760) : [/usr/sbin/haproxy.main()] FD limit (2097152) too low for maxconn=1000000/maxsock=2500040. Please raise 'ulimit-n' to 2500040 or more to avoid any trouble.
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: [ALERT] 160/040503 (5760) : [/usr/sbin/haproxy.main()] Cannot fork.
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: [WARNING] 160/040503 (5760) : Reexecuting Master process in waitpid mode
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: [WARNING] 160/040503 (5760) : Reexecuting Master process
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: Usage : haproxy [-f <cfgfile|cfgdir>]* [ -vdVD ] [ -n <maxconn> ] [ -N <maxpconn> ]
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]:         [ -p <pidfile> ] [ -m <max megs> ] [ -C <dir> ] [-- <cfgfile>*]
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]:         -v displays version ; -vv shows known build options.
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]:         -d enters debug mode ; -db only disables background mode.
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]:         -dM[<byte>] poisons memory with <byte> (defaults to 0x50)
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]:         -V enters verbose mode (disables quiet mode)
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]:         -D goes daemon ; -C changes to <dir> before loading files.
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]:         -W master-worker mode.
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]:         -Ws master-worker mode with systemd notify support.
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]:         -q quiet mode : don't display messages
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]:         -c check mode : only check config files and exit
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]:         -n sets the maximum total # of connections (2000)
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]:         -m limits the usable amount of memory (in MB)
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]:         -N sets the default, per-proxy maximum # of connections (2000)
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]:         -L set local peer name (default to hostname)
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]:         -p writes pids of all children to this file
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]:         -de disables epoll() usage even when available
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]:         -dp disables poll() usage even when available
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]:         -dS disables splice usage (broken on old kernels)
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]:         -dG disables getaddrinfo() usage
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]:         -dR disables SO_REUSEPORT usage
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]:         -dr ignores server address resolution failures
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]:         -dV disables SSL verify on servers side
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]:         -sf/-st [pid ]* finishes/terminates old pids.
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]:         -x <unix_socket> get listening sockets from a unix socket
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: HA-Proxy version 1.8.23 2019/11/25
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: Copyright 2000-2019 Willy Tarreau <willy>
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal systemd[1]: haproxy-cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd.service: Main process exited, code=exited, status=1/FAILURE
Jun 09 04:05:08 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal Keepalived_vrrp[4697]: Script `check_script` now returning 1
Jun 09 04:05:13 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal Keepalived_vrrp[4697]: VRRP_Script(check_script) failed (exited with status 1)
Jun 09 04:05:13 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal Keepalived_vrrp[4697]: (cbbef8ef65504a559c2f19f1a63bdbdd) Entering FAULT STATE
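
As a side note, the repeated FD-limit warning could presumably be avoided with a systemd drop-in that raises LimitNOFILE to the value haproxy asks for (a sketch we have not verified, using the per-listener unit name from above; it likely does not change the slow worker cleanup itself):

# mkdir -p /etc/systemd/system/haproxy-cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd.service.d
# cat > /etc/systemd/system/haproxy-cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd.service.d/limits.conf <<'EOF'
[Service]
LimitNOFILE=2500040
EOF
# systemctl daemon-reload
# systemctl restart haproxy-cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd.service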



Expected results:
The previous worker should be cleaned up sooner, so that haproxy can be reloaded with new configuration changes as soon as the new worker is initialized.


Additional info:

1. Logs showing that the former worker is destroyed only after about 2 minutes:

Jun 09 04:32:48 amphora-f66f2cca-4962-4608-8620-acebace3ef21.novalocal systemd[1]: Reloading HAProxy Load Balancer.
Jun 09 04:32:48 amphora-f66f2cca-4962-4608-8620-acebace3ef21.novalocal haproxy[7638]: Configuration file is valid
Jun 09 04:32:48 amphora-f66f2cca-4962-4608-8620-acebace3ef21.novalocal ip[7213]: [WARNING] 160/043210 (7213) : Reexecuting Master process
Jun 09 04:32:48 amphora-f66f2cca-4962-4608-8620-acebace3ef21.novalocal ip[7213]: [WARNING] 160/043248 (7213) : [/usr/sbin/haproxy.main()] Cannot raise FD limit to 2500040, limit is 2534.
Jun 09 04:32:48 amphora-f66f2cca-4962-4608-8620-acebace3ef21.novalocal ip[7213]: [WARNING] 160/043248 (7213) : [/usr/sbin/haproxy.main()] FD limit (2534) too low for maxconn=1000000/maxsock=2500040. Please raise 'ulimit-n' to 2500040 or more to avoid any trouble.
Jun 09 04:32:48 amphora-f66f2cca-4962-4608-8620-acebace3ef21.novalocal systemd[1]: Reloaded HAProxy Load Balancer.
Jun 09 04:34:59 amphora-f66f2cca-4962-4608-8620-acebace3ef21.novalocal ip[7213]: [WARNING] 160/043248 (7213) : Former worker 7605 exited with code 0

2. The issue is not reproducible without stick-tables (see the comparison sketch after this list).

3. The config file doesn't contain any backend servers, and we don't generate any network traffic on the frontend.
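
For reference, point 2 was tested with (approximately) the same backend minus the persistence lines; this is a sketch reconstructed from the config above, not an exact copy of what we ran:

backend a69bee16-7b23-44b9-bd63-602a108cf477:73f33f7d-b8b7-4c79-bb92-de2e97ba08f6
    mode http
    http-reuse safe
    balance roundrobin
    # stick-table type string len 64 size 10k peers cbbef8ef65504a559c2f19f1a63bdbdd_peers
    # stick store-response res.cook(my_cookie)
    # stick match req.cook(my_cookie)
    fullconn 1000000
    option allbackups
    timeout connect 5000
    timeout server 50000

With the three commented lines removed, the former worker exits within seconds after a reload.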

Comment 1 Ryan O'Hara 2020-06-09 14:57:21 UTC
Did you report this issue upstream?

Comment 2 Gregory Thiemonge 2020-06-10 17:01:37 UTC
(In reply to Ryan O'Hara from comment #1)
> Did you report this issue upstream?

No, I didn't.
I tested with haproxy 2.1.7 and ran into a similar issue: the worker was destroyed after 8 minutes.

It looks like normal behavior, but it's a problem for us (OpenStack Octavia) because we use a 1 GB VM and 1000000 as the default value for maxconn, so haproxy consumes most of the memory.

I'll check if I can find a simple reproducer for 2.1.x and report it upstream.
We'll also look at changing our defaults in Octavia to mitigate the issue, as sketched below.
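
For illustration, the kind of change we are considering (values are examples only, nothing is decided yet): lowering maxconn so that a single worker's pre-allocated footprint fits comfortably in the 1 GB VM.

global
    maxconn 50000

frontend 73f33f7d-b8b7-4c79-bb92-de2e97ba08f6
    maxconn 50000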

Comment 3 Ryan O'Hara 2020-06-16 15:59:29 UTC
I've not yet seen anything on the upstream mailing list. If there is an upstream conversation about this, it's worth noting here so I can follow along.

Comment 4 Gregory Thiemonge 2020-06-29 12:59:41 UTC
Hi Ryan,

I haven't reproduced it outside of my OpenStack Octavia environment; I tried similar configuration files, but the former workers exited within 1 or 2 seconds.
Feel free to close the BZ (we are working on a workaround on our side); I'll reopen it if we can provide more information about it.

Thanks

Comment 5 Ryan O'Hara 2020-06-30 13:27:08 UTC
(In reply to Gregory Thiemonge from comment #0)
> 
> 3. Config file doesn't contain any backend server, we don't generate any
> network traffic on the frontend.

The config file looks like it has a backend. If there are truly no connections, it should clean up previous workers quickly. This is a reload (SIGUSR2), so if I recall correctly, haproxy will stop accepting connections on the frontend but will service existing sessions until they are closed or time out. That would cause the workers to stick around after a reload, but if there are no active sessions I don't know why this would happen.

You could use the stats socket to dump the state of haproxy after the reload. It might give insight into why haproxy is keeping the old workers around for so long. Also, the warnings about the FD limit are concerning; I've not seen those before. Any reason to think they are playing a role here?
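
For example (a sketch using the stats socket path from the config in the description; it assumes socat is available on the amphora, and note the socket is configured with "level user", which may need to be raised to "operator" for some of these commands):

# echo "show info" | socat stdio /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd.sock
# echo "show sess" | socat stdio /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd.sock
# echo "show table" | socat stdio /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd.sock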

