Note: This bug is displayed in read-only format because
the product is no longer active in Red Hat Bugzilla.
Description (Gregory Thiemonge, 2020-06-09 12:43:12 UTC)
Description of problem:
Originally reported for the Red Hat OpenStack Platform Octavia component (BZ 1845406):
haproxy 1.8.23 consumes twice as much memory for more than 2 minutes when reloading a configuration that uses stick tables.
Version-Release number of selected component (if applicable):
haproxy-1.8.23-3.el8 (RHEL 8.2)
How reproducible:
100%
Steps to Reproduce:
1. We have a configuration file that combines "peers" and "stick-table":
global
    daemon
    user nobody
    log /run/rsyslog/octavia/log local0
    log /run/rsyslog/octavia/log local1 notice
    stats socket /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd.sock mode 0666 level user
    maxconn 1000000

defaults
    log global
    retries 3
    option redispatch
    option splice-request
    option splice-response
    option http-keep-alive

peers cbbef8ef65504a559c2f19f1a63bdbdd_peers
    peer G4SXZTP8batgIQ4iAoBGkWJMeeg 10.0.0.28:1025
    peer eTOcOhwbysoktiQ1CxhjxbKu2BU 10.0.0.54:1025

frontend 73f33f7d-b8b7-4c79-bb92-de2e97ba08f6
    maxconn 1000000
    bind 10.0.0.11:80
    mode http
    default_backend a69bee16-7b23-44b9-bd63-602a108cf477:73f33f7d-b8b7-4c79-bb92-de2e97ba08f6
    timeout client 50000
    log-format ec4fc6e9845d44ada460fab5c985adf9\ cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd\ %f\ %ci\ %cp\ %t\ %{+Q}r\ %ST\ %B\ %U\ %[ssl_c_verify]\ %{+Q}[ssl_c_s_dn]\ %b\ %s\ %Tt\ %tsc

backend a69bee16-7b23-44b9-bd63-602a108cf477:73f33f7d-b8b7-4c79-bb92-de2e97ba08f6
    mode http
    http-reuse safe
    balance roundrobin
    stick-table type string len 64 size 10k peers cbbef8ef65504a559c2f19f1a63bdbdd_peers
    stick store-response res.cook(my_cookie)
    stick match req.cook(my_cookie)
    fullconn 1000000
    option allbackups
    timeout connect 5000
    timeout server 50000
2. While haproxy is running, it consumes ~150 MB of memory in a 1 GB VM:
[root@amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1 cloud-user]# ps axu | grep haproxy
root 5760 0.2 1.0 80712 8744 ? Ss 03:58 0:00 /usr/sbin/haproxy -Ws -f /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd/haproxy.cfg -f /var/lib/octavia/haproxy-default-user-group.conf -p /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd.pid -L G4SXZTP8batgIQ4iAoBGkWJMeeg -sf 6130
nobody 6199 0.0 18.9 295568 158812 ? Ss 04:01 0:00 /usr/sbin/haproxy -Ws -f /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd/haproxy.cfg -f /var/lib/octavia/haproxy-default-user-group.conf -p /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd.pid -L G4SXZTP8batgIQ4iAoBGkWJMeeg -sf 6130
root 6223 0.0 0.1 12108 1060 pts/1 S+ 04:02 0:00 grep --color=auto haproxy
3. When haproxy is reloaded (Octavia reloads it after any configuration change), a new worker is created, but the previous worker is not cleaned up until more than 2 minutes later.
[root@amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1 cloud-user]# systemctl reload haproxy-cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd.service
Jun 09 04:03:40 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal systemd[1]: Reloading HAProxy Load Balancer.
Jun 09 04:03:40 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal haproxy[6277]: Configuration file is valid
Jun 09 04:03:40 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: [WARNING] 160/040152 (5760) : Reexecuting Master process
Jun 09 04:03:40 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: [WARNING] 160/040340 (5760) : [/usr/sbin/haproxy.main()] Cannot raise FD limit to 2500040, limit is 2097152.
Jun 09 04:03:40 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: [WARNING] 160/040340 (5760) : [/usr/sbin/haproxy.main()] FD limit (2097152) too low for maxconn=1000000/maxsock=2500040. Please raise 'ulimit-n' to 2500040 or more to avoid any trouble.
Jun 09 04:03:40 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal systemd[1]: Reloaded HAProxy Load Balancer.
[root@amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1 cloud-user]# ps axu | grep haproxy
root 5760 0.1 1.0 80712 8584 ? Ss 03:58 0:00 /usr/sbin/haproxy -Ws -f /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd/haproxy.cfg -f /var/lib/octavia/haproxy-default-user-group.conf -p /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd.pid -L G4SXZTP8batgIQ4iAoBGkWJMeeg -sf 6199
nobody 6199 0.0 18.9 295568 158812 ? Ss 04:01 0:00 /usr/sbin/haproxy -Ws -f /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd/haproxy.cfg -f /var/lib/octavia/haproxy-default-user-group.conf -p /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd.pid -L G4SXZTP8batgIQ4iAoBGkWJMeeg -sf 6130
nobody 6279 0.0 18.9 295568 158808 ? Ss 04:03 0:00 /usr/sbin/haproxy -Ws -f /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd/haproxy.cfg -f /var/lib/octavia/haproxy-default-user-group.conf -p /var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd.pid -L G4SXZTP8batgIQ4iAoBGkWJMeeg -sf 6199
root 6281 0.0 0.1 12108 1112 pts/1 S+ 04:03 0:00 grep --color=auto haproxy
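The maxsock figure in the FD-limit warnings above is consistent with a simple sizing formula. This is only a back-of-the-envelope sketch, assuming (not verified against the haproxy source) two FDs per proxied connection, splice pipes defaulting to maxconn/4 with two FDs each (the config enables splice-request/splice-response), and a small fixed overhead:

```python
# Rough reconstruction of "maxsock=2500040" from maxconn=1000000.
# All per-component counts below are assumptions fitted to the warning,
# not figures taken from haproxy documentation or source.
maxconn = 1_000_000
conn_fds = 2 * maxconn         # client-side + server-side FD per connection
pipe_fds = 2 * (maxconn // 4)  # splice pipes, assumed maxconn/4 pipes x 2 FDs
overhead = 40                  # listeners, stats socket, etc. (fitted residual)
maxsock = conn_fds + pipe_fds + overhead
print(maxsock)  # 2500040, matching the warning in the logs
```

Whatever the exact formula, the point stands: maxconn=1000000 forces haproxy to request 2.5M FDs, well above the 2097152 limit reported in the logs.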
Actual results:
During that period, any new configuration change spawns yet another worker, which triggers memory allocation failures:
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal systemd[1]: Reloading HAProxy Load Balancer.
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal haproxy[6335]: Configuration file is valid
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: [WARNING] 160/040340 (5760) : Reexecuting Master process
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: [WARNING] 160/040503 (5760) : [/usr/sbin/haproxy.main()] Cannot raise FD limit to 2500040, limit is 2097152.
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: [WARNING] 160/040503 (5760) : [/usr/sbin/haproxy.main()] FD limit (2097152) too low for maxconn=1000000/maxsock=2500040. Please raise 'ulimit-n' to 2500040 or more to avoid any trouble.
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: [ALERT] 160/040503 (5760) : [/usr/sbin/haproxy.main()] Cannot fork.
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: [WARNING] 160/040503 (5760) : Reexecuting Master process in waitpid mode
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: [WARNING] 160/040503 (5760) : Reexecuting Master process
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: Usage : haproxy [-f <cfgfile|cfgdir>]* [ -vdVD ] [ -n <maxconn> ] [ -N <maxpconn> ]
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: [ -p <pidfile> ] [ -m <max megs> ] [ -C <dir> ] [-- <cfgfile>*]
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: -v displays version ; -vv shows known build options.
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: -d enters debug mode ; -db only disables background mode.
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: -dM[<byte>] poisons memory with <byte> (defaults to 0x50)
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: -V enters verbose mode (disables quiet mode)
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: -D goes daemon ; -C changes to <dir> before loading files.
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: -W master-worker mode.
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: -Ws master-worker mode with systemd notify support.
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: -q quiet mode : don't display messages
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: -c check mode : only check config files and exit
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: -n sets the maximum total # of connections (2000)
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: -m limits the usable amount of memory (in MB)
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: -N sets the default, per-proxy maximum # of connections (2000)
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: -L set local peer name (default to hostname)
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: -p writes pids of all children to this file
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: -de disables epoll() usage even when available
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: -dp disables poll() usage even when available
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: -dS disables splice usage (broken on old kernels)
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: -dG disables getaddrinfo() usage
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: -dR disables SO_REUSEPORT usage
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: -dr ignores server address resolution failures
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: -dV disables SSL verify on servers side
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: -sf/-st [pid ]* finishes/terminates old pids.
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: -x <unix_socket> get listening sockets from a unix socket
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: HA-Proxy version 1.8.23 2019/11/25
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal ip[5760]: Copyright 2000-2019 Willy Tarreau <willy>
Jun 09 04:05:03 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal systemd[1]: haproxy-cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd.service: Main process exited, code=exited, status=1/FAILURE
Jun 09 04:05:08 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal Keepalived_vrrp[4697]: Script `check_script` now returning 1
Jun 09 04:05:13 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal Keepalived_vrrp[4697]: VRRP_Script(check_script) failed (exited with status 1)
Jun 09 04:05:13 amphora-de6c35a7-40b9-4c8d-8722-c663a0c15de1.novalocal Keepalived_vrrp[4697]: (cbbef8ef65504a559c2f19f1a63bdbdd) Entering FAULT STATE
Expected results:
The previous worker should be cleaned up earlier, allowing us to reload haproxy for new configuration changes as soon as the new worker is initialized.
Additional info:
1. Logs showing that the former worker is destroyed only after ~2 minutes:
Jun 09 04:32:48 amphora-f66f2cca-4962-4608-8620-acebace3ef21.novalocal systemd[1]: Reloading HAProxy Load Balancer.
Jun 09 04:32:48 amphora-f66f2cca-4962-4608-8620-acebace3ef21.novalocal haproxy[7638]: Configuration file is valid
Jun 09 04:32:48 amphora-f66f2cca-4962-4608-8620-acebace3ef21.novalocal ip[7213]: [WARNING] 160/043210 (7213) : Reexecuting Master process
Jun 09 04:32:48 amphora-f66f2cca-4962-4608-8620-acebace3ef21.novalocal ip[7213]: [WARNING] 160/043248 (7213) : [/usr/sbin/haproxy.main()] Cannot raise FD limit to 2500040, limit is 2534.
Jun 09 04:32:48 amphora-f66f2cca-4962-4608-8620-acebace3ef21.novalocal ip[7213]: [WARNING] 160/043248 (7213) : [/usr/sbin/haproxy.main()] FD limit (2534) too low for maxconn=1000000/maxsock=2500040. Please raise 'ulimit-n' to 2500040 or more to avoid any trouble.
Jun 09 04:32:48 amphora-f66f2cca-4962-4608-8620-acebace3ef21.novalocal systemd[1]: Reloaded HAProxy Load Balancer.
Jun 09 04:34:59 amphora-f66f2cca-4962-4608-8620-acebace3ef21.novalocal ip[7213]: [WARNING] 160/043248 (7213) : Former worker 7605 exited with code 0
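To quantify that lingering-worker window without grepping the journal, a small polling helper can time how long a given worker PID survives after a reload. This is an illustrative, Linux-only sketch (it polls /proc); the helper name and timings are hypothetical, not part of haproxy or Octavia:

```python
import os
import time

def wait_for_exit(pid: int, timeout: float = 600.0, interval: float = 1.0) -> float:
    """Poll until `pid` disappears from /proc; return the seconds elapsed."""
    start = time.monotonic()
    while os.path.exists(f"/proc/{pid}"):
        if time.monotonic() - start > timeout:
            raise TimeoutError(f"pid {pid} still alive after {timeout}s")
        time.sleep(interval)
    return time.monotonic() - start

# Usage: note the old worker PID from `ps` right after `systemctl reload`,
# e.g. print(f"former worker gone after {wait_for_exit(7605):.0f}s")
```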
2. The issue is not reproducible without using stick-tables.
3. The config file doesn't contain any backend servers, and we don't generate any network traffic on the frontend.
Comment 2 (Gregory Thiemonge, 2020-06-10 17:01:37 UTC)
(In reply to Ryan O'Hara from comment #1)
> Did you report this issue upstream?
No, I didn't.
I tested with haproxy 2.1.7 and ran into a similar issue: the worker was destroyed after 8 minutes.
It looks like normal behavior, but it's a problem for us (OpenStack Octavia) because we use a 1 GB VM and 1000000 as the default value for maxconn, so haproxy consumes most of the memory.
I'll check whether I can find a simple reproducer for 2.1.x and report it upstream.
We'll also look at changing our defaults in Octavia to mitigate the issue.
I've not yet seen anything on the upstream mailing list. If there is upstream conversation about this, it is worth noting here so I can follow along.
Comment 4 (Gregory Thiemonge, 2020-06-29 12:59:41 UTC)
Hi Ryan,
I haven't reproduced it outside of my OpenStack Octavia environment; I tried similar configuration files, but the former workers exited within 1 or 2 seconds.
Feel free to close the BZ (we are working on a workaround on our side); I'll re-open it if we can provide more information.
Thanks
(In reply to Gregory Thiemonge from comment #0)
>
> 3. Config file doesn't contain any backend server, we don't generate any
> network traffic on the frontend.
The config file looks like it has a backend. If there are truly no connections, it should clean up previous workers quickly. This is a reload (SIGUSR2), so if I recall correctly, haproxy will stop accepting connections on the frontend but will service existing sessions until they are closed or time out. That would cause the workers to stick around after a reload, but if there are no active sessions I don't know why this would happen.
You could use the stats socket to dump the state of haproxy after the reload. It might give insight into why haproxy is keeping the old workers around for so long. Also, the warnings about the FD limit are concerning; I've not seen that before. Any reason to think that is playing a role here?
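A minimal sketch of that stats-socket suggestion: connect to the socket declared in the "stats socket" line of the config and issue runtime commands such as "show sess" and "show table". The helper function below is illustrative, not part of haproxy or Octavia:

```python
import socket

def haproxy_stat(sock_path: str, command: str) -> str:
    """Send one command to a haproxy stats socket and return the reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(command.encode() + b"\n")
        chunks = []
        # haproxy closes the connection after answering, so read until EOF
        while data := s.recv(4096):
            chunks.append(data)
    return b"".join(chunks).decode()

# e.g. inspect sessions and stick-table usage right after the reload
# (socket path taken from the config shown in the description):
# print(haproxy_stat("/var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd.sock", "show sess"))
# print(haproxy_stat("/var/lib/octavia/cbbef8ef-6550-4a55-9c2f-19f1a63bdbdd.sock", "show table"))
```

If the old worker is still servicing sessions, "show sess" on the old process should list them; an empty session list would support the "no active sessions" puzzle above.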