Bug 1756714 - HAProxy fails to start on boot
Summary: HAProxy fails to start on boot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: haproxy
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 8.0
Assignee: Ryan O'Hara
QA Contact: Brandon Perkins
URL: https://projects.engineering.redhat.c...
Whiteboard:
Depends On:
Blocks:
Reported: 2019-09-29 10:03 UTC by Curaden AG
Modified: 2020-11-04 04:06 UTC (History)
0 users

Fixed In Version: haproxy-1.8.23-4.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-04 04:06:08 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2020:4815 | 0 | None | None | None | 2020-11-04 04:06:11 UTC

Description Curaden AG 2019-09-29 10:03:01 UTC
Description of problem:

We run HAProxy with several (10-20) static secondary IP addresses on the network interface. We bind an HAProxy frontend to each secondary IP address. On boot, HAProxy fails to start with messages like this (ca. one for each frontend):

Sep 29 11:22:36 host.example.com haproxy[807]: [ALERT] 271/112236 (807) : Starting frontend some-frontend: cannot bind socket [A.B.C.D:443]

The likely reason for this is that systemd attempts to start HAProxy before the network is fully configured. The HAProxy unit file has the following requirement, which, however, does not seem to work properly in RHEL8:

[Unit]
After=network.target

Version-Release number of selected component (if applicable):

haproxy-1.8.15-5.el8.x86_64

How reproducible:

Every time.

Steps to Reproduce:
1. Install HAProxy on RHEL8. Enable it on boot.
2. Add several secondary IP addresses to the network interface.
3. Configure HAProxy frontends to bind to these secondary IP addresses.
4. Reboot the machine.

Actual results:

HAProxy fails to start with messages like this:

Sep 29 11:22:36 host.example.com haproxy[807]: [ALERT] 271/112236 (807) : Starting frontend some-frontend: cannot bind socket [A.B.C.D:443]

Expected results:

Normal start-up

Additional info:

The same mainline of HAProxy, 1.8, from SCL (rh-haproxy18-haproxy-1.8.17-1.el7.x86_64) with the same configuration (multiple static secondary IP addresses) and same "After=network.target" requirement in the unit file works correctly in RHEL7 (i.e. no failures on boot). 

A known workaround is to extend the unit file with configuration like this:

[Service]
Restart=on-failure
RestartSec=30

It will restart HAProxy ~30 seconds after its failure, by which time the secondary IP addresses will be available and the start-up of HAProxy will be successful.

It might also be the case that in RHEL8 NetworkManager handles the secondary IP addresses differently from what it did in RHEL7, hence systemd believes the network is up and tries to start HAProxy while, in fact, it is not yet.
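
To illustrate the race described above, here is a minimal Python sketch of a diagnostic that polls until an address can be bound. It is not part of HAProxy or systemd; the function name wait_until_bindable and its parameters are hypothetical, and it assumes that the failure to bind an address that is not yet configured surfaces as EADDRNOTAVAIL, which is the usual Linux behaviour.

```python
import errno
import socket
import time


def wait_until_bindable(address, port=0, timeout=30.0, interval=0.5):
    """Return True once `address` can be bound locally, False on timeout.

    Hypothetical diagnostic helper: port 0 asks the kernel for an
    ephemeral port, so no privileges are needed and no port clash occurs.
    """
    deadline = time.monotonic() + timeout
    while True:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((address, port))
            return True
        except OSError as exc:
            # EADDRNOTAVAIL: the address is not (yet) configured locally.
            # Any other error is unexpected here, so let it propagate.
            if exc.errno != errno.EADDRNOTAVAIL:
                raise
        finally:
            sock.close()
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Run early in boot against one of the secondary addresses, a helper like this would show how long after network.target the address actually becomes usable, which is the same gap the RestartSec=30 workaround papers over.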

Comment 1 Ryan O'Hara 2019-09-30 16:06:22 UTC
Have you contacted support? Are you sure SELinux isn't causing AVC denials?

Comment 3 Curaden AG 2019-11-02 23:31:42 UTC
RHEL support is not exactly super helpful or efficient to contact them, plus this is clearly an issue on RHEL side.

And I'm absolutely sure SELinux is disabled (even if it was not, HAProxy is part of the RHEL system so you should have ensured it complies with SELinux - but, as I said, SELinux is disabled). Did you not read my ticket which explains that systemd starts the service BEFORE the network is completely up?!

I just spotted on another RHEL-8 machine the same behaviour with sshd when ListenAddress is enabled in config (not 0.0.0.0, but a specific address on one of its interfaces). Seems there is a general issue with services that bind to an address. I'd suggest you look at this more closely.

Comment 4 Ryan O'Hara 2019-11-06 18:13:52 UTC
(In reply to Curaden AG from comment #3)
> RHEL support is not exactly super helpful or efficient to contact them, plus
> this is clearly an issue on RHEL side.
> 
> And I'm absolutely sure SELinux is disabled (even if it was not, HAProxy is
> part of the RHEL system so you should have ensured it complies with SELinux
> - but, as I said, SELinux is disabled). Did you not read my ticket which
> explains that systemd starts the service BEFORE the network is completely up?!

HAProxy does comply with SELinux policy. By default it only allows haproxy to bind to specific ports. That is why I asked -- to see if perhaps you were running into this common problem.

I read the bug report. There are options for binding to non-existent IP addresses. See the 'transparent' bind option in the official documentation.

> I just spotted on another RHEL-8 machine the same behaviour with sshd when
> ListenAddress is enabled in config (not 0.0.0.0, but a specific address on
> one of its interfaces). Seems there is a general issue with services that
> bind to an address. I'd suggest you look at this more closely.

We are looking at it, but adding restart capabilities to the systemd unit file is not widely appealing. We believe the correct solution is to change the systemd dependency from 'network.target' to 'network-online.target'.

Comment 5 Ryan O'Hara 2019-11-25 15:34:30 UTC
I did some research on this and found some interesting information about the differences between the 'network.target' and 'network-online.target' dependencies. Aside from the obvious differences, there is good information here [1]. Note that it warns against moving to a dependency on 'network-online.target' due to potentially slower boot times, etc. At the bottom, it suggests using IP_FREEBIND as a means to bind to addresses that are not yet configured. From the link:

"If you write a server: if you want to listen on other, explicitly configured addresses, consider using the IP_FREEBIND sockopt functionality of the Linux kernel. This allows your code to bind to an address even if it is not actually (yet or ever) configured locally. This also makes your code robust towards network configuration changes."

This is exactly what the 'transparent' option for 'bind' does. Thus I am inclined to say that we do not want to change to "After=network-online.target" as discussed.

What I find most strange is that, as reported in the original bug report, haproxy-1.8 from RHSCL on RHEL7 does *not* experience this problem yet haproxy-1.8 on RHEL8 does. I have not yet been able to pinpoint a reason for this behavior. Is it possible that RHEL7 was faster at getting the network configured, i.e. before haproxy tried to bind, thus avoiding any problem? I'm still investigating.

In the meantime, could the reporter try using the 'transparent' option and report back whether this is a valid workaround? Much appreciated.

[1] https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
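
The IP_FREEBIND behaviour quoted from the systemd documentation in this comment can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not HAProxy's implementation: the helper name bind_freebind is hypothetical, the code is Linux-only, and the numeric fallback 15 is the Linux value of IP_FREEBIND from <linux/in.h>, used in case the Python build does not export the constant.

```python
import socket

# Linux value from <linux/in.h>; used as a fallback (assumption) when the
# running Python build does not expose socket.IP_FREEBIND.
IP_FREEBIND = getattr(socket, "IP_FREEBIND", 15)


def bind_freebind(address, port=0):
    """Return a TCP socket bound to `address` with IP_FREEBIND set.

    With the option enabled, the kernel accepts the bind even if the
    address is not configured on any local interface at that moment.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # The option must be set before bind() for it to have any effect.
        sock.setsockopt(socket.IPPROTO_IP, IP_FREEBIND, 1)
        sock.bind((address, port))
    except OSError:
        sock.close()
        raise
    return sock
```

For example, bind_freebind("192.0.2.10", 8443) would be expected to succeed on a Linux host even when 192.0.2.10 is not assigned to any interface, whereas a plain bind() would typically fail with EADDRNOTAVAIL.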

Comment 6 Curaden AG 2019-11-27 09:37:27 UTC
From what I see, "bind transparent" is deprecated in HAProxy in favour of the global "option transparent". However, it is always mentioned as a way of doing a transparent proxy and nothing is mentioned about the ability to bind to a non-existent IP address. 

I also don't see anything similar as an option for SSHd, which is equally affected by this issue.

In addition, personally I am quite happy to extend the unit file for HAProxy once with network-online.target - whether a server boot takes a second or two longer once a week is irrelevant to me (for servers we always use static IP addresses) - what matters is convenience and reliability; hence I have no incentive to experiment with "option transparent" and I'll stick with my current solution.

Comment 9 Ryan O'Hara 2020-01-02 17:16:32 UTC
Closing this as NOTABUG for the following reasons:

1. The "option transparent" in haproxy.cfg will allow binding to a specific IP address regardless of whether that address exists on the machine.

2. The sysctl booleans "net.ipv4.ip_nonlocal_bind" and "net.ipv6.ip_nonlocal_bind", when set to "1", will also allow binding to nonexistent addresses. Note that this is system-wide.

3. Based on what I've read in the systemd documentation and explained in comment #5, I don't think we want to depend on "network-online.target".
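
As a rough way to check the sysctl booleans named in point 2, the following Python sketch reads them from procfs. It assumes the usual Linux mapping of sysctl names to files under /proc/sys; the function name nonlocal_bind_enabled and its proc_sys parameter are hypothetical and exist only for illustration.

```python
from pathlib import Path


def nonlocal_bind_enabled(proc_sys="/proc/sys"):
    """Report the state of the ip_nonlocal_bind sysctls.

    Returns a dict mapping each sysctl name to True (set to "1"),
    False (any other value), or None if the file cannot be read.
    """
    result = {}
    for family in ("ipv4", "ipv6"):
        name = f"net.{family}.ip_nonlocal_bind"
        path = Path(proc_sys) / "net" / family / "ip_nonlocal_bind"
        try:
            result[name] = path.read_text().strip() == "1"
        except OSError:
            # Missing file or no permission: state unknown.
            result[name] = None
    return result
```

Because the setting is system-wide, a True value here means every process on the host may bind to addresses that are not configured locally, not only HAProxy.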

Comment 10 Curaden AG 2020-05-05 15:21:45 UTC
OK, your position understood.

Comment 11 Ryan O'Hara 2020-06-09 15:51:56 UTC
I'm reopening this bug since I got another inquiry about this.

In the case of binding to a non-existent IP address we can use "option transparent" as a workaround, as stated above. But there is another situation where network-online is useful -- when using a resolver to resolve server names via DNS. If the network is not online, DNS queries will not work. Read more about haproxy and DNS resolution here [1].

I'm leaning towards changing the systemd service file to include:

After=network-online.target
Wants=network-online.target

The risk here is that it could substantially delay boot time. Read more about that here [2]. But there are some other commonly used services that use this same model, so it is not unprecedented.

[1] http://cbonte.github.io/haproxy-dconv/1.8/configuration.html#5.3
[2] https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/

Comment 18 errata-xmlrpc 2020-11-04 04:06:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (haproxy bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4815

