Bug 1005814 - dhclient IPv6 binds to all eth interfaces instead of just the specified one.
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: dhcp
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Pavel Šimerda (pavlix)
QA Contact: Release Test Team
Depends On: 1001742
Reported: 2013-09-09 08:40 EDT by Pavel Šimerda (pavlix)
Modified: 2014-06-17 20:38 EDT (History)
Doc Type: Bug Fix
Doc Text:
Cause: When dhclient is run for IPv6 (-6 option) with a network interface name specified on the command line, it listens on all network interfaces instead of only the specified one.
Consequence: When multiple such instances of dhclient are running, one of them receives all the replies from the DHCP server(s). The other instances fail to communicate with the DHCP server(s).
Fix: dhclient now binds to the specified interface.
Result: When multiple such instances are running, each of them receives the replies coming through its specified interface and communicates correctly.
Clone Of: 1001742
Last Closed: 2014-06-13 08:27:59 EDT
Type: Bug
Attachments
A proof of concept patch. (742 bytes, patch)
2013-09-16 11:01 EDT, Pavel Šimerda (pavlix)
dhclient: Bind to IPv6 link-local address. (654 bytes, patch)
2013-09-18 12:56 EDT, Pavel Šimerda (pavlix)

Comment 5 Pavel Šimerda (pavlix) 2013-09-16 11:01:28 EDT
Created attachment 798308 [details]
A proof of concept patch.

I also confirmed using 'netstat' that dhclient on RHEL7 still binds to wildcard. I ported Jiří Popelka's patch from bug #1001742 to RHEL7 (and the current upstream).

Unfortunately, the DHCP client will now refuse to start:

Can't set SO_REUSEPORT option on dhcp socket: Protocol not available

The question is why we need SO_REUSEPORT in the first place. After all, we never need more than one dhclient on one IPv6 link-local address.
Comment 6 Pavel Šimerda (pavlix) 2013-09-16 14:58:16 EDT
Regarding the need for SO_REUSEPORT, using the following summary:

http://stackoverflow.com/questions/14388706/socket-options-so-reuseaddr-and-so-reuseport-how-do-they-differ-do-they-mean-t

It looks like for UDP on Linux, SO_REUSEADDR is just an alias for SO_REUSEPORT and that SO_REUSEPORT is only useful when we need to bind() multiple times to the same address. We do not expect to use dhclient twice for one address on one interface, though. Therefore there are two options (adding dcbw to Cc in case he wants to answer):

1) We expect to see two identical link-local addresses set for different interfaces (like fe80::2 on two ppp links or something like that). Therefore we need SO_REUSEPORT (or equivalent SO_REUSEADDR) and we need to explicitly limit the socket to a specific interface.

2) We don't expect to see two identical link-local addresses (?) and therefore we don't need SO_REUSEADDR/SO_REUSEPORT for dhclient and it is enough to bind to specific addresses.

This dichotomy is not only important for SO_REUSEPORT usage but also for binding to interfaces (instead of or as well as addresses).
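
The multiple-bind() behaviour discussed above can be checked with a small Python sketch. This is only an illustration, not dhclient code: the helper name `can_bind_twice` is hypothetical, and it uses the IPv4 loopback address and a kernel-chosen port purely to keep the demo unprivileged.

```python
import socket


def can_bind_twice(reuseport: bool) -> bool:
    """Return True if two UDP sockets can bind() to the same address and port."""
    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        if reuseport:
            opt = getattr(socket, "SO_REUSEPORT", None)
            if opt is None:
                return False  # option not exposed on this platform
            # Both sockets must set the option before bind().
            first.setsockopt(socket.SOL_SOCKET, opt, 1)
            second.setsockopt(socket.SOL_SOCKET, opt, 1)
        first.bind(("127.0.0.1", 0))      # let the kernel pick a free port
        port = first.getsockname()[1]
        second.bind(("127.0.0.1", port))  # same address, same port
        return True
    except OSError:
        # EADDRINUSE without the option; ENOPROTOOPT on kernels without it
        return False
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print("without SO_REUSEPORT:", can_bind_twice(False))
    print("with SO_REUSEPORT:   ", can_bind_twice(True))
```

Without the option the second bind() is refused, which is the situation option 2 relies on: a second dhclient on the same address would fail visibly instead of silently sharing the port.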

(In reply to Pavel Šimerda from comment #5)
> Unfortunately, the DHCP client will now refuse to start:
> 
> Can't set SO_REUSEPORT option on dhcp socket: Protocol not available

This is clearly because I'm running RHEL7 with kernel 3.7, and SO_REUSEPORT requires kernel 3.9 or newer. Curiously, I can't boot into kernel 3.10, as I end up in emergency mode, and I can't choose the older kernel now because virt-manager (F18) crashes. I'll add more information when I retest with the recent kernel.
Comment 7 Jiri Popelka 2013-09-17 07:40:25 EDT
(In reply to Pavel Šimerda from comment #5)
> Can't set SO_REUSEPORT option on dhcp socket: Protocol not available

Use of SO_REUSEPORT is conditional on
#if defined(SO_REUSEPORT)
so is it possible that you built it against
kernel >= 3.9 and then ran it with kernel < 3.9?

(In reply to Pavel Šimerda from comment #6)
> It looks like for UDP on Linux, SO_REUSEADDR is just an alias for
> SO_REUSEPORT and that SO_REUSEPORT is only useful when we need to bind()
> multiple times to the same address. 

It's also useful when there is still unsent data from a previous connection, so we need it in order not to fail to start when we're being restarted.
 
> 1) ... and we need to explicitly limit the socket to a specific interface.

There seems to be no such option for AF_INET6; see socket(7):
"SO_BINDTODEVICE works only for some socket types, particularly AF_INET sockets."
Comment 8 Pavel Šimerda (pavlix) 2013-09-17 08:47:39 EDT
(In reply to Jiri Popelka from comment #7)
> (In reply to Pavel Šimerda from comment #5)
> > Can't set SO_REUSEPORT option on dhcp socket: Protocol not available
> 
> Using of SO_REUSEPORT is conditioned by
> #if defined(SO_REUSEPORT)
> so is it possible that you had built it with 
> kernel >= 3.9 and then run with kernel < 3.9 ?

Yes, as stated in comment #6.

> (In reply to Pavel Šimerda from comment #6)
> > It looks like for UDP on Linux, SO_REUSEADDR is just an alias for
> > SO_REUSEPORT and that SO_REUSEPORT is only useful when we need to bind()
> > multiple times to the same address. 
> 
> It's also useful when there are some unsent data from previous connection so
> we need it to not fail to start if we're being restarted.

According to my source of information, that applies to SO_REUSEADDR on TCP, not SO_REUSEPORT and not UDP:

http://stackoverflow.com/questions/14388706/socket-options-so-reuseaddr-and-so-reuseport-how-do-they-differ-do-they-mean-t

Any relevant source to challenge it?

> > 1) ... and we need to explicitly limit the socket to a specific interface.
> 
> there seems to be no such option for AF_INET6, see socket(7):
> "SO_BINDTODEVICE works only for some socket types, particularly AF_INET
> sockets."

Did you actually test it? It might be a documentation relic from pre-AF_INET6 times. A quick test with Python seems to refute it:

>>> from socket import *
>>> s = socket(AF_INET6, SOCK_DGRAM, SOL_UDP)
>>> s.setsockopt(SOL_SOCKET, SO_BINDTODEVICE, "eth0")
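
For anyone repeating the check on Python 3, here is a slightly more defensive sketch of the same test. The function name `try_bind_to_device` is hypothetical; the assumptions are that the interface name must be passed as bytes and that the call may fail with an OSError (for example when run without sufficient privileges or with an unknown interface name).

```python
import socket


def try_bind_to_device(ifname: str) -> bool:
    """Try SO_BINDTODEVICE on an AF_INET6 UDP socket; return True on success."""
    opt = getattr(socket, "SO_BINDTODEVICE", None)
    if opt is None:
        return False  # option not exposed (non-Linux platform)
    try:
        sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    except OSError:
        return False  # IPv6 not available on this host
    try:
        # Python 3 expects the interface name as bytes.
        sock.setsockopt(socket.SOL_SOCKET, opt, ifname.encode())
        return True
    except OSError:
        # e.g. EPERM without privileges, ENODEV for an unknown interface
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    print(try_bind_to_device("eth0"))
```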
Comment 9 Jiri Popelka 2013-09-17 08:52:48 EDT
(In reply to Pavel Šimerda from comment #8)
> Did you actually test it?

No
Comment 10 Jiri Popelka 2013-09-17 10:38:38 EDT
(In reply to Pavel Šimerda from comment #8)
> It might be a documentation relic from pre-AF_INET6 times.

That would simplify things a lot:

/common/socket.c @@ -245,7 +245,7 @@ if_register_socket(
  #if defined(SO_BINDTODEVICE)
        /* Bind this socket to this interface. */
-       if ((local_family != AF_INET6) && (info->ifp != NULL) &&
+       if ((*do_multicast == 0) && (info->ifp != NULL) &&
Comment 11 Pavel Šimerda (pavlix) 2013-09-17 15:27:20 EDT
(In reply to Pavel Šimerda from comment #5)
> Created attachment 798308 [details]
> A proof of concept patch.

I finally fixed my RHEL7 testing installation and I'm running on kernel 3.10 now.

> I also confirmed using 'netstat' that dhclient on RHEL7 still binds to
> wildcard. I ported Jiří Popelka's patch from bug #1001742 to RHEL7 (and the
> current upstream).

Tested the patched dhclient again with the current kernel and it works correctly (binds to link-local IPv6 address).

> Unfortunately, the DHCP client will now refuse to start:
> 
> Can't set SO_REUSEPORT option on dhcp socket: Protocol not available

This is no longer the case with Linux 3.10 (should apply to all >= 3.9).

(In reply to Jiri Popelka from comment #10)
> (In reply to Pavel Šimerda from comment #8)
> > It might be a documentation relic from pre-AF_INET6 times.
> 
> That would simplify things a lot:
> 
> /common/socket.c @@ -245,7 +245,7 @@ if_register_socket(
>   #if defined(SO_BINDTODEVICE)
>         /* Bind this socket to this interface. */
> -       if ((local_family != AF_INET6) && (info->ifp != NULL) &&
> +       if ((*do_multicast == 0) && (info->ifp != NULL) &&

Sounds good. Also the SO_REUSEPORT fatal error (when dhclient compiled against kernel >= 3.9 is used with a pre-3.9 kernel) could be turned into a warning (as it is not needed in most cases). I would still be happy to have dcbw's view on duplicate addresses as I expect NetworkManager to be the most common consumer of dhclient's IPv6 support.

A short summary of what we're addressing here:

1) The need for SO_BINDTODEVICE together with SO_REUSEPORT with AF_INET6 (+1 for your solution above). Also, whether we should suppress fatal errors from setsockopt() in case an older kernel is used for one reason or another.

2) The need to bind to the link-local address with AF_INET6. So far it has worked well in our tests (with NetworkManager), but I guess even with #1 you should still make sure you always use the link-local address for the DHCP client (we can check both the standards and whether it's already enforced; +1 for your solution, which I ported and attached to the bug report).
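
The idea of downgrading the SO_REUSEPORT failure to a warning could look roughly like the following. This is a Python sketch of the control flow only, not the dhclient C code; the helper name `set_reuseport_or_warn` is hypothetical, and it assumes the unsupported-option case surfaces as ENOPROTOOPT ("Protocol not available", as in the error message above).

```python
import errno
import socket
import warnings


def set_reuseport_or_warn(sock: socket.socket) -> bool:
    """Enable SO_REUSEPORT if possible; warn instead of failing otherwise."""
    opt = getattr(socket, "SO_REUSEPORT", None)
    if opt is None:
        return False  # analogous to the #if defined(SO_REUSEPORT) guard
    try:
        sock.setsockopt(socket.SOL_SOCKET, opt, 1)
        return True
    except OSError as exc:
        if exc.errno == errno.ENOPROTOOPT:
            # Option known at build time but not to the running kernel.
            warnings.warn("SO_REUSEPORT not supported by the running kernel")
            return False
        raise  # any other error is still treated as fatal


if __name__ == "__main__":
    with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as s:
        print(set_reuseport_or_warn(s))
```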
Comment 12 Jiri Popelka 2013-09-18 03:06:23 EDT
whoop, a little ;-) segfault

(In reply to Jiri Popelka from comment #10)
> /common/socket.c @@ -245,7 +245,7 @@ if_register_socket(
>   #if defined(SO_BINDTODEVICE)
>         /* Bind this socket to this interface. */
> -       if ((local_family != AF_INET6) && (info->ifp != NULL) &&
> +       if ((*do_multicast == 0) && (info->ifp != NULL) &&

+	if (((do_multicast == 0)||(*do_multicast == 0)) && (info->ifp != NULL) &&
Comment 13 Pavel Šimerda (pavlix) 2013-09-18 04:14:36 EDT
(In reply to Jiri Popelka from comment #12)
> +	if (((do_multicast == 0)||(*do_multicast == 0)) && (info->ifp != NULL) &&

Wouldn't it be nicer to just use the following?

if ((!do_multicast || !*do_multicast) && info->ifp &&

It doesn't follow the original style but is IMO much more readable.
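
As a quick sanity check of the guarded condition, here is a tiny Python model of it. It is only an analogy: `None` stands in for a NULL pointer, and the function name `should_bind_to_device` is hypothetical.

```python
from typing import Optional


def should_bind_to_device(do_multicast: Optional[int], ifp: Optional[str]) -> bool:
    """Model of the guarded condition; None stands in for a NULL pointer."""
    # The None check comes first, so the value is only inspected when there
    # is one (the C expression short-circuits in the same order).
    no_multicast = do_multicast is None or do_multicast == 0
    return no_multicast and ifp is not None


if __name__ == "__main__":
    for flag in (None, 0, 1):
        print(flag, should_bind_to_device(flag, "eth0"))
```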
Comment 14 Jiri Popelka 2013-09-18 04:16:29 EDT
sure
Comment 15 Pavel Šimerda (pavlix) 2013-09-18 12:56:07 EDT
Created attachment 799488 [details]
dhclient: Bind to IPv6 link-local address.

This patch (modelled after jpopelka's pseudo-patch in comment #12) removes the difference between IPv4 and IPv6 DHCP client binding logic, fixing this bug as a result. Link-local address binding is *not* included, keeping the former patch for reference.
Comment 18 Dan Williams 2013-09-19 00:06:23 EDT
(In reply to Pavel Šimerda from comment #11)
> Sounds good. Also the SO_REUSEPORT fatal error (when dhclient compiled
> against kernel >= 3.9 is used with a pre-3.9 kernel) could be turned into a
> warning (as it is not needed in most cases). I would still be happy to have
> dcbw's view on duplicate addresses as I expect NetworkManager to be the most
> common consumer of dhclient's IPv6 support.

Off the top of my head I can't think of a reason that we'd expect to see the same v6LL address on two different interfaces *and* use DHCP on both of them independently.

However, one case I can think of is some WWAN modems; often the firmware writers never expect multiple identical devices on the system (you never do that on Windows), so the firmware might have a hardcoded MAC address.  So if you had two devices in the system at the same time, both might get the same MAC address, and thus have the same v6LL address (?).  Unless the kernel has some way to prevent that; I don't think DAD is relevant here since both interfaces are not in the same broadcast domain.

Thus I think keeping the SO_BINDTODEVICE logic is the right way to go here, instead of binding to the LL address.
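
To illustrate why identical MAC addresses would lead to identical v6LL addresses, here is a short Python sketch. It assumes the interface identifier is derived from the MAC using the modified EUI-64 scheme; a kernel configured to generate addresses differently would not behave this way. The function name `link_local_from_mac` and the example MAC are made up for the demo.

```python
import ipaddress


def link_local_from_mac(mac: str) -> ipaddress.IPv6Address:
    """Derive the modified EUI-64 link-local address for a 48-bit MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe in the middle to get the 64-bit interface identifier.
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    prefix = bytes.fromhex("fe80") + bytes(6)  # fe80::/64
    return ipaddress.IPv6Address(prefix + iid)


if __name__ == "__main__":
    # Two devices with the same hardcoded MAC get the same address.
    print(link_local_from_mac("00:11:22:33:44:55"))
    print(link_local_from_mac("00:11:22:33:44:55"))
```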
Comment 19 Pavel Šimerda (pavlix) 2013-09-19 06:08:57 EDT
(In reply to Dan Williams from comment #18)
> (In reply to Pavel Šimerda from comment #11)
> > Sounds good. Also the SO_REUSEPORT fatal error (when dhclient compiled
> > against kernel >= 3.9 is used with a pre-3.9 kernel) could be turned into a
> > warning (as it is not needed in most cases). I would still be happy to have
> > dcbw's view on duplicate addresses as I expect NetworkManager to be the most
> > common consumer of dhclient's IPv6 support.
> 
> Off the top of my head I can't think of a reason that we'd expect to see the
> same v6LL address on two different interfaces *and* use DHCP on both of them
> independently.
> 
> However, one case I can think of is some WWAN modems; often the firmware
> writers never expect multiple identical devices on the system (you never do
> that on Windows), so the firmware might have a hardcoded MAC address.  So if
> you had two devices in the system at the same time, both might get the same
> MAC address, and thus have the same v6LL address (?).

Btw, does a PPP interface need a MAC address at all?

> Unless the kernel has
> some way to prevent that; I don't think DAD is relevant here since both
> interfaces are not in the same broadcast domain.
> 
> Thus I think keeping the SO_BINDTODEVICE logic is the right way to go here,
> instead of binding to the LL address.

Fair enough. SO_BINDTODEVICE is the solution to avoid this problem. All other problems are easy to spot (dhclient would fail instead of malfunctioning). I think I'll raise some of the other issues with upstream dhclient or NetworkManager.

Thanks for your insight.
Comment 22 Jiri Popelka 2013-10-24 05:01:12 EDT
Upstream has taken a different approach based on my first suggestion from
bug #1001742, comment #12.
http://pkgs.fedoraproject.org/cgit/dhcp.git/plain/dhcp-dhclient6-bind.patch
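
For illustration only, binding a UDP socket to a link-local address on one interface can be sketched in Python as below. This is not the upstream patch: the helper names are hypothetical, and the sketch assumes the standard DHCPv6 client port 546 and that the scope id is taken from the interface index.

```python
import socket

DHCPV6_CLIENT_PORT = 546  # standard DHCPv6 client port


def link_local_sockaddr(address: str, ifname: str, port: int = DHCPV6_CLIENT_PORT):
    """Build the AF_INET6 sockaddr tuple for a link-local address on ifname."""
    scope_id = socket.if_nametoindex(ifname)  # raises OSError if unknown
    # (host, port, flowinfo, scope_id): the scope id ties the link-local
    # address to one interface; fe80::/10 alone does not identify the link.
    return (address, port, 0, scope_id)


def bind_link_local(address: str, ifname: str) -> socket.socket:
    """Bind a UDP socket to a link-local address (port 546 needs privileges)."""
    sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    try:
        sock.bind(link_local_sockaddr(address, ifname))
    except OSError:
        sock.close()
        raise
    return sock
```

Because the scope id is part of the socket address, the same link-local address on two different interfaces yields two distinct socket addresses, which is what makes this approach portable across the BSDs and Linux.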

The reason for not using solution from comment #15:
"
=> SO_BINDTODEVICE is system specific so can't be
a critical part of the solution. BTW I can say the same about
SO_REUSEPORT as when it is implemented SO_REUSEADDR
still detects multiple clients trying to bind() to *,port...

But as we have to support BSDs and Linuxes I am afraid the
only solution is to bind() to the (?) link-local address.
"
Comment 23 Pavel Šimerda (pavlix) 2013-10-25 08:30:07 EDT
(In reply to Jiri Popelka from comment #22)
> Upstream has taken a different approach based on my first suggestion from
> bug #1001742, comment #12.
> http://pkgs.fedoraproject.org/cgit/dhcp.git/plain/dhcp-dhclient6-bind.patch
> 
> The reason for not using solution from comment #15:
> "
> => SO_BINDTODEVICE is system specific so can't be
> a critical part of the solution. BTW I can say the same about
> SO_REUSEPORT as when it is implemented SO_REUSEADDR
> still detects multiple clients trying to bind() to *,port...
> 
> But as we have to support BSDs and Linuxes I am afraid the
> only solution is to bind() to the (?) link-local address.
> "

As SO_BINDTODEVICE is so simple to use, I recommend staying with it for as long as we are building an upstream version in which the bug is not resolved.
Comment 24 Ladislav Jozsa 2014-02-11 13:09:08 EST
Verified on dhclient-4.2.5-26.el7, IPv6 address is present on both interfaces even after original lease time expiration.
Comment 25 Ludek Smid 2014-06-13 08:27:59 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.
