Bug 1713380 - NetworkManager must DHCPDECLINE when ACD/duplicate address detection fails
Summary: NetworkManager must DHCPDECLINE when ACD/duplicate address detection fails
Keywords:
Status: VERIFIED
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: NetworkManager
Version: 8.0
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: rc
: 8.7
Assignee: Thomas Haller
QA Contact: David Jaša
URL:
Whiteboard:
Depends On: 1868254
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-05-23 13:58 UTC by Pavel Zhukov
Modified: 2022-06-29 14:02 UTC (History)
11 users (show)

Fixed In Version: NetworkManager-1.39.6-1.el8
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-01-18 15:20:31 UTC
Type: Bug
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
freedesktop.org Gitlab NetworkManager NetworkManager-ci merge_requests 1101 0 None opened ipv4: test that NM sends DHCPDECLINE and fails connection on getting duplicate... 2022-06-29 14:01:14 UTC
freedesktop.org Gitlab NetworkManager NetworkManager merge_requests 1230 0 None opened [th/dhcp-acd] dhcp: implement ACD (address collision detection) for DHCPv4 2022-05-19 12:09:43 UTC

Description Pavel Zhukov 2019-05-23 13:58:32 UTC
Description of problem:
Looks like NM has ip address conflict detection is disabled by default and violates RFC5227 if enabled

Version-Release number of selected component (if applicable):
NetworkManager-1.14.0-14.el8

How reproducible:
100%

Steps to Reproduce:
1. Send DHCPOFFER packet which contains duplicate address
2. Check ip address of the machine


Actual results:
NM assigns ip address without checking for the conflict in default configuration (dad-timeout = -1) which violates [2] (however is says SHOULD so it not strong requirement)
NM failed to activate connection if dad-timeout > 0 and address conflict detected. DHCDECLINE message is NOT sent. IP address is NOT assigned. It clear violation of [1]

Expected results:
DHCDECLINE sent another IP address assigned 


Additional info:
The behavior doesn't depend on chosen dhcp implemention (both internal and dhclient behave the same way) but dhclient works fine if it's used from CLI with default dhclient-script. 

[1]
https://tools.ietf.org/html/rfc2131#section-3.1
  5. The client receives the DHCPACK message with configuration
     parameters.  The client SHOULD perform a final check on the
     parameters (e.g., ARP for allocated network address), and notes the
     duration of the lease specified in the DHCPACK message.  At this
     point, the client is configured.  If the client detects that the
     address is already in use (e.g., through the use of ARP), the
     client MUST send a DHCPDECLINE message to the server and restarts
     the configuration process.  The client SHOULD wait a minimum of ten
     seconds before restarting the configuration process to avoid
     excessive network traffic in case of looping.

[2]
https://tools.ietf.org/html/rfc5227#section-1

   o the client SHOULD probe the newly received address, e.g., with ARP

   o The client SHOULD perform a final check on the parameters
     (e.g., ARP for allocated network address)

   o If the client detects that the address is already in use
     (e.g., through the use of ARP), the client MUST send a DHCPDECLINE
     message to the server

Comment 1 Pavel Zhukov 2019-05-28 14:48:47 UTC
Actually having ip address conflict detection is clear RFC violation as well.

https://tools.ietf.org/html/rfc5227#section-2.1
  Before beginning to use an IPv4 address (whether received from manual
  configuration, DHCP, or some other means), a host implementing this
  specification MUST test to see if the address is already in use, by
  broadcasting ARP Probe packets.

Comment 2 Pavel Zhukov 2019-05-28 14:49:35 UTC
(In reply to Pavel Zhukov from comment #1)
> Actually having ip address conflict detection is clear RFC violation as well.
I meant "having ip address conflict detection disabled"

Comment 3 Beniamino Galvani 2020-03-31 15:45:52 UTC
> NM assigns ip address without checking for the conflict in default
> configuration (dad-timeout = -1) which violates [2] (however is says
> SHOULD so it not strong requirement)

RFC 5227 actually says:

   Before beginning to use an IPv4 address (whether received from manual
   configuration, DHCP, or some other means), a host implementing this
   specification MUST test to see if the address is already in use, by
   broadcasting ARP Probe packets.

but I think it is fine if NM decides to not implement the
specification.

The major problem of duplicate address detection is that it adds a
delay during activation. The RFC specifies very high intervals so that
the full process can take up to 9 seconds. These values don't make
sense today. Even if we use a much shorter timeout, it would be in
addition to other existing delays (for example Wi-fi authentication,
DHCP), and that would be bad for the user experience.

> NM failed to activate connection if dad-timeout > 0 and address
> conflict detected. DHCDECLINE message is NOT sent. IP address is NOT
> assigned. It clear violation of [1]

On RHEL 8.2 NetworkManager sends a DHCP decline message when
dad-timeout > 0 and the address received via DHCP is duplicate. This
works only when using the internal DHCP client (which is the default
on RHEL 8.2).

Comment 6 Thomas Haller 2021-01-18 15:20:31 UTC
I agree with comment 3.

We are probably not enabling this by default.

You can configure it yourself by setting ipv4.dad-timeout in the connection profile.
You can also enable it by default by having a file /etc/NetworkManager/conf.d/90-dad-timeout.conf with

  [connection-90-dad-timeout]
  ipv4.dad-timeout=2



If you disagree, please reopen with some new reasoning. Thank you!!

Comment 7 Pavel Zhukov 2021-01-18 15:52:05 UTC
(In reply to Thomas Haller from comment #6)
> I agree with comment 3.
> 
> We are probably not enabling this by default.
> 
> You can configure it yourself by setting ipv4.dad-timeout in the connection
> profile.
> You can also enable it by default by having a file
> /etc/NetworkManager/conf.d/90-dad-timeout.conf with
> 
>   [connection-90-dad-timeout]
>   ipv4.dad-timeout=2
> 
> 
> 
> If you disagree, please reopen with some new reasoning. Thank you!!
I disagree. Description of the bug report says:

"NM failed to activate connection if dad-timeout > 0 and address conflict detected. DHCDECLINE message is NOT sent. IP address is NOT assigned. It clear violation of [1]"

Comment 9 Thomas Haller 2021-01-19 16:31:11 UTC
Makes sense.

I updated the title.


there is ongoing work in NetworkManager (named "l3cfg"), which reworks ACD handling and would thereby fix this.

Comment 13 Gris Ge 2022-01-18 07:30:48 UTC
Hi Beniamino,

Please check whether recent l3cfg patch fixed this issue or not.
Thanks!

Comment 14 Beniamino Galvani 2022-01-18 08:01:24 UTC
> Please check whether recent l3cfg patch fixed this issue or not.

From what I see, it doesn't.

Comment 17 Thomas Haller 2022-05-31 09:31:16 UTC
Notes for testing:


1) we have many DHCP tests. To also cover the case where ACD is enabled for DHCPv4 without having a conflict, set `ipv4.dad-timeout` for a few tests. There should be no change -- except that DHCP would take a bit longer for DAD to complete.

2) (optional) in addition to 1), have at least one tests that sets `ipv4.dad-timeout` and somehow verify that NetworkManager actually did ACD. It's not entirely clear how to do that. We could parse the logfile, but that is ugly and not stable. We could also check for arping packets on the interface (tcpdump). Or maybe have another python service running that listens whether such arping packets get received.

3) add tests that enable "ipv4.dad-timeout" where a conflict exists.

  - NetworkManager is supposed to perform ACD and detect the conflict.
  - it will log a message about the conflict
  - it will decline the lease with the used addresss
  - it will continue trying to get another lease.
    - if it can get another lease with an address with no conflict, that lease will be used/accepted and all is good
    - if it can get another lease but that address has a conflict, it continues as before.
    - if it cannot get another lease, after ipv4.dhcp-timeout the DHCP state fails... which -- depending on
      settings like `ipv4.may-fail=no` results in an entire failure of the activation/connection.
      - if we didn't fail the connection, NetworkManager continues trying to get a DHCP lease indefinitely. It will keep doing ACD 
        and decline leases with conflict. If a good lease can be obtained, the DHCP failure state can resolve.

4) this is implemented both for dhclient and internal client. See bug 2091891 for how to add cover both plugins. Have tests from 1), 2), and 3) marked to run with both plugins.

Comment 18 Thomas Haller 2022-06-01 08:47:57 UTC
ok, before NetworkManager-1.36 (in rhel-8.5 and older), ACD for DHCPv4 was working... (if you enabled `ipv4.dad-timeout`).
at least that:

   - we would have done ACD and not used the lease.
   - for "internal" plugin we would have sent the DECLINE. At least in theory, there is a bug [1] which I think would have rendered that broken (dunno).
   - for "dhclient" plugin we would not have sent DECLINE, which meant that we wouldn't get another lease because dhclient thought all was good.

[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/65cfece4c5c7f2605c353c70f0973a84af4242f2

That broke in 1.36.0+, rhel-8.6+, rhel-9.0+ and we would no longer do any ACD for DHCP addresses.
Apparently, our CI in this area was lacking, so we missed it.


It is now fixed upstream (1.39+) by [1]. It will thus also be fixed in rhel-8.7+ and rhel-9.1+.
Our CI will be improved as part of this fix, to cover this (comment 17).
What now also works is that dhclient plugin will properly send a DECLINE and get a new lease.

You still need to explicitly enable `ipv4.dad-timeout` to have ACD enabled. That didn't change. When a user creates a profile, usually they would not enable it, and ACD is usually not enabled.

[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/3079628263f7b5fa2c403dda371283684a5307db


With this, all should be working... QA is still missing.


Note You need to log in before you can comment on or make changes to this bug.