Bug 1713380
| Summary: | NetworkManager must DHCPDECLINE when ACD/duplicate address detection fails | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Pavel Zhukov <pzhukov> |
| Component: | NetworkManager | Assignee: | Thomas Haller <thaller> |
| Status: | CLOSED ERRATA | QA Contact: | David Jaša <djasa> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 8.0 | CC: | bgalvani, ferferna, fge, lrintel, pasik, rkhan, sfaye, sukulkar, thaller, till, vbenes |
| Target Milestone: | rc | Keywords: | Reopened, Triaged |
| Target Release: | 8.7 | Flags: | pm-rhel:
mirror+
|
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | NetworkManager-1.39.6-1.el8 | Doc Type: | No Doc Update |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-11-08 10:07:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1868254 | ||
| Bug Blocks: | |||
|
Description
Pavel Zhukov
2019-05-23 13:58:32 UTC
Actually having ip address conflict detection is clear RFC violation as well. https://tools.ietf.org/html/rfc5227#section-2.1 Before beginning to use an IPv4 address (whether received from manual configuration, DHCP, or some other means), a host implementing this specification MUST test to see if the address is already in use, by broadcasting ARP Probe packets. (In reply to Pavel Zhukov from comment #1) > Actually having ip address conflict detection is clear RFC violation as well. I meant "having ip address conflict detection disabled" > NM assigns ip address without checking for the conflict in default > configuration (dad-timeout = -1) which violates [2] (however is says > SHOULD so it not strong requirement) RFC 5227 actually says: Before beginning to use an IPv4 address (whether received from manual configuration, DHCP, or some other means), a host implementing this specification MUST test to see if the address is already in use, by broadcasting ARP Probe packets. but I think it is fine if NM decides to not implement the specification. The major problem of duplicate address detection is that it adds a delay during activation. The RFC specifies very high intervals so that the full process can take up to 9 seconds. These values don't make sense today. Even if we use a much shorter timeout, it would be in addition to other existing delays (for example Wi-fi authentication, DHCP), and that would be bad for the user experience. > NM failed to activate connection if dad-timeout > 0 and address > conflict detected. DHCDECLINE message is NOT sent. IP address is NOT > assigned. It clear violation of [1] On RHEL 8.2 NetworkManager sends a DHCP decline message when dad-timeout > 0 and the address received via DHCP is duplicate. This works only when using the internal DHCP client (which is the default on RHEL 8.2). I agree with comment 3. We are probably not enabling this by default. You can configure it yourself by setting ipv4.dad-timeout in the connection profile. You can also enable it by default by having a file /etc/NetworkManager/conf.d/90-dad-timeout.conf with [connection-90-dad-timeout] ipv4.dad-timeout=2 If you disagree, please reopen with some new reasoning. Thank you!! (In reply to Thomas Haller from comment #6) > I agree with comment 3. > > We are probably not enabling this by default. > > You can configure it yourself by setting ipv4.dad-timeout in the connection > profile. > You can also enable it by default by having a file > /etc/NetworkManager/conf.d/90-dad-timeout.conf with > > [connection-90-dad-timeout] > ipv4.dad-timeout=2 > > > > If you disagree, please reopen with some new reasoning. Thank you!! I disagree. Description of the bug report says: "NM failed to activate connection if dad-timeout > 0 and address conflict detected. DHCDECLINE message is NOT sent. IP address is NOT assigned. It clear violation of [1]" Makes sense. I updated the title. there is ongoing work in NetworkManager (named "l3cfg"), which reworks ACD handling and would thereby fix this. Hi Beniamino, Please check whether recent l3cfg patch fixed this issue or not. Thanks! > Please check whether recent l3cfg patch fixed this issue or not.
From what I see, it doesn't.
Notes for testing:
1) we have many DHCP tests. To also cover the case where ACD is enabled for DHCPv4 without having a conflict, set `ipv4.dad-timeout` for a few tests. There should be no change -- except that DHCP would take a bit longer for DAD to complete.
2) (optional) in addition to 1), have at least one tests that sets `ipv4.dad-timeout` and somehow verify that NetworkManager actually did ACD. It's not entirely clear how to do that. We could parse the logfile, but that is ugly and not stable. We could also check for arping packets on the interface (tcpdump). Or maybe have another python service running that listens whether such arping packets get received.
3) add tests that enable "ipv4.dad-timeout" where a conflict exists.
- NetworkManager is supposed to perform ACD and detect the conflict.
- it will log a message about the conflict
- it will decline the lease with the used addresss
- it will continue trying to get another lease.
- if it can get another lease with an address with no conflict, that lease will be used/accepted and all is good
- if it can get another lease but that address has a conflict, it continues as before.
- if it cannot get another lease, after ipv4.dhcp-timeout the DHCP state fails... which -- depending on
settings like `ipv4.may-fail=no` results in an entire failure of the activation/connection.
- if we didn't fail the connection, NetworkManager continues trying to get a DHCP lease indefinitely. It will keep doing ACD
and decline leases with conflict. If a good lease can be obtained, the DHCP failure state can resolve.
4) this is implemented both for dhclient and internal client. See bug 2091891 for how to add cover both plugins. Have tests from 1), 2), and 3) marked to run with both plugins.
ok, before NetworkManager-1.36 (in rhel-8.5 and older), ACD for DHCPv4 was working... (if you enabled `ipv4.dad-timeout`). at least that: - we would have done ACD and not used the lease. - for "internal" plugin we would have sent the DECLINE. At least in theory, there is a bug [1] which I think would have rendered that broken (dunno). - for "dhclient" plugin we would not have sent DECLINE, which meant that we wouldn't get another lease because dhclient thought all was good. [1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/65cfece4c5c7f2605c353c70f0973a84af4242f2 That broke in 1.36.0+, rhel-8.6+, rhel-9.0+ and we would no longer do any ACD for DHCP addresses. Apparently, our CI in this area was lacking, so we missed it. It is now fixed upstream (1.39+) by [1]. It will thus also be fixed in rhel-8.7+ and rhel-9.1+. Our CI will be improved as part of this fix, to cover this (comment 17). What now also works is that dhclient plugin will properly send a DECLINE and get a new lease. You still need to explicitly enable `ipv4.dad-timeout` to have ACD enabled. That didn't change. When a user creates a profile, usually they would not enable it, and ACD is usually not enabled. [1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/3079628263f7b5fa2c403dda371283684a5307db With this, all should be working... QA is still missing. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:7680 |