Bug 2059673
Summary: | NetworkManager DHCP client does not work maybe due to received NAKs | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Nils Koenig <nkoenig> |
Component: | NetworkManager | Assignee: | Beniamino Galvani <bgalvani> |
Status: | CLOSED ERRATA | QA Contact: | Vladimir Benes <vbenes> |
Severity: | high | Docs Contact: | |
Priority: | urgent | | |
Version: | 8.2 | CC: | abezhani, bgalvani, csay, ferferna, fge, fpokryvk, gvincent, lrintel, miburke, rkhan, sukulkar, till, vbenes |
Target Milestone: | rc | Keywords: | Triaged, ZStream |
Target Release: | --- | | |
Hardware: | ppc64le | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | NetworkManager-1.36.0-4.el8 | Doc Type: | No Doc Update |
Doc Text: | | Story Points: | --- |
Clone Of: | | | |
: | 2065187 2065188 2065191 | Environment: | |
Last Closed: | 2022-05-10 14:55:01 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 2065187, 2065188, 2065191 | | |
Attachments: | | | |
Comment 6
Guillaume Vincent
2022-03-03 16:05:27 UTC
Created attachment 1864830 [details]
Reproducer script
Python script to simulate the scenario (NAKs received before ACK).
I prepared a Python script (attached) to reproduce the scenario, and I checked what happens with the internal client and with dhclient.

The internal client fails to get a lease because it restarts the transaction when the NAK is received after sending a REQUEST:

  # nmcli connection up veth0+
  Error: Connection activation failed: IP configuration could not be reserved (no available address, timeout, etc.)

  2022-03-09 09:54:36.060559 | <- DHCP Discover
  2022-03-09 09:54:36.084666 | -> DHCP Offer
  2022-03-09 09:54:36.092373 | <- DHCP Request
  2022-03-09 09:54:36.116479 | -> DHCP Nak
  2022-03-09 09:54:36.229454 | -> DHCP Ack
  2022-03-09 09:54:38.110549 | <- DHCP Discover
  2022-03-09 09:54:38.140550 | -> DHCP Offer
  2022-03-09 09:54:38.149837 | <- DHCP Request
  2022-03-09 09:54:38.174495 | -> DHCP Nak
  2022-03-09 09:54:38.291526 | -> DHCP Ack
  2022-03-09 09:54:42.171706 | <- DHCP Discover
  2022-03-09 09:54:42.196537 | -> DHCP Offer
  2022-03-09 09:54:42.201158 | <- DHCP Request
  2022-03-09 09:54:42.228571 | -> DHCP Nak
  2022-03-09 09:54:42.351619 | -> DHCP Ack
  [...]

dhclient is able to obtain the lease because it ignores NAKs when in state REQUESTING, and only accepts NAKs when REBOOTING (i.e. when starting with a known lease).

  # nmcli connection up veth0+
  Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/5)

  2022-03-09 09:55:39.188576 | <- DHCP Discover
  2022-03-09 09:55:39.208457 | -> DHCP Offer
  2022-03-09 09:55:39.218127 | <- DHCP Request
  2022-03-09 09:55:39.247446 | -> DHCP Nak
  2022-03-09 09:55:39.363512 | -> DHCP Ack

I also checked the source of another popular DHCP client, dhcpcd, which also restarts the transaction when a NAK is received.
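To make the difference between the two behaviours concrete, here is a minimal sketch that replays the replies seen in the capture (NAK first, ACK about 100 ms later) under both policies. It is only a toy model: the `Policy` enum and `run_attempt` function are hypothetical names for illustration, and none of this is code from NetworkManager or dhclient.

```python
from enum import Enum


class Policy(Enum):
    RESTART_ON_NAK = "restart"           # behaviour described for the internal client
    IGNORE_NAK_IN_REQUESTING = "ignore"  # behaviour described for dhclient


def run_attempt(replies, policy):
    """Replay the server replies that follow one DHCPREQUEST.

    Returns "BOUND" if the client ends up with a lease, or "INIT" if it
    drops the offer and has to start again from DHCPDISCOVER.
    """
    for reply in replies:
        if reply == "ACK":
            return "BOUND"
        if reply == "NAK" and policy is Policy.RESTART_ON_NAK:
            return "INIT"
        # Otherwise the NAK is dropped and the client keeps waiting.
    return "INIT"  # nothing usable arrived; modelled here as a timeout


# Order seen in the capture: the NAK arrives before the ACK.
observed = ["NAK", "ACK"]

for policy in Policy:
    print(policy.name, "->", run_attempt(observed, policy))
```

With the restart policy every attempt ends back in INIT before the ACK is looked at, which matches the repeating Discover/Offer/Request/Nak/Ack pattern in the first log.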
According to RFC 2131, the result of a NAK in both REBOOTING and REQUESTING is that the client should throw away the offer/lease and restart the state machine from INIT (section 3.1.5, section 3.2.3, state machine in Figure 5):

  If the client receives a DHCPNAK message, the client restarts the configuration process.

  If the client receives a DHCPNAK message, it cannot reuse its remembered network address. It must instead request a new address by restarting the configuration process, this time using the (non-abbreviated) procedure described in section 3.1.

Furthermore, I found no indication that the client should perform any validation on the NAK packet (e.g. on the server-id) except for the transaction-id.

In section 3.2 ("Client-server interaction - reusing a previously allocated network address") the specification says:

  If the client's request is invalid (e.g., the client has moved to a new subnet), servers SHOULD respond with a DHCPNAK message to the client. Servers SHOULD NOT respond if their information is not guaranteed to be accurate. For example, a server that identifies a request for an expired binding that is owned by another server SHOULD NOT respond with a DHCPNAK unless the servers are using an explicit mechanism to maintain coherency among the servers.

This paragraph is about rebooting (i.e. when the client starts with a known address), so I'm not sure it also applies to a normal start (from DHCP DISCOVER); however, the indication is clear that servers should not send a NAK for leases belonging to other servers.

In conclusion, I think that the internal client currently behaves according to the RFC. Nevertheless, we could deviate from the standard to better deal with situations like the one reported here; the fact that dhclient does that (and dhclient is a very common DHCP client) probably guarantees that there are no side effects.
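As a rough illustration of the RFC reading above, the transitions of Figure 5 that matter here can be written as a lookup table. This is a simplified sketch: only a subset of the figure is transcribed, the event labels are my own shorthand, and events with no entry are assumed to leave the state unchanged.

```python
# (state, event) -> next state, transcribed from a subset of RFC 2131 Figure 5.
TRANSITIONS = {
    ("INIT", "send DHCPDISCOVER"): "SELECTING",
    ("SELECTING", "select offer"): "REQUESTING",
    ("REQUESTING", "DHCPACK"): "BOUND",
    ("REQUESTING", "DHCPNAK"): "INIT",      # discard offer
    ("INIT-REBOOT", "send DHCPREQUEST"): "REBOOTING",
    ("REBOOTING", "DHCPACK"): "BOUND",
    ("REBOOTING", "DHCPNAK"): "INIT",       # restart
    ("BOUND", "T1 expires"): "RENEWING",
    ("RENEWING", "DHCPACK"): "BOUND",
    ("RENEWING", "DHCPNAK"): "INIT",        # halt network
    ("RENEWING", "T2 expires"): "REBINDING",
    ("REBINDING", "DHCPACK"): "BOUND",
    ("REBINDING", "DHCPNAK"): "INIT",       # halt network
}


def run(events, state="INIT"):
    """Apply events in order; unknown (state, event) pairs are ignored."""
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state


# One round of the failing capture: the NAK sends the client back to INIT,
# so the ACK that follows finds no matching transition.
capture = ["send DHCPDISCOVER", "select offer", "DHCPNAK", "DHCPACK"]
print(run(capture))
```

Note that every state that can receive a DHCPNAK maps it to INIT, which is why a strict reading of the figure gives the restart behaviour.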
I prepared a patch to implement the new behavior here:
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commits/bg/dhcp-nak/

What do others think?

@bgalvani Thank you very much for your comprehensive analysis and for providing a patch so quickly, much appreciated.

Looking at the state machine and reading 3.1.3 and 3.1.4, I can understand why NetworkManager is implemented so that it returns to INIT as soon as any NAK is seen. But my personal opinion is: why should we care about other servers' NAKs when we are in REQUESTING (we have sent a DHCPREQUEST to a specific server)? Would it make sense to only respect NAKs from the server we have sent the DHCPREQUEST to and ignore the others? Could it be that the RFC is a bit unspecific in this particular case about which NAKs to respect and which not, or am I overlooking something here?

In the case that we have moved to a different network segment and try to REBIND/RENEW, I would expect to get a DHCPNAK from the server from which we received the IP earlier. And I think we wouldn't change the behavior there if we only change which NAKs to respect/ignore in the REQUESTING state.

I had a look at what dhclient does. If I read the code correctly, it just ignores the NAK if there is no active lease (line 2303ff):

  if (!client -> active) {
  #if defined (DEBUG)
          log_info ("DHCPNAK with no active lease.\n");
  #endif
          return;
  }

So there is no fancy checking of whether the NAK is from the server we've sent the DHCPREQUEST to. I am surprised that it works when we have an active lease. In the RENEWING/REBINDING state client->active should not be NULL, and the NAKs would cause a DHCPOFFER to be turned down. Maybe it works because we go back to INIT and start over, but I am guessing here.
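The idea of honouring only NAKs from the selected server could look roughly like the sketch below. This is not the NetworkManager patch: the `Reply` type, the `accept_in_requesting` function and the addresses (taken from the 192.0.2.0/24 documentation range) are all made up for illustration, and the choice to accept any ACK with a matching transaction-id is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reply:
    msg_type: str   # "ACK" or "NAK"
    xid: int        # transaction-id copied from the packet header
    server_id: str  # 'server identifier' option, "" if absent


def accept_in_requesting(reply, xid, selected_server):
    """Decide whether a reply should be processed while in REQUESTING."""
    if reply.xid != xid:
        return False  # belongs to another transaction
    if reply.msg_type == "NAK":
        # Only the server named in our DHCPREQUEST may reject it.
        return reply.server_id == selected_server
    return reply.msg_type == "ACK"


selected = "192.0.2.1"  # hypothetical server we selected
replies = [
    Reply("NAK", 0x1234, "192.0.2.2"),  # a different server objects
    Reply("ACK", 0x1234, "192.0.2.1"),  # the selected server confirms
]
for r in replies:
    print(r.msg_type, r.server_id, accept_in_requesting(r, 0x1234, selected))
```

A NAK from the selected server itself is still accepted, so the case where that server can no longer satisfy the request keeps working as the RFC describes.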
 --------                               -------
|        | +-------------------------->|       |<-------------------+
| INIT-  | |     +-------------------->| INIT  |                    |
| REBOOT |DHCPNAK/         +---------->|       |<---+               |
|        |Restart|         |            -------     |               |
 --------  |  DHCPNAK/     |               |                        |
    |      Discard offer   |      -/Send DHCPDISCOVER               |
-/Send DHCPREQUEST         |               |                        |
    |      |     |      DHCPACK            v        |               |
 -----------     |   (not accept.)/   -----------   |               |
|           |    |  Send DHCPDECLINE |           |                  |
| REBOOTING |    |         |         | SELECTING |<----+            |
|           |    |        /          |           |     |DHCPOFFER/  |
 -----------     |       /            -----------   |  |Collect     |
    |            |      /                  |   |       |  replies   |
DHCPACK/         |     /  +----------------+   +-------+            |
Record lease, set|    |   v   Select offer/                         |
timers T1, T2   ------------  send DHCPREQUEST      |               |
    |   +----->|            |             DHCPNAK, Lease expired/   |
    |   |      | REQUESTING |                  Halt network         |
    DHCPOFFER/ |            |                       |               |
    Discard     ------------                        |               |
    |   |        |        |                   -----------           |
    |   +--------+     DHCPACK/              |           |          |
    |              Record lease, set    -----| REBINDING |          |
    |                timers T1, T2     /     |           |          |
    |                     |        DHCPACK/   -----------           |
    |                     v     Record lease, set   ^               |
    +----------------> -------      /timers T1,T2   |               |
               +----->|       |<---+                |               |
               |      | BOUND |<---+                |               |
  DHCPOFFER, DHCPACK, |       |    |            T2 expires/   DHCPNAK/
   DHCPNAK/Discard     -------     |             Broadcast  Halt network
               |       | |         |            DHCPREQUEST         |
               +-------+ |        DHCPACK/          |               |
                    T1 expires/   Record lease, set |               |
                 Send DHCPREQUEST timers T1, T2     |               |
                 to leasing server |                |               |
                         |   ----------             |               |
                         |  |          |------------+               |
                         +->| RENEWING |                            |
                            |          |----------------------------+
                             ----------

          Figure 5:  State-transition diagram for DHCP clients

3. The client receives one or more DHCPOFFER messages from one or more servers. The client may choose to wait for multiple responses. The client chooses one server from which to request configuration parameters, based on the configuration parameters offered in the DHCPOFFER messages. The client broadcasts a DHCPREQUEST message that MUST include the 'server identifier' option to indicate which server it has selected, and that MAY include other options specifying desired configuration values. The 'requested IP address' option MUST be set to the value of 'yiaddr' in the DHCPOFFER message from the server. This DHCPREQUEST message is broadcast and relayed through DHCP/BOOTP relay agents. To help ensure that any BOOTP relay agents forward the DHCPREQUEST message to the same set of DHCP servers that received the original DHCPDISCOVER message, the DHCPREQUEST message MUST use the same value in the DHCP message header's 'secs' field and be sent to the same IP broadcast address as the original DHCPDISCOVER message. The client times out and retransmits the DHCPDISCOVER message if the client receives no DHCPOFFER messages.

4. The servers receive the DHCPREQUEST broadcast from the client. Those servers not selected by the DHCPREQUEST message use the message as notification that the client has declined that server's offer. The server selected in the DHCPREQUEST message commits the binding for the client to persistent storage and responds with a DHCPACK message containing the configuration parameters for the requesting client. The combination of 'client identifier' or 'chaddr' and assigned network address constitute a unique identifier for the client's lease and are used by both the client and server to identify a lease referred to in any DHCP messages. Any configuration parameters in the DHCPACK message SHOULD NOT conflict with those in the earlier DHCPOFFER message to which the client is responding. The server SHOULD NOT check the offered network address at this point. The 'yiaddr' field in the DHCPACK messages is filled in with the selected network address.

If the selected server is unable to satisfy the DHCPREQUEST message (e.g., the requested network address has been allocated), the server SHOULD respond with a DHCPNAK message.

A server MAY choose to mark addresses offered to clients in DHCPOFFER messages as unavailable. The server SHOULD mark an address offered to a client in a DHCPOFFER message as available if the server receives no DHCPREQUEST message from that client.
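For reference, step 3 above can be illustrated by encoding the options a client puts into its DHCPREQUEST. The sketch below only builds the options area (not the full BOOTP header), the option codes are the ones defined in RFC 2132, and the helper names and the 192.0.2.x addresses are hypothetical.

```python
import ipaddress

# Option codes from RFC 2132.
OPT_REQUESTED_IP = 50
OPT_MESSAGE_TYPE = 53
OPT_SERVER_ID = 54
OPT_END = 255
DHCPREQUEST = 3  # value of the message type option for a DHCPREQUEST


def tlv(code, value):
    """Encode one option as code, length, value."""
    return bytes([code, len(value)]) + value


def request_options(offer_yiaddr, offer_server_id):
    """Options for a DHCPREQUEST that selects one particular offer."""
    return (
        tlv(OPT_MESSAGE_TYPE, bytes([DHCPREQUEST]))
        # 'requested IP address' is set to 'yiaddr' from the chosen offer.
        + tlv(OPT_REQUESTED_IP, ipaddress.IPv4Address(offer_yiaddr).packed)
        # 'server identifier' tells every server which one was selected.
        + tlv(OPT_SERVER_ID, ipaddress.IPv4Address(offer_server_id).packed)
        + bytes([OPT_END])
    )


opts = request_options("192.0.2.10", "192.0.2.1")
print(opts.hex())
```

Because the request is broadcast, every server sees the 'server identifier' value, which is what lets the non-selected servers treat the message as a declined offer (step 4).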
Would someone be able to provide a scratch build for RHEL8 PPC64LE for us to test out the patch Beniamino has created?
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commits/bg/dhcp-nak/
I gave up after finding that libndp-devel is not available for RHEL8 on PPC64LE.

> Would it make sense to only respect NAKs from the server we have sent the DHCPREQUEST to and ignore the others?

Yes, on second thought, that seems the best solution to me. I updated the patch and opened a merge request:
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1144

I can confirm that the second patch https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1144 solves our issue.

NMCI test added: https://gitlab.freedesktop.org/NetworkManager/NetworkManager-ci/-/merge_requests/997

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2022:1985