Bug 2059673
| Summary: | NetworkManager DHCP client does not work maybe due to received NAKs | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Nils Koenig <nkoenig> | ||||
| Component: | NetworkManager | Assignee: | Beniamino Galvani <bgalvani> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Vladimir Benes <vbenes> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | urgent | ||||||
| Version: | 8.2 | CC: | abezhani, bgalvani, csay, ferferna, fge, fpokryvk, gvincent, lrintel, miburke, rkhan, sukulkar, till, vbenes | ||||
| Target Milestone: | rc | Keywords: | Triaged, ZStream | ||||
| Target Release: | --- | Flags: | pm-rhel: mirror+ | ||||
| Hardware: | ppc64le | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | NetworkManager-1.36.0-4.el8 | Doc Type: | No Doc Update | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | |||||||
| Clones: | 2065187 2065188 2065191 | Environment: | |||||
| Last Closed: | 2022-05-10 14:55:01 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 2065187, 2065188, 2065191 | ||||||
| Attachments: | | ||||||
Comment 6
Guillaume Vincent
2022-03-03 16:05:27 UTC
Created attachment 1864830
Reproducer script
Python script to simulate the scenario (NAKs received before ACK).
I prepared a Python script (attached) to reproduce the scenario
and checked what happens with both the internal client and
dhclient.
The internal client fails to get a lease because it restarts the
transaction when the NAK is received after sending a REQUEST:
# nmcli connection up veth0+
Error: Connection activation failed: IP configuration could not be reserved (no available address, timeout, etc.)
2022-03-09 09:54:36.060559 | <- DHCP Discover
2022-03-09 09:54:36.084666 | -> DHCP Offer
2022-03-09 09:54:36.092373 | <- DHCP Request
2022-03-09 09:54:36.116479 | -> DHCP Nak
2022-03-09 09:54:36.229454 | -> DHCP Ack
2022-03-09 09:54:38.110549 | <- DHCP Discover
2022-03-09 09:54:38.140550 | -> DHCP Offer
2022-03-09 09:54:38.149837 | <- DHCP Request
2022-03-09 09:54:38.174495 | -> DHCP Nak
2022-03-09 09:54:38.291526 | -> DHCP Ack
2022-03-09 09:54:42.171706 | <- DHCP Discover
2022-03-09 09:54:42.196537 | -> DHCP Offer
2022-03-09 09:54:42.201158 | <- DHCP Request
2022-03-09 09:54:42.228571 | -> DHCP Nak
2022-03-09 09:54:42.351619 | -> DHCP Ack
[...]
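The internal client's reaction can be modelled with a tiny state machine. The sketch below is hypothetical: the names `client_state`, `dhcp_msg`, `rfc_handle` and `replay` are made up for illustration and are not NetworkManager code. It replays the message order seen in the log, assuming the RFC 2131 rule that any NAK with a matching xid in REQUESTING sends the client back to INIT. Under that assumption, the late ACK arrives when the client is no longer in REQUESTING, so it is dropped.

```c
/* Hypothetical, reduced set of client states (see RFC 2131, Figure 5). */
enum client_state { ST_INIT, ST_SELECTING, ST_REQUESTING, ST_BOUND };

/* Hypothetical message types, as they appear in the log above. */
enum dhcp_msg { MSG_OFFER, MSG_ACK, MSG_NAK };

/* RFC 2131 behavior: in REQUESTING, any NAK (xid already matched)
 * discards the offer and restarts the configuration from INIT. */
static enum client_state rfc_handle(enum client_state st, enum dhcp_msg msg)
{
    switch (st) {
    case ST_SELECTING:
        /* Select the offer and broadcast a DHCPREQUEST. */
        return msg == MSG_OFFER ? ST_REQUESTING : st;
    case ST_REQUESTING:
        if (msg == MSG_ACK)
            return ST_BOUND;
        if (msg == MSG_NAK)
            return ST_INIT;  /* restart: a later ACK is no longer expected */
        return st;           /* extra OFFERs are discarded */
    default:
        return st;           /* INIT, BOUND: ignore these messages */
    }
}

/* Feed received messages to the client, starting in SELECTING. */
static enum client_state replay(const enum dhcp_msg *msgs, int n)
{
    enum client_state st = ST_SELECTING;
    for (int i = 0; i < n; i++)
        st = rfc_handle(st, msgs[i]);
    return st;
}
```

With the sequence OFFER, NAK, ACK from the log, `replay` ends in `ST_INIT`. Without the NAK, it ends in `ST_BOUND`.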
dhclient is able to obtain the lease because it ignores NAKs when in
state REQUESTING, and only accepts NAKs when REBOOTING (i.e. when
starting with a known lease).
# nmcli connection up veth0+
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/5)
2022-03-09 09:55:39.188576 | <- DHCP Discover
2022-03-09 09:55:39.208457 | -> DHCP Offer
2022-03-09 09:55:39.218127 | <- DHCP Request
2022-03-09 09:55:39.247446 | -> DHCP Nak
2022-03-09 09:55:39.363512 | -> DHCP Ack
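For comparison, here is a similar hypothetical model of the dhclient rule described above, in which a NAK only takes effect when a lease is active. `has_active_lease` stands in for dhclient's `client->active`. Everything else (the names and the reduced set of states) is an assumption of this sketch and does not come from dhclient's source.

```c
#include <stdbool.h>

/* Hypothetical, reduced set of client states. */
enum client_state { ST_INIT, ST_SELECTING, ST_REQUESTING, ST_REBOOTING, ST_BOUND };
enum dhcp_msg { MSG_OFFER, MSG_ACK, MSG_NAK };

struct client {
    enum client_state state;
    bool has_active_lease; /* stands in for dhclient's client->active */
};

/* dhclient-like handling: a NAK only takes effect when a lease is active. */
static void dhclient_like_handle(struct client *c, enum dhcp_msg msg)
{
    if (msg == MSG_NAK) {
        if (!c->has_active_lease)
            return;                  /* "DHCPNAK with no active lease": ignore */
        c->has_active_lease = false; /* give up the lease and restart */
        c->state = ST_INIT;
        return;
    }
    switch (c->state) {
    case ST_SELECTING:
        if (msg == MSG_OFFER)
            c->state = ST_REQUESTING; /* DHCPREQUEST is broadcast */
        break;
    case ST_REQUESTING:
    case ST_REBOOTING:
        if (msg == MSG_ACK) {
            c->state = ST_BOUND;
            c->has_active_lease = true;
        }
        break;
    default:
        break; /* other combinations are not modelled here */
    }
}
```

Fed with OFFER, NAK, ACK, a client that starts in SELECTING without a lease ends up bound, because the NAK arrives while no lease is active. A client in REBOOTING with a remembered lease still restarts on a NAK.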
I also checked the source of dhcpcd, another popular DHCP client,
which also restarts the transaction when a NAK is received.
According to RFC 2131, the result of a NAK in both REBOOTING and
REQUESTING is that the client should throw away the offer/lease and
restart the state machine from INIT (section 3.1.5, section 3.2.3,
state machine in Figure 5):
> If the client receives a DHCPNAK message, the client restarts the
> configuration process.

> If the client receives a DHCPNAK message, it cannot reuse its
> remembered network address. It must instead request a new
> address by restarting the configuration process, this time
> using the (non-abbreviated) procedure described in section
> 3.1.
Furthermore, I found no indication that the client should
perform any validation on the NAK packet (e.g. on the server-id)
except for the transaction-id.
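Validating only the transaction ID is cheap. In the fixed BOOTP/DHCP header (RFC 2131, section 2), `op` is the first byte (2 = BOOTREPLY), the 4-byte `xid` sits at offset 4 in network byte order, and the options field starts after the first 236 bytes. The following is a minimal sketch; `reply_matches_xid` is a hypothetical helper, not a NetworkManager function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BOOTREPLY 2
#define DHCP_FIXED_HDR_LEN 236 /* op .. file, i.e. everything before options */

/* Return true if buf holds a BOOTREPLY whose xid equals the one we sent.
 * The xid field is stored in network byte order at offset 4. */
static bool reply_matches_xid(const uint8_t *buf, size_t len, uint32_t our_xid)
{
    if (len < DHCP_FIXED_HDR_LEN || buf[0] != BOOTREPLY)
        return false;
    uint32_t xid = ((uint32_t)buf[4] << 24) | ((uint32_t)buf[5] << 16) |
                   ((uint32_t)buf[6] << 8) | (uint32_t)buf[7];
    return xid == our_xid;
}
```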
In section 3.2 ("Client-server interaction - reusing a previously
allocated network address") the specification says:
> If the client's request is invalid (e.g., the client has moved
> to a new subnet), servers SHOULD respond with a DHCPNAK message to
> the client. Servers SHOULD NOT respond if their information is not
> guaranteed to be accurate. For example, a server that identifies a
> request for an expired binding that is owned by another server SHOULD
> NOT respond with a DHCPNAK unless the servers are using an explicit
> mechanism to maintain coherency among the servers.
This paragraph is about rebooting (i.e. when the client starts with a
known address), so I'm not sure it also applies to a normal start (from
DHCPDISCOVER); however, the indication is clear that servers should
not send a NAK for leases belonging to other servers.
In conclusion, I think that the internal client currently behaves
according to the RFC. Nevertheless, we could deviate from the standard to
better deal with situations like the one reported here; the fact that
dhclient does so (and dhclient is a very common DHCP client)
suggests that there are no significant side effects. I prepared a patch
to implement the new behavior here:
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commits/bg/dhcp-nak/
What do others think?
@bgalvani Thank you very much for your comprehensive analysis and for providing a patch so quickly, much appreciated.
Looking at the state machine and reading 3.1.3 and 3.1.4, I can understand why NetworkManager is implemented
so that it returns to INIT as soon as any NAK is seen.
But in my opinion, why should we care about other servers' NAKs when we are in REQUESTING (we have already chosen the DHCPOFFER of a specific server)?
Would it make sense to only respect NAKs from the server we have sent the DHCPOFFER to and ignore the others?
Could it be that the RFC is a bit unspecific about which NAKs to respect in this particular case, or am I overlooking something here?
In the case that we have moved to a different network segment and try to REBIND/RENEW, I would expect to get a DHCPNAK from the server we received the IP from earlier.
And I think we wouldn't change that behavior if we only changed which NAKs to respect or ignore in the REQUESTING state.
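The rule suggested here can be sketched as a function that decides the next state after a NAK whose xid has already been validated. This is an illustration under stated assumptions, not the code of the patch discussed in this bug. In REQUESTING, a NAK counts only if its 'server identifier' (option 54) equals the one sent in our DHCPREQUEST. REBOOTING, RENEWING and REBINDING keep the RFC 2131 behavior. Treating a NAK that lacks option 54 as coming from another server is an extra assumption (RFC 2131 requires servers to include that option in DHCPNAK). `nak_next_state` and the state names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum client_state {
    ST_INIT, ST_SELECTING, ST_REQUESTING, ST_REBOOTING,
    ST_BOUND, ST_RENEWING, ST_REBINDING
};

/* Next state after a NAK (xid already validated).
 * selected_server: server identifier we put in our DHCPREQUEST.
 * nak_has_server_id / nak_server_id: option 54 of the NAK, if present. */
static enum client_state nak_next_state(enum client_state st,
                                        uint32_t selected_server,
                                        bool nak_has_server_id,
                                        uint32_t nak_server_id)
{
    switch (st) {
    case ST_REQUESTING:
        /* Proposed deviation: ignore NAKs from servers we did not select
         * (assumption: a NAK without option 54 is also ignored). */
        if (!nak_has_server_id || nak_server_id != selected_server)
            return ST_REQUESTING;
        return ST_INIT;
    case ST_REBOOTING:
    case ST_RENEWING:
    case ST_REBINDING:
        return ST_INIT; /* unchanged RFC 2131 behavior: restart */
    default:
        return st;      /* INIT, SELECTING, BOUND: the NAK is discarded */
    }
}
```

Only the REQUESTING branch differs from the RFC. This matches the point above that the RENEW/REBIND case, where a NAK from the leasing server is expected after moving to another segment, stays as it is.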
I had a look at what dhclient does. If I read the code correctly, it just ignores the NAK if there is no active lease (line 2303ff):
```c
/* dhclient ignores a NAK unless a lease is already active */
if (!client -> active) {
#if defined (DEBUG)
    log_info ("DHCPNAK with no active lease.\n");
#endif
    return;
}
```
So there is no fancy check of whether the NAK comes from the server whose DHCPOFFER we accepted. I am surprised that it works when we have an active lease.
In the RENEWING/REBINDING state, client->active should be non-NULL, so the NAKs would not be ignored.
Maybe it works because we go back to INIT and start over, but I am guessing here.
| State | Event / action | Next state |
|---|---|---|
| INIT-REBOOT | -/Send DHCPREQUEST | REBOOTING |
| REBOOTING | DHCPACK/Record lease, set timers T1, T2 | BOUND |
| REBOOTING | DHCPNAK/Restart | INIT |
| INIT | -/Send DHCPDISCOVER | SELECTING |
| SELECTING | DHCPOFFER/Collect replies | SELECTING |
| SELECTING | Select offer/send DHCPREQUEST | REQUESTING |
| REQUESTING | DHCPOFFER/Discard | REQUESTING |
| REQUESTING | DHCPACK/Record lease, set timers T1, T2 | BOUND |
| REQUESTING | DHCPACK (not accept.)/Send DHCPDECLINE | INIT |
| REQUESTING | DHCPNAK/Discard offer | INIT |
| BOUND | DHCPOFFER, DHCPACK, DHCPNAK/Discard | BOUND |
| BOUND | T1 expires/Send DHCPREQUEST to leasing server | RENEWING |
| RENEWING | DHCPACK/Record lease, set timers T1, T2 | BOUND |
| RENEWING | T2 expires/Broadcast DHCPREQUEST | REBINDING |
| RENEWING | DHCPNAK/Halt network | INIT |
| REBINDING | DHCPACK/Record lease, set timers T1, T2 | BOUND |
| REBINDING | DHCPNAK, Lease expired/Halt network | INIT |

Figure 5: State-transition diagram for DHCP clients
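Figure 5 can also be read as a transition function. The following sketch uses hypothetical enum names and rests on one assumption: any event that Figure 5 does not show for a given state leaves the state unchanged, which is consistent with the explicit "Discard" loops. Note that in this diagram a DHCPNAK in REQUESTING always leads back to INIT, whichever server sent it.

```c
enum state {
    S_INIT_REBOOT, S_REBOOTING, S_INIT, S_SELECTING, S_REQUESTING,
    S_BOUND, S_RENEWING, S_REBINDING
};

enum event {
    EV_ENTER,              /* "-" in the figure: taken without a message */
    EV_OFFER, EV_SELECT, EV_ACK, EV_ACK_NOT_ACCEPTABLE, EV_NAK,
    EV_T1_EXPIRED, EV_T2_EXPIRED, EV_LEASE_EXPIRED
};

/* Transitions of RFC 2131 Figure 5; unlisted events keep the state. */
static enum state next_state(enum state s, enum event e)
{
    switch (s) {
    case S_INIT_REBOOT:
        return e == EV_ENTER ? S_REBOOTING : s;    /* send DHCPREQUEST */
    case S_REBOOTING:
        if (e == EV_ACK) return S_BOUND;           /* record lease, set T1, T2 */
        if (e == EV_NAK) return S_INIT;            /* restart */
        return s;
    case S_INIT:
        return e == EV_ENTER ? S_SELECTING : s;    /* send DHCPDISCOVER */
    case S_SELECTING:
        return e == EV_SELECT ? S_REQUESTING : s;  /* OFFERs are collected */
    case S_REQUESTING:
        if (e == EV_ACK) return S_BOUND;
        if (e == EV_NAK) return S_INIT;                /* discard offer */
        if (e == EV_ACK_NOT_ACCEPTABLE) return S_INIT; /* send DHCPDECLINE */
        return s;                                      /* OFFERs discarded */
    case S_BOUND:
        return e == EV_T1_EXPIRED ? S_RENEWING : s;    /* others discarded */
    case S_RENEWING:
        if (e == EV_ACK) return S_BOUND;
        if (e == EV_NAK) return S_INIT;                /* halt network */
        if (e == EV_T2_EXPIRED) return S_REBINDING;    /* broadcast DHCPREQUEST */
        return s;
    case S_REBINDING:
        if (e == EV_ACK) return S_BOUND;
        if (e == EV_NAK || e == EV_LEASE_EXPIRED) return S_INIT; /* halt network */
        return s;
    }
    return s;
}
```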
3. The client receives one or more DHCPOFFER messages from one or more
servers. The client may choose to wait for multiple responses.
The client chooses one server from which to request configuration
parameters, based on the configuration parameters offered in the
DHCPOFFER messages. The client broadcasts a DHCPREQUEST message
that MUST include the 'server identifier' option to indicate which
server it has selected, and that MAY include other options
specifying desired configuration values. The 'requested IP
address' option MUST be set to the value of 'yiaddr' in the
DHCPOFFER message from the server. This DHCPREQUEST message is
broadcast and relayed through DHCP/BOOTP relay agents. To help
ensure that any BOOTP relay agents forward the DHCPREQUEST message
to the same set of DHCP servers that received the original
DHCPDISCOVER message, the DHCPREQUEST message MUST use the same
value in the DHCP message header's 'secs' field and be sent to the
same IP broadcast address as the original DHCPDISCOVER message.
The client times out and retransmits the DHCPDISCOVER message if
the client receives no DHCPOFFER messages.
4. The servers receive the DHCPREQUEST broadcast from the client.
Those servers not selected by the DHCPREQUEST message use the
message as notification that the client has declined that server's
offer. The server selected in the DHCPREQUEST message commits the
binding for the client to persistent storage and responds with a
DHCPACK message containing the configuration parameters for the
requesting client. The combination of 'client identifier' or
'chaddr' and assigned network address constitute a unique
identifier for the client's lease and are used by both the client
and server to identify a lease referred to in any DHCP messages.
Any configuration parameters in the DHCPACK message SHOULD NOT
conflict with those in the earlier DHCPOFFER message to which the
client is responding. The server SHOULD NOT check the offered
network address at this point. The 'yiaddr' field in the DHCPACK
messages is filled in with the selected network address.
If the selected server is unable to satisfy the DHCPREQUEST message
(e.g., the requested network address has been allocated), the
server SHOULD respond with a DHCPNAK message.
A server MAY choose to mark addresses offered to clients in
DHCPOFFER messages as unavailable. The server SHOULD mark an
address offered to a client in a DHCPOFFER message as available if
the server receives no DHCPREQUEST message from that client.
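Step 3 is what makes per-server filtering possible. The DHCPREQUEST names the selected server in the 'server identifier' option (code 54, RFC 2132), and RFC 2131 requires servers to include the same option in a DHCPNAK. The following minimal sketch extracts that option from a DHCP options field, which begins with the magic cookie 99.130.83.99. `find_server_id` is a hypothetical name, and option overloading (option 52) is not handled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OPT_PAD 0
#define OPT_SERVER_ID 54
#define OPT_END 255

/* Look for option 54 in a DHCP options field that starts at the magic
 * cookie. On success, store the address (host byte order) in *out and
 * return true. Option overloading (option 52) is not handled here. */
static bool find_server_id(const uint8_t *opts, size_t len, uint32_t *out)
{
    if (len < 4 || opts[0] != 99 || opts[1] != 130 ||
        opts[2] != 83 || opts[3] != 99)
        return false;                       /* missing magic cookie */

    size_t i = 4;
    while (i < len) {
        uint8_t code = opts[i];
        if (code == OPT_PAD) { i++; continue; }
        if (code == OPT_END) break;
        if (i + 1 >= len) break;            /* truncated length byte */
        size_t olen = opts[i + 1];
        if (i + 2 + olen > len) break;      /* truncated option body */
        if (code == OPT_SERVER_ID && olen == 4) {
            const uint8_t *v = &opts[i + 2];
            *out = ((uint32_t)v[0] << 24) | ((uint32_t)v[1] << 16) |
                   ((uint32_t)v[2] << 8) | (uint32_t)v[3];
            return true;
        }
        i += 2 + olen;                      /* skip code, length, body */
    }
    return false;
}
```

A client following the filtering idea discussed in this bug would compare the value returned for a received NAK with the server identifier it put in its own DHCPREQUEST.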
Would someone be able to provide a scratch build for RHEL8 PPC64LE so that we can test the patch Beniamino has created? https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commits/bg/dhcp-nak/ I gave up on building it myself because libndp-devel is not available for RHEL8 on PPC64LE.

> Would it make sense to only respect NAKs from the server we have sent the DHCPOFFER to and ignore the others?

Yes, on second thought, that seems the best solution to me. I updated the patch and opened a merge request: https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1144

I can confirm that the second patch https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1144 solves our issue.

NMCI test added: https://gitlab.freedesktop.org/NetworkManager/NetworkManager-ci/-/merge_requests/997

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:1985