Bug 2059673 - NetworkManager DHCP client does not work maybe due to received NAKs
Summary: NetworkManager DHCP client does not work maybe due to received NAKs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: NetworkManager
Version: 8.2
Hardware: ppc64le
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Beniamino Galvani
QA Contact: Vladimir Benes
URL:
Whiteboard:
Depends On:
Blocks: 2065187 2065188 2065191
 
Reported: 2022-03-01 17:14 UTC by Nils Koenig
Modified: 2022-05-10 15:33 UTC (History)

Fixed In Version: NetworkManager-1.36.0-4.el8
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2065187 2065188 2065191
Environment:
Last Closed: 2022-05-10 14:55:01 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
Reproducer script (3.38 KB, text/x-python3), 2022-03-09 08:48 UTC, Beniamino Galvani


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | RHELPLAN-114145 | 0 | None | None | None | 2022-03-01 17:19:45 UTC
Red Hat Product Errata | RHEA-2022:1985 | 0 | None | None | None | 2022-05-10 14:55:37 UTC
freedesktop.org Gitlab | NetworkManager NetworkManager merge_requests 1144 | 0 | None | merged | n-dhcp4: discard NAKs from different servers in SELECTING | 2022-03-17 08:52:08 UTC

Comment 6 Guillaume Vincent 2022-03-03 16:05:27 UTC
Hello,

The dracut framework in the initramfs uses either the network-manager or the network-legacy module.
On RHEL 8.2, network-legacy is the default (network-manager is available). On RHEL 8.3, dracut uses network-manager by default.

So DHCP works correctly with network-legacy, which uses iproute2, dhclient, and arping to configure interfaces. RHEL 8.2 is OK.
But it does not work correctly on RHEL 8.3 and later.

I can't comment on the priority, but the bug is fairly recent.

Comment 12 Beniamino Galvani 2022-03-09 08:48:00 UTC
Created attachment 1864830 [details]
Reproducer script

Python script to simulate the scenario (NAKs received before ACK).

Comment 13 Beniamino Galvani 2022-03-09 09:00:13 UTC
I prepared a Python script (attached) to reproduce the scenario
and checked what happens with the internal client and with
dhclient.

The internal client fails to get a lease because it restarts the
transaction when the NAK is received after sending a REQUEST:

 # nmcli connection up veth0+
 Error: Connection activation failed: IP configuration could not be reserved (no available address, timeout, etc.)

 2022-03-09 09:54:36.060559 | <- DHCP Discover
 2022-03-09 09:54:36.084666 | -> DHCP Offer
 2022-03-09 09:54:36.092373 | <- DHCP Request
 2022-03-09 09:54:36.116479 | -> DHCP Nak
 2022-03-09 09:54:36.229454 | -> DHCP Ack

 2022-03-09 09:54:38.110549 | <- DHCP Discover
 2022-03-09 09:54:38.140550 | -> DHCP Offer
 2022-03-09 09:54:38.149837 | <- DHCP Request
 2022-03-09 09:54:38.174495 | -> DHCP Nak
 2022-03-09 09:54:38.291526 | -> DHCP Ack

 2022-03-09 09:54:42.171706 | <- DHCP Discover
 2022-03-09 09:54:42.196537 | -> DHCP Offer
 2022-03-09 09:54:42.201158 | <- DHCP Request
 2022-03-09 09:54:42.228571 | -> DHCP Nak
 2022-03-09 09:54:42.351619 | -> DHCP Ack

 [...]
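
The loop in the trace above can be reproduced with a toy state machine. This is only a minimal sketch, not the n-dhcp4 code: the state names follow Figure 5 of RFC 2131, while `StrictClient`, `on_message`, `run_attempts`, and the message strings are hypothetical names chosen for illustration.

```python
# Toy model of a client that restarts on any NAK while REQUESTING.
# All class and function names here are hypothetical.

class StrictClient:
    def __init__(self):
        self.state = "INIT"
        self.restarts = 0

    def start(self):
        # INIT -> SELECTING: the client broadcasts DHCPDISCOVER.
        self.state = "SELECTING"

    def on_message(self, msg_type):
        if self.state == "SELECTING" and msg_type == "OFFER":
            # Select the offer and send DHCPREQUEST.
            self.state = "REQUESTING"
        elif self.state == "REQUESTING" and msg_type == "NAK":
            # Any NAK discards the offer and restarts from INIT.
            self.restarts += 1
            self.state = "INIT"
        elif self.state == "REQUESTING" and msg_type == "ACK":
            self.state = "BOUND"
        # Anything else (e.g. the ACK arriving after the restart) is dropped.


def run_attempts(client, replies, attempts):
    """Replay the same server replies for a number of DISCOVER attempts."""
    for _ in range(attempts):
        if client.state == "BOUND":
            break
        client.start()
        for msg in replies:
            client.on_message(msg)
    return client.state


client = StrictClient()
final_state = run_attempts(client, ["OFFER", "NAK", "ACK"], attempts=3)
print(final_state, client.restarts)
```

Because the NAK always arrives before the ACK, the ACK is received when the toy client has already left REQUESTING, so it never binds.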

dhclient is able to obtain the lease because it ignores NAKs when in
state REQUESTING, and only accepts NAKs when REBOOTING (i.e. when
starting with a known lease).

 # nmcli connection up veth0+
 Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/5)

 2022-03-09 09:55:39.188576 | <- DHCP Discover
 2022-03-09 09:55:39.208457 | -> DHCP Offer
 2022-03-09 09:55:39.218127 | <- DHCP Request
 2022-03-09 09:55:39.247446 | -> DHCP Nak
 2022-03-09 09:55:39.363512 | -> DHCP Ack

I also checked the source of another popular dhcp client, dhcpcd,
which also restarts the transaction when a NAK is received.


According to RFC 2131, the result of a NAK in both REBOOTING and
REQUESTING is that the client should throw away the offer/lease and
restart the state machine from INIT (section 3.1.5, section 3.2.3,
state machine in Figure 5):

     If the client receives a DHCPNAK message, the client restarts the
     configuration process.

     If the client receives a DHCPNAK message, it cannot reuse its
     remembered network address.  It must instead request a new
     address by restarting the configuration process, this time
     using the (non-abbreviated) procedure described in section
     3.1.

Furthermore, I found no indication that the client should
perform any validation on the NAK packet (e.g. on the server-id)
except for the transaction-id.
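
To make the two fields concrete: the transaction-id (xid) sits in the fixed BOOTP header, while the server-id is carried as option 54, so checking it means walking the options. The following is a minimal sketch of reading both from a raw reply, using the field offsets and option codes from RFC 2131 and RFC 2132. It is not the n-dhcp4 parser; `parse_reply` and the hand-built packet are hypothetical.

```python
import struct

DHCP_MAGIC = b"\x63\x82\x53\x63"  # magic cookie that precedes the options
OPT_PAD = 0
OPT_MSG_TYPE = 53
OPT_SERVER_ID = 54
OPT_END = 255
DHCPNAK = 6


def parse_reply(packet):
    """Return (xid, message_type, server_id) from a raw BOOTP/DHCP payload."""
    # xid is the 4-byte field at offset 4 of the fixed 236-byte header.
    (xid,) = struct.unpack_from("!I", packet, 4)
    if packet[236:240] != DHCP_MAGIC:
        raise ValueError("not a DHCP packet")
    msg_type = None
    server_id = None
    i = 240
    while i < len(packet):
        code = packet[i]
        if code == OPT_END:
            break
        if code == OPT_PAD:
            i += 1
            continue
        if i + 1 >= len(packet):
            break  # truncated option, stop parsing
        length = packet[i + 1]
        value = packet[i + 2 : i + 2 + length]
        if code == OPT_MSG_TYPE and length == 1:
            msg_type = value[0]
        elif code == OPT_SERVER_ID and length == 4:
            server_id = ".".join(str(b) for b in value)
        i += 2 + length
    return xid, msg_type, server_id


# Hand-built NAK for illustration: zeroed header, xid 0x1234abcd,
# server-id 192.0.2.1 (a documentation address).
header = bytearray(236)
struct.pack_into("!I", header, 4, 0x1234ABCD)
options = bytes([OPT_MSG_TYPE, 1, DHCPNAK, OPT_SERVER_ID, 4, 192, 0, 2, 1, OPT_END])
nak = bytes(header) + DHCP_MAGIC + options
print(parse_reply(nak))
```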

In section 3.2 ("Client-server interaction - reusing a previously
allocated network address") the specification says:

      If the client's request is invalid (e.g., the client has moved
      to a new subnet), servers SHOULD respond with a DHCPNAK message to
      the client. Servers SHOULD NOT respond if their information is not
      guaranteed to be accurate.  For example, a server that identifies a
      request for an expired binding that is owned by another server SHOULD
      NOT respond with a DHCPNAK unless the servers are using an explicit
      mechanism to maintain coherency among the servers.

This paragraph is about rebooting (i.e. when the client starts with a
known address), so I'm not sure it also applies to a normal start (from
DHCP DISCOVER); however, the indication is clear that servers
should not send a NAK for leases belonging to other servers.


In conclusion, I think that the internal client currently behaves
according to the RFC. Nevertheless, we could deviate from the standard to
better deal with situations like the one reported here; the fact that
dhclient does that (and dhclient is a very common DHCP client)
probably guarantees that there are no side effects. I prepared a patch
to implement the new behavior here:

 https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commits/bg/dhcp-nak/

What do others think?

Comment 14 Nils Koenig 2022-03-09 20:59:38 UTC
@bgalvani Thank you very much for your comprehensive analysis and for providing a patch so quickly, much appreciated.

Looking at the state machine and reading 3.1.3 and 3.1.4, I can understand why NetworkManager is implemented so that
it returns to INIT as soon as any NAK is seen.
But in my opinion, why should we care about other servers' NAKs when we are in REQUESTING (we have sent a DHCPREQUEST to a specific server)?
Would it make sense to only respect NAKs from the server we have sent the DHCPREQUEST to and ignore the others?
Could it be that the RFC is a bit unspecific about which NAKs to respect in this particular case, or am I overlooking something here?

In the case that we have moved to a different network segment and try to REBIND/RENEW, I would expect to get a DHCPNAK from the server from which we received the IP earlier.
And I think we wouldn't change the behavior here if we only change what NAKs to respect/ignore in the REQUESTING state.
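
The rule proposed here can be sketched as a small predicate. This is an illustration of the idea only, not the patch under discussion: `should_honor_nak`, the state strings, and the 192.0.2.x addresses are hypothetical, and states other than REQUESTING are deliberately left with the existing behavior.

```python
def should_honor_nak(state, selected_server_id, nak_server_id):
    """Decide whether a DHCPNAK should restart the transaction."""
    if state == "REQUESTING":
        # Only the server named in our DHCPREQUEST may reject it.
        return nak_server_id == selected_server_id
    # Other states are unchanged in this sketch: any NAK is honored.
    return True


# Replay of the reported scenario: a NAK from another server arrives
# shortly before the ACK from the selected one.
selected = "192.0.2.1"
replies = [("NAK", "192.0.2.2"), ("ACK", "192.0.2.1")]

state = "REQUESTING"
for msg_type, server_id in replies:
    if state != "REQUESTING":
        break
    if msg_type == "NAK" and should_honor_nak(state, selected, server_id):
        state = "INIT"
    elif msg_type == "ACK" and server_id == selected:
        state = "BOUND"

print(state)
```

With the filter in place the foreign NAK is dropped and the ACK from the selected server is still processed in REQUESTING.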

I had a look at what dhclient does. If I read the code correctly, it just ignores the NAK if there is no active lease (line 2303ff):

	if (!client -> active) {
#if defined (DEBUG)
		log_info ("DHCPNAK with no active lease.\n");
#endif
		return;
	}

So there is no fancy check of whether the NAK is from the server we've sent the DHCPREQUEST to. I am surprised that it works when we have an active lease.
In the RENEWING/REBINDING state client->active should not be NULL, and the NAKs would cause a DHCPOFFER to be turned down.
Maybe it works because we go back to INIT and start over, but I am guessing here.

 --------                               -------
|        | +-------------------------->|       |<-------------------+
| INIT-  | |     +-------------------->| INIT  |                    |
| REBOOT |DHCPNAK/         +---------->|       |<---+               |
|        |Restart|         |            -------     |               |
 --------  |  DHCPNAK/     |               |                        |
    |      Discard offer   |      -/Send DHCPDISCOVER               |
-/Send DHCPREQUEST         |               |                        |
    |      |     |      DHCPACK            v        |               |
 -----------     |   (not accept.)/   -----------   |               |
|           |    |  Send DHCPDECLINE |           |                  |
| REBOOTING |    |         |         | SELECTING |<----+            |
|           |    |        /          |           |     |DHCPOFFER/  |
 -----------     |       /            -----------   |  |Collect     |
    |            |      /                  |   |       |  replies   |
DHCPACK/         |     /  +----------------+   +-------+            |
Record lease, set|    |   v   Select offer/                         |
timers T1, T2   ------------  send DHCPREQUEST      |               |
    |   +----->|            |             DHCPNAK, Lease expired/   |
    |   |      | REQUESTING |                  Halt network         |
    DHCPOFFER/ |            |                       |               |
    Discard     ------------                        |               |
    |   |        |        |                   -----------           |
    |   +--------+     DHCPACK/              |           |          |
    |              Record lease, set    -----| REBINDING |          |
    |                timers T1, T2     /     |           |          |
    |                     |        DHCPACK/   -----------           |
    |                     v     Record lease, set   ^               |
    +----------------> -------      /timers T1,T2   |               |
               +----->|       |<---+                |               |
               |      | BOUND |<---+                |               |
  DHCPOFFER, DHCPACK, |       |    |            T2 expires/   DHCPNAK/
   DHCPNAK/Discard     -------     |             Broadcast  Halt network
               |       | |         |            DHCPREQUEST         |
               +-------+ |        DHCPACK/          |               |
                    T1 expires/   Record lease, set |               |
                 Send DHCPREQUEST timers T1, T2     |               |
                 to leasing server |                |               |
                         |   ----------             |               |
                         |  |          |------------+               |
                         +->| RENEWING |                            |
                            |          |----------------------------+
                             ----------
          Figure 5:  State-transition diagram for DHCP clients




  3. The client receives one or more DHCPOFFER messages from one or more
     servers.  The client may choose to wait for multiple responses.
     The client chooses one server from which to request configuration
     parameters, based on the configuration parameters offered in the
     DHCPOFFER messages.  The client broadcasts a DHCPREQUEST message
     that MUST include the 'server identifier' option to indicate which
     server it has selected, and that MAY include other options
     specifying desired configuration values.  The 'requested IP
     address' option MUST be set to the value of 'yiaddr' in the
     DHCPOFFER message from the server.  This DHCPREQUEST message is
     broadcast and relayed through DHCP/BOOTP relay agents.  To help
     ensure that any BOOTP relay agents forward the DHCPREQUEST message
     to the same set of DHCP servers that received the original
     DHCPDISCOVER message, the DHCPREQUEST message MUST use the same
     value in the DHCP message header's 'secs' field and be sent to the
     same IP broadcast address as the original DHCPDISCOVER message.
     The client times out and retransmits the DHCPDISCOVER message if
     the client receives no DHCPOFFER messages.

  4. The servers receive the DHCPREQUEST broadcast from the client.
     Those servers not selected by the DHCPREQUEST message use the
     message as notification that the client has declined that server's
     offer.  The server selected in the DHCPREQUEST message commits the
     binding for the client to persistent storage and responds with a
     DHCPACK message containing the configuration parameters for the
     requesting client.  The combination of 'client identifier' or
     'chaddr' and assigned network address constitute a unique
     identifier for the client's lease and are used by both the client
     and server to identify a lease referred to in any DHCP messages.
     Any configuration parameters in the DHCPACK message SHOULD NOT
     conflict with those in the earlier DHCPOFFER message to which the
     client is responding.  The server SHOULD NOT check the offered
     network address at this point. The 'yiaddr' field in the DHCPACK
     messages is filled in with the selected network address.

     If the selected server is unable to satisfy the DHCPREQUEST message
     (e.g., the requested network address has been allocated), the
     server SHOULD respond with a DHCPNAK message.

     A server MAY choose to mark addresses offered to clients in
     DHCPOFFER messages as unavailable.  The server SHOULD mark an
     address offered to a client in a DHCPOFFER message as available if
     the server receives no DHCPREQUEST message from that client.
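
Step 3 above is what makes a server-id check possible at all: the DHCPREQUEST names the selected server. A minimal sketch of the options a client would put into that request, using the option codes from RFC 2132 (50 = requested IP address, 53 = message type, 54 = server identifier), could look like this. `build_request_options` and the addresses are hypothetical, and the fixed header is omitted.

```python
import socket

OPT_REQUESTED_IP = 50
OPT_MSG_TYPE = 53
OPT_SERVER_ID = 54
OPT_END = 255
DHCPREQUEST = 3


def build_request_options(offer_yiaddr, offer_server_id):
    """Encode the options of a DHCPREQUEST answering a given DHCPOFFER."""
    return (
        bytes([OPT_MSG_TYPE, 1, DHCPREQUEST])
        # 'requested IP address' must equal 'yiaddr' from the offer.
        + bytes([OPT_REQUESTED_IP, 4]) + socket.inet_aton(offer_yiaddr)
        # 'server identifier' tells all servers which one was selected.
        + bytes([OPT_SERVER_ID, 4]) + socket.inet_aton(offer_server_id)
        + bytes([OPT_END])
    )


opts = build_request_options("192.0.2.50", "192.0.2.1")
print(opts.hex())
```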

Comment 15 Nils Koenig 2022-03-10 15:41:05 UTC
Would someone be able to provide a scratch build for RHEL8 PPC64LE for us to test out the patch Beniamino has created?

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commits/bg/dhcp-nak/

I gave up because libndp-devel is not available for RHEL8 on PPC64LE.

Comment 17 Beniamino Galvani 2022-03-11 18:35:36 UTC
> Would it make sense to only respect NAKs from the server we have sent the DHCPREQUEST to and ignore the others?

Yes, on second thought, that seems the best solution to me.

I updated the patch and opened a merge request:

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1144

Comment 19 Nils Koenig 2022-03-15 16:41:05 UTC
I can confirm that the second patch 
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1144
solves our issue.

Comment 28 errata-xmlrpc 2022-05-10 14:55:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:1985

