2050216 – Device is failing on DHCPv4 after NM restart

Summary: Device is failing on DHCPv4 after NM restart

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 9
Classification:	Red Hat
Component:	NetworkManager
Sub Component:
Version:	9.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Fernando F. Mancera
QA Contact:	Vladimir Benes
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	2077605

Reported:	2022-02-03 13:09 UTC by Vladimir Benes
Modified:	2022-11-15 12:07 UTC
CC List:	9 users
Fixed In Version:	NetworkManager-1.39.3-1.el9
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-11-15 10:49:31 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:
Flags:	pm-rhel: mirror+

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	RHELPLAN-110997	None	None	None	2022-02-03 13:18:45 UTC
Red Hat Product Errata	RHBA-2022:8265	None	None	None	2022-11-15 10:49:51 UTC
freedesktop.org Gitlab	NetworkManager NetworkManager merge_requests 1196	None	opened	l3cfg: drop NM_L3_CFG_COMMIT_TYPE_ASSUME and assume_config_once	2022-04-26 14:21:23 UTC

Description Vladimir Benes 2022-02-03 13:09:57 UTC

Description of problem:
    @rhbz1086906
    @veth @delete_testeth0 @newveth @con_general_remove @teardown_testveth @restart_if_needed
    @wait-online-for-both-ips
    Scenario: NM - general - wait-online - for both ipv4 and ipv6
    * Prepare simulated test "testG" device
    * Add a new connection of type "ethernet" and options "ifname testG con-name con_general ipv4.may-fail no ipv6.may-fail no"
    * Restart NM
    * Execute "/usr/bin/nm-online -s -q --timeout=30"
    When "inet .* global" is visible with command "ip a s testG"
    Then "inet6 .* global" is visible with command "ip a s testG"

This fails here and there when IPv4 is slower than IPv6 and the profile is not fully connected and the device is not fully activated. We need to define what to do here. The test was updated with waiting for the connected state so please remove the line if you want to reproduce it. We need to add another test with a delayed DHCP server anyway.
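
The last two steps of the scenario only grep the output of "ip a s testG" for a global address of each family. A minimal Python sketch of that check follows, for illustration only. The two regular expressions are copied from the scenario; the function name and the sample output are made up for this example and are not part of NMCI. Note that "inet " (with the trailing space) does not match lines starting with "inet6", so the IPv4 step fails while only the IPv6 address is present.

```python
import re
from typing import Tuple

# Patterns copied verbatim from the scenario steps.
IPV4_GLOBAL = re.compile(r"inet .* global")
IPV6_GLOBAL = re.compile(r"inet6 .* global")


def global_addresses_present(ip_output: str) -> Tuple[bool, bool]:
    """Return (has_ipv4, has_ipv6) for text shaped like `ip a s <dev>` output."""
    lines = ip_output.splitlines()
    has_v4 = any(IPV4_GLOBAL.search(line) for line in lines)
    has_v6 = any(IPV6_GLOBAL.search(line) for line in lines)
    return has_v4, has_v6


# Hypothetical output for the racy case: IPv6 is configured, DHCPv4 is not done yet.
SAMPLE_V6_ONLY = """\
5: testG@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP
    link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
    inet6 2001:db8::1/64 scope global dynamic noprefixroute
    inet6 fe80::211:22ff:fe33:4455/64 scope link
"""

if __name__ == "__main__":
    print(global_addresses_present(SAMPLE_V6_ONLY))
```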

Version-Release number of selected component (if applicable):
1.36.0

How reproducible:
when DHCP is slow

Steps to Reproduce:
1. run the above-mentioned test from NMCI

Actual results:
the test is racy

Expected results:
determinism

Additional info:

Comment 2 Thomas Haller 2022-02-03 13:34:49 UTC

> This fails here and there when IPv4 is slower than IPv6 and the profile is not fully connected and the device is not fully activated.

Not really. The two steps

      * Add a new connection of type "ethernet" and options "ifname testG con-name con_general ipv4.may-fail no ipv6.may-fail no"
      * Restart NM

can happen in quick succession, so that the new profile has not yet completed (auto)activation and is still activating.

Then, when NM is restarted, it "assumes" the device, which was not fully configured earlier.


> We need to define what to do here.

I guess the solution is that during stop, we tear down interfaces that are still activating.


> The test was updated with waiting for the connected state so please remove the line if you want to reproduce it.

This: https://gitlab.freedesktop.org/NetworkManager/NetworkManager-ci/-/commit/07beacb8b540a134a9732b4d8beac522d7c57a5c
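
For illustration, the test-side workaround (wait for the connected state before restarting NM) could be sketched in Python roughly as below. This is not the code from the linked commit. The helper names are hypothetical, and the sketch assumes that "nmcli -g GENERAL.STATE device show <ifname>" prints a value beginning with the numeric state, where 100 means activated.

```python
import subprocess
import time
from typing import Callable


def device_state(ifname: str) -> str:
    """Return GENERAL.STATE as printed by nmcli, e.g. "100 (connected)" (assumed format)."""
    result = subprocess.run(
        ["nmcli", "-g", "GENERAL.STATE", "device", "show", ifname],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_until_connected(
    get_state: Callable[[], str],
    timeout: float = 30.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_state() until it reports state 100 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if get_state().startswith("100"):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Only restart NM once the device has finished activating.
    if wait_until_connected(lambda: device_state("testG")):
        subprocess.run(["systemctl", "restart", "NetworkManager"], check=True)
```

Passing the state getter and the sleep function as parameters keeps the polling loop testable without a running NetworkManager.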

Comment 6 Vladimir Benes 2022-05-03 14:20:16 UTC

    @rhbz1086906
    @delete_testeth0 @restart_if_needed
    @wait-online-for-both-ips
    Scenario: NM - general - wait-online - for both ipv4 and ipv6
    * Prepare simulated test "testG" device
    * Add "ethernet" connection named "con_general" for device "testG" with options "ipv4.may-fail no ipv6.may-fail no"
    * Restart NM
    * Execute "/usr/bin/nm-online -s -q --timeout=30"
    When "inet .* global" is visible with command "ip a s testG"
    Then "inet6 .* global" is visible with command "ip a s testG"

Tested 100 times without any issue, using the main-branch copr package and the original test as shown above.

Comment 9 Vladimir Benes 2022-05-19 07:29:59 UTC

We have a new test in NMCI
https://gitlab.freedesktop.org/NetworkManager/NetworkManager-ci/-/merge_requests/1050

It randomly slows down the DHCP server and does a service restart. This covers both situations: DHCPv4 already done at the time of the restart, and DHCPv4 not yet done.
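
A rough Python sketch of why a random delay exercises both cases follows. It is not taken from the NMCI merge request: the function name, the case labels, and the timing parameters are all hypothetical, and the sketch assumes the restart happens at a fixed offset after the DHCP request is sent.

```python
import random
from typing import List, Tuple


def plan_runs(runs: int, max_delay: float, restart_at: float, seed: int) -> List[Tuple[float, str]]:
    """Draw a random DHCP reply delay per run and label which case it exercises.

    restart_at is the (assumed) time in seconds after which NM is restarted.
    """
    rng = random.Random(seed)  # seeded so a failing run can be replayed
    plan = []
    for _ in range(runs):
        delay = rng.uniform(0.0, max_delay)
        if delay < restart_at:
            case = "dhcp-done-before-restart"
        else:
            case = "dhcp-pending-at-restart"
        plan.append((delay, case))
    return plan


if __name__ == "__main__":
    for delay, case in plan_runs(runs=5, max_delay=10.0, restart_at=5.0, seed=1):
        print(f"{delay:.2f}s {case}")
```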

Comment 11 errata-xmlrpc 2022-11-15 10:49:31 UTC

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8265
