Bug 2012182

Summary: [NMCI] NetworkManager breaks connection to remote rootfs over vlan/bond/bridge created in dracut
Product: Red Hat Enterprise Linux 9 Reporter: Filip Pokryvka <fpokryvk>
Component: NetworkManager Assignee: Wen Liang <wenliang>
Status: CLOSED ERRATA QA Contact: Filip Pokryvka <fpokryvk>
Severity: unspecified Docs Contact:
Priority: high    
Version: 9.0 CC: bgalvani, fge, lrintel, rkhan, sukulkar, till, vbenes, wenliang
Target Milestone: rc Keywords: Triaged
Target Release: --- Flags: pm-rhel: mirror+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: NetworkManager-1.36.0-0.4.el9 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-05-17 15:48:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Filip Pokryvka 2021-10-08 14:23:27 UTC
Description of problem:
Our dracut test suite in NetworkManager-ci fails most of the time on RHEL9 (while it is stable on RHEL8), because when NetworkManager is started from root, it cleans up the virtual device and attempts to create a new one. However, when booting over the network (NFS, iSCSI), once the connection goes down, the rootfs is inaccessible and the connection is never restored.

Version-Release number of selected component (if applicable):
1.33.3-29316.copr.098a963e42.fc34

How reproducible:
very often (sometimes some of the tests pass)

Steps to Reproduce:
1. The following tests seem affected:
dracut_NM_vlan_over_bridge
dracut_NM_vlan_over_bond
dracut_NM_bridge_eth0
dracut_NM_bridge_custom_name_2_ifaces
dracut_NM_bond_over_2_ifaces_rr
dracut_NM_vlan_over_nic
dracut_NM_vlan_mutliple_over_nic

Actual results:
tests are unstable

Expected results:
tests should pass consistently, as on rhel8

Comment 1 Wen Liang 2021-12-14 02:30:46 UTC
After December 6th 2021 (build #731), all the dracut failures listed above were fixed, according to the build stats: https://tools.dqe.lab.eng.bos.redhat.com/vbenes/nm_ci_stats/stats.html#project:beaker-NetworkManager-main-veth-rhel9-upstream;build:;search:dracut

To find out which NM or NM-ci commit might have accidentally fixed the dracut test failures, I re-ran the dracut tests with different NM and NM-ci versions last Friday, and some random dracut test failures appeared again. I summarized all the test results here: https://docs.google.com/spreadsheets/d/1mXhpOKHmO1u7IzcH4TBWS6T5HZF7Gh_57fRvFjst_eg/edit?usp=sharing

Based on my investigation, the following dracut test failures appeared again:

```
tests.dracut_NM_vlan_over_nic
tests.dracut_NM_vlan_over_bond
tests.dracut_NM_bridge_eth0
tests.dracut_NM_bond_over_2_ifaces_rr
tests.dracut_NM_vlan_mutliple_over_nic
tests.dracut_NM_vlan_over_bridge
```

In the NM log, all of the above failures contained the same error (modprobe failed to apply the blacklist command from the configuration files to the Linux SCSI Generic driver, `sg`):

```
Dec 10 13:28:09 localhost.localdomain systemd-udevd[695]: 0:0:0:0: Process '/sbin/modprobe -bv sg' failed with exit code 1.
Dec 10 13:28:09 localhost.localdomain systemd[1]: Starting D-Bus System Message Bus...
Dec 10 13:28:09 localhost.localdomain systemd-udevd[699]: target1:0:0: Process '/sbin/modprobe -bv sg' failed with exit code 1.
Dec 10 13:28:09 localhost.localdomain systemd-udevd[692]: Using default interface naming scheme 'v249'.
Dec 10 13:28:09 localhost.localdomain systemd-udevd[697]: 1:0:0:0: Process '/sbin/modprobe -bv sg' failed with exit code 1.
```
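
To check other failing runs for the same error, the journal can be scanned for udev "Process ... failed" lines. The following is only a minimal sketch: the function name `find_failed_processes` and the regular expression are my own assumptions, written against the line format shown above, and are not part of NM-ci.

```python
import re

# Assumed line format, taken from the excerpt above:
# "<timestamp> <host> systemd-udevd[PID]: DEVICE: Process 'CMD' failed with exit code N."
FAILED_PROCESS = re.compile(
    r"systemd-udevd\[(?P<pid>\d+)\]: (?P<device>\S+): "
    r"Process '(?P<command>[^']+)' failed with exit code (?P<code>\d+)\."
)


def find_failed_processes(log_text):
    """Return one dict per udev 'Process ... failed' line found in log_text."""
    failures = []
    for line in log_text.splitlines():
        match = FAILED_PROCESS.search(line)
        if match:
            failures.append({
                "pid": int(match.group("pid")),
                "device": match.group("device"),
                "command": match.group("command"),
                "exit_code": int(match.group("code")),
            })
    return failures


# Sample input: the journal lines quoted in this comment.
SAMPLE_LOG = """\
Dec 10 13:28:09 localhost.localdomain systemd-udevd[695]: 0:0:0:0: Process '/sbin/modprobe -bv sg' failed with exit code 1.
Dec 10 13:28:09 localhost.localdomain systemd[1]: Starting D-Bus System Message Bus...
Dec 10 13:28:09 localhost.localdomain systemd-udevd[699]: target1:0:0: Process '/sbin/modprobe -bv sg' failed with exit code 1.
Dec 10 13:28:09 localhost.localdomain systemd-udevd[692]: Using default interface naming scheme 'v249'.
Dec 10 13:28:09 localhost.localdomain systemd-udevd[697]: 1:0:0:0: Process '/sbin/modprobe -bv sg' failed with exit code 1.
"""

failures = find_failed_processes(SAMPLE_LOG)
for failure in failures:
    print(failure["device"], failure["command"], failure["exit_code"])
```

On a real run, the text could come from a saved journal export instead of `SAMPLE_LOG`; lines that do not match the assumed format are simply skipped.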

Comment 3 Filip Pokryvka 2022-01-13 15:42:59 UTC
This seems to be fixed in the NetworkManager main branch. Preverifying; once the tests pass on the package, I will switch to verified.

Comment 7 Filip Pokryvka 2022-01-20 14:08:20 UTC
Tests seem stable now on RHEL9, verifying.

Comment 10 errata-xmlrpc 2022-05-17 15:48:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: NetworkManager), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:3915