Bug 2051588

Summary: Unavailable network after re-loading kernel modules from driverdisk
Product: Red Hat Enterprise Linux 9 Reporter: Jan Stodola <jstodola>
Component: dracutAssignee: dracut-maint-list
Status: CLOSED WONTFIX QA Contact: qe-baseos-daemons
Severity: high Docs Contact:
Priority: unspecified    
Version: 9.0CC: jikortus, jkonecny, lrintel
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-08-07 07:28:19 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1942219, 2027125    

Description Jan Stodola 2022-02-07 14:59:08 UTC
Description of problem:
The installer fails to mount nfs server with the installation tree after (re)loading kernel modules from driverdisk.

Based on console log, the installer downloads the kickstart file followed by the driverdisk image specified in the kickstart file.
Then the updated kernel module e1000e from the driverdisk is re-loaded.
The installer tries to mount the installation nfs server, but since the network device has not been re-activated by Network-Manager yet, the mount attempt fails and the installation is aborted:

...
dracut initqueue hook   
...  
[   14.029445] dracut-initqueue[1249]:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current  
[   14.029560] dracut-initqueue[1249]:                                  Dload  Upload   Total   Spent    Left  Speed  
[   14.039445] dracut-initqueue[1249]:    0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0 100 27180  100 27180    0     0  2654k      0 --:--:-- --:--:-- --:--:-- 2654k  
[   14.907721] dracut-initqueue[1310]: grep: /tmp/dd_net.done: No such file or directory  
[   14.916039] dracut-initqueue[1317]:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current  
[   14.916155] dracut-initqueue[1317]:                                  Dload  Upload   Total   Spent    Left  Speed  
[   14.922958] dracut-initqueue[1317]:    0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0 100  524k  100  524k    0     0  73.1M      0 --:--:-- --:--:-- --:--:-- 73.1M  
[   14.996639] dracut-initqueue[1320]: DD: Examining /tmp/curl_f[   16.391205] loop: module loaded 
etch_url0/dd.iso[   16.396323] loop0: detected capacity change from 0 to 1048 
  
[   15.019093] dracut-initqueue[1321]: mount: /media/DD-1: WARNING: source write-protected, mounted read-only.  
[   15.025853] dracut-initqueue[1320]: DD: Extracting: kmod-redhat-e1000e  
[   15.031783] dracut-initqueue[1320]: DD: Extracting: kmod-redhat-virtio_blk  
[   16.529016] e1000e 0000:0b:00.0 enp11s0: NIC Link is Down 
etry after Duplicate Address Detection failures (back off)  
<info>  [1643631968.9703] device (enp11s0): s[   16.628003] e1000e 0000:06:00.0 enp6s0: removed PHC 
tate change: unavailable -> unmanaged (reason 'removed', sys-iface-state: 'removed')  
<info>  [1643631968.9942] device (enp6s0): state change: activated -> unmanaged (reason 'removed', sys-iface-state: 'removed')  
<info>  [1643631968.9992] dhcp4 (enp6s0): canceled DHCP transaction  
<info>  [1643631968.9993] dhcp4 (enp6s0): activation: beginning transaction (timeout in 90 seconds)  
<info>  [1643631968.9993] dhcp4 (enp6s0): state changed no lease  
<info>  [1643631968.9998] manager: NetworkManager state is now DISCONNECTED  
[   19.179914] virtio_blk: loading out-of-tree module taints kernel. 
[   19.186748] virtio_blk: module verification failed: signature and/or required key missing - tainting kernel 
[   19.198261] DRIVERDISK: virtio_blk loaded 
[   19.205869] e1000e: DRIVERDISK: e1000e loaded 
[   19.210733] e1000e: Intel(R) PRO/1000 Network Driver 
[   19.216274] e1000e: Copyright(c) 1999 - 2015 Intel Corporation. 
[   19.223085] e1000e 0000:06:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode 
[   19.274429] e1000e 0000:06:00.0 0000:06:00.0 (uninitialized): registered PHC clock 
[   19.326905] e1000e 0000:06:00.0 eth0: (PCI Express:2.5GT/s:Width x1) 40:f2:e9:32:7e:36 
[   19.329000] e1000e 0000:06:00.0 enp6s0: renamed from eth0 
[   19.335746] e1000e 0000:06:00.0 enp6s0: Intel(R) PRO/1000 Network Connection 
[   19.335763] e1000e 0000:06:00.0 enp6s0: MAC: 3, PHY: 8, PBA No: FFFFFF-0FF 
<info>  [1643631[   19.357576] e1000e 0000:0b:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode 
971.6852] manager: (eth0): new Ethernet device (/org/freedesktop/NetworkManager/Devices/5)  
<info>  [1643631971.7285] device (eth0): interface index 5 renamed iface from 'eth0' to 'enp6s0'  
<info>  [1643631971.7336] device (enp6s0): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')  
[   19.409421] e1000e 0000:0b:00.0 0000:0b:00.0 (uninitialized): registered PHC clock 
[   19.437174] 8021q: adding VLAN 0 to HW filter on device enp6s0 
[   19.461887] e1000e 0000:0b:00.0 eth0: (PCI Express:2.5GT/s:Width x1) 40:f2:e9:32:7e:37 
[   19.462920] e1000e 0000:0b:00.0 enp11s0: renamed from eth0 
<info>  [1643631[   19.470728] e1000e 0000:0b:00.0 enp11s0: Intel(R) PRO/1000 Network Connection 
971.8199] manage[   19.470748] e1000e 0000:0b:00.0 enp11s0: MAC: 3, PHY: 8, PBA No: FFFFFF-0FF 
r: (eth0): new Ethernet device (/org/freedesktop/NetworkManager/Devices/6)  
<info>  [1643631971.8585] device (eth0): interface index 6 renamed iface from 'eth0' to 'enp11s0'  
<info>  [1643631971.8637] device (enp11s0): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')  
[   18.125897] dracut-initqueue[1394]: mount.nfs: Failed to resolve server XYZ.redhat.com: Name or service not known  
[   18.126211] dracut-initqueue[1081]: Warning: Couldn't mount nfs:nfsvers=3:XYZ.redhat.com:/<PATH_TO_COMPOSE>/  
[   18.130591] d[   19.565220] 8021q: adding VLAN 0 to HW filter on device enp11s0 
racut-initqueue[1081]: Warning: no suitable images  
[   21.946868] e1000e 0000:06:00.0 enp6s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None 
[   21.957027] IPv6: ADDRCONF(NETDEV_CHANGE): enp6s0: link becomes ready 
<info>  [1643631974.3220] device (enp6s0): carrier: link connected  
<info>  [1643631974.3223] device (enp6s0): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')  
<info>  [1643631974.3229] policy: auto-activating connection 'Wired Connection' (4d9ea41c-ab47-42fb-bd0e-03c21f460869)  
<info>  [1643631974.3233] device (enp6s0): Activation: starting connection 'Wired Connection' (4d9ea41c-ab47-42fb-bd0e-03c21f460869)  
<info>  [1643631974.3233] device (enp6s0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')  
<info>  [1643631974.3235] manager: NetworkManager state is now CONNECTING  
<info>  [1643631974.3236] device (enp6s0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')  
<info>  [1643631974.3241] device (enp6s0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')  
<info>  [1643631974.3246] dhcp4 (enp6s0): activation: beginning transaction (timeout in 90 seconds)  
<info>  [1643631975.7221] policy: set 'Wired Connection' (enp6s0) as default for IPv6 routing and DNS  
<info>  [1643631976.3284] dhcp4 (enp6s0): state changed new lease, address=10.16.67.202  
<info>  [1643631976.3287] policy: set 'Wired Connection' (enp6s0) as default for IPv4 routing and DNS  
<info>  [1643631976.3307] device (enp6s0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')  
<info>  [1643631976.3320] device (enp6s0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')  
<info>  [1643631976.3322] device (enp6s0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')  
<info>  [1643631976.3324] manager: NetworkManager state is now CONNECTED_SITE  
<info>  [1643631976.3326] device (enp6s0): Activation: successful, device activated.  
<info>  [1643631976.3330] manager: NetworkManager state is now CONNECTED_GLOBAL  
...
dracut shell


Version-Release number of selected component (if applicable):
RHEL-9.0.0-20220117.0
anaconda-34.25.0.24-1.el9
dracut-055-10.git20210824.el9

How reproducible:
Sometimes

Steps to Reproduce:
1. Prepare a driverdisk with an updated kernel module for the network device
2. Start a kickstart installation with the installation source on network (nfs/http), specify the driverdisk on the kernel command line or in the kickstart file

Actual results:
The installation is aborted due to unavailable network when fetching stage2.

Expected results:
The installer is able to fetch stage2 from network after re-loading the kernel modules from the driverdisk image.

Comment 2 Jiri Konecny 2022-02-09 09:49:09 UTC
Hi Lukas,

I would like to get NM recommendation about this bug.

To summarize this issue:
- During boot a user will use Driver Update Disks to change the kernel driver for ethernet device (e1000 in this case)
- During the load of the driver we are loosing the ethernet device and NM is disconnected
- When new driver is loaded then NM will not take it back and we are not able to reach resources on the network


Do you have an idea how to solve this issue? I'm thinking if I shouldn't just restart the NM service after Driver Disk load every time. Another problem is that I can't for sure tell what are the network drivers or if the driver loaded doesn't have dependency on that. And I think that restarting the NM service could load again as with the first start so we will continue as before.

Could you please confirm my assumption or give me recommendation how to solve this?

Comment 6 Lubomir Rintel 2022-12-19 19:15:57 UTC
(In reply to Jiri Konecny from comment #2)
> Hi Lukas,
> 
> I would like to get NM recommendation about this bug.

I'm looking into this. I believe this can't be solved without changes to NM, but I still need to figure out what precisely is needed.

I think we need to stop the network-online.target and restart the online check again once things settle, so that the target would eventually be reached again.

Unfortunatelly NM's idea of when things are settled is essentially tailored toward the initial startup; there's no way of making sure NetworkManager has seen and configured all the devices it's supposed to after we re-add the module.

I'll update this bug after I'm able to crank out something useful.

Comment 7 Lubomir Rintel 2022-12-20 13:43:09 UTC
Uh, ok, I overthought the above a bit. Turns out the network root handling is really a best effort thing really and the whole idea is that if we try hard enough we'd eventually end up with network configured. And it appears in RHEL-9 we no longer try hard enough.

This really is a dracut bug and this should solve the problem:
https://github.com/dracutdevs/dracut/pull/2134

Comment 8 Jiri Konecny 2022-12-20 14:47:43 UTC
Thanks a lot Lubomir for helping us here! Switching to Dracut to get it solved correctly by the PR above.

Comment 10 RHEL Program Management 2023-08-07 07:28:19 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.