Bug 2007563

Summary: NM failed to bring up bonding network in initrd when using the same .nmconnection files in the real root system
Product: Red Hat Enterprise Linux 9 Reporter: Coiby <coxu>
Component: NetworkManager Assignee: NetworkManager Development Team <nm-team>
Status: NEW --- QA Contact: Desktop QE <desktop-qa-list>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 9.0 CC: bgalvani, ferferna, fge, lrintel, rkhan, sukulkar, thaller, till
Target Milestone: rc Keywords: Triaged
Target Release: --- Flags: fge: needinfo? (bgalvani)
coxu: needinfo? (thaller)
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Attachments:
Description Flags
nm trace log when failed to bring up bonding network none

Description Coiby 2021-09-24 09:26:47 UTC
Created attachment 1825874 [details]
nm trace log when failed to bring up bonding network

Description of problem:


In the real root file system, there is an active network interface. I created a bonding network using this network interface as the slave, copied the three .nmconnection files to the initramfs, and triggered sysrq to boot into the kdump kernel. In the initrd, the bonding network failed to come up; the NM trace log is attached. In the normal kernel, the bonding network comes up successfully after every reboot.


Version-Release number of selected component (if applicable):


How reproducible:

always

Steps to Reproduce:
1. There is an active network interface specified in e.g. /etc/NetworkManager/system-connections/eno1.nmconnection,
```
[connection]
id=eno1
uuid=2a180551-3c17-4cee-b184-787a9069fc29
type=ethernet
interface-name=eno1
permissions=

[ethernet]
mac-address-blacklist=

[ipv4]
dns-search=
method=auto

[ipv6]
addr-gen-mode=eui64
dns-search=
method=auto

[proxy]

```

2. Create a bonding network over this network interface,
```
nmcli con add type bond ifname mybond0
nmcli con add type ethernet ifname eno1 master mybond0
nmcli c up bond-slave-eno1
```

3. Now NM created two new files,
```
[root@hpe-dl320egen8-02 system-connections]# cat bond-mybond0.nmconnection 
[connection]
id=bond-mybond0
uuid=1e489223-9a7c-4f30-859a-1586dc3142c8
type=bond
interface-name=mybond0
permissions=

[bond]
mode=balance-rr

[ipv4]
dns-search=
method=auto

[ipv6]
addr-gen-mode=stable-privacy
dns-search=
method=auto

[proxy]
[root@hpe-dl320egen8-02 system-connections]# cat bond-slave-eno1.nmconnection 
[connection]
id=bond-slave-eno1
uuid=3d0472d6-ccfa-438a-990f-828eee9528fe
type=ethernet
interface-name=eno1
master=mybond0
permissions=
slave-type=bond

[ethernet]
mac-address-blacklist=

```
4. Make kexec-tools copy the three .nmconnection files to the initramfs and trigger sysrq to boot into the kdump kernel


Actual results:

NM failed to bring up the bonding network.

Expected results:

NM should bring up the bonding network in the initrd successfully, as it does in the normal kernel.

Additional info:

1. If I didn't copy the original .nmconnection e.g. /etc/NetworkManager/system-connections/eno1.nmconnection to initrd, the bonding network could be brought up successfully.

2. I first hit this problem when trying to set up a bonding network on a z/VM s390x machine. The z/VM s390x machine uses znet and depends on /usr/lib/udev/ccw_init to activate the network interface. ccw_init in turn depends on ifcfg-enc8000 or enc8000.nmconnection to extract s390-subchannels, s390-nettype and s390-options, so I have to copy enc8000.nmconnection to the initrd as well. However, if I remove every line from enc8000.nmconnection that does not contain s390-subchannels, s390-nettype or s390-options, the bonding network can be brought up successfully.

3. nm-initrd-generator doesn't create a bond-slave-*.nmconnection like NM,

```
$ /usr/libexec/nm-initrd-generator -s -- rd.znet=qeth,0.0.8000,0.0.8001,0.0.8002,layer2=1,portno=0 ip=mybond0:dhcp ifname=enc8000:02:de:ad:be:ef:64 bond=mybond0:enc8000: nameserver=127.0.0.53 rd.neednet

*** Connection 'enc8000' ***

[connection]
id=enc8000
uuid=aefc0e62-29e5-4cbd-ae9a-388cea2da78f
type=ethernet
autoconnect-retries=1
interface-name=enc8000
master=b4e9d4b0-5fa4-4ccf-9973-94b73d0d6e1a
multi-connect=1
permissions=
slave-type=bond
wait-device-timeout=60000

[ethernet]
mac-address-blacklist=
s390-nettype=qeth
s390-subchannels=0.0.8000;0.0.8001;0.0.8002;

[ethernet-s390-options]
layer2=1
portno=0

[user]
org.freedesktop.NetworkManager.origin=nm-initrd-generator

*** Connection 'mybond0' ***

[connection]
id=mybond0
uuid=b4e9d4b0-5fa4-4ccf-9973-94b73d0d6e1a
type=bond
autoconnect-retries=1
interface-name=mybond0
multi-connect=1
permissions=

[ethernet]
mac-address-blacklist=

[bond]
mode=balance-rr

[ipv4]
dhcp-timeout=90
dns=127.0.0.53;
dns-search=
may-fail=false
method=auto

[ipv6]
addr-gen-mode=eui64
dhcp-timeout=90
dns-search=
method=auto

[proxy]

[user]
org.freedesktop.NetworkManager.origin=nm-initrd-generator

```

If I copy these two generated .nmconnection files to the initrd, the bonding network can be brought up successfully.

Comment 1 Coiby 2021-09-24 11:55:23 UTC
This bug also applies to bridging network.

Comment 2 Beniamino Galvani 2021-09-27 07:38:44 UTC
> 1. If I didn't copy the original .nmconnection e.g. /etc/NetworkManager/system-connections/eno1.nmconnection to initrd, the bonding network could be brought up successfully.

The problem is that there are two connection files for the same device eno1. One connection (the one copied from the real root) configures eno1 as a standalone interface; the other configures it as a port of 'mybond0'. These two configurations conflict, and NM chooses one (the standalone one), which is not the one you expect.

> 3. nm-initrd-generator doesn't create a bond-slave-*.nmconnection like NM,

Right, with `ip=mybond0:dhcp bond=mybond0:enc8000:` the generator should only create 2 connections: one for the bond and one for enc8000 (configured as port of the bond).

Comment 3 Beniamino Galvani 2021-09-27 14:04:57 UTC
(In reply to Beniamino Galvani from comment #2)
> > 1. If I didn't copy the original .nmconnection e.g. /etc/NetworkManager/system-connections/eno1.nmconnection to initrd, the bonding network could be brought up successfully.
> 
> The problem is that there are two connection files for the same device eno1.
> One connection (the one copied from real root) configures eno1 as standalone
> interface, the other configures it as port of 'mybond0'. These two
> configuration are conflicting and NM chooses one (standalone), which is not
> the one you expect.

In case this wasn't clear, you should copy only the 'bond-mybond0' and the 'bond-slave-eno1' connections to the initrd. If you also copy the 'eno1' connection from the real root, the device eno1 will not be put under the bond, and the bond will not be able to get an address via DHCP.
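
A rough way to spot this kind of conflict before building the initramfs is to look for an interface name that appears in more than one keyfile. The sketch below is only a heuristic under stated assumptions: the function name `find_conflicting_profiles` is made up for illustration, it only looks at `interface-name=` lines, and it does not take `autoconnect=false` or profiles that match a device by MAC address into account.

```shell
#!/bin/sh
# find_conflicting_profiles DIR  (hypothetical helper, not part of NM or kexec-tools)
# Prints "<interface-name> <count>" for every interface-name that is
# claimed by more than one *.nmconnection file in DIR.
find_conflicting_profiles() {
    dir=$1
    for f in "$dir"/*.nmconnection; do
        [ -f "$f" ] || continue
        # Emit the first interface-name= value found in this keyfile.
        sed -n 's/^interface-name=//p' "$f" | head -n 1
    done | sort | uniq -c | awk '$1 > 1 { print $2, $1 }'
}
```

Run against a directory holding the three files from the description, this reports `eno1 2`, because both eno1.nmconnection and bond-slave-eno1.nmconnection claim eno1. Once eno1.nmconnection is left out, nothing is reported.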

Comment 4 Coiby 2021-09-28 05:32:36 UTC
(In reply to Beniamino Galvani from comment #2)
> > 1. If I didn't copy the original .nmconnection e.g. /etc/NetworkManager/system-connections/eno1.nmconnection to initrd, the bonding network could be brought up successfully.
> 
> The problem is that there are two connection files for the same device eno1.
> One connection (the one copied from real root) configures eno1 as standalone
> interface, the other configures it as port of 'mybond0'. These two
> configuration are conflicting and NM chooses one (standalone), which is not
> the one you expect.

Although I still don't know what leads to the difference between the real root system and the initrd, I found that "nmcli connection modify --temporary ID connection.autoconnect false" stops NM from bringing up a specific connection and thus bypasses this issue. Would you recommend it? Btw, adjusting connection.autoconnect-priority could also make NM bring up the connection I want, but it doesn't work for znet network devices.

> 
> > 3. nm-initrd-generator doesn't create a bond-slave-*.nmconnection like NM,
> 
> Right, with `ip=mybond0:dhcp bond=mybond0:enc8000:` the generator should
> only create 2 connections: one for the bond and one for enc8000 (configured
> as port of the bond).

I know /usr/lib/udev/ccw_init needs to extract the values of SUBCHANNELS, NETTYPE and LAYER2 from /etc/sysconfig/network-scripts/ifcfg-* or /etc/NetworkManager/system-connections/*.nmconnection to activate a znet network device. But the info needed by ccw_init isn't contained in the connection profile bond-slave-enc8000.nmconnection that NM creates via "nmcli con add type ethernet ifname enc8000 master mybond0". This is why I need to copy enc8000.nmconnection to the initrd as well, which led me to find this bug.
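
To illustrate the gap, here is a minimal check of whether a given keyfile carries the znet keys at all. It is a sketch, not a description of what ccw_init really does: the helper name `has_znet_settings` is hypothetical, and the assumption that `s390-subchannels` and `s390-nettype` are the keys that matter is taken only from the keyfile output shown in this report.

```shell
#!/bin/sh
# has_znet_settings FILE  (hypothetical helper)
# Succeeds (exit status 0) if FILE contains both s390-subchannels= and
# s390-nettype= lines; fails otherwise.
has_znet_settings() {
    grep -q '^s390-subchannels=' "$1" && grep -q '^s390-nettype=' "$1"
}
```

With this check, the 'enc8000' profile printed by nm-initrd-generator passes, while a port profile that contains only [connection] and an empty [ethernet] section, like the bond-slave-eno1.nmconnection shown above, does not.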

Comment 5 Coiby 2021-09-28 05:37:46 UTC
(In reply to Beniamino Galvani from comment #3)
> (In reply to Beniamino Galvani from comment #2)
> > > 1. If I didn't copy the original .nmconnection e.g. /etc/NetworkManager/system-connections/eno1.nmconnection to initrd, the bonding network could be brought up successfully.
> > 
> > The problem is that there are two connection files for the same device eno1.
> > One connection (the one copied from real root) configures eno1 as standalone
> > interface, the other configures it as port of 'mybond0'. These two
> > configuration are conflicting and NM chooses one (standalone), which is not
> > the one you expect.
> 
> In case this wasn't clear, you should copy only the 'bond-mybond0' and the
> 'bond-slave-eno1' connections to the initrd. Otherwise, if you copy also the
> 'eno1' connection from real root, the device eno1 will not be put under the
> bond, and the bond will not be able to get an address via DHCP.

Thanks for the clarification. Is znet device the only exception as explained in Comment #4?

Comment 6 Thomas Haller 2021-10-14 15:17:31 UTC
> Although I still don't know what leads to the difference between the real root system and the initrd, I found that "nmcli connection modify --temporary ID connection.autoconnect false" stops NM from bringing up a specific connection and thus bypasses this issue. Would you recommend it? Btw, adjusting connection.autoconnect-priority could also make NM bring up the connection I want, but it doesn't work for znet network devices.

In a simple case, there is one device and one suitable profile to autoconnect, and what happens is clear.

If you have multiple profiles that are applicable at a time on the device (i.e. they are able to autoconnect because all the circumstances are right) then:

- if you configure different "connection.autoconnect-priority", then that determines which profile is chosen. Likewise, if you set `connection.autoconnect=false`, that of course disables autoconnect for the other profile, also resolving the tie.

- in case there are still multiple candidates, then NM first chooses the one with the more recent timestamp in /var/lib/NetworkManager/timestamps (which gets updated whenever you activate a new profile). But that timestamp information is not directly accessible or under your control. A human user can somewhat control that by explicitly activating the profile they want. But for a non-interactive tool, there is really nothing you can do. The solution is: don't configure conflicting/unsuitable things in NetworkManager if you want the right thing to happen automatically.
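
For a non-interactive tool, one way to apply the `connection.autoconnect=false` option from the first point is to edit the copy of the profile that goes into the initramfs, not the profile on the real root. The following is a minimal sketch under stated assumptions: the helper name `disable_autoconnect` is hypothetical, and it assumes a keyfile whose section header is written exactly as `[connection]`, with no surrounding whitespace.

```shell
#!/bin/sh
# disable_autoconnect FILE  (hypothetical helper)
# Rewrites FILE in place so that its [connection] section contains
# autoconnect=false. Keys such as autoconnect-retries= are left untouched.
disable_autoconnect() {
    file=$1
    tmp=$(mktemp) || return 1
    awk '
        /^\[/ { in_conn = ($0 == "[connection]") }   # track the current section
        in_conn && /^autoconnect=/ { next }          # drop any existing setting
        { print }
        $0 == "[connection]" { print "autoconnect=false" }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"       # cat keeps owner and mode of FILE
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

On a running system the equivalent is "nmcli connection modify ID connection.autoconnect no". The file-based variant is shown because the initramfs copy is the only place where the standalone profile needs to stay inactive.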


> I know /usr/lib/udev/ccw_init needs to extract the values of SUBCHANNELS, NETTYPE and LAYER2 from /etc/sysconfig/network-scripts/ifcfg-* or /etc/NetworkManager/system-connections/*.nmconnection to activate a znet network device. But the info needed by ccw_init isn't contained in the connection profile bond-slave-enc8000.nmconnection that NM creates via "nmcli con add type ethernet ifname enc8000 master mybond0". This is why I need to copy enc8000.nmconnection to the initrd as well, which led me to find this bug.


znet uses a udev rule which parses NetworkManager profiles to configure the interfaces. In my opinion that approach is wrong and a hack, in particular because the general idea of NetworkManager is that you create profiles and activate them (for the configuration to take effect), whereas udev rules only run once per interface. It's doubly odd that the udev rule parses NetworkManager configuration. If you wrote a udev rule and configured it in the rule itself, or via some script, that would be fine. But here the udev rule re-uses NetworkManager files for its own configuration. I don't know how to solve that. I suggest a hack :)

If you have a profile whose sole purpose is to configure the udev rule, then you probably don't want that one to autoconnect... well, it depends again on what purpose the user has for this profile, which a non-interactive tool can only guess.

Proper znet support in NetworkManager might be interesting, but it would be a large effort.