Bug 2007563 - NM failed to bring up bonding network in initrd when using the same .nmconnection files in the real root system
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: NetworkManager
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Assignee: NetworkManager Development Team
QA Contact: Desktop QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-09-24 09:26 UTC by Coiby
Modified: 2022-12-21 01:51 UTC
CC: 9 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-12-21 01:51:34 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
nm trace log when failed to bring up bonding network (240.11 KB, text/plain)
2021-09-24 09:26 UTC, Coiby


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-98141 2021-09-24 09:27:24 UTC

Description Coiby 2021-09-24 09:26:47 UTC
Created attachment 1825874 [details]
nm trace log when failed to bring up bonding network

Description of problem:


In the real root file system, there is an active network interface. I created a bonding network using this network interface as the slave, copied the three .nmconnection files to the initramfs, and triggered sysrq to boot into the kdump kernel. But in the initrd the bonding network failed to be brought up. Attached is the NM trace log. In the normal kernel, by contrast, the bonding network could be brought up successfully every time after rebooting.


Version-Release number of selected component (if applicable):


How reproducible:

always

Steps to Reproduce:
1. There is an active network interface specified in e.g. /etc/NetworkManager/system-connections/eno1.nmconnection,
```
[connection]
id=eno1
uuid=2a180551-3c17-4cee-b184-787a9069fc29
type=ethernet
interface-name=eno1
permissions=

[ethernet]
mac-address-blacklist=

[ipv4]
dns-search=
method=auto

[ipv6]
addr-gen-mode=eui64
dns-search=
method=auto

[proxy]

```

2. Create a bonding network over this network interface:
```
nmcli con add type bond ifname mybond0
nmcli con add type ethernet ifname eth0 master mybond0
nmcli c up bond-slave-eth0
``` 

3. Now NM has created two new files:
```
[root@hpe-dl320egen8-02 system-connections]# cat bond-mybond0.nmconnection 
[connection]
id=bond-mybond0
uuid=1e489223-9a7c-4f30-859a-1586dc3142c8
type=bond
interface-name=mybond0
permissions=

[bond]
mode=balance-rr

[ipv4]
dns-search=
method=auto

[ipv6]
addr-gen-mode=stable-privacy
dns-search=
method=auto

[proxy]
[root@hpe-dl320egen8-02 system-connections]# cat bond-slave-eno1.nmconnection 
[connection]
id=bond-slave-eno1
uuid=3d0472d6-ccfa-438a-990f-828eee9528fe
type=ethernet
interface-name=eno1
master=mybond0
permissions=
slave-type=bond

[ethernet]
mac-address-blacklist=

```
4. Make kexec-tools copy the three .nmconnection files to the initramfs and trigger sysrq.


Actual results:

NM failed to bring up the bonding network.

Expected results:

NM should bring up the bonding network in the initrd successfully, as it does in the normal kernel.

Additional info:

1. If I don't copy the original .nmconnection file, e.g. /etc/NetworkManager/system-connections/eno1.nmconnection, to the initrd, the bonding network can be brought up successfully.

2. I first hit this problem when trying to set up a bonding network on a z/VM s390x machine. The z/VM s390x machine uses znet and depends on /usr/lib/udev/ccw_init to activate the network interface. ccw_init in turn depends on ifcfg-enc8000 or enc8000.nmconnection to extract s390-subchannels, s390-nettype and s390-options, so I have to copy enc8000.nmconnection to the initrd as well. But if I simply remove the lines that don't contain s390-subchannels, s390-nettype or s390-options from enc8000.nmconnection, the bonding network can be brought up successfully.

3. nm-initrd-generator doesn't create a bond-slave-*.nmconnection file the way NM does:

```
$ /usr/libexec/nm-initrd-generator -s -- rd.znet=qeth,0.0.8000,0.0.8001,0.0.8002,layer2=1,portno=0 ip=mybond0:dhcp ifname=enc8000:02:de:ad:be:ef:64 bond=mybond0:enc8000: nameserver=127.0.0.53 rd.neednet

*** Connection 'enc8000' ***

[connection]
id=enc8000
uuid=aefc0e62-29e5-4cbd-ae9a-388cea2da78f
type=ethernet
autoconnect-retries=1
interface-name=enc8000
master=b4e9d4b0-5fa4-4ccf-9973-94b73d0d6e1a
multi-connect=1
permissions=
slave-type=bond
wait-device-timeout=60000

[ethernet]
mac-address-blacklist=
s390-nettype=qeth
s390-subchannels=0.0.8000;0.0.8001;0.0.8002;

[ethernet-s390-options]
layer2=1
portno=0

[user]
org.freedesktop.NetworkManager.origin=nm-initrd-generator

*** Connection 'mybond0' ***

[connection]
id=mybond0
uuid=b4e9d4b0-5fa4-4ccf-9973-94b73d0d6e1a
type=bond
autoconnect-retries=1
interface-name=mybond0
multi-connect=1
permissions=

[ethernet]
mac-address-blacklist=

[bond]
mode=balance-rr

[ipv4]
dhcp-timeout=90
dns=127.0.0.53;
dns-search=
may-fail=false
method=auto

[ipv6]
addr-gen-mode=eui64
dhcp-timeout=90
dns-search=
method=auto

[proxy]

[user]
org.freedesktop.NetworkManager.origin=nm-initrd-generator

```

If I copy the two .nmconnection files to the initrd, the bonding network can be brought up successfully.

Comment 1 Coiby 2021-09-24 11:55:23 UTC
This bug also applies to bridging networks.

Comment 2 Beniamino Galvani 2021-09-27 07:38:44 UTC
> 1. If I didn't copy the origial .nmconnection e.g. /etc/NetworkManager/system-connections/eno1.nmconnection to initrd, the bonding network could be brought up successfully.

The problem is that there are two connection files for the same device eno1. One connection (the one copied from the real root) configures eno1 as a standalone interface; the other configures it as a port of 'mybond0'. These two configurations conflict, and NM chooses one (standalone), which is not the one you expect.
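The conflict described here can be sketched in a few lines of Python (a hypothetical illustration, not NetworkManager code): parse each keyfile with `configparser` and flag any interface that is claimed by more than one profile, as eno1 is in this bug.

```python
import configparser

def profiles_by_interface(profiles):
    """profiles: dict of profile name -> keyfile text.
    Returns interfaces claimed by more than one profile."""
    claimed = {}
    for name, text in profiles.items():
        cp = configparser.ConfigParser()
        cp.read_string(text)
        ifname = cp.get("connection", "interface-name", fallback=None)
        if ifname:
            claimed.setdefault(ifname, []).append(name)
    # Keep only interfaces with conflicting (multiple) profiles.
    return {i: names for i, names in claimed.items() if len(names) > 1}

# Minimal stand-ins for the three keyfiles from the bug report.
eno1 = "[connection]\nid=eno1\ntype=ethernet\ninterface-name=eno1\n"
port = ("[connection]\nid=bond-slave-eno1\ntype=ethernet\n"
        "interface-name=eno1\nmaster=mybond0\nslave-type=bond\n")
bond = "[connection]\nid=bond-mybond0\ntype=bond\ninterface-name=mybond0\n"

conflicts = profiles_by_interface(
    {"eno1": eno1, "bond-slave-eno1": port, "bond-mybond0": bond})
print(conflicts)  # eno1 is claimed by two conflicting profiles
```

Copying only the bond and bond-port profiles to the initrd makes this map empty, which is the fix recommended in this comment.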

> 3. nm-initrd-generator doesn't create a bond-slave-*.nmconnection like NM,

Right, with `ip=mybond0:dhcp bond=mybond0:enc8000:` the generator should only create 2 connections: one for the bond and one for enc8000 (configured as port of the bond).

Comment 3 Beniamino Galvani 2021-09-27 14:04:57 UTC
(In reply to Beniamino Galvani from comment #2)
> > 1. If I didn't copy the origial .nmconnection e.g. /etc/NetworkManager/system-connections/eno1.nmconnection to initrd, the bonding network could be brought up successfully.
> 
> The problem is that there are two connection files for the same device eno1.
> One connection (the one copied from real root) configures eno1 as standalone
> interface, the other configures it as port of 'mybond0'. These two
> configuration are conflicting and NM chooses one (standalone), which is not
> the one you expect.

In case this wasn't clear, you should copy only the 'bond-mybond0' and the 'bond-slave-eno1' connections to the initrd. Otherwise, if you also copy the 'eno1' connection from the real root, the device eno1 will not be put under the bond, and the bond will not be able to get an address via DHCP.

Comment 4 Coiby 2021-09-28 05:32:36 UTC
(In reply to Beniamino Galvani from comment #2)
> > 1. If I didn't copy the origial .nmconnection e.g. /etc/NetworkManager/system-connections/eno1.nmconnection to initrd, the bonding network could be brought up successfully.
> 
> The problem is that there are two connection files for the same device eno1.
> One connection (the one copied from real root) configures eno1 as standalone
> interface, the other configures it as port of 'mybond0'. These two
> configuration are conflicting and NM chooses one (standalone), which is not
> the one you expect.

Although I still don't know what leads to the difference between the real root system and the initrd, I find that "nmcli connection modify --temporary ID connection.autoconnect false" makes NM not bring up a specific connection and thus bypasses this issue. Would you recommend it? Btw, adjusting connection.autoconnect-priority could also make NM bring up the connection I want, but it doesn't work for the znet network device.

> 
> > 3. nm-initrd-generator doesn't create a bond-slave-*.nmconnection like NM,
> 
> Right, with `ip=mybond0:dhcp bond=mybond0:enc8000:` the generator should
> only create 2 connections: one for the bond and one for enc8000 (configured
> as port of the bond).

I know /usr/lib/udev/ccw_init needs to extract the values of SUBCHANNELS, NETTYPE and LAYER2 from /etc/sysconfig/network-scripts/ifcfg-* or /etc/NetworkManager/system-connections/*.nmconnection to activate the znet network device. But the info needed by ccw_init isn't contained in the connection profile bond-slave-enc8000.nmconnection that NM creates via "nmcli con add type ethernet ifname enc8000 master mybond0". This is why I need to copy enc8000.nmconnection to the initrd as well, which led me to find this bug.
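For illustration, the kind of extraction described here can be approximated with Python's `configparser` (a sketch under assumptions; the real ccw_init is a udev helper and may parse the files differently). The keyfile below mirrors the enc8000 profile shown earlier in this bug.

```python
import configparser

# A trimmed stand-in for enc8000.nmconnection, based on the
# nm-initrd-generator output quoted in this bug report.
KEYFILE = """\
[connection]
id=enc8000
type=ethernet
interface-name=enc8000

[ethernet]
s390-nettype=qeth
s390-subchannels=0.0.8000;0.0.8001;0.0.8002;

[ethernet-s390-options]
layer2=1
portno=0
"""

cp = configparser.ConfigParser()
cp.read_string(KEYFILE)

nettype = cp.get("ethernet", "s390-nettype")
# Subchannels are a semicolon-separated list with a trailing ';'.
subchannels = [s for s in cp.get("ethernet", "s390-subchannels").split(";") if s]
options = dict(cp.items("ethernet-s390-options"))

print(nettype, subchannels, options)
```

This is exactly the information that is missing from the generated bond-slave-enc8000.nmconnection, which is why the standalone enc8000 profile had to be copied to the initrd too.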

Comment 5 Coiby 2021-09-28 05:37:46 UTC
(In reply to Beniamino Galvani from comment #3)
> (In reply to Beniamino Galvani from comment #2)
> > > 1. If I didn't copy the origial .nmconnection e.g. /etc/NetworkManager/system-connections/eno1.nmconnection to initrd, the bonding network could be brought up successfully.
> > 
> > The problem is that there are two connection files for the same device eno1.
> > One connection (the one copied from real root) configures eno1 as standalone
> > interface, the other configures it as port of 'mybond0'. These two
> > configuration are conflicting and NM chooses one (standalone), which is not
> > the one you expect.
> 
> In case this wasn't clear, you should copy only the 'bond-mybond0' and the
> 'bond-slave-eno1' connections to the initrd. Otherwise, if you copy also the

Thanks for the clarification. Is the znet device the only exception, as explained in Comment #4?

> 'eno1' connection from real root, the device eno1 will not be put under the
> bond, and the bond will not be able to get an address via DHCP.

Comment 6 Thomas Haller 2021-10-14 15:17:31 UTC
> Although I still don't know what leads to the difference between real root system and initrd, I find "nmcli connection modify --temporary ID connection.autoconnect false" would make NM to not bring up specific connnection thus bypass this issue. Will you recommend it? Btw, adjusting connection.autoconnection-priority could also make NM bring up the connection I want but it doesn't work for znet network device.

In a simple case, there is one device and one suitable profile to autoconnect, and what happens is clear.

If you have multiple profiles that are applicable at the same time on the device (i.e. they are able to autoconnect because all the circumstances are right), then:

- if you configure different "connection.autoconnect-priority", then that determines which profile is chosen. Likewise, if you set `connection.autoconnect=false`, that of course disables autoconnect for the other profile, also resolving the tie.

- in case there are still multiple candidates, NM first chooses the one with the more recent timestamp in /var/lib/NetworkManager/timestamps (which gets updated whenever you activate a profile). But that timestamp information is not directly accessible or under your control. A human user can somewhat control it by explicitly activating the profile they want. But for a non-interactive tool there is really no solution. The answer is: don't configure conflicting/unsuitable things in NetworkManager if you want the right thing to happen automatically.


> I know /usr/lib/udev/ccw_init need to extract the values of SUBCHANNELS, NETTYPE and LAYER2 from /etc/sysconfig/network-scripts/ifcfg-* or /etc/NetworkManager/system-connections/*.nmconnection to activate znet network device. But the info needed by ccw_init isn't contained in the connection profile bond-slave-enc8000.nmconnection like NM, created via "nmcli con add type ethernet ifname enc8000 master mybond0". This is why I need to copy enc8000.nmconnection to initrd as well which led me to find this bug.


znet uses a udev rule which parses NetworkManager profiles to configure the interfaces. That way of doing it is, in my opinion, wrong and a hack. In particular, the general idea of NetworkManager is that you create profiles and activate them (for the configuration to take effect), whereas the udev rules only run once per interface. It's doubly odd that the udev rule likes to parse NetworkManager configuration. If you wrote a udev rule (and put the configuration in the rule itself, or in some script), that would be fine. But here the udev rule reuses NetworkManager files for its own configuration. I don't know how to solve that. I suggest a hack :)

If you have a profile for the sole purpose of configuring the udev rule, then you probably don't want to configure that one to autoconnect. Well, it depends again on what purpose the user has for this profile, which a non-interactive tool can only guess.

Proper znet support from NetworkManager might be interesting. But a large effort.

Comment 9 Beniamino Galvani 2022-12-12 09:01:59 UTC
(In reply to Coiby from comment #7)
> Thanks for the explanation.  In some cases, I find NM doesn't choose the one with the more recent timestamp,

> Could any other factors influence NM's behaviour on choosing which profile to activate?

No, the order is determined by the autoconnect-priority value; in case of a tie the timestamp is compared. As a last resort, the UUID is used.
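The three-step tie-break described here can be sketched as a sort key (field names assumed for illustration, and the UUID comparison is a simplified deterministic stand-in; the real logic lives in NetworkManager's C code):

```python
# Pick the autoconnect winner among candidate profiles:
# higher autoconnect-priority first, then the more recent timestamp,
# then the UUID as a last-resort deterministic tie-breaker.
def pick_profile(candidates):
    return max(candidates,
               key=lambda p: (p["autoconnect-priority"],
                              p["timestamp"],
                              p["uuid"]))

# Two profiles with equal priority: the newer timestamp wins,
# matching the behaviour the reporter observed.
candidates = [
    {"id": "eno1", "autoconnect-priority": 0,
     "timestamp": 1632475200, "uuid": "2a180551"},
    {"id": "bond-slave-eno1", "autoconnect-priority": 0,
     "timestamp": 1632478800, "uuid": "3d0472d6"},
]
print(pick_profile(candidates)["id"])  # bond-slave-eno1 (newer timestamp)
```

Raising `autoconnect-priority` on the desired profile (or setting `connection.autoconnect=false` on the unwanted one, as discussed in comment 4) short-circuits the later tie-breakers.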


> 2. create a bonding network over this network interface, 
> 
> nmcli con add type bond ifname mybond0
> nmcli con add type ethernet ifname eth0 master mybond0
> nmcli c up bond-slave-eth0

> nm-initrd-generator doesn't create a bond-slave-*.nmconnection like NM,

>  If I copy the two .nmconnection files to initrd, the bonding network could be brought up successfully.

That's because you are not specifying the s390 parameters for the ethernet connection. You can use this command:

  nmcli con add type ethernet \
                ifname eth0 \
                master mybond0 \
                ethernet.s390-nettype qeth \
                ethernet.s390-subchannels "0.0.8000 0.0.8001 0.0.8002;" \
                ethernet.s390-options "layer2=1 portno=0"

to add the ethernet as a bond port and also set the s390 parameters. With this you will need only two connection profiles in the initrd.

Comment 10 Thomas Haller 2022-12-16 21:14:29 UTC
I think Beniamino answered already in comment 9.

Sorry for being so late to answer this request.

Is there anything else missing on this issue? It seems to me, there is no bug here.

Comment 11 Coiby 2022-12-21 01:50:51 UTC
(In reply to Beniamino Galvani from comment #9)
> (In reply to Coiby from comment #7)
> > Thanks for the explanation.  In some cases, I find NM doesn't choose the one with the more recent timestamp,
> 
> > Could any other factors influence NM's behaviour on choosing which profile to activate?
> 
> No, the order is determined by the autoconnect-priority value; in case of a
> tie the timestamp is compared. As a last resort, the UUID is used.

So there are three factors for NM to determine which profile to activate. Thanks for the clarification!

Comment 12 Coiby 2022-12-21 01:51:34 UTC
(In reply to Thomas Haller from comment #10)
> I think Beniamino answered already in comment 9.
> 
> sorry for being so late to answer this request.
> 
> Is there anything else missing on this issue? It seems to me, there is no
> bug here.

Yes, this is not a bug. Thanks!

