Bug 1959961 - [SNO][assisted-operator][nmstate] Bond Interface is down when booting from the discovery ISO
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: NetworkManager
Version: 8.5
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: beta
Target Release: 8.5
Assignee: NetworkManager Development Team
QA Contact: Vladimir Benes
URL:
Whiteboard: AI-Team-Core KNI-EDGE-4.8
Depends On:
Blocks: 1965337
 
Reported: 2021-05-12 17:45 UTC by nshidlin
Modified: 2021-11-10 06:59 UTC (History)
CC List: 25 users

Fixed In Version: NetworkManager-1.32.0-0.4.el8
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1965337 (view as bug list)
Environment:
Last Closed: 2021-11-09 19:30:32 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
nmconnection files (1.67 KB, text/plain)
2021-05-12 17:45 UTC, nshidlin
no flags Details
nmstateconfig (985 bytes, text/plain)
2021-05-12 17:46 UTC, nshidlin
no flags Details
cluster CRs (985 bytes, text/plain)
2021-05-12 17:48 UTC, nshidlin
no flags Details
Boot logs rhcos 48.84.202105062123-0 (101.28 KB, text/plain)
2021-05-18 05:14 UTC, nshidlin
no flags Details
Boot logs rhcos 47.83.202103251640-0 (99.54 KB, text/plain)
2021-05-19 07:22 UTC, nshidlin
no flags Details
Reproducer for QE (2.25 KB, application/x-shellscript)
2021-06-14 14:56 UTC, Beniamino Galvani
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2021:4361 0 None None None 2021-11-09 19:31:12 UTC

Description nshidlin 2021-05-12 17:45:17 UTC
Created attachment 1782500 [details]
nmconnection files

Versions:
OCP image: quay.io/openshift-release-dev/ocp-release:4.8.0-fc.3-x86_64
rhcos image: https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/4.8.0-fc.3/rhcos-4.8.0-fc.3-x86_64-live.x86_64.iso

Description of problem:
When configuring SNO to use a bonded interface via nmstateconfig, the bond interface is created but stays in a DOWN state. The slave interfaces receive IPs from DHCP.

Steps to Reproduce:
1. Create nmstateconfig to configure bonded interface (attached)
2. Create clusterdeployment, agentclusterinstall, infraenv and bmh (attached)
3. Wait for host to be booted by bmh

Actual results:
bond interface is down and the slave interfaces receive IPs from dhcp:

[core@sno-bond ~]$ ip a
2: enp0s4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000                                                                       
    link/ether 52:54:00:47:ed:62 brd ff:ff:ff:ff:ff:ff
    inet 192.168.123.97/24 brd 192.168.123.255 scope global dynamic noprefixroute enp0s4                                                                                    
       valid_lft 3416sec preferred_lft 3416sec
    inet6 fe80::5054:ff:fe47:ed62/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000                                                                       
    link/ether 52:54:00:f8:17:7d brd ff:ff:ff:ff:ff:ff
    inet 192.168.123.132/24 brd 192.168.123.255 scope global dynamic noprefixroute enp1s0                                                                                   
       valid_lft 3416sec preferred_lft 3416sec
    inet6 fe80::5054:ff:fef8:177d/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
4: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000                                                              
    link/ether d2:ad:14:b2:74:b0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.123.151/24 brd 192.168.123.255 scope global noprefixroute bond0
       valid_lft forever preferred_lft forever

Both slave interfaces are the same wired connection:
[core@sno-bond ~]$ nmcli con
NAME              UUID                                  TYPE      DEVICE 
Wired Connection  f45d33f2-325d-41b4-9fcc-d87bc333c417  ethernet  enp0s4 
Wired Connection  f45d33f2-325d-41b4-9fcc-d87bc333c417  ethernet  enp1s0 
bond0             2a1a44cc-3dc8-46c1-b495-5d3a2d3c002a  bond      bond0  
bond0             95484701-597e-4e02-a96f-b0e01c990f9d  ethernet  --     
enp0s4            d8803372-facf-42f1-9bb6-4af2c28e0666  ethernet  --       
enp1s0            608ae7c6-1ef9-47b3-9794-02fe0fbb4e7d  ethernet  --    

Bond interface has no slaves:
/sys/class/net/bond0/bonding/slaves is empty 


Expected results:
The 2 slaves should be bound into bond0 and bond0 should have the MAC address of the active slave


Additional info:
nmconnection files attached

Comment 1 nshidlin 2021-05-12 17:46:18 UTC
Created attachment 1782501 [details]
nmstateconfig

Comment 2 nshidlin 2021-05-12 17:48:33 UTC
Created attachment 1782502 [details]
cluster CRs

Comment 4 Fernando F. Mancera 2021-05-13 16:41:17 UTC
Hi! I am looking into this. As this is happening with 4.8 but not with 4.7.7 it seems it is a regression from Nmstate or NM. I am trying to reproduce this locally with the nmconnection files, thank you!

Comment 5 nshidlin 2021-05-13 16:46:17 UTC
(In reply to Fernando F. Mancera from comment #4)
> Hi! I am looking into this. As this is happening with 4.8 but not with 4.7.7
> it seems it is a regression from Nmstate or NM. I am trying to reproduce
> this locally with the nmconnection files, thank you!
assisted-service pins nmstate to 1.0.2: https://github.com/openshift/assisted-service/blob/master/Dockerfile.assisted-service#L55

Comment 6 Alexander Chuzhoy 2021-05-14 01:18:40 UTC
Reproduced on 48.84.202105112120-0

Comment 7 Fernando F. Mancera 2021-05-17 08:20:53 UTC
(In reply to nshidlin from comment #5)
> (In reply to Fernando F. Mancera from comment #4)
> > Hi! I am looking into this. As this is happening with 4.8 but not with 4.7.7
> > it seems it is a regression from Nmstate or NM. I am trying to reproduce
> > this locally with the nmconnection files, thank you!
> assited-service pins nmstate to 1.0.2
> https://github.com/openshift/assisted-service/blob/master/Dockerfile.
> assisted-service#L55

Hello!

I've been able to reproduce this issue using the nmconnection files and NetworkManager 1.30, but I've not been able to reproduce it with NetworkManager 1.26. Thomas, Beniamino, could you take a look at this? Thank you!

Comment 8 Beniamino Galvani 2021-05-17 09:42:31 UTC
The bond port connections created by nmstate are not activated because
there is already a connection ("Wired Connection") active on those
ethernet devices. The active connection was probably created by
dracut.

Do you have a log of the boot (if possible, captured with "rd.debug"
in the kernel command line and level=TRACE in the [logging] section
of /etc/NetworkManager/NetworkManager.conf)?
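[Editorial sketch, not from the bug: the debug settings requested above can be applied as below. The demo file path is illustrative so the snippet is safe to try anywhere; on the affected host the target is /etc/NetworkManager/NetworkManager.conf, and "rd.debug" additionally goes on the kernel command line.]

```shell
# Append the [logging] section comment 8 asks for.
# CONF defaults to a local demo file; point it at
# /etc/NetworkManager/NetworkManager.conf on a real host.
CONF="${CONF:-./NetworkManager.conf.demo}"
cat >> "$CONF" <<'EOF'
[logging]
level=TRACE
EOF
echo "TRACE logging enabled in $CONF"
```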

Comment 9 nshidlin 2021-05-18 05:14:50 UTC
Created attachment 1784311 [details]
Boot logs rhcos 48.84.202105062123-0

Comment 10 nshidlin 2021-05-19 07:22:30 UTC
Created attachment 1784684 [details]
Boot logs rhcos 47.83.202103251640-0

Comment 11 nshidlin 2021-05-19 07:24:35 UTC
@bgalvani (In reply to Beniamino Galvani from comment #8)
> The bond port connections created by nmstate are not activated because
> there is already a connection ("Wired Connection") active on those
> ethernet devices. The active connection was probably created by
> dracut.
> 
> Do you have a log of the boot (if possible, captured with "rd.debug"
> in the kernel command line and level=TRACE in the [logging] section
> of /etc/NetworkManager/NetworkManager.conf)?

I attached boot logs for both the 4.8 and 4.7 envs

Comment 12 Beniamino Galvani 2021-05-19 08:10:02 UTC
The difference I see between 4.7 and 4.8 log is that in the latter
there is a "Wired Connection" created from the "ip=dhcp,dhcp6" kernel
command line. This connection preempts the "enp0s4" and "enp1s0"
connections that are pre-deployed.

Having those conflicting connections in 4.8 looks like a configuration
error. I don't know why the "Wired Connection" is not present in 4.7;
that doesn't depend on NetworkManager.

I think this needs more investigation from the CoreOS team.

Comment 13 Luca BRUNO 2021-05-19 09:59:31 UTC
Noteworthy for the NM stack here: the 4.7 image contains RHEL 8.3 content while the 4.8 one contains RHEL 8.4 content.

Comparing the 4.7 and 4.8 logs, it looks like both were booting with the same `ip=dhcp,dhcp6` kargs:

```
# 4.7
[    5.861476] dracut-cmdline[453]: dracut-47.83.202103251640-0 dracut-049-95.git20200804.el8_3.4
[    5.862178] dracut-cmdline[453]: Using kernel command line parameters: ip=dhcp,dhcp6 rd.driver.pre=dm_multipath BOOT_IMAGE=/images/pxeboot/vmlinuz random.trust_cpu=on ignition.firstboot ignition.platform.id=metal coreos.live.rootfs_url=https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.7/4.7.7/rhcos-live-rootfs.x86_64.img

# 4.8
[    4.984415] dracut-cmdline[445]: dracut-48.84.202105062123-0 dracut-049-135.git20210121.el8
[    4.987850] dracut-cmdline[445]: Using kernel command line parameters: ip=dhcp,dhcp6 rd.driver.pre=dm_multipath BOOT_IMAGE=/images/pxeboot/vmlinuz random.trust_cpu=on ignition.firstboot ignition.platform.id=metal coreos.live.rootfs_url=http://assisted-service-assisted-installer.apps.ocp-edge-cluster-assisted-0.qe.lab.redhat.com/api/assisted-install/v1/boot-files?file_type=rootfs.img&openshift_version=4.8
```

So it could be that some behavior may have actually changed in either nm-initrd-generator or in NM running in the initrd.

Digging deeper, I see that:
 * in 4.7 logs, NM seems to immediately enslave the two interfaces to bond0 after activating the connections, skipping the default DHCP ("Wired Connection") for the enslaved interfaces.
 * in 4.8 logs, NM seems to first activate all connections, including the two additional default DHCP "Wired Connection". At that point, it does not proceed to enslave the two interfaces anymore.

Relevant snippets:
```
# 4.7 
[    9.683827] NetworkManager[888]: <info>  [1621408036.1282] policy: auto-activating connection 'enp0s4' (f53b244f-d62c-49b9-afb8-1543616e8240)
[    9.687255] NetworkManager[888]: <info>  [1621408036.1286] policy: auto-activating connection 'enp1s0' (e71b6646-9872-4beb-bf32-f70da6a5b385)
[    9.689757] NetworkManager[888]: <info>  [1621408036.1288] policy: auto-activating connection 'bond0' (cf666337-9b36-4b82-9970-8558d32b2dfa)
[    9.690187] bond0: (slave enp0s4): making interface the new active one
[    9.692116] NetworkManager[888]: <info>  [1621408036.1291] device (enp0s4): Activation: starting connection 'enp0s4' (f53b244f-d62c-49b9-afb8-1543616e8240)
[    9.693365] bond0: (slave enp0s4): Enslaving as an active interface with an up link
[    9.695434] NetworkManager[888]: <info>  [1621408036.1291] device (enp1s0): Activation: starting connection 'enp1s0' (e71b6646-9872-4beb-bf32-f70da6a5b385)
[    9.698883] NetworkManager[888]: <info>  [1621408036.1293] device (bond0): Activation: starting connection 'bond0' (cf666337-9b36-4b82-9970-8558d32b2dfa)
[    9.699295] bond0: (slave enp1s0): Enslaving as a backup interface with an up link

# 4.8
[    8.528237] NetworkManager[895]: <info>  [1621312978.4840] policy: auto-activating connection 'enp0s4' (2c518c81-b725-42dc-991e-b64b426cf5ac)
[    8.530436] NetworkManager[895]: <info>  [1621312978.4843] policy: auto-activating connection 'enp1s0' (43715223-16e0-4180-a6b7-e15fa4700de1)
[    8.532866] NetworkManager[895]: <info>  [1621312978.4845] policy: auto-activating connection 'bond0' (79a0e750-a98f-4140-9eee-25cc62d7ab23)
[    8.535505] NetworkManager[895]: <info>  [1621312978.4849] device (bond0): Activation: starting connection 'bond0' (79a0e750-a98f-4140-9eee-25cc62d7ab23)
[    8.538202] NetworkManager[895]: <info>  [1621312978.4850] policy: auto-activating connection 'Wired Connection' (7305acd5-80fa-4eb9-84da-d7278adaae41)
[    8.540525] NetworkManager[895]: <info>  [1621312978.4852] policy: auto-activating connection 'Wired Connection' (7305acd5-80fa-4eb9-84da-d7278adaae41)
[    8.542661] NetworkManager[895]: <info>  [1621312978.4854] device (enp0s4): Activation: starting connection 'enp0s4' (2c518c81-b725-42dc-991e-b64b426cf5ac)
[    8.545163] NetworkManager[895]: <info>  [1621312978.4855] device (enp1s0): Activation: starting connection 'enp1s0' (43715223-16e0-4180-a6b7-e15fa4700de1)
```

Comment 14 Beniamino Galvani 2021-05-19 13:24:32 UTC
> So it could be that some behavior may have actually changed in either nm-initrd-generator or in NM running in the initrd.

You are right. It wasn't clear from the logs at info level that
connection profiles were the same. I did a git bisect and this change
added in 1.29.10 is the culprit:

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/33b9fa3a3cafa2eddea3bd1774847ce9424f921d

It is fixed in git by:

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/e694f2cec1a0e7bc188776c8573e07a4d57851dc

Comment 15 Beniamino Galvani 2021-05-19 13:26:15 UTC
If I prepare a scratch build with the fix, would somebody be able to test it?

Comment 16 nshidlin 2021-05-19 13:37:29 UTC
(In reply to Beniamino Galvani from comment #15)
> If I prepare a scratch build with the fix, would somebody be able to test it?

Scratch build of NetworkManager?

Comment 17 Beniamino Galvani 2021-05-19 13:38:34 UTC
Yes.

Comment 18 nshidlin 2021-05-19 13:59:22 UTC
(In reply to Beniamino Galvani from comment #17)
> Yes.

Yes, but I will need some guidance on how to install it in rhcos

Comment 19 Dusty Mabe 2021-05-19 14:26:27 UTC
While there may be a legitimate bug in NM that Beniamino has fixed upstream, I think it's piggybacking on top of another (more fundamental) bug.

In your logs we see that dracut only sees network kernel arguments of `ip=dhcp,dhcp6`:

```
[    4.987850] dracut-cmdline[445]: Using kernel command line parameters: ip=dhcp,dhcp6 rd.driver.pre=dm_multipath BOOT_IMAGE=/images/pxeboot/vmlinuz random.trust_cpu=on ignition.firstboot ignition.platform.id=metal coreos.live.rootfs_url=http://assisted-service-assisted-installer.apps.ocp-edge-cluster-assisted-0.qe.lab.redhat.com/api/assisted-install/v1/boot-files?file_type=rootfs.img&openshift_version=4.8
```

But somehow `nm-initrd-generator` (or something else) created 4 nmconnection files:

```
[   20.671708] coreos-teardown-initramfs[1212]: info: propagating initramfs networking config to the real root
[   20.673687] coreos-teardown-initramfs[1212]: '/run/NetworkManager/system-connections/bond0.nmconnection' -> '/sysroot/etc/NetworkManager/system-connections/bond0.nmconnection'
[   20.675617] coreos-teardown-initramfs[1212]: '/run/NetworkManager/system-connections/default_connection.nmconnection' -> '/sysroot/etc/NetworkManager/system-connections/default_connection.nmconnection'
[   20.678389] coreos-teardown-initramfs[1212]: '/run/NetworkManager/system-connections/enp0s4.nmconnection' -> '/sysroot/etc/NetworkManager/system-connections/enp0s4.nmconnection'
[   20.680146] coreos-teardown-initramfs[1212]: '/run/NetworkManager/system-connections/enp1s0.nmconnection' -> '/sysroot/etc/NetworkManager/system-connections/enp1s0.nmconnection'
```

This bug mentions nmstate, but I don't understand how that comes into play here (I need to do some learning), especially this early in the boot.

Can someone tell me what all is the assisted installer doing related to network configuration?

Comment 20 nshidlin 2021-05-19 14:34:49 UTC
(In reply to Dusty Mabe from comment #19)
> While there may be a legitimate bug in NM that Beniamino has fixed upstream,
> I think it's piggybacking on top of another (more fundamental) bug.
> 
> In your logs we see that dracut only sees network kernel arguments of
> `ip=dhcp,dhcp6`:
> 
> ```
> [    4.987850] dracut-cmdline[445]: Using kernel command line parameters:
> ip=dhcp,dhcp6 rd.driver.pre=dm_multipath BOOT_IMAGE=/images/pxeboot/vmlinuz
> random.trust_cpu=on ignition.firstboot ignition.platform.id=metal
> coreos.live.rootfs_url=http://assisted-service-assisted-installer.apps.ocp-
> edge-cluster-assisted-0.qe.lab.redhat.com/api/assisted-install/v1/boot-
> files?file_type=rootfs.img&openshift_version=4.8
> ```
> 
> But somehow `nm-initrd-generator` (or something else) created 4 nmconnection
> files:
> 
> ```
> [   20.671708] coreos-teardown-initramfs[1212]: info: propagating initramfs
> networking config to the real root
> [   20.673687] coreos-teardown-initramfs[1212]:
> '/run/NetworkManager/system-connections/bond0.nmconnection' ->
> '/sysroot/etc/NetworkManager/system-connections/bond0.nmconnection'
> [   20.675617] coreos-teardown-initramfs[1212]:
> '/run/NetworkManager/system-connections/default_connection.nmconnection' ->
> '/sysroot/etc/NetworkManager/system-connections/default_connection.
> nmconnection'
> [   20.678389] coreos-teardown-initramfs[1212]:
> '/run/NetworkManager/system-connections/enp0s4.nmconnection' ->
> '/sysroot/etc/NetworkManager/system-connections/enp0s4.nmconnection'
> [   20.680146] coreos-teardown-initramfs[1212]:
> '/run/NetworkManager/system-connections/enp1s0.nmconnection' ->
> '/sysroot/etc/NetworkManager/system-connections/enp1s0.nmconnection'
> ```
> 
> This bug mentions nmstate, but I don't understand how that comes into play
> here (I need to do some learning), especially this early in the boot.
> 
> Can someone tell me what all is the assisted installer doing related to
> network configuration?

assisted-installer uses nmstate to generate the nmconnection files and then bakes them into the ISO

Comment 21 Dusty Mabe 2021-05-19 15:02:42 UTC
> assisted-installer uses nmstate to generate the nmconnection files and then bakes them into the ISO

Can you show me the set of steps they use (with command invocations) on how they do that? There might need to be some tweaks.

Comment 22 nshidlin 2021-05-19 15:07:58 UTC
(In reply to Dusty Mabe from comment #21)
> > assisted-installer uses nmstate to generate the nmconnection files and then bakes them into the ISO
> 
> Can you show me the set of steps they use (with command invocations) on how
> they do that? There might need to be some tweaks.

@yshnaidm might be a better source for that

Comment 24 yevgeny shnaidman 2021-05-19 17:16:07 UTC
We are using the 'nmstatectl gc' command to produce nmconnection files. This is done in the preparation step, before nodes are booted. The produced nmconnection files are then updated: the correct interface names are set and autoconnect-priority=1 is added to each nmconnection file. After that, those files are set into the ignition of the ISO, so that they are present under /etc/NetworkManager/system-connections before NM is brought up. @nshidlin maybe you can also provide the nmstate yaml that you have used for configuring the static network?
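[Editorial sketch of the post-processing step described above. The profile below stands in for 'nmstatectl gc' output (the tool is not run here); the file name, interface name, and bond mode are illustrative, not the actual assisted-service values.]

```shell
# Stand-in for a profile generated by 'nmstatectl gc' (hypothetical content).
cat > bond0.nmconnection <<'EOF'
[connection]
id=bond0
type=bond

[bond]
mode=active-backup
EOF
# Post-process as described: pin the concrete interface name and raise the
# autoconnect priority so this profile wins over a generated default one.
sed -i '/^\[connection\]$/a autoconnect-priority=1' bond0.nmconnection
sed -i '/^\[connection\]$/a interface-name=bond0' bond0.nmconnection
```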

Comment 25 nshidlin 2021-05-19 17:19:56 UTC
(In reply to yevgeny shnaidman from comment #24)
> we are using 'nmstatectl gc' command to produce nmconnections files. This is
> done in the preparation step, before nodes are booted. The produced
> nmconnection files are then updated: the correct interfaces names are set in
> and autoconnect-priority=1 is added to each nmconnection file. After that
> those files are set into the ignition of the ISO, so that they are present
> under /etc/NetworkManager/system-connections before NM is brought up.
> @nshidlin maybe you can also provide the nmstate yaml that you
> have used for configuring the static network?

Already attached

Comment 26 Dusty Mabe 2021-05-19 18:23:36 UTC
(In reply to yevgeny shnaidman from comment #24)

<snip> 

> After that
> those files are set into the ignition of the ISO, so that they are present
> under /etc/NetworkManager/system-connections before NM is brought up.

Can you show me the code that does this part?

Comment 27 yevgeny shnaidman 2021-05-19 21:08:55 UTC
(In reply to Dusty Mabe from comment #26)
> (In reply to yevgeny shnaidman from comment #24)
> 
> <snip> 
> 
> > After that
> > those files are set into the ignition of the ISO, so that they are present
> > under /etc/NetworkManager/system-connections before NM is brought up.
> 
> Can you show me the code that does this part?

This is the code that sets the generated nmconnections into the ignition:
https://github.com/openshift/assisted-service/blob/a714f40bf720c30d8e491bdf54e847fb468c0654/internal/ignition/ignition.go#L1281

the ignition format itself is defined here:
https://github.com/openshift/assisted-service/blob/a714f40bf720c30d8e491bdf54e847fb468c0654/internal/ignition/ignition.go#L128

Comment 28 Dusty Mabe 2021-05-20 00:14:41 UTC
I don't think anything going on in Ignition is the problem. Can you tell me when this code runs? 

https://github.com/openshift/assisted-service/blob/a714f40bf720c30d8e491bdf54e847fb468c0654/internal/constants/scripts.go#L54
https://github.com/openshift/assisted-service/blob/a714f40bf720c30d8e491bdf54e847fb468c0654/internal/constants/scripts.go#L139-L156

If that's running in the initrd on bringup and copying files into `/run/NetworkManager/system-connections` then it's going to cause problems for the logic that we've already baked in.

Comment 29 yevgeny shnaidman 2021-05-20 11:48:38 UTC
(In reply to Dusty Mabe from comment #28)
> I don't think anything going on in Ignition is the problem. Can you tell me
> when this code runs? 
> 
> https://github.com/openshift/assisted-service/blob/
> a714f40bf720c30d8e491bdf54e847fb468c0654/internal/constants/scripts.go#L54
> https://github.com/openshift/assisted-service/blob/
> a714f40bf720c30d8e491bdf54e847fb468c0654/internal/constants/scripts.go#L139-
> L156
> 
> If that's running in the initrd on bringup and copying files into
> `/run/NetworkManager/system-connections` then it's going to cause problems
> for the logic that we've already baked in.

Yes, this is the script that runs in the initrd. Its purpose is to set the correct network interfaces for the current host and copy the nmconnection files to the NM runtime dir. We need it when running with the minimal ISO, since we need to configure the network in order to download the real rootfs.
How does it cause the problem? AFAIK this script is executed before NM.

Comment 30 Dusty Mabe 2021-05-20 15:54:53 UTC
(In reply to yevgeny shnaidman from comment #29)

> 
> yes, this is the scripts that runs in the initrd. it's purpose is to set the
> correct network interfaces for the current host and copy nmconnection files
> to the NM runtime dir. We need it in case of running with minimal ISO, since
> we need to configure the network in order to download the real rootfs.
> how does it causes the problem? AFAIK this scripts is executed before NM


It causes the problem because it doesn't consider the rest of what happens when
networking is requested in the initrd. `nm-initrd-generator` gets run by the
`network-manager` dracut module [1] during `dracut-cmdline.service`. It creates
the default.nmconnection file, which is the extra file that you don't want.

In the future maybe we (RHCOS) can better support your needs here by allowing you
to copy files in the initrd into a specific directory that we can then pick up
and shepherd the rest of the way into initrd and real root networking configuration.
I opened a ticket to discuss that: https://github.com/coreos/fedora-coreos-tracker/issues/841


For now, the assisted installer service will need to do something like we do for the
coreos-copy-firstboot-network.service. We explicitly run after `dracut-cmdline.service`
and before `dracut-initqueue.service`:

https://github.com/coreos/fedora-coreos-config/blob/08e72dfc25ac3b2d95a8b4fdf8e90ff1de383f71/overlay.d/05core/usr/lib/dracut/modules.d/35coreos-network/coreos-copy-firstboot-network.service#L35-L36

and we delete the files that were created by `nm-initrd-generator`: 

https://github.com/coreos/fedora-coreos-config/blob/08e72dfc25ac3b2d95a8b4fdf8e90ff1de383f71/overlay.d/05core/usr/lib/dracut/modules.d/35coreos-network/coreos-copy-firstboot-network.sh#L21-L23


The assisted installer will need to do the same.
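[Editorial sketch of the ordering comment 30 describes, modeled on coreos-copy-firstboot-network.service. The unit name, the /etc/assisted source path, and the demo output directory are hypothetical; the two ordering directives are the substance.]

```shell
# Write a dracut unit that runs after nm-initrd-generator has produced its
# profiles (dracut-cmdline.service) but before initrd networking consumes
# them (dracut-initqueue.service). UNITDIR defaults to a local demo dir.
UNITDIR="${UNITDIR:-./demo-units}"
mkdir -p "$UNITDIR"
cat > "$UNITDIR/assisted-copy-network.service" <<'EOF'
[Unit]
DefaultDependencies=no
# after the generator has written its default profiles...
After=dracut-cmdline.service
# ...but before the initrd brings networking up
Before=dracut-initqueue.service

[Service]
Type=oneshot
RemainAfterExit=yes
# drop the generated default profile, then install our own files
ExecStart=/bin/sh -c 'rm -f /run/NetworkManager/system-connections/default_connection.nmconnection && cp /etc/assisted/*.nmconnection /run/NetworkManager/system-connections/'
EOF
```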

Comment 32 yevgeny shnaidman 2021-05-20 16:01:45 UTC
(In reply to Dusty Mabe from comment #30)
> (In reply to yevgeny shnaidman from comment #29)
> 
> > 
> > yes, this is the scripts that runs in the initrd. it's purpose is to set the
> > correct network interfaces for the current host and copy nmconnection files
> > to the NM runtime dir. We need it in case of running with minimal ISO, since
> > we need to configure the network in order to download the real rootfs.
> > how does it causes the problem? AFAIK this scripts is executed before NM
> 
> 
> It causes the prolem because it doesn't consider the rest of what happens
> when
> networking is requested in the initrd. `nm-initrd-generator` gets run by the 
> `network-manager` dracut module [1] during `dracut-cmdline.service`. It
> creates
> the default.nmconnection file, which is the extra file that you don't want.
> 
> In the future maybe we (RHCOS) can better support your needs here by
> allowing you
> to copy files in the initrd into a specific directory that we can then pick
> up
> and shepherd the rest of the way into initrd and real root networking
> configuration.
> I opened a ticket to discuss that:
> https://github.com/coreos/fedora-coreos-tracker/issues/841
> 
> 
> For now, the assisted installer service will need to do something like we do
> for the
> coreos-copy-firstboot-network.service. We explicitly run after
> `dracut-cmdline.service`
> and before `dracut-initqueue.service`:
> 
> https://github.com/coreos/fedora-coreos-config/blob/
> 08e72dfc25ac3b2d95a8b4fdf8e90ff1de383f71/overlay.d/05core/usr/lib/dracut/
> modules.d/35coreos-network/coreos-copy-firstboot-network.service#L35-L36
> 
> and we delete the files that were created by `nm-initrd-generator`: 
> 
> https://github.com/coreos/fedora-coreos-config/blob/
> 08e72dfc25ac3b2d95a8b4fdf8e90ff1de383f71/overlay.d/05core/usr/lib/dracut/
> modules.d/35coreos-network/coreos-copy-firstboot-network.sh#L21-L23
> 
> 
> The assisted installer will need to do the same.

@dustymabe we are specifically setting the autoconnect-priority property so that NM will ignore the default.nmconnection file for the interfaces that we specify. Or has that implementation been changed?

Comment 33 Dusty Mabe 2021-05-20 16:26:58 UTC
(In reply to yevgeny shnaidman from comment #32)

> 
> @dustymabe we are especially setting autoconnect-priority
> property so that NM will ignore the default.nmconnection file for the
> interfaces that we specify. Or has that implementation been changed?

I think that might be what the NM fix Beniamino mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1959961#c14 is about.

Wouldn't it be more appropriate to not have the default.nmconnection at all?

Comment 34 yevgeny shnaidman 2021-05-20 17:48:54 UTC
(In reply to Dusty Mabe from comment #33)
> (In reply to yevgeny shnaidman from comment #32)
> 
> > 
> > @dustymabe we are especially setting autoconnect-priority
> > property so that NM will ignore the default.nmconnection file for the
> > interfaces that we specify. Or has that implementation been changed?
> 
> I think that might be what the NM fix is about the Beniamino mentioned in
> https://bugzilla.redhat.com/show_bug.cgi?id=1959961#c14
> 
> Wouldn't it would be more appropriate to not have the default.nmconnection
> at all?

Actually, since we have a proper mechanism to prioritize our connections over the default, it is easier to just set the priority than to worry about whether a default connection was created and schedule a script to delete it. It also allows us/the user to pick specific interfaces to handle and let the default configuration take care of the rest.

Comment 36 nshidlin 2021-05-25 05:14:25 UTC
@yevgensm @alazar see https://bugzilla.redhat.com/show_bug.cgi?id=1959961#c35

Comment 37 Ronnie Lazar 2021-05-25 13:18:17 UTC
@

Comment 38 Ronnie Lazar 2021-05-25 13:21:28 UTC
@dustymabe when can we expect this patch to be merged?
Is it possible to have this released as part of 8.4.0?

Comment 39 Dusty Mabe 2021-05-25 13:31:21 UTC
We'll have to find out that information from the NetworkManager team.

Beniamino, do you know?

Comment 48 Beniamino Galvani 2021-06-14 14:56:41 UTC
Created attachment 1791019 [details]
Reproducer for QE

Comment 52 errata-xmlrpc 2021-11-09 19:30:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: NetworkManager security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4361

