Bug 2089707

Summary: RFE: nm-initrd-generator profiles to have lower than default autoconnect-priority and ability to clean state generated by NM when run by dracut
Product: Red Hat Enterprise Linux 8
Reporter: Jaime Caamaño Ruiz <jcaamano>
Component: NetworkManager
Assignee: Thomas Haller <thaller>
Status: CLOSED ERRATA
QA Contact: Filip Pokryvka <fpokryvk>
Severity: unspecified
Docs Contact: Mayur Patil <maypatil>
Priority: unspecified
Version: 8.6
CC: bgalvani, bnemec, dustymabe, lrintel, maypatil, pamoedom, rkhan, sfaye, sukulkar, thaller, till, vbenes, yprokule
Target Milestone: rc
Keywords: FutureFeature, Triaged
Target Release: ---
Flags: pm-rhel: mirror+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: NetworkManager-1.40.2-1.el8
Doc Type: Enhancement
Doc Text:
.The `nm-initrd-generator` profiles now have lower priority than autoconnect profiles
The `nm-initrd-generator` early boot NetworkManager configuration generator utility generates and configures connection profiles by using the NetworkManager instance running in the boot loader's initialized `initrd` RAM disk. The `nm-initrd-generator` utility generated profiles now have a lower autoconnect priority than the default connection autoconnect priority. This enables generated network profiles in `initrd` to coexist with user configuration in default root account.

[NOTE]
=====
After switching from `initrd` root account to default root, the same profile stays activated and no new autoconnect happens.
=====
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-05-16 09:04:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2130221    

Description Jaime Caamaño Ruiz 2022-05-24 09:49:47 UTC
This is a request for a NM enhancement so that nm-initrd-generator profiles are defined with an auto-connect priority lower than the default of 0. 

Currently other NM generated profiles have an auto-connect priority of -999.

This is for an OpenShift bare metal use case.

With OpenShift bare metal, nodes boot with the 'ip=dhcp' karg. It is generic because the node interfaces are completely unknown at the image build stage.

Then at a later stage, the user or other components are given the option to deploy and activate their own NM profiles, frequently to enable specific use cases like bonding. They usually use the default auto-connect priority of 0. This works just fine.

But this setup frequently leads to problematic scenarios when the network is operated administratively at a later runtime stage, where the nm-initrd-generator profiles might randomly take precedence over the other deployed profiles.

While it would be wise for those third party profiles to use a higher auto-connect priority, we think it would make sense as well for the nm-initrd-generator profiles to use a lower auto-connect priority than the default, even if at boot stage they are explicitly activated, in consideration of other administrative operations that might take place later at runtime.
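
To illustrate the request, here is a minimal Python sketch of how autoconnect priority could influence which profile is chosen for a disconnected device. It is a simplified model, not NetworkManager's code: the `Profile` class, the `pick_autoconnect_candidate` function, the profile names, the tie-break on the activation timestamp and the priority value of -100 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Profile:
    name: str
    autoconnect: bool = True
    autoconnect_priority: int = 0  # default priority of 0, as stated in this report
    timestamp: int = 0             # last successful activation (epoch seconds)


def pick_autoconnect_candidate(profiles: List[Profile]) -> Optional[Profile]:
    """Return the profile a simplified autoconnect would try first."""
    candidates = [p for p in profiles if p.autoconnect]
    if not candidates:
        return None
    # Higher priority wins; in this model, ties go to the most recently used profile.
    return max(candidates, key=lambda p: (p.autoconnect_priority, p.timestamp))


# Both profiles at the default priority: the generated profile wins only
# because it was activated more recently.
initrd = Profile("initrd-dhcp", timestamp=200)
bond = Profile("user-bond", timestamp=100)
same_priority_winner = pick_autoconnect_candidate([initrd, bond]).name

# With a lower (hypothetical) priority on the generated profile,
# the user profile wins regardless of the timestamps.
initrd_low = Profile("initrd-dhcp", autoconnect_priority=-100, timestamp=200)
low_priority_winner = pick_autoconnect_candidate([initrd_low, bond]).name
```

In this model the user-deployed profile no longer has to raise its own priority to be preferred, which is the behavior the request asks for.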

Comment 2 Thomas Haller 2022-05-24 10:11:43 UTC
> Currently other NM generated profiles have an auto-connect priority of -999.

What are those other profiles? Which tool/component is creating them?

Comment 3 Jaime Caamaño Ruiz 2022-05-24 11:52:30 UTC
(In reply to Thomas Haller from comment #2)
> > Currently other NM generated profiles have an auto-connect priority of -999.
> 
> What are those other profiles? Which tool/component is creating them?

I might have phrased this incorrectly. What I meant is that NM already generates DHCP profiles with -999. For example, if I just run a Fedora VM, I get:

❯ nmcli -f all c show 
NAME                UUID                                  TYPE      TIMESTAMP   TIMESTAMP-REAL                   AUTOCONNECT  AUTOCONNECT-PRIORITY ...
Wired connection 1  315cbcb3-2e46-3b98-92b7-64dd596af63b  ethernet  1653392712  Tue 24 May 2022 11:45:12 AM UTC  yes          -999

We would like something similar for nm-initrd-generator generated profiles, not precisely -999, but at least something lower than default 0.

Comment 4 Beniamino Galvani 2022-06-30 07:35:51 UTC
Note that generating initrd profiles with a lower priority is not enough to have persistent ones active in real root; NM from initrd saves the current state to /run, and NM from real root reads it to activate the same profiles that were active in initrd (also see https://bugzilla.redhat.com/show_bug.cgi?id=2100181#c11 and later comments).

So, we need a mechanism to prevent the propagation of the state between the two NM runs. It could be either:
 - a service that runs before NM and deletes the state files in /run
 - a new parameter in NM conf to ignore the state (but only at the first run in real root, otherwise that would break service restarts)
 - a profile property that allows it to activate immediately when the profile is loaded, even if there is another profile active (with lower priority)

We already have the configuration options "keep-configuration=no" and "allowed-connections=except:origin:nm-initrd-generator", but those don't exactly cover this use case because they prevent the initrd-generated profiles from activating in real root.

Comment 5 Jaime Caamaño Ruiz 2022-06-30 08:37:17 UTC
Thanks for adding this Beniamino. The lower auto-connect priority does help in other scenarios. Right now we are requiring customers to deploy their keyfiles with higher auto-connect priority than default, while it would probably make sense for the generated ones to have lower priority than default.

So the RFE asks for two things then:

- ability for nm-initrd-generator to generate profiles with lower auto-connect priority than default.
- ability for NM to clean up NM state generated by dracut NM run
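
As a rough illustration of the first point, the sketch below reads the `autoconnect-priority` key from the `[connection]` section of a keyfile-style profile and falls back to the default of 0 when the key is absent. It uses Python's `configparser` as a stand-in for NetworkManager's own keyfile parser, which is an assumption, and both sample profiles (names, types and the priority of 10) are hypothetical.

```python
import configparser

DEFAULT_AUTOCONNECT_PRIORITY = 0  # default stated in this report


def autoconnect_priority(keyfile_text: str) -> int:
    """Return autoconnect-priority from keyfile text, or the default if unset."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(keyfile_text)
    return parser.getint("connection", "autoconnect-priority",
                         fallback=DEFAULT_AUTOCONNECT_PRIORITY)


# Hypothetical customer-deployed profile with a raised priority.
USER_KEYFILE = """\
[connection]
id=user-bond
type=bond
autoconnect-priority=10

[bond]
mode=active-backup
"""

# Hypothetical generated profile that leaves the priority unset.
GENERATED_KEYFILE = """\
[connection]
id=generated-dhcp
type=ethernet

[ipv4]
method=auto
"""

user_priority = autoconnect_priority(USER_KEYFILE)
generated_priority = autoconnect_priority(GENERATED_KEYFILE)
```

With the change requested here, the generated profile would carry an explicit negative value instead of relying on the default, so the customer keyfile would not need the raised priority.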

Comment 6 Thomas Haller 2022-09-12 10:02:47 UTC
(In reply to Beniamino Galvani from comment #4)

> - a service that runs before NM and deletes the state files in /run
> - a new parameter in NM conf to ignore the state (but only at the first run in real root, otherwise that would break service restarts)

These two seem to be the same thing (having a similar result).

With those you probably also would set "keep-configuration=no", otherwise you likely get an external connection if the interface is already configured.

Ignoring information from the state file seems a sensible feature (via an option in NetworkManager.conf, where initrd could drop a snippet).

In any case, there is already the workaround of just deleting the files before starting NetworkManager.
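
A minimal sketch of that workaround in Python is shown below. It assumes the per-device state lives as plain files in a directory such as /run/NetworkManager/devices and that the cleanup runs before NetworkManager starts in real root; the function name `clear_device_state` and the file names and contents used in the demonstration are hypothetical. The demonstration runs against a temporary directory rather than the real path.

```python
import tempfile
from pathlib import Path


def clear_device_state(state_dir: Path) -> list:
    """Delete regular files directly inside state_dir; return the removed names."""
    removed = []
    if not state_dir.is_dir():
        return removed
    for entry in sorted(state_dir.iterdir()):
        if entry.is_file():
            entry.unlink()
            removed.append(entry.name)
    return removed


# Demonstration on a throwaway directory with placeholder state files.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "2").write_text("[device]\n")
    (demo_dir / "3").write_text("[device]\n")
    removed_names = clear_device_state(demo_dir)
    remaining = list(demo_dir.iterdir())
```

On a real system the same function would be pointed at the actual state directory from a unit ordered before NetworkManager, for example `clear_device_state(Path("/run/NetworkManager/devices"))`.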

>  - a profile property that allows it to activate immediately when the
> profile is loaded, even if there is another profile active (with lower
> priority)

Currently autoconnect only happens on devices that are currently disconnected.
This would be a new feature, where a forced-autoconnecting profile could replace
an already active profile. That seems quite drastic.

In any case, it doesn't seem to be requested by this RFE.

> We already have configuration options "keep-configuration=no" and "allowed-connections=except:origin:nm-initrd-generator", but those don't exactly cover this use case because they prevent the initrd-generated profiles from activating in real root.

Right. But your previous points also don't seem to directly fix the report's use case, do they?



In general, it makes sense to me that generated profiles would have a lower autoconnect-priority.
Since autoconnect does not replace an already active profile, and since the state file in /run remembers the previously activated profile, it means that in most cases (e.g. after switch-root) the autoconnect mechanism doesn't kick in and the priority doesn't matter.

I could imagine very constructed scenarios where this might actually matter, and then I guess, it makes sense that user provided profiles have (by default) a higher autoconnect priority.
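
The interaction described above can be sketched as a small Python model. It is a deliberate simplification under stated assumptions: the function `profile_after_switch_root`, the profile names and the priority values are hypothetical, and the remembered state is reduced to a single profile name per device.

```python
from typing import Dict, Optional


def profile_after_switch_root(remembered: Optional[str],
                              priorities: Dict[str, int]) -> Optional[str]:
    """Simplified model of which profile ends up active on one device.

    remembered: profile recorded as active by the initrd run, or None if the
                state was cleared or the device was never activated.
    priorities: autoconnect-priority of each autoconnect-enabled profile.
    """
    if remembered is not None:
        # The device is not disconnected, so autoconnect is never consulted.
        return remembered
    if not priorities:
        return None
    # Only a disconnected device goes through priority-based autoconnect.
    return max(priorities, key=priorities.get)


profiles = {"initrd-dhcp": -100, "user-bond": 0}

# State propagated from initrd: the generated profile stays active.
kept = profile_after_switch_root("initrd-dhcp", profiles)

# State cleared before real root starts: the priority decides.
fresh = profile_after_switch_root(None, profiles)
```

The model makes the point of this comment concrete: the lower priority only has an effect once the remembered state is out of the picture.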




(In reply to Jaime Caamaño Ruiz from comment #5)
> Thanks for adding this Beniamino. The lower auto-connect priority does help
> in other scenarios. Right now we are requiring customers to deploy their
> keyfiles with higher auto-connect priority than default, while it would
> probably make sense for the generated ones to have lower priority than
> default.
> 
> So the RFE asks for two things then:
> 
> - ability for nm-initrd-generator to generate profiles with lower
> auto-connect priority than default.
> - ability for NM to clean up NM state generated by dracut NM run


This makes sense to me. Note that if you have a profile activated in initrd, most of the time it just stays activated. The user would have to manually activate another profile (or unplug the cable) to even get into a situation where autoconnect matters.

My point is, this change might not be very relevant.

Still, I agree that a lower priority probably makes more sense in the few cases I can imagine.

WIP here: https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1376

Comment 7 Thomas Haller 2022-09-15 16:26:46 UTC
This request is fixed in 1.41.2+ with commit https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/98575bd5138afdb61a3837a2a73436eb05490f4a

There was discussion about whether NetworkManager could do anything beyond this to help the right profiles take over (in particular after a restart or switch-root from initrd). I guess something could be done, but it's not very clear to me. In particular, the "keep-configuration" setting and the ability to remove the /run/NetworkManager/devices files seem to already cover most use cases. If something else is missing, please report a new bug and explain the problem in detail.

Comment 11 Filip Pokryvka 2022-11-02 12:48:33 UTC
Covered in unit tests of initrd generator.

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/28ab5356177a8413561619ad48952795befc642c

Comment 12 Dusty Mabe 2022-11-02 20:37:31 UTC
The bug description explicitly mentions OpenShift so I'll chime in here.

Fedora CoreOS and Red Hat CoreOS are a bit of a special case when it comes to networking configuration. On the Ignition boot of a machine, RHCOS will try to propagate any initramfs networking into the real root of the machine (i.e. make it persistent state). It will also attempt to completely bring down networking on the transition from the initrd to the real root (to simulate subsequent boots). The code that handles this transition lives here:

https://github.com/coreos/fedora-coreos-config/blob/61b9d774fb5cbdd5c76aed6ce47bc12e7d74c04d/overlay.d/05core/usr/lib/dracut/modules.d/35coreos-ignition/coreos-teardown-initramfs.sh#L212-L234


@jcaamano - all of that is to say that some assumptions may need to be adjusted based on the above. Ultimately the best thing to do is to not have conflicting configurations live on the machine if you can.

Comment 13 Jaime Caamaño Ruiz 2022-11-03 11:16:15 UTC
(In reply to Dusty Mabe from comment #12)
> @jcaamano - all of that is to say that some assumptions may need
> to be adjusted based on the above. Ultimately the best thing to do is to not
> have conflicting configurations live on the machine if you can.

I understand this and agree with it. But coreos-teardown-initramfs only happens on first boot, if I am not mistaken. The scenarios we have had to deal with did not involve providing any network configuration for that first boot:

* We have seen the network implicitly activated with initrd on normal boots in ways that were difficult to predict (bugs but also modules that for some reason thought that they needed the network, probably buggy as well but who knows). The moment this happens, network configuration might be generated that, even if it is completely unused or unneeded, will interfere with real root configuration.

* There are users that explicitly need the network on boot, but would like to have a simple configuration just for that, and then have a more complex network configuration on the real root. These users want this because they find it complicated to come up with the final network configuration by the time they would have to provide it in their workflow or pipeline to be made available for that first boot.

So, all in all, it made sense to provide an option to allow the user to decide how to transition from initrd to real root network config, perhaps an option to make the real root NM fully re-activate the network on startup.

In the end, this RFE has only changed the priority of the initrd generated profiles, which makes sense I think. In OpenShift, we are handling the network re-activation downstream specifically.

I appreciate any extra input you can provide though.

Comment 14 Jaime Caamaño Ruiz 2022-11-03 11:25:08 UTC
Another option would probably have been for NM to consider and use real root network configuration as well if available when activated in initrd.

Comment 15 Dusty Mabe 2022-11-03 18:57:40 UTC
(In reply to Jaime Caamaño Ruiz from comment #13)
> 
> * We have seen the network implicitly activated with initrd on normal boots
> in ways that were difficult to predict (bugs but also modules that for some
> reason thought that they needed the network, probably buggy as well but who
> knows). The moment this happens, network configuration might be generated
> that, even if it is completely unused or unneeded, will interfere with real
> root configuration.

Interesting. Usually it behaves predictably, i.e. if you don't have
rd.neednet=1 on the kernel command line and you don't have your rootfs
on network storage then it won't try to bring up network.

> 
> * There are users that explicitly need the network on boot, but would like
> to have a simple configuration just for that, and then have a more complex
> network configuration on the real root. These users want this because they
> find it complicated to come up with the final network configuration by the
> time they would have to provide it in their workflow or pipeline to be made
> available for that first boot.

For the first boot (Ignition boot) of a machine it can make sense to have a
simple Networking configuration in kernel arguments (to retrieve the Ignition
config) and then a more complex configuration defined in the real root.

Beyond first boot (Ignition boot) I think the value of having them split diminishes.
If it were me and I needed networking on every boot (one example is clevis/tang
encryption) I'd try hard to define everything in kernel arguments and not have
the configuration split.

> 
> So, all in all, it made sense to provide an option to allow the user to decide
> how to transition from initrd to real root network config, perhaps an option
> to make the real root NM fully re-activate the network on startup.
> 
> In the end, this RFE has only changed the priority of the initrd generated
> profiles, which makes sense I think. In OpenShift, we are handling the
> network re-activation downstream specifically.

Right. I'm hoping it's harmless enough. Time will tell.

Comment 25 errata-xmlrpc 2023-05-16 09:04:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2968