Bug 1895979 - Unable to get coreos-installer with --copy-network to work
Summary: Unable to get coreos-installer with --copy-network to work
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.7.0
Assignee: Jonathan Lebon
QA Contact: Michael Nguyen
Depends On:
Blocks: 1899286
Reported: 2020-11-09 15:12 UTC by Jonas Nordell
Modified: 2021-03-17 21:50 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The network-related service units were not ordered strictly enough relative to one another. Consequence: Network configurations copied using `--copy-network` sometimes did not take effect on the first reboot into the installed system. Fix: The ordering of the relevant service units has been corrected. Result: Network configurations copied using `--copy-network` now always take effect on the first reboot into the installed system.
Clone Of:
Clones: 1899286
Last Closed: 2021-02-24 15:31:28 UTC
Target Upstream Version:

Attachments
screenshot picture2.png (7.85 KB, image/png), 2020-11-09 15:12 UTC, Jonas Nordell
screenshot picture3.png (6.92 KB, image/png), 2020-11-09 15:12 UTC, Jonas Nordell

System | ID | Private | Priority | Status | Summary | Last Updated
Github | coreos fedora-coreos-config pull 733 | 0 | None | closed | coreos-copy-firstboot-network: order after coreos-enable-network | 2021-02-15 14:16:17 UTC
Red Hat Knowledge Base (Solution) | 5572141 | 0 | None | None | None | 2020-11-13 15:16:47 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:31:58 UTC

Description Jonas Nordell 2020-11-09 15:12:00 UTC
Created attachment 1727859
screenshot picture2.png

Description of problem:

The documentation [1] states that changes made with nmcli and/or nmtui in the Live ISO environment can be persisted by passing --copy-network to coreos-installer.

But when I try this, nothing is persisted, and after the first reboot the network configuration contains none of my customizations.

[1] https://docs.openshift.com/container-platform/4.6/installing/installing_bare_metal/installing-bare-metal-network-customizations.html#installation-user-infra-machines-advanced_network_installing-bare-metal-network-customizations

Version-Release number of selected component (if applicable):


How reproducible:
Every time

Steps to Reproduce:
1. Load Live ISO image
2. Change Network settings with nmcli
   - sudo nmcli con mod "Wired Connection" ipv4.addresses
   - sudo nmcli con mod "Wired Connection" ipv4.gateway
   - sudo nmcli con mod "Wired Connection" ipv4.dns 
3. Verify NetworkManager configuration, see attached screenshot picture2.png
4. Run coreos-installer, see attached screenshot picture3.png
5. reboot
6. Verify NetworkManager configuration
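
The steps above can be sketched as a dry-run script. This is only an illustration: the report leaves the nmcli values out, so the address, gateway, and DNS values below are hypothetical placeholders from the documentation range, and the target disk, the `run` helper, and the `DRY_RUN` variable are assumptions as well.

```shell
#!/bin/sh
# Dry-run sketch of the reproduction steps. All values are placeholders.
CON="Wired Connection"   # connection name used in the report
ADDR="192.0.2.10/24"     # hypothetical static address
GW="192.0.2.1"           # hypothetical gateway
DNS="192.0.2.53"         # hypothetical DNS server
DEV="/dev/vda"           # hypothetical target disk

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
# Note: the printed form loses the quoting around "$CON".
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

repro_steps() {
    run sudo nmcli con mod "$CON" ipv4.addresses "$ADDR"
    run sudo nmcli con mod "$CON" ipv4.gateway "$GW"
    run sudo nmcli con mod "$CON" ipv4.dns "$DNS"
    run sudo coreos-installer install --copy-network "$DEV"
}
```

Calling `repro_steps` prints the four commands; setting `DRY_RUN=0` in the Live ISO environment would execute them instead.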

Actual results:

[core@localhost ~]$ sudo cat /etc/NetworkManager/system-connections/default_connection.nmconnection 
id=Wired Connection





Expected results:

/etc/NetworkManager/system-connections/default_connection.nmconnection should look like before the reboot.

Additional info:

Comment 1 Jonas Nordell 2020-11-09 15:12:45 UTC
Created attachment 1727860
screenshot picture3.png

Comment 2 Micah Abbott 2020-11-09 16:18:43 UTC
I was unable to reproduce this in a local VM test.  I used the same `nmcli` commands and observed that the NM file was correctly written.

Is it possible that your Ignition config is also writing the `/etc/NetworkManager/system-connections/default_connection.nmconnection`?

Could you provide the full journal from the host showing the boot after the install was done?  That would give us insight into whether Ignition is writing out the same file.

A copy of the Ignition configuration would be useful, too.

Comment 7 Jonathan Lebon 2020-11-10 19:01:55 UTC
The boot logs show that coreos-copy-firstboot-network and coreos-teardown-initramfs are picking up and propagating the injected NM config:

[    6.861733] coreos-copy-firstboot-network[698]: info: copying files from /mnt/boot_partition/coreos-firstboot-network to /run/NetworkManager/system-connections/
[    6.870657] coreos-copy-firstboot-network[698]: '/mnt/boot_partition/coreos-firstboot-network/default_connection.nmconnection' -> '/run/NetworkManager/system-connections/default_connection.nmconnection'
[   17.888523] coreos-teardown-initramfs[1105]: info: no networking config is defined in the real root
[   17.891753] coreos-teardown-initramfs[1105]: info: propagating initramfs networking config to the real root
[   17.906937] coreos-teardown-initramfs[1105]: /usr/bin/coreos-relabel
[   18.085890] coreos-teardown-initramfs[1105]: Relabeled /sysroot//etc/NetworkManager/system-connections/default_connection.nmconnection from (null) to system_u:object_r:NetworkManager_etc_rw_t:s0

(I opened https://github.com/coreos/fedora-coreos-config/pull/732 to make it easier to tell what files coreos-teardown-initramfs actually copied.)

One test worth doing is booting with `rd.break` and inspecting `/sysroot//etc/NetworkManager/system-connections/default_connection.nmconnection`. If it has the correct contents, then it means that something in the real root is modifying the config (possibly NM itself?). If it doesn't, then it's something in the initrd.
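
A minimal sketch of that check, meant to be run from the emergency shell after booting with `rd.break`. The helper name and the idea of looking for `address1=`-style keys (how NetworkManager keyfiles store static addresses) are illustrative assumptions; the path is the one quoted above.

```shell
# check_nm_config [ROOT]: report whether the connection profile under ROOT
# (default /sysroot) still carries a static address. Hypothetical helper.
check_nm_config() {
    root="${1:-/sysroot}"
    file="$root/etc/NetworkManager/system-connections/default_connection.nmconnection"
    if [ ! -f "$file" ]; then
        echo "missing: $file"
        return 2
    fi
    # NetworkManager keyfiles store static addresses as address1=, address2=, ...
    if grep -Eq '^address[0-9]+=' "$file"; then
        echo "static address present: the initrd propagated the config, look at the real root"
        return 0
    fi
    echo "no static address: the config was lost in the initrd"
    return 1
}
```

Running `check_nm_config` with no argument inspects `/sysroot`; an exit status of 0 points at the real root, 1 at the initrd.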

Comment 8 Jonathan Lebon 2020-11-10 19:26:42 UTC
As mentioned in https://github.com/coreos/fedora-coreos-config/pull/733#issuecomment-724914891, a workaround for this is to boot with `rd.neednet=1`. You can do this with `coreos-installer install --firstboot-args 'rd.neednet=1'`. Can you verify that this fixes the issue?
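
For reference, a sketch of how the workaround might be combined with `--copy-network` and then checked on the first boot. The target disk `/dev/vda` and the `has_karg` helper are assumptions for illustration, as is the expectation that arguments passed via `--firstboot-args` show up in `/proc/cmdline` during the first boot.

```shell
# Install with the workaround (hypothetical target disk):
#   sudo coreos-installer install --copy-network \
#       --firstboot-args 'rd.neednet=1' /dev/vda

# has_karg KARG [FILE]: succeed if KARG appears as a whole word in the kernel
# command line read from FILE (default /proc/cmdline). Hypothetical helper.
has_karg() {
    karg="$1"
    file="${2:-/proc/cmdline}"
    # Split the command line on whitespace and compare each word exactly.
    tr -s ' \t' '\n\n' < "$file" | grep -Fxq -- "$karg"
}
```

On the first boot of the installed system, `has_karg rd.neednet=1 && echo "workaround active"` would confirm the argument was applied.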

Comment 9 Jonas Nordell 2020-11-11 07:29:37 UTC
I can confirm that adding "--firstboot-args 'rd.neednet=1'" solved my issue, and the node booted with the IP I had set up with nmcli before running coreos-installer.

Comment 11 Sebastian Jug 2020-11-25 20:35:36 UTC
Another verification that adding "--firstboot-args 'rd.neednet=1'" fixed this issue for me as well.

Comment 12 Jonathan Lebon 2020-11-26 16:37:56 UTC
Fix for this is in https://github.com/openshift/installer/pull/4414.

Comment 13 Micah Abbott 2020-12-05 16:02:02 UTC
This is pending the merge of the installer PR; setting UpcomingSprint to appease the bots.

Comment 14 Micah Abbott 2020-12-08 21:33:18 UTC
The Installer PR is merged; moving to MODIFIED

Comment 16 Micah Abbott 2020-12-10 16:22:34 UTC
Verified with RHCOS 47.83.202012072242-0

From the Live ISO:
   - sudo nmcli con mod "Wired Connection" ipv4.addresses
   - sudo nmcli con mod "Wired Connection" ipv4.gateway
   - sudo nmcli con mod "Wired Connection" ipv4.dns

Confirmed the /etc/NetworkManager/system-connections/default_connection.nmconnection was configured properly

Installed RHCOS via:
  - sudo coreos-installer install --copy-network --insecure-ignition --ignition-url= /dev/vda

Inspected system after install

[core@localhost ~]$ rpm-ostree status                   
State: idle                              
● ostree://d70e44dde4765c2b59cedae6c399c7255a4bb877cc80b1be5c93cbe614b1d395 
                   Version: 47.83.202012072242-0 (2020-12-07T22:46:11Z)     
[core@localhost ~]$ sudo cat /etc/NetworkManager/system-connections/default_connection.nmconnection 
id=Wired Connection




$ cat /usr/lib/dracut/modules.d/15coreos-network/coreos-copy-firstboot-network.service 
# This unit will run early in boot and detect if the user copied
# in firstboot networking config files into the installed system
# (most likely by using `coreos-installer install --copy-network`).
# Since this unit is modifying network configuration there are some
# dependencies that we have:
# - Need to look for networking configuration on the /boot partition
#     - i.e. after /dev/disk/by-label/boot is available
#     - and after the ignition-dracut GPT generator (see below)
# - Need to run before networking is brought up.
#     - This is done in nm-run.sh [1] that runs as part of dracut-initqueue [2]
#     - i.e. Before=dracut-initqueue.service
# - Need to make sure karg networking configuration isn't applied
#     - There are two ways to do this.
#         - One is to run *before* the nm-config.sh [3] that runs as part of
#           dracut-cmdline [4] and `ln -sf /bin/true /usr/libexec/nm-initrd-generator`.
#             - i.e. Before=dracut-cmdline.service
#         - Another is to run *after* nm-config.sh [3] in dracut-cmdline [4]
#           and just delete all the files created by nm-initrd-generator.
#             - i.e. After=dracut-cmdline.service, but Before=dracut-initqueue.service
#     - We'll go with the second option here because the need for the /boot
#       device (mentioned above) means we can't start before dracut-cmdline.service
# [1] https://github.com/dracutdevs/dracut/blob/master/modules.d/35network-manager/nm-run.sh
# [2] https://github.com/dracutdevs/dracut/blob/master/modules.d/35network-manager/module-setup.sh#L37
# [3] https://github.com/dracutdevs/dracut/blob/master/modules.d/35network-manager/nm-config.sh
# [4] https://github.com/dracutdevs/dracut/blob/master/modules.d/35network-manager/module-setup.sh#L36
Description=Copy CoreOS Firstboot Networking Config
# Any services looking at mounts need to order after this
# because it causes device re-probing.
# Since we are mounting /boot/, require the device first
# Need to run after coreos-enable-network since it may re-run the NM cmdline
# hook which will generate NM configs from the network kargs, but we want to
# have precedence.

# The MountFlags=slave is so the umount of /boot is guaranteed to happen
# /boot will only be mounted for the lifetime of the unit.
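
The output above shows only the unit's comments, not its ordering directives. A rough way to confirm that an installed image carries the fix is to look for the new ordering in the unit file. This sketch assumes the fix is expressed as an `After=` line naming `coreos-enable-network.service` (inferred from the linked pull request title, not shown in this report); the helper name is hypothetical.

```shell
# unit_orders_after FILE UNIT: succeed if FILE has an After= line listing UNIT.
# Hypothetical helper; does not follow drop-ins or other ordering sources.
unit_orders_after() {
    file="$1"
    unit="$2"
    # After= may list several space-separated units on one line, so split on
    # spaces and '=' and compare each resulting word exactly.
    grep -E '^After=' "$file" | tr -s ' =' '\n\n' | grep -Fxq -- "$unit"
}
```

For example: `unit_orders_after /usr/lib/dracut/modules.d/15coreos-network/coreos-copy-firstboot-network.service coreos-enable-network.service && echo "ordering present"`.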

$ journalctl -b | grep coreos-copy
Dec 10 16:15:52 localhost coreos-copy-firstboot-network[704]: info: copying files from /mnt/boot_partition/coreos-firstboot-network to /run/NetworkManager/system-connections/
Dec 10 16:15:52 localhost coreos-copy-firstboot-network[704]: '/mnt/boot_partition/coreos-firstboot-network/default_connection.nmconnection' -> '/run/NetworkManager/system-connections/default_connection.nmconnection'
Dec 10 16:15:55 localhost systemd[1]: coreos-copy-firstboot-network.service: Succeeded.
Dec 10 16:15:56 localhost systemd[1]: coreos-copy-firstboot-network.service: Succeeded.


Comment 19 errata-xmlrpc 2021-02-24 15:31:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

