Bug 1958930 - network config in machine configs prevents addition of new nodes with static networking via kargs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.6.z
Hardware: x86_64
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Dusty Mabe
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On: 1962850
Blocks:
 
Reported: 2021-05-10 12:46 UTC by sperezto
Modified: 2023-09-15 01:06 UTC (History)
10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:07:46 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 23:08:08 UTC

Comment 5 Dusty Mabe 2021-05-10 15:50:30 UTC
Hey sperezto. Unfortunately you can't define networking information in multiple ways and have them merged together.

You provide some via Ignition:


```
[   61.790396] ignition[1827]: INFO     : files: createFilesystemsFiles: createFiles: op(1c): [started]  writing file "/sysroot/etc/sysconfig/network-scripts/route-bond0"
```


and others via kernel command line arguments (i.e. ip=, bond=). If the system detects that any networking configuration was provided via Ignition, it will not propagate the initramfs networking configuration.

```
[  OK  ] Reached target Switch Root.
[   67.299615] coreos-teardown-initramfs[1983]: info: networking config is defined in the real root
         Starting Switch Root...
[   67.317105] coreos-teardown-initramfs[1983]: info: will not attempt to propagate initramfs networking
```

Unfortunately you'd need to provide the default route information via kernel command line options OR provide the full networking configuration via Ignition files.
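
For reference, the propagation check in coreos-teardown-initramfs.sh (linked in comment 7 below as [2]) amounts to roughly the following. This is a paraphrased sketch, not the verbatim script:

```
# Simplified sketch of the propagation policy (paraphrased from
# coreos-teardown-initramfs.sh; see the linked script for the exact logic).
if [ -n "$(ls -A /sysroot/etc/NetworkManager/system-connections/ 2>/dev/null)" ] || \
   [ -n "$(ls -A /sysroot/etc/sysconfig/network-scripts/ 2>/dev/null)" ]; then
    # Ignition wrote networking config into the real root: honor it
    # exclusively and drop the initramfs (karg-generated) configuration.
    echo "info: networking config is defined in the real root"
    echo "info: will not attempt to propagate initramfs networking"
else
    # Nothing in the real root: persist the karg-generated profiles.
    echo "info: propagating initramfs networking config to the real root"
    cp /run/NetworkManager/system-connections/* \
       /sysroot/etc/NetworkManager/system-connections/ 2>/dev/null || true
fi
```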

Comment 7 sperezto 2021-05-11 11:05:59 UTC
(In reply to Dusty Mabe from comment #5)
> Hey sperezto. Unfortunately you can't define networking
> information in multiple ways and have them merged together.
> 
> You provide some via Ignition:
> 
> 
> ```
> [   61.790396] ignition[1827]: INFO     : files: createFilesystemsFiles:
> createFiles: op(1c): [started]  writing file
> "/sysroot/etc/sysconfig/network-scripts/route-bond0"
> ```
> 
> 
> and others via kernel command line (i.e. ip= bond=). If the system detects
> any networking was provided via Ignition it will not propagate any initramfs
> networking configuration.
> 
> ```
> [  OK  ] Reached target Switch Root.
> [   67.299615] coreos-teardown-initramfs[1983]: info: networking config is
> defined in the real root
>          Starting Switch Root...
> [   67.317105] coreos-teardown-initramfs[1983]: info: will not attempt to
> propagate initramfs networking
> ```
> 
> Unfortunately you'd need to provide the default route information via kernel
> command line options OR provide the full networking configuration via
> Ignition files.


Hi @Dusty,

Thanks for the quick reply.

We've been checking the policies[1][2] and the code to contrast them with our scenario. We think we might have a chicken-and-egg problem here.

I'd like to give you a brief summary of what the customer is facing. They have a really large cluster, around 90 nodes, and they want to add another 40. The first 90 nodes have been updated from previous OCP versions to 4.6. Those 90 nodes' network configuration was set up using kernel parameters plus the route configuration in the Ignition config, which lives in a MachineConfig object.

When trying to add new nodes, they're not able to, because of what you've mentioned: the propagate_initramfs_networking policy prefers Ignition over dracut:

if ignition:
    use ignition
else:
    if dracut:
        pass through dracut to real root


That being said, what options do we have? How can we provide node-specific network configuration such as IP, NAMESERVER, and BOND parameters, etc.?
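
For reference, per-node static networking via kargs for a bonded interface typically looks like the following dracut arguments (all values here are illustrative, not taken from this cluster):

```
bond=bond0:ens3,ens4:mode=active-backup
ip=10.0.113.102::10.0.113.1:255.255.255.0:worker-new:bond0:none
nameserver=10.0.113.53
```

The ip= fields are client IP, peer, gateway, netmask, hostname, interface, and autoconf method, so each new node needs its own unique line.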

This is the issue we have:

if only Ignition files:
    we would have to create a new role and a new MC. Node-specific network parameters must be added for each new node and must be unique.
else:
    if only dracut:
        we would need to delete the MC that creates the static route, because if we don't, the kernel parameters won't be taken into account.

So, what will happen to the nodes we've created before, the ones that are already in use and under the MC static-route config? What would be the best and least invasive approach?

Thanks in advance,   
    

[1] https://github.com/coreos/fedora-coreos-tracker/issues/394#issuecomment-599721173
[2] https://github.com/coreos/fedora-coreos-config/blob/rhcos-4.6/overlay.d/05core/usr/lib/dracut/modules.d/30ignition-coreos/coreos-teardown-initramfs.sh#L36-L63

Comment 10 Dusty Mabe 2021-05-11 21:00:56 UTC
This turns out to be a bigger problem than we originally anticipated. The scenario
here is that you have a cluster that was installed some time ago. After the initial
install, you realize you need some slight networking tweak (like an added static route).
You add this static route to existing nodes via the MCO (i.e. by writing out
`/etc/sysconfig/network-scripts/route-bond0`).

Later you try to deploy a new worker node. Since the `route-bond0` file gets
written out by Ignition during the initramfs, the initramfs network propagation
code sees "networking configuration" that was written by Ignition and decides to
not propagate any initramfs networking configuration to the real root (kernel args).
So you can't deploy new nodes.

We'd like to not have this problem in the future, so we are discussing it more widely
amongst our teams. For now there are a few possible workarounds that shouldn't
require manually running commands on individual nodes. Of course, manually running commands
on individual nodes is still an option if you prefer. Here is what we came up with:

A. Write a systemd unit that checks for the existence of the `route-bond0` file
   and creates it if it doesn't exist (see the sketch after option B below).
       - The systemd unit is delivered as a machine config.
       - The systemd unit runs in the real root before NetworkManager is started.
       - You'll also remove the existing machine config `route-bond0` file entry.
       - This works around the check for networking configuration that happens
         at the end of the initramfs (allows karg networking to persist).
        - You'll need to make sure the created file has the appropriate SELinux context.
        - To minimize disruption (fewer reboots), the best way to do so is:
           - pause the corresponding machineconfigpool
           - delete the MC with the file entry
           - add the new MC with the systemd unit
           - unpause the machineconfigpool

B. Machine Config Pool Musical Chairs: the idea here is to move all existing nodes
   into a new custom machineconfigpool, which will have the `route-bond0` MC entry.
   New nodes joining the cluster can then boot into a worker pool without that MC,
   such that the karg-provided networking configuration gets propagated. The new
   nodes can then also be moved into the custom pool to add `route-bond0`.
       - This doesn't require any new scripts, but has more steps.
        - This requires you to move new nodes to the custom pool when you boot them.
       - To minimize disruption, the best way to do so is:
           - Create a custom machineconfigpool (henceforth named custom1).
           - Add the same MC with `route-bond0` to the custom pool.
           - Move all current worker nodes into the custom pool, by adding the
             custom1 role label.
           - Remove the MC with `route-bond0` from the worker pool.
           - When you join a new node to the cluster, join it as worker, and
             then move to custom pool.
           - For details see https://github.com/openshift/machine-config-operator/blob/master/docs/custom-pools.md
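
As referenced in option A, a minimal sketch of a machine config carrying such a systemd unit might look like the following (the MC name, role label, route values, and unit name are all illustrative; ordering and SELinux handling should be verified before use):

```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-create-route-bond0
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.1.0
    systemd:
      units:
        - name: create-route-bond0.service
          enabled: true
          contents: |
            [Unit]
            Description=Create route-bond0 if missing
            Before=NetworkManager.service

            [Service]
            Type=oneshot
            # Only write the file if nothing (Ignition, kargs) provided it already.
            ExecStart=/bin/sh -c 'test -e /etc/sysconfig/network-scripts/route-bond0 || printf "ADDRESS0=192.168.1.0\nNETMASK0=255.255.255.0\nGATEWAY0=10.0.113.1\n" > /etc/sysconfig/network-scripts/route-bond0'
            # Make sure the file carries the appropriate SELinux context.
            ExecStartPost=/sbin/restorecon /etc/sysconfig/network-scripts/route-bond0

            [Install]
            WantedBy=multi-user.target
```

For option B, moving a node into the custom pool per the linked custom-pools doc comes down to adding the role label, e.g. `oc label node <node> node-role.kubernetes.io/custom1=`.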

Comment 13 sperezto 2021-05-12 14:36:22 UTC
Hi Dusty,

Thanks again for your suggestions. The environment has another particularity: as you said, the cluster was installed some time ago, so all the network configuration is under /etc/sysconfig/network-scripts (legacy networking), because all of their interfaces and network configuration were created in previous OCP/RHCOS versions. For new node deployments, this configuration will be under /etc/NetworkManager/system-connections/, so route-bond0 won't have any effect.


After thinking about it, we might have another option here, and we'd like to know your opinion. Basically, we'll create a new MachineConfigPool with data taken from the original MCP, minus the route network configuration. This option allows us to deploy the new node; after the node is deployed, it will be under the worker MachineConfigPool. Please check the procedure we've followed to test it.
 

1.- Export the rendered-worker MachineConfig that is currently being used by the "worker" MachineConfigPool.

$ oc get mc $(oc get mcp worker -ojson|jq -r '.spec.configuration.name') -oyaml > machine_config_pool_custom_worker.yml

2.- Change the MachineConfig rendered name:

$ sed -i s/$(oc get mcp worker -ojson|jq -r '.spec.configuration.name')/rendered-worker-custom-temporary/g machine_config_pool_custom_worker.yml  
  
$ grep rendered-worker machine_config_pool_custom_worker.yml 


3.- Delete or comment out the block that references the route-bond0 file.

$ vim machine_config_pool_custom_worker.yml
...
...
...
     # - contents:
     #     source: data:text/plain;charset=utf-8;base64,QUREUkVTUzA9OC44LjguOApORVRNQVNLMD0yNTUuMjU1LjI1NS4wCkdBVEVXQVkwPTEwLjAuMTEzLjEK
     #   mode: 420
     #   path: /etc/sysconfig/network-scripts/route-bond0
...
...
...

4.- Create a new dummy MachineConfigPool that points to a non-existent label, so as not to affect any node. The configuration will be taken from the rendered config we created in the last step.

cat << EOF |oc apply -f -
---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: workercustomtemporal
spec:
  configuration:
    name: rendered-worker-custom-temporary
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker02,worker03]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker02: ""
...
EOF

5.- Check that the route network configuration is not in the MachineConfigPool definition by consulting the API:

$ export APIURL=api-int.docp4.lab.bcnconsulting.com      

$ curl -k https://$APIURL:22623/config/workercustomtemporal |jq -r '.storage.files[]|select(.path=="/etc/sysconfig/network-scripts/ens3-route")'

NOTE: As you can see there is no configuration; you can check the worker MachineConfigPool to see the difference.

$ curl -k https://$APIURL:22623/config/worker |jq -r '.storage.files[]|select(.path=="/etc/sysconfig/network-scripts/ens3-route")'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  185k  100  185k    0     0   942k      0 --:--:-- --:--:-- --:--:--  948k
{
  "filesystem": "root",
  "overwrite": false,
  "path": "/etc/sysconfig/network-scripts/ens3-route",
  "contents": {
    "source": "data:text/plain;charset=utf-8;base64,QUREUkVTUzA9OC44LjguOApORVRNQVNLMD0yNTUuMjU1LjI1NS4wCkdBVEVXQVkwPTEwLjAuMTEzLjEK"
  },
  "mode": 420
}

6.- Change the config path in the Ignition file that the new node will fetch:

$ cat /var/lib/libvirt/images/openshift/ocp-dworker02.ign| tr '\r\n' ' '|jq
{
  "ignition": {
    "config": {
      "merge": [
        {
          "source": "https://api-int.docp4.lab.bcnconsulting.com:22623/config/workercustomtemporal"
        }
      ]
    },
    "security": {
      "tls": {
        "certificateAuthorities": [
          {
            "source": "data:text/plain;charset=utf-8;base64,...LS1CRUdJTiBDR.."
          }
        ]
      }
    },
    "version": "3.1.0"
  }
}

7.- Start the new node and check the logs, which should look similar to:

[    6.990992] systemd[1]: Started dracut pre-mount hook.
[    7.261979] ignition[743]: GET https://api-int.docp4.lab.bcnconsulting.com:22623/config/worker03: attempt #3
[    7.265327] ignition[743]: GET error: Get "https://api-int.docp4.lab.bcnconsulting.com:22623/config/workercustomtemporal": dial tcp: lookup api-int.docp4.lab.bcnconsulting.com on [::1]:53: read udp [::1]:37018->[::1]:53: read: connection refused
[    8.063681] ignition[743]: GET https://api-int.docp4.lab.bcnconsulting.com:22623/config/worker03: attempt #4
[    8.066457] ignition[743]: GET error: Get "https://api-int.docp4.lab.bcnconsulting.com:22623/config/workercustomtemporal": dial tcp: lookup api-int.docp4.lab.bcnconsulting.com on [::1]:53: read udp [::1]:53736->[::1]:53: read: connection refused
[    9.665212] ignition[743]: GET https://api-int.docp4.lab.bcnconsulting.com:22623/config/worker03: attempt #5
[    9.668339] ignition[743]: GET error: Get "https://api-int.docp4.lab.bcnconsulting.com:22623/config/workercustomtemporal": dial tcp: lookup api-int.docp4.lab.bcnconsulting.com on [::1]:53: read udp [::1]:59600->[::1]:53: read: connection refused
[**    ] A start job is running for Ignition (fetch) (9s / no limit)
[   12.866524] ignition[743]: GET https://api-int.docp4.lab.bcnconsulting.com:22623/config/workercustomtemporal: attempt #6
[   12.934271] ignition[743]: GET result: OK
[   12.972592] ignition[743]: Adding "root-ca" to list of CAs
[   12.974326] ignition[743]: Adding "root-ca" to list of CAs
[   12.985327] ignition[743]: fetched base config from "system"
[   12.988217] ignition[743]: fetched user config from "qemu"
[   12.989914] ignition[743]: fetch: fetch complete
[  OK  ] Started Ignition (fetch).
[   12.992768] ignition[743]: fetched referenced user config from "/config/workercustomtemporal"
[   12.995119] ignition[743]: fetch: fetch passed
[   12.996686] systemd[1]: Started Ignition (fetch).
         Starting Check for FIPS mode...
[   12.998451] ignition[743]: Ignition finished successfully
[   12.999964] systemd[1]: Starting Check for FIPS mode...
[   13.180125] rhcos-fips[797]: Found /etc/ignition-machine-config-encapsulated.json in Ignition config
[   13.212998] rhcos-fips[797]: FIPS mode not requested
[   13.222819] systemd[1]: Started Check for FIPS mode.
[  OK  ] Started Check for FIPS mode.


Thanks in advance,

Comment 14 Dusty Mabe 2021-05-12 20:40:54 UTC
(In reply to sperezto from comment #13)

> For new nodes deployments, this configuration will be under
> /etc/NetworkManager/system-connections/,

It is true that new deployments (4.6+) will create files under
`/etc/NetworkManager/system-connections/` when propagating kernel
argument networking configuration forward...

> so the route-bond0 won't have any effect.

...but this isn't true. NetworkManager will still read and use
`/etc/sysconfig/network-scripts/route-bond0` if it exists. I just tested
this locally. Does this make option A a little more attractive now?

> 
> 
> After thinking about it, we might have another option here. We'd like to
> know your opinion. Basically, we'll create a new MachineConfigPool with data
> taken from the original MCP, taking away the route network configuration.
> This option allow us to deploy the new node and after the node is deployed,
> will be under the worker MachineConfigPool. Please check the procedure we've
> followed to test it.  

I've talked with a colleague of mine about this. Basically, what you've
described seems to be working (quite nicely in fact), but there is
no guarantee it will keep working in the future. For example, pulling from
https://$APIURL:22623/config/workercustomtemporal may or may not
work in future releases, as it wasn't really intended to be pullable
even now (according to my colleague).

I did mention before that we think this situation is something we want
to improve in the future, so hopefully whatever workaround you end up
using is temporary. Either way, we should probably steer you towards
something a little more future-proof than what you proposed.
We think option A or B will probably give you that.

Comment 15 sperezto 2021-05-13 10:17:13 UTC
Hi Dusty,

Thanks for the comments. We know that the workaround we've suggested is temporary; it's just a workaround to let them add nodes quickly without modifying their current configuration/environment.

Regarding the options you mentioned, we totally agree with you: option A would be the best option going forward, but there are a few things to consider according to the customer's needs.

- They need to add nodes quickly, by the end of this week, and option A implies that around 90 nodes must be rebooted to get it working.
- As for the route file, I might be doing something wrong with it, because I've tried again and it doesn't work. This could be fixed by adding a systemd unit with nmcli or ip route add, as described in the KCS[1].

I'll explain the scenario I've tested where adding the route doesn't work:

I think what is happening is that when you have an interface created by the new model, its configuration resides under /etc/NetworkManager/system-connections/, while the route file we created is under /etc/sysconfig/network-scripts, so the two don't combine. I've tried this scenario a few times with the same result.
   

1) The device ens3 is created by the CoreOS/OCP installation under system-connections with the connection.id "Wired Connection".

# nmcli conn
NAME              UUID                                  TYPE      DEVICE 
Wired Connection  97cabb7f-1045-41c8-a8f7-4270494fa132  ethernet  ens3  
   
# cat /etc/NetworkManager/system-connections/default_connection.nmconnection 
[connection]
id=Wired Connection
uuid=97cabb7f-1045-41c8-a8f7-4270494fa132
type=ethernet
multi-connect=3
permissions=
timestamp=1620663753

[ethernet]
mac-address-blacklist=

[ipv4]
dns-search=
method=auto

[ipv6]
addr-gen-mode=eui64
dns-search=
method=auto

[proxy]

2) I've tried to add a route to the interface ens3 with three different file names under the "/etc/sysconfig/network-scripts" directory:
   - "route-Wired Connection"
   - route-Wired_Connection
   - route-ens3
   
   The file contains a simple route to be added:

	ADDRESS0=192.168.1.0
	NETMASK0=255.255.255.0
	GATEWAY0=10.0.113.1
	
	
3) I've tried loading the config file with nmcli, restarting NetworkManager, and rebooting the server.

# nmcli conn load /etc/sysconfig/network-scripts/route-ens3
	
	
4) Check routes	

# ip route
default via 10.0.113.1 dev ens3 proto dhcp metric 100 
10.0.113.0/24 dev ens3 proto kernel scope link src 10.0.113.102 metric 100 
172.0.0.0/16 dev tun0 scope link 
172.255.0.0/16 dev tun0 


The result was the same: no routes.
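
That behavior is consistent with route-<name> files being part of the legacy ifcfg format: NetworkManager's ifcfg-rh plugin only reads them alongside a matching ifcfg-<name> profile, so they're ignored for keyfile connections. For a keyfile profile, one way to attach the route (a sketch using the connection name and route values from the test above) is:

```
# Add the static route to the keyfile connection profile, then
# re-activate the connection so the route takes effect.
nmcli connection modify "Wired Connection" +ipv4.routes "192.168.1.0/24 10.0.113.1"
nmcli connection up "Wired Connection"
```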

However, if I create the interface under /etc/sysconfig/network-scripts, then I'm able to create the route either with the file under /etc/sysconfig/network-scripts or through nmcli.

[root@dworker01 network-scripts]# nmcli conn down "Wired Connection"

[root@dworker01 network-scripts]# cat ifcfg-ens3 
BOOTPROTO=none
DEFROUTE=yes
DEVICE=ens3
NAME=ens3
GATEWAY=10.0.113.1
IPADDR=10.0.113.102
NETMASK=255.255.255.0
ONBOOT=yes
TYPE=Ethernet
USERCTL=no

[root@dworker01 network-scripts]# cat route-ens3 
ADDRESS0=192.168.1.0
NETMASK0=255.255.255.0
GATEWAY0=10.0.113.1

[root@dworker01 network-scripts]# nmcli conn up ens3 

[root@dworker01 network-scripts]# nmcli conn
NAME              UUID                                  TYPE      DEVICE 
ens3              21d47e65-8523-1a06-af22-6f121086f085  ethernet  ens3   
Wired Connection  97cabb7f-1045-41c8-a8f7-4270494fa132  ethernet  -


[root@dworker01 network-scripts]# ip route
default via 10.0.113.1 dev ens3 proto static metric 100 
10.0.113.0/24 dev ens3 proto kernel scope link src 10.0.113.102 metric 100 
172.0.0.0/16 dev tun0 scope link 
172.255.0.0/16 dev tun0 
192.168.1.0/24 via 10.0.113.1 dev ens3 proto static metric 100 


Let me know your thoughts,


[1] https://access.redhat.com/solutions/5876771

Thanks in advance,

Comment 16 Dusty Mabe 2021-05-13 12:20:21 UTC
(In reply to sperezto from comment #15)
> Hi Dusty,
> 
> Thanks for comments. We know that the workaround we've suggested is
> temporary, is a workaround, just to let them add nodes quickly without
> modify their current configuration/environments.

Understood.
> 
> Regarding the options you mentioned, we totally agreed with you, option A
> would be the best option thinking forward, but there is a few things to
> consider according customer's needs.

Understood.

> 
> - They need to add nodes quickly, by the end of this week, so option A
> implies that around 90 nodes must be rebooted to getting working.

Understood.

> - As to route file, I might be doing something wrong with the route file
> because I've tried again and it doesn't work. This could be fixed adding a
> systemd unit with nmcli or ip route add as it's described in kcs[1].
> 
> I'll explain the scenario I've tested to add the route and it doesn't work:
> 
> I think what is happening is that when you have an interface created by the
> new model, its configuration should reside under
> "/etc/NetworkManager/system-connections/" and the route file we created
> under /etc/sysconfig/network-scripts. This casuistry, I've tried a few times
> with same result.

I apologize. I could have sworn this was working in my local testing
yesterday, but I just tried to re-confirm and it does not look like
it is working. I tried with both 4.6 and 4.7. I must have been
mistaken yesterday.

This makes option A less attractive as we'd have to change it up a
bit.

I guess since you've got the constraints you listed above, you'll need
to go with the other strategy for now anyway.

Comment 17 sperezto 2021-05-13 12:38:20 UTC
Hey Dusty,

Thanks again. I think option A is still the good one going forward. One thing to take into account is that it can be done through nmcli or ip route add, just like support suggests in the KCS[1], no matter how the interface is created (see the sketch below).
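
A minimal sketch of such a unit using ip route (the interface, route, and unit name are illustrative; the KCS describes the support-recommended variant):

```
[Unit]
Description=Add static route to 192.168.1.0/24
After=NetworkManager-wait-online.service
Wants=NetworkManager-wait-online.service

[Service]
Type=oneshot
RemainAfterExit=yes
# 'replace' keeps this idempotent across repeated runs.
ExecStart=/sbin/ip route replace 192.168.1.0/24 via 10.0.113.1 dev ens3

[Install]
WantedBy=multi-user.target
```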

As to whether we're going to use that workaround, we'll wait for support.

[1] https://access.redhat.com/solutions/5876771 - Create static routes post cluster installation for a specific worker pool

Cheers,

Comment 27 Dusty Mabe 2021-06-08 13:52:39 UTC
For problems like this, we have implemented upstream a `coreos.force_persist_ip` kernel argument that can be used to force propagation of the initramfs networking configuration (ip= kernel arguments) even if some network configuration was defined in Ignition. Using that kernel argument (once it is in an OpenShift release) should make it much easier for the customer to get unstuck in the future.

Upstream Issue/PR:

- https://github.com/coreos/fedora-coreos-tracker/issues/853
- https://github.com/coreos/fedora-coreos-config/pull/1045
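
For illustration, on a new node the override would simply be appended to the first-boot kernel arguments next to the existing static-networking kargs (values illustrative, not from this cluster):

```
ip=10.0.113.102::10.0.113.1:255.255.255.0:worker-new:bond0:none bond=bond0:ens3,ens4:mode=active-backup coreos.force_persist_ip
```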

Comment 30 Micah Abbott 2021-06-28 17:04:54 UTC
The associated boot image BZ had its code merged and moved to VERIFIED; moving this to MODIFIED.

Comment 33 Michael Nguyen 2021-07-01 19:24:36 UTC
Verified on registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-07-01-043852, which has RHCOS 48.84.202106091622-0 as the boot image. Using the steps outlined in https://bugzilla.redhat.com/show_bug.cgi?id=1958930#c29, I was able to get the kernel ip arguments to persist while concurrently having the Ignition network configuration laid down.

State: idle
Deployments:
* ostree://457db8ff03dda5b3ce1a8e242fd91ddbe6a82f838d1b0047c3d4aeaf6c53f572
                   Version: 48.84.202106091622-0 (2021-06-09T16:25:42Z)

Comment 35 errata-xmlrpc 2021-07-27 23:07:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

Comment 36 Red Hat Bugzilla 2023-09-15 01:06:19 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

