Bug 1922417 - Issue configuring nodes with VLAN and teaming
Summary: Issue configuring nodes with VLAN and teaming
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.6.z
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.7.z
Assignee: Luca BRUNO
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On: 1917773 1935174
Blocks:
 
Reported: 2021-01-29 17:00 UTC by Frederic Giloux
Modified: 2021-03-31 15:15 UTC (History)
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-30 04:46:29 UTC
Target Upstream Version:
Embargoed:
Flags: fgiloux: needinfo-




Links:
Red Hat Product Errata RHSA-2021:0957 (last updated 2021-03-30 04:47:52 UTC)

Description Frederic Giloux 2021-01-29 17:00:10 UTC
Description of problem:
When adding bare-metal nodes to an existing cluster, the network configuration with teaming and VLAN tagging does not work after reboot.
The same procedure and fcc, with VLAN tagging configured but without teaming, works.

Version-Release number of selected component (if applicable):
RHCOS 4.6

How reproducible:
Always

Steps to Reproduce:
Use a live ISO with the following fcc file:

variant: fcos
version: 1.1.0

# Use FCCT to translate this file into an ignition file:
# $ podman run -i --rm quay.io/coreos/fcct:v0.7.0 --pretty --strict < network-config.fcc > rhcos-install/network-config.ign
# The ignition file can then be embedded into an ISO image with the following command:
# $ sudo podman run --pull=always --privileged --rm \
#  -v ./rhcos-install:/data -w /data \
#  quay.io/coreos/coreos-installer:v0.7.2 \
#  iso ignition embed -i /data/network-config.ign /data/rhcos-46.82.202009222340-0-live.x86_64.iso
# where ./rhcos-install is the directory containing the ignition file and the iso image
# For a trial, a VM can then be created from the new ISO, or a bare-metal server can be installed with it:
# $ sudo virt-install --name ${CLUSTER_NAME}-worker \
#   --mac=52:54:00:72:83:34 \
#   --disk size=50 --ram 8192 --cpu host --vcpus 2 \
#   --os-type linux --os-variant rhel7 \
#   --network network=${VIR_NET} \
#   --network network=${VIR_NET} \
#   --noreboot --noautoconsole \
#   --cdrom /VirtualMachines/rhcos-46.82.202009222340-0-live.x86_64.iso

# The systemd unit just specifies that the shell script below will be executed during installation
systemd:
  units:
    - name: install.service
      enabled: true
      contents: |
        [Unit]
        Description=Run CoreOS Installer
        Requires=coreos-installer-pre.target
        After=coreos-installer-pre.target
        OnFailure=emergency.target
        OnFailureJobMode=replace-irreversibly

        After=network-online.target
        Wants=network-online.target

        [Service]
        Type=oneshot
        ExecStart=/usr/local/bin/run-coreos-installer
        ExecStartPost=/usr/bin/systemctl --no-block reboot
        StandardOutput=kmsg+console
        StandardError=kmsg+console

        [Install]
        RequiredBy=default.target

storage:
  files:
    - path: /etc/NetworkManager/system-connections/team0.nmconnection
      mode: 0600
      contents:
        inline: |
          [connection]
          id=team0
          type=team
          interface-name=team0
          [team]
          config={"device":"team0","runner":{"name":"loadbalance","tx_balancer":{"name":"basic"}},"link_watch":{"name":"ethtool"}}
          [ipv4]
          method=disabled
          [ipv6]
          method=ignore
    - path: /etc/NetworkManager/system-connections/team0-slave-eno1.nmconnection
      mode: 0600
      contents:
        inline: |
          [connection]
          id=team0-slave-eno1
          type=ethernet
          interface-name=eno1
          master=team0
          slave-type=team
          [ipv4]
          method=disabled
          [ipv6]
          method=ignore
    - path: /etc/NetworkManager/system-connections/team0-slave-enp131s0f0.nmconnection
      mode: 0600
      contents:
        inline: |
          [connection]
          id=team0-slave-enp131s0f0
          type=ethernet
          interface-name=enp131s0f0
          master=team0
          slave-type=team
          [ipv4]
          method=disabled
          [ipv6]
          method=ignore
    - path: /etc/NetworkManager/system-connections/team0.56.nmconnection
      mode: 0600
      contents:
        inline: |
          [connection]
          id=team0.56
          type=vlan
          interface-name=team0.56
          [vlan]
          id=56
          parent=team0
          flags=1
          egress-priority-map=
          ingress-priority-map=
          [ipv4]
          address1=10.28.56.251/22,10.28.56.1
          dns=10.186.112.131;10.186.16.35;10.186.106.46
          method=manual
          may-fail=false
          [ipv6]
          method=ignore
    - path: /usr/local/bin/run-coreos-installer
      mode: 0755
      contents:
        inline: |
          #!/usr/bin/bash
          set -x
          main() {
              # Parameters are hardcoded here but could easily be automatically populated
              #ignition_url='http://10.186.56.4/openshift/server.company.com/worker.ign'
              # ignition_url='http://192.168.122.1:1234/install_dir/worker.ign'
              # Alternatively the ignition file could be directly added to the iso
              # ignition_file='/home/core/config.ign'
              #install_device=/dev/disk/by-path/pci-0000:00:11.5-ata-5
              # Some custom arguments for firstboot
              #firstboot_args='console=tty0,rd.neednet=1'
              # firstboot_args='ttyS0,115200n8'
              # Setting root password for debugging purposes <- does not work, the root account is locked
              echo "root:redhat01" | chpasswd
              # In some provisioning environments it can be useful to
              # post some status information to the environment to let
              # it know the install completed successfully. This could be added here
              # Triggers the installation, making use of the network settings that have just been set
              #cmd="coreos-installer install --copy-network"
              #cmd+=" --firstboot-args 'rd.neednet=1'"
              #cmd+=" --firstboot-args=${firstboot_args}"
              #cmd+=" --ignition-url=${ignition_url} --insecure-ignition"
              #cmd+=" ${install_device}"
              #cmd="coreos-installer install --copy-network  --firstboot-args 'rd.neednet=1' --ignition-url=http://10.186.56.4/openshift/domain.company.com/worker.ign --insecure-ignition /dev/disk/by-path/pci-0000:00:11.5-ata-5"
              #echo "Will run command $cmd"
              #sleep 10
              coreos-installer install --copy-network --firstboot-args "rd.neednet=1 coreos.autologin=tty1 console=tty0" --ignition-url=http://10.28.58.125/openshift/domain.company.com/worker.ign --insecure-ignition /dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:0:0
              #if $cmd; then
              #    echo "Install Succeeded!"
              #    eject
              #    return 0
              #else
              #    echo "Install Failed!"
              #    eject
              #    return 1
              #fi
          }
          main


Actual results:
In 1 out of 4 cases the network configuration is successfully applied and the installer is able to download the ignition from the http server.
The machine is then rebooted and tries to download the ignition served by the cluster machine config server, but fails in a perpetual loop.
If a hostname is used in the URL, name resolution fails:
A start job is running for Ignition (fetch) (1h 9min 14s / no limit) [ 4163.605484] ignition[1111]: GET "https://api-int.sandbox-ocp.company.com/config/worker": dial tcp: lookup api-int.sandbox-ocp.company.com on [::1]:56651->[::1]:53: read: connection refused

It also fails if an IP is used instead of the hostname.

In 3 out of 4 cases it seems that the script tries to download the ignition served by the http server before the network is operational. This may be worked around with a sleep.

The same fcc and VLAN configuration without teaming (single interface) works just fine and the node is able to join the cluster.

Expected results:
The network is working after the reboot
The ignition served by the machine config server is downloaded
The node joins the cluster

Additional info:
We have tried multiple variations without success.

Comment 2 Micah Abbott 2021-01-30 17:05:53 UTC
It would be useful to get the full console log of a system that installs successfully and of one that fails. We would need to see any messages from NetworkManager that show the networking being configured and activated.

Additionally, it would be useful to append `rd.break` to the kernel args so that the applied networking configuration can be gathered, i.e. the contents of `nmcli con show <interface>` for all the configured interfaces.
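
For reference, a rough sketch of what could be gathered from that debug shell (interface names taken from the fcc in this report; `journalctl` availability in the initrd shell is an assumption):

```
# List every connection and device NetworkManager knows about
nmcli -f NAME,UUID,TYPE,DEVICE con show
nmcli dev status

# Dump the applied configuration of each interface from the fcc
for c in team0 team0-slave-eno1 team0-slave-enp131s0f0 team0.56; do
    nmcli con show "$c"
done

# NetworkManager messages from the current boot
journalctl -b -u NetworkManager --no-pager
```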

@dornelas Could we get sbr-networking involved with this BZ/case, too?

Comment 3 Dusty Mabe 2021-01-31 19:33:32 UTC
I've only read through this briefly, but it reminds me of some issues I hit a while back with teaming in the initrd.

https://bugzilla.redhat.com/show_bug.cgi?id=1784363

More context in: https://bugzilla.redhat.com/show_bug.cgi?id=1758162#c11

My understanding is that anything at or below 4.6 can't use teaming properly in the initrd, so retrieving the configuration on first boot from https://api-int.sandbox-ocp.company.com/config/worker won't work if you're using teaming. As you've found, it works if you don't use teaming.

4.7 has a newer NetworkManager version, which I think has this issue fixed.

Comment 4 Frederic Giloux 2021-02-01 06:48:35 UTC
Hi Micah, Dusty,

thanks for looking at that over the weekend.

Micah, it is a bit challenging to get console logs over iLO. Plenty of screenshots have been attached to the case; I will see whether I can find something that matches your request, and I will also ask the customer to boot the installer with rd.break as a kernel parameter.

Dusty, this seems to match exactly what my customer is experiencing. I will ask the customer to experiment with setting up a single interface at day 1 and configuring teaming at day 2 through the MCO. That said, it would not be the preferred approach, as the customer is also using static IPs and there is no IPAM with the MCO.

I have looked at RHBA-2020:4499 [1], which fixes the bug you referenced. It ships package NetworkManager-1.26.0-8.el8.x86_64.rpm. The release candidate for 4.7 [2] contains version 1.26.0, release/dist 12.1.rhaos4.7.el8. Could you confirm whether that is more recent than 1.26.0-8, so that we can be sure it has the fix?

[1] https://access.redhat.com/errata/RHBA-2020:4499 
[2] https://releases-rhcos-art.cloud.privileged.psi.redhat.com/contents.html?stream=releases%2Frhcos-4.7&release=47.83.202101301239-0
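
(As a quick self-check, `rpmdev-vercmp` from rpmdevtools can compare the two version-release strings directly; a sketch, assuming rpmdevtools is installed:)

```
$ rpmdev-vercmp 1.26.0-8.el8 1.26.0-12.1.rhaos4.7.el8
1.26.0-8.el8 < 1.26.0-12.1.rhaos4.7.el8
```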

Comment 10 Frederic Giloux 2021-02-02 20:15:18 UTC
Quick summary of the current status:
- the customer tried a 4.7 release candidate but experienced the same issue as with 4.6
- by setting rd.break=initqueue it was possible to get a shell before the never-ending HTTP GET loop for the ignition served by the machine config server; the hardware network interfaces were visible, but there were no network configuration files under /etc/*
- a "sleep 20" had to be added before the coreos-installer command in the fcc script for the first ignition to be pulled reliably (see the sketch below)
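
For the record, the workaround amounts to something like this in the run-coreos-installer script (the install command is the one from the fcc above):

```
# Workaround: give the team/VLAN links time to settle before installing
sleep 20
coreos-installer install --copy-network --firstboot-args "rd.neednet=1 coreos.autologin=tty1 console=tty0" --ignition-url=http://10.28.58.125/openshift/domain.company.com/worker.ign --insecure-ignition /dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:0:0
```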

Comment 11 Dusty Mabe 2021-02-02 20:32:16 UTC
(In reply to Frederic Giloux from comment #10)
> Quick summary of the current status:
> - the customer tried a 4.7 release candidate but experienced the same issue as with 4.6
> - by setting rd.break=initqueue it was possible to get a shell before the never-ending HTTP GET loop for the ignition served by the machine config server; the hardware network interfaces were visible, but there were no network configuration files under /etc/*

In the initrd the files will be under /run/NetworkManager/system-connections/. They later get "propagated" to /etc/NetworkManager/system-connections/ in the real root. When you break in the initrd do you see anything under /run/NetworkManager/system-connections/? 
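
(From the rd.break shell, something like this would confirm it; the keyfile name below is assumed to match the fcc in this report:)

```
# In the initrd, NM keyfiles live under /run, not /etc
ls -l /run/NetworkManager/system-connections/
cat /run/NetworkManager/system-connections/team0.56.nmconnection
```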

> - a "sleep 20" had to be added before the coreos-installer command in the fcc script for the first ignition to be pulled reliably

Comment 12 Frederic Giloux 2021-02-03 07:47:05 UTC
(In reply to Dusty Mabe from comment #11)
> (In reply to Frederic Giloux from comment #10)
> > Quick summary of the current status:
> > - the customer tried a 4.7 release candidate but experienced the same issue as with 4.6
> > - by setting rd.break=initqueue it was possible to get a shell before the never-ending HTTP GET loop for the ignition served by the machine config server; the hardware network interfaces were visible, but there were no network configuration files under /etc/*
> 
> In the initrd the files will be under
> /run/NetworkManager/system-connections/. They later get "propagated" to
> /etc/NetworkManager/system-connections/ in the real root. When you break in
> the initrd do you see anything under
> /run/NetworkManager/system-connections/? 
> 

The customer confirmed that the files are under /run/NetworkManager/system-connections/ and not under /etc/NetworkManager/system-connections/, and that at that point RHCOS is already trying to download the ignition file from the config server.

> > - a "sleep 20" had to be added before the coreos-installer command in the fcc script for the first ignition to be pulled reliably

Comment 18 Micah Abbott 2021-02-07 20:15:02 UTC
Requires additional investigation

Comment 34 Luca BRUNO 2021-02-16 12:19:18 UTC
Thanks, those 4.7 logs contain some good insights. I agree that it looks like something is going wrong with the IP configuration of the `team0.56` interface.

> I am wondering whether the system is not trying to use dhcp because of the `ip=dhcp,dhcp6` parameter.
> Is it possible to workaround it by passing the following arguments to the installer: "--delete-karg ip --append-karg ip=none"?

I agree this looks incorrect, but I don't think it is the real issue.

That parameter is auto-generated as the default when no other `ip=` parameters are provided through the bootloader.
You cannot delete it via `--delete-karg`, but indeed you should be able to avoid it via `--append-karg ip=none`.

While you can quickly experiment with adding `ip=none`, I don't think it is going to improve the situation: the logs show that NetworkManager does not try to perform DHCP auto-configuration anyway.
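
For reference, the experiment would look something like this (a sketch reusing the install command from the description, with the extra karg appended):

```
coreos-installer install --copy-network \
    --append-karg ip=none \
    --firstboot-args "rd.neednet=1 coreos.autologin=tty1 console=tty0" \
    --ignition-url=http://10.28.58.125/openshift/domain.company.com/worker.ign --insecure-ignition \
    /dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:0:0
```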

I'll start digging into this to exclude one possible source of issues, but in the best case this is only a cosmetic bug.

> It seems that the interface configuration, team and vlan are in place but the IP configuration is not applied.

From the logs, it looks like NetworkManager gets confused while handling the team+VLAN interface:


```
[   25.545108] localhost NetworkManager[1177]: <info>  [1613466132.5464] manager: (team0.56): new VLAN device (/org/freedesktop/NetworkManager/Devices/15)
[   25.664463] localhost NetworkManager[1177]: <info>  [1613466132.6791] device (team0.56): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
[   25.664579] localhost kernel: IPv6: ADDRCONF(NETDEV_UP): team0.56: link is not ready
[   25.672021] localhost kernel: IPv6: ADDRCONF(NETDEV_UP): team0.56: link is not ready
...
[   31.742106] localhost.localdomain NetworkManager[1177]: <info>  [1613466138.7141] exiting (success)
...
[   33.996403] localhost.localdomain kernel: igb 0000:83:00.0 enp131s0f0: igb: enp131s0f0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   34.008696] localhost.localdomain kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp131s0f0: link becomes ready
[   34.017031] localhost.localdomain teamd_team0[1243]: enp131s0f0: ethtool-link went up.
[   34.016997] localhost.localdomain kernel: IPv6: ADDRCONF(NETDEV_CHANGE): team0: link becomes ready
[   34.031821] localhost.localdomain systemd-journald[592]: Missed 1 kernel messages
[   34.024844] localhost.localdomain kernel: IPv6: ADDRCONF(NETDEV_CHANGE): team0.56: link becomes ready
[   34.050375] localhost.localdomain kernel: igb 0000:01:00.0 eno1: igb: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   34.061504] localhost.localdomain kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
[   34.069263] localhost.localdomain teamd_team0[1243]: eno1: ethtool-link went up.
```

In particular, NM creates all the virtual team and VLAN interfaces, but it never tries to configure addresses on `team0.56` (neither static nor DHCP), marking it as 'unavailable' instead. It then proceeds and eventually exits.

By the time the interfaces are all marked as up, NetworkManager is no longer running.

Final recap:
 * there is a wrong ip= parameter injected via dracut. I'll start by investigating/eliminating this as a possible source of trouble, but there is a good chance it is only cosmetic.
 * for some reason NM does not seem to perform IP configuration on the upper team+VLAN interface in the initramfs. I'll try to loop in some NM folks to dig into this.

Comment 35 Frederic Giloux 2021-02-16 13:07:41 UTC
@lucab you are right --append-karg ip=none did not help.

Comment 36 Beniamino Galvani 2021-02-17 09:10:29 UTC
At startup NM waits for each interface to get carrier. The default
timeout is 6 seconds. From what I see in the logs, the VLAN is
brought up at 25.664579 and NM waits until 31.742106 for it to get
carrier. Of course, the VLAN gets carrier only when the team gets
it, which happens at 34.016997. The reason it takes so long is that
the ethernet interface is brought down when it is added to the
team, and the link negotiation is probably slow.

Since the VLAN has carrier only after NM has quit, no IP
configuration is performed on it.

In NM 1.26.0-12.1.rhaos4.7.el8 it's not possible to change the
carrier timeout from the dracut command line. This was recently
implemented in [1]. You need to write a configuration file like this:

 # cat /etc/NetworkManager/conf.d/10-carrier-timeout.conf
 [device-10-carrier-timeout]
 match-device=*
 carrier-wait-timeout=30000

to increase the timeout e.g. to 30 seconds.

It would be useful if the customer could provide NM logs with
'rd.debug' on the kernel command line, to confirm what I just said:
without TRACE logs it's a bit difficult to fully understand what
happens. Thanks.

[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/730

Comment 44 Frederic Giloux 2021-03-19 14:36:13 UTC
@lucab my understanding is that there is now a version of NetworkManager that supports carrier-wait-timeout and is consumable by RHCOS. Can we expect it to be available for OpenShift 4.7 before the end of the month?

Comment 45 Luca BRUNO 2021-03-19 15:56:23 UTC
The fix for this is now in NetworkManager 1:1.26.0-14.1.rhaos4.7.el8.
It first landed in RHCOS 47.83.202103181343-0.

I've manually verified on that image that the initrd cmdline translation logic works as expected:

```
# grep -o rd.net.timeout.carrier='[[:alnum:]]*' /proc/cmdline
rd.net.timeout.carrier=30

# cat /run/NetworkManager/conf.d/15-carrier-timeout.conf
[device-15-carrier-timeout]
match-device=*
carrier-wait-timeout=30000
```

This now needs a bootimage bump for OCP 4.7.
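
Once a bootimage with that fix is available, consumers should be able to opt in at install time; a sketch reusing the install command from this report (URL and device as in the description):

```
# Bake the carrier timeout into the installed system's kernel arguments;
# the firstboot initrd then translates it into the carrier-wait-timeout conf shown above
coreos-installer install --copy-network \
    --append-karg rd.net.timeout.carrier=30 \
    --ignition-url=http://10.28.58.125/openshift/domain.company.com/worker.ign --insecure-ignition \
    /dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:0:0
```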

Comment 46 Luca BRUNO 2021-03-19 16:31:08 UTC
We already have a 4.7 bootimage bump in progress at https://bugzilla.redhat.com/show_bug.cgi?id=1935174, so I've added a note there to also include this BZ.

PR is at https://github.com/openshift/installer/pull/4746, but it needs to be refreshed to consume newer artifacts.

Comment 47 Frederic Giloux 2021-03-19 16:35:43 UTC
Thanks a million, Bruno, for progressing this so promptly.

Comment 48 Micah Abbott 2021-03-19 19:09:06 UTC
Setting this back to MODIFIED to allow the bots to attach it to a z-stream errata.

Comment 51 Michael Nguyen 2021-03-26 17:26:33 UTC
Verified that the net carrier timeout option is available and working on rhcos-47.83.202103191543-0.

Booted the live ISO on libvirt and added the kargs `rd.net.timeout.carrier=30`, `rd.break`, and `console=ttyS0`. Once in the emergency shell, the following was shown:

# cat /run/NetworkManager/conf.d/15-carrier-timeout.conf 
[device-15-carrier-timeout]
match-device=*
carrier-wait-timeout=30000

# grep -o rd.net.timeout.carrier='[[:alnum:]]*' /proc/cmdline 
rd.net.timeout.carrier=30

Comment 53 errata-xmlrpc 2021-03-30 04:46:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.4 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0957

