Bug 1789601 - bonding configuration no longer works as documented
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.3.0
Assignee: Micah Abbott
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On: 1791279
Blocks: 1186913 1792022
Reported: 2020-01-09 21:36 UTC by Steve Milner
Modified: 2020-02-01 09:07 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1791279 1792022 (view as bug list)
Environment:
Last Closed: 2020-01-23 11:20:00 UTC
Target Upstream Version:

Links
System ID Priority Status Summary Last Updated
Github coreos ignition-dracut pull 148 None closed dracut: down the bonded interfaces properly (BZ#1789601) 2020-02-18 17:49:55 UTC
Github openshift installer pull 2914 None closed Bug 1789601: rhcos: Bump to 43.81.202001141554.0 2020-02-18 17:49:55 UTC
Red Hat Bugzilla 1758091 None None None 2020-01-09 21:36:06 UTC
Red Hat Bugzilla 1767771 None None None 2020-01-09 21:49:13 UTC
Red Hat Product Errata RHBA-2020:0062 None None None 2020-01-23 11:20:31 UTC

Description Steve Milner 2020-01-09 21:36:07 UTC
Description of problem:
https://bugzilla.redhat.com/show_bug.cgi?id=1758091 added support for bonding interfaces. However, in recent builds it seems that a regression has occurred and the instructions that worked previously no longer have the same result.

How reproducible:

Currently, every time with recent 4.3 builds.

Steps to Reproduce:
1. Use a recent 4.3 RHCOS build
2. Follow instructions at https://bugzilla.redhat.com/show_bug.cgi?id=1758091#c20
3. Note the bond doesn't come up until the machine reboots

Actual results:

The bond is created, but no longer starts up right away. Instead, it starts up after rebooting.


Expected results:

The bond starts up immediately.

Comment 1 Eric Rich 2020-01-09 21:48:56 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1767771 is also connected to this.

Comment 2 Dave Cain 2020-01-09 21:59:28 UTC
Seeing similar results to Steve's with 202001072253.0 this week. Before the December holiday, reboots were not necessary: provisioned machines with bonded interfaces passed to dracut via the kernel command line (/proc/cmdline) functioned normally.

Comment 3 Micah Abbott 2020-01-10 15:30:29 UTC
While we continue to debug this issue, the workaround is to include a systemd unit file in the initial Ignition config that brings up the bond interface on first boot.

```
$ cat bond-up.sh
#!/usr/bin/env bash
nmcli connection up bond0
touch /var/lib/bond-up

$ cat bond-up.service
[Unit]
Before=multi-user.target
After=network-online.target
ConditionPathExists=!/var/lib/bond-up
[Service]
Type=oneshot
ExecStart=/usr/local/bin/bond-up.sh
[Install]
WantedBy=multi-user.target
```

Example Ignition snippet:

```
{
  "ignition": {
    "config": {},
    "security": {
      "tls": {}
    },
    "timeouts": {},
    "version": "2.2.0"
  },
  "networkd": {},
  "passwd": {
    "users": [
      {
        "groups": [
          "sudo",
          "wheel",
          "adm",
          "systemd-journal"
        ],
        "name": "core",
        "passwordHash": "$6$xXXXXXX...",
        "sshAuthorizedKeys": [
          "ssh-rsa AAAAB3NzaC1XXXXX..."
        ]
      }
    ]
  },
  "storage": {
    "files": [
      {
        "contents": {
          "source": "data:text/plain;base64,IyEvdXNyL2Jpbi9lbnYgYmFzaApubWNsaSBjb25uZWN0aW9uIHVwIGJvbmQwCnRvdWNoIC92YXIvbGliL2JvbmQtdXAK"
        },
        "filesystem": "root",
        "mode": 509,
        "path": "/usr/local/bin/bond-up.sh"
      }
    ]
  },
  "systemd": {
    "units": [
      {
        "contents": "[Unit]\nBefore=multi-user.target\nAfter=network-online.target\nConditionPathExists=!/var/lib/bond-up\n[Service]\nType=oneshot\nExecStart=/usr/local/bin/bond-up.sh\n[Install]\nWantedBy=multi-user.target\n",
        "enabled": true,
        "name": "bond-up.service"
      }
    ]
  }
}
```
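
For anyone adapting the workaround, the two less obvious values in the snippet are the `source` data URL (the base64-encoded contents of `bond-up.sh`) and `"mode": 509` (the decimal form of octal 0775, since the 2.x config takes the mode as a decimal integer). A minimal sketch of how these could be derived is below; the helper names `to_data_url` and `octal_to_decimal` are made up for illustration and are not part of any Ignition tooling.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the contents of file $1 as a base64 data URL
# suitable for the "source" field of an Ignition file entry.
to_data_url() {
  # tr strips the line wrapping that some base64 implementations add.
  printf 'data:text/plain;base64,%s\n' "$(base64 < "$1" | tr -d '\n')"
}

# Hypothetical helper: convert an octal file mode such as 0775 to decimal.
# printf treats a numeric argument with a leading 0 as octal.
octal_to_decimal() {
  printf '%d\n' "$1"
}
```

Running `to_data_url bond-up.sh` against the script from the workaround and `octal_to_decimal 0775` should yield the `source` and `mode` values used in the snippet.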

Comment 4 Micah Abbott 2020-01-10 22:49:24 UTC
We think the fix in BZ#1758091 is incomplete.

After numerous tests, we believe we need to enhance the `coreos-teardown-initramfs-network.service` to properly down + remove the bonded interfaces while in the initramfs.

The current code in that service will just skip the `/sys/class/net/bonding_masters` entry and will only do `ip link set bond0 down` as part of the teardown process.

https://github.com/coreos/ignition-dracut/blob/spec2x/dracut/30ignition/coreos-teardown-initramfs-network.sh#L13

However, this still leaves the bonded interface defined under `/sys/class/net/` which seems to confuse NetworkManager in the real root.

If we modify the service to properly remove the bonded interface definition under `/sys/class/net`, NetworkManager in the real root is able to properly start the interface.

See https://github.com/torvalds/linux/blob/master/Documentation/networking/bonding.txt#L1427-L1436 for more info on removing bond configurations.
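
As a rough illustration of the teardown described above (this is a sketch, not the code from the eventual ignition-dracut change), the kernel bonding documentation describes removing a bond by writing `-<bondname>` to `bonding_masters`. The function name `remove_bonds` and the `SYSFS_NET` override are assumptions added here so the logic can be tried against a scratch directory.

```shell
#!/usr/bin/env bash

# Sketch only: down and remove every bond listed in bonding_masters.
# SYSFS_NET is a hypothetical override for testing; it defaults to the real sysfs path.
remove_bonds() {
  local sysfs_net="${SYSFS_NET:-/sys/class/net}"
  local masters="${sysfs_net}/bonding_masters"
  local bonds bond

  # Nothing to do if the bonding module is not loaded.
  [ -f "$masters" ] || return 0

  # bonding_masters holds a space-separated list of bond names.
  read -r -a bonds < "$masters" || true

  for bond in "${bonds[@]}"; do
    ip link set "$bond" down
    # Writing "-<name>" asks the bonding driver to delete that bond.
    echo "-${bond}" > "$masters"
  done
}
```

Calling `remove_bonds` as root in the initramfs would, under these assumptions, leave no `bond0` entry under `/sys/class/net` for NetworkManager to trip over in the real root.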

Comment 17 Steve Milner 2020-01-13 22:30:51 UTC
PR merged

Comment 18 Micah Abbott 2020-01-14 14:18:17 UTC
The PR is included in `ignition-0.34.0-1.rhaos4.3.git92f874c.el8` which in turn is included in RHCOS 43.81.202001140253.0

We'll need to bump the boot images in the installer to properly include this fix.

Comment 20 Steve Milner 2020-01-14 14:44:32 UTC
https://github.com/openshift/installer/pull/2914 bumps the RHCOS boot images in the installer to pick up the fix.

Comment 23 Steve Milner 2020-01-15 15:04:20 UTC
Moving to MODIFIED per request.

Comment 25 Michael Nguyen 2020-01-17 17:49:22 UTC
Verified on 43.81.202001141554.0

The bonded interface comes up after setting it up using kargs.
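
One way to repeat this kind of check without rebooting is to read the interface's `operstate` from sysfs. The sketch below is not the procedure QA used; the function names and the `SYSFS_NET` override are assumptions for illustration, and `bond0` is the interface name from the workaround earlier in this bug.

```shell
#!/usr/bin/env bash

# Sketch only: print the kernel's operational state for interface $1.
# SYSFS_NET is a hypothetical override for testing; it defaults to the real sysfs path.
bond_operstate() {
  local iface="$1"
  local sysfs_net="${SYSFS_NET:-/sys/class/net}"
  cat "${sysfs_net}/${iface}/operstate"
}

# Succeeds only if the interface exists and reports "up".
bond_is_up() {
  [ "$(bond_operstate "$1" 2>/dev/null)" = "up" ]
}
```

For example, `bond_is_up bond0 && echo "bond0 is up"` on a freshly provisioned node would be expected to print the message on a fixed build.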

Comment 26 Micah Abbott 2020-01-21 17:11:50 UTC
Clearing all the NEEDINFOs as we have a fix that has been verified

Comment 28 errata-xmlrpc 2020-01-23 11:20:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

