Bug 2103080 - br-ex not created due to default bond interface having a different mac address than expected
Summary: br-ex not created due to default bond interface having a different mac address than expected
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.10
Hardware: Unspecified
OS: Unspecified
urgent
high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Jaime Caamaño Ruiz
QA Contact: Ross Brattain
URL:
Whiteboard:
Depends On: 2096413
Blocks: 2098626
Reported: 2022-07-01 11:57 UTC by OpenShift BugZilla Robot
Modified: 2022-12-09 17:19 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: For ovn-kubernetes, when br-ex is set up on boot on top of a bond or team interface, the MAC addresses of br-ex and the bond interface might not match.
Consequence: On bare metal and on some virtual platforms such as vSphere, all traffic might be dropped, because the NIC driver drops traffic addressed to the unexpected br-ex MAC address.
Fix: br-ex and the bond interface now use the same MAC address.
Result: Traffic is no longer dropped.
Note that active-backup link aggregation with fail_over_mac is not properly supported, because changes to the bond interface MAC address are not propagated to br-ex.
Clone Of:
Environment:
Last Closed: 2022-08-10 11:19:39 UTC
Target Upstream Version:
Embargoed:



Links
System: Github | ID: openshift machine-config-operator pull 3216 | Private: 0 | Priority: None | Status: Merged | Summary: [release-4.11] Bug 2103080: configure-ovs: set mac only for non fail_over_mac bonds | Last Updated: 2022-07-07 20:13:20 UTC
System: Red Hat Product Errata | ID: RHSA-2022:5069 | Private: 0 | Priority: None | Status: None | Summary: None | Last Updated: 2022-08-10 11:20:05 UTC

Comment 2 Ross Brattain 2022-07-06 05:54:32 UTC
4.11.0-0.ci.test-2022-07-05-144909-ci-ln-ndm4ggb-latest

UPI vSphere static IP fail_over_mac=0


bond0 and br-ex keep working after the primary slave is disconnected.

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000\    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens192: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq master bond0 state DOWN mode DEFAULT group default qlen 1000\    link/ether 00:50:56:ac:ec:2b brd ff:ff:ff:ff:ff:ff
3: ens224: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000\    link/ether 00:50:56:ac:ec:2b brd ff:ff:ff:ff:ff:ff permaddr 00:50:56:ac:13:23
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\    link/ether 6e:49:84:3a:fc:34 brd ff:ff:ff:ff:ff:ff
5: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN mode DEFAULT group default qlen 1000\    link/ether d2:e3:07:3c:56:0a brd ff:ff:ff:ff:ff:ff
6: ovn-k8s-mp0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000\    link/ether e2:81:ae:b7:cc:22 brd ff:ff:ff:ff:ff:ff
7: br-int: <BROADCAST,MULTICAST> mtu 1400 qdisc noop state DOWN mode DEFAULT group default qlen 1000\    link/ether 1a:6a:e5:20:8a:95 brd ff:ff:ff:ff:ff:ff
8: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000\    link/ether 00:50:56:ac:ec:2b brd ff:ff:ff:ff:ff:ff
11: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000\    link/ether 00:50:56:ac:ec:2b brd ff:ff:ff:ff:ff:ff
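The ip output above shows bond0 and br-ex sharing 00:50:56:ac:ec:2b, which is the property this bug is about. A minimal sketch of checking that on a node by comparing the per-interface address files in sysfs (/sys/class/net/<iface>/address is a standard kernel interface). The function name check_mac_match and the SYSFS_NET override variable are hypothetical, added only so the check can be pointed at a fake directory for testing; this is not part of configure-ovs.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from configure-ovs.sh): compare the current MAC
# of a bond and of br-ex. SYSFS_NET defaults to the real sysfs location
# but can be overridden, e.g. to point at a fake tree in a test.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Usage: check_mac_match [bond] [bridge]
# Returns 0 on match, 1 on mismatch, 2 if an address cannot be read.
check_mac_match() {
  local bond="${1:-bond0}" bridge="${2:-br-ex}"
  local bond_mac bridge_mac
  # Each interface exposes its current MAC in <sysfs>/<iface>/address.
  bond_mac=$(cat "${SYSFS_NET}/${bond}/address") || return 2
  bridge_mac=$(cat "${SYSFS_NET}/${bridge}/address") || return 2
  # Compare case-insensitively (bash 4+ ${var,,} lowercasing).
  if [[ "${bond_mac,,}" == "${bridge_mac,,}" ]]; then
    echo "OK: ${bond} and ${bridge} both use ${bond_mac}"
    return 0
  fi
  echo "MISMATCH: ${bond}=${bond_mac} ${bridge}=${bridge_mac}"
  return 1
}
```

On a node, sourcing the file and running `check_mac_match bond0 br-ex` before and after disconnecting the primary slave gives a quick pass/fail for the failover test described in this comment.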



RHEL 8 vSphere DHCP fail_over_mac=0

bond0 and br-ex keep working after the primary slave is disconnected.
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000\    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens192: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq master bond0 state DOWN mode DEFAULT group default qlen 1000\    link/ether 00:50:56:ac:ef:2f brd ff:ff:ff:ff:ff:ff
3: ens224: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000\    link/ether 00:50:56:ac:ef:2f brd ff:ff:ff:ff:ff:ff permaddr 00:50:56:ac:f3:83
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000\    link/ether be:b4:ee:23:54:9d brd ff:ff:ff:ff:ff:ff
5: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN mode DEFAULT group default qlen 1000\    link/ether 9e:00:17:17:cb:92 brd ff:ff:ff:ff:ff:ff
6: ovn-k8s-mp0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000\    link/ether 96:cd:68:e6:42:a8 brd ff:ff:ff:ff:ff:ff
7: br-int: <BROADCAST,MULTICAST> mtu 1400 qdisc noop state DOWN mode DEFAULT group default qlen 1000\    link/ether b6:29:74:25:56:ab brd ff:ff:ff:ff:ff:ff
10: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000\    link/ether 00:50:56:ac:ef:2f brd ff:ff:ff:ff:ff:ff
11: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000\    link/ether 00:50:56:ac:ef:2f brd ff:ff:ff:ff:ff:ff
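In both outputs above, ens192 reports NO-CARRIER while ens224 stays UP, so the bond has failed over. The Linux bonding driver exposes the resulting state under /sys/class/net/<bond>/bonding/ (mode, fail_over_mac, primary, active_slave). A minimal sketch that prints those fields, assuming the bond is named bond0; the function name bond_failover_state and the SYSFS_NET override are hypothetical and only exist to make the helper testable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize the active-backup state of a bond from sysfs.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Usage: bond_failover_state [bond]
# Returns 2 if the interface has no bonding directory.
bond_failover_state() {
  local bond="${1:-bond0}"
  local dir="${SYSFS_NET}/${bond}/bonding"
  [[ -d "$dir" ]] || { echo "${bond} is not a bond" >&2; return 2; }
  local mode fom active primary
  # mode and fail_over_mac are reported as "<name> <number>",
  # e.g. "active-backup 1" or "none 0"; keep only the name.
  read -r mode _ < "${dir}/mode"
  read -r fom _ < "${dir}/fail_over_mac"
  # active_slave and primary hold an interface name and may be empty.
  read -r active < "${dir}/active_slave" || true
  read -r primary < "${dir}/primary" || true
  echo "mode=${mode} fail_over_mac=${fom} primary=${primary:-none} active=${active:-none}"
}
```

Reading fail_over_mac from the kernel rather than from the NetworkManager profile confirms which policy was actually applied, which matters for the fail_over_mac=1 case discussed below.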


RHEL 8 vSphere DHCP fail_over_mac=1, changing the bond0 MAC on vmxnet3 doesn't work as expected.


fail_over_mac=1 parsing.

Jul 05 19:51:08 rhel-0 configure-ovs.sh[1842]: ++ nmcli --get-values bond.options conn show d958e36e-ed47-3f4d-8a8c-4c4b5ba8e3f8
Jul 05 19:51:08 rhel-0 configure-ovs.sh[1555]: + bond_opts=mode=active-backup,fail_over_mac=1,miimon=100,primary=ens192
Jul 05 19:51:08 rhel-0 configure-ovs.sh[1555]: + '[' -n mode=active-backup,fail_over_mac=1,miimon=100,primary=ens192 ']'
Jul 05 19:51:08 rhel-0 configure-ovs.sh[1555]: + extra_phys_args+=(bond.options "${bond_opts}")
Jul 05 19:51:08 rhel-0 configure-ovs.sh[1555]: + MODE_REGEX='(^|,)mode=active-backup(,|$)'
Jul 05 19:51:08 rhel-0 configure-ovs.sh[1555]: + MAC_REGEX='(^|,)fail_over_mac=(1|active|2|follow)(,|$)'
Jul 05 19:51:08 rhel-0 configure-ovs.sh[1555]: + [[ mode=active-backup,fail_over_mac=1,miimon=100,primary=ens192 =~ (^|,)mode=active-backup(,|$) ]]
Jul 05 19:51:08 rhel-0 configure-ovs.sh[1555]: + [[ mode=active-backup,fail_over_mac=1,miimon=100,primary=ens192 =~ (^|,)fail_over_mac=(1|active|2|follow)(,|$) ]]
Jul 05 19:51:08 rhel-0 configure-ovs.sh[1555]: + clone_mac=0
Jul 05 19:51:08 rhel-0 configure-ovs.sh[1555]: + '[' '!' 0 = 0 ']'
Jul 05 19:51:08 rhel-0 configure-ovs.sh[1555]: + nmcli connection show ovs-if-phys0

Comment 7 Jaime Caamaño Ruiz 2022-07-06 12:33:15 UTC
(In reply to Ross Brattain from comment #2)
> RHEL 8 vSphere DHCP fail_over_mac=1, changing the bond0 MAC on vmxnet3
> doesn't work as expected.
>
> fail_over_mac=1 parsing.

Not sure I understand this one. The parsing is working as expected, and with fail_over_mac=1 we don't intend to clone the MAC address to bond0, right?
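The decision visible in the configure-ovs.sh trace from comment #2 can be sketched in isolation. This is a reconstruction built only from the MODE_REGEX and MAC_REGEX values and the clone_mac=0 result printed in that trace, plus the linked PR title ("set mac only for non fail_over_mac bonds"); it is not the actual script source. The function name clone_mac_for_opts and the default of clone_mac=1 for all other bonds are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of the MAC-cloning check seen in the trace.
# Both regexes are copied verbatim from the configure-ovs.sh xtrace output.
MODE_REGEX='(^|,)mode=active-backup(,|$)'
MAC_REGEX='(^|,)fail_over_mac=(1|active|2|follow)(,|$)'

# Usage: clone_mac_for_opts "<bond.options string>"
# Prints 1 if the bond MAC should be cloned (assumed default), 0 otherwise.
clone_mac_for_opts() {
  local bond_opts="$1"
  local clone_mac=1
  # Per the PR title, active-backup bonds with a fail_over_mac policy of
  # active (1) or follow (2) are excluded from MAC cloning.
  # The regex variables are left unquoted so [[ =~ ]] treats them as regexes.
  if [[ $bond_opts =~ $MODE_REGEX ]] && [[ $bond_opts =~ $MAC_REGEX ]]; then
    clone_mac=0
  fi
  echo "$clone_mac"
}
```

On a node, the options string can be taken from the same call that appears in the trace, `nmcli --get-values bond.options conn show <uuid>`; for the comment #2 options (mode=active-backup,fail_over_mac=1,...) the sketch yields 0, matching the logged clone_mac=0.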

Comment 11 Ross Brattain 2022-07-06 15:16:22 UTC
Gathering NetworkManager (NM) logs with:

nmcli g logging level trace
echo -e '[logging]\nlevel=TRACE\ndomains=ALL' > /etc/NetworkManager/conf.d/logging.conf

Comment 12 Ross Brattain 2022-07-06 22:05:32 UTC
We think the IPI bond0 issue from comment 5 is an NM issue. I tested rebooting with NM log level trace a few times and I don't think it has reproduced, but I will file an NM BZ with the logs.

The vSphere fail_over_mac=1 case, where the bond0 MAC differs from the br-ex MAC, is a new test configuration that is not necessarily expected to work with these fixes. I will file a separate BZ for it.


Calling this Verified on 4.11.0-0.ci.test-2022-07-05-144909-ci-ln-ndm4ggb-latest

Comment 14 errata-xmlrpc 2022-08-10 11:19:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

