Bug 1860479
Summary: | Unable to attach VLAN-based logical networks to a bond | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Mark R. <rhbugzilla> | |
Component: | kernel | Assignee: | Jonathan Toppins <jtoppins> | |
kernel sub component: | NIC Drivers | QA Contact: | LiLiang <liali> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | high | |||
Priority: | unspecified | CC: | amusil, anantha.subramanyam, ashutosh.kumar, astupnik, atragler, bgalvani, bugs, cgaynor, dholler, dhoward, forestia, gcase, gconsalv, goutham-konaghatta.vijayakumar, ivecera, jbainbri, jbenc, jcastran, jiji, jtoppins, kzhang, liali, linville, markus.falb, mchan, network-qe, pelauter, ptalbert, rmetrich, simon, tredaelli, vasundhara-v.volam | |
Version: | 8.2 | Keywords: | ZStream | |
Target Milestone: | rc | |||
Target Release: | 8.4 | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | kernel-4.18.0-240.6.el8 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1886017 (view as bug list) | Environment: | ||
Last Closed: | 2021-05-18 13:54:36 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | Network | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1886017, 1906080 | |||
Attachments: |
Description
Mark R.
2020-07-24 18:09:14 UTC
Created attachment 1702376 [details]
Output from NetworkManager during attempt
Hi,

would it be possible for you to run the following commands before step 2:

  nmcli general logging level TRACE
  ip -ts monitor link > ip-link.txt

Then, after the failure, stop the monitoring with Control-C, attach the ip-link.txt file and the output of 'journalctl --since="5 minutes ago"'.

Thank you.

Created attachment 1702671 [details]
output of 'ip monitor link' during attempt to attach network
Created attachment 1702672 [details]
output of 'journalctl' for the 5 minute window around network addition
Both requested logs attached, happy to help pursue the issue however I can. Thanks!
The problem seems related to the failure to add the VLAN to the bridge reported by kernel:

  NetworkManager[2401]: <debug> [1595947746.3755] platform: (bond0.22) link: enslaving to master 'DMZ'
  kernel: DMZ: port 1(bond0.22) entered blocking state
  kernel: DMZ: port 1(bond0.22) entered disabled state
  NetworkManager[2401]: <debug> [1595947746.3758] platform-linux: do-request-link: 30
  NetworkManager[2401]: <trace> [1595947746.3758] platform-linux: event-notification: RTM_NEWLINK, flags 0, seq 0: 30: bond0.22@14 <UP,LOWER_UP;broadcast,multicast,up,running,lowerup> mtu 1500 master 29 arp 1 vlan* not-init addrgenmode none addr BC:97:E1:24:BA:60 brd FF:FF:FF:FF:FF:FF rx:0,0 tx:0,0; vlan 22 flags 0x1
  NetworkManager[2401]: <debug> [1595947746.3758] platform: (bond0.22) signal: link changed: 30: bond0.22@14 <UP,LOWER_UP;broadcast,multicast,up,running,lowerup> mtu 1500 master 29 arp 1 vlan* init addrgenmode none addr BC:97:E1:24:BA:60 brd FF:FF:FF:FF:FF:FF driver vlan rx:0,0 tx:0,0
  NetworkManager[2401]: <debug> [1595947746.3758] device[e22d004ce19669d6] (bond0.22): queued link change for ifindex 30
  NetworkManager[2401]: <trace> [1595947746.3758] platform-linux: event-notification: RTM_NEWLINK, flags 0, seq 0: 30: bond0.22@14 <UP,LOWER_UP;broadcast,multicast,up,running,lowerup> mtu 1500 arp 1 vlan* not-init addrgenmode none addr BC:97:E1:24:BA:60 brd FF:FF:FF:FF:FF:FF rx:0,0 tx:0,0; vlan 22 flags 0x1
  NetworkManager[2401]: <debug> [1595947746.3759] platform: (bond0.22) signal: link changed: 30: bond0.22@14 <UP,LOWER_UP;broadcast,multicast,up,running,lowerup> mtu 1500 arp 1 vlan* init addrgenmode none addr BC:97:E1:24:BA:60 brd FF:FF:FF:FF:FF:FF driver vlan rx:0,0 tx:0,0
  NetworkManager[2401]: <debug> [1595947746.3759] platform-linux: netlink: recvmsg: error message from kernel: No data available (61) for request 526

ENODATA seems an unusual error code, I don't understand where it comes from.
Also, looking at the iproute output, bond0.22 is added to the bridge for less than 1ms and then removed immediately:

  [2020-07-28T14:49:06.463305] 31: bond0.22@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default link/ether bc:97:e1:24:ba:60 brd ff:ff:ff:ff:ff:ff
  [2020-07-28T14:49:06.494630] 31: bond0.22@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master DMZ state UNKNOWN group default link/ether bc:97:e1:24:ba:60 brd ff:ff:ff:ff:ff:ff
  [2020-07-28T14:49:06.495180] 31: bond0.22@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default link/ether bc:97:e1:24:ba:60 brd ff:ff:ff:ff:ff:ff

Could you please try again after running these commands:

  nmcli general logging level TRACE
  echo 'file net/bridge/* +p' > /sys/kernel/debug/dynamic_debug/control
  echo 'file net/8021q/* +p' > /sys/kernel/debug/dynamic_debug/control

The last two commands should add more information to the kernel output in the journal. Then, attach the journal as usual.

I wonder if the same error happens when configuring interfaces manually. Could you please try these commands starting from a clean configuration (no bridges, vlan, bonds) and attach the output?:

  DEV1=eno33np0
  DEV2=ens2f0np0
  ip link add ovirtmgmt type bridge
  ip link add DMZ0 type bridge
  ip link add bond0 type bond mode 802.3ad
  ip link add link bond0 bond0.22 type vlan id 22
  ip link set ovirtmgmt up
  ip link set DMZ0 up
  ip link set bond0 up
  ip link set bond0.22 up
  ip link set $DEV1 down
  ip link set $DEV2 down
  ip link set $DEV1 master bond0
  ip link set $DEV2 master bond0
  ip link set $DEV1 up
  ip link set $DEV2 up
  ip link set bond0 master ovirtmgmt
  ip link set bond0.22 master DMZ0
  echo
  ip link

Thank you.

Created attachment 1702711 [details]
Output from journalctl after adding some further debugging, then attaching network to bond
Here's the journalctl output with enhanced debugging. I'll send along the results of the manual interface setup soon.
Created attachment 1702718 [details]
Manually configuring the interfaces
I dropped all of the requested commands into a script. After destroying all bridges/bonds and then running the script, the bond wasn't up/ready for a second or so, but did settle and I was then able to throw an IP address onto ovirtmgmt and had all normal network connectivity. Brought up the hosted-engine and could access it and tcpdump showed expected broadcast-type traffic arriving on bond0.22.
Thanks for the additional information. Unfortunately there are no useful messages from kernel. We'll need to add some probes using perf to see why the bridge port addition fails. Can you try these commands?

  dnf --enablerepo=base-debuginfo install kernel-debuginfo-common kernel-debuginfo perf -y
  perf probe -m bridge --add 'br_add_if%return ret=$retval'
  perf probe -m bridge --add 'br_sysfs_addif%return ret=$retval'
  perf probe --add 'netdev_rx_handler_register%return ret=$retval'
  perf probe --add 'netdev_master_upper_dev_link%return ret=$retval'
  perf probe -m bridge --add 'nbp_switchdev_mark_set%return ret=$retval'
  perf probe -m bridge --add 'nbp_vlan_init%return ret=$retval'
  perf record -e probe:br_add_if__return,probe:br_sysfs_addif__return,probe:netdev_rx_handler_register__return,probe:netdev_master_upper_dev_link__return,probe:nbp_switchdev_mark_set__return,probe:nbp_vlan_init__return -aR sleep 300
  # now from another console configure the network with nmstate/NM and wait it finishes
  # then interrupt perf recording with Ctrl-C and attach the output of 'perf script'

Thank you.

In the instructions above, note that "perf record -e ... -aR sleep 300" is all on the same line.

Hello Beniamino,

Can you just clarify this for me, "from another console configure the network with nmstate/NM and wait it finishes"... is the hope here to capture the same failure seen when modifying the network via oVirt? When the configuration is done from the command line it works, so once I have completed the 'perf record ....' step should I attempt the changes in oVirt (hitting the issue) or command line via 'ip' command?

Thanks,
Mark

> is the hope here to capture the same failure seen when modifying the network via oVirt?

Yes, the idea is to capture why the configuration fails.

> should I attempt the changes in oVirt (hitting the issue)

Yes, please do the config through oVirt.

Created attachment 1702829 [details]
Output of 'perf script' after attempting changes
OK, I suspected that was the case, just wanted to be sure. I've attached the results.
Ok, as suspected it's nbp_switchdev_mark_set() returning -ENODATA:

  NetworkManager 2395 [014] 683.265983: probe:nbp_switchdev_mark_set__return: (ffffffffc027e980 <- ffffffffc026c820) ret=0xffffffc3
  NetworkManager 2395 [015] 683.266147: probe:br_add_if__return: (ffffffffc026c570 <- ffffffff99b292a6) ret=0xffffffc3

I don't know why the failure happens only when using NetworkManager and not with iproute; however it looks like a kernel issue and as such I think it should be reassigned to kernel to be investigated.

Created attachment 1702997 [details]
Same failure using nmstatectl directly
I agree that this doesn't appear to be oVirt/VDSM, because using 'nmstatectl' directly on the host with a json file that has the desired configuration (adding VLAN 22 to the bond and creating a 'LegacyDMZ' bridge for it) fails in exactly the same way. I've attached the output of that attempt. So, NetworkManager or kernel issue, is there further info I can get from this host to help diagnose?
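For readers following the perf output earlier in the thread: the ret=0xffffffc3 value is a negative errno shown as an unsigned 32-bit number. The following is only a minimal Python sketch of that decoding. The helper name decode_retval is hypothetical, and the errno name lookup assumes a Linux host, where 61 is ENODATA (the same number NetworkManager logged as "No data available (61)").

```python
import errno


def decode_retval(raw):
    """Interpret an unsigned 32-bit return value from perf as a signed int.

    Returns (signed_value, errno_name); errno_name is None when the value
    is not negative or the platform has no name for that number.
    """
    value = raw - (1 << 32) if raw & 0x80000000 else raw
    name = errno.errorcode.get(-value) if value < 0 else None
    return value, name


if __name__ == "__main__":
    # On Linux the second element is expected to be the ENODATA name.
    print(decode_retval(0xFFFFFFC3))
```

The signed value is -61 regardless of platform; only the symbolic name depends on the host's errno table.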
Created attachment 1703004 [details]
Whoops, the issue _is_ triggered with just the iproute2 utils, no NM involved...
I have to apologize for bad info on a previous post, you asked me to start from a clean slate and use the 'ip' command to create the network configuration manually. I removed bonds, bridges, vlans, etc. with 'ip link delete' but that must not have cleaned up as much as I thought.
After completely reloading the host again, just CentOS 8.2.2004 minimal and never having any bonds/bridges/vlans created, stepping through the 'ip' commands you requested _does_ indeed trigger the same issue. You can't attach a bond to a bridge, it even fails just attaching bond0 to ovirtmgmt (I kept the name for consistency, but oVirt isn't installed here).
Output is attached.
This is repeatable from the boot media for me as well. So the steps to reproduce seem to be:

1. Boot 8.2.2004 installation media
2. Switch to tty2 for a shell
3. Verify no interfaces are configured at all
4. Issue these commands, creating a bridge and a bond and attempting to attach bond to bridge:

  ip link add mybridge type bridge
  ip link add bond0 type bond mode 802.3ad
  ip link set mybridge up
  ip link set bond0 up
  ip link set eno33np0 down
  ip link set ens2f0np0 down
  ip link set eno33np0 master bond0
  ip link set ens2f0np0 master bond0
  ip link set eno33np0 up
  ip link set ens2f0np0 up
  ip link set bond0 master mybridge
  RTNETLINK answers: No data available

Booting with either the 7.8.2003 or 8.1.1911 installation media, the above steps work and you end up with the desired network configuration.

Could this be driver specific? I can do the steps above, booting 8.2.2004 install media. Create a bond from the 10/25Gb interfaces as bond0 as in the steps above. I create a second bond as bond1, also 802.3ad, using a pair of 1Gb interfaces that aren't currently connected. With both bonds up, interfaces assigned, and bridge 'mybridge' created:

  ip link set bond1 master mybridge
  # This works just fine (tg3)
  ip link set bond1 nomaster
  ip link set bond0 master mybridge
  RTNETLINK answers: No data available
  # Failure when using the 10/25Gb interfaces (bnxt_en)

If I remove *either* of the bnxt_en interfaces from bond0, I can then attach/detach it from bridges at will. Once it's attached, I can re-add the removed interface and it continues to function. However, 'ip link set bond0 nomaster ; ip link set bond0 master mybridge' fails. Basically it seems if a bond has more than one bnxt_en interface, it can't be attached to a bridge, but will attach if it has only one bnxt_en, and will even attach if I use one bnxt_en and one tg3 interface (of course, that can't really succeed in LACP mode because of the mismatched speeds, but I tried it as a test and it still let me put that bond onto a bridge).
Further experimenting shows this also seems to require using a bnxt_en interface from two different cards in the bond to get the failure. If I create the bond from both interfaces of a single card (using either eno33np0 + eno34np1, or ens2f0np0 + ens2f1np1) I can assign the bond to a bridge w/o an error. Using one interface from each card triggers it every time. The cards are: 63:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01) 63:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01) a1:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01) a1:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01) The bnxt_en module reports version 1.10.0, which matches the version from 8.1 and 7.8. I suppose that may squash the idea of it being a driver issue. Hi Mark, I *think* "CentOS 8.2.2004" corresponds to RHEL 8.2 so I'm assuming the issue is happening with some 4.18.0-193.el8 series kernel? If so, can you try installing an older 4.18.0-147.el8 kernel and testing with it? RHEL 8.2 includes a pretty big overhaul of switchdev which seems relevant here. For the bond and bnxt interfaces can you provide the output of: $ cat /sys/class/net/<device>/phys_switch_id Then also: # devlink dev show # devlink dev eswitch show # devlink port show nbp_switchdev_mark_set() is part of bridge forward mark support: https://www.kernel.org/doc/html/latest/networking/switchdev.html#switch-id https://www.kernel.org/doc/html/latest/networking/switchdev.html#flooding-l2-domain - When a device is added to a bridge, br_add_if() calls nbp_switchdev_mark_set(). If the call returns anything but 0 (zero) it unlinks the device from the bridge. 
498 /* called with RTNL */ 499 int br_add_if(struct net_bridge *br, struct net_device *dev, 500 struct netlink_ext_ack *extack) 501 { 502 struct net_bridge_port *p; 503 int err = 0; 504 unsigned br_hr, dev_hr; 505 bool changed_addr; ..... >570 err = nbp_switchdev_mark_set(p); >571 if (err) >572 goto err6; ..... >631 err6: >632 netdev_upper_dev_unlink(dev, br->dev); 633 err5: 634 dev->priv_flags &= ~IFF_BRIDGE_PORT; 635 netdev_rx_handler_unregister(dev); 636 err4: 637 br_netpoll_disable(p); 638 err3: 639 sysfs_remove_link(br->ifobj, p->dev->name); 640 err2: 641 kobject_put(&p->kobj); 642 p = NULL; /* kobject_put frees */ 643 err1: 644 dev_set_allmulti(dev, -1); 645 put_back: 646 dev_put(dev); 647 kfree(p); 648 return err; 649 } In this case we have a bond device being added to a bridge so the net_bridge_port struct passed into nbp_switchdev_mark_set() by br_add_if() is linked to the bond's net_device. Note that with the logic here in any release, there are three basic possibilities: 1. The passed in device itself (the bond) triggers an -EOPNOTSUPP (which is ignored) and we move on without issue. 2. The passed in device itself (the bond) somehow provides a Switch ID. As far as I can tell the bonding driver can't do this. 3. The Switch IDs of the bond's lower devs are recursively probed. If at any point it is found that the Switch ID of one lower dev does not match another then -ENODATA is returned. Otherwise either the common Switch ID or -EOPNOTSUPP (which is ignored) is returned. For the bond case we know? we're going to fall into #3 and be checking the lower dev Switch IDs. Prior to RHEL 8.1 (including all of RHEL7), nbp_switchdev_mark_set() would use a call to switchdev_port_attr_get() to get the Switch ID for the given port (netdev). In RHEL 8.1, nbp_switchdev_mark_set() is updated to first check whether the netdev has a ndo_get_port_parent_id() handler set and if so, use that. 
But bnxt in RHEL 8.1 does not set up such a handler so we fall back to calling switchdev_port_attr_get() and the attempt to retrieve a Switch ID is performed in an identical way as to 8.0. In RHEL 8.2, further backports change the logic of nbp_switchdev_mark_set() to no longer use switchdev_port_attr_get()... Keep in mind that for all of these releases, if -EOPNOTSUPP is returned back to nbp_switchdev_mark_set() then it is ignored and the function returns 0 back to br_add_if(). Here in RHEL 8.0 (kernel-4.18.0-80.el8) we can see that nbp_switchdev_mark_set() calls switchdev_port_attr_get() at line 34: 24 int nbp_switchdev_mark_set(struct net_bridge_port *p) 25 { 26 struct switchdev_attr attr = { 27 .orig_dev = p->dev, 28 .id = SWITCHDEV_ATTR_ID_PORT_PARENT_ID, 29 }; 30 int err; 31 32 ASSERT_RTNL(); 33 >34 err = switchdev_port_attr_get(p->dev, &attr); 35 if (err) { 36 if (err == -EOPNOTSUPP) 37 return 0; 38 return err; 39 } 40 41 p->offload_fwd_mark = br_switchdev_mark_get(p->br, p->dev); 42 43 return 0; 44 } - In switchdev_port_attr_get(), if the netdev's switchdev_ops switchdev_port_attr_get function pointer is not NULL then it is invoked and the result returned back to the caller: 177 /** 178 * switchdev_port_attr_get - Get port attribute 179 * 180 * @dev: port device 181 * @attr: attribute to get 182 */ 183 int switchdev_port_attr_get(struct net_device *dev, struct switchdev_attr *attr) 184 { 185 const struct switchdev_ops *ops = dev->switchdev_ops; 186 struct net_device *lower_dev; 187 struct list_head *iter; 188 struct switchdev_attr first = { 189 .id = SWITCHDEV_ATTR_ID_UNDEFINED 190 }; 191 int err = -EOPNOTSUPP; 192 >193 if (ops && ops->switchdev_port_attr_get) >194 return ops->switchdev_port_attr_get(dev, attr); ..... - For bnxt in RHEL8.0, switchdev_port_attr_get() is set and points to bnxt_swdev_port_attr_get(). 
It simply calls bnxt_port_attr_get(): 8358 static const struct switchdev_ops bnxt_switchdev_ops = { 8359 .switchdev_port_attr_get = bnxt_swdev_port_attr_get 8360 }; 8352 static int bnxt_swdev_port_attr_get(struct net_device *dev, 8353 struct switchdev_attr *attr) 8354 { 8355 return bnxt_port_attr_get(netdev_priv(dev), attr); 8356 } - Basically, bnxt_port_attr_get() is either going to copy the switch ID into the passed-in switchdev_attr struct OR return -EOPNOTSUPP: 8332 int bnxt_port_attr_get(struct bnxt *bp, struct switchdev_attr *attr) 8333 { 8334 if (bp->eswitch_mode != DEVLINK_ESWITCH_MODE_SWITCHDEV) 8335 return -EOPNOTSUPP; 8336 8337 /* The PF and it's VF-reps only support the switchdev framework */ 8338 if (!BNXT_PF(bp)) 8339 return -EOPNOTSUPP; 8340 8341 switch (attr->id) { 8342 case SWITCHDEV_ATTR_ID_PORT_PARENT_ID: 8343 attr->u.ppid.id_len = sizeof(bp->switch_id); 8344 memcpy(attr->u.ppid.id, bp->switch_id, attr->u.ppid.id_len); 8345 break; 8346 default: 8347 return -EOPNOTSUPP; 8348 } 8349 return 0; 8350 } In RHEL 8.1, nbp_switchdev_mark_set() does not just call switchdev_port_attr_get(). Instead, it checks if the netdev's net_device_ops ndo_get_port_parent_id function pointer member is set and if so, uses that. Otherwise, it falls back to calling switchdev_port_attr_get(). However, in RHEL 8.1 the bnxt driver does not have a handler for ndo_get_port_parent_id registered so the basic logic flow is the same as with 8.0. Further, how bnxt handles switchdev_port_attr_get is identical to 8.0. 
25 int nbp_switchdev_mark_set(struct net_bridge_port *p) 26 { 27 const struct net_device_ops *ops = p->dev->netdev_ops; 28 struct switchdev_attr attr = { 29 .orig_dev = p->dev, 30 .id = SWITCHDEV_ATTR_ID_PORT_PARENT_ID, 31 }; 32 int err; 33 34 ASSERT_RTNL(); 35 36 if (ops->ndo_get_port_parent_id) 37 err = dev_get_port_parent_id(p->dev, &attr.u.ppid, true); 38 else 39 err = switchdev_port_attr_get(p->dev, &attr); 40 if (err) { 41 if (err == -EOPNOTSUPP) 42 return 0; 43 return err; 44 } 45 46 p->offload_fwd_mark = br_switchdev_mark_get(p->br, p->dev); 47 48 return 0; 49 } Note then that in RHEL 8.0 & RHEL 8.1 with a bnxt PF there is no way for the above logic to result in -ENODATA bubbling back up to br_add_if(). In RHEL 8.2, nbp_switchdev_mark_set() only uses dev_get_port_parent_id(): 24 int nbp_switchdev_mark_set(struct net_bridge_port *p) 25 { 26 struct netdev_phys_item_id ppid = { }; 27 int err; 28 29 ASSERT_RTNL(); 30 31 err = dev_get_port_parent_id(p->dev, &ppid, true); 32 if (err) { 33 if (err == -EOPNOTSUPP) 34 return 0; 35 return err; 36 } 37 38 p->offload_fwd_mark = br_switchdev_mark_get(p->br, p->dev); 39 40 return 0; 41 } - Now dev_get_port_parent_id() itself checks the dev's net_device_ops for ndo_get_port_parent_id and uses it if it is set up. Next it will try devlink_compat_switch_id_get(). 
Otherwise it roughly integrates the same logic from the bottom of the switchdev_port_attr_get() function: 7635 /** 7636 * dev_get_port_parent_id - Get the device's port parent identifier 7637 * @dev: network device 7638 * @ppid: pointer to a storage for the port's parent identifier 7639 * @recurse: allow/disallow recursion to lower devices 7640 * 7641 * Get the devices's port parent identifier 7642 */ 7643 int dev_get_port_parent_id(struct net_device *dev, 7644 struct netdev_phys_item_id *ppid, 7645 bool recurse) 7646 { 7647 const struct net_device_ops *ops = dev->netdev_ops; 7648 struct netdev_phys_item_id first = { }; 7649 struct net_device *lower_dev; 7650 struct list_head *iter; 7651 int err; 7652 7653 if (ops->ndo_get_port_parent_id) { 7654 err = ops->ndo_get_port_parent_id(dev, ppid); 7655 if (err != -EOPNOTSUPP) 7656 return err; 7657 } 7658 7659 err = devlink_compat_switch_id_get(dev, ppid); 7660 if (!err || err != -EOPNOTSUPP) 7661 return err; 7662 7663 if (!recurse) 7664 return -EOPNOTSUPP; 7665 7666 netdev_for_each_lower_dev(dev, lower_dev, iter) { 7667 err = dev_get_port_parent_id(lower_dev, ppid, recurse); 7668 if (err) 7669 break; 7670 if (!first.id_len) 7671 first = *ppid; 7672 else if (memcmp(&first, ppid, sizeof(*ppid))) 7673 return -ENODATA; 7674 } 7675 7676 return err; 7677 } 7678 EXPORT_SYMBOL(dev_get_port_parent_id); - With RHEL 8.2, bnxt only registers a handler for ndo_get_port_parent_id for VFs, not for PFs. - When devlink_compat_switch_id_get() is called it checks the netdev's netdev_ops for a ndo_get_devlink_port handler and this is something bnxt has, its bnxt_get_devlink_port(). - The logic in bnxt_get_devlink_port() is either going to return -EOPNOTSUPP or for bnxt, whatever its private struct bnxt->dl_port member (a pointer to a devlink_port struct) has its attrs.switch_id member set to. 
6834 int devlink_compat_switch_id_get(struct net_device *dev, 6835 struct netdev_phys_item_id *ppid) 6836 { 6837 struct devlink_port *devlink_port; 6838 6839 /* Caller must hold RTNL mutex or reference to dev, which ensures that 6840 * devlink_port instance cannot disappear in the middle. No need to take 6841 * any devlink lock as only permanent values are accessed. 6842 */ 6843 devlink_port = netdev_to_devlink_port(dev); 6844 if (!devlink_port || !devlink_port->attrs.switch_port) 6845 return -EOPNOTSUPP; 6846 6847 memcpy(ppid, &devlink_port->attrs.switch_id, sizeof(*ppid)); 6848 6849 return 0; 6850 } 577 static inline struct devlink_port * 578 netdev_to_devlink_port(struct net_device *dev) 579 { 580 if (dev->netdev_ops->ndo_get_devlink_port) 581 return dev->netdev_ops->ndo_get_devlink_port(dev); 582 return NULL; 583 } 11357 static const struct net_device_ops bnxt_netdev_ops = { 11358 .ndo_open = bnxt_open, 11359 .ndo_start_xmit = bnxt_start_xmit, 11360 .ndo_stop = bnxt_close, ...... 11387 .ndo_bridge_getlink = bnxt_bridge_getlink, 11388 .ndo_bridge_setlink = bnxt_bridge_setlink, >11389 .ndo_get_devlink_port = bnxt_get_devlink_port, 11390 }; 11350 static struct devlink_port *bnxt_get_devlink_port(struct net_device *dev) 11351 { 11352 struct bnxt *bp = netdev_priv(dev); 11353 11354 return &bp->dl_port; 11355 } So after all of the above, in the RHEL8.2 case with a bnxt PF we can see that the only way for nbp_switchdev_mark_set() to return an -ENODATA back to br_add_if() is if in dev_get_port_parent_id() we enter the netdev_for_each_lower_dev() "loop" at lines 7666-7674. - Here we're iterating over each lower netdev, taking its ppid and if it doesn't match the prior one (memcmp is non-zero) we return -ENODATA. ..... 
7666 netdev_for_each_lower_dev(dev, lower_dev, iter) { 7667 err = dev_get_port_parent_id(lower_dev, ppid, recurse); 7668 if (err) 7669 break; 7670 if (!first.id_len) 7671 first = *ppid; 7672 else if (memcmp(&first, ppid, sizeof(*ppid))) 7673 return -ENODATA; 7674 } 7675 7676 return err; 7677 } 7678 EXPORT_SYMBOL(dev_get_port_parent_id); My *rough* assumption is that prior to RHEL 8.2 the bnxt interfaces returned -EOPNOTSUPP whereas with 8.2 the bridge is finally made aware that the two bond ports do not have the same Switch ID. But I fully expect some switchdev expert to pop in here and blow this all away. Patrick Hello Patrick, > the issue is happening with some 4.18.0-193.el8 series kernel? Correct. I grabbed RHEL 8.2 install media and verified the same issue arises with it when trying to put this specific bond onto a bridge. > installing an older 4.18.0-147.el8 kernel and testing with it? Indeed, I installed 4.18.0-147.8.1.el8 from the 8.1 repos onto this 8.2 host and adding the bond to a bridge now works as expected. 
I wasn't certain which kernel you wanted the additional commands run against, so I hit both of them: #-------------------------------------------- # 4.18.0-147.8.1.el8_1.x86_64: cat /sys/class/net/eno33np0/phys_switch_id cat: /sys/class/net/eno33np0/phys_switch_id: Operation not supported cat /sys/class/net/ens2f0np0/phys_switch_id cat: /sys/class/net/ens2f0np0/phys_switch_id: Operation not supported cat /sys/class/net/bond0/phys_switch_id cat: /sys/class/net/bond0/phys_switch_id: Operation not supported devlink dev show pci/0000:63:00.0 pci/0000:63:00.1 pci/0000:a1:00.0 pci/0000:a1:00.1 devlink dev eswitch show pci/0000:63:00.0 pci/0000:63:00.0: mode legacy devlink dev eswitch show pci/0000:63:00.1 pci/0000:63:00.1: mode legacy devlink dev eswitch show pci/0000:a1:00.0 pci/0000:a1:00.0: mode legacy devlink dev eswitch show pci/0000:a1:00.1 pci/0000:a1:00.1: mode legacy devlink port show #-------------------------------------------- #-------------------------------------------- # 4.18.0-193.14.2.el8_2.x86_64: cat /sys/class/net/eno33np0/phys_switch_id 60ba24feffe197bc cat /sys/class/net/ens2f0np0/phys_switch_id 40f2d0feff2826b0 cat /sys/class/net/bond0/phys_switch_id cat: /sys/class/net/bond0/phys_switch_id: Operation not supported devlink dev show pci/0000:63:00.0 pci/0000:63:00.1 pci/0000:a1:00.0 pci/0000:a1:00.1 devlink dev eswitch show pci/0000:63:00.0 pci/0000:63:00.0: mode legacy devlink dev eswitch show pci/0000:63:00.1 pci/0000:63:00.1: mode legacy devlink dev eswitch show pci/0000:a1:00.0 pci/0000:a1:00.0: mode legacy devlink dev eswitch show pci/0000:a1:00.1 pci/0000:a1:00.1: mode legacy devlink port show pci/0000:63:00.0/0: type eth netdev eno33np0 flavour physical port 0 pci/0000:63:00.1/1: type eth netdev eno34np1 flavour physical port 1 pci/0000:a1:00.0/0: type eth netdev ens2f0np0 flavour physical port 0 pci/0000:a1:00.1/1: type eth netdev ens2f1np1 flavour physical port 1 #-------------------------------------------- To get the bond and bridge 
functional on the 193 kernel I just have to remove one of the interfaces from the bond, add the bond to the bridge, then re-add interface to bond. Hopefully that still gets you valid info from these commands.

Thanks again, let me know if there's any further info/tests you'd like to have.

Mark

Hi Mark,

Thank you for this information. From what you have shared, I think we can say that prior to RHEL 8.2, with these bnxt devices, when nbp_switchdev_mark_set() called switchdev_port_attr_get() the descent would have stopped right away in bnxt_port_attr_get() because the interfaces are in legacy mode:

8332 int bnxt_port_attr_get(struct bnxt *bp, struct switchdev_attr *attr)
8333 {
8334         if (bp->eswitch_mode != DEVLINK_ESWITCH_MODE_SWITCHDEV)
8335                 return -EOPNOTSUPP;

Now in 8.2 we're instead calling dev_get_port_parent_id() and when it comes to these bnxt devices the call to devlink_compat_switch_id_get() actually returns a useful value. But of course, the Switch ID of two physically separate cards is not expected to be the same so it is not really surprising that the overall result is the ENODATA.

Note the netdev_for_each_lower_dev logic in dev_get_port_parent_id() is taken almost verbatim from switchdev_port_attr_get(). There, it includes a nice comment snippet with an explanation:

.....
 199         /* Switch device port(s) may be stacked under
 200          * bond/team/vlan dev, so recurse down to get attr on
 201          * each port.  Return -ENODATA if attr values don't
 202          * compare across ports.
 203          */
 204
 205         netdev_for_each_lower_dev(dev, lower_dev, iter) {
 206                 err = switchdev_port_attr_get(lower_dev, attr);
 207                 if (err)
 208                         break;
 209                 if (first.id == SWITCHDEV_ATTR_ID_UNDEFINED)
 210                         first = *attr;
 211                 else if (memcmp(&first, attr, sizeof(*attr)))
 212                         return -ENODATA;
 213         }
.....

I will try to get Red Hat's bnxt driver maintainer involved. In my basic understanding of switchdev, it *seems* as if bnxt should still produce an EOPNOTSUPP when in Legacy mode.
Patrick Moving this to NIC Drivers. Updates to the bnxt driver in RHEL 8.2 prevent the user from creating a bond out of two bnxt ports (from different physical cards) and then adding that bond to a bridge. When a new bridge port is being set up in br_add_if(), nbp_switchdev_mark_set() is called to get the Switch ID of the new port (in this case, the bond). Prior to 8.2, the bond's lower devs (the bnxt ports) do not report a Switch ID (EOPNOTSUPP) so this activity is moot. However, now in RHEL 8.2 the bnxt driver provides an ID via its new ndo_get_devlink_port() handler. Logic in dev_get_port_parent_id() returns ENODATA if the bond's ports do not all have the same switch identifier (here, phys_switch_id). This customer has two physical bnxt cards, each with two ports. When the customer creates a bond using ports that are not from the same card, adding that bond to a bridge fails with ENODATA: ip link set eno33np0 master bond0 ip link set ens2f0np0 master bond0 ip link set bond0 master mybridge RTNETLINK answers: No data available The old logic was nbp_switchdev_mark_set() -> switchdev_port_attr_get() -> switchdev_ops->switchdev_port_attr_get which for bnxt is bnxt_swdev_port_attr_get() which calls bnxt_port_attr_get(). bnxt_port_attr_get() immediately returns EOPNOTSUPP when the card is not in SWITCHDEV mode: 8332 int bnxt_port_attr_get(struct bnxt *bp, struct switchdev_attr *attr) 8333 { 8334 if (bp->eswitch_mode != DEVLINK_ESWITCH_MODE_SWITCHDEV) 8335 return -EOPNOTSUPP; ...... This customer's cards report still being in Legacy mode so it sorta seems to me that the new bnxt ndo_get_devlink_port() handler might need similar logic? I cannot find any upstream commit or netdev list discussion about this issue. Patrick Michael, is Broadcom aware of the problem described in comment #22? No, I'm not aware of this issue. These port related changes are recent upstream changes that seem to have some side effects. CC Anantha. 
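The condition described in the summary above (bond members reporting different phys_switch_id values) can be checked from userspace before attempting the enslavement. The following is only a sketch in Python: the function names are hypothetical, it relies on the /sys/class/net/<bond>/bonding/slaves and /sys/class/net/<dev>/phys_switch_id files, and it only reports whether the IDs differ. It does not reproduce the kernel's exact traversal, which stops at the first member that reports no ID.

```python
import os


def bond_switch_ids(bond, sysfs_root="/sys/class/net"):
    """Map each member of a bond to its phys_switch_id, or None if absent."""
    slaves_path = os.path.join(sysfs_root, bond, "bonding", "slaves")
    with open(slaves_path) as f:
        members = f.read().split()

    ids = {}
    for dev in members:
        id_path = os.path.join(sysfs_root, dev, "phys_switch_id")
        try:
            with open(id_path) as f:
                ids[dev] = f.read().strip() or None
        except OSError:
            # Reading fails (e.g. "Operation not supported") when the
            # port does not report a switch ID.
            ids[dev] = None
    return ids


def has_mismatched_switch_ids(ids):
    """True when at least two members report different switch IDs."""
    reported = {value for value in ids.values() if value is not None}
    return len(reported) > 1
```

On the reporter's 4.18.0-193 host, bond_switch_ids("bond0") would be expected to show the two distinct IDs quoted earlier (60ba24feffe197bc and 40f2d0feff2826b0), while on the 4.18.0-147 kernel both members would map to None.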
Hello,

is there any reason to keep this bug as private? It seems like other users are facing this issue and we would like to share the progress with them.

(In reply to Ales Musil from comment #25)
> Hello,
>
> is there any reason to keep this bug as private? It seems like other users
> are facing this issue
> and we would like to share the progress with them.

I don't think you're asking me (I didn't set it private, I'm just the reporter), but just in case... sure thing, open 'er up. I'd like to move to 8.2 for my virtualization hosts but this is blocking me so anything that helps is good by me.

I've left the original title intact but wonder if it should be changed to accurately reflect the issue, e.g. "Can't attach a bond made from ports on multiple cards to a bridge"?

(In reply to Jonathan Toppins from comment #23)
> Michael, is Broadcom aware of the problem described in comment #22?

Did adding Group: broadcom_ccx change the bug to private? Is this intended? As Ales wrote in comment 25, it would be helpful if this bug were public.

As part of the below upstream commit, switch_id is passed to devlink_port_attrs_set(), which can be called only once while registering the devlink port.

---------------
commit 6605a226781eb1224c2dcf974a39eea11862b864
Author: Jiri Pirko <jiri>
Date: Wed Apr 3 14:24:21 2019 +0200

    bnxt: pass switch ID through devlink_port_attrs_set()

    Pass the switch ID down the to devlink through devlink_port_attrs_set()
    so it can be used by devlink_compat_switch_id_get().

    Signed-off-by: Jiri Pirko <jiri>
    Signed-off-by: David S. Miller <davem>
----------------

The following commit then removed the ndo_get_port_parent_id implementation from bnxt.

----------------
commit 56d9f4e8f70e6f47ad4da7640753cf95ae51a356
Author: Jiri Pirko <jiri>
Date: Wed Apr 3 14:24:22 2019 +0200

    bnxt: remove ndo_get_port_parent_id implementation for physical ports

    Remove implementation of get_port_parent_id ndo and rely on core calling
    into devlink for the information directly.

    Signed-off-by: Jiri Pirko <jiri>
    Signed-off-by: David S. Miller <davem>
----------------

And here is another commit, which indicates that the old ndo method is planned to be removed altogether in the future.

----------------
commit 119c0b5721da9d97f95202c4ad1be2919dac64b0
Author: Jiri Pirko <jiri>
Date: Wed Apr 3 14:24:27 2019 +0200

    net: devlink: add warning for ndo_get_port_parent_id set when not needed

    Currently if the driver registers devlink port instance, he should set
    the devlink port attributes as well. Then the devlink core is able to
    obtain switch id itself, no need for driver to implement the ndo.
    Once all drivers will implement devlink port registration, this ndo
    should be removed. This warning guides new drivers to do things as
    they should be done.

    Signed-off-by: Jiri Pirko <jiri>
    Signed-off-by: David S. Miller <davem>
-----------------

So, it requires a general fix in devlink_compat_switch_id_get() to read the switch_id only when mode is set to SWITCHDEV mode even if driver passes the switch_id.

I am planning to start a thread with the upstream community describing the issue.

*** Bug 1871161 has been marked as a duplicate of this bug. ***

(In reply to Vasundhara Volam from comment #32)
> So, it requires a general fix in devlink_compat_switch_id_get() to read the
> switch_id only when mode is set to SWITCHDEV mode even if driver passes the
> switch_id.
>
> I am planning to start a thread with the upstream community describing the
> issue.

Yes exactly, this was almost the conclusion I had come to on Friday but had not let Michael know. Glad we came to similar conclusions.
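The direction discussed in the comments above (only expose the switch ID when the device is in switchdev mode) can be illustrated with a short model. This is a hedged Python sketch of the idea, not the devlink implementation: compat_switch_id, the mode strings, and the example ID are hypothetical names chosen for illustration. The check mirrors the eswitch_mode test quoted earlier from bnxt_port_attr_get().

```python
import errno

# Hypothetical stand-ins for the devlink eswitch modes.
LEGACY = "legacy"
SWITCHDEV = "switchdev"


def compat_switch_id(eswitch_mode, switch_id):
    """Return (err, switch_id), hiding the ID unless the port is in switchdev mode."""
    if eswitch_mode != SWITCHDEV or switch_id is None:
        # Legacy-mode ports behave as if they had no switch ID at all.
        return -errno.EOPNOTSUPP, None
    return 0, switch_id


if __name__ == "__main__":
    print(compat_switch_id(LEGACY, "card-a"))
    print(compat_switch_id(SWITCHDEV, "card-a"))
```

With such a gate, ports on two legacy-mode cards would both report EOPNOTSUPP, so the mismatch check on the bond would never be reached. The fix quoted later in this thread takes a different route and changes the error code returned on mismatch.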
The current proposed upstream fix is the following:

====
diff --git a/net/core/dev.c b/net/core/dev.c
index d42c9ea0c3c0..7932594ca437 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8646,7 +8646,7 @@ int dev_get_port_parent_id(struct net_device *dev,
 		if (!first.id_len)
 			first = *ppid;
 		else if (memcmp(&first, ppid, sizeof(*ppid)))
-			return -ENODATA;
+			return -EOPNOTSUPP;
 	}

 	return err;
====

Upstream fix is merged.
https://patchwork.ozlabs.org/project/netdev/patch/20200910110127.3113683-2-idosch@idosch.org/
Please back-port the fix.

A devel kernel is available here:
http://people.redhat.com/jtoppins/.bz1860479/

If QE, Broadcom, and the reporter could verify this change fixes the problem with the above kernel that would be great.

Question for the networking services team:
There were two patches for this upstream fix, one being the actual fix, the other a selftests change. Currently the selftests patch does not apply due to the infrastructure for self tests not being updated. Do you know if there is an effort to update this infrastructure?

e1b9efe6baeb ("net: Fix bridge enslavement failure")
6374a5606990 ("selftests: rtnetlink: Test bridge enslavement with different parent IDs")

(In reply to Jonathan Toppins from comment #37)
> There were two patches for this upstream fix, one being the actual fix, the
> other a selftests change. Currently the selftests patch does not apply due
> to the infrastructure for self tests not being updated. Do you know if there
> is an effort to update this infrastructure?

Do you mean 6374a5606990? It seems to almost apply, the rtnetlink.sh is very up to date, the only missing patch is c2a4d2747996, which was applied upstream only recently. I suggest including c2a4d2747996 with your backport.

Posted to rhel-net tree.
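Why a one-line change of error code is enough can be shown with a small model of the bridge side. This is a Python sketch under stated assumptions, not kernel code: parent_id_error and bridge_add_port are hypothetical names, and the rule that the bridge tolerates EOPNOTSUPP but aborts on any other error is inferred from the earlier analysis in this thread (ports reporting EOPNOTSUPP made the lookup moot).

```python
import errno


def parent_id_error(ports_agree, mismatch_errno):
    """Error from the parent-ID lookup for a bond whose ports all report IDs."""
    return 0 if ports_agree else -mismatch_errno


def bridge_add_port(lookup_err):
    """Assumed bridge reaction: EOPNOTSUPP means no offload, other errors abort."""
    if lookup_err in (0, -errno.EOPNOTSUPP):
        return 0
    return lookup_err


# Bond with ports on two different cards, before and after the patch.
before = bridge_add_port(parent_id_error(False, errno.ENODATA))
after = bridge_add_port(parent_id_error(False, errno.EOPNOTSUPP))

if __name__ == "__main__":
    print("before the fix:", before)
    print("after the fix:", after)
```

In this model the mismatch is still detected after the patch; it is simply reported in a form that the caller treats as "this port has no usable parent ID" instead of as a hard failure.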
(In reply to Jonathan Toppins from comment #37)
> A devel kernel is available here:
> http://people.redhat.com/jtoppins/.bz1860479/
>
> If QE, Broadcom, and the reporter could verify this change fixes the problem
> with the above kernel that would be great.
>

I've grabbed the linked RPMs and can give them a shot. Hitting two dependencies though:

kernel-tools-libs = 4.18.0-236.el8.bz1860479v1 needed by kernel-tools-4.18.0-236.el8.bz1860479v1.x86_64
linux-firmware >= 20200619-99.git3890db36 needed by kernel-core-4.18.0-236.el8.bz1860479v1.x86_64

Do you know offhand where I might get packages to fulfill those?

Apologies, found the necessary linux-firmware package in the repos, just looking for kernel-tools-libs-4.18.0-236.el8.bz1860479v1 at this point.

You do not need to install the kernel-tools-libs package to run the kernel.

(In reply to Jonathan Toppins from comment #42)
> You do not need to install the kernel-tools-libs package to run the kernel.

Whoops, that's what I get for assuming I should install everything in that folder.

OK, I've installed the necessary bits to test and the initial test steps (from comment #5) go without a hitch. I can configure everything as listed with no errors and get a functional setup.

Thanks for your patience and work on this!

This issue would hit RHV-4.4 users, because RHV-4.4 hosts require at least RHEL 8.2 and are not supported on RHEL 8.1. For this reason a backport to RHEL 8.3 would be beneficial for RHV-4.4 users.

Patch(es) available on kernel-4.18.0-240.2.el8.dt1

Set Tested based on comment #43 and regression test result https://beaker.engineering.redhat.com/jobs/4626176.
reproduced:

[root@dell-per730-49 ~]# cat re
ip link add mybridge type bridge
ip link add bond0 type bond mode 802.3ad
ip link set mybridge up
ip link set bond0 up
ip link set enp4s0f0np0 down
ip link set enp7s0np0 down
ip link set enp4s0f0np0 master bond0
ip link set enp7s0np0 master bond0
ip link set enp4s0f0np0 up
ip link set enp7s0np0 up
ip link set bond0 master mybridge
[root@dell-per730-49 ~]# source re
RTNETLINK answers: No data available
[root@dell-per730-49 ~]# uname -r
4.18.0-240.el8.x86_64

Tested:

[root@dell-per730-49 ~]# uname -r
4.18.0-240.4.el8.dt3.x86_64
[root@dell-per730-49 ~]# cat re
ip link add mybridge type bridge
ip link add bond0 type bond mode 802.3ad
ip link set mybridge up
ip link set bond0 up
ip link set enp4s0f0np0 down
ip link set enp7s0np0 down
ip link set enp4s0f0np0 master bond0
ip link set enp7s0np0 master bond0
ip link set enp4s0f0np0 up
ip link set enp7s0np0 up
ip link set bond0 master mybridge
[root@dell-per730-49 ~]# source re
[root@dell-per730-49 ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 80:18:44:e4:5d:84 brd ff:ff:ff:ff:ff:ff
3: enp4s0f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:b6:e0:80 brd ff:ff:ff:ff:ff:ff
4: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 80:18:44:e4:5d:85 brd ff:ff:ff:ff:ff:ff
5: eno3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 80:18:44:e4:5d:86 brd ff:ff:ff:ff:ff:ff
6: enp4s0f1np1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:b6:e0:81 brd ff:ff:ff:ff:ff:ff
7: eno4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 80:18:44:e4:5d:87 brd ff:ff:ff:ff:ff:ff
8: enp7s0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:b6:e0:80 brd ff:ff:ff:ff:ff:ff
9: enp7s0np1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:15:4d:13:7a:7e brd ff:ff:ff:ff:ff:ff
10: mybridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:b6:e0:80 brd ff:ff:ff:ff:ff:ff
11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master mybridge state UP mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:b6:e0:80 brd ff:ff:ff:ff:ff:ff

This reproducer requires two cards on the same system, which is not easy for our automation.

Patch(es) available on kernel-4.18.0-240.6.el8

verified:

[root@dell-per730-49 ~]# source re
[root@dell-per730-49 ~]# cat re
ip link add mybridge type bridge
ip link add bond0 type bond mode 802.3ad
ip link set mybridge up
ip link set bond0 up
ip link set enp4s0f0np0 down
ip link set enp7s0np0 down
ip link set enp4s0f0np0 master bond0
ip link set enp7s0np0 master bond0
ip link set enp4s0f0np0 up
ip link set enp7s0np0 up
ip link set bond0 master mybridge
[root@dell-per730-49 ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 80:18:44:e4:5d:84 brd ff:ff:ff:ff:ff:ff
3: enp4s0f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:b6:e0:80 brd ff:ff:ff:ff:ff:ff
4: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 80:18:44:e4:5d:85 brd ff:ff:ff:ff:ff:ff
5: eno3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 80:18:44:e4:5d:86 brd ff:ff:ff:ff:ff:ff
6: enp4s0f1np1: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:b6:e0:81 brd ff:ff:ff:ff:ff:ff
7: eno4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 80:18:44:e4:5d:87 brd ff:ff:ff:ff:ff:ff
8: enp7s0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:b6:e0:80 brd ff:ff:ff:ff:ff:ff
9: enp7s0np1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:15:4d:13:7a:7e brd ff:ff:ff:ff:ff:ff
40: mybridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:b6:e0:80 brd ff:ff:ff:ff:ff:ff
41: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master mybridge state UP mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:b6:e0:80 brd ff:ff:ff:ff:ff:ff
[root@dell-per730-49 ~]# uname -r
4.18.0-240.6.el8.x86_64

Can someone give me a link to the kernel patch please

(In reply to SimonScott from comment #60)
> Can someone give me a link to the kernel patch please

See comment #35, the upstream commit is:
e1b9efe6baebe79019a2183176686a0e709388ae net: Fix bridge enslavement failure

Thanks Jonathan,

Unfortunately that means nothing to me as my git knowledge is little more than knowing it exists. Please can you advise the best way for me to apply a kernel patch to my existing oVirt environment?

Regards
Simon...

(In reply to SimonScott from comment #62)
> Thanks Jonathan,
>
> Unfortunately that means nothing to me as my git knowledge is little more
> than knowing it exists.
> Please can you advise the best way for me to apply a
> kernel patch to my existing oVirt environment?
>

Not really as it would require git and/or patch knowledge to apply the kernel patch. The patch will be included in RHEL-8.4 and is being considered for RHEL-8.3. Would recommend contacting support as they are best positioned to provide recommendations for your specific setup.

Many thanks Jonathan

Hello,

I have a customer who is experiencing the same issue when using bnxt_en ports from different NICs. However according to the updates above, I understand that if ports from same NIC are used for the LACP bond the issue should not be present. My customer experiences the issue that when the machine reboots the slaves do not detect their links and following errors are present:

Jan 8 12:12:21 redomyec kernel: bnxt_en 0000:13:00.0 eth0: Broadcom BCM57416 NetXtreme-E 10GBase-T Ethernet found at mem da910000, node addr f4:03:43:ca:31:a0
Jan 8 12:12:21 redomyec kernel: bnxt_en 0000:13:00.0: 63.008 Gb/s available PCIe bandwidth (8 GT/s x8 link)
Jan 8 12:12:21 redomyec kernel: bnxt_en 0000:13:00.1 eth1: Broadcom BCM57416 NetXtreme-E 10GBase-T Ethernet found at mem da900000, node addr f4:03:43:ca:31:a8
Jan 8 12:12:21 redomyec kernel: bnxt_en 0000:13:00.1: 63.008 Gb/s available PCIe bandwidth (8 GT/s x8 link)
Jan 8 12:12:21 redomyec kernel: bnxt_en 0000:13:00.0 ens3f0np0: renamed from eth0
Jan 8 12:12:21 redomyec kernel: bnxt_en 0000:13:00.1 ens3f1np1: renamed from eth1
Jan 8 12:12:45 redomyec kernel: bnxt_en 0000:13:00.0 ens3f0np0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
Jan 8 12:12:45 redomyec kernel: bnxt_en 0000:13:00.0 ens3f0np0: EEE is not active
Jan 8 12:12:45 redomyec kernel: bnxt_en 0000:13:00.0 ens3f0np0: FEC autoneg off encodings: None
Jan 8 12:12:46 redomyec kernel: bnxt_en 0000:13:00.1 ens3f1np1: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
Jan 8 12:12:46 redomyec kernel: bnxt_en 0000:13:00.1 ens3f1np1: EEE is not active
Jan 8 12:12:46 redomyec kernel: bnxt_en 0000:13:00.1 ens3f1np1: FEC autoneg off encodings: None
Jan 8 12:12:47 redomyec kernel: bnxt_en 0000:13:00.0 ens3f0np0: NIC Link is Down
Jan 8 12:12:47 redomyec kernel: bnxt_en 0000:13:00.0 ens3f0np0: NIC Link is Down
Jan 8 12:12:47 redomyec kernel: bnxt_en 0000:13:00.1: cmdq[0x19]=0x11 status 0x1
Jan 8 12:12:47 redomyec kernel: bnxt_en 0000:13:00.1 bnxt_re1: Failed to add GID: 0xfffffff2
Jan 8 12:12:47 redomyec kernel: bnxt_en 0000:13:00.1: cmdq[0x1a]=0x11 status 0x1
Jan 8 12:12:47 redomyec kernel: bnxt_en 0000:13:00.1 bnxt_re1: Failed to add GID: 0xfffffff2
Jan 8 12:12:47 redomyec kernel: bnxt_en 0000:13:00.1 ens3f1np1: NIC Link is Down
Jan 8 12:12:47 redomyec kernel: bnxt_en 0000:13:00.1 ens3f1np1: NIC Link is Down
Jan 8 12:12:47 redomyec kernel: bnxt_en 0000:13:00.1: cmdq[0x1b]=0x11 status 0x1
Jan 8 12:12:47 redomyec kernel: bnxt_en 0000:13:00.1 bnxt_re1: Failed to add GID: 0xfffffff2
Jan 8 12:12:47 redomyec kernel: bnxt_en 0000:13:00.1: cmdq[0x1c]=0x11 status 0x1
Jan 8 12:12:47 redomyec kernel: bnxt_en 0000:13:00.1 bnxt_re1: Failed to add GID: 0xfffffff2
Jan 8 12:12:47 redomyec kernel: bnxt_en 0000:13:00.1: cmdq[0x1d]=0x11 status 0x1
Jan 8 12:12:47 redomyec kernel: bnxt_en 0000:13:00.1 bnxt_re1: Failed to add GID: 0xfffffff2
Jan 8 12:12:47 redomyec kernel: bnxt_en 0000:13:00.1: cmdq[0x1e]=0x11 status 0x1
Jan 8 12:12:47 redomyec kernel: bnxt_en 0000:13:00.1 bnxt_re1: Failed to add GID: 0xfffffff2

Bringing the interfaces manually up is working without issue.
Is this something that is known or do we hit a different bug here?

Thanks
Fani

(In reply to Fani Orestiadou from comment #70)
> Hello,
>
> I have a customer who is experiencing the same issue when using bnxt_en
> ports from different NICs. However according to the updates above, I
> understand that if ports from same NIC are used for the LACP bond the issue
> should not be present. My customer experiences the issue that when the
> machine reboots the slaves do not detect their links and following errors
> are present:
> ...
> Bringing the interfaces manually up is working without issue.
> Is this something that is known or do we hit a different bug here?
>
> Thanks
> Fani

Not detecting link is different from what this bug is solving, which is allowing one to add a device to a bridge. Sounds more like bz1879840 or bz1855131. Also you are using bnxt_re on top, which makes this a possible RDMA bug and complicates the issue more. I would suggest you file a new bug and let engineering determine where in the stack the problem is occurring.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1578

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.