| Summary: | Cannot bond Infiniband network interfaces (as VLAN id 0 is treated incorrectly) | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Alexander Murashkin <alexandermurashkin> |
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rawhide | CC: | gansalmon, itamar, jonathan, jstanley, kernel-maint, madhu.chinakonda |
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Last Closed: | 2012-11-13 14:54:44 UTC | Type: | --- |
[mass update] kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository. Please retest with this update.

The same problem. The bonding works right after a boot but stops working if the bonding interface is recycled (down/up). It works initially because the bonding module is loaded before the 8021q module. After 8021q is loaded, enslaving stops working.

Here are the relevant lines from syslog:

---- booting -------------
Mar 22 21:14:24 raptor kernel: [ 83.070301] bonding: Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Mar 22 21:14:24 raptor kernel: [ 83.071132] bonding: bond2 is being created...
Mar 22 21:14:24 raptor kernel: [ 83.132692] bonding: bond2: Setting MII monitoring interval to 100.
Mar 22 21:14:24 raptor kernel: [ 83.132786] bonding: bond2: setting mode to active-backup (1).
Mar 22 21:14:24 raptor kernel: [ 83.134247] ADDRCONF(NETDEV_UP): bond2: link is not ready
Mar 22 21:14:24 raptor kernel: [ 83.173922] ib0: enabling connected mode will cause multicast packet drops
Mar 22 21:14:24 raptor kernel: [ 83.176488] ib0: mtu > 2044 will cause multicast packet drops.
Mar 22 21:14:24 raptor kernel: [ 83.178063] bonding: bond2: Adding slave ib0.
Mar 22 21:14:24 raptor kernel: [ 83.178065] bonding: bond2: Warning: enslaved VLAN challenged slave ib0. Adding VLANs will be blocked as long as ib0 is part of bond bond2
Mar 22 21:14:24 raptor kernel: [ 83.178151] bonding: bond2: Warning: The first slave device specified does not support setting the MAC address. Setting fail_over_mac to active.
Mar 22 21:14:24 raptor kernel: [ 83.179728] bonding: bond2: enslaving ib0 as a backup interface with a down link.
Mar 22 21:14:24 raptor kernel: [ 83.227691] ib1: enabling connected mode will cause multicast packet drops
Mar 22 21:14:24 raptor kernel: [ 83.229265] ib1: mtu > 2044 will cause multicast packet drops.
Mar 22 21:14:24 raptor kernel: [ 83.231158] bonding: bond2: Adding slave ib1.
Mar 22 21:14:24 raptor kernel: [ 83.231160] bonding: bond2: Warning: enslaved VLAN challenged slave ib1. Adding VLANs will be blocked as long as ib1 is part of bond bond2
Mar 22 21:14:24 raptor kernel: [ 83.232751] bonding: bond2: enslaving ib1 as a backup interface with a down link.
Mar 22 21:14:24 raptor kernel: [ 83.233020] bonding: bond2: link status definitely up for interface ib0, 4294967295 Mbps full duplex.
Mar 22 21:14:24 raptor kernel: [ 83.233023] bonding: bond2: making interface ib0 the new active one.
Mar 22 21:14:24 raptor kernel: [ 83.233044] bonding: bond2: first active interface up!
Mar 22 21:14:24 raptor kernel: [ 83.233682] ADDRCONF(NETDEV_CHANGE): bond2: link becomes ready
....
Mar 22 21:14:33 raptor kernel: [ 92.201617] 8021q: 802.1Q VLAN Support v1.8
Mar 22 21:14:33 raptor kernel: [ 92.201629] 8021q: adding VLAN 0 to HW filter on device bond2

---- ifdown bond2 ------------------
Mar 22 21:17:12 raptor kernel: [ 251.561320] bonding: bond2: Removing slave ib0.
Mar 22 21:17:12 raptor kernel: [ 251.561338] bonding: bond2: releasing active interface ib0
Mar 22 21:17:12 raptor kernel: [ 251.712468] bonding: bond2: Removing slave ib1.
Mar 22 21:17:12 raptor kernel: [ 251.712486] bonding: bond2: releasing backup interface ib1
Mar 22 21:17:12 raptor kernel: [ 251.712491] bonding: bond2: Warning: clearing HW address of bond2 while it still has VLANs.
Mar 22 21:17:12 raptor kernel: [ 251.712494] bonding: bond2: When re-adding slaves, make sure the bond's HW address matches its VLANs'.

---- ifup bond2 --------------------
Mar 22 21:17:17 raptor kernel: [ 256.443262] bonding: bond2: Setting MII monitoring interval to 100.
Mar 22 21:17:17 raptor kernel: [ 256.443400] bonding: bond2: setting mode to active-backup (1).
Mar 22 21:17:17 raptor kernel: [ 256.445385] ADDRCONF(NETDEV_UP): bond2: link is not ready
Mar 22 21:17:17 raptor kernel: [ 256.445390] 8021q: adding VLAN 0 to HW filter on device bond2
Mar 22 21:17:17 raptor kernel: [ 256.491592] ib0: enabling connected mode will cause multicast packet drops
Mar 22 21:17:17 raptor kernel: [ 256.494235] bonding: bond2: Adding slave ib0.
Mar 22 21:17:17 raptor kernel: [ 256.494238] bonding: bond2: Error: cannot enslave VLAN challenged slave ib0 on VLAN enabled bond bond2
Mar 22 21:17:17 raptor kernel: [ 256.538680] ib1: enabling connected mode will cause multicast packet drops
Mar 22 21:17:17 raptor kernel: [ 256.541346] bonding: bond2: Adding slave ib1.
Mar 22 21:17:17 raptor kernel: [ 256.541350] bonding: bond2: Error: cannot enslave VLAN challenged slave ib1 on VLAN enabled bond bond2

This is caused by upstream commit cc0e40700656b09d93b062ef6c818aa45429d09a and is still present in 3.6.1 upstream. I haven't had a chance to look at the affected code to see if there's something obvious, but that's the commit that it bisects to.

Moving to rawhide since this is in 3.6 upstream.

FYI, fix pending upstream.
http://patchwork.ozlabs.org/patch/191363/
http://patchwork.ozlabs.org/patch/192020/

This is fixed in the rawhide 3.7-rcX kernels.

Description of problem:

Enslaving of Infiniband interfaces does not work. An error similar to the one below is printed:

bonding: bond2: Error: cannot enslave VLAN challenged slave ib0 on VLAN enabled bond bond2

Based on the kernel source code, the bonding module assumes that the bond2 interface has VLAN 0 enabled. VLAN ID 0 is an 802.1Q reserved value indicating that a frame does not belong to any VLAN, so it has to be treated as a special case in the 8021q module and in other NETIF_F_HW_VLAN_FILTER enabled modules.

Specifically, the following happens.

When bond2 is being brought up:
- vlan_device_event() in vlan.c is called with event NETDEV_UP
- because bonding has the NETIF_F_HW_VLAN_FILTER feature, vlan_device_event() calls bonding's ndo_vlan_rx_add_vid(vlan_id 0)
- bond_vlan_rx_add_vid() in bond_main.c calls bond_add_vlan(), which adds vlan_id 0 to bond->vlan_list

When the Infiniband interface is being enslaved:
- bond_enslave() in bond_main.c sees that bond->vlan_list is not empty (via bond_vlan_used()) and returns the error.

See more details below.

Version-Release number of selected component (if applicable):
kernel-3.2.7-1.fc16.x86_64

How reproducible:

Steps to Reproduce:
1. Configure bond2 with Infiniband slave ib0 (or some other device that has the NETIF_F_VLAN_CHALLENGED flag)
2. ifup bond2
3. Observe that echo '+ib0' > /sys/class/net/bond2/bonding/slaves fails

Actual results:
echo: write error: Operation not permitted
bonding: bond2: Error: cannot enslave VLAN challenged slave ib0 on VLAN enabled bond bond2

Expected results:
The enslaving works. The bonding module does not print any errors.

Additional info:

    static int vlan_device_event(...)
    {
        ...
        if ((event == NETDEV_UP) &&
            (dev->features & NETIF_F_HW_VLAN_FILTER) &&
            dev->netdev_ops->ndo_vlan_rx_add_vid) {
            pr_info("adding VLAN 0 to HW filter on device %s\n", dev->name);
            dev->netdev_ops->ndo_vlan_rx_add_vid(dev, 0);
        }
        ...
    }

    static void bond_vlan_rx_add_vid(struct net_device *bond_dev, uint16_t vid)
    {
        ...
        res = bond_add_vlan(bond, vid);
        ...
    }

    static int bond_add_vlan(struct bonding *bond, unsigned short vlan_id)
    {
        ...
        INIT_LIST_HEAD(&vlan->vlan_list);
        vlan->vlan_id = vlan_id;
        list_add_tail(&vlan->vlan_list, &bond->vlan_list);
        ...
        pr_debug("added VLAN ID %d on bond %s\n", vlan_id, bond->dev->name);
        ...
    }

    int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
    {
        ...
        if (slave_dev->features & NETIF_F_VLAN_CHALLENGED) {
            pr_debug("%s: NETIF_F_VLAN_CHALLENGED\n", slave_dev->name);
            if (bond_vlan_used(bond)) {
                pr_err("%s: Error: cannot enslave VLAN challenged slave %s on VLAN enabled bond %s\n",
                       bond_dev->name, slave_dev->name, bond_dev->name);
                return -EPERM;
        ...
    }

Syslog with bonding debug messages enabled:

Mar 19 21:22:56 raptor kernel: [528511.145669] 8021q: adding VLAN 0 to HW filter on device bond2
Mar 19 21:22:56 raptor kernel: [528511.145672] bonding: bond: bond2, vlan id 0
Mar 19 21:22:56 raptor kernel: [528511.145675] bonding: added VLAN ID 0 on bond bond2
Mar 19 21:22:56 raptor kernel: [528511.145677] bonding: event_dev: bond2, event: 1
Mar 19 21:22:56 raptor kernel: [528511.145679] bonding: IFF_MASTER
Mar 19 21:22:56 raptor kernel: [528511.201175] ib0: enabling connected mode will cause multicast packet drops
Mar 19 21:22:56 raptor kernel: [528511.203841] bonding: bond2: Adding slave ib0.
Mar 19 21:22:56 raptor kernel: [528511.203846] bonding: ib0: NETIF_F_VLAN_CHALLENGED
Mar 19 21:22:56 raptor kernel: [528511.203848] bonding: bond2: Error: cannot enslave VLAN challenged slave ib0 on VLAN enabled bond bond2
Mar 19 21:22:57 raptor kernel: [528511.249553] ib1: enabling connected mode will cause multicast packet drops
Mar 19 21:22:57 raptor kernel: [528511.252253] bonding: bond2: Adding slave ib1.
Mar 19 21:22:57 raptor kernel: [528511.252257] bonding: ib1: NETIF_F_VLAN_CHALLENGED
Mar 19 21:22:57 raptor kernel: [528511.252260] bonding: bond2: Error: cannot enslave VLAN challenged slave ib1 on VLAN enabled bond bond2
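
The failure path described in this report can be modeled in user space. The sketch below is not kernel code: `struct model_bond` and every `model_*` function are hypothetical stand-ins invented for illustration, and the array replaces the kernel's linked `vlan_list`. It assumes, as the report describes, that `bond_vlan_used()` only tests whether the list is non-empty. The `model_vlan_used_ignoring_vid0()` variant shows one possible way to special-case VLAN ID 0; it is not necessarily what the upstream patches do.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_VLANS 8

/* Hypothetical stand-in for struct bonding: only the VLAN bookkeeping. */
struct model_bond {
    unsigned short vlan_ids[MODEL_MAX_VLANS];
    size_t vlan_count;
};

/* Models bond_add_vlan(): every id is recorded, including the reserved id 0. */
static int model_add_vlan(struct model_bond *bond, unsigned short vid)
{
    if (bond->vlan_count >= MODEL_MAX_VLANS)
        return -ENOMEM;
    bond->vlan_ids[bond->vlan_count++] = vid;
    return 0;
}

/* Models bond_vlan_used() as described in the report: "list is not empty". */
static bool model_vlan_used(const struct model_bond *bond)
{
    return bond->vlan_count > 0;
}

/* Assumed alternative: id 0 does not count as a configured VLAN. */
static bool model_vlan_used_ignoring_vid0(const struct model_bond *bond)
{
    for (size_t i = 0; i < bond->vlan_count; i++) {
        if (bond->vlan_ids[i] != 0)
            return true;
    }
    return false;
}

/*
 * Models the NETIF_F_VLAN_CHALLENGED check in bond_enslave(): a
 * VLAN-challenged slave is refused with -EPERM when the bond is
 * considered to be using VLANs. The predicate is passed in so both
 * variants above can be compared.
 */
static int model_enslave(const struct model_bond *bond,
                         bool slave_vlan_challenged,
                         bool (*vlan_used)(const struct model_bond *))
{
    if (slave_vlan_challenged && vlan_used(bond))
        return -EPERM;
    return 0;
}
```

With an empty bond, `model_enslave(&bond, true, model_vlan_used)` returns 0, which corresponds to the first boot where bonding is loaded before 8021q. After `model_add_vlan(&bond, 0)`, the same call returns `-EPERM`, matching the "Operation not permitted" write error, while the variant that ignores id 0 still allows the slave to be added.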