Bug 2151982
| Summary: | [upstream] bonding: bug in "add per-port priority for failover re-selection" | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Jonathan Toppins <jtoppins> |
| Component: | kernel | Assignee: | Jonathan Toppins <jtoppins> |
| kernel sub component: | Bonding | QA Contact: | LiLiang <liali> |
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
| Severity: | unspecified | ||
| Priority: | unspecified | CC: | haliu, jtoppins, network-qe |
| Version: | unspecified | Keywords: | Triaged |
| Target Milestone: | rc | ||
| Target Release: | 9.3 | ||
| Hardware: | Unspecified | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-12-14 18:13:10 UTC | Type: | Feature Request |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 2092194, 2221092 | ||
Hangbin, do you intend to upstream the proposed fix or do you want me to? (In reply to Jonathan Toppins from comment #1) > Hangbin, do you intend to upstream the proposed fix or do you want me to? I talked with Liang yesterday. He would like to try the upstream work. So I'm waiting for his selftest patch and post with mime together. I will talk with Liang and see if we can post the patch set today. Thanks Hangbin v2 was accepted upstream. Closing this bug, it was really opened to track the upstream fix. |
This bug is an upstream bug, no RHEL product is affected at this time. Bonding: add per-port priority for failover re-selection For this patch, there is a problem. When a higher prio slave change to up, it can't be selected as the newly active slave. ``` ip link add name veth0 type veth peer name veth0_p ip link add name veth1 type veth peer name veth1_p ip link add name veth2 type veth peer name veth2_p ip link add name bond0 type bond mode 1 miimon 100 primary veth1 primary_reselect 0 ip link set bond0 up ip link set veth0 master bond0 ip link set veth1 master bond0 ip link set veth2 master bond0 ip link set dev veth0 type bond_slave prio 0 ip link set dev veth1 type bond_slave prio 10 ip link set dev veth2 type bond_slave prio 11 ip -d link show veth0 | grep -q 'prio 0' ip -d link show veth1 | grep -q 'prio 10' ip -d link show veth2 | grep -q 'prio 11' ip link set veth0 up ip link set veth1 up ip link set veth2 up ip link set veth0_p up ip link set veth1_p up ip link set veth2_p up ip link add name br0 type bridge ip link set br0 up ip link set veth0_p master br0 ip link set veth1_p master br0 ip link set veth2_p master br0 ip link add name veth3 type veth peer name veth3_p ip netns add ns1 ip link set veth3_p master br0 up ip link set veth3 netns ns1 up # current active slave should be primary slave active_slave=$(cat /sys/class/net/bond0/bonding/active_slave) test $active_slave = "veth1" || echo "BUG: current active slave is not primary slave" ip link set veth1 down ip link set veth2 down sleep 5 # current active slave should be the only one up slave veth0 active_slave=$(cat /sys/class/net/bond0/bonding/active_slave) test $active_slave = "veth0" || echo "BUG: current active slave is not the only up slave veth0" ip link set veth2 up sleep 5 # higher priority slave veth2 should become active slave active_slave=$(cat /sys/class/net/bond0/bonding/active_slave) test $active_slave = "veth2" || echo "BUG: higher priorty slave didn't become active slave" ``` "BUG: higher priorty slave didn't become active slave" this step will fail. From Hangbin: --- Oh, thanks. I didn't notice this. When post patch, I only test 1. enslave a high prio slave 2. when setting current active slave down, bond will find high prio slave in the remaining up slaves. I missed the test that when a high prio slave up, it should do failover and replace the current active slave. The following patch should fix the issue(I omit the arp monitor fix) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index b9a882f182d2..d7351c416004 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -2689,7 +2689,8 @@ static void bond_miimon_commit(struct bonding *bond) bond_miimon_link_change(bond, slave, BOND_LINK_UP); - if (!bond->curr_active_slave || slave == primary) + if (!bond->curr_active_slave || slave == primary || + slave->prio > bond->curr_active_slave->prio) goto do_failover; continue;