Bug 2151982 - [upstream] bonding: bug in "add per-port priority for failover re-selection"
Summary: [upstream] bonding: bug in "add per-port priority for failover re-selection"
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: kernel
Version: unspecified
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 9.3
Assignee: Jonathan Toppins
QA Contact: LiLiang
URL:
Whiteboard:
Depends On:
Blocks: 2092194 2221092
 
Reported: 2022-12-08 18:51 UTC by Jonathan Toppins
Modified: 2023-07-07 08:30 UTC
CC: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-12-14 18:13:10 UTC
Type: Feature Request
Target Upstream Version:
Embargoed:




Links
Red Hat Issue Tracker RHELPLAN-141704 (last updated 2022-12-08 19:05:24 UTC)

Description Jonathan Toppins 2022-12-08 18:51:28 UTC
This is an upstream bug; no RHEL product is affected at this time.

Bonding: add per-port priority for failover re-selection

There is a problem with this patch: when a higher-priority slave comes back up, it is not selected as the new active slave.


```
ip link add name veth0 type veth peer name veth0_p
ip link add name veth1 type veth peer name veth1_p
ip link add name veth2 type veth peer name veth2_p

ip link add name bond0 type bond mode 1 miimon 100 primary veth1 primary_reselect 0
ip link set bond0 up
ip link set veth0 master bond0
ip link set veth1 master bond0
ip link set veth2 master bond0
ip link set dev veth0 type bond_slave prio 0
ip link set dev veth1 type bond_slave prio 10
ip link set dev veth2 type bond_slave prio 11
ip -d link show veth0 | grep -q 'prio 0'
ip -d link show veth1 | grep -q 'prio 10'
ip -d link show veth2 | grep -q 'prio 11'

ip link set veth0 up
ip link set veth1 up
ip link set veth2 up
ip link set veth0_p up
ip link set veth1_p up
ip link set veth2_p up

ip link add name br0 type bridge
ip link set br0 up
ip link set veth0_p master br0
ip link set veth1_p master br0
ip link set veth2_p master br0
ip link add name veth3 type veth peer name veth3_p
ip netns add ns1
ip link set veth3_p master br0 up
ip link set veth3 netns ns1 up

# current active slave should be primary slave
active_slave=$(cat /sys/class/net/bond0/bonding/active_slave)
test "$active_slave" = "veth1" || echo "BUG: current active slave is not primary slave"

ip link set veth1 down
ip link set veth2 down
sleep 5

# current active slave should be the only one up slave veth0
active_slave=$(cat /sys/class/net/bond0/bonding/active_slave)
test "$active_slave" = "veth0" || echo "BUG: current active slave is not the only up slave veth0"

ip link set veth2 up
sleep 5

# higher priority slave veth2 should become active slave
active_slave=$(cat /sys/class/net/bond0/bonding/active_slave)
test "$active_slave" = "veth2" || echo "BUG: higher priority slave didn't become active slave"

```

"BUG: higher priorty slave didn't become active slave" this step will fail.


From Hangbin:
---
Oh, thanks. I didn't notice this. When posting the patch, I only tested:
1. enslaving a higher-prio slave
2. that when the current active slave is set down, the bond picks the highest-prio slave among the remaining up slaves.

I missed testing that when a higher-prio slave comes up, the bond should fail over to it and replace the current active slave.

The following patch should fix the issue (I omit the ARP monitor fix):

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index b9a882f182d2..d7351c416004 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2689,7 +2689,8 @@ static void bond_miimon_commit(struct bonding *bond)

                        bond_miimon_link_change(bond, slave, BOND_LINK_UP);

-                       if (!bond->curr_active_slave || slave == primary)
+                       if (!bond->curr_active_slave || slave == primary ||
+                           slave->prio > bond->curr_active_slave->prio)
                                goto do_failover;

                        continue;
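
To make the fixed condition concrete, here is a small standalone model of the decision (ordinary userspace C, not kernel code; should_failover and the struct are made up for illustration):

```c
/*
 * Model of the failover decision bond_miimon_commit() makes when a
 * slave's link comes up, after the fix: reselect when there is no
 * active slave, when the slave is the configured primary, or when
 * its prio beats the current active slave's.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct slave {
	const char *name;
	int prio;
};

static bool should_failover(const struct slave *up,
			    const struct slave *curr_active,
			    const struct slave *primary)
{
	if (!curr_active)		/* no active slave yet */
		return true;
	if (up == primary)		/* primary wins (primary_reselect 0) */
		return true;
	return up->prio > curr_active->prio;	/* the condition the fix adds */
}

int main(void)
{
	struct slave veth0 = { .name = "veth0", .prio = 0 };
	struct slave veth2 = { .name = "veth2", .prio = 11 };

	/* Reproducer state: veth0 is active, primary veth1 is still
	 * down, and veth2 comes back up. */
	printf("fail over to %s: %s\n", veth2.name,
	       should_failover(&veth2, &veth0, NULL) ? "yes" : "no");
	return 0;
}
```

On the reproducer's failing step, veth0 (prio 0) is the active slave and veth2 (prio 11) comes up while the primary is still down; the added prio comparison is what makes the bond fail over.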

Comment 1 Jonathan Toppins 2022-12-08 19:19:33 UTC
Hangbin, do you intend to upstream the proposed fix or do you want me to?

Comment 2 Hangbin Liu 2022-12-09 00:51:17 UTC
(In reply to Jonathan Toppins from comment #1)
> Hangbin, do you intend to upstream the proposed fix or do you want me to?

I talked with Liang yesterday. He would like to try the upstream work.
So I'm waiting for his selftest patch and will post it together with mine.
I will talk with Liang and see if we can post the patch set today.

Thanks
Hangbin

Comment 4 Jonathan Toppins 2022-12-14 17:52:50 UTC
v2 was accepted upstream.

Comment 5 Jonathan Toppins 2022-12-14 18:13:10 UTC
Closing this bug; it was opened only to track the upstream fix.

