Bug 2115631 - [bonding] bonding using link-local IPv6 addr as ns_ip6_target can't come up
Summary: [bonding] bonding using link-local IPv6 addr as ns_ip6_target can't come up
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: kernel
Version: 9.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
: ---
Assignee: Red Hat Kernel Manager
QA Contact: LiLiang
URL:
Whiteboard:
Duplicates: 2127164 (view as bug list)
Depends On:
Blocks: 2091421
 
Reported: 2022-08-05 04:49 UTC by LiLiang
Modified: 2023-05-09 09:40 UTC
CC List: 3 users

Fixed In Version: kernel-5.14.0-171.el9
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-09 07:59:50 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
  Gitlab redhat/centos-stream/src/kernel centos-stream-9 merge_requests 1376: opened, "bonding: fixes for 9.2" (last updated 2022-09-17 00:40:32 UTC)
  Red Hat Issue Tracker RHELPLAN-130351 (last updated 2022-08-05 04:51:04 UTC)
  Red Hat Product Errata RHSA-2023:2458 (last updated 2023-05-09 08:00:31 UTC)

Description LiLiang 2022-08-05 04:49:33 UTC
Description of problem:
A bond consisting of bnxt_en ports cannot come up when a link-local IPv6 address is used as ns_ip6_target.

Version-Release number of selected component (if applicable):
[root@dell-per740-02 ns_ip6_target]# ethtool -i enp59s0f0np0
driver: bnxt_en
version: 5.14.0-138.el9.x86_64
firmware-version: 218.0.219.13/pkg 218.0.219.21
expansion-rom-version: 
bus-info: 0000:3b:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
[root@dell-per740-02 ns_ip6_target]# lspci -s 0000:3b:00.0
3b:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)
[root@dell-per740-02 ns_ip6_target]# uname -r
5.14.0-138.el9.x86_64

How reproducible:
always

Steps to Reproduce:
1. Set up IP addresses on the peer system
```
# disable port ipv6 accept_ra and setup link local address manually
disable_ipv6_ra()
{
        local nic=$1
        local ipSegment1=$2
        local ipSegment2=$3
        local linkLocalAddr=$(printf "fe80::%02x:%02x" $ipSegment1 $ipSegment2)
        echo 0 > /proc/sys/net/ipv6/conf/$nic/accept_ra
        echo 1 > /proc/sys/net/ipv6/conf/$nic/keep_addr_on_down
        ip addr flush $nic
        ip addr add ${linkLocalAddr}/64 dev $nic
        ip link set $nic up
}

ip link set enp94s0f0 up
disable_ipv6_ra enp94s0f0 23 2
ip addr add 172.20.23.2/24 dev enp94s0f0
sleep 2
ip addr add 2009:23:11::3/64 dev enp94s0f0
```
# ip addr show
6: enp94s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether b4:96:91:dc:73:7c brd ff:ff:ff:ff:ff:ff
    inet 172.20.23.2/24 scope global enp94s0f0
       valid_lft forever preferred_lft forever
    inet6 2009:23:11::3/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::17:2/64 scope link 
       valid_lft forever preferred_lft forever

2. Set up a bond on the local system with bnxt_en ports as members, using the peer's link-local address as ns_ip6_target
```
# disable port ipv6 accept_ra and setup link local address manually
disable_ipv6_ra()
{
        local nic=$1
        local ipSegment1=$2
        local ipSegment2=$3
        local linkLocalAddr=$(printf "fe80::%02x:%02x" $ipSegment1 $ipSegment2)
        echo 0 > /proc/sys/net/ipv6/conf/$nic/accept_ra
        echo 1 > /proc/sys/net/ipv6/conf/$nic/keep_addr_on_down
        ip addr flush $nic
        ip addr add ${linkLocalAddr}/64 dev $nic
        ip link set $nic up
}

ip link add name bond0 type bond mode 1 ns_ip6_target fe80::17:02 arp_validate 3 arp_interval 2000
disable_ipv6_ra bond0 23 1
ip link set bond0 up
ip link set enp59s0f0np0 down
ip link set enp59s0f1np1 down
ip link set enp59s0f0np0 master bond0
ip link set enp59s0f1np1 master bond0
ip addr add 172.20.23.1/24 dev bond0
ip addr add 2009:23:11::1/64 dev bond0
```

3. The bond cannot come up
[root@dell-per740-02 ns_ip6_target]# ip link show enp59s0f0np0
6: enp59s0f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:b7:00:90 brd ff:ff:ff:ff:ff:ff
[root@dell-per740-02 ns_ip6_target]# ip link show enp59s0f1np1
7: enp59s0f1np1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:b7:00:90 brd ff:ff:ff:ff:ff:ff permaddr 00:0a:f7:b7:00:91
[root@dell-per740-02 ns_ip6_target]# ip link show bond0
21: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:b7:00:90 brd ff:ff:ff:ff:ff:ff

[root@dell-per740-02 ns_ip6_target]# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v5.14.0-138.el9.x86_64

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: None
MII Status: down
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0
ARP Polling Interval (ms): 2000
ARP Missed Max: 2
ARP IP target/s (n.n.n.n form):
NS IPv6 target/s (xx::xx form): fe80::17:2

Slave Interface: enp59s0f0np0
MII Status: going back
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:0a:f7:b7:00:90
Slave queue ID: 0

Slave Interface: enp59s0f1np1
MII Status: going down
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:0a:f7:b7:00:91
Slave queue ID: 0

Actual results:
bond down

Expected results:
bond up

Additional info:
I found that no IPv6 neighbour solicitations are sent from the bonding member ports:
 
[root@dell-per740-02 ns_ip6_target]# tcpdump -i enp59s0f0np0 -enn icmp6 -Q out
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on enp59s0f0np0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
^C
0 packets captured
59 packets received by filter
0 packets dropped by kernel

[root@dell-per740-02 ns_ip6_target]# tcpdump -i enp59s0f1np1 -enn icmp6 -Q out
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on enp59s0f1np1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
^C
0 packets captured
9 packets received by filter
0 packets dropped by kernel
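
For a capture limited to neighbour solicitations rather than all ICMPv6 traffic, a filter along these lines can be used (illustrative; the ip6[40] byte match assumes no IPv6 extension headers):

```
# Capture only outgoing IPv6 neighbour solicitations (ICMPv6 type 135)
tcpdump -i enp59s0f0np0 -enn -Q out 'icmp6 and ip6[40] == 135'
```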

Comment 2 LiLiang 2022-08-06 02:25:05 UTC
I can reproduce this issue using veth ports, so it looks like a bonding problem.

```
# disable port ipv6 accept_ra and setup link local address manually
disable_ipv6_ra()
{
        local nic=$1
        local ipSegment1=$2
        local ipSegment2=$3
        local linkLocalAddr=$(printf "fe80::%02x:%02x" $ipSegment1 $ipSegment2)
        echo 0 > /proc/sys/net/ipv6/conf/$nic/accept_ra
        echo 1 > /proc/sys/net/ipv6/conf/$nic/keep_addr_on_down
        echo 0 > /proc/sys/net/ipv6/conf/$nic/autoconf
        ip addr flush $nic
        ip addr add ${linkLocalAddr}/64 dev $nic
        ip link set $nic up
}

# in netns, disable port ipv6 accept_ra and setup link local address manually
netns_disable_ipv6_ra()
{
        local netns=$1
        local nic=$2
        local ipSegment1=$3
        local ipSegment2=$4
        local linkLocalAddr=$(printf "fe80::%02x:%02x" $ipSegment1 $ipSegment2)
        ip netns exec $netns bash <<- EOF
	echo 0 > /proc/sys/net/ipv6/conf/$nic/accept_ra
	echo 1 > /proc/sys/net/ipv6/conf/$nic/keep_addr_on_down
	ip addr flush $nic
	ip addr add ${linkLocalAddr}/64 dev $nic
	ip link set $nic up
	EOF
}

ip link add name br0 type bridge
ip link set br0 up

ip link add name bond_slave1 type veth peer name bond_slave1_p
ip link add name bond_slave2 type veth peer name bond_slave2_p

ip link add name veth1 type veth peer name veth2
ip netns add ns1
ip link set veth1 up
ip link set veth2 up
ip link set veth1 master br0
ip link set veth2 netns ns1
ip netns exec ns1 ip link set veth2 up
sleep 1
netns_disable_ipv6_ra ns1 veth2 19 2
ip netns exec ns1 ip addr add 2009:19:11::3/64 dev veth2

# cmd "netns_disable_ipv6_ra ns1 veth2 19 2" will add this ip in netns
linkLocalAddr=$(printf "fe80::%02x:%02x" 19 2)
ip link add name bond0 type bond mode 1 ns_ip6_target $linkLocalAddr arp_validate 3 arp_interval 1000
ip link set bond0 up
ip link set bond_slave1 master bond0
ip link set bond_slave2 master bond0
ip link set bond_slave1_p master br0
ip link set bond_slave2_p master br0
ip link set bond_slave1 up
ip link set bond_slave2 up
ip link set bond_slave1_p up
ip link set bond_slave2_p up
disable_ipv6_ra bond0 19 1
ip addr add 172.20.19.1/24 dev bond0
ip addr add 2009:19:11::1/64 dev bond0
```

[root@dell-per740-56 ns_ip6_target]# source reproducer
[root@dell-per740-56 ns_ip6_target]# ip netns exec ns1 ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
69: veth2@if70: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 02:d5:f3:12:45:25 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 2009:19:11::3/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::13:2/64 scope link 
       valid_lft forever preferred_lft forever
[root@dell-per740-56 ns_ip6_target]# ip link show bond0
71: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 46:d3:a5:4d:19:be brd ff:ff:ff:ff:ff:ff
[root@dell-per740-56 ns_ip6_target]# ip link show bond_slave1
66: bond_slave1@bond_slave1_p: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 46:d3:a5:4d:19:be brd ff:ff:ff:ff:ff:ff
[root@dell-per740-56 ns_ip6_target]# ip link show bond_slave2
68: bond_slave2@bond_slave2_p: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 46:d3:a5:4d:19:be brd ff:ff:ff:ff:ff:ff
[root@dell-per740-56 ns_ip6_target]# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v5.14.0-138.el9.x86_64

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: None
MII Status: down
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP Missed Max: 2
ARP IP target/s (n.n.n.n form):
NS IPv6 target/s (xx::xx form): fe80::13:2

Slave Interface: bond_slave1
MII Status: going down
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 46:d3:a5:4d:19:be
Slave queue ID: 0

Slave Interface: bond_slave2
MII Status: going back
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d6:62:e8:ef:c4:9d
Slave queue ID: 0
[root@dell-per740-56 ns_ip6_target]# tcpdump -i bond_slave1 -enn
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on bond_slave1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
[root@dell-per740-56 ns_ip6_target]# tcpdump -i bond_slave2 -enn
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on bond_slave2, link-type EN10MB (Ethernet), snapshot length 262144 bytes
22:22:41.611185 a6:0e:62:6f:c6:b8 > 33:33:00:00:00:02, ethertype IPv6 (0x86dd), length 70: fe80::a40e:62ff:fe6f:c6b8 > ff02::2: ICMP6, router solicitation, length 16
22:22:55.947170 3a:b5:b6:fa:36:1a > 33:33:00:00:00:02, ethertype IPv6 (0x86dd), length 70: fe80::c5e:faff:feee:54c8 > ff02::2: ICMP6, router solicitation, length 16
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel

Comment 7 Jonathan Toppins 2022-08-08 15:22:51 UTC
Looks like your feature upstream still has some bugs.

Comment 8 Jonathan Toppins 2022-08-08 15:28:30 UTC
@liali you don't need to report all the failures on other physical devices. If the problem is reproducible on veth it is likely reproducible on all hardware.

Comment 9 Jonathan Toppins 2022-08-08 15:34:42 UTC
*** Bug 2112748 has been marked as a duplicate of this bug. ***

Comment 10 LiLiang 2022-08-09 01:34:59 UTC
(In reply to Jonathan Toppins from comment #8)
> @liali you don't need to report all the failures on other
> physical devices. If the problem is reproducible on veth it is likely
> reproducible on all hardware.

OK.

But bz #2112748 is not exactly the same as this bz.

This bz only happens when using an IPv6 lladdr as ns_ip6_target.

bz #2112748 is triggered when using an IPv6 global addr.

Comment 11 Hangbin Liu 2022-08-09 05:53:24 UTC
This is because the route to the link-local target resolves via br0, e.g.

# ip route get fe80::13:2
fe80::13:2 dev br0 proto kernel src fe80::5417:fbff:fe47:d341 metric 256 pref medium

If you set the br0 address to 10.0.0.1 and the arp target to 10.0.0.254, you will get the same result.

If you use a topology like

     br0
      |
  |------|
veth0  veth1   init ns
  |      |
-------------
  |      |
veth0  veth1   ns1
  |      |
  --------
      |
    bond0

The link-local target should work.
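
Roughly, something like the following would build that topology (interface names, addresses, and bond options below are only examples, not taken from the reproducer):

```
# Illustrative sketch: the bond lives in ns1, so the link-local NS target
# resolves via bond0 inside ns1 instead of via br0 in the init ns.
ip netns add ns1

ip link add name br0 type bridge
echo 0 > /proc/sys/net/ipv6/conf/br0/accept_ra
ip addr add fe80::13:2/64 dev br0   # the NS target, reachable through the bridge
ip link set br0 up

ip link add name veth0 type veth peer name veth0_ns
ip link add name veth1 type veth peer name veth1_ns
ip link set veth0_ns netns ns1
ip link set veth1_ns netns ns1
ip link set veth0 master br0
ip link set veth1 master br0
ip link set veth0 up
ip link set veth1 up

ip netns exec ns1 ip link add name bond0 type bond mode 1 \
        ns_ip6_target fe80::13:2 arp_validate 3 arp_interval 1000
ip netns exec ns1 ip link set bond0 up
ip netns exec ns1 ip link set veth0_ns master bond0
ip netns exec ns1 ip link set veth1_ns master bond0

# The target should now be routed via bond0 inside ns1:
ip netns exec ns1 ip route get fe80::13:2
```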

Comment 12 LiLiang 2022-08-09 06:05:50 UTC
(In reply to Hangbin Liu from comment #11)
> This is because the route to the link-local target resolves via br0, e.g.
> 
> # ip route get fe80::13:2
> fe80::13:2 dev br0 proto kernel src fe80::5417:fbff:fe47:d341 metric 256
> pref medium
> 
> If you set the br0 address to 10.0.0.1 and the arp target to 10.0.0.254,
> you will get the same result.
> 
> If you use a topology like
> 
>      br0
>       |
>   |------|
> veth0  veth1   init ns
>   |      |
> -------------
>   |      |
> veth0  veth1   ns1
>   |      |
>   --------
>       |
>     bond0
> 
> The link-local target should work.

This issue can also be reproduced using physical NICs with no bridge configured.
See comments #0, #1, #3, #5, #6.

Comment 13 LiLiang 2022-08-09 06:30:55 UTC
I will double-check this using a physical NIC.

Comment 14 Hangbin Liu 2022-08-09 10:05:43 UTC
The root cause is that when a link-local address is set as the target and the bond device comes up, the bond's link-local address is still tentative. ipv6_dev_get_saddr() ignores tentative addresses, so source address selection fails and the NS is never sent. I will check how to fix it.
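
A quick way to confirm this on an affected system is to look at the tentative flag and the per-interface DAD sysctls (illustrative commands, assuming the bond is named bond0):

```
# Look for the "tentative" flag on bond0's link-local address
ip -6 addr show dev bond0 scope link

# Per-interface DAD behaviour (accept_dad, dad_transmits)
cat /proc/sys/net/ipv6/conf/bond0/accept_dad
cat /proc/sys/net/ipv6/conf/bond0/dad_transmits
```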

Comment 15 Hangbin Liu 2022-08-17 07:47:59 UTC
(In reply to LiLiang from comment #2)
> I can reproduce this issue using veth ports, looks like this is a bonding
> problem.
> 
> ```
> # cmd "netns_disable_ipv6_ra ns1 veth2 19 2" will add this ip in netns
> linkLocalAddr=$(printf "fe80::%02x:%02x" 19 2)
> ip link add name bond0 type bond mode 1 ns_ip6_target $linkLocalAddr
> arp_validate 3 arp_interval 1000
> ip link set bond0 up
> ip link set bond_slave1 master bond0
> ip link set bond_slave2 master bond0
> ip link set bond_slave1_p master br0
> ip link set bond_slave2_p master br0
> ip link set bond_slave1 up
> ip link set bond_slave2 up
> ip link set bond_slave1_p up
> ip link set bond_slave2_p up
> disable_ipv6_ra bond0 19 1
> ip addr add 172.20.19.1/24 dev bond0
> ip addr add 2009:19:11::1/64 dev bond0
> ```

For the veth case, you add the slaves to the bond first, while the peers are not up yet.
Both bond0 and the slaves are down, so bond0's link-local address stays tentative.

If you set the peers up first, bond0 gets a chance to come up and complete DAD on its link-local address.

If you update the script like this:

> #!/bin/bash
54,59d54
< ip link set bond_slave1 master bond0
< ip link set bond_slave2 master bond0
< ip link set bond_slave1_p master br0
< ip link set bond_slave2_p master br0
< ip link set bond_slave1 up
< ip link set bond_slave2 up
61a57,60
> ip link set bond_slave1_p master br0
> ip link set bond_slave2_p master br0
> ip link set bond_slave1 master bond0
> ip link set bond_slave2 master bond0

Here is the new script

```
#!/bin/bash
# disable port ipv6 accept_ra and setup link local address manually
disable_ipv6_ra()
{
        local nic=$1
        local ipSegment1=$2
        local ipSegment2=$3
        local linkLocalAddr=$(printf "fe80::%02x:%02x" $ipSegment1 $ipSegment2)
        echo 0 > /proc/sys/net/ipv6/conf/$nic/accept_ra
        echo 1 > /proc/sys/net/ipv6/conf/$nic/keep_addr_on_down
        echo 0 > /proc/sys/net/ipv6/conf/$nic/autoconf
        ip addr flush $nic
        ip addr add ${linkLocalAddr}/64 dev $nic
        ip link set $nic up
}

# in netns, disable port ipv6 accept_ra and setup link local address manually
netns_disable_ipv6_ra()
{
        local netns=$1
        local nic=$2
        local ipSegment1=$3
        local ipSegment2=$4
        local linkLocalAddr=$(printf "fe80::%02x:%02x" $ipSegment1 $ipSegment2)
        ip netns exec $netns bash <<- EOF
	echo 0 > /proc/sys/net/ipv6/conf/$nic/accept_ra
	echo 1 > /proc/sys/net/ipv6/conf/$nic/keep_addr_on_down
	ip addr flush $nic
	ip addr add ${linkLocalAddr}/64 dev $nic
	ip link set $nic up
	EOF
}

ip link add name br0 type bridge
ip link set br0 up

ip link add name bond_slave1 type veth peer name bond_slave1_p
ip link add name bond_slave2 type veth peer name bond_slave2_p

ip link add name veth1 type veth peer name veth2
ip netns add ns1
ip link set veth1 up
ip link set veth2 up
ip link set veth1 master br0
ip link set veth2 netns ns1
ip netns exec ns1 ip link set veth2 up
sleep 1
netns_disable_ipv6_ra ns1 veth2 19 2
ip netns exec ns1 ip addr add 2009:19:11::3/64 dev veth2

# cmd "netns_disable_ipv6_ra ns1 veth2 19 2" will add this ip in netns
linkLocalAddr=$(printf "fe80::%02x:%02x" 19 2)
ip link add name bond0 type bond mode 1 ns_ip6_target $linkLocalAddr arp_validate 3 arp_interval 1000
ip link set bond0 up
ip link set bond_slave1_p up
ip link set bond_slave2_p up
ip link set bond_slave1_p master br0
ip link set bond_slave2_p master br0
ip link set bond_slave1 master bond0
ip link set bond_slave2 master bond0
disable_ipv6_ra bond0 19 1
ip addr add 172.20.19.1/24 dev bond0
ip addr add 2009:19:11::1/64 dev bond0

```

With the new script, bond0 comes up correctly.
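
To verify, something like the following can be used (names match the script above; the commands are only illustrative):

```
# The bond should report an active slave and reach the NS target
grep -E "MII Status|Currently Active Slave" /proc/net/bonding/bond0
ip link show bond0              # should no longer show NO-CARRIER / state DOWN
ping -c 3 -I bond0 fe80::13:2   # the ns_ip6_target from the script above
```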

Comment 16 LiLiang 2022-08-17 09:22:18 UTC
(In reply to Hangbin Liu from comment #15)
> With the new script, bond0 comes up correctly.

Thanks for this info.

As you say in comment #14, this problem is really a bug, right? 

Do we have plan to fix it?

Comment 17 Hangbin Liu 2022-08-17 10:41:14 UTC
(In reply to LiLiang from comment #16)
> 
> As you say in comment #14, this problem is really a bug, right? 
> 
> Do we have plan to fix it?

Yes, I have a patch to fix it, but it uses the tentative address when sending the NS.
I think upstream may not accept that, so I'm working on another patch that uses the
unspecified address (any_addr) to send the NS and updates the NA checking function.

I need you to help test the new patch before I post it upstream.

Thanks
Hangbin.

Comment 18 LiLiang 2022-08-17 23:14:37 UTC
(In reply to Hangbin Liu from comment #17)
> (In reply to LiLiang from comment #16)
> > 
> > As you say in comment #14, this problem is really a bug, right? 
> > 
> > Do we have plan to fix it?
> 
> Yes, I have a patch to fix it, but it uses the tentative address when
> sending the NS.
> I think upstream may not accept that, so I'm working on another patch that
> uses the unspecified address (any_addr) to send the NS and updates the NA
> checking function.
> 
> I need you to help test the new patch before I post it upstream.

No problem, I can help.

Comment 19 Jonathan Toppins 2022-09-13 15:09:23 UTC
Hangbin, do we have a set of validation tests for fixing this feature? If so can you upstream them to the bonding kselftests infrastructure?

Comment 20 Hangbin Liu 2022-09-15 04:07:48 UTC
(In reply to Jonathan Toppins from comment #19)
> Hangbin, do we have a set of validation tests for fixing this feature? If so
> can you upstream them to the bonding kselftests infrastructure?

I didn't have a reproducer in hand, so I wrote one today. But the script can only test the commits

592335a4164c bonding: accept unsolicited NA message
b7f14132bf58 bonding: use unspecified address if no available link local address

The script could not test commit
fd16eb948ea8 bonding: add all node mcast address when slave up
as I use veth interface for testing. The veth interface will add 33:33:00:00:00:01 automatically.

Is this OK?
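
For reference, the multicast memberships mentioned above can be checked directly with ip maddress (the interface name below is only an example):

```
# veth interfaces list 33:33:00:00:00:01 (all-nodes) as soon as IPv6 is enabled
ip maddress show dev veth0
```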

Comment 21 Jonathan Toppins 2022-09-15 04:19:03 UTC
(In reply to Hangbin Liu from comment #20)
> (In reply to Jonathan Toppins from comment #19)
> > Hangbin, do we have a set of validation tests for fixing this feature? If so
> > can you upstream them to the bonding kselftests infrastructure?
> 
> I didn't have a reproducer in hand, so I wrote one today. But the script can
> only test the commits
> 
> 592335a4164c bonding: accept unsolicited NA message
> b7f14132bf58 bonding: use unspecified address if no available link local
> address
> 
> The script could not test commit
> fd16eb948ea8 bonding: add all node mcast address when slave up
> as I use veth interface for testing. The veth interface will add
> 33:33:00:00:00:01 automatically.
> 
> Is this OK?

Yeah I think some minimum level of test/usage demonstration would be good so if it is not comprehensive because of veth limitations, this is ok.

Comment 22 Hangbin Liu 2022-09-15 06:43:36 UTC
(In reply to Jonathan Toppins from comment #21)
> 
> Yeah I think some minimum level of test/usage demonstration would be good so
> if it is not comprehensive because of veth limitations, this is ok.

OK, I will post my script to upstream.

Comment 23 Jonathan Toppins 2022-09-15 14:41:28 UTC
*** Bug 2127164 has been marked as a duplicate of this bug. ***

Comment 25 LiLiang 2022-09-20 05:33:21 UTC
TESTED this manually with bnxt_en and mlx4_en NIC.

Regression tests also pass: https://beaker.engineering.redhat.com/recipes/12624155#tasks

Comment 26 Jonathan Toppins 2022-09-20 12:04:59 UTC
(In reply to LiLiang from comment #25)
> TESTED this manually with bnxt_en and mlx4_en NIC.
> 
> Regression test also pass:
> https://beaker.engineering.redhat.com/recipes/12624155#tasks

Do we have to test manually? Or are you saying you did some extra manual tests in addition to the automated regression testing we already have for this issue?

Comment 27 LiLiang 2022-09-21 02:58:54 UTC
(In reply to Jonathan Toppins from comment #26)
> (In reply to LiLiang from comment #25)
> > TESTED this manually with bnxt_en and mlx4_en NIC.
> > 
> Regression tests also pass:
> > https://beaker.engineering.redhat.com/recipes/12624155#tasks
> 
> Do we have to test manually? Or are you saying you did some extra manual
> tests in addition to the automated regression testing we already have for
> this issue?

Yes, it's extra testing.

Comment 32 LiLiang 2022-10-17 01:21:53 UTC
Hangbin,

I found this problem still exists with some NIC drivers, e.g. bnx2x: a bond using an lladdr as ns_ip6_target can't come up.

From tcpdump, it looks like the bonding members are not sending neighbour solicitations.

8: ens3f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:d3:0c:fc brd ff:ff:ff:ff:ff:ff
    altname enp94s0f0
9: ens3f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 00:10:18:d3:0c:fc brd ff:ff:ff:ff:ff:ff permaddr 00:10:18:d3:0c:fe
    altname enp94s0f1
14: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:10:18:d3:0c:fc brd ff:ff:ff:ff:ff:ff
[root@dell-per740-25 ns_ip6_target]# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v5.14.0-162.6.1.el9_1.x86_64

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: None
MII Status: down
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0
ARP Polling Interval (ms): 2000
ARP Missed Max: 2
ARP IP target/s (n.n.n.n form):
NS IPv6 target/s (xx::xx form): fe81::2e:2

Slave Interface: ens3f0
MII Status: going back
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:10:18:d3:0c:fc
Slave queue ID: 0

Slave Interface: ens3f1
MII Status: going down
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:10:18:d3:0c:fe
Slave queue ID: 0
[root@dell-per740-25 ns_ip6_target]# uname -r
5.14.0-162.6.1.el9_1.x86_64
[root@dell-per740-25 ns_ip6_target]# ethtool -i ens3f0
driver: bnx2x
version: 5.14.0-162.6.1.el9_1.x86_64
firmware-version: 5.0.12 bc 5.0.13 phy aa0.406
expansion-rom-version: 
bus-info: 0000:5e:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
[root@dell-per740-25 ns_ip6_target]# lspci -s 0000:5e:00.0
5e:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57711 10-Gigabit PCIe

Comment 33 LiLiang 2022-10-17 01:34:59 UTC
atlantic NIC also has this problem:

[root@dell-per740-83 ns_ip6_target]# ethtool -i enp59s0
driver: atlantic
version: 5.14.0-162.6.1.el9_1.x86_64
firmware-version: 3.1.69
expansion-rom-version: 
bus-info: 0000:3b:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: yes

Comment 34 Hangbin Liu 2022-10-17 02:18:20 UTC
(In reply to LiLiang from comment #32)
> Hangbin,
> 
> I found this problem still exists with some NIC drivers, e.g. bnx2x: a bond
> using an lladdr as ns_ip6_target can't come up.

Thanks for the report, I will check it this week.

Comment 35 Hangbin Liu 2022-10-21 08:30:59 UTC
> [root@dell-per740-25 ns_ip6_target]# cat /proc/net/bonding/bond0 
> Ethernet Channel Bonding Driver: v5.14.0-162.6.1.el9_1.x86_64

You need to test this on the fixed kernel version, kernel-5.14.0-171.el9.
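
Before retesting, it is worth confirming the running kernel is at least the fixed build, e.g.:

```
# The fix is in kernel-5.14.0-171.el9 and later
uname -r                          # running kernel
rpm -q --last kernel | head -n 1  # newest installed kernel package
```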

Comment 38 errata-xmlrpc 2023-05-09 07:59:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2458

