Bug 1566981 - [ovs/mlx4_en]nic always is disabled in ovs lacp mode bonding
Summary: [ovs/mlx4_en]nic always is disabled in ovs lacp mode bonding
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: openvswitch
Version: 7.5
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Marcelo Ricardo Leitner
QA Contact: ovs-qe
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-04-13 09:00 UTC by LiLiang
Modified: 2018-05-04 07:35 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-04 07:35:14 UTC
Target Upstream Version:
Embargoed:



Description LiLiang 2018-04-13 09:00:01 UTC
Description of problem:
I added two mlx4_en NICs to an OVS bond, but only one of them is enabled; the other is always disabled.

[root@hp-dl580g8-01 topo]# ovs-appctl bond/show
---- bond0 ----
bond_mode: active-backup
bond may use recirculation: no, Recirc-ID : -1
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
lacp_status: negotiated
lacp_fallback_ab: false
active slave mac: e4:1d:2d:1e:d5:11(ens4d1)

slave ens4: disabled
	may_enable: false

slave ens4d1: enabled
	active slave
	may_enable: true

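When comparing output like the above across many runs, it can help to extract the disabled slaves mechanically. The following is a minimal bash sketch, assuming the OVS 2.x `ovs-appctl bond/show` layout shown in this report, where each member appears on a line of the form `slave NAME: enabled|disabled`. The function name `disabled_slaves` is hypothetical.

```bash
#!/usr/bin/env bash
# disabled_slaves (hypothetical helper): reads `ovs-appctl bond/show` output
# on stdin and prints the name of every slave reported as "disabled".
# Assumes the OVS 2.x line format "slave NAME: STATE" seen in this report.
disabled_slaves() {
    awk '$1 == "slave" && $3 == "disabled" { sub(/:$/, "", $2); print $2 }'
}

# Usage on a live system:
#   ovs-appctl bond/show bond0 | disabled_slaves
```

Lines such as `active slave mac: ...` or the indented `may_enable:` lines do not start with the bare word `slave`, so only the per-member status lines are matched.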

Version-Release number of selected component (if applicable):
[root@hp-dl580g8-01 topo]# uname -r
3.10.0-860.el7.x86_64

[root@hp-dl580g8-01 topo]# ethtool -i ens4
driver: mlx4_en
version: 4.0-0
firmware-version: 2.40.7000
expansion-rom-version: 
bus-info: 0000:84:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
[root@hp-dl580g8-01 topo]# ethtool -i ens4d1
driver: mlx4_en
version: 4.0-0
firmware-version: 2.40.7000
expansion-rom-version: 
bus-info: 0000:84:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

[root@hp-dl580g8-01 topo]# lspci -s 0000:84:00.0
84:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]


How reproducible:
always

+---------------------------+
|       system1             |
|                           |           
|   mlx4_nic1    mlx4_nic2  |
+---------------------------+
        |           |
        |           |
+---------------------------+
|                           |
| cisco Nexus 6004 switch   |
|                           |
+---------------------------+

Steps to Reproduce:
1. Create an OVS bridge on system1:
# ovs-vsctl add-br ovsbr0

2. Configure the port-channel (LACP passive) on the Cisco switch:
sw-c6004# configure t
Enter configuration commands, one per line.  End with CNTL/Z.
sw-c6004(config)# interface port-channel2
sw-c6004(config-if)# switchport mode trunk
sw-c6004(config-if)# switchport trunk allowed vlan 1-100
sw-c6004(config-if)# interface Eth2/5
sw-c6004(config-if)# lacp rate fast
sw-c6004(config-if)# channel-group 2 mode passive
sw-c6004(config-if)# interface Eth2/1
sw-c6004(config-if)# lacp rate fast
sw-c6004(config-if)# channel-group 2 mode passive
sw-c6004(config-if)# end


3. Create an OVS bond on system1 and add both mlx4 NICs to it:
ovs-vsctl add-bond ovsbr0 bond0 ens4 ens4d1 lacp=active
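The host-side steps can be collected into one script for repeated runs. This is a minimal bash sketch, not the reporter's actual tooling: the bridge, bond, and interface names default to the ones used in this report, and the `DRY_RUN` variable, the `run` wrapper, and the `reproduce` function are hypothetical additions.

```bash
#!/usr/bin/env bash
# Names from this report; override via the environment if needed.
BRIDGE="${BRIDGE:-ovsbr0}"
BOND="${BOND:-bond0}"
NIC1="${NIC1:-ens4}"
NIC2="${NIC2:-ens4d1}"

# run (hypothetical wrapper): print the command when DRY_RUN=1,
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# reproduce (hypothetical): host-side steps 1, 3 and 4 of this report.
reproduce() {
    run ovs-vsctl add-br "$BRIDGE"                                        # step 1
    run ovs-vsctl add-bond "$BRIDGE" "$BOND" "$NIC1" "$NIC2" lacp=active  # step 3
    run ip link set "$NIC1" up
    run ip link set "$NIC2" up
    run ovs-appctl bond/show "$BOND"                                      # step 4
}
```

Source the file and call `DRY_RUN=1 reproduce` to review the commands first, then `reproduce` as root on the test host. The switch-side configuration (step 2) still has to be applied on the Nexus separately.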

4. Slave ens4 is always disabled and cannot be enabled:
[root@hp-dl580g8-01 topo]# ip link set ens4 up
[root@hp-dl580g8-01 topo]# ovs-appctl bond/show
---- bond0 ----
bond_mode: active-backup
bond may use recirculation: no, Recirc-ID : -1
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
lacp_status: negotiated
lacp_fallback_ab: false
active slave mac: e4:1d:2d:1e:d5:11(ens4d1)

slave ens4: disabled
	may_enable: false

slave ens4d1: enabled
	active slave
	may_enable: true



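Actual results:
Slave ens4 is always disabled (may_enable: false); only ens4d1 is enabled.

Expected results:
Both slaves are enabled in the LACP bond.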

Additional info:
The Cisco Nexus 6004 switch is also connected to two i40e NICs. I tested the same setup with the i40e NICs, and the issue does not occur with them.

Comment 2 Marcelo Ricardo Leitner 2018-04-13 14:39:34 UTC
This may be interesting.
As ConnectX-3 exposes a single PCI device for its 2 ports, we often resort to identifying the port by its MAC address. But with bonding, both ports may end up with the same MAC address. That said, this is mostly done for DPDK, not for OVS.
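Both ports in this report do share one PCI function: the `ethtool -i` output in the description shows `bus-info: 0000:84:00.0` for both ens4 and ens4d1. A minimal bash sketch for confirming that from `ethtool -i` output follows; the function names are hypothetical, and the check compares only the `bus-info` field.

```bash
#!/usr/bin/env bash
# bus_info (hypothetical helper): prints the bus-info field from
# `ethtool -i` output supplied on stdin.
bus_info() {
    awk -F': ' '$1 == "bus-info" { print $2 }'
}

# same_pci_function (hypothetical helper): succeeds if the two `ethtool -i`
# outputs passed as arguments report the same non-empty bus-info, i.e. both
# netdevs are ports of a single PCI function, as on ConnectX-3.
same_pci_function() {
    local a b
    a="$(printf '%s\n' "$1" | bus_info)"
    b="$(printf '%s\n' "$2" | bus_info)"
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage on a live system:
#   same_pci_function "$(ethtool -i ens4)" "$(ethtool -i ens4d1)" && echo "shared PCI function"
```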

Comment 3 Aniss Loughlam 2018-04-13 16:18:37 UTC
I test this topology [1] every week with [2] on both hosts, and there is no connectivity issue.

[1] https://github.com/jpirko/lnst/blob/master/recipes/regression_tests/phase2/virtual_ovs_bridge_2_vlans_over_active_backup_bond.README

[2] [root@wsfd-netdev38 ~]# lspci | grep Mellanox
03:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

[root@wsfd-netdev38 ~]# ethtool -i p6p1
driver: mlx4_en
version: 4.0-0
firmware-version: 2.42.5000
[...]


[root@wsfd-netdev38 ~]# ovs-vsctl show
15c8e56e-ee3f-446e-aadc-d1a53b229d5f
    ovs_version: "2.7.0"
[root@wsfd-netdev38 ~]# modprobe openvswitch
[root@wsfd-netdev38 ~]# ovs-vsctl add-br t_br0
[root@wsfd-netdev38 ~]# ovs-vsctl add-port t_br0 vnet2 tag=10
[root@wsfd-netdev38 ~]# ovs-vsctl add-port t_br0 vnet3 tag=20
[root@wsfd-netdev38 ~]# ovs-vsctl add-bond t_br0 bond  p6p1 p6p2  bond_mode=active-backup other_config:bond-miimon-interval=100
[root@wsfd-netdev38 ~]# ip link set p6p2 up
[root@wsfd-netdev38 ~]# ip link set p6p1 up
[root@wsfd-netdev38 ~]# ip link set t_br0 up
[root@wsfd-netdev38 ~]# ovs-appctl bond/show
---- bond ----
bond_mode: active-backup
bond may use recirculation: no, Recirc-ID : -1
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
lacp_status: off
active slave mac: 24:8a:07:f7:39:41(p6p2)

slave p6p1: enabled
	may_enable: true

slave p6p2: enabled
	active slave
	may_enable: true

Comment 4 Marcelo Ricardo Leitner 2018-04-13 21:06:14 UTC
LiLiang, your firmware is considerably older than Aniss'. Please try upgrading it.

Comment 5 LiLiang 2018-04-17 06:54:42 UTC
(In reply to Marcelo Ricardo Leitner from comment #4)
> LiLiang, your firmware is considerably older than Aniss'. Please try
> upgrading it.

Aniss and I are using different bond modes; this issue only occurs with LACP (lacp=active).

Me:
ovs-vsctl add-bond ovsbr0 bond0 ens4 ens4d1 lacp=active

Aniss:
ovs-vsctl add-bond t_br0 bond  p6p1 p6p2  bond_mode=active-backup other_config:bond-miimon-interval=100

I have upgraded my firmware, but the issue still occurs.

[root@hp-dl580g8-01 ~]# ethtool -i ens4
driver: mlx4_en
version: 4.0-0
firmware-version: 2.42.5000
expansion-rom-version: 
bus-info: 0000:84:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

Comment 6 Marcelo Ricardo Leitner 2018-04-17 17:50:23 UTC
Cc'ing Alaa and Erez, Mellanox on-site engineers:
I don't have a ConnectX-3 available right now. Can you reproduce the issue?

Comment 7 Marcelo Ricardo Leitner 2018-04-17 22:08:00 UTC
LiLiang, please double-check on your switch that both ports are enabled for LACP. Maybe one of them isn't?

Comment 8 Marcelo Ricardo Leitner 2018-04-17 22:18:26 UTC
Aniss and I tested this and the only way we could reproduce this is by having one of the ports on the switch not properly configured.
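One way to check for a misconfigured switch port from the host side is `ovs-appctl lacp/show`, which reports the LACP state of each bond member. A member that stays `defaulted` has not received LACPDUs from its partner, which fits a switch port that is not actually part of the port-channel. The bash sketch below assumes the OVS 2.x output layout, in which each member line reads `slave: NAME: STATUS attached|detached`; newer OVS releases print `member:` instead, and the pattern accepts both. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# lacp_problem_members (hypothetical helper): reads `ovs-appctl lacp/show`
# output on stdin and prints "NAME STATUS STATE" for every bond member
# that is not both "current" and "attached".
# Assumes lines of the form "slave: NAME: STATUS attached|detached"
# (OVS 2.x) or "member: NAME: STATUS attached|detached" (newer OVS).
lacp_problem_members() {
    awk '($1 == "slave:" || $1 == "member:") {
             name = $2; sub(/:$/, "", name)
             if ($3 != "current" || $4 != "attached") print name, $3, $4
         }'
}

# Usage on a live system:
#   ovs-appctl lacp/show bond0 | lacp_problem_members
```

If the disabled port shows up here as `defaulted` while the other is `current attached`, the switch side of that link is the first place to look.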

Comment 9 LiLiang 2018-04-18 09:40:45 UTC
(In reply to Marcelo Ricardo Leitner from comment #7)
> LiLiang, please double-check on your switch that both ports are enabled for LACP.
> Maybe one of them isn't?

I confirm that LACP is enabled on both ports.

But I tested this on another system with the same NIC model, and the issue does not occur there. The NICs on that system are connected to a Juniper switch.

The Cisco Nexus 6004 switch is also connected to two i40e NICs. I tested with the i40e NICs, and the issue does not occur with them either.

So I don't know whether this is a kernel issue or a switch issue...

Comment 10 Marcelo Ricardo Leitner 2018-04-18 14:39:11 UTC
Or a cable issue, maybe?
Can you please try swapping the ports on the Nexus 6004 and see whether the change is reflected on the host?
Or maybe the NIC has a faulty port?

Comment 11 LiLiang 2018-05-04 07:35:14 UTC
I connected the mlx4_en NIC directly to an i40e NIC and tested the OVS bond; the issue does not occur.

