Bug 1878339

Summary: [DPDK][E810][vfio-pci] "NO-CARRIER" on the back to back connected peer port
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Zhiqiang Fang <zfang>
Component: DPDK
Assignee: Maxime Coquelin <maxime.coquelin>
DPDK sub component: other
QA Contact: liting <tli>
Status: NEW ---
Docs Contact:
Severity: unspecified
Priority: unspecified
CC: ctrautma, hewang, jhsiao, ktraynor, qding
Version: FDP 20.E   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: hewang@redhat.com,qding@redhat.com Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---

Description Zhiqiang Fang 2020-09-12 05:01:27 UTC
Description of problem:

The port on the back-to-back connected peer system changes to "NO-CARRIER" when the E810 port is bound to vfio-pci.


Version-Release number of selected component (if applicable):

# rpm -qa | grep dpdk
dpdk-19.11-5.el8_2.x86_64
dpdk-tools-19.11-5.el8_2.x86_64

The issue also occurs with version dpdk/19.11/4.el8fdb.1.


How reproducible:

Topo:

  System_1 (E810, vfio-pci)   <---->  System_2

In my environment, System_1 ens1f0 (E810 port1) is connected back to back to System_2 ens2f0 (mlx5_core port1), and System_1 ens1f1 <----> System_2 ens2f1.





Steps to Reproduce:

~~~ 1. Initial state

The E810 port ens1f0 at 0000:3b:00.0 uses the kernel driver (ice).

System_1# dpdk-devbind --status

Network devices using kernel driver
===================================
0000:01:00.0 'I350 Gigabit Network Connection 1521' if=eno3 drv=igb unused= *Active*
0000:01:00.1 'I350 Gigabit Network Connection 1521' if=eno4 drv=igb unused= 
0000:19:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection 10fb' if=eno1 drv=ixgbe unused= 
0000:19:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection 10fb' if=eno2 drv=ixgbe unused= 
0000:3b:00.0 'Ethernet Controller E810-C for QSFP 1592' if=ens1f0 drv=ice unused= 
0000:3b:00.1 'Ethernet Controller E810-C for QSFP 1592' if=ens1f1 drv=ice unused= 
0000:5e:00.0 'MT27800 Family [ConnectX-5] 1017' if=ens2f0 drv=mlx5_core unused= 
0000:5e:00.1 'MT27800 Family [ConnectX-5] 1017' if=ens2f1 drv=mlx5_core unused= 


System_1 E810 port ens1f0 is up

System_1# ip link
...
3: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 40:a6:b7:18:fc:d8 brd ff:ff:ff:ff:ff:ff


The peer System_2 ens2f0 is up

System_2# ip link
...
7: ens2f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 98:03:9b:8e:0c:18 brd ff:ff:ff:ff:ff:ff



~~~ 2. Bind the E810 port to vfio-pci

On System_1

System_1# modprobe vfio-pci
System_1# modprobe vfio
System_1# dpdk-devbind -b vfio-pci 0000:3b:00.0

System_1# dpdk-devbind --status

Network devices using DPDK-compatible driver
============================================
0000:3b:00.0 'Ethernet Controller E810-C for QSFP 1592' drv=vfio-pci unused=ice

Network devices using kernel driver
===================================
0000:01:00.0 'I350 Gigabit Network Connection 1521' if=eno3 drv=igb unused=vfio-pci *Active*
...


On System_2

System_2# ip link
...
7: ens2f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 98:03:9b:8e:0c:18 brd ff:ff:ff:ff:ff:ff
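
The peer-side check above can be scripted. The following is a minimal sketch, not part of the original report: link_carrier is a hypothetical helper that classifies one line of `ip link` output by its flag list, assuming the flag format shown in the output above.

```shell
# Hypothetical helper (not from the report): classify one line of `ip link`
# output by the flags inside the angle brackets.
link_carrier() {
    # $1: a line such as
    #   "7: ens2f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 ..."
    case "$1" in
        *"<"*NO-CARRIER*">"*) echo "no-carrier" ;;  # admin UP, but no link
        *"<"*LOWER_UP*">"*)   echo "carrier" ;;     # physical link is up
        *)                    echo "unknown" ;;     # e.g. interface admin down
    esac
}
```

On System_2 it could be called as `link_carrier "$(ip -o link show dev ens2f0)"`, which should print "no-carrier" in the state shown above.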



~~~ 3. Unbind the E810 port from vfio-pci so that it returns to the kernel driver (ice); the peer port on System_2 then comes back up.
 
System_1# driverctl unset-override 0000:3b:00.0

System_2# ip link
...
7: ens2f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 98:03:9b:8e:0c:18 brd ff:ff:ff:ff:ff:ff
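
To follow the peer port across steps 2 and 3 without parsing `ip link` output, the kernel's sysfs operstate attribute can be read instead. This is a sketch under assumptions, not something from the report: peer_operstate and the SYSFS_NET override are hypothetical names introduced here for illustration.

```shell
# Hypothetical helper (not from the report): print the operational state of a
# network interface as the kernel exposes it in sysfs.
# SYSFS_NET is an assumed override for testing; it defaults to /sys/class/net.
peer_operstate() {
    f="${SYSFS_NET:-/sys/class/net}/$1/operstate"
    if [ -r "$f" ]; then
        cat "$f"            # typically "up" or "down"
    else
        echo "missing"      # interface not present under sysfs
        return 1
    fi
}
```

Running `peer_operstate ens2f0` on System_2 should report "down" after step 2 and "up" after step 3, matching the "state DOWN" and "state UP" fields in the `ip link` output.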


Actual results:

 E810's peer port shows "NO-CARRIER".

Expected results:

 The E810's peer port stays up, with no state change, when the E810 port is bound to vfio-pci.
 


Additional info:

1. The same issue occurs on the other port of the E810 card. When ens1f1 is bound to vfio-pci, its peer port ens2f1 becomes "NO-CARRIER".

2. We started testpmd across ens1f0 and ens1f1 (on System_1) while both ports were bound to vfio-pci. The peer ports ens2f0 and ens2f1 came up, and pktgen on System_2 worked fine.

3. The issue was not seen when the same test was run on an i40e card.
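
The report does not give the testpmd command used in item 2. As a rough sketch only, a DPDK 19.11 invocation across the two E810 ports might look like the output of the helper below. The testpmd_cmd function, the core list (-l 0-2), and the memory channel count (-n 4) are assumptions made for illustration; the PCI addresses are the two E810 ports from the dpdk-devbind output.

```shell
# Sketch only: prints a candidate testpmd command line instead of running it.
# The -l and -n values are assumed, not taken from the report.
testpmd_cmd() {
    cmd="testpmd -l 0-2 -n 4"
    for pci in "$@"; do
        cmd="$cmd -w $pci"   # -w whitelists one PCI device (EAL option in 19.11)
    done
    echo "$cmd -- -i"        # -i starts testpmd in interactive mode
}

testpmd_cmd 0000:3b:00.0 0000:3b:00.1
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without the hardware; the printed line can be reviewed and adjusted before use.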