Bug 1569883

Summary: [bnxt_en]ping failed on guests over ovs-qinq-dpdk after netperf with large packets
Product: Red Hat Enterprise Linux 7
Reporter: haidong li <haili>
Component: openvswitch
Assignee: Open vSwitch development team <ovs-team>
Status: CLOSED DUPLICATE
QA Contact: haidong li <haili>
Severity: high
Docs Contact: Davide Caratti <dcaratti>
Priority: unspecified
Version: 7.5
CC: atragler, ctrautma, dcaratti, haili, qding
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-20 11:22:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description haidong li 2018-04-20 08:01:56 UTC
Description of problem:
Ping fails on guests over ovs-dpdk after running netperf with large packets.

Version-Release number of selected component (if applicable):
 ovs_version: "2.9.0"

How reproducible:
Every time

Steps to Reproduce:
1. Add dpdk0 to OVS with a bnxt_en card.
2. Create a vhostuser port and attach it to the guest; create a VLAN interface on that port in the guest.
3. Set the vhostuser port to dot1q-tunnel mode.
4. Run netperf TCP_MAERTS from the guest to the remote host. The error then occurs, and the port can no longer ping the remote host.
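
For reference, the steps above could look roughly like the dry-run sketch below, which only prints the commands. The bridge name br0, port name vhost0, service VLAN tag 10 and guest VLAN ID 3 are assumptions and were not part of the original test; the PCI address is taken from the vfio-pci lines in the console log, and the port type dpdkvhostuser is assumed from "vhostuser port". The guest-side names and addresses come from the log of steps.

```shell
#!/bin/sh
# Dry run: prints the commands for the four steps instead of executing them.
# Assumed, not from the report: bridge name br0, port name vhost0,
# service VLAN tag 10, guest VLAN ID 3, port type dpdkvhostuser.
BR=${BR:-br0}
PCI=${PCI:-0000:04:00.0}
VHOST=${VHOST:-vhost0}
SVLAN=${SVLAN:-10}

repro_commands() {
    cat <<EOF
# host, step 1: DPDK bridge plus the bnxt_en port
ovs-vsctl add-br $BR -- set bridge $BR datapath_type=netdev
ovs-vsctl add-port $BR dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=$PCI
# host, step 2: vhost-user port for the guest
ovs-vsctl add-port $BR $VHOST -- set Interface $VHOST type=dpdkvhostuser
# host, step 3: QinQ on the vhost-user port
ovs-vsctl set Open_vSwitch . other_config:vlan-limit=2
ovs-vsctl set port $VHOST vlan_mode=dot1q-tunnel tag=$SVLAN
# guest, step 2: VLAN interface on the vhost-user NIC
ip link add link eth1 name vlan3 type vlan id 3
ip addr add 172.31.145.1/24 dev vlan3
ip link set vlan3 up
# guest, step 4: reverse-direction TCP test, then ping
netperf -t TCP_MAERTS -H 172.31.145.2 -- -m 16384
ping 172.31.145.2
EOF
}

repro_commands
```

The names can be overridden from the environment, e.g. `BR=ovsbr0 SVLAN=20 sh repro.sh`, where repro.sh is a hypothetical file name for the script.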

console log:
[root@dell-per730-16 ~]# [15062.099452] pmd71[117386]: segfault at 6672657394 ip 000055be75be0c2a sp 00007f7fa8ff0200 error 4 in ovs-vswitchd[55be759b4000+461000]
[15066.953005] vfio-pci 0000:04:00.0: enabling device (0400 -> 0402)
[15067.060455] vfio_ecap_init: 0000:04:00.0 hiding ecap 0x19@0x300
[15067.067062] vfio_ecap_init: 0000:04:00.0 hiding ecap 0x1f@0x200
[15067.194395] vfio-pci 0000:04:00.1: enabling device (0400 -> 0402)
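
The segfault line can be decoded a little further. The sketch below subtracts the ovs-vswitchd mapping base from the faulting instruction pointer and interprets the x86 page-fault error code; all three input values are copied from the console log above, and the function names are only illustrative.

```shell
#!/bin/sh
# Values copied from the segfault line in the console log.
ip=0x55be75be0c2a      # faulting instruction pointer
base=0x55be759b4000    # start of the ovs-vswitchd mapping
err=4                  # x86 page-fault error code

# Offset of the faulting instruction inside the mapping.
segv_offset() {
    printf '0x%x\n' $(( $1 - $2 ))
}

# x86 page-fault error code bits:
#   bit 0: set = protection violation, clear = page not present
#   bit 1: set = write access,         clear = read access
#   bit 2: set = user mode,            clear = kernel mode
decode_err() {
    if [ $(( $1 & 1 )) -ne 0 ]; then p="protection violation"; else p="page not present"; fi
    if [ $(( $1 & 2 )) -ne 0 ]; then w="write"; else w="read"; fi
    if [ $(( $1 & 4 )) -ne 0 ]; then m="user mode"; else m="kernel mode"; fi
    echo "$m $w, $p"
}

segv_offset "$ip" "$base"   # prints 0x22cc2a
decode_err "$err"           # prints: user mode read, page not present
```

So the PMD thread read from an unmapped address. If matching debuginfo is installed, and assuming the mapping starts at file offset 0, the offset can be passed to something like `addr2line -e /usr/sbin/ovs-vswitchd 0x22cc2a` (binary path assumed) to find the source line.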

log of steps:
[root@localhost ~]# ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:00:00:01:03:02 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::200:ff:fe01:302/64 scope link 
       valid_lft forever preferred_lft forever
5: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1492 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 00:00:00:01:05:02 brd ff:ff:ff:ff:ff:ff
    inet 172.31.165.1/24 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 2004::200:ff:fe01:502/64 scope global mngtmpaddr dynamic 
       valid_lft 81656sec preferred_lft 9656sec
    inet6 2001:db8:165::1/64 scope global 
       valid_lft forever preferred_lft forever
6: vlan3@eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1480 qdisc noqueue state LOWERLAYERDOWN group default qlen 1000
    link/ether 00:00:00:01:05:02 brd ff:ff:ff:ff:ff:ff
    inet 172.31.145.1/24 scope global vlan3
       valid_lft forever preferred_lft forever
    inet6 2001:db8:145::1/24 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::200:ff:fe01:502/64 scope link 
       valid_lft forever preferred_lft forever
[root@localhost ~]# 
[root@localhost ~]# netperf -H 172.31.145.2 
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.31.145.2 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.00    3989.52   
[root@localhost ~]# netperf -t  TCP_MAERTS -H 172.31.145.2  -- -m 16384
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.31.145.2 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    11.89       0.03   
[root@localhost ~]# ping 172.31.145.2
PING 172.31.145.2 (172.31.145.2) 56(84) bytes of data.
64 bytes from 172.31.145.2: icmp_seq=1 ttl=64 time=0.329 ms
64 bytes from 172.31.145.2: icmp_seq=6 ttl=64 time=0.277 ms
64 bytes from 172.31.145.2: icmp_seq=7 ttl=64 time=0.261 ms
64 bytes from 172.31.145.2: icmp_seq=11 ttl=64 time=0.258 ms
64 bytes from 172.31.145.2: icmp_seq=35 ttl=64 time=0.265 ms
64 bytes from 172.31.145.2: icmp_seq=36 ttl=64 time=0.249 ms
From 172.31.145.1 icmp_seq=45 Destination Host Unreachable
From 172.31.145.1 icmp_seq=46 Destination Host Unreachable
From 172.31.145.1 icmp_seq=47 Destination Host Unreachable
From 172.31.145.1 icmp_seq=48 Destination Host Unreachable


Additional info:
job link:
https://beaker.engineering.redhat.com/recipes/5043839#task70709538

Comment 2 Davide Caratti 2018-04-20 08:21:17 UTC
(In reply to haidong li from comment #0)
> 
> console log:
> [root@dell-per730-16 ~]# [15062.099452] pmd71[117386]: segfault at
> 6672657394 ip 000055be75be0c2a sp 00007f7fa8ff0200 error 4 in
> ovs-vswitchd[55be759b4000+461000]

Broadcom just provided a fix for a PMD segfault. Not sure if it's the same issue, but it's worth asking: can you check whether the problem still happens using the RPM at https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=15845936 ?

thanks!
--
davide

Comment 3 haidong li 2018-04-20 09:21:54 UTC
(In reply to Davide Caratti from comment #2)
> (In reply to haidong li from comment #0)
> > 
> > console log:
> > [root@dell-per730-16 ~]# [15062.099452] pmd71[117386]: segfault at
> > 6672657394 ip 000055be75be0c2a sp 00007f7fa8ff0200 error 4 in
> > ovs-vswitchd[55be759b4000+461000]
> 
> broadcom just provided a fix for PMD segfault. Not sure if it's the same
> issue, but it's worth asking: can you check if the problem still happens
> using RPM at
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=15845936 ?
> 
> thanks!
> --
> davide

I have just tried with the package you provided, and qinq-dpdk works normally now:

[root@dell-per730-16 ovs_qinq_dpdk]# rpm -qa | grep openvswitch
kernel-kernel-networking-openvswitch-ovs_qinq_dpdk-1.3-19.noarch
openvswitch-2.9.0-15.el7fdp.bz1567634.x86_64

[root@localhost ~]# netperf -4 -t TCP_STREAM -H 172.31.145.2   -- -m 16384
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.31.145.2 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.01    3958.68   
[root@localhost ~]# echo $?
0
[root@localhost ~]# logout

Red Hat Enterprise Linux Server 7.5 Beta (Maipo)
Kernel 3.10.0-855.el7.x86_64 on an x86_64

localhost login: 
spawn virsh console g1
Connected to domain g1
Escape character is ^]

Red Hat Enterprise Linux Server 7.5 Beta (Maipo)
Kernel 3.10.0-855.el7.x86_64 on an x86_64

localhost login: root
Password: 
Last login: Fri Apr 20 04:51:53 on ttyS0
[root@localhost ~]# netperf -4 -t TCP_MAERTS -H 172.31.145.2   -- -m 16384
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.31.145.2 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.00    4915.25   
[root@localhost ~]# echo $?


||  TCP_STREAMv4  ||  TCP_MAERTSv4  ||  UDP_STREAMv4  ||  TCP_STREAMv6  ||  TCP_MAERTSv6  ||  UDP_STREAMv6  || 
||  ------------  ||  ------------  ||  ------------  ||  ------------  ||  ------------  ||  ------------  || 
||  3958.68       ||  4915.25       ||  688.80        ||  3335.31       ||  4722.67       ||  1004.57       ||

Comment 4 Davide Caratti 2018-04-20 11:22:10 UTC
(In reply to haidong li from comment #3)
> (In reply to Davide Caratti from comment #2)
> > (In reply to haidong li from comment #0)
> > > 
> > > console log:
> > > [root@dell-per730-16 ~]# [15062.099452] pmd71[117386]: segfault at
> > > 6672657394 ip 000055be75be0c2a sp 00007f7fa8ff0200 error 4 in
> > > ovs-vswitchd[55be759b4000+461000]
> > 
> > broadcom just provided a fix for PMD segfault. Not sure if it's the same
> > issue, but it's worth asking: can you check if the problem still happens
> > using RPM at
> > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=15845936 ?
> > 
> > thanks!
> > --
> > davide
> 
> I have just tried with the packet you provided,the qinq-dpdk works normally
> now:
> 

Hello haidong, thanks for doing the test!

That's good news: the fix is then the same as the series for bz1567634.

*** This bug has been marked as a duplicate of bug 1567634 ***