Bug 1567634 - OVS daemon crashed while running netperf TCP_STREAM between guests over OVS/dpdk/bnxt
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: openvswitch
Version: 7.5
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Davide Caratti
QA Contact: Jean-Tsung Hsiao
Duplicates: 1569883
Blocks: 1560628
Reported: 2018-04-15 16:49 UTC by Jean-Tsung Hsiao
Modified: 2018-05-03 14:37 UTC (History)

Fixed In Version: openvswitch-2.9.0-18.el7fdn
Last Closed: 2018-05-03 14:37:49 UTC

Description Jean-Tsung Hsiao 2018-04-15 16:49:30 UTC
Description of problem: OVS daemon crashed while running netperf TCP_STREAM between guests over OVS/dpdk/bnxt

Apr 15 12:08:35 netqe22 kernel: traps: pmd100[121395] general protection ip:557f82b29aea sp:7efbfaffc200 error:0 in ovs-vswitchd[557f82903000+459000]
Apr 15 12:08:35 netqe22 ovs-ctl: 2018-04-15T16:08:35Z|00001|unixctl|WARN|failed to connect to /var/run/openvswitch/ovs-vswitchd.121231.ctl
Apr 15 12:08:35 netqe22 ovs-appctl: ovs|00001|unixctl|WARN|failed to connect to /var/run/openvswitch/ovs-vswitchd.121231.ctl
Apr 15 12:08:35 netqe22 ovs-ctl: ovs-appctl: cannot connect to "/var/run/openvswitch/ovs-vswitchd.121231.ctl" (Connection refused)
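
The kernel "traps:" line above carries enough information to locate the faulting instruction inside the binary: subtracting the mapping base (the first number in the square brackets) from "ip" gives the offset into ovs-vswitchd. A minimal sketch of that arithmetic follows; the helper name crash_offset is hypothetical and is not part of any OVS tooling.

```shell
# Offset of a faulting instruction pointer inside its mapped object.
# $1 = ip, $2 = mapping base, both in hex as printed in the "traps:" line.
# crash_offset is a hypothetical helper name used only for illustration.
crash_offset() {
    printf '0x%x\n' "$(( $1 - $2 ))"
}

# Values copied from the log line above.
crash_offset 0x557f82b29aea 0x557f82903000   # prints 0x226aea
```

The result is smaller than the mapping size (0x459000), so the fault is inside ovs-vswitchd itself. With debuginfo matching this exact build, the offset could in principle be passed to a symbolizer such as addr2line; that step is an assumption and was not performed in this report.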

Version-Release number of selected component (if applicable):

openvswitch-2.9.0-15.el7fdp.x86_64
[root@netqe22 proc]# uname -a
Linux netqe22.knqe.lab.eng.bos.redhat.com 3.10.0-862.el7.bz1558328.x86_64 #1 SMP Thu Mar 22 11:10:59 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux

How reproducible: Reproducible


Steps to Reproduce:
Hosts: H1 and H2
1. Install a 10/25 Gb bnxt NIC on H1.
2. Connect H1 back to back to H2 via a 10 Gb ixgbe NIC.
3. Configure an OVS-DPDK VXLAN tunnel with vhostuser guests.
4. Ping between the guests; this works.
5. Run netperf TCP_STREAM between the guests; this fails.


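Actual results: netperf TCP_STREAM fails and the OVS daemon (ovs-vswitchd) crashes with a general protection fault.


Expected results: netperf TCP_STREAM completes and the OVS daemon keeps running.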


Additional info:

Comment 2 Davide Caratti 2018-04-18 14:51:14 UTC
[root@wsfd-netdev85 ~]# systemctl restart openvswitch.service                                            
[root@wsfd-netdev85 ~]# pgrep vswitchd                                                                   
8223                                                                                                                                      
[root@wsfd-netdev85 ~]# gdb 8223     
(gdb) thread 60           
[Switching to thread 60 (Thread 0x7f528f7fe700 (LWP 8293))]                                              
#0  netdev_dpdk_rxq_recv (rxq=0x7f4f908dae80, batch=0x7f528f7fd760) at lib/netdev-dpdk.c:1880            
1880        struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);                                     
(gdb) c                   
Continuing.               
[New Thread 0x7f5357679c80 (LWP 8430)]              
[Thread 0x7f5357679c80 (LWP 8430) exited]           

Program received signal SIGSEGV, Segmentation fault.                                                     
miniflow_extract (packet=packet@entry=0x7f4f90db4200, dst=dst@entry=0x7f528f7fd3c8) at lib/flow.c:709    
709                 miniflow_push_macs(mf, dl_dst, data);                                                
(gdb) bt                  
#0  miniflow_extract (packet=packet@entry=0x7f4f90db4200, dst=dst@entry=0x7f528f7fd3c8)                  
    at lib/flow.c:709     
#1  0x000055bf4f352b75 in emc_processing (port_no=3, md_is_valid=false, n_batches=0x7f528f7fd6d8,        
    batches=0x7f528f7fd290, keys=<optimized out>, packets_=0x7f528f7fd760, pmd=0x7f5344358010)           
    at lib/dpif-netdev.c:5027                       
#2  dp_netdev_input__ (pmd=pmd@entry=0x7f5344358010, packets=packets@entry=0x7f528f7fd760,               
    md_is_valid=md_is_valid@entry=false, port_no=port_no@entry=3) at lib/dpif-netdev.c:5256              
#3  0x000055bf4f353226 in dp_netdev_input (port_no=3, packets=0x7f528f7fd760, pmd=0x7f5344358010)        
    at lib/dpif-netdev.c:5289                       
#4  dp_netdev_process_rxq_port (pmd=pmd@entry=0x7f5344358010, rxq=0x55bf503cd110, port_no=3)             
    at lib/dpif-netdev.c:3286                       
#5  0x000055bf4f3535fa in pmd_thread_main (f_=<optimized out>) at lib/dpif-netdev.c:4145                 
#6  0x000055bf4f3d0296 in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:348               
#7  0x00007f535676cdd5 in start_thread () from /lib64/libpthread.so.0                                    
#8  0x00007f5355b6ab3d in clone () from /lib64/libc.so.6

Comment 3 Davide Caratti 2018-04-18 16:38:05 UTC
[root@wsfd-netdev85 ~]# ovs-vsctl show 
238e63b3-b5ce-4427-8dee-a42f418ce9a9
    Bridge "ovs_pvp_br0"
        Port "vhost0"
            Interface "vhost0"
                type: dpdkvhostuserclient
                options: {n_rxq="2", vhost-server-path="/tmp/vhost-sock0"}
        Port "ovs_pvp_br0"
            Interface "ovs_pvp_br0"
                type: internal
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                options: {dpdk-devargs="0000:05:00.0", n_rxq="2"}
    Bridge "ovsbr1"
        Port "vxlan0"
            Interface "vxlan0"
                type: vxlan
                options: {dst_port="8472", key="1000", remote_ip="192.0.2.10"}
        Port "ovsbr1"
            Interface "ovsbr1"
                type: internal
    ovs_version: "2.9.0"
[root@wsfd-netdev85 ~]# ip address add dev ovs_pvp_br0 192.0.2.9/30
[root@wsfd-netdev85 ~]# ip address add dev ovsbr1 192.0.2.13/30
[root@wsfd-netdev85 ~]# ip link set mtu 1450 dev ovsbr1
[root@wsfd-netdev85 ~]# ip link set dev ovs_pvp_br0 up
[root@wsfd-netdev85 ~]# ip link set dev ovsbr1 up

(p5p1 at 0000:05:00.0 is connected to an external Linux host that has vxlan0 over eth0. The eth0 IP address is 192.0.2.10/30 and the vxlan0 address is 192.0.2.14/30.)
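
For anyone rebuilding this setup, the "ovs-vsctl show" output above could be recreated with commands along the following lines. This is a sketch rather than the commands actually used: datapath_type=netdev is an assumption (it does not appear in the output, but DPDK ports need the userspace datapath), and the run and setup_bridges helpers are hypothetical names. The script only prints the commands unless APPLY=1 is set.

```shell
# Dry run by default: each command is printed, not executed.
# Set APPLY=1 to really run them (needs a DPDK-enabled Open vSwitch).
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

setup_bridges() {
    # Assumption: userspace datapath, not visible in "ovs-vsctl show".
    run ovs-vsctl add-br ovs_pvp_br0 -- set bridge ovs_pvp_br0 datapath_type=netdev
    # Physical bnxt port, two receive queues, as shown for dpdk0.
    run ovs-vsctl add-port ovs_pvp_br0 dpdk0 -- set Interface dpdk0 type=dpdk \
        options:dpdk-devargs=0000:05:00.0 options:n_rxq=2
    # vhost-user client port for the guest, as shown for vhost0.
    run ovs-vsctl add-port ovs_pvp_br0 vhost0 -- set Interface vhost0 \
        type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhost-sock0 \
        options:n_rxq=2
    # Second bridge carrying the VXLAN tunnel endpoint.
    run ovs-vsctl add-br ovsbr1 -- set bridge ovsbr1 datapath_type=netdev
    run ovs-vsctl add-port ovsbr1 vxlan0 -- set Interface vxlan0 type=vxlan \
        options:remote_ip=192.0.2.10 options:key=1000 options:dst_port=8472
}

setup_bridges
```

The IP addresses and MTU are then applied with the "ip" commands already listed above.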

[root@wsfd-netdev85 ~]# ping 192.0.2.14
PING 192.0.2.14 (192.0.2.14) 56(84) bytes of data.
64 bytes from 192.0.2.14: icmp_seq=1 ttl=64 time=501 ms
64 bytes from 192.0.2.14: icmp_seq=2 ttl=64 time=0.053 ms
64 bytes from 192.0.2.14: icmp_seq=3 ttl=64 time=0.050 ms
^C
--- 192.0.2.14 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.050/167.256/501.667/236.464 ms

To trigger the segfault:
[root@wsfd-netdev85 ~]# netperf  -H 192.0.2.14 -l 10

Comment 4 Jean-Tsung Hsiao 2018-04-19 19:25:06 UTC
This looks like a VXLAN-related issue. With VXLAN removed from the setup,
the crash no longer occurs.

[root@localhost ~]# for i in {1..5}; do echo Test $i; netperf -H 172.16.33.106 -l 60; done
Test 1
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                         
Socket Socket  Message  Elapsed             
Size   Size    Size     Time     Throughput 
bytes  bytes   bytes    secs.    10^6bits/sec 

 87380  16384  16384    60.00    2998.53  
Test 2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                         
Socket Socket  Message  Elapsed             
Size   Size    Size     Time     Throughput 
bytes  bytes   bytes    secs.    10^6bits/sec 

 87380  16384  16384    60.00    2999.61  
Test 3
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                         
Socket Socket  Message  Elapsed             
Size   Size    Size     Time     Throughput 
bytes  bytes   bytes    secs.    10^6bits/sec 

 87380  16384  16384    60.00    3005.10  
Test 4
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                         
Socket Socket  Message  Elapsed             
Size   Size    Size     Time     Throughput 
bytes  bytes   bytes    secs.    10^6bits/sec 

 87380  16384  16384    60.00    3010.29  
Test 5
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                         
Socket Socket  Message  Elapsed             
Size   Size    Size     Time     Throughput 
bytes  bytes   bytes    secs.    10^6bits/sec 

 87380  16384  16384    60.00    2997.77  
[root@localhost ~]#
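
Comparing runs by eye gets tedious once several loops are involved. As a convenience, the result rows of the netperf output above can be averaged with a small awk filter. This is only a sketch: netperf_mean is a hypothetical helper name, and it assumes the default five-column TCP_STREAM/TCP_MAERTS result row with throughput in the last column.

```shell
# Average the throughput column (10^6 bits/sec) of netperf result rows.
# Reads netperf output on stdin; header and banner lines are skipped
# because their first field is not a plain integer.
# netperf_mean is a hypothetical helper name used only for illustration.
netperf_mean() {
    awk 'NF == 5 && $1 ~ /^[0-9]+$/ && $5 ~ /^[0-9.]+$/ { sum += $5; n++ }
         END { if (n) printf "runs=%d mean=%.2f\n", n, sum / n }'
}

# Usage (needs a reachable netserver, so it is left commented out):
# for i in 1 2 3 4 5; do netperf -H 172.16.33.106 -l 60; done | netperf_mean
```

Fed the five runs above, this reports a mean of 3002.26 (10^6 bits/sec).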

[root@netqe22 images]# ovs-vsctl show
b1c7ac41-ed9e-4e2b-9993-2e35dd03f64e
    Bridge "ovsbr0"
        Port "dpdk-10"
            Interface "dpdk-10"
                type: dpdk
                options: {dpdk-devargs="0000:07:00.0", n_rxq="1"}
        Port "vhost1"
            Interface "vhost1"
                type: dpdkvhostuser
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
        Port "vhost0"
            Interface "vhost0"
                type: dpdkvhostuser
    ovs_version: "2.9.0"
[root@netqe22 images]#

[root@netqe16 images]# ovs-vsctl show
1e55f622-13d0-42a7-9e3f-2e9744ee83cd
    Bridge "ovsbr0"
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
        Port "vhost1"
            Interface "vhost1"
                type: dpdkvhostuser
        Port "dpdk-10"
            Interface "dpdk-10"
                type: dpdk
                options: {dpdk-devargs="0000:84:00.0", n_rxq="1"}
        Port "vhost0"
            Interface "vhost0"
                type: dpdkvhostuser
    ovs_version: "2.9.0"

Comment 5 Jean-Tsung Hsiao 2018-04-19 21:35:46 UTC
> hi Jean,
>
> can you try the RPM at [1]? it contains a series of 3 patches developed
> today by Broadcom, and on my netdev85 I don't see crashes anymore with
> VXLAN. 

Looking good; see the netperf runs below. I will run more tests.
Will this also fix the "PvP not converging" issue?
Thanks!
Jean

[root@localhost ~]# netperf -H 172.16.33.106
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                         
Socket Socket  Message  Elapsed             
Size   Size    Size     Time     Throughput 
bytes  bytes   bytes    secs.    10^6bits/sec 

 87380  16384  16384    10.00    2918.73  
[root@localhost ~]# netperf -H 172.16.33.106
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                         
Socket Socket  Message  Elapsed             
Size   Size    Size     Time     Throughput 
bytes  bytes   bytes    secs.    10^6bits/sec 

 87380  16384  16384    10.00    3006.06  
[root@localhost ~]# netperf -H 172.16.33.106
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                         
Socket Socket  Message  Elapsed             
Size   Size    Size     Time     Throughput 
bytes  bytes   bytes    secs.    10^6bits/sec 

 87380  16384  16384    10.00    2949.83  
[root@localhost ~]#

Comment 6 Jean-Tsung Hsiao 2018-04-19 22:26:26 UTC
Results over VXLAN look solid; see the output below.

Geneve testing is next.


 [root@localhost ~]# for i in {1..5}
> do
> echo Test $i
> !net -l 60
netperf -H 172.16.33.106 -l 60
> done
Test 1
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    60.00    3016.04   
Test 2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    60.00    3013.58   
Test 3
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    60.00    2991.29   
Test 4
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    60.00    2967.69   
Test 5
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    60.00    3013.59   
[root@localhost ~]# for i in {1..5}; do echo Test $i; netperf -H 172.16.33.106 -l 60 -t TCP_MAERTS; done
Test 1
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    60.00    4020.62   
Test 2
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    60.00    4019.86   
Test 3
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    60.00    4207.85   
Test 4
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    60.00    4052.89   
Test 5
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    60.00    4176.87   
[root@localhost ~]#

Comment 7 Jean-Tsung Hsiao 2018-04-19 22:55:22 UTC
Geneve tunnelling test passed.

[root@localhost ~]# for i in {1..5}
> do
> echo Test $i
> !net -l 60
netperf -H 172.16.33.106 -l 60
> done
Test 1
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    60.00    3019.86   
Test 2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    60.00    3018.38   
Test 3
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    60.00    3021.84   
Test 4
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    60.00    3017.67   
Test 5
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    60.00    3016.38   
[root@localhost ~]# for i in {1..5}; do echo Test $i; netperf -H 172.16.33.106 -l 60 -t TCP_MAERTS; done
Test 1
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    60.00    4421.30   
Test 2
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
catcher: timer popped with times_up != 0
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    60.00    4338.12   
Test 3
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    60.00    4435.75   
Test 4
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    60.00    4470.80   
Test 5
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    60.00    4372.40   
[root@localhost ~]#

Comment 8 Davide Caratti 2018-04-20 11:22:10 UTC
*** Bug 1569883 has been marked as a duplicate of this bug. ***

Comment 9 Ajit Khaparde 2018-04-20 16:33:48 UTC
Patches applied upstream. Commit ids on dpdk-next-net tree:
e5c04b1d1bc83115a2cc28615a5d5c6645c66cd4
52bececea4d2327d842ee40e6c99388b6b3d8f93
02bd8182658600ebf2cbe61168e80c19ce4cdaa5

Thanks
Ajit

Comment 10 Davide Caratti 2018-04-20 18:36:50 UTC
(In reply to Ajit Khaparde from comment #9)
> Patches applied upstream. Commit ids on dpdk-next-net tree:
> e5c04b1d1bc83115a2cc28615a5d5c6645c66cd4
> 52bececea4d2327d842ee40e6c99388b6b3d8f93
> 02bd8182658600ebf2cbe61168e80c19ce4cdaa5
> 
> Thanks
> Ajit

Thanks, Ajit!

The backport is done and pushed to dist-git, so I'm moving the state to MODIFIED.
Regards,

-- 
davide

Comment 11 Timothy Redaelli 2018-05-03 14:37:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1267

