Bug 1567634
| Summary: | OVS daemon crashed while running netperf TCP_STREAM between guests over OVS/dpdk/bnxt | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Jean-Tsung Hsiao <jhsiao> |
| Component: | openvswitch | Assignee: | Davide Caratti <dcaratti> |
| Status: | CLOSED ERRATA | QA Contact: | Jean-Tsung Hsiao <jhsiao> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 7.5 | CC: | ajit.khaparde, atragler, ctrautma, gospo, haili, jhsiao, kzhang, qding, rkhan, sxavier, tredaelli |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | openvswitch-2.9.0-18.el7fdn | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-05-03 14:37:49 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1560628 | | |
Description
Jean-Tsung Hsiao
2018-04-15 16:49:30 UTC
[root@wsfd-netdev85 ~]# systemctl restart openvswitch.service
[root@wsfd-netdev85 ~]# pgrep vswitchd
8223
[root@wsfd-netdev85 ~]# gdb 8223
(gdb) thread 60
[Switching to thread 60 (Thread 0x7f528f7fe700 (LWP 8293))]
#0 netdev_dpdk_rxq_recv (rxq=0x7f4f908dae80, batch=0x7f528f7fd760) at lib/netdev-dpdk.c:1880
1880 struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
(gdb) c
Continuing.
[New Thread 0x7f5357679c80 (LWP 8430)]
[Thread 0x7f5357679c80 (LWP 8430) exited]
Program received signal SIGSEGV, Segmentation fault.
miniflow_extract (packet=packet@entry=0x7f4f90db4200, dst=dst@entry=0x7f528f7fd3c8) at lib/flow.c:709
709 miniflow_push_macs(mf, dl_dst, data);
(gdb) bt
#0 miniflow_extract (packet=packet@entry=0x7f4f90db4200, dst=dst@entry=0x7f528f7fd3c8)
at lib/flow.c:709
#1 0x000055bf4f352b75 in emc_processing (port_no=3, md_is_valid=false, n_batches=0x7f528f7fd6d8,
batches=0x7f528f7fd290, keys=<optimized out>, packets_=0x7f528f7fd760, pmd=0x7f5344358010)
at lib/dpif-netdev.c:5027
#2 dp_netdev_input__ (pmd=pmd@entry=0x7f5344358010, packets=packets@entry=0x7f528f7fd760,
md_is_valid=md_is_valid@entry=false, port_no=port_no@entry=3) at lib/dpif-netdev.c:5256
#3 0x000055bf4f353226 in dp_netdev_input (port_no=3, packets=0x7f528f7fd760, pmd=0x7f5344358010)
at lib/dpif-netdev.c:5289
#4 dp_netdev_process_rxq_port (pmd=pmd@entry=0x7f5344358010, rxq=0x55bf503cd110, port_no=3)
at lib/dpif-netdev.c:3286
#5 0x000055bf4f3535fa in pmd_thread_main (f_=<optimized out>) at lib/dpif-netdev.c:4145
#6 0x000055bf4f3d0296 in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:348
#7 0x00007f535676cdd5 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f5355b6ab3d in clone () from /lib64/libc.so.6
[root@wsfd-netdev85 ~]# ovs-vsctl show
238e63b3-b5ce-4427-8dee-a42f418ce9a9
Bridge "ovs_pvp_br0"
Port "vhost0"
Interface "vhost0"
type: dpdkvhostuserclient
options: {n_rxq="2", vhost-server-path="/tmp/vhost-sock0"}
Port "ovs_pvp_br0"
Interface "ovs_pvp_br0"
type: internal
Port "dpdk0"
Interface "dpdk0"
type: dpdk
options: {dpdk-devargs="0000:05:00.0", n_rxq="2"}
Bridge "ovsbr1"
Port "vxlan0"
Interface "vxlan0"
type: vxlan
options: {dst_port="8472", key="1000", remote_ip="192.0.2.10"}
Port "ovsbr1"
Interface "ovsbr1"
type: internal
ovs_version: "2.9.0"
[root@wsfd-netdev85 ~]# ip address add dev ovs_pvp_br0 192.0.2.9/30
[root@wsfd-netdev85 ~]# ip address add dev ovsbr1 192.0.2.13/30
[root@wsfd-netdev85 ~]# ip link set mtu 1450 dev ovsbr1
[root@wsfd-netdev85 ~]# ip link set dev ovs_pvp_br0 up
[root@wsfd-netdev85 ~]# ip link set dev ovsbr1 up
(p5p1 at 0000:05:00.0 is connected to an external Linux host that has vxlan0 over eth0. The eth0 IP address is 192.0.2.10/30; the vxlan0 address is 192.0.2.14/30.)
[root@wsfd-netdev85 ~]# ping 192.0.2.14
PING 192.0.2.14 (192.0.2.14) 56(84) bytes of data.
64 bytes from 192.0.2.14: icmp_seq=1 ttl=64 time=501 ms
64 bytes from 192.0.2.14: icmp_seq=2 ttl=64 time=0.053 ms
64 bytes from 192.0.2.14: icmp_seq=3 ttl=64 time=0.050 ms
^C
--- 192.0.2.14 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.050/167.256/501.667/236.464 ms
To generate the segfault:
[root@wsfd-netdev85 ~]# netperf -H 192.0.2.14 -l 10
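The reproduction step above can be wrapped in a small loop that stops as soon as the daemon disappears. This is only a sketch: the function names `pid_alive` and `run_until_dead` are hypothetical helpers, not part of OVS or netperf, and the sketch assumes the crash shows up as the ovs-vswitchd PID going away.

```shell
# Hypothetical helper: succeed if the given PID still exists.
# Signal 0 only probes the process; nothing is delivered to it.
pid_alive() {
    kill -0 "$1" 2>/dev/null
}

# Hypothetical helper: run a traffic command up to MAX times and
# report the first iteration after which the watched PID is gone.
# Usage: run_until_dead PID MAX command [args...]
run_until_dead() {
    rud_pid=$1
    rud_max=$2
    shift 2
    rud_i=1
    while [ "$rud_i" -le "$rud_max" ]; do
        # The traffic command may itself fail when the switch dies.
        "$@" >/dev/null 2>&1 || true
        if ! pid_alive "$rud_pid"; then
            echo "daemon gone after iteration $rud_i"
            return 1
        fi
        rud_i=$((rud_i + 1))
    done
    echo "daemon survived $rud_max iterations"
}
```

With the setup from this report it might be invoked as `run_until_dead "$(pgrep -x ovs-vswitchd)" 5 netperf -H 192.0.2.14 -l 10`.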
Looks like a VXLAN-related issue. I removed VXLAN from the setup and no longer saw the crash.
[root@localhost ~]# for i in {1..5}; do echo Test $i; netperf -H 172.16.33.106 -l 60; done
Test 1
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 2998.53
Test 2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 2999.61
Test 3
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 3005.10
Test 4
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 3010.29
Test 5
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 2997.77
[root@localhost ~]#
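To compare runs like the five above without reading each table, the throughput column can be averaged. This is a sketch only: `mean_throughput` is a hypothetical helper name, and it assumes the default netperf TCP_STREAM layout shown in this report, where each result line has five numeric fields and throughput is the last one.

```shell
# Hypothetical helper: read netperf output on stdin and print the
# number of result lines found and the mean of their last column.
mean_throughput() {
    awk '
        # Result lines have five fields, all numeric; header lines
        # ("Size Size Size Time Throughput", etc.) fail the $1 test.
        NF == 5 && $1 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+(\.[0-9]+)?$/ {
            sum += $5
            n++
        }
        END {
            if (n > 0) printf "%d runs, mean %.2f\n", n, sum / n
        }
    '
}
```

For example, `for i in 1 2 3 4 5; do netperf -H 172.16.33.106 -l 60; done | mean_throughput` would summarize a loop like the one above in a single line.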
[root@netqe22 images]# ovs-vsctl show
b1c7ac41-ed9e-4e2b-9993-2e35dd03f64e
Bridge "ovsbr0"
Port "dpdk-10"
Interface "dpdk-10"
type: dpdk
options: {dpdk-devargs="0000:07:00.0", n_rxq="1"}
Port "vhost1"
Interface "vhost1"
type: dpdkvhostuser
Port "ovsbr0"
Interface "ovsbr0"
type: internal
Port "vhost0"
Interface "vhost0"
type: dpdkvhostuser
ovs_version: "2.9.0"
[root@netqe22 images]#
[root@netqe16 images]# ovs-vsctl show
1e55f622-13d0-42a7-9e3f-2e9744ee83cd
Bridge "ovsbr0"
Port "ovsbr0"
Interface "ovsbr0"
type: internal
Port "vhost1"
Interface "vhost1"
type: dpdkvhostuser
Port "dpdk-10"
Interface "dpdk-10"
type: dpdk
options: {dpdk-devargs="0000:84:00.0", n_rxq="1"}
Port "vhost0"
Interface "vhost0"
type: dpdkvhostuser
ovs_version: "2.9.0"
> hi Jean,
>
> can you try the RPM at [1]? it contains a series of 3 patches developed
> today by Broadcom, and on my netdev85 I don't see crashes anymore with
> VXLAN.
Looking good --- See netperf below. Will run more tests.
Is this going to fix "PvP not converging" issue as well ?
Thanks!
Jean
[root@localhost ~]# netperf -H 172.16.33.106
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 10.00 2918.73
[root@localhost ~]# netperf -H 172.16.33.106
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 10.00 3006.06
[root@localhost ~]# netperf -H 172.16.33.106
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 10.00 2949.83
[root@localhost ~]#
Over vxlan looking solid --- see attached below.
Will run geneve testing next.
[root@localhost ~]# for i in {1..5}
> do
> echo Test $i
> !net -l 60
netperf -H 172.16.33.106 -l 60
> done
Test 1
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 3016.04
Test 2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 3013.58
Test 3
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 2991.29
Test 4
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 2967.69
Test 5
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 3013.59
[root@localhost ~]# for i in {1..5}; do echo Test $i; netperf -H 172.16.33.106 -l 60 -t TCP_MAERTS; done
Test 1
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 4020.62
Test 2
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 4019.86
Test 3
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 4207.85
Test 4
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 4052.89
Test 5
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 4176.87
[root@localhost ~]#
Geneve tunnelling test passed.
[root@localhost ~]# for i in {1..5}
> do
> echo Test $i
> !net -l 60
netperf -H 172.16.33.106 -l 60
> done
Test 1
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 3019.86
Test 2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 3018.38
Test 3
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 3021.84
Test 4
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 3017.67
Test 5
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 3016.38
[root@localhost ~]# for i in {1..5}; do echo Test $i; netperf -H 172.16.33.106 -l 60 -t TCP_MAERTS; done
Test 1
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 4421.30
Test 2
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
catcher: timer popped with times_up != 0
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 4338.12
Test 3
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 4435.75
Test 4
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 4470.80
Test 5
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.33.106 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 60.00 4372.40
[root@localhost ~]#
*** Bug 1569883 has been marked as a duplicate of this bug. ***
Patches applied upstream. Commit ids on dpdk-next-net tree:
e5c04b1d1bc83115a2cc28615a5d5c6645c66cd4
52bececea4d2327d842ee40e6c99388b6b3d8f93
02bd8182658600ebf2cbe61168e80c19ce4cdaa5
Thanks
Ajit
(In reply to Ajit Khaparde from comment #9)
> Patches applied upstream. Commit ids on dpdk-next-net tree:
> e5c04b1d1bc83115a2cc28615a5d5c6645c66cd4
> 52bececea4d2327d842ee40e6c99388b6b3d8f93
> 02bd8182658600ebf2cbe61168e80c19ce4cdaa5
>
> Thanks
> Ajit
thanks Ajit! backport done and pushed to dist-git, thus I'm moving state to MODIFIED.
regards,
--
davide
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2018:1267