Bug 1977243
| Summary: | ovs-vswitchd crashed at netdev_linux_batch_rxq_recv_sock | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Jianlin Shi <jishi> |
| Component: | openvswitch2.15 | Assignee: | Eelco Chaudron <echaudro> |
| Status: | CLOSED UPSTREAM | QA Contact: | ovs-qe |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | FDP 21.E | CC: | ctrautma, dhill, fhallal, fleitner, i.maximets, jhsiao, ralongi |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-07-14 11:08:49 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Might be related (I didn't look at the coredump, just remembered this upstream report): https://mail.openvswitch.org/pipermail/ovs-discuss/2021-June/051209.html

Here is some backtrace info:
(gdb) bt
#0 dp_packet_set_size (b=0x0, b=0x0, v=13028) at ../lib/dp-packet.h:578
#1 netdev_linux_batch_rxq_recv_sock (rx=<optimized out>, mtu=<optimized out>, batch=0x7ffeefcd56a0) at ../lib/netdev-linux.c:1308
#2 0x0000559536ec657f in netdev_linux_rxq_recv (rxq_=0x55953a498cc0, batch=0x7ffeefcd56a0, qfill=0x0) at ../lib/netdev-linux.c:1508
#3 0x0000559536e19c75 in netdev_rxq_recv (rx=<optimized out>, batch=batch@entry=0x7ffeefcd56a0, qfill=<optimized out>) at ../lib/netdev.c:727
#4 0x0000559536dea4bb in dp_netdev_process_rxq_port (pmd=pmd@entry=0x7fa096880010, rxq=0x55953a498c30, port_no=3) at ../lib/dpif-netdev.c:4749
#5 0x0000559536deb26c in dpif_netdev_run (dpif=<optimized out>) at ../lib/dpif-netdev.c:5792
#6 0x0000559536d9d98c in type_run (type=<optimized out>) at ../ofproto/ofproto-dpif.c:370
#7 0x0000559536d8562f in ofproto_type_run (datapath_type=datapath_type@entry=0x55953a3c76d0 "netdev") at ../ofproto/ofproto.c:1780
#8 0x0000559536d729fc in bridge_run__ () at ../vswitchd/bridge.c:3245
#9 0x0000559536d78f5d in bridge_run () at ../vswitchd/bridge.c:3310
#10 0x00005595367111c5 in main (argc=<optimized out>, argv=<optimized out>) at ../vswitchd/ovs-vswitchd.c:127
(gdb) frame 1
#1 netdev_linux_batch_rxq_recv_sock (rx=<optimized out>, mtu=<optimized out>, batch=0x7ffeefcd56a0) at ../lib/netdev-linux.c:1308
1308 dp_packet_set_size(pkt, mmsgs[i].msg_len - std_len);
(gdb) l
1303
1304 if (mmsgs[i].msg_len > std_len) {
1305 /* Build a single linear TSO packet by prepending the data from
1306 * std_len buffer to the aux_buf. */
1307 pkt = rx->aux_bufs[i];
1308 dp_packet_set_size(pkt, mmsgs[i].msg_len - std_len);
1309 dp_packet_push(pkt, dp_packet_data(buffers[i]), std_len);
1310 /* The headroom should be the same in buffers[i], pkt and
1311 * DP_NETDEV_HEADROOM. */
1312 dp_packet_resize(pkt, DP_NETDEV_HEADROOM, 0);
(gdb) p pkt
$2 = (struct dp_packet *) 0x0
(gdb) p mmsgs[i].msg_len
$4 = 14546
(gdb) p std_len
$5 = 1518
(gdb) ovs_dump_dp_netdev ports
(struct dp_netdev *) 0x55953a3ca500: name = ovs-netdev, class = (struct dpif_class *) 0x5595373901a0 <dpif_netdev_class>
(struct dp_netdev_port *) 0x55953a44ff60:
port_no = 0, n_rxq = 1, type = tap
netdev = (struct netdev *) 0x55953a44fcb0: name = ovs-netdev, n_txq/rxq = 1/1
(struct dp_netdev_port *) 0x55953a49a280:
port_no = 4, n_rxq = 1, type = tap
netdev = (struct netdev *) 0x55953a49a050: name = br-phy, n_txq/rxq = 1/1
(struct dp_netdev_port *) 0x559539773660:
port_no = 1, n_rxq = 1, type = tap
netdev = (struct netdev *) 0x55953a496110: name = br-int, n_txq/rxq = 1/1
(struct dp_netdev_port *) 0x55953a497530:
port_no = 2, n_rxq = 1, type = tap
netdev = (struct netdev *) 0x55953a4972e0: name = ls1p3, n_txq/rxq = 1/1
(struct dp_netdev_port *) 0x55953a498ba0:
port_no = 3, n_rxq = 1, type = system
netdev = (struct netdev *) 0x55953a3c4f70: name = ens1f0, n_txq/rxq = 1/1
So port 3 is the receiving port. The question is: did you manually enable TSO on this port via ethtool?

If so, can you test with TSO disabled on all ports in OVS? In the meantime, I'll do a bit more research.
I'll repeat here what I said on IRC to keep the history. We're using AF_PACKET sockets here with MSG_TRUNC, and 'man 7 packet' says the following about that:

    When the MSG_TRUNC flag is passed to recvmsg(2), recv(2), or recvfrom(2),
    the real length of the packet on the wire is always returned, even when
    it is longer than the buffer.

So I suppose the issue has been there for a long time, but the implementation of userspace-tso exposed it with a segfault. In short, the size returned from the kernel is not the size of the actual data received. If it's larger than expected, we need to truncate it down to the size of the buffer we're using.

Hi,

We use recvmmsg() instead, so the return is the number of msgs received and it is already truncated to the buffer size provided by userspace. We would need to iterate over all received packets, checking if the flag MSG_TRUNC is set in mmsgs[i].msg_hdr.msg_flags. If that's the case, the packet is truncated/corrupted. Since it is not passing MSG_PEEK, the next entries will receive the next packets in the queue. The real length of the packet is provided in msg_hdr.msg_len: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/net/socket.c#n274

fbl

(In reply to Flavio Leitner from comment #5)
> We use recvmmsg() instead, so the return is the number of msgs received
> and it is already truncated to the buffer size provided by the userspace.

Sure. I think that man page just wasn't updated to include recvmmsg(), but the behaviour should be logically the same.

> We would need to iterate over all received packets checking if the flag
> MSG_TRUNC is set in mmsgs[i].msg_hdr.msg_flags. If that's the case, the
> packet is truncated/corrupted.

This might be a good solution; we are already iterating over them to check if they are too short or need linearization.

(In reply to Eelco Chaudron from comment #3)
> So port 3 is the receiving port, so the question is, did you manually enable
> TSO on this port through the ethtool port?

TSO is enabled by default.

> If so can you test this with all ports in OVS to have TSO disabled? In the
> meantime, I'll do a bit more research.

After disabling TSO for ens1f0, the crash still happened:

[root@wsfd-advnetlab21 dpdk]# ethtool -k ens1f0 | grep tcp
tcp-segmentation-offload: off
tx-tcp-segmentation: off
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: off
[root@wsfd-advnetlab21 dpdk]# coredumpctl list
TIME                          PID  UID  GID SIG COREFILE EXE
Thu 2021-07-01 20:58:45 EDT 11976  993 1001  11 present  /usr/sbin/ovs-vswitchd
Thu 2021-07-01 20:58:48 EDT 12204  993 1001  11 present  /usr/sbin/ovs-vswitchd

(In reply to Jianlin Shi from comment #7)
> after disable TSO for ens1f0, the crash still happened:

Did you happen to take a core dump? Just want to make sure it's the same issue (probably traffic from another kernel interface with TSO enabled).

Coredump with TSO disabled:
http://netqe-bj.usersys.redhat.com/share/jishi/bz1977243/core.ovs-vswitchd.993.23f9e741e194440da67a725551d51daf.25484.1625208423000000.lz4

(In reply to Jianlin Shi from comment #9)
> coredump with tso disabled:
> http://netqe-bj.usersys.redhat.com/share/jishi/bz1977243/core.ovs-vswitchd.
> 993.23f9e741e194440da67a725551d51daf.25484.1625208423000000.lz4

The same problem as before, will look into this. Thanks for the quick response.

So for your setup to work you need to disable both GSO and TSO.

Which makes me wonder why you add a kernel interface to begin with? This is a DPDK environment, so you should use DPDK physical interfaces if you care about performance.

I will work on a patch to make sure OVS no longer crashes when GSO and/or TSO is enabled.

Sent out a patch to the OVS mailing list:
https://patchwork.ozlabs.org/project/openvswitch/patch/162523574862.28549.11301540064982906102.stgit@wsfd-netdev64.ntdv.lab.eng.bos.redhat.com/

(In reply to Eelco Chaudron from comment #11)
> So for your setup to work you need to disable both GSO and TSO.
>
> Which makes me wonder why you add a kernel interface to begin with? This is
> a DPDK environment, so you should use DPDK physical interfaces if you care
> about performance.

Hi Eelco, I'm not testing performance, just trying to create the topology and test functionality. Since kernel interfaces are covered by the guide at https://docs.openvswitch.org/en/latest/howto/userspace-tunneling/, I tried to add a kernel interface to OVS.

OVS won't crash if TSO, GSO and GRO are disabled:
[root@wsfd-advnetlab21 dpdk]# ethtool -k ens1f0 | grep -E "tcp|generi"
tx-checksum-ip-generic: on
tcp-segmentation-offload: off
tx-tcp-segmentation: off
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: off
generic-segmentation-offload: off
generic-receive-offload: off
Thanks for confirming the issue is solved by disabling TSO/GSO and GRO. I sent out a v2 of the patch, https://patchwork.ozlabs.org/project/openvswitch/patch/162548620436.40409.579366497986013480.stgit@wsfd-netdev64.ntdv.lab.eng.bos.redhat.com/, and it should solve the issue.

Regarding the topology test: using kernel interfaces in combination with OVS-DPDK is supported, but it's not a recommended solution, so please do not suggest that anyone use it as part of an OVN/OVS deployment.

The patch got accepted upstream and backported all the way to 2.13. The next FDP release will automatically pick up the upstream fixes.
Description of problem:
When trying to set datapath_type to netdev for an OVN setup, ovs-vswitchd crashed at netdev_linux_batch_rxq_recv_sock.

Version-Release number of selected component (if applicable):
[root@wsfd-advnetlab18 test]# rpm -qa | grep -E "openvswitch2.15|ovn-2021"
ovn-2021-host-21.03.0-40.el8fdp.x86_64
python3-openvswitch2.15-2.15.0-24.el8fdp.x86_64
ovn-2021-central-21.03.0-40.el8fdp.x86_64
ovn-2021-21.03.0-40.el8fdp.x86_64
openvswitch2.15-2.15.0-24.el8fdp.x86_64

How reproducible:
Always

Steps to Reproduce:
Server:
systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set Open_vSwitch . other_config={}
ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x02
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem="1024,1024"
systemctl restart openvswitch
ovs-vsctl add-br br-phy -- set bridge br-phy datapath_type=netdev
ovs-vsctl add-port br-phy ens1f0
ip link set br-phy up
ip link set ens1f0 up
ip addr add 20.0.173.25/24 dev br-phy
pmd_cmd="python2 /root/test/get_pmd.py"
cpu_mask=$($pmd_cmd --cmd host_pmd --nic ens1f1 --pmd 2)
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=$cpu_mask
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:20.0.173.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.173.25 external_ids:ovn-bridge-datapath-type=netdev
systemctl restart ovn-controller
ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 ls1p1
ovn-nbctl lsp-set-addresses ls1p1 "00:00:00:01:01:02 192.168.173.1 2001:173::1"
ovn-nbctl lsp-add ls1 ls1p2
ovn-nbctl lsp-set-addresses ls1p2 "00:00:00:01:02:02 192.168.173.2 2001:173::2"
ovn-nbctl lsp-add ls1 ls1p3
ovn-nbctl lsp-set-addresses ls1p3 "00:00:00:01:03:02 192.168.173.3 2001:173::3"
ovn-nbctl lsp-add ls1 ls1p4
ovn-nbctl lsp-set-addresses ls1p4 "00:00:00:01:04:02 192.168.173.4 2001:173::4"
ovs-vsctl add-port br-int ls1p2 -- set interface ls1p2 type=internal external_ids:iface-id=ls1p2
ip netns add ls1p2
ip link set ls1p2 netns ls1p2
ip netns exec ls1p2 ip link set ls1p2 address 00:00:00:01:02:02
ip netns exec ls1p2 ip link set ls1p2 up
ip netns exec ls1p2 ip addr add 192.168.173.2/24 dev ls1p2
ip netns exec ls1p2 ip addr add 2001:173::2/64 dev ls1p2
ovs-vsctl add-port br-int ls1p4 -- set interface ls1p4 type=internal external_ids:iface-id=ls1p4
ip netns add ls1p4
ip link set ls1p4 netns ls1p4
ip netns exec ls1p4 ip link set ls1p4 address 00:00:00:01:04:02
ip netns exec ls1p4 ip link set ls1p4 up
ip netns exec ls1p4 ip addr add 192.168.173.4/24 dev ls1p4
ip netns exec ls1p4 ip addr add 2001:173::4/64 dev ls1p4

Client:
systemctl start openvswitch
ovs-vsctl set Open_vSwitch . other_config={}
ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x02
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem="1024,1024"
systemctl restart openvswitch
ovs-vsctl add-br br-phy -- set bridge br-phy datapath_type=netdev
ovs-vsctl add-port br-phy ens1f0
ip link set br-phy up
ip link set ens1f0 up
ip addr add 20.0.173.26/24 dev br-phy
pmd_cmd="python2 /root/test/get_pmd.py"
cpu_mask=$($pmd_cmd --cmd host_pmd --nic ens1f1 --pmd 2)
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=$cpu_mask
ovs-vsctl set open . external_ids:system-id=hv0 external_ids:ovn-remote=tcp:20.0.173.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.173.26 external_ids:ovn-bridge-datapath-type=netdev
systemctl start ovn-controller
ovs-vsctl add-port br-int ls1p3 -- set interface ls1p3 type=internal external_ids:iface-id=ls1p3
ip netns add ls1p3
ip link set ls1p3 netns ls1p3
ip netns exec ls1p3 ip link set ls1p3 address 00:00:00:01:03:02
ip netns exec ls1p3 ip link set ls1p3 up
ip netns exec ls1p3 ip addr add 192.168.173.3/24 dev ls1p3
ip netns exec ls1p3 ip addr add 2001:173::3/64 dev ls1p3

Actual results:
ovs-vswitchd on the Client would crash:

           PID: 87996 (ovs-vswitchd)
           UID: 993 (openvswitch)
           GID: 1001 (hugetlbfs)
        Signal: 11 (SEGV)
     Timestamp: Tue 2021-06-29 05:53:38 EDT (5min ago)
  Command Line: ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:in>
    Executable: /usr/sbin/ovs-vswitchd
 Control Group: /system.slice/ovs-vswitchd.service
          Unit: ovs-vswitchd.service
         Slice: system.slice
       Boot ID: fe461b1328a644d78f02b3ea881615e0
    Machine ID: acafba64de404d9fbb4862ef24220018
      Hostname: wsfd-advnetlab21.anl.lab.eng.bos.redhat.com
       Storage: /var/lib/systemd/coredump/core.ovs-vswitchd.993.fe461b1328a644d78f02b3ea881615e0.879>
       Message: Process 87996 (ovs-vswitchd) of user 993 dumped core.
Stack trace of thread 87996:
#0  0x0000559536ec5ec6 netdev_linux_batch_rxq_recv_sock (ovs-vswitchd)
#1  0x0000559536ec657f netdev_linux_rxq_recv (ovs-vswitchd)
#2  0x0000559536e19c75 netdev_rxq_recv (ovs-vswitchd)
#3  0x0000559536dea4bb dp_netdev_process_rxq_port (ovs-vswitchd)
#4  0x0000559536deb26c dpif_netdev_run (ovs-vswitchd)
#5  0x0000559536d9d98c type_run (ovs-vswitchd)
#6  0x0000559536d8562f ofproto_type_run (ovs-vswitchd)
#7  0x0000559536d729fc bridge_run__ (ovs-vswitchd)
#8  0x0000559536d78f5d bridge_run (ovs-vswitchd)
#9  0x00005595367111c5 main (ovs-vswitchd)
#10 0x00007fa099f50493 __libc_start_main (libc.so.6)
#11 0x000055953671243e _start (ovs-vswitchd)

Stack trace of thread 88000:
#0  0x00007fa099ff5d98 __nanosleep (libc.so.6)
#1  0x00007fa099ff5c9e sleep (libc.so.6)
#2  0x0000559536eae153 xsleep (ovs-vswitchd)
#3  0x0000559536ee2b31 dpdk_watchdog (ovs-vswitchd)
#4  0x0000559536e782e3 ovsthread_wrapper (ovs-vswitchd)
#5  0x00007fa09bfcc14a start_thread (libpthread.so.0)
#6  0x00007fa09a029dc3 __clone (libc.so.6)

Stack trace of thread 88002:
#0  0x00007fa09a01ea41 __poll (libc.so.6)
#1  0x0000559536ea3af5 time_poll (ovs-vswitchd)
#2  0x0000559536e8c9cc poll_block (ovs-vswitchd)
#3  0x0000559536e75cda ovsrcu_postpone_thread (ovs-vswitchd)
#4  0x0000559536e782e3 ovsthread_wrapper (ovs-vswitchd)
#5  0x00007fa09bfcc14a start_thread (libpthread.so.0)
#6  0x00007fa09a029dc3 __clone (libc.so.6)

Stack trace of thread 88003:
#0  0x00007fa09a01ea41 __poll (libc.so.6)
#1  0x0000559536ea3af5 time_poll (ovs-vswitchd)
#2  0x0000559536e8c9cc poll_block (ovs-vswitchd)
#3  0x0000559536f180fa clean_thread_main (ovs-vswitchd)
#4  0x0000559536e782e3 ovsthread_wrapper (ovs-vswitchd)
#5  0x00007fa09bfcc14a start_thread (libpthread.so.0)
#6  0x00007fa09a029dc3 __clone (libc.so.6)

Stack trace of thread 88006:
#0  0x00007fa09a01ea41 __poll (libc.so.6)
#1  0x0000559536ea3af5 time_poll (ovs-vswitchd)
#2  0x0000559536e8c9cc poll_block (ovs-vswitchd)
#3  0x0000559536daf124 udpif_revalidator (ovs-vswitchd)
#4  0x0000559536e782e3 ovsthread_wrapper (ovs-vswitchd)
#5  0x00007fa09bfcc14a start_thread (libpthread.so.0)
#6  0x00007fa09a029dc3 __clone (libc.so.6)

Stack trace of thread 87999:
#0  0x00007fa09bfd5a07 accept (libpthread.so.0)
#1  0x0000559536d6d15b socket_listener (ovs-vswitchd)
#2  0x00007fa09bfcc14a start_thread (libpthread.so.0)
#3  0x00007fa09a029dc3 __clone (libc.so.6)

Stack trace of thread 88005:
#0  0x00007fa09a01ea41 __poll (libc.so.6)
#1  0x0000559536ea3af5 time_poll (ovs-vswitchd)
#2  0x0000559536e8c9cc poll_block (ovs-vswitchd)
#3  0x0000559536dadca1 udpif_upcall_handler (ovs-vswitchd)
#4  0x0000559536e782e3 ovsthread_wrapper (ovs-vswitchd)
#5  0x00007fa09bfcc14a start_thread (libpthread.so.0)
#6  0x00007fa09a029dc3 __clone (libc.so.6)

Stack trace of thread 88004:
#0  0x00007fa09a01ea41 __poll (libc.so.6)
#1  0x0000559536ea3af5 time_poll (ovs-vswitchd)
#2  0x0000559536e8c9cc poll_block (ovs-vswitchd)
#3  0x0000559536dfd091 ipf_clean_thread_main (ovs-vswitchd)
#4  0x0000559536e782e3 ovsthread_wrapper (ovs-vswitchd)
#5  0x00007fa09bfcc14a start_thread (libpthread.so.0)
#6  0x00007fa09a029dc3 __clone (libc.so.6)

Stack trace of thread 88007:
#0  0x00007fa09bfd264a pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
#1  0x00007fa09a62ad30 handle_fildes_io (librt.so.1)
#2  0x00007fa09bfcc14a start_thread (libpthread.so.0)
#3  0x00007fa09a029dc3 __clone (libc.so.6)

Stack trace of thread 87997:
#0  0x00007fa09a02a0f7 epoll_wait (libc.so.6)
#1  0x0000559536d644aa eal_intr_thread_main (ovs-vswitchd)
#2  0x00007fa09bfcc14a start_thread (libpthread.so.0)
#3  0x00007fa09a029dc3 __clone (libc.so.6)

Stack trace of thread 87998:
#0  0x00007fa09bfd67b7 recvmsg (libpthread.so.0)
#1  0x0000559536d5283e mp_handle (ovs-vswitchd)
#2  0x00007fa09bfcc14a start_thread (libpthread.so.0)
#3  0x00007fa09a029dc3 __clone (libc.so.6)

Expected results:
ovs should not crash

Additional info:
[root@wsfd-advnetlab21 test]# rpm -qa | grep -E "openvswitch2.15|ovn-2021"
ovn-2021-host-21.03.0-40.el8fdp.x86_64
python3-openvswitch2.15-2.15.0-24.el8fdp.x86_64
ovn-2021-central-21.03.0-40.el8fdp.x86_64
ovn-2021-21.03.0-40.el8fdp.x86_64
openvswitch2.15-2.15.0-24.el8fdp.x86_64
[root@wsfd-advnetlab21 test]# uname -a
Linux wsfd-advnetlab21.anl.lab.eng.bos.redhat.com 4.18.0-316.el8.x86_64 #1 SMP Mon Jun 21 15:32:48 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux