Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
The FDP team is no longer accepting new bugs in Bugzilla. Please report your issues under FDP project in Jira. Thanks.

Bug 1977243

Summary: ovs-vswitchd crashed at netdev_linux_batch_rxq_recv_sock
Product: Red Hat Enterprise Linux Fast Datapath
Component: openvswitch2.15
Version: FDP 21.E
Reporter: Jianlin Shi <jishi>
Assignee: Eelco Chaudron <echaudro>
QA Contact: ovs-qe
CC: ctrautma, dhill, fhallal, fleitner, i.maximets, jhsiao, ralongi
Status: CLOSED UPSTREAM
Severity: high
Priority: high
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2021-07-14 11:08:49 UTC

Description Jianlin Shi 2021-06-29 10:00:32 UTC
Description of problem:
When trying to set datapath_type to netdev for an OVN setup, ovs-vswitchd crashes in netdev_linux_batch_rxq_recv_sock.

Version-Release number of selected component (if applicable):
[root@wsfd-advnetlab18 test]# rpm -qa | grep -E "openvswitch2.15|ovn-2021"
ovn-2021-host-21.03.0-40.el8fdp.x86_64                                                                
python3-openvswitch2.15-2.15.0-24.el8fdp.x86_64                                                       
ovn-2021-central-21.03.0-40.el8fdp.x86_64                                                             
ovn-2021-21.03.0-40.el8fdp.x86_64                                                                     
openvswitch2.15-2.15.0-24.el8fdp.x86_64

How reproducible:
Always

Steps to Reproduce:
Server:
systemctl start openvswitch                                                                                                                                                                                                                                        
systemctl start ovn-northd                                                                   
ovn-nbctl set-connection ptcp:6641                        
ovn-sbctl set-connection ptcp:6642                                                                    
                                                                                                      
ovs-vsctl set Open_vSwitch . other_config={}                                                          
ovs-vsctl  set Open_vSwitch . other_config:dpdk-init=true                      
ovs-vsctl  set Open_vSwitch . other_config:dpdk-lcore-mask=0x02                                       
ovs-vsctl  set Open_vSwitch . other_config:dpdk-socket-mem="1024,1024"         
systemctl restart openvswitch                                                                         
                                                                               
ovs-vsctl add-br br-phy -- set bridge br-phy datapath_type=netdev                                                                                           
ovs-vsctl add-port br-phy ens1f0                                                                      
ip link set br-phy up                                                                           
ip link set ens1f0 up                                                                                 
ip addr add 20.0.173.25/24 dev br-phy                                                                 
                                                                                                      
pmd_cmd="python2 /root/test/get_pmd.py"                        
cpu_mask=$($pmd_cmd --cmd host_pmd --nic ens1f1 --pmd 2)                                              
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=$cpu_mask
                                                                                                   
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:20.0.173.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.173.25 external_ids:ovn-bridge-datapath-type=netdev
systemctl restart ovn-controller 

ovn-nbctl ls-add ls1                                                                                  
ovn-nbctl lsp-add ls1 ls1p1
ovn-nbctl lsp-set-addresses ls1p1 "00:00:00:01:01:02 192.168.173.1 2001:173::1"
ovn-nbctl lsp-add ls1 ls1p2 
ovn-nbctl lsp-set-addresses ls1p2 "00:00:00:01:02:02 192.168.173.2 2001:173::2"
ovn-nbctl lsp-add ls1 ls1p3
ovn-nbctl lsp-set-addresses ls1p3 "00:00:00:01:03:02 192.168.173.3 2001:173::3"
ovn-nbctl lsp-add ls1 ls1p4
ovn-nbctl lsp-set-addresses ls1p4 "00:00:00:01:04:02 192.168.173.4 2001:173::4"

ovs-vsctl add-port br-int ls1p2 -- set interface ls1p2 type=internal external_ids:iface-id=ls1p2

ip netns add ls1p2
ip link set ls1p2 netns ls1p2                                                                         
ip netns exec ls1p2 ip link set ls1p2 address 00:00:00:01:02:02
ip netns exec ls1p2 ip link set ls1p2 up 
ip netns exec ls1p2 ip addr add 192.168.173.2/24 dev ls1p2
ip netns exec ls1p2 ip addr add 2001:173::2/64 dev ls1p2
                                                                                                      
ovs-vsctl add-port br-int ls1p4 -- set interface ls1p4 type=internal external_ids:iface-id=ls1p4

ip netns add ls1p4
ip link set ls1p4 netns ls1p4
ip netns exec ls1p4 ip link set ls1p4 address 00:00:00:01:04:02
ip netns exec ls1p4 ip link set ls1p4 up                                                              
ip netns exec ls1p4 ip addr add 192.168.173.4/24 dev ls1p4
ip netns exec ls1p4 ip addr add 2001:173::4/64 dev ls1p4

Client:

systemctl start openvswitch

ovs-vsctl set Open_vSwitch . other_config={}
ovs-vsctl  set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl  set Open_vSwitch . other_config:dpdk-lcore-mask=0x02
ovs-vsctl  set Open_vSwitch . other_config:dpdk-socket-mem="1024,1024"
systemctl restart openvswitch

ovs-vsctl add-br br-phy -- set bridge br-phy datapath_type=netdev
ovs-vsctl add-port br-phy ens1f0
ip link set br-phy up
ip link set ens1f0 up
ip addr add 20.0.173.26/24 dev br-phy

pmd_cmd="python2 /root/test/get_pmd.py"
cpu_mask=$($pmd_cmd --cmd host_pmd --nic ens1f1 --pmd 2)
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=$cpu_mask

ovs-vsctl set open . external_ids:system-id=hv0 external_ids:ovn-remote=tcp:20.0.173.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.173.26 external_ids:ovn-bridge-datapath-type=netdev

systemctl start ovn-controller 

ovs-vsctl add-port br-int ls1p3 -- set interface ls1p3 type=internal external_ids:iface-id=ls1p3

ip netns add ls1p3
ip link set ls1p3 netns ls1p3
ip netns exec ls1p3 ip link set ls1p3 address 00:00:00:01:03:02
ip netns exec ls1p3 ip link set ls1p3 up
ip netns exec ls1p3 ip addr add 192.168.173.3/24 dev ls1p3
ip netns exec ls1p3 ip addr add 2001:173::3/64 dev ls1p3

Actual results:
ovs-vswitchd on the Client crashes:

           PID: 87996 (ovs-vswitchd)
           UID: 993 (openvswitch)                                                           
           GID: 1001 (hugetlbfs)                                       
        Signal: 11 (SEGV)                                            
     Timestamp: Tue 2021-06-29 05:53:38 EDT (5min ago)             
  Command Line: ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:in>
    Executable: /usr/sbin/ovs-vswitchd                               
 Control Group: /system.slice/ovs-vswitchd.service             
          Unit: ovs-vswitchd.service                                      
         Slice: system.slice                                         
       Boot ID: fe461b1328a644d78f02b3ea881615e0          
    Machine ID: acafba64de404d9fbb4862ef24220018               
      Hostname: wsfd-advnetlab21.anl.lab.eng.bos.redhat.com     
       Storage: /var/lib/systemd/coredump/core.ovs-vswitchd.993.fe461b1328a644d78f02b3ea881615e0.879>
       Message: Process 87996 (ovs-vswitchd) of user 993 dumped core.  
                                                                     
                Stack trace of thread 87996:                              
                #0  0x0000559536ec5ec6 netdev_linux_batch_rxq_recv_sock (ovs-vswitchd)
                #1  0x0000559536ec657f netdev_linux_rxq_recv (ovs-vswitchd)
                #2  0x0000559536e19c75 netdev_rxq_recv (ovs-vswitchd)               
                #3  0x0000559536dea4bb dp_netdev_process_rxq_port (ovs-vswitchd)    
                #4  0x0000559536deb26c dpif_netdev_run (ovs-vswitchd)               
                #5  0x0000559536d9d98c type_run (ovs-vswitchd)                      
                #6  0x0000559536d8562f ofproto_type_run (ovs-vswitchd) 
                #7  0x0000559536d729fc bridge_run__ (ovs-vswitchd)   
                #8  0x0000559536d78f5d bridge_run (ovs-vswitchd)                    
                #9  0x00005595367111c5 main (ovs-vswitchd)                          
                #10 0x00007fa099f50493 __libc_start_main (libc.so.6)                
                #11 0x000055953671243e _start (ovs-vswitchd)                        

                Stack trace of thread 88000:
                #0  0x00007fa099ff5d98 __nanosleep (libc.so.6)
                #1  0x00007fa099ff5c9e sleep (libc.so.6)
                #2  0x0000559536eae153 xsleep (ovs-vswitchd)
                #3  0x0000559536ee2b31 dpdk_watchdog (ovs-vswitchd)
                #4  0x0000559536e782e3 ovsthread_wrapper (ovs-vswitchd)
                #5  0x00007fa09bfcc14a start_thread (libpthread.so.0)
                #6  0x00007fa09a029dc3 __clone (libc.so.6)
                
                Stack trace of thread 88002:
                #0  0x00007fa09a01ea41 __poll (libc.so.6)
                #1  0x0000559536ea3af5 time_poll (ovs-vswitchd)
                #2  0x0000559536e8c9cc poll_block (ovs-vswitchd)
                #3  0x0000559536e75cda ovsrcu_postpone_thread (ovs-vswitchd)
                #4  0x0000559536e782e3 ovsthread_wrapper (ovs-vswitchd)
                #5  0x00007fa09bfcc14a start_thread (libpthread.so.0)
                #6  0x00007fa09a029dc3 __clone (libc.so.6)
                
                Stack trace of thread 88003:
                #0  0x00007fa09a01ea41 __poll (libc.so.6)
                #1  0x0000559536ea3af5 time_poll (ovs-vswitchd)
                #2  0x0000559536e8c9cc poll_block (ovs-vswitchd)
                #3  0x0000559536f180fa clean_thread_main (ovs-vswitchd)
                #4  0x0000559536e782e3 ovsthread_wrapper (ovs-vswitchd)
                #5  0x00007fa09bfcc14a start_thread (libpthread.so.0)
                #6  0x00007fa09a029dc3 __clone (libc.so.6)

                Stack trace of thread 88006:
                #0  0x00007fa09a01ea41 __poll (libc.so.6) 
                #1  0x0000559536ea3af5 time_poll (ovs-vswitchd)
                #2  0x0000559536e8c9cc poll_block (ovs-vswitchd)
                #3  0x0000559536daf124 udpif_revalidator (ovs-vswitchd)                     
                #4  0x0000559536e782e3 ovsthread_wrapper (ovs-vswitchd)
                #5  0x00007fa09bfcc14a start_thread (libpthread.so.0)
                #6  0x00007fa09a029dc3 __clone (libc.so.6)

                Stack trace of thread 87999:
                #0  0x00007fa09bfd5a07 accept (libpthread.so.0)
                #1  0x0000559536d6d15b socket_listener (ovs-vswitchd)     
                #2  0x00007fa09bfcc14a start_thread (libpthread.so.0)
                #3  0x00007fa09a029dc3 __clone (libc.so.6)

                Stack trace of thread 88005:
                #0  0x00007fa09a01ea41 __poll (libc.so.6)       
                #1  0x0000559536ea3af5 time_poll (ovs-vswitchd)
                #2  0x0000559536e8c9cc poll_block (ovs-vswitchd)     
                #3  0x0000559536dadca1 udpif_upcall_handler (ovs-vswitchd)
                #4  0x0000559536e782e3 ovsthread_wrapper (ovs-vswitchd)
                #5  0x00007fa09bfcc14a start_thread (libpthread.so.0)
                #6  0x00007fa09a029dc3 __clone (libc.so.6)                          
                                                                                    
                Stack trace of thread 88004:                                        
                #0  0x00007fa09a01ea41 __poll (libc.so.6)                           
                #1  0x0000559536ea3af5 time_poll (ovs-vswitchd)
                #2  0x0000559536e8c9cc poll_block (ovs-vswitchd) 
                #3  0x0000559536dfd091 ipf_clean_thread_main (ovs-vswitchd)         
                #4  0x0000559536e782e3 ovsthread_wrapper (ovs-vswitchd)             
                #5  0x00007fa09bfcc14a start_thread (libpthread.so.0)               
                #6  0x00007fa09a029dc3 __clone (libc.so.6)                          
                             
                Stack trace of thread 88007:
                #0  0x00007fa09bfd264a pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
                #1  0x00007fa09a62ad30 handle_fildes_io (librt.so.1)
                #2  0x00007fa09bfcc14a start_thread (libpthread.so.0)
                #3  0x00007fa09a029dc3 __clone (libc.so.6)
                
                Stack trace of thread 87997:
                #0  0x00007fa09a02a0f7 epoll_wait (libc.so.6)
                #1  0x0000559536d644aa eal_intr_thread_main (ovs-vswitchd)
                #2  0x00007fa09bfcc14a start_thread (libpthread.so.0)
                #3  0x00007fa09a029dc3 __clone (libc.so.6)
                
                Stack trace of thread 87998:
                #0  0x00007fa09bfd67b7 recvmsg (libpthread.so.0)
                #1  0x0000559536d5283e mp_handle (ovs-vswitchd)
                #2  0x00007fa09bfcc14a start_thread (libpthread.so.0)
                #3  0x00007fa09a029dc3 __clone (libc.so.6)

Expected results:
ovs-vswitchd should not crash

Additional info:

[root@wsfd-advnetlab21 test]# rpm -qa | grep -E "openvswitch2.15|ovn-2021"
ovn-2021-host-21.03.0-40.el8fdp.x86_64
python3-openvswitch2.15-2.15.0-24.el8fdp.x86_64
ovn-2021-central-21.03.0-40.el8fdp.x86_64
ovn-2021-21.03.0-40.el8fdp.x86_64
openvswitch2.15-2.15.0-24.el8fdp.x86_64
[root@wsfd-advnetlab21 test]# uname -a
Linux wsfd-advnetlab21.anl.lab.eng.bos.redhat.com 4.18.0-316.el8.x86_64 #1 SMP Mon Jun 21 15:32:48 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux

Comment 2 Ilya Maximets 2021-06-29 10:30:26 UTC
Might be related (I didn't look at the coredump, just remembered this
upstream report):
https://mail.openvswitch.org/pipermail/ovs-discuss/2021-June/051209.html

Comment 3 Eelco Chaudron 2021-07-01 14:35:04 UTC
Here is some backtrace info:

(gdb) bt
#0  dp_packet_set_size (b=0x0, b=0x0, v=13028) at ../lib/dp-packet.h:578
#1  netdev_linux_batch_rxq_recv_sock (rx=<optimized out>, mtu=<optimized out>, batch=0x7ffeefcd56a0) at ../lib/netdev-linux.c:1308
#2  0x0000559536ec657f in netdev_linux_rxq_recv (rxq_=0x55953a498cc0, batch=0x7ffeefcd56a0, qfill=0x0) at ../lib/netdev-linux.c:1508
#3  0x0000559536e19c75 in netdev_rxq_recv (rx=<optimized out>, batch=batch@entry=0x7ffeefcd56a0, qfill=<optimized out>) at ../lib/netdev.c:727
#4  0x0000559536dea4bb in dp_netdev_process_rxq_port (pmd=pmd@entry=0x7fa096880010, rxq=0x55953a498c30, port_no=3) at ../lib/dpif-netdev.c:4749
#5  0x0000559536deb26c in dpif_netdev_run (dpif=<optimized out>) at ../lib/dpif-netdev.c:5792
#6  0x0000559536d9d98c in type_run (type=<optimized out>) at ../ofproto/ofproto-dpif.c:370
#7  0x0000559536d8562f in ofproto_type_run (datapath_type=datapath_type@entry=0x55953a3c76d0 "netdev") at ../ofproto/ofproto.c:1780
#8  0x0000559536d729fc in bridge_run__ () at ../vswitchd/bridge.c:3245
#9  0x0000559536d78f5d in bridge_run () at ../vswitchd/bridge.c:3310
#10 0x00005595367111c5 in main (argc=<optimized out>, argv=<optimized out>) at ../vswitchd/ovs-vswitchd.c:127

(gdb) frame 1
#1  netdev_linux_batch_rxq_recv_sock (rx=<optimized out>, mtu=<optimized out>, batch=0x7ffeefcd56a0) at ../lib/netdev-linux.c:1308
1308	            dp_packet_set_size(pkt, mmsgs[i].msg_len - std_len);
(gdb) l
1303	
1304	        if (mmsgs[i].msg_len > std_len) {
1305	            /* Build a single linear TSO packet by prepending the data from
1306	             * std_len buffer to the aux_buf. */
1307	            pkt = rx->aux_bufs[i];
1308	            dp_packet_set_size(pkt, mmsgs[i].msg_len - std_len);
1309	            dp_packet_push(pkt, dp_packet_data(buffers[i]), std_len);
1310	            /* The headroom should be the same in buffers[i], pkt and
1311	             * DP_NETDEV_HEADROOM. */
1312	            dp_packet_resize(pkt, DP_NETDEV_HEADROOM, 0);

(gdb) p pkt
$2 = (struct dp_packet *) 0x0

(gdb) p mmsgs[i].msg_len
$4 = 14546

(gdb) p std_len
$5 = 1518


(gdb) ovs_dump_dp_netdev ports
(struct dp_netdev *) 0x55953a3ca500: name = ovs-netdev, class = (struct dpif_class *) 0x5595373901a0 <dpif_netdev_class>
    (struct dp_netdev_port *) 0x55953a44ff60:
        port_no = 0, n_rxq = 1, type = tap
        netdev = (struct netdev *) 0x55953a44fcb0: name = ovs-netdev, n_txq/rxq = 1/1
    (struct dp_netdev_port *) 0x55953a49a280:
        port_no = 4, n_rxq = 1, type = tap
        netdev = (struct netdev *) 0x55953a49a050: name = br-phy, n_txq/rxq = 1/1
    (struct dp_netdev_port *) 0x559539773660:
        port_no = 1, n_rxq = 1, type = tap
        netdev = (struct netdev *) 0x55953a496110: name = br-int, n_txq/rxq = 1/1
    (struct dp_netdev_port *) 0x55953a497530:
        port_no = 2, n_rxq = 1, type = tap
        netdev = (struct netdev *) 0x55953a4972e0: name = ls1p3, n_txq/rxq = 1/1
    (struct dp_netdev_port *) 0x55953a498ba0:
        port_no = 3, n_rxq = 1, type = system
        netdev = (struct netdev *) 0x55953a3c4f70: name = ens1f0, n_txq/rxq = 1/1

So port 3 is the receiving port, so the question is, did you manually enable TSO on this port through the ethtool port?

If so can you test this with all ports in OVS to have TSO disabled? In the meantime, I'll do a bit more research.

Comment 4 Ilya Maximets 2021-07-01 14:43:42 UTC
I'll repeat here what I said on IRC to keep the history.
We're using AF_PACKET sockets here with MSG_TRUNC, and
'man 7 packet' says following about that:

  When the MSG_TRUNC flag is passed to recvmsg(2), recv(2),
  or recvfrom(2), the real length of the packet on the wire
  is always returned, even when it is longer than the buffer.

So I suppose the issue has been there for a long time, but
the implementation of userspace-tso exposed it as a segfault.

In short, the size returned from the kernel is not the size
of the actual data received.  If it's larger than expected,
we need to truncate it down to the size of a buffer we're
using.

Comment 5 Flavio Leitner 2021-07-01 17:44:11 UTC
Hi,

We use recvmmsg() instead, so the return is the number of msgs received
and it is already truncated to the buffer size provided by the userspace.

We would need to iterate over all received packets checking if the flag
MSG_TRUNC is set in mmsgs[i].msg_hdr.msg_flags. If that's the case, the
packet is truncated/corrupted.

Since it is not passing MSG_PEEK, the next entries will receive the next
packets in the queue.

The real length of the packet is provided in msg_hdr.msg_len:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/net/socket.c#n274

fbl

Comment 6 Ilya Maximets 2021-07-01 18:14:07 UTC
(In reply to Flavio Leitner from comment #5)
> Hi,
> 
> We use recvmmsg() instead, so the return is the number of msgs received
> and it is already truncated to the buffer size provided by the userspace.

Sure.  I think that man page just wasn't updated to include recvmmsg(),
but the behaviour should be logically the same.

> 
> We would need to iterate over all received packets checking if the flag
> MSG_TRUNC is set in mmsgs[i].msg_hdr.msg_flags. If that's the case, the
> packet is truncated/corrupted.

This might be a good solution; we are already iterating over them to check
whether they are too short or need linearization.

Comment 7 Jianlin Shi 2021-07-02 01:00:25 UTC
(In reply to Eelco Chaudron from comment #3)

> So port 3 is the receiving port, so the question is, did you manually enable
> TSO on this port through the ethtool port?

TSO is enabled by default

> 
> If so can you test this with all ports in OVS to have TSO disabled? In the
> meantime, I'll do a bit more research.

after disable TSO for ens1f0, the crash still happened:

[root@wsfd-advnetlab21 dpdk]# ethtool  -k ens1f0 | grep tcp                                           
tcp-segmentation-offload: off                                                                         
        tx-tcp-segmentation: off                                                                      
        tx-tcp-ecn-segmentation: off [fixed]                                                          
        tx-tcp-mangleid-segmentation: off                                                             
        tx-tcp6-segmentation: off 

[root@wsfd-advnetlab21 dpdk]# coredumpctl list                                                        
TIME                            PID   UID   GID SIG COREFILE  EXE                                     
Thu 2021-07-01 20:58:45 EDT   11976   993  1001  11 present   /usr/sbin/ovs-vswitchd                  
Thu 2021-07-01 20:58:48 EDT   12204   993  1001  11 present   /usr/sbin/ovs-vswitchd

Comment 8 Eelco Chaudron 2021-07-02 06:39:09 UTC
(In reply to Jianlin Shi from comment #7)
> (In reply to Eelco Chaudron from comment #3)
> 
> > So port 3 is the receiving port, so the question is, did you manually enable
> > TSO on this port through the ethtool port?
> 
> TSO is enabled by default
> 
> > 
> > If so can you test this with all ports in OVS to have TSO disabled? In the
> > meantime, I'll do a bit more research.
> 
> after disable TSO for ens1f0, the crash still happened:

Did you happen to take a core dump, just want to make sure it's the same issue (probably traffic from another kernel interface with TSO enabled)?

Comment 10 Eelco Chaudron 2021-07-02 07:00:42 UTC
(In reply to Jianlin Shi from comment #9)
> coredump with tso disabled:
> http://netqe-bj.usersys.redhat.com/share/jishi/bz1977243/core.ovs-vswitchd.
> 993.23f9e741e194440da67a725551d51daf.25484.1625208423000000.lz4

The same problem as before, will look into this. Thanks for the quick response.

Comment 11 Eelco Chaudron 2021-07-02 13:31:29 UTC
So for your setup to work you need to disable both GSO and TSO.

Which makes me wonder why you add a kernel interface to begin with? This is a DPDK environment, so you should use DPDK physical interfaces if you care about performance.

I will work on a patch to make sure OVS no longer crashes when GSO and/or TSO is enabled.

Comment 13 Jianlin Shi 2021-07-05 00:39:31 UTC
(In reply to Eelco Chaudron from comment #11)
> So for your setup to work you need to disable both GSO and TSO.
> 
> Which makes me wonder why you add a kernel interface to begin with? This is
> a DPDK environment, so you should use DPDK physical interfaces if you care
> about performance.
> 
> I will work on a patch to make sure OVS no longer crashes when GSO and/or
> TSO is enabled.

Hi Eelco,

I'm not testing performance, just trying to create the topology and test functionality.
Since kernel interfaces are supported per the guide at https://docs.openvswitch.org/en/latest/howto/userspace-tunneling/, I tried to add a kernel interface to OVS.

Comment 14 Jianlin Shi 2021-07-05 03:34:17 UTC
ovs-vswitchd doesn't crash if TSO, GSO, and GRO are all disabled:

[root@wsfd-advnetlab21 dpdk]# ethtool -k ens1f0 | grep -E "tcp|generi"                                
        tx-checksum-ip-generic: on                                                                    
tcp-segmentation-offload: off
        tx-tcp-segmentation: off                                                                      
        tx-tcp-ecn-segmentation: off [fixed]                                                          
        tx-tcp-mangleid-segmentation: off                                                             
        tx-tcp6-segmentation: off                                                                     
generic-segmentation-offload: off
generic-receive-offload: off

Comment 15 Eelco Chaudron 2021-07-05 12:03:57 UTC
Thanks for confirming that the issue is solved by disabling TSO/GSO and GRO. I sent out a v2 of the patch, https://patchwork.ozlabs.org/project/openvswitch/patch/162548620436.40409.579366497986013480.stgit@wsfd-netdev64.ntdv.lab.eng.bos.redhat.com/, which should solve the issue.

Regarding the topology test: using kernel interfaces in combination with OVS-DPDK is supported, but it is not a recommended solution, so please do not suggest that anyone use it as part of an OVN/OVS deployment.

Comment 16 Eelco Chaudron 2021-07-14 11:08:49 UTC
Patch got accepted upstream and backported all the way to 2.13.

Next FDP release will automatically pick up the upstream fixes.