Bug 2160686 - [ovs-dpdk-bond]L4 connection failed with ovs-dpdk
Summary: [ovs-dpdk-bond]L4 connection failed with ovs-dpdk
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: openvswitch
Version: RHEL 9.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Mike Pattrick
QA Contact: mhou
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-01-13 10:32 UTC by mhou
Modified: 2023-07-13 07:25 UTC
CC: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:




Links:
Red Hat Issue Tracker FD-2603 (last updated 2023-01-13 10:32:48 UTC)

Description mhou 2023-01-13 10:32:04 UTC
Description of problem:
Connections made with ncat or netperf to the peer side fail.

Version-Release number of selected component (if applicable):
kernel version: 5.14.0-231.el9.x86_64
openvswitch: openvswitch2.17-2.17.0-52.el9fdp.x86_64

How reproducible: 100%


Steps to Reproduce:
1. Build the OVS topology as below (a quick verification check follows the command block):
driverctl set-override 0000:af:00.0 vfio-pci
driverctl set-override 0000:af:00.1 vfio-pci
systemctl start openvswitch &>/dev/null
ovs-vsctl set Open_vSwitch . other_config={}
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem='8192,8192'
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x800000000
ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0xf000000000
ovs-vsctl --no-wait set Open_vSwitch . other_config:vhost-iommu-support=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --may-exist add-br bondbridge -- set bridge bondbridge datapath_type=netdev
ovs-vsctl set int bondbridge mtu_request=9200
ovs-vsctl add-bond bondbridge balance-tcp ens4f1 ens4f0 lacp=active bond_mode=balance-tcp  \
    -- set Interface ens4f1 type=dpdk options:dpdk-devargs=0000:af:00.1 options:n_rxq=4 mtu_request=9200  \
    -- set Interface ens4f0 type=dpdk options:dpdk-devargs=0000:af:00.0 options:n_rxq=4 mtu_request=9200
ovs-vsctl add-port bondbridge patchbond \
    -- set Interface patchbond type=patch \
    -- set Interface patchbond options:peer=patchguest mtu_request=9200
ovs-vsctl set int ens4f0 mtu_request=9200
ovs-vsctl set int ens4f1 mtu_request=9200
ovs-vsctl set int patchbond mtu_request=9200
ovs-ofctl mod-port bondbridge bondbridge up
ovs-ofctl mod-port bondbridge ens4f1 up
ovs-ofctl mod-port bondbridge ens4f0 up
ovs-vsctl --may-exist add-br guestbridge -- set bridge guestbridge datapath_type=netdev
ovs-vsctl --may-exist add-port guestbridge patchguest \
    -- set Interface patchguest type=patch \
    -- set Interface patchguest options:peer=patchbond mtu_request=9200
ovs-vsctl set int guestbridge mtu_request=9200
ovs-ofctl mod-port guestbridge guestbridge up
ovs-ofctl mod-port guestbridge patchguest up
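
As a quick sanity check at this point (not in the original report; these are standard OVS commands, and the output will vary per host), confirm that DPDK initialized and that the bond/LACP state looks healthy:

ovs-vsctl get Open_vSwitch . dpdk_initialized    # expect "true" once dpdk-init has succeeded
ovs-appctl dpif-netdev/pmd-rxq-show              # rxq-to-PMD assignment for ens4f0/ens4f1
ovs-appctl bond/show                             # bond state and LACP negotiation
ovs-appctl lacp/show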

2. Add two containers to the guest bridge:
# podman ps -a
CONTAINER ID  IMAGE                            COMMAND         CREATED         STATUS             PORTS       NAMES
e42b1f94f696  localhost/rhel9.0_x86_64:latest  sleep infinity  10 minutes ago  Up 10 minutes ago              g1
8201bd35a4c9  localhost/rhel9.0_x86_64:latest  sleep infinity  10 minutes ago  Up 10 minutes ago              g2

ovs-podman add-port guestbridge eth1 g1 --ipaddress=172.31.152.42/24 --ip6address=2001:db8:152::42/64 --mtu=9200 --macaddress=00:de:ad:98:02:02
ovs-podman add-port guestbridge eth2 g2 --ipaddress=172.31.152.52/24 --ip6address=2001:db8:152::52/64 --mtu=9200 --macaddress=00:de:ad:98:02:12
# podman exec g1 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0@if202: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 06:cb:30:99:df:bc brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.88.0.24/16 brd 10.88.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::4cb:30ff:fe99:dfbc/64 scope link 
       valid_lft forever preferred_lft forever
208: eth1@if209: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9200 qdisc noqueue state UP group default qlen 1000
    link/ether 00:de:ad:98:02:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.31.152.42/24 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 2001:db8:152::42/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::b8c9:8dff:fe71:7740/64 scope link 
       valid_lft forever preferred_lft forever
# podman exec g2 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0@if203: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 1e:f9:c8:d2:91:e4 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.88.0.25/16 brd 10.88.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::1cf9:c8ff:fed2:91e4/64 scope link 
       valid_lft forever preferred_lft forever
210: eth2@if211: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9200 qdisc noqueue state UP group default qlen 1000
    link/ether 00:de:ad:98:02:12 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.31.152.52/24 scope global eth2
       valid_lft forever preferred_lft forever
    inet6 2001:db8:152::52/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::c0a9:1aff:fe10:f0d7/64 scope link 
       valid_lft forever preferred_lft forever


# ovs-vsctl show
0b41afb4-a3b3-4752-ba38-136f192af156
    Bridge bondbridge
        datapath_type: netdev
        Port patchbond
            Interface patchbond
                type: patch
                options: {peer=patchguest}
        Port balance-slb
            Interface ens4f1
                type: dpdk
                options: {dpdk-devargs="0000:af:00.1", n_rxq="4"}
            Interface ens4f0
                type: dpdk
                options: {dpdk-devargs="0000:af:00.0", n_rxq="4"}
        Port bondbridge
            Interface bondbridge
                type: internal
    Bridge guestbridge
        datapath_type: netdev
        Port patchguest
            Interface patchguest
                type: patch
                options: {peer=patchbond}
        Port guestbridge
            Interface guestbridge
                type: internal
        Port "1d25e098d1404_l"
            Interface "1d25e098d1404_l"
        Port e875a0b4e3f74_l
            Interface e875a0b4e3f74_l
    ovs_version: "2.17.4"
[root@hp-dl388g10-03 ~]# ovs-appctl bond/show
---- balance-slb ----
bond_mode: balance-slb
bond may use recirculation: no, Recirc-ID : -1
bond-hash-basis: 0
lb_output action: disabled, bond-id: -1
updelay: 0 ms
downdelay: 0 ms
next rebalance: 3104 ms
lacp_status: negotiated
lacp_fallback_ab: false
active-backup primary: <none>
active member mac: 3c:fd:fe:bd:1c:a5(ens4f1)

member ens4f0: enabled
  may_enable: true

member ens4f1: enabled
  active member
  may_enable: true



3. Add a NORMAL flow to each bridge (a dump-flows check follows the commands):
ovs-ofctl add-flow guestbridge actions=NORMAL
ovs-ofctl add-flow bondbridge actions=NORMAL
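
To confirm the NORMAL flows are installed (optional check, not part of the original run):

ovs-ofctl dump-flows bondbridge
ovs-ofctl dump-flows guestbridge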

4. Ping from g1 to the peer side:
[root@e42b1f94f696 /]# ping 172.31.152.1 -c 3
PING 172.31.152.1 (172.31.152.1) 56(84) bytes of data.
64 bytes from 172.31.152.1: icmp_seq=1 ttl=64 time=0.074 ms
64 bytes from 172.31.152.1: icmp_seq=2 ttl=64 time=0.060 ms
64 bytes from 172.31.152.1: icmp_seq=3 ttl=64 time=0.074 ms

--- 172.31.152.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2078ms
rtt min/avg/max/mdev = 0.060/0.069/0.074/0.006 ms

5. Run netperf from g1 to the peer side:
[root@e42b1f94f696 /]# netperf -4 -t TCP_STREAM -H 172.31.152.1 -l 10
establish control: are you sure there is a netserver listening on 172.31.152.1 at port 12865?
establish_control could not establish the control connection from 0.0.0.0 port 0 address family AF_INET to 172.31.152.1 port 12865 address family AF_INET
[root@e42b1f94f696 /]# netperf -4 -t UDP_STREAM -H 172.31.152.1 -l 10
establish control: are you sure there is a netserver listening on 172.31.152.1 at port 12865?
establish_control could not establish the control connection from 0.0.0.0 port 0 address family AF_INET to 172.31.152.1 port 12865 address family AF_INET

[root@hp-dl388g10-03 ovs_bond_function]# ovs-appctl dpctl/dump-flows -m
flow-dump from the main thread:
ufid:469d131a-a66d-4738-b6cf-8756ab2096a0, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(1d25e098d1404_l),packet_type(ns=0,id=0),eth(src=00:de:ad:98:02:02,dst=40:a6:b7:3e:a5:60),eth_type(0x0800),ipv4(src=172.31.152.42/0.0.0.0,dst=172.31.152.1/0.0.0.0,proto=6/0,tos=0/0,ttl=64/0,frag=no),tcp(src=44629/0,dst=12865/0),tcp_flags(0/0), packets:4, bytes:296, used:2.685s, flags:S, dp:ovs, actions:ens4f1, dp-extra-info:miniflow_bits(5,1)
flow-dump from pmd on cpu core: 39
ufid:7a0760d7-64a1-466f-9cdf-1bb2e20ab66d, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens4f0),packet_type(ns=0,id=0),eth(src=b0:c5:3c:f6:36:d4,dst=01:80:c2:00:00:0e),eth_type(0x88cc), packets:146, bytes:40442, used:4.314s, dp:ovs, actions:drop, dp-extra-info:miniflow_bits(5,0)
flow-dump from pmd on cpu core: 36
ufid:269a2a3d-0e3e-4931-98b5-ddddffbb7835, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens4f1),packet_type(ns=0,id=0),eth(src=b0:c5:3c:f6:36:d5,dst=01:80:c2:00:00:0e),eth_type(0x88cc), packets:149, bytes:41273, used:1.758s, dp:ovs, actions:drop, dp-extra-info:miniflow_bits(5,0)

6. Confirm that netserver is running on the peer side:
[root@hp-dl388g10-02 ovs_bond_function]# netstat -anltup
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 192.168.122.1:53        0.0.0.0:*               LISTEN      1618/dnsmasq        
tcp        0      0 127.0.0.1:8081          0.0.0.0:*               LISTEN      1942/restraintd     
tcp        0      0 0.0.0.0:4999            0.0.0.0:*               LISTEN      3596027/nc          
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1322/sshd: /usr/sbi 
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1/systemd           
tcp        0    248 10.73.89.32:22          10.72.12.191:43530      ESTABLISHED 3593945/sshd: root  
tcp        0      0 10.73.89.32:942         10.73.130.89:2049       ESTABLISHED -                   
tcp        0      0 10.73.89.32:22          10.72.12.191:43526      ESTABLISHED 3593904/sshd: root  
tcp6       0      0 ::1:8081                :::*                    LISTEN      1942/restraintd     
tcp6       0      0 :::12865                :::*                    LISTEN      3596043/netserver   
tcp6       0      0 :::4999                 :::*                    LISTEN      3596027/nc          
tcp6       0      0 :::22                   :::*                    LISTEN      1322/sshd: /usr/sbi 
tcp6       0      0 :::111                  :::*                    LISTEN      1/systemd           
udp        0      0 192.168.122.1:53        0.0.0.0:*                           1618/dnsmasq        
udp        0      0 0.0.0.0:67              0.0.0.0:*                           1618/dnsmasq        
udp        0      0 10.73.89.32:68          10.73.2.108:67          ESTABLISHED 9673/NetworkManager 
udp        0      0 0.0.0.0:111             0.0.0.0:*                           1/systemd           
udp        0      0 127.0.0.1:323           0.0.0.0:*                           1273/chronyd        
udp6       0      0 :::111                  :::*                                1/systemd           
udp6       0      0 ::1:323                 :::*                                1273/chronyd 

Actual results:
The netperf TCP and UDP tests fail.

Expected results:
The netperf tests should succeed.

Additional info:
I also tried the nc command, but it still failed.

peer side:
[root@hp-dl388g10-02 ovs_bond_function]# nc -l 4999

container side:
[root@e42b1f94f696 /]# nc 172.31.152.1 4999
Ncat: TIMEOUT.

Comment 1 Mike Pattrick 2023-01-18 15:47:58 UTC
Hello,

My initial thought is that this could be an L3 checksum issue, which would explain why ICMP can pass but TCP can't. Is the setup still available? If so, could you grab a pcap?

Running the following while conducting the test should help clear that up:

> # On hp-dl388g10-03
> ovs-tcpdump --span -i bondbridge -w bondbridge.pcap &
> ovs-tcpdump --span -i guestbridge -w guestbridge.pcap &

A tcpdump from hp-dl388g10-02 would also be helpful.
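A minimal sketch of that peer-side capture, assuming <peer-nic> is the NIC on hp-dl388g10-02 that faces the bond (adjust the name as needed):

> tcpdump -i <peer-nic> -nn -vv -w peer.pcap host 172.31.152.42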

If your setup is no longer available then I can try to reproduce this issue.

Comment 2 mhou 2023-01-18 16:03:09 UTC
Hello Michael

I need to wait for the current test to finish; I can give you the results tomorrow.

Comment 3 mhou 2023-01-19 02:59:39 UTC
Hello Michael

I found TCP retransmissions when capturing on bondbridge and guestbridge. I have uploaded all of the pcap files as attachments.

Comment 7 Mike Pattrick 2023-01-19 15:00:49 UTC
Thanks! This confirms that it is an L4 checksum issue. I'll investigate further.
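
For reference, the bad checksums can be spotted by replaying the captures from comment 1 with verbose tcpdump (illustrative only; file names assumed from the ovs-tcpdump commands above):

tcpdump -nn -vv -r guestbridge.pcap 'tcp port 12865' | grep -i incorrect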

Comment 8 mhou 2023-03-01 07:22:43 UTC
Hello Michael

I found a document that describes this issue:
https://access.redhat.com/solutions/3964031
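
An illustrative way to check the offload settings that the article discusses, using the container and interface names from this setup:

podman exec g1 ethtool -k eth1 | grep -i checksum
podman exec g2 ethtool -k eth2 | grep -i checksum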

Comment 9 mhou 2023-03-01 08:48:04 UTC
Once I disable TX offload on the container side, netperf works fine. I think the current issue is inherited from https://bugzilla.redhat.com/show_bug.cgi?id=1685616#c5
[root@hp-dl388g10-03 ~]# ovs-vsctl show
04d6f2db-7723-49ca-8236-bdd535022bb5
    Bridge bondbridge
        datapath_type: netdev
        Port active-backup
            Interface ens4f1
                type: dpdk
                options: {dpdk-devargs="0000:af:00.1", n_rxq="4"}
            Interface ens4f0
                type: dpdk
                options: {dpdk-devargs="0000:af:00.0", n_rxq="4"}
        Port bondbridge
            Interface bondbridge
                type: internal
        Port patchbond
            Interface patchbond
                type: patch
                options: {peer=patchguest}
    Bridge guestbridge
        datapath_type: netdev
        Port c51ae05c88fe4_l
            Interface c51ae05c88fe4_l
        Port "488a9bac51db4_l"
            Interface "488a9bac51db4_l"
        Port guestbridge
            Interface guestbridge
                type: internal
        Port patchguest
            Interface patchguest
                type: patch
                options: {peer=patchbond}
    ovs_version: "3.1.1"
[root@41f1c92819a5 /]# ethtool -K eth1 tx off
Actual changes:
tx-checksum-ip-generic: off
tx-tcp-segmentation: off [not requested]
tx-tcp-ecn-segmentation: off [not requested]
tx-tcp-mangleid-segmentation: off [not requested]
tx-tcp6-segmentation: off [not requested]
tx-checksum-sctp: off

[root@41f1c92819a5 /]# netperf -H 172.31.152.1
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.31.152.1 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

131072  16384  16384    10.00    4120.48
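
If the workaround needs to be applied to both containers, a minimal sketch using the interface names from the earlier setup would be:

podman exec g1 ethtool -K eth1 tx off
podman exec g2 ethtool -K eth2 tx off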

Comment 10 Mike Pattrick 2023-03-01 15:12:12 UTC
Good find, that seems about right. I am currently working on a patch set to improve the handling of checksum-offloaded interfaces, which may improve this behavior. Do you want to keep this ticket open, or close it for now as a duplicate of the previous ticket?

Comment 11 mhou 2023-03-01 15:14:33 UTC
I would prefer to keep it open to track any subsequent updates.

