Bug 1984953 - [OSP16.1/RHEL8.4][ML2-OVN][Hw-Offload] DVR DNAT Floating IP connection is not offloaded to HW due to uncommitted Connection Tracking zone
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovn-2021
Version: RHEL 8.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Numan Siddique
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On: 2043187
Blocks: 2131355
Reported: 2021-07-22 14:15 UTC by Itai Levy
Modified: 2023-09-18 00:28 UTC
CC List: 16 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2024599
Environment:
Last Closed: 2023-03-13 07:14:30 UTC
Target Upstream Version:
Embargoed:



Links:
  System: Red Hat Issue Tracker | ID: FD-1591 | Private: 0 | Priority: None | Status: None | Summary: None | Last Updated: 2021-10-08 12:34:28 UTC

Description Itai Levy 2021-07-22 14:15:46 UTC
Description of problem:

One of the OVS datapath (DP) rules created for DNAT floating-IP traffic in a DVR setup is redirected into a Connection Tracking zone that is never committed, and therefore the traffic cannot be offloaded to HW.


Version-Release number of selected component (if applicable):
• OSP16.1.4 
• RHEL8.4 with kernel 4.18.0-305.7.1.el8_4.x86_64
• MLNX_OFED_LINUX-5.4-0.5 
• openvswitch 2.14.1 (MOFED OVS)
• ConnectX NIC is configured as a bond (VF-LAG, LACP)
• geneve tenant network, direct ports with "switchdev" capabilities and with security groups
• vRouter with the geneve tenant subnet plus an additional external subnet with floating IPs
• VMs running an iperf3 test between the floating IPs


How reproducible:
Every time. 

Steps to Reproduce:
1. Deploy the cloud.
2. Create a geneve tenant network.
3. Create direct ports with: --binding-profile '{"capabilities":["switchdev"]}' --security-group my_policy (to allow the iperf traffic).
4. Create an external provider VLAN / flat network.
5. Create a vRouter with both subnets (--external-gateway for the "external" network).
6. Create floating IPs on the "external" network.
7. Create instances with the geneve direct ports and assign the external floating IPs.
8. Run traffic (iperf) between VMs, or between a VM and an external iperf server, via the floating IPs.
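For convenience, the reproduction steps above could be scripted roughly as follows. This is a minimal dry-run sketch assuming the standard `openstack` CLI. The network, router, port, and server names and the subnet ranges are taken from the NB dump and server list in this report. The subnet names (`gen_subnet`, `vlan_subnet`), the physical network name `datacentre`, and the flavor `perf_flavor` are hypothetical placeholders. The sketch assumes geneve is the default tenant network type under ML2/OVN.

```python
import shlex

# Dry-run sketch of the reproduction steps. Subnet names, the physical network
# name and the flavor are hypothetical placeholders, not values from the report.
SWITCHDEV_PROFILE = '{"capabilities":["switchdev"]}'


def reproduction_commands(physnet="datacentre", image="perf", flavor="perf_flavor"):
    """Return the openstack CLI invocations for the setup steps as argv lists."""
    return [
        # Tenant network (assumes geneve is the default tenant network type).
        ["openstack", "network", "create", "gen_data"],
        ["openstack", "subnet", "create", "--network", "gen_data",
         "--subnet-range", "33.33.33.0/24", "gen_subnet"],
        # Direct port with the switchdev capability and a security group.
        ["openstack", "port", "create", "--network", "gen_data",
         "--vnic-type", "direct", "--binding-profile", SWITCHDEV_PROFILE,
         "--security-group", "my_policy", "direct11"],
        # External provider VLAN network (VLAN 101, as in the NB dump).
        ["openstack", "network", "create", "--external",
         "--provider-network-type", "vlan",
         "--provider-physical-network", physnet,
         "--provider-segment", "101", "vlan_data"],
        ["openstack", "subnet", "create", "--network", "vlan_data",
         "--subnet-range", "11.11.11.0/24", "vlan_subnet"],
        # Router with an external gateway and the tenant subnet attached.
        ["openstack", "router", "create", "vlan_router"],
        ["openstack", "router", "set", "--external-gateway", "vlan_data", "vlan_router"],
        ["openstack", "router", "add", "subnet", "vlan_router", "gen_subnet"],
        # Floating IP on the external network.
        ["openstack", "floating", "ip", "create", "vlan_data"],
        # Instance bound to the direct port.
        ["openstack", "server", "create", "--image", image, "--flavor", flavor,
         "--port", "direct11", "trex11"],
    ]


if __name__ == "__main__":
    for argv in reproduction_commands():
        print(shlex.join(argv))  # printed only; nothing is executed
```

The floating IP would then be attached with `openstack server add floating ip trex11 <address>`, where the address comes from the output of the `floating ip create` command.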


Actual results:

Traffic is not offloaded, as seen in tcpdump.
One of the OVS DP rules created for the DNAT traffic is redirected into a Connection Tracking zone that is never committed into the kernel. The TC output marks the filter for this "ghost" zone as "in_hw"; however, the packets on that chain are processed by SW and not by HW.
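One way to check the in_hw versus actual-HW discrepancy is to parse the per-filter counters of `tc -s filter show`. The following is a minimal sketch assuming the output format shown in this report: a `filter ... chain N handle 0xH` header followed by `in_hw` and optional `Sent software ... pkt` / `Sent hardware ... pkt` lines. It flags filters that are marked in_hw but report no hardware packets. The `!!!Deficit` lines in the dump suggest the stats reporting itself may be broken, so a flagged filter is a hint to investigate, not proof of SW processing.

```python
import re

# Counter lines that tc prints when the driver reports split statistics,
# e.g. "Sent software 953 bytes 15 pkt" / "Sent hardware ... pkt".
SPLIT_RE = re.compile(r"Sent (software|hardware) (\d+) bytes (\d+) pkt")
FILTER_RE = re.compile(r"^filter .* chain (\d+) handle (0x[0-9a-f]+)")

# Abbreviated excerpt of the `tc -s filter show dev ens1f0_9 ingress` output below.
SAMPLE = """\
filter protocol ip pref 3 flower chain 0 handle 0x1
  in_hw in_hw_count 1
        Sent 2591111747191 bytes 289802644 pkt (dropped 0, overlimits 0 requeues 0)
        Sent software 953 bytes 15 pkt
        Sent hardware 2591111746238 bytes 289802629 pkt
filter protocol ip pref 3 flower chain 49 handle 0x3
  in_hw in_hw_count 1
        Sent 2574555460931 bytes 289844124 pkt (dropped 0, overlimits 0 requeues 0)
"""


def summarize_filters(tc_output):
    """Map (chain, handle) -> in_hw flag and software/hardware packet counts."""
    filters = {}
    current = None
    for line in tc_output.splitlines():
        stripped = line.strip()
        m = FILTER_RE.match(stripped)
        if m:
            current = {"in_hw": False, "sw_pkts": None, "hw_pkts": None}
            filters[(int(m.group(1)), m.group(2))] = current
            continue
        if current is None:
            continue
        if stripped.startswith("in_hw"):  # "not_in_hw" does not match
            current["in_hw"] = True
        m = SPLIT_RE.search(stripped)
        if m:
            key = "sw_pkts" if m.group(1) == "software" else "hw_pkts"
            # A filter has several actions; keep the largest counter seen.
            current[key] = max(current[key] or 0, int(m.group(3)))
    return filters


def in_hw_without_hw_counters(filters):
    """Filters flagged in_hw that report no hardware packets at all."""
    return sorted(k for k, v in filters.items() if v["in_hw"] and not v["hw_pkts"])


if __name__ == "__main__":
    summary = summarize_filters(SAMPLE)
    for key, info in sorted(summary.items()):
        print(key, info)
    print("in_hw but no HW counters:", in_hw_without_hw_counters(summary))
```

On this excerpt, chain 0 reports 15 SW and 289802629 HW packets, while chain 49 (recirc 0x31) is flagged because it prints only an aggregate counter.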


Expected results:
Full offload of all DNAT OVS DP rules.


Additional info:

See below the OVS DP rules from the iperf TX node. On egress, recirc 0x31 (TC chain 49) is redirected to recirc 0x3a (TC chain 58) via zone=5, and this zone does not appear in the /proc/net/nf_conntrack output.
Although this rule is marked as "offloaded", the TC output shows that packets on this chain are not sent by HW.


ingress DP:

ufid:ea92a156-0026-4f35-8dba-853525d39732, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x3f),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0),dp_hash(0/0),in_port(bond0),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:46:93:ab,dst=fa:16:3e:61:01:ce),eth_type(0x8100),vlan(vid=101,pcp=0),encap(eth_type(0x0800),ipv4(src=11.11.11.0/255.255.255.192,dst=11.11.11.115,proto=6,tos=0/0,ttl=63,frag=no),tcp(src=0/0,dst=0/0)), packets:5123628, bytes:379148492, used:0.880s, offloaded:yes, dp:tc, actions:ct_clear,pop_vlan,ct(zone=3,nat),recirc(0x3c)

ufid:1fa4bae9-6703-420b-b371-1a83afcfef76, skb_priority(0/0),skb_mark(0/0),ct_state(0x2a/0x3e),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0x3c),dp_hash(0/0),in_port(bond0),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:46:93:ab,dst=fa:16:3e:61:01:ce),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=33.33.33.108,proto=6,tos=0/0,ttl=63,frag=no),tcp(src=0/0,dst=0/0), packets:5123628, bytes:358654028, used:0.880s, offloaded:yes, dp:tc, actions:ct_clear,set(eth(src=fa:16:3e:1d:eb:d7,dst=fa:16:3e:c1:c7:78)),set(ipv4(ttl=62)),ct(zone=1),recirc(0x3d)

ufid:21aee1ac-0a83-49cb-a108-75daca673e6e, skb_priority(0/0),skb_mark(0/0),ct_state(0x2a/0x3e),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0x3d),dp_hash(0/0),in_port(bond0),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=fa:16:3e:c1:c7:78),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=33.33.33.108,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:5123628, bytes:358654028, used:0.880s, offloaded:yes, dp:tc, actions:ens1f0_9


egress DP:

ufid:d036fb54-74a2-48ca-b506-843a6fdee55b, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens1f0_9),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:c1:c7:78,dst=fa:16:3e:1d:eb:d7),eth_type(0x0800),ipv4(src=33.33.33.108,dst=0.0.0.0/0.0.0.0,proto=6,tos=0/0,ttl=0/0,frag=no),tcp(src=0/0,dst=0/0), packets:109255400, bytes:976762191901, used:0.880s, offloaded:yes, dp:tc, actions:ct(zone=1),recirc(0x31)  

ufid:02daed36-ea4a-49ef-b49a-45ddca0ae381, skb_priority(0/0),skb_mark(0/0),ct_state(0x22/0x3e),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0x31),dp_hash(0/0),in_port(ens1f0_9),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:c1:c7:78,dst=fa:16:3e:1d:eb:d7),eth_type(0x0800),ipv4(src=33.33.33.108,dst=11.11.11.13,proto=6,tos=0/0,ttl=64,frag=no),tcp(src=0/0,dst=5101), packets:109558494, bytes:973074160887, used:0.000s, offloaded:yes, dp:tc, actions:ct_clear,set(eth(src=fa:16:3e:61:01:ce,dst=fa:16:3e:46:93:ab)),set(ipv4(ttl=63)),ct(zone=5,nat),recirc(0x3a)

ufid:19edcb48-7e72-424e-a7bd-f5bab318c599, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0x3a),dp_hash(0/0),in_port(ens1f0_9),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:61:01:ce,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=33.33.33.108,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:109558497, bytes:973074161067, used:0.000s, offloaded:yes, dp:tc, actions:ct(commit,zone=3,nat(src=11.11.11.115)),recirc(0x3b)

ufid:6f721f1a-d420-488f-b165-75e4599b6709, skb_priority(0/0),skb_mark(0/0),ct_state(0x22/0x3e),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0x3b),dp_hash(0/0),in_port(ens1f0_9),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:61:01:ce,dst=fa:16:3e:46:93:ab),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=11.11.11.0/255.255.255.192,proto=6,tos=0/0,ttl=0/0,frag=no),tcp(src=0/0,dst=0/0), packets:109558501, bytes:973074223169, used:0.000s, offloaded:yes, dp:tc, actions:ct_clear,push_vlan(vid=101,pcp=0),bond0
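The dp:tc dump above prints ct_state as hex value/mask pairs, whereas the flows in comment 5 use symbolic flags. This small decoder translates one form into the other. It assumes the datapath conntrack state bits (OVS_CS_F_*): new=0x01, est=0x02, rel=0x04, rpl=0x08, inv=0x10, trk=0x20, snat=0x40, dnat=0x80. Only bits covered by the mask are printed; all other bits are wildcards.

```python
# Datapath conntrack state bits (the OVS_CS_F_* flags), in the order the
# symbolic flows in comment 5 print them.
CT_STATE_BITS = [
    ("new", 0x01), ("est", 0x02), ("rel", 0x04), ("rpl", 0x08),
    ("inv", 0x10), ("trk", 0x20), ("snat", 0x40), ("dnat", 0x80),
]


def decode_ct_state(value, mask):
    """Render a value/mask match as +flag/-flag; unmasked bits are wildcards."""
    return "".join(
        ("+" if value & bit else "-") + name
        for name, bit in CT_STATE_BITS
        if mask & bit
    )


def parse_ct_state(field):
    """Decode a dpctl field such as 'ct_state(0x2a/0x3e)'."""
    inner = field[field.index("(") + 1 : field.index(")")]
    value, mask = (int(part, 16) for part in inner.split("/"))
    return decode_ct_state(value, mask)


if __name__ == "__main__":
    for field in ("ct_state(0x2a/0x3e)", "ct_state(0x22/0x3e)", "ct_state(0/0)"):
        print(field, "->", parse_ct_state(field) or "(wildcard)")
```

For example, `ct_state(0x2a/0x3e)` on the ingress flows decodes to `+est-rel+rpl-inv+trk` (reply direction of an established connection), and `ct_state(0x22/0x3e)` on the egress flows decodes to `+est-rel-rpl-inv+trk` (original direction).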



Matching TC rules:

()[root@overcloud-computesriov-rack0-1 /]# tc -s filter show dev ens1f0_9 ingress
filter protocol ip pref 3 flower chain 0 
filter protocol ip pref 3 flower chain 0 handle 0x1 
  dst_mac fa:16:3e:1d:eb:d7
  src_mac fa:16:3e:c1:c7:78
  eth_type ipv4
  ip_proto tcp
  src_ip 33.33.33.108
  ip_flags nofrag
  in_hw in_hw_count 1
        action order 1: gact action 12
         random type none 262150 val 0
         index 85466 ref 0 bind 0 installed 0 sec used 42949672 sec expires 343597383 sec
        Action statistics:
!!!Deficit -4, rta_len=20

        cookie 54fb36d0ca48a2743a8406b55be5de6f

        action order 2: gact action goto chain 49
         random type none pass val 0
         index 1 ref 1 bind 1 installed 854 sec used 0 sec
        Action statistics:
        Sent 2591111747191 bytes 289802644 pkt (dropped 0, overlimits 0 requeues 0) 
        Sent software 953 bytes 15 pkt
        Sent hardware 2591111746238 bytes 289802629 pkt
        backlog 0b 0p requeues 0
        cookie 54fb36d0ca48a2743a8406b55be5de6f



filter protocol ip pref 3 flower chain 49 handle 0x3 
  dst_mac fa:16:3e:1d:eb:d7
  src_mac fa:16:3e:c1:c7:78
  eth_type ipv4
  ip_proto tcp
  ip_ttl 0x40/ff
  dst_ip 11.11.11.13
  src_ip 33.33.33.108
  dst_port 5101
  ip_flags nofrag
  in_hw in_hw_count 1
        action order 1: gact action pass
         random type none 65560 val 0
         index 85464 ref 0 bind 0 installed 0 sec used 42949672 sec expires 343597383 sec
        Action statistics:
!!!Deficit -4, rta_len=20

        cookie 36edda02ef494aeadd459ab481e30aca

        action order 2:  pedit action pipe keys 5
         index 3 ref 1 bind 1 installed 854 sec
         key #0  at ipv4+8: val 3f000000 mask 00ffffff
         key #1  at eth+4: val 0000fa16 mask ffff0000
         key #2  at eth+8: val 3e6101ce mask 00000000
         key #3  at eth+0: val fa163e46 mask 00000000
         key #4  at eth+4: val 93ab0000 mask 0000ffff
        Action statistics:
        Sent 2574555460931 bytes 289844124 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0

        action order 3: csum (iph, tcp) action pipe
        index 3 ref 1 bind 1 installed 854 sec
        Action statistics:
        Sent 2574555460931 bytes 289844124 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0

        action order 4: gact action pass
         random type none 262150 val 0
         index 85464 ref 0 bind 0 installed 0 sec used 42949672 sec expires 343597383 sec
        Action statistics:
!!!Deficit -4, rta_len=20

        cookie 36edda02ef494aeadd459ab481e30aca

        action order 5: gact action goto chain 58
         random type none pass val 0
         index 6 ref 1 bind 1 installed 854 sec
        Action statistics:
        Sent 2574555460931 bytes 289844124 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0
        cookie 36edda02ef494aeadd459ab481e30aca

filter protocol ip pref 3 flower chain 58 
filter protocol ip pref 3 flower chain 58 handle 0x1 
  src_mac fa:16:3e:61:01:ce
  eth_type ipv4
  src_ip 33.33.33.108
  ip_flags nofrag
  in_hw in_hw_count 1
        action order 1: gact action pass
         random type none 262150 val 0
         index 85466 ref 0 bind 0 installed 0 sec used 42949672 sec expires 343597383 sec
        Action statistics:
!!!Deficit -4, rta_len=20

        cookie 48cbed194e42727ebaf5bda799c518b3

        action order 2: gact action goto chain 59
         random type none pass val 0
         index 3 ref 1 bind 1 installed 854 sec
        Action statistics:
        Sent 2574555461111 bytes 289844127 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0
        cookie 48cbed194e42727ebaf5bda799c518b3

filter protocol ip pref 3 flower chain 59 
filter protocol ip pref 3 flower chain 59 handle 0x2 
  dst_mac fa:16:3e:46:93:ab
  src_mac fa:16:3e:61:01:ce
  eth_type ipv4
  ip_proto tcp
  dst_ip 11.11.11.13/26
  ip_flags nofrag
  in_hw in_hw_count 1
        action order 1: gact action pass
         random type none 65560 val 0
         index 85464 ref 0 bind 0 installed 0 sec used 42949672 sec expires 343597383 sec
        Action statistics:
!!!Deficit -4, rta_len=20

        cookie 1a1f726f8f4820d4e47565b109679b59

        action order 2: vlan  push id 101 protocol 802.1Q priority 0 pipe
         index 13 ref 1 bind 1 installed 854 sec
        Action statistics:
        Sent 2574555460931 bytes 289844124 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0

        action order 3: mirred (Egress Redirect to device bond0) stolen
        index 19 ref 1 bind 1 installed 854 sec
        Action statistics:
        Sent 2574555460931 bytes 289844124 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0
        cookie 1a1f726f8f4820d4e47565b109679b59

()[root@overcloud-computesriov-rack0-1 /]#
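The pedit keys in chain 49, handle 0x3 can be cross-checked against the `set(eth(...))` and `set(ipv4(ttl=63))` actions of the matching OVS flow. This sketch makes two assumptions. First, tc prints each key as a big-endian 32-bit word at the given header offset. Second, the kernel applies `new = (old & mask) ^ val`, so mask bits set to 1 are preserved. Under those assumptions, the keys reproduce the rewritten MACs and TTL exactly.

```python
def apply_pedit(header, offset, val, mask):
    """Apply one pedit key: keep bits where mask is 1, then XOR in val."""
    word = int.from_bytes(header[offset:offset + 4], "big")
    word = (word & mask) ^ val
    header[offset:offset + 4] = word.to_bytes(4, "big")


def mac_bytes(mac):
    return bytes.fromhex(mac.replace(":", ""))


def fmt_mac(raw):
    return ":".join(f"{b:02x}" for b in raw)


# (offset, val, mask) of the four eth keys printed for chain 49, handle 0x3.
ETH_KEYS = [
    (4, 0x0000FA16, 0xFFFF0000),
    (8, 0x3E6101CE, 0x00000000),
    (0, 0xFA163E46, 0x00000000),
    (4, 0x93AB0000, 0x0000FFFF),
]


def rewrite_eth(dst, src):
    """Return (dst, src) MACs after applying ETH_KEYS to a 14-byte header."""
    header = bytearray(mac_bytes(dst) + mac_bytes(src) + b"\x08\x00")
    for offset, val, mask in ETH_KEYS:
        apply_pedit(header, offset, val, mask)
    return fmt_mac(header[0:6]), fmt_mac(header[6:12])


if __name__ == "__main__":
    # MACs matched by the filter before the rewrite.
    print(rewrite_eth("fa:16:3e:1d:eb:d7", "fa:16:3e:c1:c7:78"))
    # key #0 at ipv4+8 rewrites the TTL byte (offset 8 of the IPv4 header).
    ip_header = bytearray(20)
    ip_header[8] = 64
    apply_pedit(ip_header, 8, 0x3F000000, 0x00FFFFFF)
    print("ttl", ip_header[8])
```

This prints `('fa:16:3e:46:93:ab', 'fa:16:3e:61:01:ce')` and `ttl 63`, which matches `set(eth(src=fa:16:3e:61:01:ce,dst=fa:16:3e:46:93:ab)),set(ipv4(ttl=63))` in the egress DP flow for recirc 0x31.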

Comment 1 Numan Siddique 2021-07-22 14:41:03 UTC
Can you please attach the OVN NB and SB databases to the BZ?

Thanks

Comment 2 Itai Levy 2021-07-22 14:49:03 UTC
OVN NB/SB:

[root@overcloud-computesriov-rack0-0 heat-admin]# ovn-nbctl show
switch 2708c2bc-3643-41fa-b579-e6dbaf6ec856 (neutron-c7bf34dd-aeab-4c5a-b2c0-29b16d9df946) (aka public)
    port provnet-2fd236a5-be1c-4789-8f47-39a9a114cb97
        type: localnet
        addresses: ["unknown"]
    port 45a5dd5b-ff3c-4dc0-9801-1e6e8cae7e49
        type: localport
        addresses: ["fa:16:3e:72:f5:42"]
switch 104b5a00-cf6b-4e69-84ae-6d2581cc5ba1 (neutron-73dd37bf-5449-4466-a9ef-9bfdfa92e14a) (aka vlan_data)
    port 79da228c-e188-423a-93ed-d305cadd12b4
        type: router
        router-port: lrp-79da228c-e188-423a-93ed-d305cadd12b4
    port a7e51f61-8878-4500-ba7c-89c4b68c70fd (aka direct112)
        addresses: ["fa:16:3e:53:4d:e3 11.11.11.125"]
    port c7cb4fff-d031-4901-92ef-a49062576918
        type: localport
        addresses: ["fa:16:3e:be:ff:8d 11.11.11.2"]
    port b6614061-7cd1-4abf-911b-ddf23b703103 (aka direct111)
        addresses: ["fa:16:3e:22:99:6e 11.11.11.38"]
    port provnet-345cf522-b7a3-4c1f-96af-f3b159be8cbe
        type: localnet
        tag: 101
        addresses: ["unknown"]
switch 8b5bcd4a-0804-4c09-8a44-5a95226a4c91 (neutron-8c933dc8-2baf-423b-ae3d-798d2f446e74) (aka gen_data)
    port 0b65cf78-f2cf-4ec9-9653-9c63765cc3a8
        type: localport
        addresses: ["fa:16:3e:43:c5:52 33.33.33.2"]
    port 54657064-139c-4123-8cba-0da1e65095b8
        type: router
        router-port: lrp-54657064-139c-4123-8cba-0da1e65095b8
    port 564e9d5e-4db7-4527-838c-15725ee28208 (aka direct12)
        addresses: ["fa:16:3e:c1:c7:78 33.33.33.108"]
    port b9f97053-32c3-4836-a262-755b603c90cf (aka direct11)
        addresses: ["fa:16:3e:86:53:96 33.33.33.130"]
switch 24ea9b8e-bef1-479c-983e-a79a42f106e9 (neutron-21b0556d-e5c6-4e36-808e-8c551b66295d) (aka test-net)
    port 4a88705b-11e8-4511-ae14-9bdb32cde53d
        type: localport
        addresses: ["fa:16:3e:52:ea:0e"]
router 110e0cc6-e631-4d0c-bbcb-9ad5d48e0072 (neutron-df846c89-7eba-4de0-b721-1a9ee5c8c34a) (aka vlan_router)
    port lrp-79da228c-e188-423a-93ed-d305cadd12b4
        mac: "fa:16:3e:54:15:52"
        networks: ["11.11.11.232/24"]
        gateway chassis: [ce414fad-ee29-47b7-9313-94f8f7c437e5 bc2891c9-fa0a-408c-842c-a415d1461a85 3ff18497-91f6-47b9-86f7-7a5f4d33979d]
    port lrp-54657064-139c-4123-8cba-0da1e65095b8
        mac: "fa:16:3e:f0:b1:9e"
        networks: ["33.33.33.1/24"]
    nat 2ba7cb52-5df5-406c-9f7d-cfeba37bf0af
        external ip: "11.11.11.13"
        logical ip: "33.33.33.130"
        type: "dnat_and_snat"
    nat 33ec8d2a-ccce-4688-8a01-83792c6c0418
        external ip: "11.11.11.232"
        logical ip: "33.33.33.0/24"
        type: "snat"
    nat aef7016d-1e3c-4bf9-bfa6-cfc2677e167a
        external ip: "11.11.11.115"
        logical ip: "33.33.33.108"
        type: "dnat_and_snat"
router 9bfa38f2-be6b-4db8-b33e-206babd2a365 (neutron-fb746e48-7b18-4608-9875-510e3aa9d88c) (aka public_router)


[root@overcloud-computesriov-rack0-0 heat-admin]# ovn-sbctl show
Chassis "3a155514-bb86-4316-a838-71585eeb733a"
    hostname: overcloud-computesriov-rack0-0.localdomain
    Encap geneve
        ip: "172.16.0.20"
        options: {csum="true"}
    Port_Binding "b9f97053-32c3-4836-a262-755b603c90cf"
Chassis "dccff23d-0d97-4a89-a3fc-c596d3e91e2b"
    hostname: overcloud-computesriov-rack1-0.localdomain
    Encap geneve
        ip: "172.16.1.54"
        options: {csum="true"}
Chassis "6626ffbe-47aa-41ef-8842-e6330d0dcffc"
    hostname: overcloud-computesriov-rack0-1.localdomain
    Encap geneve
        ip: "172.16.0.53"
        options: {csum="true"}
    Port_Binding "564e9d5e-4db7-4527-838c-15725ee28208"
Chassis "bc2891c9-fa0a-408c-842c-a415d1461a85"
    hostname: overcloud-controller-0.localdomain
    Encap geneve
        ip: "172.16.0.172"
        options: {csum="true"}
Chassis "ce414fad-ee29-47b7-9313-94f8f7c437e5"
    hostname: overcloud-controller-1.localdomain
    Encap geneve
        ip: "172.16.0.101"
        options: {csum="true"}
Chassis "3ff18497-91f6-47b9-86f7-7a5f4d33979d"
    hostname: overcloud-controller-2.localdomain
    Encap geneve
        ip: "172.16.0.168"
        options: {csum="true"}
    Port_Binding cr-lrp-79da228c-e188-423a-93ed-d305cadd12b4
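The NAT rows of vlan_router in the NB dump map each VM to its floating IP. The following is a simplified model of which external address SNAT would pick for a given logical source. It assumes the most specific `logical_ip` wins: a `dnat_and_snat` /32 takes precedence over the /24 `snat` row. This assumption is consistent with the `nat(src=11.11.11.115)` and `nat(src=11.11.11.13)` actions in the datapath flows. The model only reflects the NAT table data; it is not OVN's logical-flow implementation.

```python
import ipaddress

# NAT rows of vlan_router, copied from the ovn-nbctl output above:
# (external ip, logical ip, type)
NAT_RULES = [
    ("11.11.11.13", "33.33.33.130", "dnat_and_snat"),
    ("11.11.11.232", "33.33.33.0/24", "snat"),
    ("11.11.11.115", "33.33.33.108", "dnat_and_snat"),
]


def snat_address(logical_ip):
    """External source IP chosen by longest-prefix match on logical_ip."""
    addr = ipaddress.ip_address(logical_ip)
    best = None
    for external, logical, _kind in NAT_RULES:
        net = ipaddress.ip_network(logical)  # a bare address becomes a /32
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, external)
    return best[1] if best else None


def dnat_address(external_ip):
    """Only dnat_and_snat rows translate inbound traffic to a logical IP."""
    for external, logical, kind in NAT_RULES:
        if kind == "dnat_and_snat" and external == external_ip:
            return logical
    return None


if __name__ == "__main__":
    # trex11 (33.33.33.130) sending to trex12's floating IP 11.11.11.115:
    print("src after SNAT:", snat_address("33.33.33.130"))
    print("dst after DNAT:", dnat_address("11.11.11.115"))
```

Under this model, traffic from trex11 to trex12's floating IP leaves with source 11.11.11.13 and is DNATed to 33.33.33.108 on the receiving side. That is why the sender's pipeline touches both a DNAT zone and an SNAT zone.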

Comment 3 Itai Levy 2021-07-22 14:51:56 UTC
I saw the same behaviour when sending traffic between the instances' floating IPs:

$ openstack server list
+--------------------------------------+--------+--------+-------------------------------------+-------+--------+
| ID                                   | Name   | Status | Networks                            | Image | Flavor |
+--------------------------------------+--------+--------+-------------------------------------+-------+--------+
| 2798889d-a994-4c7e-bbe8-897a2b61761c | trex12 | ACTIVE | gen_data=33.33.33.108, 11.11.11.115 | perf  |        |
| bb6f6ce0-b029-4d6e-a918-aa3fa609b146 | trex11 | ACTIVE | gen_data=33.33.33.130, 11.11.11.13  | perf  |        |
+--------------------------------------+--------+--------+-------------------------------------+-------+--------+

I also saw it when an instance used a floating IP from the flat ("public") network to communicate with an external server.

Comment 4 Marcelo Ricardo Leitner 2021-07-23 21:37:53 UTC
(In reply to Itai Levy from comment #0)
> Matching TC rules:
> 
> )[root@overcloud-computesriov-rack0-1 /]# 
> ()[root@overcloud-computesriov-rack0-1 /]# tc -s filter show dev ens1f0_9 ingress
> filter protocol ip pref 3 flower chain 0 
> filter protocol ip pref 3 flower chain 0 handle 0x1 
>   dst_mac fa:16:3e:1d:eb:d7
>   src_mac fa:16:3e:c1:c7:78
>   eth_type ipv4
>   ip_proto tcp
>   src_ip 33.33.33.108
>   ip_flags nofrag
>   in_hw in_hw_count 1
>         action order 1: gact action 12
>          random type none 262150 val 0
>          index 85466 ref 0 bind 0 installed 0 sec used 42949672 sec expires 343597383 sec
>         Action statistics:
> !!!Deficit -4, rta_len=20
> 
>         cookie 54fb36d0ca48a2743a8406b55be5de6f

This is breaking the stats dump.
Which tc version are you using? I'm wondering whether we have a kernel or an iproute2 bug here.

Comment 5 Itai Levy 2021-07-25 07:00:21 UTC
Hi Marcelo,
See below the full output, which I collected again. Check out zone 2 (chain 32, handle 0x3 in TC).
I used the MOFED tc to collect the output (tc utility, iproute2-5.11.0); I think last time I used the tc from the inbox nova_compute container.



OVS DP:
recirc_id(0),in_port(ens1f0_15),eth(src=fa:16:3e:86:53:96,dst=fa:16:3e:43:c5:52),eth_type(0x0800),ipv4(src=33.33.33.130,proto=6,frag=no), packets:37, bytes:7230, used:0.360s, actions:ct(zone=1),recirc(0x20)
recirc_id(0),in_port(ens1f0_15),eth(src=fa:16:3e:86:53:96,dst=fa:16:3e:f0:b1:9e),eth_type(0x0800),ipv4(src=33.33.33.130,proto=6,frag=no), packets:11231222, bytes:100422727635, used:0.360s, actions:ct(zone=1),recirc(0x20)
ct_state(+est-rel+rpl-inv+trk),ct_label(0/0x1),recirc_id(0x20),in_port(ens1f0_15),eth(src=fa:16:3e:86:53:96,dst=fa:16:3e:43:c5:52),eth_type(0x0800),ipv4(dst=33.33.33.2/255.255.255.254,proto=6,frag=no), packets:37, bytes:7230, used:0.360s, actions:ct(zone=9),recirc(0x21)
ct_state(+est-rel-rpl-inv+trk),ct_label(0/0x1),recirc_id(0x20),in_port(ens1f0_15),eth(src=fa:16:3e:86:53:96,dst=fa:16:3e:f0:b1:9e),eth_type(0x0800),ipv4(src=33.33.33.130,dst=11.11.11.115,proto=6,ttl=64,frag=no),tcp(dst=5101), packets:11346742, bytes:100793166480, used:0.000s, actions:ct_clear,set(eth(src=fa:16:3e:3a:50:7c,dst=fa:16:3e:d8:c6:6c)),set(ipv4(ttl=63)),ct(zone=2,nat),recirc(0x22)
recirc_id(0x22),in_port(ens1f0_15),eth(src=fa:16:3e:3a:50:7c),eth_type(0x0800),ipv4(src=33.33.33.130,frag=no), packets:11346743, bytes:100793166540, used:0.000s, actions:ct(commit,zone=5,nat(src=11.11.11.13)),recirc(0x23)
ct_state(+est-rel-rpl-inv+trk),ct_label(0/0x1),recirc_id(0x23),in_port(ens1f0_15),eth(src=fa:16:3e:3a:50:7c,dst=fa:16:3e:d8:c6:6c),eth_type(0x0800),ipv4(dst=11.11.11.64/255.255.255.192,proto=6,frag=no), packets:11346742, bytes:100793166480, used:0.000s, actions:ct_clear,push_vlan(vid=101,pcp=0),bond0

CT:
# cat /proc/net/nf_conntrack | grep "zone=1"
ipv4     2 tcp      6 src=33.33.33.2 dst=33.33.33.130 sport=55144 dport=22 src=33.33.33.130 dst=33.33.33.2 sport=22 dport=55144 [HW_OFFLOAD] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=1 use=3
ipv4     2 tcp      6 src=33.33.33.130 dst=11.11.11.115 sport=45324 dport=5101 src=11.11.11.115 dst=33.33.33.130 sport=5101 dport=45324 [HW_OFFLOAD] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=1 use=57279084

[root@overcloud-computesriov-rack0-0 heat-admin]# cat /proc/net/nf_conntrack | grep "zone=2"

[root@overcloud-computesriov-rack0-0 heat-admin]# cat /proc/net/nf_conntrack | grep "zone=5"
ipv4     2 tcp      6 src=33.33.33.130 dst=11.11.11.115 sport=45324 dport=5101 src=11.11.11.115 dst=11.11.11.13 sport=5101 dport=45324 [HW_OFFLOAD] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=5 use=3
[root@overcloud-computesriov-rack0-0 heat-admin]# cat /proc/net/nf_conntrack | grep "zone=9"
ipv4     2 tcp      6 src=33.33.33.2 dst=33.33.33.130 sport=55144 dport=22 src=33.33.33.130 dst=33.33.33.2 sport=22 dport=55144 [HW_OFFLOAD] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=9 use=3
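The grep comparison above can be automated with a small cross-check. It collects every zone referenced by a `ct(...)` action in the datapath flows and reports the zones that have no entries in `/proc/net/nf_conntrack`. This is a minimal sketch assuming the dpctl and nf_conntrack text formats shown in this comment; entries without a `zone=` field are treated as zone 0. A zone can legitimately be empty, so the output is a list of candidates, not confirmed bugs.

Checking the table rather than the `commit` flags in the flows is deliberate. Zone 1 is never committed in these offloaded flows, because its commit happens in the not-offloaded `+new` filter (chain 32, handle 0x2, `ct commit zone 1`), yet it still holds entries.

```python
import re

# ct(...) actions in a dpctl flow; [^()]* stops before nested nat(...) args.
CT_ACTION_RE = re.compile(r"\bct\(([^()]*)")
CT_ENTRY_ZONE_RE = re.compile(r"\bzone=(\d+)")

# Actions of the datapath flows in this comment (match fields trimmed).
FLOWS = [
    "actions:ct(zone=1),recirc(0x20)",
    "actions:ct(zone=9),recirc(0x21)",
    "actions:ct_clear,set(eth(src=fa:16:3e:3a:50:7c,dst=fa:16:3e:d8:c6:6c)),"
    "set(ipv4(ttl=63)),ct(zone=2,nat),recirc(0x22)",
    "actions:ct(commit,zone=5,nat(src=11.11.11.13)),recirc(0x23)",
    "actions:ct_clear,push_vlan(vid=101,pcp=0),bond0",
]

# nf_conntrack entries from the grep output above ("..." marks trimmed fields).
CT_ENTRIES = [
    "ipv4 2 tcp 6 src=33.33.33.2 dst=33.33.33.130 ... zone=1 use=3",
    "ipv4 2 tcp 6 src=33.33.33.130 dst=11.11.11.115 ... zone=1 use=57279084",
    "ipv4 2 tcp 6 src=33.33.33.130 dst=11.11.11.115 ... zone=5 use=3",
    "ipv4 2 tcp 6 src=33.33.33.2 dst=33.33.33.130 ... zone=9 use=3",
]


def zones_in_flows(flow_lines):
    """Map each zone used by a ct() action to whether any flow commits it."""
    zones = {}
    for line in flow_lines:
        actions = line.split("actions:", 1)[-1]
        for m in CT_ACTION_RE.finditer(actions):
            args = m.group(1).split(",")
            zone = next((int(a[len("zone="):]) for a in args if a.startswith("zone=")), 0)
            zones[zone] = zones.get(zone, False) or "commit" in args
    return zones


def zones_in_conntrack(ct_lines):
    """Zones holding entries; an entry without zone= belongs to zone 0."""
    found = set()
    for line in ct_lines:
        if line.strip():
            m = CT_ENTRY_ZONE_RE.search(line)
            found.add(int(m.group(1)) if m else 0)
    return found


def missing_zones(flow_lines, ct_lines):
    """Zones referenced by ct() actions that have no conntrack entries."""
    return sorted(set(zones_in_flows(flow_lines)) - zones_in_conntrack(ct_lines))


if __name__ == "__main__":
    print("ct zones with no conntrack entries:", missing_zones(FLOWS, CT_ENTRIES))
```

On this data, the only zone reported is 2, which is the `ct(zone=2,nat)` lookup that the empty `grep "zone=2"` output above also points to.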


TC:
[root@overcloud-computesriov-rack0-0 heat-admin]# /opt/mellanox/iproute2/sbin/tc -s filter show dev ens1f0_15 ingress
filter protocol ip pref 6 flower chain 0 
filter protocol ip pref 6 flower chain 0 handle 0x1 
  dst_mac fa:16:3e:43:c5:52
  src_mac fa:16:3e:86:53:96
  eth_type ipv4
  ip_proto tcp
  src_ip 33.33.33.130
  ip_flags nofrag
  in_hw in_hw_count 1
        action order 1: ct zone 1 pipe
         index 3 ref 1 bind 1 installed 434 sec used 0 sec
        Action statistics:
        Sent 87864 bytes 452 pkt (dropped 0, overlimits 0 requeues 0) 
        Sent software 0 bytes 0 pkt
        Sent hardware 87864 bytes 452 pkt
        backlog 0b 0p requeues 0
        cookie a4cd88803843f34fe0e9c39e588ebcb8
        used_hw_stats delayed

        action order 2: gact action goto chain 32
         random type none pass val 0
         index 3 ref 1 bind 1 installed 434 sec used 0 sec
        Action statistics:
        Sent 87864 bytes 452 pkt (dropped 0, overlimits 0 requeues 0) 
        Sent software 0 bytes 0 pkt
        Sent hardware 87864 bytes 452 pkt
        backlog 0b 0p requeues 0
        cookie a4cd88803843f34fe0e9c39e588ebcb8
        used_hw_stats delayed

filter protocol ip pref 6 flower chain 0 handle 0x2 
  dst_mac fa:16:3e:f0:b1:9e
  src_mac fa:16:3e:86:53:96
  eth_type ipv4
  ip_proto tcp
  src_ip 33.33.33.130
  ip_flags nofrag
  in_hw in_hw_count 1
        action order 1: ct zone 1 pipe
         index 5 ref 1 bind 1 installed 434 sec used 0 sec firstused 433 sec
        Action statistics:
        Sent 1286031408501 bytes 143809951 pkt (dropped 0, overlimits 0 requeues 0) 
        Sent software 342 bytes 5 pkt
        Sent hardware 1286031408159 bytes 143809946 pkt
        backlog 0b 0p requeues 0
        cookie 91287ed546403493b98e5cb1a364eb5c
        used_hw_stats delayed

        action order 2: gact action goto chain 32
         random type none pass val 0
         index 5 ref 1 bind 1 installed 434 sec used 0 sec firstused 433 sec
        Action statistics:
        Sent 1286031408501 bytes 143809951 pkt (dropped 0, overlimits 0 requeues 0) 
        Sent software 342 bytes 5 pkt
        Sent hardware 1286031408159 bytes 143809946 pkt
        backlog 0b 0p requeues 0
        cookie 91287ed546403493b98e5cb1a364eb5c
        used_hw_stats delayed

filter protocol ip pref 6 flower chain 32 
filter protocol ip pref 6 flower chain 32 handle 0x1 
  dst_mac fa:16:3e:43:c5:52
  src_mac fa:16:3e:86:53:96
  eth_type ipv4
  ip_proto tcp
  dst_ip 33.33.33.2/31
  ip_flags nofrag
  ct_state +trk+est-inv+rpl
  ct_label 00000000000000000000000000000000/010000000000000000000000000000
  in_hw in_hw_count 1
        action order 1: ct zone 9 pipe
         index 4 ref 1 bind 1 installed 434 sec used 0 sec
        Action statistics:
        Sent 87864 bytes 452 pkt (dropped 0, overlimits 0 requeues 0) 
        Sent software 0 bytes 0 pkt
        Sent hardware 87864 bytes 452 pkt
        backlog 0b 0p requeues 0
        cookie 2435c61d4245e31fac134a846feba6dd
        used_hw_stats delayed

        action order 2: gact action goto chain 33
         random type none pass val 0
         index 4 ref 1 bind 1 installed 434 sec used 0 sec
        Action statistics:
        Sent 87864 bytes 452 pkt (dropped 0, overlimits 0 requeues 0) 
        Sent software 0 bytes 0 pkt
        Sent hardware 87864 bytes 452 pkt
        backlog 0b 0p requeues 0
        cookie 2435c61d4245e31fac134a846feba6dd
        used_hw_stats delayed

filter protocol ip pref 6 flower chain 32 handle 0x2 
  dst_mac fa:16:3e:f0:b1:9e
  src_mac fa:16:3e:86:53:96
  eth_type ipv4
  ip_proto tcp
  ip_ttl 64
  dst_ip 11.11.11.115
  src_ip 33.33.33.130
  dst_port 5101
  ip_flags nofrag
  ct_state +trk+new-est-inv-rpl
  ct_label 00000000000000000000000000000000/010000000000000000000000000000
  not_in_hw
        action order 1: ct commit zone 1 label 00000000000000000000000000000000/01000000000000000000000000000000 pipe
         index 6 ref 1 bind 1 installed 434 sec used 433 sec firstused 433 sec
        Action statistics:
        Sent 60 bytes 1 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0
        cookie d951fb4b724561b98a26fca2b689e4f3

        action order 2: ct clear pipe
         index 7 ref 1 bind 1 installed 434 sec used 433 sec firstused 433 sec
        Action statistics:
        Sent 60 bytes 1 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0
        cookie d951fb4b724561b98a26fca2b689e4f3

        action order 3:  pedit action pipe keys 5
         index 1 ref 1 bind 1 installed 434 sec used 433 sec firstused 433 sec
         key #0  at ipv4+8: val 3f000000 mask 00ffffff
         key #1  at eth+4: val 0000fa16 mask ffff0000
         key #2  at eth+8: val 3e3a507c mask 00000000
         key #3  at eth+0: val fa163ed8 mask 00000000
         key #4  at eth+4: val c66c0000 mask 0000ffff
        Action statistics:
        Sent 60 bytes 1 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0

        action order 4: csum (iph, tcp) action pipe
        index 1 ref 1 bind 1 installed 434 sec used 433 sec firstused 433 sec
        Action statistics:
        Sent 60 bytes 1 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0
        no_percpu

        action order 5: ct zone 2 nat pipe
         index 8 ref 1 bind 1 installed 434 sec used 433 sec firstused 433 sec
        Action statistics:
        Sent 60 bytes 1 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0
        cookie d951fb4b724561b98a26fca2b689e4f3

        action order 6: gact action goto chain 34
         random type none pass val 0
         index 6 ref 1 bind 1 installed 434 sec used 433 sec firstused 433 sec
        Action statistics:
        Sent 60 bytes 1 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0
        cookie d951fb4b724561b98a26fca2b689e4f3

filter protocol ip pref 6 flower chain 32 handle 0x3 
  dst_mac fa:16:3e:f0:b1:9e
  src_mac fa:16:3e:86:53:96
  eth_type ipv4
  ip_proto tcp
  ip_ttl 64
  dst_ip 11.11.11.115
  src_ip 33.33.33.130
  dst_port 5101
  ip_flags nofrag
  ct_state +trk+est-inv-rpl
  ct_label 00000000000000000000000000000000/010000000000000000000000000000
  in_hw in_hw_count 1
        action order 1: ct clear pipe
         index 15 ref 1 bind 1 installed 433 sec firstused 433 sec
        Action statistics:
        Sent 1277977658708 bytes 143848875 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0
        cookie 899f102c8a420d73499008bcddf19829
        used_hw_stats delayed

        action order 2:  pedit action pipe keys 5
         index 3 ref 1 bind 1 installed 433 sec firstused 433 sec
         key #0  at ipv4+8: val 3f000000 mask 00ffffff
         key #1  at eth+4: val 0000fa16 mask ffff0000
         key #2  at eth+8: val 3e3a507c mask 00000000
         key #3  at eth+0: val fa163ed8 mask 00000000
         key #4  at eth+4: val c66c0000 mask 0000ffff
        Action statistics:
        Sent 1277977658708 bytes 143848875 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0
        used_hw_stats delayed

        action order 3: csum (iph, tcp) action pipe
        index 3 ref 1 bind 1 installed 433 sec firstused 433 sec
        Action statistics:
        Sent 1277977658708 bytes 143848875 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0
        no_percpu
        used_hw_stats delayed

        action order 4: ct zone 2 nat pipe
         index 16 ref 1 bind 1 installed 433 sec firstused 433 sec
        Action statistics:
        Sent 1277977658708 bytes 143848875 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0
        cookie 899f102c8a420d73499008bcddf19829
        used_hw_stats delayed

        action order 5: gact action goto chain 34
         random type none pass val 0
         index 10 ref 1 bind 1 installed 433 sec firstused 433 sec
        Action statistics:
        Sent 1277977658708 bytes 143848875 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0
        cookie 899f102c8a420d73499008bcddf19829
        used_hw_stats delayed

filter protocol ip pref 6 flower chain 33 
filter protocol ip pref 6 flower chain 33 handle 0x1 
  dst_mac fa:16:3e:43:c5:52/01:00:00:00:00:00
  eth_type ipv4
  ip_flags nofrag
  ct_state +trk+est-inv+rpl
  ct_label 00000000000000000000000000000000/010000000000000000000000000000
  not_in_hw
        action order 1: mirred (Egress Redirect to device tapbd693cae-90) stolen
        index 13 ref 1 bind 1 installed 434 sec used 0 sec firstused 434 sec
        Action statistics:
        Sent 81536 bytes 452 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0
        cookie 828edae9894ef835a15fc8b07208db9f
        no_percpu

filter protocol ip pref 6 flower chain 34 
filter protocol ip pref 6 flower chain 34 handle 0x1 
  src_mac fa:16:3e:3a:50:7c
  eth_type ipv4
  src_ip 33.33.33.130
  ip_flags nofrag
  in_hw in_hw_count 1
        action order 1: ct commit zone 5 nat src addr 11.11.11.13 pipe
         index 9 ref 1 bind 1 installed 434 sec firstused 433 sec
        Action statistics:
        Sent 1277977721050 bytes 143848883 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0
        cookie ea20be5a17401bab90bdab8946cbd25a
        used_hw_stats delayed

        action order 2: gact action goto chain 35
         random type none pass val 0
         index 7 ref 1 bind 1 installed 434 sec firstused 433 sec
        Action statistics:
        Sent 1277977721050 bytes 143848883 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0
        cookie ea20be5a17401bab90bdab8946cbd25a
        used_hw_stats delayed

filter protocol ip pref 6 flower chain 35 
filter protocol ip pref 6 flower chain 35 handle 0x1 
  dst_mac fa:16:3e:d8:c6:6c
  src_mac fa:16:3e:3a:50:7c
  eth_type ipv4
  ip_proto tcp
  dst_ip 11.11.11.115/26
  ip_flags nofrag
  ct_state +trk+new-est-inv-rpl
  ct_label 00000000000000000000000000000000/010000000000000000000000000000
  not_in_hw
        action order 1: ct clear pipe
         index 10 ref 1 bind 1 installed 434 sec used 433 sec firstused 433 sec
        Action statistics:
        Sent 60 bytes 1 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0
        cookie 0fb5c4be684e7504ea0488a84b1c0aa2

        action order 2: vlan  push id 101 protocol 802.1Q priority 0 pipe
         index 7 ref 1 bind 1 installed 434 sec used 433 sec firstused 433 sec
        Action statistics:
        Sent 60 bytes 1 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0
        no_percpu

        action order 3: mirred (Egress Redirect to device bond0) stolen
        index 14 ref 1 bind 1 installed 434 sec used 433 sec firstused 433 sec
        Action statistics:
        Sent 60 bytes 1 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0
        cookie 0fb5c4be684e7504ea0488a84b1c0aa2
        no_percpu

filter protocol ip pref 6 flower chain 35 handle 0x2 
  dst_mac fa:16:3e:d8:c6:6c
  src_mac fa:16:3e:3a:50:7c
  eth_type ipv4
  ip_proto tcp
  dst_ip 11.11.11.115/26
  ip_flags nofrag
  ct_state +trk+est-inv-rpl
  ct_label 00000000000000000000000000000000/010000000000000000000000000000
  in_hw in_hw_count 1
        action order 1: ct clear pipe
         index 20 ref 1 bind 1 installed 433 sec firstused 433 sec
        Action statistics:
        Sent 1277977729932 bytes 143848883 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0
        cookie 2394f0eaf747620083ade288d9f9c022
        used_hw_stats delayed

        action order 2: vlan  push id 101 protocol 802.1Q priority 0 pipe
         index 9 ref 1 bind 1 installed 433 sec firstused 433 sec
        Action statistics:
        Sent 1277977729932 bytes 143848883 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0
        no_percpu
        used_hw_stats delayed

        action order 3: mirred (Egress Redirect to device bond0) stolen
        index 16 ref 1 bind 1 installed 433 sec firstused 433 sec
        Action statistics:
        Sent 1277977667650 bytes 143848876 pkt (dropped 0, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0
        cookie 2394f0eaf747620083ade288d9f9c022
        no_percpu
        used_hw_stats delayed


Comment 6 Itai Levy 2021-07-28 07:13:19 UTC
Guys, 
Can you confirm that, for DVR DNAT workloads, the current OVN implementation always sends the packet through 2 zones (SNAT, DNAT) in its pipeline while committing the connection in only one of them?

Comment 8 Marcelo Ricardo Leitner 2021-07-28 21:46:34 UTC
Adding needinfo regarding comment #6.

Comment 9 Marcelo Ricardo Leitner 2021-08-04 21:10:58 UTC
(In reply to Itai Levy from comment #6)
> Guys, 
> Can you confirm that for DVR DNAT workload, the current OVN implementation
> is to always send the packet via 2 zones (SNAT, DNAT) in its pipeline while
> committing the connection only in one of them?

Hi Itai,

Does this seem familiar?
https://bugzilla.redhat.com/show_bug.cgi?id=1974585#c7

Comment 10 Itai Levy 2021-08-09 07:00:04 UTC
Hi Marcelo, 
Yes, this BZ seems to show the same symptom: only one of the CT zones used in the OVS rules is committed in the kernel (which prevents HW offload).

Please take a look at this patch:
https://patchwork.ozlabs.org/project/openvswitch/cover/1570154179-14525-1-git-send-email-ankur.sharma@nutanix.com/

It introduces stateless handling for DNAT traffic (only). CT is not used for the NAT operation, which prevents a DDoS from flooding CT with unneeded DNAT entries.
This makes sense to me, and I can also confirm that it solves the DNAT offload issue caused by the current OVN CT-DNAT implementation.

If it makes sense to you as well, could you consider exposing a user-friendly option to work in this mode? 
It looks like a simple, straightforward implementation...

Itai

Comment 11 Marcelo Ricardo Leitner 2021-08-10 12:46:45 UTC
Hi Itai,

I need to discuss that with the OSP team. I'm a bit hesitant about it myself, because going stateless often sounds like undoing work and often leaves some holes behind. But let's see.

In this case, for example, it makes the OSP use case work, but other users who don't want to go stateless would still hit the bug (if we go with just this knob).

The final version of the patch from comment #10:
https://patchwork.ozlabs.org/project/openvswitch/patch/1572571718-83139-2-git-send-email-ankur.sharma@nutanix.com/
(it has been accepted in OVN since Nov 2019).

@Haresh, I need to step out now, but let's talk about it. I'll ping you when I'm back.

Comment 12 Haresh Khandelwal 2021-08-22 19:12:56 UTC
(In reply to Marcelo Ricardo Leitner from comment #11)
> Hi Itai,
> 
> I need to discuss that with OSP team. I myself am a bit hesitant with it
> because going stateless often sounds like undoing work and (also) often
> leaves some holes behind. But lets see.
> 
> In this case, for example, it makes the OSP use case to work, but other
> users that don't want to go stateless would still have the bug there (if we
> go along with just this knob).
> 
> The final version of the patch from comment #10:
> https://patchwork.ozlabs.org/project/openvswitch/patch/1572571718-83139-2-
> git-send-email-ankur.sharma/
> (it is accepted in OVN since Nov 2019).
> 
> @Haresh, I need to step out now, but lets talk about it. I'll ping you when
> I'm back.

Hi Marcelo, Itai,

So, I tried attaching a floating IP to a switchdev interface that belongs to a geneve tenant network. 
I have RHOSP 16.2 (RHOS-16.2-RHEL-8-20210728.n.2).

Without kernel-modules-extra, I see that egress traffic is not offloaded.

ufid:a9539e9d-23dc-4ead-843c-be13b437544b, recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(enp4s0f1_0),skb_mark(0/0),ct_state(0/0x3f),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),eth(src=f8:f2:1e:03:bf:f6,dst=fa:16:3e:1f:52:ca),eth_type(0x0800),ipv4(src=7.7.7.32/255.255.255.224,dst=10.10.54.100,proto=0/0,tos=0/0x3,ttl=64,frag=no), packets:1772, bytes:173656, used:0.907s, dp:ovs, actions:ct_clear,set(tunnel(tun_id=0x3,dst=10.10.51.171,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x40002}),flags(df|csum|key))),set(eth(src=fa:16:3e:0b:8c:4b,dst=72:f9:61:09:b9:79)),set(ipv4(ttl=63)),genev_sys_6081

Whereas ingress traffic is offloaded.

ufid:2123667b-83ef-4614-8bf2-4df268839daf, skb_priority(0/0),tunnel(tun_id=0x4,src=10.10.51.171,dst=10.10.51.150,ttl=0/0,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x30002/0x7fffffff}),flags(+key)),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(genev_sys_6081),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:1f:52:ca,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:5907, bytes:578886, used:0.500s, offloaded:yes, dp:tc, actions:enp4s0f1_0

With kernel-modules-extra, I see traffic offloaded in both directions.

ufid:952cf16e-f7d9-4313-8534-b79d286e247d, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x3f),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0),dp_hash(0/0),in_port(enp4s0f1_0),packet_type(ns=0/0,id=0/0),eth(src=f8:f2:1e:03:bf:f6,dst=fa:16:3e:1f:52:ca),eth_type(0x0800),ipv4(src=7.7.7.32/255.255.255.224,dst=192.0.0.0/224.0.0.0,proto=17,tos=0/0x3,ttl=64,frag=no),udp(src=0/0,dst=0/0), packets:8, bytes:520, used:1.390s, offloaded:yes, dp:tc, actions:ct_clear,set(tunnel(tun_id=0x3,dst=10.10.51.171,ttl=64,tp_dst=6081,key6(bad key length 1, expected 0)(01)geneve({class=0x102,type=0x80,len=4,0x40002}),flags(key))),set(eth(src=fa:16:3e:0b:8c:4b,dst=72:f9:61:09:b9:79)),set(ipv4(ttl=63)),genev_sys_6081

ufid:365b5b04-173c-48a4-b685-6b83c0ba10d4, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x3f),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0),dp_hash(0/0),in_port(enp4s0f1_0),packet_type(ns=0/0,id=0/0),eth(src=f8:f2:1e:03:bf:f6,dst=fa:16:3e:1f:52:ca),eth_type(0x0800),ipv4(src=7.7.7.32/255.255.255.224,dst=10.10.54.100,proto=1,tos=0/0x3,ttl=64,frag=no),icmp(type=0/0,code=0/0), packets:145, bytes:12180, used:0.930s, offloaded:yes, dp:tc, actions:ct_clear,set(tunnel(tun_id=0x3,dst=10.10.51.171,ttl=64,tp_dst=6081,key6(bad key length 1, expected 0)(01)geneve({class=0x102,type=0x80,len=4,0x40002}),flags(key))),set(eth(src=fa:16:3e:0b:8c:4b,dst=72:f9:61:09:b9:79)),set(ipv4(ttl=63)),genev_sys_6081

ufid:f294c7a7-f5c6-4ec3-b349-48d8377f98c6, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x3f),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0),dp_hash(0/0),in_port(enp4s0f1_0),packet_type(ns=0/0,id=0/0),eth(src=f8:f2:1e:03:bf:f6,dst=fa:16:3e:1f:52:ca),eth_type(0x0800),ipv4(src=7.7.7.32/255.255.255.224,dst=128.0.0.0/192.0.0.0,proto=17,tos=0/0x3,ttl=64,frag=no),udp(src=0/0,dst=0/0), packets:6, bytes:390, used:6.400s, offloaded:yes, dp:tc, actions:ct_clear,set(tunnel(tun_id=0x3,dst=10.10.51.171,ttl=64,tp_dst=6081,key6(bad key length 1, expected 0)(01)geneve({class=0x102,type=0x80,len=4,0x40002}),flags(key))),set(eth(src=fa:16:3e:0b:8c:4b,dst=72:f9:61:09:b9:79)),set(ipv4(ttl=63)),genev_sys_6081

ufid:57d4aa27-f7fa-45e2-a6d4-a0d67330fb93, skb_priority(0/0),tunnel(tun_id=0x4,src=10.10.51.171,dst=10.10.51.150,ttl=0/0,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x30002/0x7fffffff}),flags(+key)),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(genev_sys_6081),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:1f:52:ca,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:160, bytes:15680, used:0.420s, offloaded:yes, dp:tc, actions:enp4s0f1_0

One thing to note here: DVR is disabled, so NAT happens on the controller itself. 
Did you try enabling DVR on your deployment?

Comment 13 Itai Levy 2021-08-23 14:08:29 UTC
Hi Haresh, 

Yes, as I wrote in the description, I used a DVR deployment where FIP DNAT happens on the compute node itself.


Marcelo, 
FYI, the stateless FIP DNAT patch was picked up by the OpenStack community: https://review.opendev.org/c/openstack/neutron/+/804807

Itai

Comment 14 Haresh Khandelwal 2021-08-27 11:02:32 UTC
Hi Itai,

I deployed distributed NAT and tried the same thing (ICMP and TCP traffic). 

ICMP:

Egress:

ufid:9a58f50a-66c0-4983-849d-d98a4e889af6, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(enp4s0f0_2),packet_type(ns=0/0,id=0/0),eth(src=f8:f2:1e:03:bf:f2,dst=fa:16:3e:84:64:93),eth_type(0x0800),ipv4(src=7.7.7.68,dst=0.0.0.0/0.0.0.0,proto=1,tos=0/0,ttl=0/0,frag=no),icmp(type=0/0,code=0/0), packets:100, bytes:8400, used:0.760s, offloaded:yes, dp:tc, actions:ct(zone=9),recirc(0x36)

ufid:ab48baff-1eb3-4d14-8a1c-e1e13e982c69, skb_priority(0/0),skb_mark(0/0),ct_state(0x2a/0x3e),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0x36),dp_hash(0/0),in_port(enp4s0f0_2),packet_type(ns=0/0,id=0/0),eth(src=f8:f2:1e:03:bf:f2,dst=fa:16:3e:84:64:93),eth_type(0x0800),ipv4(src=7.7.7.68,dst=10.10.54.100,proto=1,tos=0/0,ttl=64,frag=no),icmp(type=0/0,code=0/0), packets:100, bytes:8400, used:0.760s, offloaded:yes, dp:tc, actions:ct_clear,set(eth(src=fa:16:3e:fd:50:3c,dst=72:f9:61:09:b9:79)),set(ipv4(ttl=63)),ct(zone=1,nat),recirc(0x37)

ufid:e76a1233-edbe-4ebd-a872-fe9c2fe6e488, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0x37),dp_hash(0/0),in_port(enp4s0f0_2),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:fd:50:3c,dst=72:f9:61:09:b9:79),eth_type(0x0800),ipv4(src=8.0.0.0/248.0.0.0,dst=10.10.54.0/255.255.255.128,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:100, bytes:8400, used:0.760s, offloaded:yes, dp:tc, actions:ct_clear,push_vlan(vid=405,pcp=0),mx-bond

Ingress:

ufid:9f92ef2b-3e0a-49de-a800-ba4c267d3563, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(mx-bond),packet_type(ns=0/0,id=0/0),eth(src=72:f9:61:09:b9:79,dst=fa:16:3e:fd:50:3c),eth_type(0x8100),vlan(vid=405,pcp=0),encap(eth_type(0x0800),ipv4(src=10.10.54.0/255.255.255.128,dst=10.10.54.129,proto=1,tos=0/0,ttl=64,frag=no),icmp(type=0/0,code=0/0)), packets:100, bytes:8400, used:0.760s, offloaded:yes, dp:tc, actions:ct_clear,pop_vlan,ct(zone=4,nat),recirc(0x33)

ufid:61396550-4f08-4dfd-8efc-e44a5bea8575, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0x33),dp_hash(0/0),in_port(mx-bond),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=10.10.54.129,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:100, bytes:8400, used:0.760s, offloaded:yes, dp:tc, actions:ct(commit,zone=1,nat(dst=7.7.7.68)),recirc(0x34)

ufid:dd5fa327-2adf-42e8-939e-ba65fc8d5ef9, skb_priority(0/0),skb_mark(0/0),ct_state(0x22/0x3e),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0x34),dp_hash(0/0),in_port(mx-bond),packet_type(ns=0/0,id=0/0),eth(src=72:f9:61:09:b9:79,dst=fa:16:3e:fd:50:3c),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=7.7.7.68,proto=1,tos=0/0,ttl=64,frag=no),icmp(type=0/0,code=0/0), packets:99, bytes:8316, used:0.760s, dp:tc, actions:ct_clear,set(eth(src=fa:16:3e:84:64:93,dst=f8:f2:1e:03:bf:f2)),set(ipv4(ttl=63)),ct(zone=9),recirc(0x35)

ufid:94b24ff4-c02c-4469-a4b4-e6d38757ed90, skb_priority(0/0),skb_mark(0/0),ct_state(0x22/0x3e),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0x35),dp_hash(0/0),in_port(mx-bond),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:84:64:93,dst=f8:f2:1e:03:bf:f2),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=7.7.7.68,proto=1,tos=0/0,ttl=0/0,frag=no),icmp(type=0/0,code=0/0), packets:99, bytes:8316, used:0.760s, offloaded:yes, dp:tc, actions:enp4s0f0_2


As you can see, the ingress DP rule with recirc_id(0x34) is not offloaded. All the others are offloaded. 
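To spot such rules in a long dump, a small helper like the one below can list every flow that lacks `offloaded:yes`, together with its recirc_id and actions. It is a sketch that assumes one flow per line in the same format as the dumps pasted in this comment; the function name is illustrative only.

```python
import re

# Matches the recirculation id match field, e.g. "recirc_id(0x34)" or "recirc_id(0)".
RECIRC_RE = re.compile(r"recirc_id\((0x[0-9a-fA-F]+|\d+)\)")


def non_offloaded_flows(dump_text):
    """Return (recirc_id, actions) for every flow not marked offloaded:yes.

    Assumes one datapath flow per line, each starting with "ufid:".
    """
    result = []
    for line in dump_text.splitlines():
        line = line.strip()
        if not line.startswith("ufid:"):
            continue  # skip blank lines and headings such as "Ingress:"
        if "offloaded:yes" in line:
            continue
        match = RECIRC_RE.search(line)
        recirc = match.group(1) if match else None
        # Everything after "actions:" is the datapath action list.
        actions = line.split("actions:", 1)[1] if "actions:" in line else ""
        result.append((recirc, actions))
    return result
```

Fed the ICMP egress and ingress dumps above, it reports only the recirc_id(0x34) rule, whose actions end in ct(zone=9),recirc(0x35).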

TCP:

Egress:

ufid:0f233985-485a-45ac-80f2-16a437e70d53, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(enp4s0f0_2),packet_type(ns=0/0,id=0/0),eth(src=f8:f2:1e:03:bf:f2,dst=fa:16:3e:84:64:93),eth_type(0x0800),ipv4(src=7.7.7.68,dst=0.0.0.0/0.0.0.0,proto=6,tos=0/0,ttl=0/0,frag=no),tcp(src=0/0,dst=0/0), packets:1000278, bytes:1448505492, used:0.290s, offloaded:yes, dp:tc, actions:ct(zone=9),recirc(0x5f)

ufid:7964ee11-696c-4c01-9227-05f8ab0ead44, skb_priority(0/0),skb_mark(0/0),ct_state(0x2a/0x3e),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0x5f),dp_hash(0/0),in_port(enp4s0f0_2),packet_type(ns=0/0,id=0/0),eth(src=f8:f2:1e:03:bf:f2,dst=fa:16:3e:84:64:93),eth_type(0x0800),ipv4(src=7.7.7.68,dst=10.10.54.100,proto=6,tos=0/0,ttl=64,frag=no),tcp(src=0/0,dst=0/0), packets:1035630, bytes:1434466076, used:0.000s, offloaded:yes, dp:tc, actions:ct_clear,set(eth(src=fa:16:3e:fd:50:3c,dst=72:f9:61:09:b9:79)),set(ipv4(ttl=63)),ct(zone=1,nat),recirc(0x60)

ufid:16a13e6f-49cd-42b3-85c7-40ea3ffe857e, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0x60),dp_hash(0/0),in_port(enp4s0f0_2),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:fd:50:3c,dst=72:f9:61:09:b9:79),eth_type(0x0800),ipv4(src=8.0.0.0/248.0.0.0,dst=10.10.54.0/255.255.255.128,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:1035630, bytes:1434466076, used:0.000s, offloaded:yes, dp:tc, actions:ct_clear,push_vlan(vid=405,pcp=0),mx-bond

Ingress:

ufid:e1ea623c-64f4-4fd8-a6c4-6ef45c792ad9, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(mx-bond),packet_type(ns=0/0,id=0/0),eth(src=72:f9:61:09:b9:79,dst=fa:16:3e:fd:50:3c),eth_type(0x8100),vlan(vid=405,pcp=0),encap(eth_type(0x0800),ipv4(src=10.10.54.0/255.255.255.128,dst=10.10.54.129,proto=6,tos=0/0,ttl=64,frag=no),tcp(src=0/0,dst=0/0)), packets:1351777, bytes:85968988, used:0.000s, offloaded:yes, dp:tc, actions:ct_clear,pop_vlan,ct(zone=4,nat),recirc(0x62)

ufid:8ccf45b5-c79f-4fae-a548-91ff15d59e8e, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0x62),dp_hash(0/0),in_port(mx-bond),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=10.10.54.129,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:1351778, bytes:85969040, used:0.000s, offloaded:yes, dp:tc, actions:ct(commit,zone=1,nat(dst=7.7.7.68)),recirc(0x63)

ufid:a0bc0880-e449-4a04-9dfa-ee8776b71577, skb_priority(0/0),skb_mark(0/0),ct_state(0x22/0x3e),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0x63),dp_hash(0/0),in_port(mx-bond),packet_type(ns=0/0,id=0/0),eth(src=72:f9:61:09:b9:79,dst=fa:16:3e:fd:50:3c),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=7.7.7.68,proto=6,tos=0/0,ttl=64,frag=no),tcp(src=0/0,dst=0/0), packets:1351780, bytes:85969176, used:0.000s, dp:tc, actions:ct_clear,set(eth(src=fa:16:3e:84:64:93,dst=f8:f2:1e:03:bf:f2)),set(ipv4(ttl=63)),ct(zone=9),recirc(0x64)

ufid:2de5ad96-dd4e-4307-8819-aa35d69d8731, skb_priority(0/0),skb_mark(0/0),ct_state(0x22/0x3e),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0x64),dp_hash(0/0),in_port(mx-bond),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:84:64:93,dst=f8:f2:1e:03:bf:f2),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=7.7.7.68,proto=6,tos=0/0,ttl=0/0,frag=no),tcp(src=0/0,dst=22), packets:1351779, bytes:85967804, used:0.000s, offloaded:yes, dp:tc, actions:enp4s0f0_2

The ingress DP rule with recirc_id(0x63) is not offloaded. 
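The ct_state(value/mask) fields in these dumps are bitmasks. The sketch below decodes them into the +flag/-flag notation that tc prints, assuming the OVS datapath bit assignments (new=0x01, est=0x02, rel=0x04, rpl=0x08, inv=0x10, trk=0x20, snat=0x40, dnat=0x80). The helper name is illustrative.

```python
# Bit values of the OVS datapath ct_state field (OVS_CS_F_* flags),
# listed in ascending bit order.
CT_STATE_BITS = [
    ("new", 0x01),
    ("est", 0x02),
    ("rel", 0x04),
    ("rpl", 0x08),
    ("inv", 0x10),
    ("trk", 0x20),
    ("snat", 0x40),
    ("dnat", 0x80),
]


def decode_ct_state(value, mask):
    """Render a ct_state(value/mask) match as +flag/-flag terms.

    Only bits set in the mask take part in the match; each is shown
    with "+" if it is set in value and "-" if it is clear.
    """
    terms = []
    for name, bit in CT_STATE_BITS:
        if mask & bit:
            terms.append(("+" if value & bit else "-") + name)
    return "".join(terms)
```

For example, the non-offloaded rules at recirc_id(0x34) and recirc_id(0x63) both match ct_state(0x22/0x3e), which decodes to "+est-rel-rpl-inv+trk": tracked and established, in the original direction. ct_state(0/0x3f) decodes to all six low flags clear, i.e. an untracked packet.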

I don't see any "ghost" rule as such here. 

Thanks

Comment 15 Itai Levy 2021-08-29 08:48:12 UTC
Hi Haresh, 

As you can see in the description, I am using the following components in my deployment:
• RHEL8.4 with kernel 4.18.0-305.7.1.el8_4.x86_64
• MLNX_OFED_LINUX-5.4-0.5 
• openvswitch 2.14.1 (MOFED OVS)

Your system will probably behave differently.

Itai

Comment 16 Haresh Khandelwal 2021-08-29 09:18:28 UTC
Hi Itai,

This mixture won't be conclusive from the OSP perspective (as customers would deploy what ships with RHOSP 16.2), though I'm not denying that your issue is relevant.
Do you have access to the latest 16.2 RC? I am using that compose, if you can try it.
RHOSP16.2 RC has RHEL 8.4 with kernel 4.18.0-305.12.1.el8_4.x86_64 (and mlx5_core), and the OVS version is openvswitch2.15-2.15.0-26.el8fdp.x86_64.

Thanks

Comment 17 Itai Levy 2021-08-29 09:52:57 UTC
Hi Haresh, 
1. Unfortunately I don't have access to the latest 16.2 RC. I'd appreciate it if you could provide it to me.
2. The components we used might be different, and even though you don't see the exact OVS dump that I do, the result is the same: DNAT traffic is not offloaded.
You can verify this by checking whether zone 9, seen in your dumps above, is committed in the /proc/net/nf_conntrack output.

Itai

Comment 19 Itai Levy 2021-09-06 11:12:09 UTC
Hi Marcelo, 
An update from my side. 
It seems that stateless DNAT is not a valid solution for offloading DNAT FIP traffic...
Looking into it more carefully, the traffic is offloaded in only one direction.
As we are back to square one, the current DNAT conntrack implementation may need to be reconsidered to allow proper HW offload of this important use case.

Itai

Comment 21 Alaa Hleihel (NVIDIA Mellanox) 2021-09-09 10:45:26 UTC
Itai mentioned that this patch is needed for this BZ: 
https://review.opendev.org/c/openstack/neutron/+/804807

Comment 22 Marcelo Ricardo Leitner 2021-09-16 12:08:45 UTC
Hi Alaa, Itai,

Now I am confused. The patch on comment #21 is using stateless NAT but comment #19 indicates that it isn't a good way forward?
I see Moshe reviewed the patch.
Does this patch solve the single-direction issue mentioned in comment #19?

@Haresh, thoughts on the patch?

Thanks.

Comment 23 Haresh Khandelwal 2021-09-16 14:32:05 UTC
(In reply to Marcelo Ricardo Leitner from comment #22)
> Hi Alaa, Itai,
> 
> Now I am confused. The patch on comment #21 is using stateless NAT but
> comment #19 indicates that it isn't a good way forward?
> I see Moshe reviewed the patch.
> Does this patch solves the single direction issue mentioned in comment #19?
> 
> @Haresh, thoughts on the patch?

This patch looks like the OSP-side implementation of the patch below:
https://patchwork.ozlabs.org/project/openvswitch/patch/1572571718-83139-2-git-send-email-ankur.sharma@nutanix.com/

So, when we attach a FIP to an instance, this patch makes all FIPs stateless, and OVN configures dnat_and_snat as a stateless rule, something like below:
ovn-nbctl --stateless lr-nat-add 473faace-478c-4841-b438-84c3ebaaa528 dnat_and_snat 10.10.54.129 7.7.7.68 4f408c64-7474-4f58-81ec-3f8abba72562 fa:16:3e:fd:50:3c

However, this does not currently offload traffic in the egress direction. 
See BZ#2004995.
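For illustration only, the sketch below models what a stateless dnat_and_snat rule does conceptually: a fixed one-to-one address rewrite in each direction, with no conntrack entry involved. It is a hypothetical Python model, not OVN code; the two addresses are taken from the ovn-nbctl example above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


# Hypothetical model of one stateless dnat_and_snat mapping:
# external (floating) IP <-> logical (VM) IP, from the ovn-nbctl example.
EXTERNAL_IP = "10.10.54.129"
LOGICAL_IP = "7.7.7.68"


def translate_ingress(pkt):
    """Traffic arriving for the FIP: rewrite the destination (DNAT)."""
    if pkt.dst == EXTERNAL_IP:
        return replace(pkt, dst=LOGICAL_IP)
    return pkt


def translate_egress(pkt):
    """Traffic leaving the VM: rewrite the source (SNAT)."""
    if pkt.src == LOGICAL_IP:
        return replace(pkt, src=EXTERNAL_IP)
    return pkt
```

Each packet is translated from the static mapping alone, so nothing is committed to conntrack. The trade-off is that such a rule has no notion of a connection, so it cannot distinguish a reply from an unsolicited packet.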

Thanks

Comment 24 Itai Levy 2021-09-18 13:36:34 UTC
Hi, 

To clarify:
1. The original issue is that FIP traffic is not offloaded due to the stateful OVN/CT FIP implementation.
2. To work around #1, we suggested using a patch that makes FIP traffic stateless, eliminating the CT NAT part; however, it seems that here too, offload works in only one direction.

So now we are back to #1, and the question is: is it possible to reconsider the OVN/CT FIP implementation, which uses 2 CT zones while committing only one of them?

Itai

Comment 25 Marcelo Ricardo Leitner 2021-10-05 20:54:23 UTC
Hi Terry, is there anything else you need here? Itai's comment above summarizes the situation.

Comment 30 Numan Siddique 2021-10-27 20:40:54 UTC
Some updates on the progress so far:

1.  We have a fix in OVN to address this issue. The idea is to use just one zone for NATting in distributed routers.
    We would still need to use 2 zones if the packet involves both SNAT and DNAT (for hairpin traffic whose source and destination
    are on the same compute node).

2.  Abhiram has tested this in his environment, and the traffic for the scenario mentioned in this BZ is now offloaded.

3. The initial patch is here - https://github.com/numansiddique/ovn/tree/ct_nat_czone_v1/p2

The patch needs some work before it is posted for review.

Comment 31 Itai Levy 2021-10-28 14:00:11 UTC
Hi Numan, 
Thanks for the update.
Can you please elaborate on the "SNAT + DNAT" use case? What exactly is the traffic flow here? How come SNAT (many-to-one NAT) and DNAT (one-to-one NAT) are used at the same time?

Itai

Comment 32 Numan Siddique 2021-10-29 01:00:19 UTC
Hi Itai,

This can happen in the following scenario:

Let's say you have a VM/pod, LP1 (10.0.0.3), on logical switch sw1, and it has a dnat_and_snat configured
with external IP (floating IP in OpenStack terminology) 172.16.0.110.

And there is another VM/pod, LP2 (20.0.0.3), on logical switch sw2 with floating IP 172.16.0.120,
and both are connected to the same router.

Let's assume both these VMs/pods are hosted on the same compute node.


Suppose LP1 sends a pkt to the floating IP of LP2,

i.e. ip.src = 10.0.0.3, ip.dst = 172.16.0.120

In this case, first the source IP will be SNATted to 172.16.0.110 in SNAT ct zone.

So the pkt becomes - (ip.src = 172.16.0.110, ip.dst = 172.16.0.120)

And then the pkt is DNATted to 20.0.0.3 in DNAT ct zone (ip.src = 172.16.0.110, ip.dst = 20.0.0.3)
and it is delivered to LP2.

What I meant in my comment was that we need 2 zones for this scenario.

Otherwise just one ct zone is enough.
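To make the hairpin case concrete, here is a toy Python model of the two NAT steps using the addresses from the example above. It is not OVN's pipeline: the /24 prefixes for sw1 and sw2 are hypothetical (the comment gives none), and the function simply applies SNAT and then DNAT in the order described above, recording which steps, and therefore which CT zones, a packet needs.

```python
from dataclasses import dataclass, replace
from ipaddress import ip_address, ip_network

# Floating IPs from the example above (logical IP -> external IP).
FIPS = {"10.0.0.3": "172.16.0.110", "20.0.0.3": "172.16.0.120"}
EXTERNAL_TO_LOGICAL = {ext: log for log, ext in FIPS.items()}

# Hypothetical /24 prefixes for sw1 and sw2; not stated in the comment.
INTERNAL_NETS = [ip_network("10.0.0.0/24"), ip_network("20.0.0.0/24")]


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


def is_internal(ip):
    return any(ip_address(ip) in net for net in INTERNAL_NETS)


def nat_on_router(pkt):
    """Apply SNAT, then DNAT; return the packet and the NAT zones used."""
    zones = []
    # SNAT: a VM with a floating IP sends to a non-internal destination.
    if pkt.src in FIPS and not is_internal(pkt.dst):
        pkt = replace(pkt, src=FIPS[pkt.src])
        zones.append("snat")
    # DNAT: the destination is a floating IP.
    if pkt.dst in EXTERNAL_TO_LOGICAL:
        pkt = replace(pkt, dst=EXTERNAL_TO_LOGICAL[pkt.dst])
        zones.append("dnat")
    return pkt, zones
```

For LP1 sending to 172.16.0.120, the model returns (ip.src = 172.16.0.110, ip.dst = 20.0.0.3) with both steps. An external host reaching a floating IP, or plain east-west traffic between 10.0.0.3 and 20.0.0.3, needs at most one.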

Thanks
Numan

Comment 33 Itai Levy 2021-10-31 07:28:41 UTC
Thanks Numan, well described and clarified.
Itai

Comment 34 Numan Siddique 2021-11-10 14:09:31 UTC
Patches posted for review - https://patchwork.ozlabs.org/project/ovn/list/?series=270616

Comment 35 arn 2021-11-11 10:22:13 UTC
Latest update on testing with the patch:

1) If flow_steering_mode is set to ‘dmfs’, stateful NAT offload works fine without issue. [Verified on 4.18.0-305.22.1.el8]

2) If ‘smfs’ is set, we observe the csum failure issue. [The earlier suspicion was that we might be hitting BZ 1974356 - s_pf0vf2: hw csum failure for mlx5.] So I feel that flow_steering_mode is what makes the difference here, rather than BZ 1974356.

Comment 38 Marcelo Ricardo Leitner 2021-12-02 12:36:52 UTC
v3 was accepted, https://patchwork.ozlabs.org/project/ovn/list/?series=272942&state=*

Comment 41 Marcelo Ricardo Leitner 2022-03-18 19:58:52 UTC
Is this merged downstream already perhaps?

Comment 45 Marcelo Ricardo Leitner 2022-08-01 22:36:57 UTC
When can we close this bz?

Comment 46 Haresh Khandelwal 2022-08-02 05:05:09 UTC
(In reply to Marcelo Ricardo Leitner from comment #45)
> When can we close this bz?

Its OSP clone, BZ#2024599, has been verified in 16.2.3. FDP folks can verify this in the latest OVN.

Thanks

Comment 48 Red Hat Bugzilla 2023-09-18 00:28:36 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

