Description of problem:
One of the OVS datapath (DP) rules created for DNAT floating-IP traffic in a DVR setup redirects packets into a connection tracking (CT) zone in which the connection is never committed, so the flow cannot be offloaded to hardware.

Version-Release number of selected component (if applicable):
• OSP16.1.4
• RHEL8.4 with kernel 4.18.0-305.7.1.el8_4.x86_64
• MLNX_OFED_LINUX-5.4-0.5
• openvswitch 2.14.1 (MOFED OVS)
• ConnectX NIC is configured as bond (VF-LAG, LACP)
• geneve tenant network, direct ports with "switchdev" capabilities and with security groups
• vRouter with the geneve tenant subnet + additional external subnet with floating IPs
• VMs running iperf3 test between the floating IPs

How reproducible:
Every time.

Steps to Reproduce:
1. Deploy the cloud.
2. Create a geneve tenant network.
3. Create direct ports with: --binding-profile '{"capabilities":["switchdev"]}' --security-group my_policy (to allow the iperf traffic).
4. Create an external provider vlan / flat network.
5. Create a vrouter with both subnets (--external-gateway for the "external" network).
6. Create floating IPs on the "external" network.
7. Create instances with the geneve direct ports and assign external floating IPs.
8. Run traffic (iperf) between VMs, or between a VM and an external iperf server, via the floating IP.

Actual results:
Traffic is not offloaded, as seen in tcpdump. One of the OVS DP rules created for the DNAT traffic redirects packets into a CT zone that is never committed in the kernel. The TC output marks the rule for this "ghost" zone as "in_hw", but the packets on that chain are processed in software, not in hardware.

Expected results:
Full offload of all DNAT OVS DP rules.

Additional info:
See the OVS DP rules below, taken from the iperf TX node. Egress chain 0x31 is redirected to chain 0x3a with zone=5, and this zone does not appear in the /proc/net/nf_conntrack output. Although this rule is marked as "offloaded", the TC output shows that packets on this chain are not sent by hardware.

ingress DP: ufid:ea92a156-0026-4f35-8dba-853525d39732, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x3f),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0),dp_hash(0/0),in_port(bond0),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:46:93:ab,dst=fa:16:3e:61:01:ce),eth_type(0x8100),vlan(vid=101,pcp=0),encap(eth_type(0x0800),ipv4(src=11.11.11.0/255.255.255.192,dst=11.11.11.115,proto=6,tos=0/0,ttl=63,frag=no),tcp(src=0/0,dst=0/0)), packets:5123628, bytes:379148492, used:0.880s, offloaded:yes, dp:tc, actions:ct_clear,pop_vlan,ct(zone=3,nat),recirc(0x3c) ufid:1fa4bae9-6703-420b-b371-1a83afcfef76, skb_priority(0/0),skb_mark(0/0),ct_state(0x2a/0x3e),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0x3c),dp_hash(0/0),in_port(bond0),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:46:93:ab,dst=fa:16:3e:61:01:ce),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=33.33.33.108,proto=6,tos=0/0,ttl=63,frag=no),tcp(src=0/0,dst=0/0), packets:5123628, bytes:358654028, used:0.880s, offloaded:yes, dp:tc, actions:ct_clear,set(eth(src=fa:16:3e:1d:eb:d7,dst=fa:16:3e:c1:c7:78)),set(ipv4(ttl=62)),ct(zone=1),recirc(0x3d) ufid:21aee1ac-0a83-49cb-a108-75daca673e6e, skb_priority(0/0),skb_mark(0/0),ct_state(0x2a/0x3e),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0x3d),dp_hash(0/0),in_port(bond0),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=fa:16:3e:c1:c7:78),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=33.33.33.108,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:5123628, bytes:358654028, used:0.880s, offloaded:yes, dp:tc, actions:ens1f0_9 egress DP: ufid:d036fb54-74a2-48ca-b506-843a6fdee55b, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens1f0_9),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:c1:c7:78,dst=fa:16:3e:1d:eb:d7),eth_type(0x0800),ipv4(src=33.33.33.108,dst=0.0.0.0/0.0.0.0,proto=6,tos=0/0,ttl=0/0,frag=no),tcp(src=0/0,dst=0/0), packets:109255400, bytes:976762191901, used:0.880s, 
offloaded:yes, dp:tc, actions:ct(zone=1),recirc(0x31) ufid:02daed36-ea4a-49ef-b49a-45ddca0ae381, skb_priority(0/0),skb_mark(0/0),ct_state(0x22/0x3e),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0x31),dp_hash(0/0),in_port(ens1f0_9),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:c1:c7:78,dst=fa:16:3e:1d:eb:d7),eth_type(0x0800),ipv4(src=33.33.33.108,dst=11.11.11.13,proto=6,tos=0/0,ttl=64,frag=no),tcp(src=0/0,dst=5101), packets:109558494, bytes:973074160887, used:0.000s, offloaded:yes, dp:tc, actions:ct_clear,set(eth(src=fa:16:3e:61:01:ce,dst=fa:16:3e:46:93:ab)),set(ipv4(ttl=63)),ct(zone=5,nat),recirc(0x3a) ufid:19edcb48-7e72-424e-a7bd-f5bab318c599, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0x3a),dp_hash(0/0),in_port(ens1f0_9),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:61:01:ce,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=33.33.33.108,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:109558497, bytes:973074161067, used:0.000s, offloaded:yes, dp:tc, actions:ct(commit,zone=3,nat(src=11.11.11.115)),recirc(0x3b) ufid:6f721f1a-d420-488f-b165-75e4599b6709, skb_priority(0/0),skb_mark(0/0),ct_state(0x22/0x3e),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0x3b),dp_hash(0/0),in_port(ens1f0_9),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:61:01:ce,dst=fa:16:3e:46:93:ab),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=11.11.11.0/255.255.255.192,proto=6,tos=0/0,ttl=0/0,frag=no),tcp(src=0/0,dst=0/0), packets:109558501, bytes:973074223169, used:0.000s, offloaded:yes, dp:tc, actions:ct_clear,push_vlan(vid=101,pcp=0),bond0 Matching TC rules: )[root@overcloud-computesriov-rack0-1 /]# ()[root@overcloud-computesriov-rack0-1 /]# tc -s filter show dev ens1f0_9 ingress filter protocol ip pref 3 flower chain 0 filter protocol ip pref 3 flower chain 0 handle 0x1 dst_mac fa:16:3e:1d:eb:d7 src_mac fa:16:3e:c1:c7:78 eth_type ipv4 ip_proto tcp src_ip 33.33.33.108 ip_flags nofrag in_hw in_hw_count 1 
action order 1: gact action 12 random type none 262150 val 0 index 85466 ref 0 bind 0 installed 0 sec used 42949672 sec expires 343597383 sec Action statistics: !!!Deficit -4, rta_len=20 cookie 54fb36d0ca48a2743a8406b55be5de6f action order 2: gact action goto chain 49 random type none pass val 0 index 1 ref 1 bind 1 installed 854 sec used 0 sec Action statistics: Sent 2591111747191 bytes 289802644 pkt (dropped 0, overlimits 0 requeues 0) Sent software 953 bytes 15 pkt Sent hardware 2591111746238 bytes 289802629 pkt backlog 0b 0p requeues 0 cookie 54fb36d0ca48a2743a8406b55be5de6f filter protocol ip pref 3 flower chain 49 handle 0x3 dst_mac fa:16:3e:1d:eb:d7 src_mac fa:16:3e:c1:c7:78 eth_type ipv4 ip_proto tcp ip_ttl 0x40/ff dst_ip 11.11.11.13 src_ip 33.33.33.108 dst_port 5101 ip_flags nofrag in_hw in_hw_count 1 action order 1: gact action pass random type none 65560 val 0 index 85464 ref 0 bind 0 installed 0 sec used 42949672 sec expires 343597383 sec Action statistics: !!!Deficit -4, rta_len=20 cookie 36edda02ef494aeadd459ab481e30aca action order 2: pedit action pipe keys 5 index 3 ref 1 bind 1 installed 854 sec key #0 at ipv4+8: val 3f000000 mask 00ffffff key #1 at eth+4: val 0000fa16 mask ffff0000 key #2 at eth+8: val 3e6101ce mask 00000000 key #3 at eth+0: val fa163e46 mask 00000000 key #4 at eth+4: val 93ab0000 mask 0000ffff Action statistics: Sent 2574555460931 bytes 289844124 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 action order 3: csum (iph, tcp) action pipe index 3 ref 1 bind 1 installed 854 sec Action statistics: Sent 2574555460931 bytes 289844124 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 action order 4: gact action pass random type none 262150 val 0 index 85464 ref 0 bind 0 installed 0 sec used 42949672 sec expires 343597383 sec Action statistics: !!!Deficit -4, rta_len=20 cookie 36edda02ef494aeadd459ab481e30aca action order 5: gact action goto chain 58 random type none pass val 0 index 6 ref 1 bind 1 
installed 854 sec Action statistics: Sent 2574555460931 bytes 289844124 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 cookie 36edda02ef494aeadd459ab481e30aca filter protocol ip pref 3 flower chain 58 filter protocol ip pref 3 flower chain 58 handle 0x1 src_mac fa:16:3e:61:01:ce eth_type ipv4 src_ip 33.33.33.108 ip_flags nofrag in_hw in_hw_count 1 action order 1: gact action pass random type none 262150 val 0 index 85466 ref 0 bind 0 installed 0 sec used 42949672 sec expires 343597383 sec Action statistics: !!!Deficit -4, rta_len=20 cookie 48cbed194e42727ebaf5bda799c518b3 action order 2: gact action goto chain 59 random type none pass val 0 index 3 ref 1 bind 1 installed 854 sec Action statistics: Sent 2574555461111 bytes 289844127 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 cookie 48cbed194e42727ebaf5bda799c518b3 filter protocol ip pref 3 flower chain 59 filter protocol ip pref 3 flower chain 59 handle 0x2 dst_mac fa:16:3e:46:93:ab src_mac fa:16:3e:61:01:ce eth_type ipv4 ip_proto tcp dst_ip 11.11.11.13/26 ip_flags nofrag in_hw in_hw_count 1 action order 1: gact action pass random type none 65560 val 0 index 85464 ref 0 bind 0 installed 0 sec used 42949672 sec expires 343597383 sec Action statistics: !!!Deficit -4, rta_len=20 cookie 1a1f726f8f4820d4e47565b109679b59 action order 2: vlan push id 101 protocol 802.1Q priority 0 pipe index 13 ref 1 bind 1 installed 854 sec Action statistics: Sent 2574555460931 bytes 289844124 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 action order 3: mirred (Egress Redirect to device bond0) stolen index 19 ref 1 bind 1 installed 854 sec Action statistics: Sent 2574555460931 bytes 289844124 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 cookie 1a1f726f8f4820d4e47565b109679b59 ()[root@overcloud-computesriov-rack0-1 /]#
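The recirculation chain described in comment #0 can be traced mechanically. The following is a minimal sketch, not a supported tool: it assumes flow lines shaped like the `dpctl/dump-flows` output above, assumes at most one flow per `recirc_id` (real dumps often have several), and assumes `commit` is printed before `zone` inside `ct(...)`, as it is in this dump. The `FLOWS` list is the egress path from the dump with the match fields trimmed, and the helper names are hypothetical. In this dump the TC chain numbers appear to be the decimal form of the recirc IDs (0x31 = 49, 0x3a = 58, 0x3b = 59).

```python
import re

# Egress flows from the dump above, with most match fields trimmed for brevity.
FLOWS = [
    "recirc_id(0),in_port(ens1f0_9), actions:ct(zone=1),recirc(0x31)",
    "recirc_id(0x31),in_port(ens1f0_9), actions:ct_clear,ct(zone=5,nat),recirc(0x3a)",
    "recirc_id(0x3a),in_port(ens1f0_9), actions:ct(commit,zone=3,nat(src=11.11.11.115)),recirc(0x3b)",
    "recirc_id(0x3b),in_port(ens1f0_9), actions:ct_clear,push_vlan(vid=101,pcp=0),bond0",
]


def parse_flow(line):
    """Extract the recirc_id, the ct zone/commit flag and the next recirc_id."""
    match_part, actions = line.split("actions:", 1)
    recirc_in = re.search(r"recirc_id\((0x[0-9a-f]+|\d+)\)", match_part).group(1)
    ct = re.search(r"\bct\((commit,)?zone=(\d+)", actions)
    nxt = re.search(r"recirc\((0x[0-9a-f]+)\)", actions)
    return {
        "recirc_id": int(recirc_in, 0),  # base 0 accepts both "0" and "0x31"
        "zone": int(ct.group(2)) if ct else None,
        "commit": bool(ct and ct.group(1)),
        "next": int(nxt.group(1), 16) if nxt else None,
    }


def walk(flows, start=0):
    """Follow recirc() hops from recirc_id `start` until the chain ends."""
    by_id = {f["recirc_id"]: f for f in map(parse_flow, flows)}
    hops, cur, seen = [], start, set()
    while cur in by_id and cur not in seen:  # `seen` guards against loops
        seen.add(cur)
        hops.append(by_id[cur])
        cur = by_id[cur]["next"]
    return hops


for hop in walk(FLOWS):
    print(f"tc chain {hop['recirc_id']:>2} (recirc 0x{hop['recirc_id']:x}): "
          f"zone={hop['zone']} commit={hop['commit']}")
```

A missing `commit` on a single flow is not proof of a problem by itself, since the commit may happen on a different flow for the same zone (for example, the one matching new connections). The sketch only makes the hop order and the zones involved easier to read.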
Can you please attach the OVN NB and SB databases to the BZ? Thanks
OVN NB/SB: [root@overcloud-computesriov-rack0-0 heat-admin]# ovn-nbctl show switch 2708c2bc-3643-41fa-b579-e6dbaf6ec856 (neutron-c7bf34dd-aeab-4c5a-b2c0-29b16d9df946) (aka public) port provnet-2fd236a5-be1c-4789-8f47-39a9a114cb97 type: localnet addresses: ["unknown"] port 45a5dd5b-ff3c-4dc0-9801-1e6e8cae7e49 type: localport addresses: ["fa:16:3e:72:f5:42"] switch 104b5a00-cf6b-4e69-84ae-6d2581cc5ba1 (neutron-73dd37bf-5449-4466-a9ef-9bfdfa92e14a) (aka vlan_data) port 79da228c-e188-423a-93ed-d305cadd12b4 type: router router-port: lrp-79da228c-e188-423a-93ed-d305cadd12b4 port a7e51f61-8878-4500-ba7c-89c4b68c70fd (aka direct112) addresses: ["fa:16:3e:53:4d:e3 11.11.11.125"] port c7cb4fff-d031-4901-92ef-a49062576918 type: localport addresses: ["fa:16:3e:be:ff:8d 11.11.11.2"] port b6614061-7cd1-4abf-911b-ddf23b703103 (aka direct111) addresses: ["fa:16:3e:22:99:6e 11.11.11.38"] port provnet-345cf522-b7a3-4c1f-96af-f3b159be8cbe type: localnet tag: 101 addresses: ["unknown"] switch 8b5bcd4a-0804-4c09-8a44-5a95226a4c91 (neutron-8c933dc8-2baf-423b-ae3d-798d2f446e74) (aka gen_data) port 0b65cf78-f2cf-4ec9-9653-9c63765cc3a8 type: localport addresses: ["fa:16:3e:43:c5:52 33.33.33.2"] port 54657064-139c-4123-8cba-0da1e65095b8 type: router router-port: lrp-54657064-139c-4123-8cba-0da1e65095b8 port 564e9d5e-4db7-4527-838c-15725ee28208 (aka direct12) addresses: ["fa:16:3e:c1:c7:78 33.33.33.108"] port b9f97053-32c3-4836-a262-755b603c90cf (aka direct11) addresses: ["fa:16:3e:86:53:96 33.33.33.130"] switch 24ea9b8e-bef1-479c-983e-a79a42f106e9 (neutron-21b0556d-e5c6-4e36-808e-8c551b66295d) (aka test-net) port 4a88705b-11e8-4511-ae14-9bdb32cde53d type: localport addresses: ["fa:16:3e:52:ea:0e"] router 110e0cc6-e631-4d0c-bbcb-9ad5d48e0072 (neutron-df846c89-7eba-4de0-b721-1a9ee5c8c34a) (aka vlan_router) port lrp-79da228c-e188-423a-93ed-d305cadd12b4 mac: "fa:16:3e:54:15:52" networks: ["11.11.11.232/24"] gateway chassis: [ce414fad-ee29-47b7-9313-94f8f7c437e5 
bc2891c9-fa0a-408c-842c-a415d1461a85 3ff18497-91f6-47b9-86f7-7a5f4d33979d] port lrp-54657064-139c-4123-8cba-0da1e65095b8 mac: "fa:16:3e:f0:b1:9e" networks: ["33.33.33.1/24"] nat 2ba7cb52-5df5-406c-9f7d-cfeba37bf0af external ip: "11.11.11.13" logical ip: "33.33.33.130" type: "dnat_and_snat" nat 33ec8d2a-ccce-4688-8a01-83792c6c0418 external ip: "11.11.11.232" logical ip: "33.33.33.0/24" type: "snat" nat aef7016d-1e3c-4bf9-bfa6-cfc2677e167a external ip: "11.11.11.115" logical ip: "33.33.33.108" type: "dnat_and_snat" router 9bfa38f2-be6b-4db8-b33e-206babd2a365 (neutron-fb746e48-7b18-4608-9875-510e3aa9d88c) (aka public_router) [root@overcloud-computesriov-rack0-0 heat-admin]# ovn-sbctl show Chassis "3a155514-bb86-4316-a838-71585eeb733a" hostname: overcloud-computesriov-rack0-0.localdomain Encap geneve ip: "172.16.0.20" options: {csum="true"} Port_Binding "b9f97053-32c3-4836-a262-755b603c90cf" Chassis "dccff23d-0d97-4a89-a3fc-c596d3e91e2b" hostname: overcloud-computesriov-rack1-0.localdomain Encap geneve ip: "172.16.1.54" options: {csum="true"} Chassis "6626ffbe-47aa-41ef-8842-e6330d0dcffc" hostname: overcloud-computesriov-rack0-1.localdomain Encap geneve ip: "172.16.0.53" options: {csum="true"} Port_Binding "564e9d5e-4db7-4527-838c-15725ee28208" Chassis "bc2891c9-fa0a-408c-842c-a415d1461a85" hostname: overcloud-controller-0.localdomain Encap geneve ip: "172.16.0.172" options: {csum="true"} Chassis "ce414fad-ee29-47b7-9313-94f8f7c437e5" hostname: overcloud-controller-1.localdomain Encap geneve ip: "172.16.0.101" options: {csum="true"} Chassis "3ff18497-91f6-47b9-86f7-7a5f4d33979d" hostname: overcloud-controller-2.localdomain Encap geneve ip: "172.16.0.168" options: {csum="true"} Port_Binding cr-lrp-79da228c-e188-423a-93ed-d305cadd12b4
I saw the same behaviour when sending traffic between the instances' floating IPs:

$ openstack server list
+--------------------------------------+--------+--------+-------------------------------------+-------+--------+
| ID                                   | Name   | Status | Networks                            | Image | Flavor |
+--------------------------------------+--------+--------+-------------------------------------+-------+--------+
| 2798889d-a994-4c7e-bbe8-897a2b61761c | trex12 | ACTIVE | gen_data=33.33.33.108, 11.11.11.115 | perf  |        |
| bb6f6ce0-b029-4d6e-a918-aa3fa609b146 | trex11 | ACTIVE | gen_data=33.33.33.130, 11.11.11.13  | perf  |        |
+--------------------------------------+--------+--------+-------------------------------------+-------+--------+

I also saw it when I used a floating IP from the flat ("public") network, assigned to the instance, to communicate with an external server.
(In reply to Itai Levy from comment #0)
> Matching TC rules:
>
> )[root@overcloud-computesriov-rack0-1 /]#
> ()[root@overcloud-computesriov-rack0-1 /]# tc -s filter show dev ens1f0_9
> ingress
> filter protocol ip pref 3 flower chain 0
> filter protocol ip pref 3 flower chain 0 handle 0x1
> dst_mac fa:16:3e:1d:eb:d7
> src_mac fa:16:3e:c1:c7:78
> eth_type ipv4
> ip_proto tcp
> src_ip 33.33.33.108
> ip_flags nofrag
> in_hw in_hw_count 1
> action order 1: gact action 12
> random type none 262150 val 0
> index 85466 ref 0 bind 0 installed 0 sec used 42949672 sec expires
> 343597383 sec
> Action statistics:
> !!!Deficit -4, rta_len=20
>
> cookie 54fb36d0ca48a2743a8406b55be5de6f

This is breaking the dump of stats. Which tc version are you using? I'm wondering if we have a kernel or iproute bug here.
Hi Marcelo,

See the full output below, which I re-collected. Check out zone 2 (chain 32, handle 0x3 in TC).
This time I used the MOFED TC to collect the output (tc utility, iproute2-5.11.0); I think last time I used the TC from the inbox nova_compute container.

OVS DP:
recirc_id(0),in_port(ens1f0_15),eth(src=fa:16:3e:86:53:96,dst=fa:16:3e:43:c5:52),eth_type(0x0800),ipv4(src=33.33.33.130,proto=6,frag=no), packets:37, bytes:7230, used:0.360s, actions:ct(zone=1),recirc(0x20)
recirc_id(0),in_port(ens1f0_15),eth(src=fa:16:3e:86:53:96,dst=fa:16:3e:f0:b1:9e),eth_type(0x0800),ipv4(src=33.33.33.130,proto=6,frag=no), packets:11231222, bytes:100422727635, used:0.360s, actions:ct(zone=1),recirc(0x20)
ct_state(+est-rel+rpl-inv+trk),ct_label(0/0x1),recirc_id(0x20),in_port(ens1f0_15),eth(src=fa:16:3e:86:53:96,dst=fa:16:3e:43:c5:52),eth_type(0x0800),ipv4(dst=33.33.33.2/255.255.255.254,proto=6,frag=no), packets:37, bytes:7230, used:0.360s, actions:ct(zone=9),recirc(0x21)
ct_state(+est-rel-rpl-inv+trk),ct_label(0/0x1),recirc_id(0x20),in_port(ens1f0_15),eth(src=fa:16:3e:86:53:96,dst=fa:16:3e:f0:b1:9e),eth_type(0x0800),ipv4(src=33.33.33.130,dst=11.11.11.115,proto=6,ttl=64,frag=no),tcp(dst=5101), packets:11346742, bytes:100793166480, used:0.000s, actions:ct_clear,set(eth(src=fa:16:3e:3a:50:7c,dst=fa:16:3e:d8:c6:6c)),set(ipv4(ttl=63)),ct(zone=2,nat),recirc(0x22)
recirc_id(0x22),in_port(ens1f0_15),eth(src=fa:16:3e:3a:50:7c),eth_type(0x0800),ipv4(src=33.33.33.130,frag=no), packets:11346743, bytes:100793166540, used:0.000s, actions:ct(commit,zone=5,nat(src=11.11.11.13)),recirc(0x23)
ct_state(+est-rel-rpl-inv+trk),ct_label(0/0x1),recirc_id(0x23),in_port(ens1f0_15),eth(src=fa:16:3e:3a:50:7c,dst=fa:16:3e:d8:c6:6c),eth_type(0x0800),ipv4(dst=11.11.11.64/255.255.255.192,proto=6,frag=no), packets:11346742, bytes:100793166480, used:0.000s, actions:ct_clear,push_vlan(vid=101,pcp=0),bond0

CT:
# cat /proc/net/nf_conntrack | grep "zone=1"
ipv4 2 tcp 6 src=33.33.33.2 dst=33.33.33.130 sport=55144 dport=22 src=33.33.33.130
dst=33.33.33.2 sport=22 dport=55144 [HW_OFFLOAD] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=1 use=3 ipv4 2 tcp 6 src=33.33.33.130 dst=11.11.11.115 sport=45324 dport=5101 src=11.11.11.115 dst=33.33.33.130 sport=5101 dport=45324 [HW_OFFLOAD] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=1 use=57279084 [root@overcloud-computesriov-rack0-0 heat-admin]# cat /proc/net/nf_conntrack | grep "zone=2" [root@overcloud-computesriov-rack0-0 heat-admin]# cat /proc/net/nf_conntrack | grep "zone=5" ipv4 2 tcp 6 src=33.33.33.130 dst=11.11.11.115 sport=45324 dport=5101 src=11.11.11.115 dst=11.11.11.13 sport=5101 dport=45324 [HW_OFFLOAD] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=5 use=3 [root@overcloud-computesriov-rack0-0 heat-admin]# cat /proc/net/nf_conntrack | grep "zone=9" ipv4 2 tcp 6 src=33.33.33.2 dst=33.33.33.130 sport=55144 dport=22 src=33.33.33.130 dst=33.33.33.2 sport=22 dport=55144 [HW_OFFLOAD] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=9 use=3 TC: root@overcloud-computesriov-rack0-0 heat-admin]# /opt/mellanox/iproute2/sbin/tc -s filter show dev ens1f0_15 ingress filter protocol ip pref 6 flower chain 0 filter protocol ip pref 6 flower chain 0 handle 0x1 dst_mac fa:16:3e:43:c5:52 src_mac fa:16:3e:86:53:96 eth_type ipv4 ip_proto tcp src_ip 33.33.33.130 ip_flags nofrag in_hw in_hw_count 1 action order 1: ct zone 1 pipe index 3 ref 1 bind 1 installed 434 sec used 0 sec Action statistics: Sent 87864 bytes 452 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 87864 bytes 452 pkt backlog 0b 0p requeues 0 cookie a4cd88803843f34fe0e9c39e588ebcb8 used_hw_stats delayed action order 2: gact action goto chain 32 random type none pass val 0 index 3 ref 1 bind 1 installed 434 sec used 0 sec Action statistics: Sent 87864 bytes 452 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 87864 bytes 452 pkt backlog 0b 0p requeues 0 cookie a4cd88803843f34fe0e9c39e588ebcb8 used_hw_stats delayed 
filter protocol ip pref 6 flower chain 0 handle 0x2 dst_mac fa:16:3e:f0:b1:9e src_mac fa:16:3e:86:53:96 eth_type ipv4 ip_proto tcp src_ip 33.33.33.130 ip_flags nofrag in_hw in_hw_count 1 action order 1: ct zone 1 pipe index 5 ref 1 bind 1 installed 434 sec used 0 sec firstused 433 sec Action statistics: Sent 1286031408501 bytes 143809951 pkt (dropped 0, overlimits 0 requeues 0) Sent software 342 bytes 5 pkt Sent hardware 1286031408159 bytes 143809946 pkt backlog 0b 0p requeues 0 cookie 91287ed546403493b98e5cb1a364eb5c used_hw_stats delayed action order 2: gact action goto chain 32 random type none pass val 0 index 5 ref 1 bind 1 installed 434 sec used 0 sec firstused 433 sec Action statistics: Sent 1286031408501 bytes 143809951 pkt (dropped 0, overlimits 0 requeues 0) Sent software 342 bytes 5 pkt Sent hardware 1286031408159 bytes 143809946 pkt backlog 0b 0p requeues 0 cookie 91287ed546403493b98e5cb1a364eb5c used_hw_stats delayed filter protocol ip pref 6 flower chain 32 filter protocol ip pref 6 flower chain 32 handle 0x1 dst_mac fa:16:3e:43:c5:52 src_mac fa:16:3e:86:53:96 eth_type ipv4 ip_proto tcp dst_ip 33.33.33.2/31 ip_flags nofrag ct_state +trk+est-inv+rpl ct_label 00000000000000000000000000000000/010000000000000000000000000000 in_hw in_hw_count 1 action order 1: ct zone 9 pipe index 4 ref 1 bind 1 installed 434 sec used 0 sec Action statistics: Sent 87864 bytes 452 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 87864 bytes 452 pkt backlog 0b 0p requeues 0 cookie 2435c61d4245e31fac134a846feba6dd used_hw_stats delayed action order 2: gact action goto chain 33 random type none pass val 0 index 4 ref 1 bind 1 installed 434 sec used 0 sec Action statistics: Sent 87864 bytes 452 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 87864 bytes 452 pkt backlog 0b 0p requeues 0 cookie 2435c61d4245e31fac134a846feba6dd used_hw_stats delayed filter protocol ip pref 6 flower chain 32 handle 0x2 dst_mac 
fa:16:3e:f0:b1:9e src_mac fa:16:3e:86:53:96 eth_type ipv4 ip_proto tcp ip_ttl 64 dst_ip 11.11.11.115 src_ip 33.33.33.130 dst_port 5101 ip_flags nofrag ct_state +trk+new-est-inv-rpl ct_label 00000000000000000000000000000000/010000000000000000000000000000 not_in_hw action order 1: ct commit zone 1 label 00000000000000000000000000000000/01000000000000000000000000000000 pipe index 6 ref 1 bind 1 installed 434 sec used 433 sec firstused 433 sec Action statistics: Sent 60 bytes 1 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 cookie d951fb4b724561b98a26fca2b689e4f3 action order 2: ct clear pipe index 7 ref 1 bind 1 installed 434 sec used 433 sec firstused 433 sec Action statistics: Sent 60 bytes 1 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 cookie d951fb4b724561b98a26fca2b689e4f3 action order 3: pedit action pipe keys 5 index 1 ref 1 bind 1 installed 434 sec used 433 sec firstused 433 sec key #0 at ipv4+8: val 3f000000 mask 00ffffff key #1 at eth+4: val 0000fa16 mask ffff0000 key #2 at eth+8: val 3e3a507c mask 00000000 key #3 at eth+0: val fa163ed8 mask 00000000 key #4 at eth+4: val c66c0000 mask 0000ffff Action statistics: Sent 60 bytes 1 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 action order 4: csum (iph, tcp) action pipe index 1 ref 1 bind 1 installed 434 sec used 433 sec firstused 433 sec Action statistics: Sent 60 bytes 1 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 no_percpu action order 5: ct zone 2 nat pipe index 8 ref 1 bind 1 installed 434 sec used 433 sec firstused 433 sec Action statistics: Sent 60 bytes 1 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 cookie d951fb4b724561b98a26fca2b689e4f3 action order 6: gact action goto chain 34 random type none pass val 0 index 6 ref 1 bind 1 installed 434 sec used 433 sec firstused 433 sec Action statistics: Sent 60 bytes 1 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 cookie 
d951fb4b724561b98a26fca2b689e4f3 filter protocol ip pref 6 flower chain 32 handle 0x3 dst_mac fa:16:3e:f0:b1:9e src_mac fa:16:3e:86:53:96 eth_type ipv4 ip_proto tcp ip_ttl 64 dst_ip 11.11.11.115 src_ip 33.33.33.130 dst_port 5101 ip_flags nofrag ct_state +trk+est-inv-rpl ct_label 00000000000000000000000000000000/010000000000000000000000000000 in_hw in_hw_count 1 action order 1: ct clear pipe index 15 ref 1 bind 1 installed 433 sec firstused 433 sec Action statistics: Sent 1277977658708 bytes 143848875 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 cookie 899f102c8a420d73499008bcddf19829 used_hw_stats delayed action order 2: pedit action pipe keys 5 index 3 ref 1 bind 1 installed 433 sec firstused 433 sec key #0 at ipv4+8: val 3f000000 mask 00ffffff key #1 at eth+4: val 0000fa16 mask ffff0000 key #2 at eth+8: val 3e3a507c mask 00000000 key #3 at eth+0: val fa163ed8 mask 00000000 key #4 at eth+4: val c66c0000 mask 0000ffff Action statistics: Sent 1277977658708 bytes 143848875 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 used_hw_stats delayed action order 3: csum (iph, tcp) action pipe index 3 ref 1 bind 1 installed 433 sec firstused 433 sec Action statistics: Sent 1277977658708 bytes 143848875 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 no_percpu used_hw_stats delayed action order 4: ct zone 2 nat pipe index 16 ref 1 bind 1 installed 433 sec firstused 433 sec Action statistics: Sent 1277977658708 bytes 143848875 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 cookie 899f102c8a420d73499008bcddf19829 used_hw_stats delayed action order 5: gact action goto chain 34 random type none pass val 0 index 10 ref 1 bind 1 installed 433 sec firstused 433 sec Action statistics: Sent 1277977658708 bytes 143848875 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 cookie 899f102c8a420d73499008bcddf19829 used_hw_stats delayed filter protocol ip pref 6 flower chain 33 filter protocol ip pref 
6 flower chain 33 handle 0x1 dst_mac fa:16:3e:43:c5:52/01:00:00:00:00:00 eth_type ipv4 ip_flags nofrag ct_state +trk+est-inv+rpl ct_label 00000000000000000000000000000000/010000000000000000000000000000 not_in_hw action order 1: mirred (Egress Redirect to device tapbd693cae-90) stolen index 13 ref 1 bind 1 installed 434 sec used 0 sec firstused 434 sec Action statistics: Sent 81536 bytes 452 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 cookie 828edae9894ef835a15fc8b07208db9f no_percpu filter protocol ip pref 6 flower chain 34 filter protocol ip pref 6 flower chain 34 handle 0x1 src_mac fa:16:3e:3a:50:7c eth_type ipv4 src_ip 33.33.33.130 ip_flags nofrag in_hw in_hw_count 1 action order 1: ct commit zone 5 nat src addr 11.11.11.13 pipe index 9 ref 1 bind 1 installed 434 sec firstused 433 sec Action statistics: Sent 1277977721050 bytes 143848883 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 cookie ea20be5a17401bab90bdab8946cbd25a used_hw_stats delayed action order 2: gact action goto chain 35 random type none pass val 0 index 7 ref 1 bind 1 installed 434 sec firstused 433 sec Action statistics: Sent 1277977721050 bytes 143848883 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 cookie ea20be5a17401bab90bdab8946cbd25a used_hw_stats delayed filter protocol ip pref 6 flower chain 35 filter protocol ip pref 6 flower chain 35 handle 0x1 dst_mac fa:16:3e:d8:c6:6c src_mac fa:16:3e:3a:50:7c eth_type ipv4 ip_proto tcp dst_ip 11.11.11.115/26 ip_flags nofrag ct_state +trk+new-est-inv-rpl ct_label 00000000000000000000000000000000/010000000000000000000000000000 not_in_hw action order 1: ct clear pipe index 10 ref 1 bind 1 installed 434 sec used 433 sec firstused 433 sec Action statistics: Sent 60 bytes 1 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 cookie 0fb5c4be684e7504ea0488a84b1c0aa2 action order 2: vlan push id 101 protocol 802.1Q priority 0 pipe index 7 ref 1 bind 1 installed 434 sec used 433 sec 
firstused 433 sec Action statistics: Sent 60 bytes 1 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 no_percpu action order 3: mirred (Egress Redirect to device bond0) stolen index 14 ref 1 bind 1 installed 434 sec used 433 sec firstused 433 sec Action statistics: Sent 60 bytes 1 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 cookie 0fb5c4be684e7504ea0488a84b1c0aa2 no_percpu filter protocol ip pref 6 flower chain 35 handle 0x2 dst_mac fa:16:3e:d8:c6:6c src_mac fa:16:3e:3a:50:7c eth_type ipv4 ip_proto tcp dst_ip 11.11.11.115/26 ip_flags nofrag ct_state +trk+est-inv-rpl ct_label 00000000000000000000000000000000/010000000000000000000000000000 in_hw in_hw_count 1 action order 1: ct clear pipe index 20 ref 1 bind 1 installed 433 sec firstused 433 sec Action statistics: Sent 1277977729932 bytes 143848883 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 cookie 2394f0eaf747620083ade288d9f9c022 used_hw_stats delayed action order 2: vlan push id 101 protocol 802.1Q priority 0 pipe index 9 ref 1 bind 1 installed 433 sec firstused 433 sec Action statistics: Sent 1277977729932 bytes 143848883 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 no_percpu used_hw_stats delayed action order 3: mirred (Egress Redirect to device bond0) stolen index 16 ref 1 bind 1 installed 433 sec firstused 433 sec Action statistics: Sent 1277977667650 bytes 143848876 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 cookie 2394f0eaf747620083ade288d9f9c022 no_percpu used_hw_stats delayed [root@overcloud-computesriov-rack0-0
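The check performed by hand in comment #5 (grepping /proc/net/nf_conntrack for each zone used by the datapath flows) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the flow and conntrack lines are abbreviated copies of the output above, and the regular expression assumes `commit` precedes `zone` inside `ct(...)`, as in this dump.

```python
import re


def zones_in_flows(flow_lines):
    """Map each ct zone referenced by a flow action to True if any flow commits to it."""
    zones = {}
    for line in flow_lines:
        actions = line.split("actions:", 1)[-1]
        for m in re.finditer(r"\bct\((commit,)?zone=(\d+)", actions):
            zone = int(m.group(2))
            zones[zone] = zones.get(zone, False) or bool(m.group(1))
    return zones


def zones_in_conntrack(conntrack_lines):
    """Collect the zone=N values present in /proc/net/nf_conntrack style lines."""
    found = set()
    for line in conntrack_lines:
        m = re.search(r"\bzone=(\d+)\b", line)
        if m:
            found.add(int(m.group(1)))
    return found


def missing_zones(flow_lines, conntrack_lines):
    """Zones used by datapath flows that have no conntrack entry at all."""
    return sorted(set(zones_in_flows(flow_lines)) - zones_in_conntrack(conntrack_lines))


# Abbreviated samples taken from the output in comment #5.
FLOW_SAMPLE = [
    "recirc_id(0),in_port(ens1f0_15), actions:ct(zone=1),recirc(0x20)",
    "recirc_id(0x20),in_port(ens1f0_15), actions:ct(zone=9),recirc(0x21)",
    "recirc_id(0x20),in_port(ens1f0_15), actions:ct_clear,ct(zone=2,nat),recirc(0x22)",
    "recirc_id(0x22),in_port(ens1f0_15), actions:ct(commit,zone=5,nat(src=11.11.11.13)),recirc(0x23)",
]
CT_SAMPLE = [
    "ipv4 2 tcp 6 src=33.33.33.130 dst=11.11.11.115 sport=45324 dport=5101 [HW_OFFLOAD] mark=0 zone=1 use=57279084",
    "ipv4 2 tcp 6 src=33.33.33.130 dst=11.11.11.115 sport=45324 dport=5101 [HW_OFFLOAD] mark=0 zone=5 use=3",
    "ipv4 2 tcp 6 src=33.33.33.2 dst=33.33.33.130 sport=55144 dport=22 [HW_OFFLOAD] mark=0 zone=9 use=3",
]

print(missing_zones(FLOW_SAMPLE, CT_SAMPLE))
```

On a live system the two inputs would come from the datapath flow dump and from /proc/net/nf_conntrack, read at roughly the same time so that short-lived connections do not skew the comparison.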
Guys, Can you confirm that for DVR DNAT workload, the current OVN implementation is to always send the packet via 2 zones (SNAT, DNAT) in its pipeline while committing the connection only in one of them?
Adding needinfo regarding comment #6.
(In reply to Itai Levy from comment #6)
> Guys,
> Can you confirm that for DVR DNAT workload, the current OVN implementation
> is to always send the packet via 2 zones (SNAT, DNAT) in its pipeline while
> committing the connection only in one of them?

Hi Itai,

Does this seem familiar?
https://bugzilla.redhat.com/show_bug.cgi?id=1974585#c7
Hi Marcelo,

Yes, this BZ seems to show the same symptom: only one of the CT zones used in the OVS rules is committed in the kernel, which prevents HW offload.

Please take a look at this patch:
https://patchwork.ozlabs.org/project/openvswitch/cover/1570154179-14525-1-git-send-email-ankur.sharma@nutanix.com/

It introduces stateless handling for DNAT traffic (only). CT is not used for the NAT operation, which prevents a DDoS from flooding CT with unneeded DNAT entries. This makes sense to me, and I can also confirm that it solves the DNAT offload issue caused by the current OVN CT-based DNAT implementation.

If it makes sense to you as well, could you consider exposing a user-friendly option to work in this mode? It looks like a simple, straightforward implementation...

Itai
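To make the idea of stateless NAT concrete, here is a small conceptual sketch. It is not OVN's implementation and does not reflect the patch's code: it only shows that a 1:1 floating-IP mapping can be applied as a pure header rewrite with no per-connection table, which is why no CT zone is needed. The addresses are the two dnat_and_snat entries from the `ovn-nbctl show` output above; the packet representation and function names are hypothetical.

```python
# External (floating) IP -> logical IP, from the dnat_and_snat entries above.
FLOATING_IPS = {
    "11.11.11.115": "33.33.33.108",
    "11.11.11.13": "33.33.33.130",
}
LOGICAL_TO_EXTERNAL = {logical: ext for ext, logical in FLOATING_IPS.items()}


def stateless_snat(pkt):
    """Rewrite the source of a packet leaving a VM; unknown sources pass through."""
    return dict(pkt, src=LOGICAL_TO_EXTERNAL.get(pkt["src"], pkt["src"]))


def stateless_dnat(pkt):
    """Rewrite the destination of a packet sent to a floating IP."""
    return dict(pkt, dst=FLOATING_IPS.get(pkt["dst"], pkt["dst"]))


# trex11 (33.33.33.130) sends to the floating IP of trex12 (11.11.11.115).
pkt = {"src": "33.33.33.130", "dst": "11.11.11.115"}
print(stateless_dnat(stateless_snat(pkt)))
```

Because each translation depends only on the packet's own addresses, both directions can be rewritten independently. The trade-off raised later in the thread still applies: without connection state, anything that relies on tracking (for example, matching only established connections at the NAT stage) has to be handled elsewhere.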
Hi Itai,

I need to discuss that with the OSP team. I myself am a bit hesitant about it, because going stateless often sounds like undoing work and (also) often leaves some holes behind. But let's see.

In this case, for example, it makes the OSP use case work, but other users who don't want to go stateless would still have the bug (if we go along with just this knob).

The final version of the patch from comment #10:
https://patchwork.ozlabs.org/project/openvswitch/patch/1572571718-83139-2-git-send-email-ankur.sharma@nutanix.com/
(it has been accepted in OVN since Nov 2019).

@Haresh, I need to step out now, but let's talk about it. I'll ping you when I'm back.
(In reply to Marcelo Ricardo Leitner from comment #11)
> Hi Itai,
>
> I need to discuss that with OSP team. I myself am a bit hesitant with it
> because going stateless often sounds like undoing work and (also) often
> leaves some holes behind. But lets see.
>
> In this case, for example, it makes the OSP use case to work, but other
> users that don't want to go stateless would still have the bug there (if we
> go along with just this knob).
>
> The final version of the patch from comment #10:
> https://patchwork.ozlabs.org/project/openvswitch/patch/1572571718-83139-2-
> git-send-email-ankur.sharma/
> (it is accepted in OVN since Nov 2019).
>
> @Haresh, I need to step out now, but lets talk about it. I'll ping you when
> I'm back.

Hi Marcelo, Itai,

I tried attaching a floating IP to a switchdev interface that belongs to a geneve tenant network. I have RHOSP 16.2 (RHOS-16.2-RHEL-8-20210728.n.2).

Without kernel-modules-extra, I see that egress traffic is not offloaded:

ufid:a9539e9d-23dc-4ead-843c-be13b437544b, recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(enp4s0f1_0),skb_mark(0/0),ct_state(0/0x3f),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),eth(src=f8:f2:1e:03:bf:f6,dst=fa:16:3e:1f:52:ca),eth_type(0x0800),ipv4(src=7.7.7.32/255.255.255.224,dst=10.10.54.100,proto=0/0,tos=0/0x3,ttl=64,frag=no), packets:1772, bytes:173656, used:0.907s, dp:ovs, actions:ct_clear,set(tunnel(tun_id=0x3,dst=10.10.51.171,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x40002}),flags(df|csum|key))),set(eth(src=fa:16:3e:0b:8c:4b,dst=72:f9:61:09:b9:79)),set(ipv4(ttl=63)),genev_sys_6081

Ingress traffic, however, is offloaded:

ufid:2123667b-83ef-4614-8bf2-4df268839daf, skb_priority(0/0),tunnel(tun_id=0x4,src=10.10.51.171,dst=10.10.51.150,ttl=0/0,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x30002/0x7fffffff}),flags(+key)),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(genev_sys_6081),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:1f:52:ca,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:5907, bytes:578886, used:0.500s, offloaded:yes, dp:tc, actions:enp4s0f1_0 With kernel-modules-extra, I do see both directions traffic offloaded. ufid:952cf16e-f7d9-4313-8534-b79d286e247d, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x3f),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0),dp_hash(0/0),in_port(enp4s0f1_0),packet_type(ns=0/0,id=0/0),eth(src=f8:f2:1e:03:bf:f6,dst=fa:16:3e:1f:52:ca),eth_type(0x0800),ipv4(src=7.7.7.32/255.255.255.224,dst=192.0.0.0/224.0.0.0,proto=17,tos=0/0x3,ttl=64,frag=no),udp(src=0/0,dst=0/0), packets:8, bytes:520, used:1.390s, offloaded:yes, dp:tc, actions:ct_clear,set(tunnel(tun_id=0x3,dst=10.10.51.171,ttl=64,tp_dst=6081,key6(bad key length 1, expected 0)(01)geneve({class=0x102,type=0x80,len=4,0x40002}),flags(key))),set(eth(src=fa:16:3e:0b:8c:4b,dst=72:f9:61:09:b9:79)),set(ipv4(ttl=63)),genev_sys_6081 ufid:365b5b04-173c-48a4-b685-6b83c0ba10d4, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x3f),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0),dp_hash(0/0),in_port(enp4s0f1_0),packet_type(ns=0/0,id=0/0),eth(src=f8:f2:1e:03:bf:f6,dst=fa:16:3e:1f:52:ca),eth_type(0x0800),ipv4(src=7.7.7.32/255.255.255.224,dst=10.10.54.100,proto=1,tos=0/0x3,ttl=64,frag=no),icmp(type=0/0,code=0/0), packets:145, bytes:12180, used:0.930s, offloaded:yes, dp:tc, actions:ct_clear,set(tunnel(tun_id=0x3,dst=10.10.51.171,ttl=64,tp_dst=6081,key6(bad key length 1, expected 
0)(01)geneve({class=0x102,type=0x80,len=4,0x40002}),flags(key))),set(eth(src=fa:16:3e:0b:8c:4b,dst=72:f9:61:09:b9:79)),set(ipv4(ttl=63)),genev_sys_6081 ufid:f294c7a7-f5c6-4ec3-b349-48d8377f98c6, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x3f),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0),dp_hash(0/0),in_port(enp4s0f1_0),packet_type(ns=0/0,id=0/0),eth(src=f8:f2:1e:03:bf:f6,dst=fa:16:3e:1f:52:ca),eth_type(0x0800),ipv4(src=7.7.7.32/255.255.255.224,dst=128.0.0.0/192.0.0.0,proto=17,tos=0/0x3,ttl=64,frag=no),udp(src=0/0,dst=0/0), packets:6, bytes:390, used:6.400s, offloaded:yes, dp:tc, actions:ct_clear,set(tunnel(tun_id=0x3,dst=10.10.51.171,ttl=64,tp_dst=6081,key6(bad key length 1, expected 0)(01)geneve({class=0x102,type=0x80,len=4,0x40002}),flags(key))),set(eth(src=fa:16:3e:0b:8c:4b,dst=72:f9:61:09:b9:79)),set(ipv4(ttl=63)),genev_sys_6081 ufid:57d4aa27-f7fa-45e2-a6d4-a0d67330fb93, skb_priority(0/0),tunnel(tun_id=0x4,src=10.10.51.171,dst=10.10.51.150,ttl=0/0,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x30002/0x7fffffff}),flags(+key)),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(genev_sys_6081),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:1f:52:ca,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:160, bytes:15680, used:0.420s, offloaded:yes, dp:tc, actions:enp4s0f1_0 One thing to note here is, this is DVR disabled, NAT happens at controller itself. Did you tried DVR enabling on your deployment?
Hi Haresh, yes, as I wrote in the description I used a DVR deployment where FIP DNAT happens on the compute node itself... Marcelo, FYI - the stateless FIP DNAT patch was taken by OpenStack community https://review.opendev.org/c/openstack/neutron/+/804807 Itai
Hi Itai, I deployed distributed NAT and tried same thing (ICMP traffic, tcp traffic). ICMP: Egresss: ufid:9a58f50a-66c0-4983-849d-d98a4e889af6, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(enp4s0f0_2),packet_type(ns=0/0,id=0/0),eth(src=f8:f2:1e:03:bf:f2,dst=fa:16:3e:84:64:93),eth_type(0x0800),ipv4(src=7.7.7.68,dst=0.0.0.0/0.0.0.0,proto=1,tos=0/0,ttl=0/0,frag=no),icmp(type=0/0,code=0/0), packets:100, bytes:8400, used:0.760s, offloaded:yes, dp:tc, actions:ct(zone=9),recirc(0x36) ufid:ab48baff-1eb3-4d14-8a1c-e1e13e982c69, skb_priority(0/0),skb_mark(0/0),ct_state(0x2a/0x3e),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0x36),dp_hash(0/0),in_port(enp4s0f0_2),packet_type(ns=0/0,id=0/0),eth(src=f8:f2:1e:03:bf:f2,dst=fa:16:3e:84:64:93),eth_type(0x0800),ipv4(src=7.7.7.68,dst=10.10.54.100,proto=1,tos=0/0,ttl=64,frag=no),icmp(type=0/0,code=0/0), packets:100, bytes:8400, used:0.760s, offloaded:yes, dp:tc, actions:ct_clear,set(eth(src=fa:16:3e:fd:50:3c,dst=72:f9:61:09:b9:79)),set(ipv4(ttl=63)),ct(zone=1,nat),recirc(0x37) ufid:e76a1233-edbe-4ebd-a872-fe9c2fe6e488, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0x37),dp_hash(0/0),in_port(enp4s0f0_2),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:fd:50:3c,dst=72:f9:61:09:b9:79),eth_type(0x0800),ipv4(src=8.0.0.0/248.0.0.0,dst=10.10.54.0/255.255.255.128,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:100, bytes:8400, used:0.760s, offloaded:yes, dp:tc, actions:ct_clear,push_vlan(vid=405,pcp=0),mx-bond Ingress: ufid:9f92ef2b-3e0a-49de-a800-ba4c267d3563, 
skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(mx-bond),packet_type(ns=0/0,id=0/0),eth(src=72:f9:61:09:b9:79,dst=fa:16:3e:fd:50:3c),eth_type(0x8100),vlan(vid=405,pcp=0),encap(eth_type(0x0800),ipv4(src=10.10.54.0/255.255.255.128,dst=10.10.54.129,proto=1,tos=0/0,ttl=64,frag=no),icmp(type=0/0,code=0/0)), packets:100, bytes:8400, used:0.760s, offloaded:yes, dp:tc, actions:ct_clear,pop_vlan,ct(zone=4,nat),recirc(0x33) ufid:61396550-4f08-4dfd-8efc-e44a5bea8575, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0x33),dp_hash(0/0),in_port(mx-bond),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=10.10.54.129,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:100, bytes:8400, used:0.760s, offloaded:yes, dp:tc, actions:ct(commit,zone=1,nat(dst=7.7.7.68)),recirc(0x34) ufid:dd5fa327-2adf-42e8-939e-ba65fc8d5ef9, skb_priority(0/0),skb_mark(0/0),ct_state(0x22/0x3e),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0x34),dp_hash(0/0),in_port(mx-bond),packet_type(ns=0/0,id=0/0),eth(src=72:f9:61:09:b9:79,dst=fa:16:3e:fd:50:3c),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=7.7.7.68,proto=1,tos=0/0,ttl=64,frag=no),icmp(type=0/0,code=0/0), packets:99, bytes:8316, used:0.760s, dp:tc, actions:ct_clear,set(eth(src=fa:16:3e:84:64:93,dst=f8:f2:1e:03:bf:f2)),set(ipv4(ttl=63)),ct(zone=9),recirc(0x35) ufid:94b24ff4-c02c-4469-a4b4-e6d38757ed90, skb_priority(0/0),skb_mark(0/0),ct_state(0x22/0x3e),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0x35),dp_hash(0/0),in_port(mx-bond),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:84:64:93,dst=f8:f2:1e:03:bf:f2),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=7.7.7.68,proto=1,tos=0/0,ttl=0/0,frag=no),icmp(type=0/0,code=0/0), packets:99, bytes:8316, used:0.760s, offloaded:yes, dp:tc, actions:enp4s0f0_2 As you see, dp rule in ingress 
direction with recirc_id(0x34) is not offloaded. Rest all are offloaded. TCP: Egress: ufid:0f233985-485a-45ac-80f2-16a437e70d53, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(enp4s0f0_2),packet_type(ns=0/0,id=0/0),eth(src=f8:f2:1e:03:bf:f2,dst=fa:16:3e:84:64:93),eth_type(0x0800),ipv4(src=7.7.7.68,dst=0.0.0.0/0.0.0.0,proto=6,tos=0/0,ttl=0/0,frag=no),tcp(src=0/0,dst=0/0), packets:1000278, bytes:1448505492, used:0.290s, offloaded:yes, dp:tc, actions:ct(zone=9),recirc(0x5f) ufid:7964ee11-696c-4c01-9227-05f8ab0ead44, skb_priority(0/0),skb_mark(0/0),ct_state(0x2a/0x3e),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0x5f),dp_hash(0/0),in_port(enp4s0f0_2),packet_type(ns=0/0,id=0/0),eth(src=f8:f2:1e:03:bf:f2,dst=fa:16:3e:84:64:93),eth_type(0x0800),ipv4(src=7.7.7.68,dst=10.10.54.100,proto=6,tos=0/0,ttl=64,frag=no),tcp(src=0/0,dst=0/0), packets:1035630, bytes:1434466076, used:0.000s, offloaded:yes, dp:tc, actions:ct_clear,set(eth(src=fa:16:3e:fd:50:3c,dst=72:f9:61:09:b9:79)),set(ipv4(ttl=63)),ct(zone=1,nat),recirc(0x60) ufid:16a13e6f-49cd-42b3-85c7-40ea3ffe857e, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0x60),dp_hash(0/0),in_port(enp4s0f0_2),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:fd:50:3c,dst=72:f9:61:09:b9:79),eth_type(0x0800),ipv4(src=8.0.0.0/248.0.0.0,dst=10.10.54.0/255.255.255.128,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:1035630, bytes:1434466076, used:0.000s, offloaded:yes, dp:tc, actions:ct_clear,push_vlan(vid=405,pcp=0),mx-bond Ingress: ufid:e1ea623c-64f4-4fd8-a6c4-6ef45c792ad9, 
skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(mx-bond),packet_type(ns=0/0,id=0/0),eth(src=72:f9:61:09:b9:79,dst=fa:16:3e:fd:50:3c),eth_type(0x8100),vlan(vid=405,pcp=0),encap(eth_type(0x0800),ipv4(src=10.10.54.0/255.255.255.128,dst=10.10.54.129,proto=6,tos=0/0,ttl=64,frag=no),tcp(src=0/0,dst=0/0)), packets:1351777, bytes:85968988, used:0.000s, offloaded:yes, dp:tc, actions:ct_clear,pop_vlan,ct(zone=4,nat),recirc(0x62) ufid:8ccf45b5-c79f-4fae-a548-91ff15d59e8e, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0x62),dp_hash(0/0),in_port(mx-bond),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=10.10.54.129,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:1351778, bytes:85969040, used:0.000s, offloaded:yes, dp:tc, actions:ct(commit,zone=1,nat(dst=7.7.7.68)),recirc(0x63) ufid:a0bc0880-e449-4a04-9dfa-ee8776b71577, skb_priority(0/0),skb_mark(0/0),ct_state(0x22/0x3e),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0x63),dp_hash(0/0),in_port(mx-bond),packet_type(ns=0/0,id=0/0),eth(src=72:f9:61:09:b9:79,dst=fa:16:3e:fd:50:3c),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=7.7.7.68,proto=6,tos=0/0,ttl=64,frag=no),tcp(src=0/0,dst=0/0), packets:1351780, bytes:85969176, used:0.000s, dp:tc, actions:ct_clear,set(eth(src=fa:16:3e:84:64:93,dst=f8:f2:1e:03:bf:f2)),set(ipv4(ttl=63)),ct(zone=9),recirc(0x64) ufid:2de5ad96-dd4e-4307-8819-aa35d69d8731, skb_priority(0/0),skb_mark(0/0),ct_state(0x22/0x3e),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0x64),dp_hash(0/0),in_port(mx-bond),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:84:64:93,dst=f8:f2:1e:03:bf:f2),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=7.7.7.68,proto=6,tos=0/0,ttl=0/0,frag=no),tcp(src=0/0,dst=22), packets:1351779, bytes:85967804, used:0.000s, offloaded:yes, dp:tc, actions:enp4s0f0_2 Dp rule 
in ingress direction with recirc_id 0x63 is not offloaded. I don't see any "ghost" rule as such here. Thanks
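The check done by eye above (finding the datapath flow that lacks offloaded:yes) can be sketched as a small script. This is a minimal sketch, assuming the dump has one flow per line, as `ovs-appctl dpctl/dump-flows -m` prints it; the function name `unoffloaded_flows` is hypothetical and not part of OVS.

```python
import re


def unoffloaded_flows(dump_text):
    """Return (ufid, recirc_id, actions) for each flow without 'offloaded:yes'.

    Assumes one datapath flow per line, each starting with 'ufid:'.
    """
    results = []
    for line in dump_text.splitlines():
        line = line.strip()
        if not line.startswith("ufid:"):
            continue  # skip blank lines and free text between dumps
        if "offloaded:yes" in line:
            continue  # flow is reported as offloaded
        # The ufid is everything between 'ufid:' and the first comma.
        ufid = line.split(",", 1)[0][len("ufid:"):]
        # Match the recirc_id() key only, not the recirc() action.
        match = re.search(r"recirc_id\((0x[0-9a-f]+|\d+)\)", line)
        recirc = match.group(1) if match else None
        actions = line.split("actions:", 1)[1] if "actions:" in line else ""
        results.append((ufid, recirc, actions))
    return results
```

Run against the TCP ingress dump above, this should report only the flow with recirc_id(0x63), whose actions are ct(zone=9) followed by a recirc. Note that offloaded:yes alone does not prove the packets are handled in hardware, as the description of this bug points out.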
Hi Haresh, As you can see in the description, I am using the following components in my deployment: • RHEL8.4 with kernel 4.18.0-305.7.1.el8_4.x86_64 • MLNX_OFED_LINUX-5.4-0.5 • openvswitch 2.14.1 (MOFED OVS) Your system will probably behave differently. Itai
Hi Itai, This mixture won't be conclusive from the OSP perspective (as a customer would deploy what ships with RHOSP 16.2), though I am not saying your issue is irrelevant. Do you have access to the latest 16.2 RC? That is the compose I am using, if you can try it. In RHOSP 16.2 RC we have RHEL 8.4 with kernel 4.18.0-305.12.1.el8_4.x86_64 (and mlx5_core), and the OVS version is openvswitch2.15-2.15.0-26.el8fdp.x86_64. Thanks
Hi Haresh, 1. Unfortunately I don't have access to the latest 16.2 RC. I would appreciate it if you could provide it to me. 2. The components we used might be different, and even though you don't see the exact same OVS dump as I do, the result is the same: DNAT traffic is not offloaded. You can verify this by checking whether zone 9, seen in your dumps above, is committed in the /proc/net/nf_conntrack output. Itai
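The check suggested in point 2 could be scripted roughly as follows. This is only a sketch: it assumes that conntrack entries in a non-default zone carry a `zone=N` field in /proc/net/nf_conntrack and that entries without that field belong to zone 0. The function name `entries_in_zone` is hypothetical.

```python
import re

# Matches the plain 'zone=N' field; direction-specific fields such as
# 'zone-orig=N' are deliberately not matched by this sketch.
ZONE_FIELD = re.compile(r"\bzone=(\d+)\b")


def entries_in_zone(conntrack_text, zone):
    """Return the conntrack lines that belong to the given zone.

    Assumption: a line without a 'zone=' field is in the default zone 0.
    """
    hits = []
    for line in conntrack_text.splitlines():
        if not line.strip():
            continue
        match = ZONE_FIELD.search(line)
        entry_zone = int(match.group(1)) if match else 0
        if entry_zone == zone:
            hits.append(line)
    return hits
```

On a compute node, the text would come from reading /proc/net/nf_conntrack while traffic is running; an empty result for `entries_in_zone(text, 9)` would suggest that nothing was committed in zone 9.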
Hi Marcelo, An update from my side. It seems that stateless DNAT is not a valid solution for offloading DNAT FIP traffic... Looking into it more carefully, the traffic is offloaded in a single direction only. As we are back to square one, the current DNAT conntrack implementation may need to be reconsidered in order to find a way to allow proper HW offload of this important use case. Itai
Itay mentioned that this patch is needed for this BZ: https://review.opendev.org/c/openstack/neutron/+/804807
Hi Alaa, Itai, Now I am confused. The patch in comment #21 uses stateless NAT, but comment #19 indicates that it isn't a good way forward? I see Moshe reviewed the patch. Does this patch solve the single-direction issue mentioned in comment #19? @Haresh, thoughts on the patch? Thanks.
(In reply to Marcelo Ricardo Leitner from comment #22) > Hi Alaa, Itai, > > Now I am confused. The patch on comment #21 is using stateless NAT but > comment #19 indicates that it isn't a good way forward? > I see Moshe reviewed the patch. > Does this patch solves the single direction issue mentioned in comment #19? > > @Haresh, thoughts on the patch? This patch looks like the OSP-side implementation of the OVN patch below. https://patchwork.ozlabs.org/project/openvswitch/patch/1572571718-83139-2-git-send-email-ankur.sharma@nutanix.com/ So, when we attach a FIP to an instance, this patch makes all FIPs stateless, and OVN configures the dnat_and_snat entry as a stateless rule, something like: ovn-nbctl --stateless lr-nat-add 473faace-478c-4841-b438-84c3ebaaa528 dnat_and_snat 10.10.54.129 7.7.7.68 4f408c64-7474-4f58-81ec-3f8abba72562 fa:16:3e:fd:50:3c However, this does not offload traffic in the egress direction right now. See Bz#2004995. Thanks
Hi, To clarify: 1. The original issue is that FIP traffic is not offloaded because of the stateful OVN/CT FIP implementation. 2. To work around #1, we suggested a patch that makes FIP traffic stateless, eliminating the CT NAT part. However, it seems that offload works in only one direction here as well. So now we are back to #1, and the question is: is it possible to reconsider the OVN/CT FIP implementation, which uses 2 CT zones while committing only one of them? Itai
Hi Terry, anything else you need here? Itai's comment above summarizes the situation here.
Some updates on the progress so far: 1. We have a fix in OVN to address this issue. The idea is to use just one zone for NATting in distributed routers. We would still need to use 2 zones if the packet involves both SNAT and DNAT (for hairpin traffic whose source and destination are on the same compute node). 2. Abhiram has tested this in his environment, and the traffic for the scenario mentioned in this BZ is getting offloaded. 3. The initial patch is here - https://github.com/numansiddique/ovn/tree/ct_nat_czone_v1/p2 The patch needs some work before it is posted for review.
Hi Numan, Thanks for the update. Can you please elaborate on the "SNAT + DNAT" use case? What exactly is the traffic flow here? How come SNAT (many-to-one NAT) and DNAT (one-to-one NAT) are used at the same time? Itai
Hi Itai, This can happen in the following scenario. Let's say you have a VM/pod - LP1 (10.0.0.3) on logical switch sw1, and it has a dnat_and_snat configured with external IP (floating IP in OpenStack terminology) 172.16.0.110. And there is another VM/pod - LP2 (20.0.0.3) on logical switch sw2, and it has floating IP 172.16.0.120, and both are connected to the same router. Let's assume both these VMs/pods are hosted on the same compute node. Suppose LP1 sends a pkt to the floating IP of LP2, i.e. ip.src = 10.0.0.3, ip.dst = 172.16.0.120. In this case, first the source IP will be SNATted to 172.16.0.110 in the SNAT ct zone. So the pkt becomes (ip.src = 172.16.0.110, ip.dst = 172.16.0.120). And then the pkt is DNATted to 20.0.0.3 in the DNAT ct zone (ip.src = 172.16.0.110, ip.dst = 20.0.0.3) and it is delivered to LP2. What I meant in my comment was that we need 2 zones for this scenario. Otherwise just one ct zone is enough. Thanks Numan
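The hairpin scenario above can be walked through with a toy model. This is only an illustrative sketch under stated assumptions: `Packet`, `FLOATING_IPS` and `route` are hypothetical names, the "snat"/"dnat" labels simply mark which translation a packet needed, and none of this reflects how OVN's logical flow pipeline is actually implemented.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


# dnat_and_snat entries from the comment: logical IP -> floating IP.
FLOATING_IPS = {"10.0.0.3": "172.16.0.110", "20.0.0.3": "172.16.0.120"}


def route(pkt):
    """Return the translated packet and the NAT steps it went through."""
    steps = []
    ext_to_logical = {ext: logical for logical, ext in FLOATING_IPS.items()}
    # SNAT: a local workload with a floating IP talks to a non-logical address.
    if pkt.src in FLOATING_IPS and pkt.dst not in FLOATING_IPS:
        pkt = replace(pkt, src=FLOATING_IPS[pkt.src])
        steps.append("snat")
    # DNAT: the destination is a floating IP owned by a local workload.
    if pkt.dst in ext_to_logical:
        pkt = replace(pkt, dst=ext_to_logical[pkt.dst])
        steps.append("dnat")
    return pkt, steps
```

With these assumptions, `route(Packet("10.0.0.3", "172.16.0.120"))` goes through both steps and ends up as (172.16.0.110, 20.0.0.3), matching the walk-through above, while traffic to or from a purely external address needs only one of the two. That single-step case is where, per the comment, one ct zone is enough.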
Thanks Numan, well described and clarified. Itai
Patches posted for review - https://patchwork.ozlabs.org/project/ovn/list/?series=270616
Latest update on testing with the patch: 1) If flow_steering_mode is set to ‘dmfs’, stateful NAT offload works without issue [verified on 4.18.0-305.22.1.el8]. 2) If ‘smfs’ is set, we observe the csum failure issue [the earlier suspicion was that we might be hitting BZ 1974356 - s_pf0vf2: hw csum failure for mlx5]. So I think the difference here comes from flow_steering_mode rather than from BZ 1974356.
v3 was accepted, https://patchwork.ozlabs.org/project/ovn/list/?series=272942&state=*
Is this merged downstream already perhaps?
When can we close this bz?
(In reply to Marcelo Ricardo Leitner from comment #45) > When can we close this bz? Its clone, Bz#2024599 for OSP, has been verified in 16.2.3. FDP folks can verify this in the latest OVN. Thanks
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days