Created attachment 1783946 [details]
full ovs flow dump

Description of problem:
Hostnetwork pod to external traffic is not working with ovn-kubernetes; the reply packet gets dropped.

Version-Release number of selected component (if applicable):
OS: 4.18.0-305.2.1.el8_4.x86_64
OVN: ovn2.13-20.12.0-115.el8fdp.x86_64
OVS: openvswitch2.15-2.15.0-15.el8fdp.x86_64
OVN-K8s master: 19fc45c2aad19065070c4622292b5f962a245357

//------------------- ORIG DIRECTION ----------------------//

recirc_id(0),in_port(br-ex),eth_type(0x0800),ipv4(dst=172.30.0.0/255.255.0.0,frag=no), packets:197, bytes:16326, used:0.270s, actions:ct(commit,zone=64001,nat(src=169.254.169.2)),recirc(0x1c763c)

CT ZONE 64001

recirc_id(0x1c763c),in_port(br-ex),ct_state(+new-est+trk),eth(src=98:03:9b:97:38:df,dst=3c:fd:fe:a0:d7:e1),eth_type(0x0800),ipv4(src=128.0.0.0/192.0.0.0,dst=172.30.0.1,proto=6,ttl=64,frag=no), packets:130, bytes:9620, used:0.277s, flags:S, actions:ct_clear,set(eth(dst=98:03:9b:97:38:df)),ct(zone=21),recirc(0x1c763d)

recirc_id(0x1c763d),in_port(br-ex),ct_state(+new+trk),eth(),eth_type(0x0800),ipv4(dst=172.30.0.1,proto=6,frag=no),tcp(dst=443), packets:130, bytes:9620, used:0.277s, flags:S, actions:hash(l4(0)),recirc(0x1cf7e5)

recirc_id(0x1cf7e5),dp_hash(0xe/0xf),in_port(br-ex),eth(),eth_type(0x0800),ipv4(frag=no), packets:5, bytes:370, used:7.957s, flags:S, actions:ct(commit,zone=21,label=0x2/0x2,nat(dst=10.0.1.12:6443)),recirc(0x1c763f)

CT ZONE 21

recirc_id(0x1c763f),in_port(br-ex),ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x1),eth(src=98:03:9b:97:38:df,dst=98:03:9b:97:38:df),eth_type(0x0800),ipv4(dst=10.0.1.12,ttl=64,frag=no), packets:9, bytes:666, used:7.446s, flags:S, actions:set(eth(dst=3c:fd:fe:b5:80:ac)),set(ipv4(ttl=63)),ct(commit,nat(src=10.0.1.13)),recirc(0x1cf7e6)

CT ZONE 0

recirc_id(0x1cf7e6),in_port(br-ex),ct_state(+new-est+trk),eth(src=98:03:9b:97:38:df,dst=3c:fd:fe:b5:80:ac),eth_type(0x0800),ipv4(dst=0.0.0.0/128.0.0.0,frag=no), packets:9, bytes:666, used:7.445s, flags:S, actions:ct_clear,ct(commit,zone=64000),ens801f1

//------------------- REPLY DIRECTION ----------------------//

recirc_id(0),in_port(ens801f1),eth_type(0x0800),ipv4(proto=6,frag=no), packets:245178, bytes:115601664, used:0.000s, actions:ct(zone=64000),recirc(0x9)

CT ZONE 64000

ct_state(+est+trk),recirc_id(0x9),in_port(ens801f1),eth(src=3c:fd:fe:b5:80:ac,dst=98:03:9b:97:38:df),eth_type(0x0800),ipv4(src=10.0.1.12,dst=10.0.1.13,proto=6,ttl=64,frag=no), packets:804, bytes:60778, used:0.140s, actions:ct_clear,ct(nat),recirc(0x1bfd57)

CT ZONE 0

recirc_id(0x1bfd57),in_port(ens801f1),ct_state(-new-est-trk),eth(),eth_type(0x0800),ipv4(frag=no), packets:102977, bytes:8012828, used:0.086s, flags:SFPR., actions:ct(zone=21,nat),recirc(0x1bfd58)

CT ZONE 21

recirc_id(0x1bfd58),in_port(ens801f1),ct_state(-new-est-rel-rpl+inv+trk),ct_label(0/0x1),eth(src=3c:fd:fe:b5:18:8c,dst=98:03:00:00:00:00/ff:ff:00:00:00:00),eth_type(0x0800),ipv4(dst=10.0.1.13,ttl=64,frag=no), packets:6422, bytes:475220, used:0.085s, flags:SF., actions:drop

How reproducible:

Steps to Reproduce:
1.
2.
3.
Actual results:

Expected results:

Additional info:

CX-5 ovs hardware offload:

[root@sriov-worker-0 core]# ethtool -i ens801f1
driver: mlx5e_rep
version: 4.18.0-305.2.1.el8_4.x86_64
firmware-version: 16.29.2002 (MT_0000000012)
expansion-rom-version:
bus-info: 0000:b0:00.1
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

[root@sriov-worker-0 core]# lspci -vv -nn -mm -s 0000:b0:00.1
Slot:    b0:00.1
Class:   Ethernet controller [0200]
Vendor:  Mellanox Technologies [15b3]
Device:  MT27800 Family [ConnectX-5] [1017]
SVendor: Mellanox Technologies [15b3]
SDevice: Mellanox ConnectX®-5 MCX516A-CCAT [0007]
NUMANode:
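For completeness, whether offload is actually in effect on a node can be double-checked with standard OVS commands (a minimal sketch, not from the original report; the interface names are the ones above):

  # confirm OVS is configured to push datapath flows to TC
  ovs-vsctl get Open_vSwitch . other_config:hw-offload

  # dp:tc flows with offloaded:yes are the ones handled outside the ovs datapath
  ovs-appctl dpctl/dump-flows -m | grep -c 'offloaded:yes'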
> //------------------- REPLY DIRECTION ----------------------//
>
> recirc_id(0),in_port(ens801f1),eth_type(0x0800),ipv4(proto=6,frag=no),
> packets:245178, bytes:115601664, used:0.000s,
> actions:ct(zone=64000),recirc(0x9)
>
> CT ZONE 64000
>
> ct_state(+est+trk),recirc_id(0x9),in_port(ens801f1),eth(src=3c:fd:fe:b5:80:
> ac,dst=98:03:9b:97:38:df),eth_type(0x0800),ipv4(src=10.0.1.12,dst=10.0.1.13,
> proto=6,ttl=64,frag=no), packets:804, bytes:60778, used:0.140s,
> actions:ct_clear,ct(nat),recirc(0x1bfd57)
>
> CT ZONE 0
>
> recirc_id(0x1bfd57),in_port(ens801f1),ct_state(-new-est-trk),eth(),
> eth_type(0x0800),ipv4(frag=no), packets:102977, bytes:8012828, used:0.086s,
> flags:SFPR., actions:ct(zone=21,nat),recirc(0x1bfd58)
>
> CT ZONE 21
>
> recirc_id(0x1bfd58),in_port(ens801f1),ct_state(-new-est-rel-rpl+inv+trk),
> ct_label(0/0x1),eth(src=3c:fd:fe:b5:18:8c,dst=98:03:00:00:00:00/ff:ff:00:00:
> 00:00),eth_type(0x0800),ipv4(dst=10.0.1.13,ttl=64,frag=no), packets:6422,
> bytes:475220, used:0.085s, flags:SF., actions:drop

Pasted the wrong flow, should be:

recirc_id(0x1bfd58),in_port(ens801f1),ct_state(-new-est-rel-rpl+inv+trk),ct_label(0/0x1),eth(src=3c:fd:fe:b5:80:ac,dst=98:03:00:00:00:00/ff:ff:00:00:00:00),eth_type(0x0800),ipv4(dst=10.0.1.13,ttl=64,frag=no), packets:794, bytes:59234, used:0.146s, flags:SFP., actions:drop

The above flow doesn't pass the packet either.
Considering the impacts of https://bugzilla.redhat.com/show_bug.cgi?id=1961097, I do believe this is a dupe/side effect of that one. Needs retesting after we backport that fix.
(In reply to Marcelo Ricardo Leitner from comment #2)
> Considering the impacts of
> https://bugzilla.redhat.com/show_bug.cgi?id=1961097, I do believe this is a
> dupe/side effect of that one. Needs retesting after we backport that fix.

Tested with kernel 4.18.0-305.3.1.el8_4.mr634_210522_0128.x86_64, the issue can be reproduced.

OS: 4.18.0-305.3.1.el8_4.mr634_210522_0128.x86_64
OVN: ovn2.13-20.12.0-115.el8fdp.x86_64
OVS: openvswitch2.15-2.15.0-15.el8fdp.x86_64
OVN-K8s master: 58b09851bfd564a09d7358b552a9d60bd25a7508

Hostnetwork pod IP: 10.0.1.13
Service backend IPs: 10.0.1.10/10.0.1.11/10.0.1.12

//------------------- ORIG DIRECTION ----------------------//

recirc_id(0),in_port(br-ex),eth_type(0x0800),ipv4(dst=172.30.0.0/255.255.0.0,frag=no), packets:6, bytes:444, used:0.670s, actions:ct(commit,zone=64001,nat(src=169.254.169.2)),recirc(0x15178)

CT ZONE 64001

ct_state(+new-est+trk),recirc_id(0x15178),in_port(br-ex),eth(src=98:03:9b:97:38:df,dst=3c:fd:fe:a0:d7:e1),eth_type(0x0800),ipv4(src=128.0.0.0/192.0.0.0,dst=172.30.0.1,proto=6,ttl=64,frag=no), packets:5, bytes:370, used:0.670s, actions:ct_clear,set(eth(dst=98:03:9b:97:38:df)),ct(zone=22),recirc(0x15179)

recirc_id(0x15179),in_port(br-ex),ct_state(+new+trk),eth(),eth_type(0x0800),ipv4(dst=172.30.0.1,proto=6,frag=no),tcp(dst=443), packets:6, bytes:444, used:0.686s, flags:S, actions:hash(l4(0)),recirc(0x1517a)

recirc_id(0x1517a),dp_hash(0x6/0xf),in_port(br-ex),eth(),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:ct(commit,zone=22,label=0x2/0x2,nat(dst=10.0.1.10:6443)),recirc(0x1517b)

CT ZONE 22

ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x1),recirc_id(0x1517b),in_port(br-ex),eth(src=98:03:9b:97:38:df,dst=98:03:9b:97:38:df),eth_type(0x0800),ipv4(dst=10.0.1.10,proto=6,ttl=64,frag=no), packets:0, bytes:0, used:4.510s, actions:set(eth(dst=3c:fd:fe:b5:18:8c)),set(ipv4(ttl=63)),ct(commit,nat(src=10.0.1.13)),recirc(0x17f30)

CT ZONE 0

ct_state(+new-est+trk),recirc_id(0x17f30),in_port(br-ex),eth(src=98:03:9b:97:38:df,dst=3c:fd:fe:b5:18:8c),eth_type(0x0800),ipv4(dst=0.0.0.0/128.0.0.0,frag=no), packets:0, bytes:0, used:4.510s, actions:ct_clear,ct(commit,zone=64000),ens801f1

//------------------- REPLY DIRECTION ----------------------//

recirc_id(0),in_port(ens801f1),eth_type(0x0800),ipv4(proto=6,frag=no), packets:383937, bytes:360298773, used:0.000s, actions:ct(zone=64000),recirc(0x8)

CT ZONE 64000

ct_state(+est+trk),recirc_id(0x8),in_port(ens801f1),eth(src=3c:fd:fe:b5:18:8c,dst=98:03:9b:97:38:df),eth_type(0x0800),ipv4(src=10.0.1.8/255.255.255.252,dst=10.0.1.13,proto=6,ttl=64,frag=no), packets:22, bytes:2783, used:0.290s, actions:ct_clear,ct(nat),recirc(0x10)

CT ZONE 0

ct_state(+new-est+trk),recirc_id(0x10),in_port(ens801f1),eth_type(0x0800),ipv4(dst=0.0.0.0/128.0.0.0,frag=no), packets:0, bytes:0, used:10.320s, actions:ct(zone=22,nat),recirc(0x11)

ct_state(-new-est+trk),recirc_id(0x10),in_port(ens801f1),eth_type(0x0800),ipv4(frag=no), packets:935292, bytes:915539357, used:0.010s, actions:ct(zone=22,nat),recirc(0x11)

CT ZONE 22

ct_state(-new-est-rel-rpl+inv+trk),ct_label(0/0x1),recirc_id(0x11),in_port(ens801f1),eth(src=3c:fd:fe:b5:18:8c,dst=98:03:00:00:00:00/ff:ff:00:00:00:00),eth_type(0x0800),ipv4(dst=10.0.1.13,ttl=64,frag=no), packets:8, bytes:480, used:0.290s, actions:drop
Created attachment 1786697 [details]
full ovs flow dump

full ovs flows for comment #3
Created attachment 1786859 [details]
ovs flow dump with -m

`ovs-appctl dpctl/dump-flows -m` flow dump on the same environment as comment #3
Flow analysis from attachment in comment #5

OS: 4.18.0-305.3.1.el8_4.mr634_210522_0128.x86_64
OVN: ovn2.13-20.12.0-115.el8fdp.x86_64
OVS: openvswitch2.15-2.15.0-15.el8fdp.x86_64
OVN-K8s master: 58b09851bfd564a09d7358b552a9d60bd25a7508

Hostnetwork pod IP: 10.0.1.13
Service backend IPs: 10.0.1.10/10.0.1.11/10.0.1.12

//------------------- ORIG DIRECTION ----------------------//

ufid:48cd5b5a-dde6-446f-8184-59d22e8db57b, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(br-ex),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=172.30.0.0/255.255.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:1325, bytes:104130, used:0.050s, dp:tc, actions:ct(commit,zone=64001,nat(src=169.254.169.2)),recirc(0x9b04)

CT ZONE 64001

ufid:83fe8393-8166-47d5-b89d-a1fc585f6dcc, skb_priority(0/0),skb_mark(0/0),ct_state(0x21/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0x9b04),dp_hash(0/0),in_port(br-ex),packet_type(ns=0/0,id=0/0),eth(src=98:03:9b:97:38:df,dst=3c:fd:fe:a0:d7:e1),eth_type(0x0800),ipv4(src=128.0.0.0/192.0.0.0,dst=172.30.0.1,proto=6,tos=0/0,ttl=64,frag=no),tcp(src=0/0,dst=0/0), packets:1050, bytes:77700, used:1.170s, dp:tc, actions:ct_clear,set(eth(dst=98:03:9b:97:38:df)),ct(zone=22),recirc(0x8c77)

CT ZONE 22

ufid:57ac030a-5800-4619-a3a4-f6c3377ee989, recirc_id(0x8c77),dp_hash(0/0),skb_priority(0/0),in_port(br-ex),skb_mark(0/0),ct_state(0x21/0x21),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=172.30.0.1,proto=6,tos=0/0,ttl=0/0,frag=no),tcp(src=0/0,dst=443),tcp_flags(0/0), packets:1052, bytes:77848, used:1.182s, flags:S, dp:ovs, actions:hash(l4(0)),recirc(0x8c78)

CT ZONE 0

ufid:ad539f8d-71a2-4a33-8e3b-6571e386d418, recirc_id(0x8c78),dp_hash(0xb/0xf),skb_priority(0/0),in_port(br-ex),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:0, bytes:0, used:never, dp:ovs, actions:ct(commit,zone=22,label=0x2/0x2,nat(dst=10.0.1.12:6443)),recirc(0x8c79)

CT ZONE 22

ufid:7c65a0c3-1d58-4553-b1bf-4f1b8e22614b, skb_priority(0/0),skb_mark(0/0),ct_state(0x21/0x3f),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0x8c79),dp_hash(0/0),in_port(br-ex),packet_type(ns=0/0,id=0/0),eth(src=98:03:9b:97:38:df,dst=98:03:9b:97:38:df),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=10.0.1.12,proto=6,tos=0/0,ttl=64,frag=no),tcp(src=0/0,dst=0/0), packets:0, bytes:0, used:1.810s, dp:tc, actions:set(eth(dst=3c:fd:fe:b5:80:ac)),set(ipv4(ttl=63)),ct(commit,nat(src=10.0.1.13)),recirc(0xcd08)

CT ZONE 0

ufid:48b66e8f-82e7-4558-8abc-e8c9a6118833, skb_priority(0/0),skb_mark(0/0),ct_state(0x21/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0xcd08),dp_hash(0/0),in_port(br-ex),packet_type(ns=0/0,id=0/0),eth(src=98:03:9b:97:38:df,dst=3c:fd:fe:b5:80:ac),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/128.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:0, bytes:0, used:1.810s, dp:tc, actions:ct_clear,ct(commit,zone=64000),ens801f1

//------------------- REPLY DIRECTION ----------------------//

ufid:1e81a69b-0a8a-4dba-aa27-5304508c8af9, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens801f1),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=6,tos=0/0,ttl=0/0,frag=no),tcp(src=0/0,dst=0/0), packets:9974662, bytes:1548850913, used:0.010s, offloaded:yes, dp:tc, actions:ct(zone=64000),recirc(0x9)

CT ZONE 64000

ufid:35896391-d93c-4e4a-9571-cd78baff869c, skb_priority(0/0),skb_mark(0/0),ct_state(0x22/0x22),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0x9),dp_hash(0/0),in_port(ens801f1),packet_type(ns=0/0,id=0/0),eth(src=3c:fd:fe:b5:80:ac,dst=98:03:9b:97:38:df),eth_type(0x0800),ipv4(src=10.0.1.12,dst=10.0.1.13,proto=6,tos=0/0,ttl=64,frag=no),tcp(src=0/0,dst=0/0), packets:54621, bytes:56577533, used:0.490s, offloaded:yes, dp:tc, actions:ct_clear,ct(nat),recirc(0x6)

CT ZONE 0

ufid:cf5092dc-ab93-4f8d-af78-1cdf73a3893d, skb_priority(0/0),skb_mark(0/0),ct_state(0x20/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0x6),dp_hash(0/0),in_port(ens801f1),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:3341399, bytes:756969215, used:0.070s, offloaded:yes, dp:tc, actions:ct(zone=22,nat),recirc(0x7)

CT ZONE 22

ufid:efc7f282-b9f4-4c84-9d48-e2706903f4a2, skb_priority(0/0),skb_mark(0/0),ct_state(0x30/0x3f),ct_zone(0/0),ct_mark(0/0),ct_label(0/0x1),recirc_id(0x7),dp_hash(0/0),in_port(ens801f1),packet_type(ns=0/0,id=0/0),eth(src=3c:fd:fe:b5:80:ac,dst=98:03:00:00:00:00/ff:ff:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=10.0.1.13,proto=0/0,tos=0/0,ttl=64,frag=no), packets:1049, bytes:62940, used:1.810s, dp:tc, actions:drop
> //------------------- REPLY DIRECTION ----------------------//
>
> CT ZONE 0
>
> ufid:cf5092dc-ab93-4f8d-af78-1cdf73a3893d,
> skb_priority(0/0),skb_mark(0/0),ct_state(0x20/0x23),ct_zone(0/0),ct_mark(0/
                                 ^^^^^^^^^^ -new-est+trk should not be offloaded
> 0),ct_label(0/0),recirc_id(0x6),dp_hash(0/0),in_port(ens801f1),
> packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,
> dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.
> 0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no),
> packets:3341399, bytes:756969215, used:0.070s, offloaded:yes, dp:tc,
                                                 ^^^^^^^^^^^^^ marked as offloaded:yes by ovs
> actions:ct(zone=22,nat),recirc(0x7)

Could the above flow result in a drop flow as below?

> CT ZONE 22
>
> ufid:efc7f282-b9f4-4c84-9d48-e2706903f4a2,
> skb_priority(0/0),skb_mark(0/0),ct_state(0x30/0x3f),ct_zone(0/0),ct_mark(0/
> 0),ct_label(0/0x1),recirc_id(0x7),dp_hash(0/0),in_port(ens801f1),
> packet_type(ns=0/0,id=0/0),eth(src=3c:fd:fe:b5:80:ac,dst=98:03:00:00:00:00/
                               ^^^^^^^^^^^^^^^^ match the src mac
> ff:ff:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=10.0.1.13,
> proto=0/0,tos=0/0,ttl=64,frag=no), packets:1049, bytes:62940, used:1.810s,
> dp:tc, actions:drop
         ^^^^ dropped
(In reply to zenghui.shi from comment #7)
> > //------------------- REPLY DIRECTION ----------------------//
> >
> > CT ZONE 0
> >
> > ufid:cf5092dc-ab93-4f8d-af78-1cdf73a3893d,
> > skb_priority(0/0),skb_mark(0/0),ct_state(0x20/0x23),ct_zone(0/0),ct_mark(0/
>                                  ^^^^^^^^^^ -new-est+trk should not be offloaded

This should be fine, actually. +new, +inv and +rel are the non-offloadable ones.
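For reference when decoding the ct_state(value/mask) pairs in the -m dumps: the bits are the OVS conntrack state flags from the kernel uapi (include/uapi/linux/openvswitch.h), so 0x20/0x23 reads as -new-est+trk and 0x30/0x3f as -new-est-rel-rpl+inv+trk, matching the decode above:

  #define OVS_CS_F_NEW         0x01  /* +new */
  #define OVS_CS_F_ESTABLISHED 0x02  /* +est */
  #define OVS_CS_F_RELATED     0x04  /* +rel */
  #define OVS_CS_F_REPLY_DIR   0x08  /* +rpl */
  #define OVS_CS_F_INVALID     0x10  /* +inv */
  #define OVS_CS_F_TRACKED     0x20  /* +trk */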
Although I can't explain the actions:drop, I guess Zenghui is also hitting the issues being fixed by
https://patchwork.ozlabs.org/project/ovn/patch/20210520230114.3697365-1-numans@ovn.org/
which is https://bugzilla.redhat.com/show_bug.cgi?id=1953278 and
https://bugzilla.redhat.com/show_bug.cgi?id=1956740
due to the CT zone swinging 22/0/22/0.

If it makes sense, OVN team, can we have a test package please?
(In reply to zenghui.shi from comment #6)
> Flow analysis from attachment in comment #5
>
> OS: 4.18.0-305.3.1.el8_4.mr634_210522_0128.x86_64

For easy reference and FTR, this is
https://gitlab.com/redhat/rhel/src/kernel/rhel-8/-/merge_requests/634/commits
https://gitlab.com/redhat/red-hat-ci-tools/kernel/cki-internal-pipelines/cki-internal-contributors/-/jobs/1284676077
(In reply to zenghui.shi from comment #6)
> Flow analysis from attachment in comment #5
>
> OS: 4.18.0-305.3.1.el8_4.mr634_210522_0128.x86_64
> OVN: ovn2.13-20.12.0-115.el8fdp.x86_64
> OVS: openvswitch2.15-2.15.0-15.el8fdp.x86_64
> OVN-K8s master: 58b09851bfd564a09d7358b552a9d60bd25a7508
>
> Hostnetwork pod IP: 10.0.1.13
> Service backend IPs: 10.0.1.10/10.0.1.11/10.0.1.12
>
> //------------------- ORIG DIRECTION ----------------------//
>
> ufid:48cd5b5a-dde6-446f-8184-59d22e8db57b,
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),
> ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(br-ex),packet_type(ns=0/0,
> id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:
> 00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=172.30.0.0/255.
> 255.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:1325, bytes:104130,
> used:0.050s, dp:tc,
> actions:ct(commit,zone=64001,nat(src=169.254.169.2)),recirc(0x9b04)
>
> CT ZONE 64001
>
> ufid:83fe8393-8166-47d5-b89d-a1fc585f6dcc,
> skb_priority(0/0),skb_mark(0/0),ct_state(0x21/0x23),ct_zone(0/0),ct_mark(0/
> 0),ct_label(0/0),recirc_id(0x9b04),dp_hash(0/0),in_port(br-ex),
> packet_type(ns=0/0,id=0/0),eth(src=98:03:9b:97:38:df,dst=3c:fd:fe:a0:d7:e1),
> eth_type(0x0800),ipv4(src=128.0.0.0/192.0.0.0,dst=172.30.0.1,proto=6,tos=0/0,
> ttl=64,frag=no),tcp(src=0/0,dst=0/0), packets:1050, bytes:77700,
> used:1.170s, dp:tc,
> actions:ct_clear,set(eth(dst=98:03:9b:97:38:df)),ct(zone=22),recirc(0x8c77)
>
> CT ZONE 22
>
> ufid:57ac030a-5800-4619-a3a4-f6c3377ee989,
> recirc_id(0x8c77),dp_hash(0/0),skb_priority(0/0),in_port(br-ex),skb_mark(0/
> 0),ct_state(0x21/0x21),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:
> 00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),
> eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=172.30.0.1,proto=6,tos=0/0,
> ttl=0/0,frag=no),tcp(src=0/0,dst=443),tcp_flags(0/0), packets:1052,
> bytes:77848, used:1.182s, flags:S, dp:ovs, actions:hash(l4(0)),recirc(0x8c78)
>
> CT ZONE 0
>
> ufid:ad539f8d-71a2-4a33-8e3b-6571e386d418,
> recirc_id(0x8c78),dp_hash(0xb/0xf),skb_priority(0/0),in_port(br-ex),
> skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),
> eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:
> 00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,
> proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:0, bytes:0, used:never, dp:ovs,
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ no pkt hit this rule
> actions:ct(commit,zone=22,label=0x2/0x2,nat(dst=10.0.1.12:6443)),
> recirc(0x8c79)
>

From conntrack entries, pkt is not dnated on the original direction:

Service IP: 172.30.0.1
Backend endpoint IPs: 10.0.1.10/10.0.1.11/10.0.1.12
Node IP: 10.0.1.13

ipv4 2 tcp 6 116 SYN_SENT src=169.254.169.2 dst=172.30.0.1 sport=12346 dport=443 [UNREPLIED] src=172.30.0.1 dst=169.254.169.2 sport=443 dport=29409 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=64001 use=2
^^^^^^^^^^^ It is expected that pkt goes to zone 22 and gets dnated, but the entry shows it is still in zone 64001

The expected entry is something like:

ipv4 2 tcp 6 8 CLOSE src=169.254.169.2 dst=172.30.0.1 sport=12345 dport=443 src=10.0.1.12 dst=169.254.169.2 sport=6443 dport=12345 zone=22
ipv4 2 tcp 6 116 SYN_SENT src=10.0.1.13 dst=172.30.0.1 sport=12346 dport=443 [UNREPLIED] src=172.30.0.1 dst=169.254.169.2 sport=443 dport=12346 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=64001 use=2
ipv4 2 tcp 6 116 SYN_SENT src=10.0.1.13 dst=172.30.0.1 sport=12346 dport=443 [UNREPLIED] src=172.30.0.1 dst=10.0.1.13 sport=443 dport=12346 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2
(In reply to zenghui.shi from comment #11)
> (In reply to zenghui.shi from comment #6)
...
> > CT ZONE 0
    ^^^^^^
> >
> > ufid:ad539f8d-71a2-4a33-8e3b-6571e386d418,
> > recirc_id(0x8c78),dp_hash(0xb/0xf),skb_priority(0/0),in_port(br-ex),
> > skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),
> > eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:
> > 00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,
> > proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:0, bytes:0, used:never, dp:ovs,
>                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ no pkt hit this rule
> > actions:ct(commit,zone=22,label=0x2/0x2,nat(dst=10.0.1.12:6443)),
> > recirc(0x8c79)
> >
> From conntrack entries, pkt is not dnated on the original direction:
                                 ^^^^^^^^^^^^^^^^^

(omitted the rest for simplicity)

Super! This matches the issue Ariel just root caused:
https://bugzilla.redhat.com/show_bug.cgi?id=1881824#c14

I'll have a new test kernel soon with his original patch.
New test kernel with Ariel's patch available here:
yum repo file: https://s3.upshift.redhat.com/DH-PROD-CKI/internal/310282800/repo-x86_64
Built from: https://gitlab.com/redhat/rhel/src/kernel/rhel-8/-/merge_requests/680
(In reply to Marcelo Ricardo Leitner from comment #13)
> New test kernel with Ariel's patch available here:
> yum repo file:
> https://s3.upshift.redhat.com/DH-PROD-CKI/internal/310282800/repo-x86_64
> Built from:
> https://gitlab.com/redhat/rhel/src/kernel/rhel-8/-/merge_requests/680

Issue remains with the new test kernel 4.18.0-305.4.1.el8_4.mr680_210527_0238.x86_64.

From conntrack entries, pkt is not dnated correctly on the original direction:

ipv4 2 tcp 6 4 SYN_RECV src=169.254.169.2 dst=172.30.0.1 sport=12346 dport=443 src=172.30.0.1 dst=169.254.169.2 sport=443 dport=62378 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=64001 use=2
^^^^^^^^ pkt is not dnated correctly, but we got SYN_RECV?

ipv4 2 tcp 6 64 SYN_SENT src=10.0.1.13 dst=172.30.0.1 sport=12346 dport=443 [UNREPLIED] src=172.30.0.1 dst=169.254.169.2 sport=443 dport=12346 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=64001 use=2

ipv4 2 tcp 6 64 SYN_SENT src=10.0.1.13 dst=172.30.0.1 sport=12346 dport=443 [UNREPLIED] src=172.30.0.1 dst=10.0.1.13 sport=443 dport=12346 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2
@zshi, as discussed in the meeting earlier, this is a scratch
ovn2.13 build including Mark's and Numan's v3 [0]:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=37028480

Regards,
Dumitru

[0] https://github.com/numansiddique/ovn/commit/97574a3844527e801ad01c4f3b5a5de6ce6abfec
(In reply to Dumitru Ceara from comment #15)
> @zshi, as discussed in the meeting earlier, this is a scratch
> ovn2.13 build including Mark's and Numan's v3 [0]:
>
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=37028480
>
> Regards,
> Dumitru
>
> [0]
> https://github.com/numansiddique/ovn/commit/97574a3844527e801ad01c4f3b5a5de6ce6abfec

Dumitru, thanks for the build!

The issue remains after upgrading ovn to ovn2.13-20.12.0-136.el8fdp.x86_64 and the kernel to 4.18.0-305.4.1.el8_4.mr680_210527_0238.x86_64; the symptom is the same as in comment #14.
Internally we are also seeing issues with host network pod to service backed by host network, tested with Mark's/Numan's V4 OVN patch.

Running:

# curl https://10.96.0.1/livez --insecure

The first time it works OK; the next execution takes a long time to complete.
On the wire I see packets without SNAT & DNAT applied:

10.96.0.1.https > 169.254.169.2.43752

Conntrack output:

tcp 6 src=169.254.169.2 dst=10.96.0.1 sport=37154 dport=443 src=10.11.0.10 dst=169.254.169.2 sport=6443 dport=37154 [ASSURED] mark=0 secctx=null zone=8 use=2
tcp 6 116 SYN_SENT src=10.11.0.11 dst=10.96.0.1 sport=37172 dport=443 [UNREPLIED] src=10.96.0.1 dst=169.254.169.2 sport=443 dport=37172 mark=0 secctx=null zone=64001 use=1
tcp 6 116 SYN_SENT src=10.11.0.11 dst=10.96.0.1 sport=37172 dport=443 [UNREPLIED] src=10.96.0.1 dst=10.11.0.11 sport=443 dport=37172 mark=0 secctx=null use=1
tcp 6 56 SYN_RECV src=169.254.169.2 dst=10.11.0.10 sport=42125 dport=6443 src=10.11.0.10 dst=10.11.0.11 sport=6443 dport=42125 mark=0 secctx=null use=1
tcp 6 src=169.254.169.2 dst=10.11.0.10 sport=37154 dport=6443 src=10.11.0.10 dst=10.11.0.11 sport=6443 dport=37154 [ASSURED] mark=0 secctx=null use=2
tcp 6 56 SYN_RECV src=10.11.0.11 dst=10.11.0.10 sport=42125 dport=6443 src=10.11.0.10 dst=10.11.0.11 sport=6443 dport=42125 mark=0 secctx=null zone=64000 use=1
tcp 6 56 SYN_RECV src=169.254.169.2 dst=10.96.0.1 sport=42125 dport=443 src=10.11.0.10 dst=169.254.169.2 sport=6443 dport=42125 mark=0 secctx=null zone=8 use=1
tcp 6 56 SYN_RECV src=169.254.169.2 dst=10.96.0.1 sport=37172 dport=443 src=10.96.0.1 dst=169.254.169.2 sport=443 dport=42125 mark=0 secctx=null zone=64001 use=1
(In reply to Adrian Chiris from comment #17)
> Internally we are also seeing issues with host network pod to service backed
> by host network . tested with Mark's/Numan's V4 OVN patch

With upstream or downstream kernel? I would assume upstream at this stage, but please confirm. Thanks
> With upstream or downstream kernel? I would assume upstream at this stage, but please confirm. Thanks

It's a downstream kernel, but not RH. However, it contains all the needed upstream kernel fixes/support that were mapped.
note to self: internal RM ticket 2648680
I have a theory on this.

After hours troubleshooting this with Zenghui today, one thing caught my eye: taking tcpdumps on br-ex is showing that the SYN packet already has the 1st SNAT done.

That happens because since 95255018a83e ("ovs-tc: allow offloading TC rules to egress qdiscs") ovs will use egress rules, to cope with the lack of representor ports. As TC rules on egress are executed BEFORE the taps, that means some TC rule got executed.

On today's tests, we always took a big while to re-run the test, so datapath flows would end up expiring. No specific reason for doing the tests like this, though. Yet, some other traffic could be lighting flows in the background.

Then, as Adrian noted:
(In reply to Adrian Chiris from comment #17)
> first time it works OK, the next execution takes a long time to complete
> on the wire I see packets without SNAT & DNAT applied.

Now my theory:

That's likely because the very first one was an upcall, handled entirely by vswitchd, while the subsequent ones trigger an unexpected situation. We didn't see un-NATed packets on the wire, but we did see a missing one on br-ex on the ingress side, where the last SNAT was not undone. Because:

When the egress rules attached to br-ex by the above hit a miss (like after doing the 1st SNAT), I don't see a way that it can tell OVS "hey I went up to chain X" like we have on the ingress side of it. So vswitchd handles it as if nothing ever happened over TC land!

__dev_queue_xmit
  sch_handle_egress
    tcf_classify              <--- listed below
  dev_hard_start_xmit
    xmit_one
      dev_queue_xmit_nit      <-- our tap point for tcpdump
      netdev_start_xmit
        __netdev_start_xmit
          ops->ndo_start_xmit (which is internal_dev_xmit, from vport-internal_dev.c)
            ovs_vport_receive
              ovs_dp_process_packet
                ovs_flow_tbl_lookup_stats returns NULL, no flow in dp:ovs, so
                ovs_dp_upcall  <--- with 0 knowledge that dp:tc already SNATed this packet.

int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
                 struct tcf_result *res, bool compat_mode)
{
        u32 last_executed_chain = 0;

        return __tcf_classify(skb, tp, tp, res, compat_mode,
                              &last_executed_chain);
}

Which explains why we saw 2 conntrack entries on today's tests for the same connection:

ipv4 2 tcp 6 59 SYN_RECV src=169.254.169.2 dst=172.30.0.1 sport=12345 dport=443 src=172.30.0.1 dst=169.254.169.2 sport=443 dport=56804 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=64001 use=2
ipv4 2 tcp 6 119 SYN_SENT src=192.168.111.25 dst=172.30.0.1 sport=12345 dport=443 [UNREPLIED] src=172.30.0.1 dst=169.254.169.2 sport=443 dport=12345 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=64001 use=2

Note that both are on zone=64001, and the 1st one has a bogus src ip. The 2nd one is the right one. The 1st one is created out of the issue above, when vswitchd handles it unaware of previous handling.

Sounds like we have 2 features colliding here. tc on egress and tc chain fallback for CT are not integrated in here.
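If it helps to confirm the theory, the egress rules that OVS attaches to br-ex (and their hit counters) can be inspected directly from tc; just a debugging suggestion, using the device names from this setup:

  # egress filters offloaded onto the bridge (these run before the tcpdump tap)
  tc -s filter show dev br-ex egress

  # for comparison, the ingress filters on the uplink
  tc -s filter show dev ens801f1 ingress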
Adding more people for awareness.

The above is likely fixable with a kernel patch on the tc/core stack (doesn't mean it's an easy fix!). Kernel ovs will already handle the chain information properly once it is available to it.
Hi, Marcelo.

Following your analysis, Roi suggested this fix, can you try it?

--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3973,7 +3973,8 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 	qdisc_skb_cb(skb)->post_ct = false;
 	mini_qdisc_bstats_cpu_update(miniq, skb);
 
-	switch (tcf_classify(skb, miniq->filter_list, &cl_res, false)) {
+	switch (tcf_classify_ingress(skb, miniq->block, miniq->filter_list,
+				     &cl_res, false)) {
 	case TC_ACT_OK:
 	case TC_ACT_RECLASSIFY:
 		skb->tc_index = TC_H_MIN(cl_res.classid);
Ariel shared on the mtg today that they found another bug around this. When mirred sends a packet towards an internal port, there is no scrubbing, and thus the skb may carry a previous conntrack state on it.

tcf_mirred_forward   (skb with CT info on it)
  netif_receive_skb
    netif_receive_skb_internal
      __netif_receive_skb
        ...

This affects packets going from a representor to an internal port.
(In reply to Alaa Hleihel (NVIDIA Mellanox) from comment #23)
> Hi, Marcelo.
>
> Following your analysis, Roi suggested this fix, can you try it?

Nice! Thanks. Yes. I'll build a test kernel as soon as Ariel shares the patch for the scrubbing issue.
He probably referred to this fix, Roi said it fixed host to pod.
The upstream patch might be different, but it would be nice to get initial testing.

    net: sched: act_mirred: Reset ct when reinserting skb into queue

    When we reinsert an skb back we should reset ct for reclassification.

    Signed-off-by: Roi Dayan <roid>

diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 5ae3e3197fb5..65560032b496 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -286,6 +286,8 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
 
 	/* let's the caller reinsert the packet, if possible */
 	if (use_reinsert) {
+		if (want_ingress)
+			nf_reset_ct(skb);
 		res->ingress = want_ingress;
 		if (skb_tc_reinsert(skb, res))
 			tcf_action_inc_overlimit_qstats(&m->common);
Yes, this one. Thanks.
(In reply to Alaa Hleihel (NVIDIA Mellanox) from comment #26)
> net: sched: act_mirred: Reset ct when reinserting skb into queue
>
> When we reinsert an skb back we should reset ct for reclassification.
...
> @@ -286,6 +286,8 @@ static int tcf_mirred_act(struct sk_buff *skb, const
> struct tc_action *a,
>
> 	/* let's the caller reinsert the packet, if possible */
> 	if (use_reinsert) {
> +		if (want_ingress)
> +			nf_reset_ct(skb);

Btw I wonder why not just call skb_scrub_packet() here.
It will overwrite skb->pkt_type, but that's what is used today in OVS cases, at least.
Test kernel is being built. It will be available in the draft MR below:

Commit list: https://gitlab.com/redhat/rhel/src/kernel/rhel-8/-/merge_requests/942/commits

This is based on latest 8.5 and also adds the fix for ct_label 0.
(In reply to Marcelo Ricardo Leitner from comment #29)
> Test kernel is being built. It will be available in the draft MR below:
> Commit list:
> https://gitlab.com/redhat/rhel/src/kernel/rhel-8/-/merge_requests/942/commits
>
> This is based on latest 8.5 and also added the fix for ct_label 0.

Infrastructure issues are preventing this build from completing.
(In reply to Marcelo Ricardo Leitner from comment #28)
> (In reply to Alaa Hleihel (NVIDIA Mellanox) from comment #26)
> > net: sched: act_mirred: Reset ct when reinserting skb into queue
> >
> > When we reinsert an skb back we should reset ct for reclassification.
> ...
> > @@ -286,6 +286,8 @@ static int tcf_mirred_act(struct sk_buff *skb, const
> > struct tc_action *a,
> >
> > 	/* let's the caller reinsert the packet, if possible */
> > 	if (use_reinsert) {
> > +		if (want_ingress)
> > +			nf_reset_ct(skb);
>
> Btw I wonder why not just call skb_scrub_packet() here.
> It will overwrite skb->pkt_type , but that's what is used today in OVS
> cases, at least.

I talked to Roi, he said that they started with scrub and saw that it fixed the issue.
However, they weren't sure if that was too much or not, so they thought about doing something smaller and then they switched to the reset.
Anyway, they didn't get a chance to fully test both ways and really decide about the right approach (that's why they didn't post it yet).

But if OVS does scrub, it sounds reasonable to do it here too.
It will be great if you could try it, and then you could even post the fix upstream, we'll ack it :)

Thanks for the help!
Alaa
I applied the kernel builds [1] to the openshift worker nodes and it fixed the issue (host networked pod cannot access k8s service backed by hostnetworked pod).

ovs: openvswitch2.15-2.15.0-24.el8fdp.x86_64
ovn: ovn2.13-20.12.0-140.el8fdp.x86_64
kernel: 4.18.0-322.el8.mr942_210708_1548.x86_64

[1]: https://s3.upshift.redhat.com/DH-PROD-CKI/internal/333962177/repo-x86_64.repo
(In reply to Alaa Hleihel (NVIDIA Mellanox) from comment #31)
> (In reply to Marcelo Ricardo Leitner from comment #28)
> > (In reply to Alaa Hleihel (NVIDIA Mellanox) from comment #26)
> > > net: sched: act_mirred: Reset ct when reinserting skb into queue
> > >
> > > When we reinsert an skb back we should reset ct for reclassification.
> > ...
> > > @@ -286,6 +286,8 @@ static int tcf_mirred_act(struct sk_buff *skb, const
> > > struct tc_action *a,
> > >
> > > 	/* let's the caller reinsert the packet, if possible */
> > > 	if (use_reinsert) {
> > > +		if (want_ingress)
> > > +			nf_reset_ct(skb);
> >
> > Btw I wonder why not just call skb_scrub_packet() here.
> > It will overwrite skb->pkt_type , but that's what is used today in OVS
> > cases, at least.
>
> I talked to Roi, he said that they started with scrub and saw that it fixed
> the issue.
> However, they weren't sure if that was too much or not, so they thought
> about doing something smaller and then they switched to the reset.
> Anyway, they didn't get a chance to fully test both ways and really decide
> about the right approach (that's why they didn't post it yet).
>
> But if OVS does scrub, it sounds reasonable to do it here too.

I think it does only when crossing net namespaces, but not interfaces.
https://elixir.bootlin.com/linux/latest/source/net/openvswitch/vport.c#L443

Now I'm wondering if:
- this is really an issue
- or an expected (and weird) behavior,
- or if OvS is also affected.
- or I am missing something :D

I seem to recall many flows having a ct_clear before hitting mirred. Maybe that's why.

> It will be great if you could try it, and then you could even post the fix
> upstream, we'll ack it :)

Yup, ok. Thanks!
(let's move this discussion to the other bug, bz1980532)
what about the egress chains restore? are we good? can we submit it upstream?
we're good, as in, we're working on it :) Davide is working on an upstreamable version of it at: https://bugzilla.redhat.com/show_bug.cgi?id=1980537
Thanks a lot guys :)
Folks, AFAICT this bz is only waiting for bz1980537 and bz1980532 to get applied downstream now.
I built an MR with both fixes so Zenghui can try again, now with the final fixes, and confirm that it works.
With that, I'll take this bug for now.

Zenghui, I'll share a more direct URL once the kernel is built.
https://gitlab.com/redhat/rhel/src/kernel/rhel-8/-/merge_requests/1127
There it is:
https://s3.upshift.redhat.com/DH-PROD-CKI/internal/351141150/repo-x86_64.repo
https://gitlab.com/redhat/red-hat-ci-tools/kernel/cki-internal-pipelines/cki-internal-contributors/-/jobs/1493253795/artifacts/browse/artifacts/repo/4.18.0-330.el8.mr1127_210810_2027.x86_64/
(In reply to Marcelo Ricardo Leitner from comment #39)
> There it is:
> https://s3.upshift.redhat.com/DH-PROD-CKI/internal/351141150/repo-x86_64.repo

Marcelo, the kernel (4.18.0-330.el8.mr1127_210810_2027.x86_64) doesn't fix this bug.

When I applied the above kernel, host network pod failed to communicate with service backed by host network pod. I then reverted back to previous kernel (4.18.0-322.el8.mr942_210708_1548.x86_64) which works.

What's the difference between these two kernels?
That is some unexpected news, Zenghui. I'm not sure why that happened. The only related difference is that now I used the patches that got accepted upstream. Hmm...
(In reply to zenghui.shi from comment #0)
> //------------------- ORIG DIRECTION ----------------------//
...
> recirc_id(0x1cf7e6),in_port(br-ex),ct_state(+new-est+trk),eth(src=98:03:9b:
                                     ^^^^^^^^^^^^^^
> 97:38:df,dst=3c:fd:fe:b5:80:ac),eth_type(0x0800),ipv4(dst=0.0.0.0/128.0.0.0,
> frag=no), packets:9, bytes:666, used:7.445s, flags:S,
> actions:ct_clear,ct(commit,zone=64000),ens801f1
                                         ^^^^^^^^

And I don't see a log on the bz but it should be simply swapped for the reply direction.

Original patch for clearing CT info was:

@@ -303,6 +306,8 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
 
 	/* let's the caller reinsert the packet, if possible */
 	if (use_reinsert) {
+		if (want_ingress)
+			nf_reset(skb);
 		res->ingress = want_ingress;
 		err = tcf_mirred_forward(res->ingress, skb);
 		if (err)

While now it is:

@@ -278,6 +278,9 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
 		goto out;
 	}
 
+	/* All mirred/redirected skbs should clear previous ct info */
+	nf_reset(skb2);
+
 	want_ingress = tcf_mirred_act_wants_ingress(m_eaction);
 	expects_nh = want_ingress || !m_mac_header_xmit;

Thing is, nothing should be using this information by when mirred is done with this setup, and it doesn't affect misses from dp:tc. Yet, this is the biggest delta between the test kernels.

Checking how OVS configs mirred:

nl_msg_put_flower_acts()
            if (i == flower->action_count - 1) {
                if (ingress) {
                    nl_msg_put_act_mirred(request, ifindex, TC_ACT_STOLEN,
                                          TCA_INGRESS_REDIR);    <---- hit this on reply
                } else {
                    nl_msg_put_act_mirred(request, ifindex, TC_ACT_STOLEN,
                                          TCA_EGRESS_REDIR);     <---- hit this on orig
                }
            } else {
                if (ingress) {
                    nl_msg_put_act_mirred(request, ifindex, TC_ACT_PIPE,
                                          TCA_INGRESS_MIRROR);
                } else {
                    nl_msg_put_act_mirred(request, ifindex, TC_ACT_PIPE,
                                          TCA_EGRESS_MIRROR);
                }
            }

because the output action is the last one on the action list.

With that:

	is_redirect = tcf_mirred_is_act_redirect(m_eaction);    // true in both cases
	use_reinsert = skb_at_tc_ingress(skb) && is_redirect &&
		       tcf_mirred_can_reinsert(retval);          // false for orig, true for reply
	if (!use_reinsert) {
		skb2 = skb_clone(skb, GFP_ATOMIC);

with the new patch:

	nf_reset(skb2);    // clears CT info on cloned packet if orig, or on the same packet if reply
			   // due to difference in skb_at_tc_ingress()

	want_ingress = tcf_mirred_act_wants_ingress(m_eaction);  // false for orig, true for reply

then with the previous patch:

	/* let's the caller reinsert the packet, if possible */
	if (use_reinsert) {              // false for orig traffic, so the change had no effect
+		if (want_ingress)        // true for reply traffic
+			nf_reset(skb);
		res->ingress = want_ingress;
		err = tcf_mirred_forward(res->ingress, skb);
	...
	}

	err = tcf_mirred_forward(want_ingress, skb2);   // triggered on orig traffic

The only difference I can tell is that the new patch is clearing CT info on the orig direction and the previous patch wasn't. Yet, as I mentioned earlier in the comment, this shouldn't affect this test.

This is puzzling.
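If it helps narrow down which mirred variant the traffic actually takes, the per-action statistics can also be read straight from tc; a debugging suggestion only, with the uplink name from the flows quoted above:

  # hit counters per mirred action instance (redirect vs mirror, ingress vs egress)
  tc -s actions ls action mirred

  # and the offloaded filters on the uplink ingress, where the reply path enters
  tc -s filter show dev ens801f1 ingress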
I'm afraid we either need another test or more debug info. I can revert one of the patches to the previous version, the one on clearing CT info above, and see how it goes. Or, we need dumps of datapath flows and so, like we had on comment #0, for the good and the bad kernel, so we can spot differences. This one is more promising. Zenghui, please let me know which one you prefer.
I wrote a reproducer here as close as I could, it failed on kernel-core-4.18.0-329.el8.x86_64. Then tested on the new test kernel, and it works.

OVS flows:

addf="ovs-ofctl add-flow ${ovs_name}"

# client -> server
$addf "table=0,ipv4,in_port=$ovs_name,actions=ct(zone=1,table=1,nat)"
$addf "table=1,ipv4,ct_state=+trk+new,actions=ct(zone=1,commit,nat(dst=12.0.0.1)),ct_clear,resubmit(,2)"
$addf "table=1,ipv4,ct_state=+trk+est,actions=ct_clear,resubmit(,2)"
$addf "table=2,ipv4,tcp,in_port=$ovs_name,actions=ct(zone=2,table=3,nat)"
$addf "table=3,ipv4,tcp,tcp_dst=3000,actions=drop"
$addf "table=3,ipv4,tcp,ct_state=+trk+new,actions=ct(zone=2,commit,nat(src=12.0.0.2)),resubmit(,4)"
$addf "table=3,ipv4,tcp,ct_state=+trk+est,actions=resubmit(,4)"
$addf "table=4,ipv4,in_port=$ovs_name,action=output:$REP"

# server -> client
$addf "table=0,ipv4,in_port=$REP,actions=ct(zone=2,table=11,nat)"
$addf "table=11,ipv4,tcp,ct_state=+trk+est,actions=ct_clear,ct(zone=1,table=12,nat)"
$addf "table=12,ipv4,in_port=$REP,action=output:$ovs_name"

$addf "table=0,ipv6,actions=drop"
$addf "table=0,actions=NORMAL"

And test to trigger a miss after a NAT:

sleep 10 | ip netns exec $VF nc -l 5000 &
pid=$!
ip netns exec $VF tcpdump -n -i $VF -w $VF.cap &
pid="$pid $!"
tcpdump -n -i $ovs_name -w $ovs_name.cap &
pid="$pid $!"
sleep 2

echo | nc -w 1 11.0.0.1 3000 || :
ovs-appctl dpctl/dump-flows -m

sleep 10 | nc 11.0.0.1 5000 &
sleep 1
grep 'zone=[12]' /proc/net/nf_conntrack || :
ovs-appctl dpctl/dump-flows -m
sleep 1
grep 'zone=[12]' /proc/net/nf_conntrack || :
sleep 1
grep 'zone=[12]' /proc/net/nf_conntrack || :

# uname -r
4.18.0-330.el8.mr1127_210810_2027.x86_64

# tshark -r br0.cap ip
Running as user "root" and group "root". This could be dangerous.
    1 0.000000     11.0.0.2 → 11.0.0.1     TCP 74 36864 → 3000 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2031774104 TSecr=0 WS=128
    2 1.012988     11.0.0.2 → 12.0.0.1     TCP 74 38066 → 5000 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2031775117 TSecr=0 WS=128
    3 1.043074     11.0.0.1 → 11.0.0.2     TCP 74 5000 → 38066 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=3731674343 TSecr=2031775117 WS=128
    4 1.043106     11.0.0.2 → 12.0.0.1     TCP 66 38066 → 5000 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=2031775147 TSecr=3731674343

#1: warm up packet
#2: DNAT already applied by TC
#3: we got a SYN/ACK

# tshark -r enp130s0f0v0.cap ip
Running as user "root" and group "root". This could be dangerous.
    6 2.480917     12.0.0.2 → 12.0.0.1     TCP 74 38066 → 5000 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2031775117 TSecr=0 WS=128
    7 2.480941     12.0.0.1 → 12.0.0.2     TCP 74 5000 → 38066 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=3731674343 TSecr=2031775117 WS=128
    8 2.511118     12.0.0.2 → 12.0.0.1     TCP 66 38066 → 5000 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=2031775147 TSecr=3731674343

While with -330.el8:

# tshark -r br0.cap ip
Running as user "root" and group "root". This could be dangerous.
    1 0.000000     11.0.0.2 → 11.0.0.1     TCP 74 54732 → 3000 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1895223696 TSecr=0 WS=128
    2 1.012617     11.0.0.2 → 12.0.0.1     TCP 74 47548 → 5000 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1895224709 TSecr=0 WS=128
    3 2.067455     11.0.0.2 → 12.0.0.1     TCP 74 [TCP Retransmission] 47548 → 5000 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1895225764 TSecr=0 WS=128

#1: warm up packet
#2: packet with DNAT applied by TC
#3: retrans
Created attachment 1815420 [details]
bz1961063/worker-3-flows-m.txt

OVS DATAPATH FLOW comparison:

worker-2 - working kernel: 4.18.0-322.el8.mr942_210708_1548.x86_64

//------------------- ORIG DIRECTION ----------------------//

recirc_id(0),in_port(br-ex),eth_type(0x0800),ipv4(dst=172.30.0.0/255.255.0.0,frag=no), packets:51, bytes:5447, used:2.170s, actions:ct(commit,zone=64001,nat(src=169.254.169.2)),recirc(0x3450d)

Zone 64001

ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x3),recirc_id(0x3450d),in_port(br-ex),eth(src=0c:42:a1:00:b6:9c,dst=52:54:00:a0:9d:83),eth_type(0x0800),ipv4(src=128.0.0.0/192.0.0.0,dst=172.30.0.1,proto=6,ttl=64,frag=no), packets:0, bytes:0, used:2.180s, actions:ct_clear,set(eth(dst=0c:42:a1:00:b6:9c)),ct(zone=45),recirc(0x3450e)

recirc_id(0x3450e),in_port(br-ex),ct_state(+new-est+trk),eth(),eth_type(0x0800),ipv4(dst=172.30.0.1,proto=6,frag=no),tcp(dst=443), packets:0, bytes:0, used:never, actions:hash(l4(0)),recirc(0x34541)

recirc_id(0x34541),dp_hash(0x5/0xf),in_port(br-ex),eth(),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:ct(commit,zone=45,label=0x2/0x2,nat(dst=192.168.111.22:6443)),recirc(0x3450f)

Zone 45

ct_state(+new-est-rel-rpl-inv+trk),ct_label(0x2/0x3),recirc_id(0x3450f),in_port(br-ex),eth(src=0c:42:a1:00:b6:9c,dst=0c:42:a1:00:b6:9c),eth_type(0x0800),ipv4(dst=192.168.111.22,proto=6,ttl=64,frag=no), packets:0, bytes:0, used:2.180s, actions:set(eth(dst=00:8a:2e:2b:9d:e0)),set(ipv4(ttl=63)),ct(commit,nat(src=192.168.111.26)),recirc(0x34510)

Zone 0

ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x3),recirc_id(0x34510),in_port(br-ex),eth(src=0c:42:a1:00:b6:9c,dst=00:8a:2e:2b:9d:e0),eth_type(0x0800),ipv4(dst=192.0.0.0/192.0.0.0,frag=no), packets:0, bytes:0, used:2.180s, actions:ct_clear,ct(commit,zone=64000),ens8f0

Zone 64000

//------------------- REPLY DIRECTION ----------------------//

recirc_id(0),in_port(ens8f0),eth_type(0x0800),ipv4(proto=6,frag=no), packets:5532000, bytes:4473169296, used:0.000s, actions:ct(zone=64000),recirc(0x8)

Zone 64000

ct_state(+est-rel+rpl-inv+trk),ct_label(0/0x3),recirc_id(0x8),in_port(ens8f0),eth(src=00:8a:2e:2b:9d:e0,dst=0c:42:a1:00:b6:9c),eth_type(0x0800),ipv4(src=192.168.111.16/255.255.255.248,dst=192.168.111.26,proto=6,ttl=64,frag=no), packets:499, bytes:403323, used:0.490s, actions:ct_clear,ct(nat),recirc(0x3429d)

Zone 0

ct_state(+est+trk),recirc_id(0x3429d),in_port(ens8f0),eth_type(0x0800),ipv4(dst=0.0.0.0/128.0.0.0,frag=no), packets:349, bytes:310513, used:0.490s, actions:ct(zone=45,nat),recirc(0x3429e)

Zone 45

ct_state(+est-rel+rpl-inv+trk),ct_label(0x2/0x3),recirc_id(0x3429e),in_port(ens8f0),eth(src=00:8a:2e:2b:9d:e0,dst=0c:42:a1:00:b6:9c),eth_type(0x0800),ipv4(src=172.30.0.0/255.255.0.0,dst=169.254.169.2,proto=6,ttl=64,frag=no), packets:26, bytes:17159, used:2.170s, actions:ct_clear,set(eth(src=0c:42:a1:00:b6:9c,dst=52:54:00:a0:9d:83)),set(ipv4(ttl=63)),ct(zone=64001,nat),recirc(0x3450c)

Zone 64001

recirc_id(0x3450c),in_port(ens8f0),eth(src=0c:42:a1:00:b6:9c,dst=52:54:00:a0:9d:83),eth_type(0x0800),ipv4(frag=no), packets:26, bytes:17159, used:2.170s, actions:set(eth(src=52:54:00:a0:9d:83,dst=0c:42:a1:00:b6:9c)),br-ex


worker-3 - NOT working kernel: 4.18.0-330.el8.mr1127_210810_2027.x86_64

//------------------- ORIG DIRECTION ----------------------//

recirc_id(0),in_port(br-ex),eth_type(0x0800),ipv4(dst=172.30.0.0/255.255.0.0,frag=no), packets:32, bytes:3752, used:2.840s, actions:ct(commit,zone=64001,nat(src=169.254.169.2)),recirc(0xe7)

Zone 64001

ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x3),recirc_id(0xe7),in_port(br-ex),eth(src=0c:42:a1:08:0a:da,dst=52:54:00:a0:9d:83),eth_type(0x0800),ipv4(src=128.0.0.0/192.0.0.0,dst=172.30.0.1,proto=6,ttl=64,frag=no), packets:0, bytes:0, used:2.851s, actions:ct_clear,set(eth(dst=0c:42:a1:08:0a:da)),ct(zone=53),recirc(0xe8)

recirc_id(0xe8),in_port(br-ex),ct_state(+new+trk),eth(),eth_type(0x0800),ipv4(dst=172.30.0.1,proto=6,frag=no),tcp(dst=443), packets:0, bytes:0, used:never, actions:hash(l4(0)),recirc(0x105)

recirc_id(0x105),dp_hash(0x3/0xf),in_port(br-ex),eth(),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:ct(commit,zone=53,label=0x2/0x2,nat(dst=192.168.111.20:6443)),recirc(0xea)

Zone 53

ct_state(+new-est-rel-rpl-inv+trk),ct_label(0x2/0x3),recirc_id(0xea),in_port(br-ex),eth(src=0c:42:a1:08:0a:da,dst=0c:42:a1:08:0a:da),eth_type(0x0800),ipv4(dst=192.168.111.20,proto=6,ttl=64,frag=no), packets:0, bytes:0, used:2.850s, actions:set(eth(dst=00:8a:2e:2b:9d:d8)),set(ipv4(ttl=63)),ct(commit,nat(src=192.168.111.27)),recirc(0xeb)

Zone 0

ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x3),recirc_id(0xeb),in_port(br-ex),eth(src=0c:42:a1:08:0a:da,dst=00:8a:2e:2b:9d:d8),eth_type(0x0800),ipv4(dst=192.0.0.0/192.0.0.0,frag=no), packets:0, bytes:0, used:2.850s, actions:ct_clear,ct(commit,zone=64000),ens8f0

Zone 64000

//------------------- REPLY DIRECTION ----------------------//

recirc_id(0),in_port(ens8f0),eth_type(0x0800),ipv4(proto=6,frag=no), packets:13728, bytes:11573335, used:0.000s, actions:ct(zone=64000),recirc(0x7)

Zone 64000

ct_state(+est-rel+rpl-inv+trk),ct_label(0/0x3),recirc_id(0x7),in_port(ens8f0),eth(src=00:8a:2e:2b:9d:d8,dst=0c:42:a1:08:0a:da),eth_type(0x0800),ipv4(src=192.168.111.16/255.255.255.248,dst=192.168.111.27,proto=6,ttl=64,frag=no), packets:9, bytes:8392, used:0.560s, actions:ct_clear,ct(nat),recirc(0x10)

Zone 0

ct_state(+est+trk),recirc_id(0x10),in_port(ens8f0),eth_type(0x0800),ipv4(dst=0.0.0.0/128.0.0.0,frag=no), packets:0, bytes:0, used:1.820s, actions:ct(zone=53,nat),recirc(0xe1)

Zone 53

ct_state(+est-rel+rpl-inv+trk),ct_label(0x2/0x3),recirc_id(0xe1),in_port(ens8f0),eth(src=00:8a:2e:2b:9d:d8,dst=0c:42:a1:08:0a:da),eth_type(0x0800),ipv4(src=172.30.0.0/255.255.0.0,dst=169.254.169.2,proto=6,ttl=64,frag=no), packets:8, bytes:8326, used:0.560s, actions:ct_clear,set(eth(src=0c:42:a1:08:0a:da,dst=52:54:00:a0:9d:83)),set(ipv4(ttl=63)),ct(zone=64001,nat),recirc(0xec)

Zone 64001

recirc_id(0xec),in_port(ens8f0),eth(src=0c:42:a1:08:0a:da,dst=52:54:00:a0:9d:83),eth_type(0x0800),ipv4(frag=no), packets:8, bytes:416, used:2.840s, actions:set(eth(src=52:54:00:a0:9d:83,dst=0c:42:a1:08:0a:da)),br-ex


CT comparison:

worker-2 - working kernel: 4.18.0-322.el8.mr942_210708_1548.x86_64

Use local port 12345: curl --insecure --local-port 12345 https://172.30.0.1:443

ipv4 2 tcp 6 src=169.254.169.2 dst=192.168.111.22 sport=12345 dport=6443 src=192.168.111.22 dst=192.168.111.26 sport=6443 dport=12345 [OFFLOAD] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=3
ipv4 2 tcp 6 src=169.254.169.2 dst=172.30.0.1 sport=12345 dport=443 src=192.168.111.22 dst=169.254.169.2 sport=6443 dport=12345 [OFFLOAD] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=45 use=3
ipv4 2 tcp 6 src=192.168.111.26 dst=192.168.111.22 sport=12345 dport=6443 src=192.168.111.22 dst=192.168.111.26 sport=6443 dport=12345 [OFFLOAD] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=64000 use=3
ipv4 2 tcp 6 src=192.168.111.26 dst=172.30.0.1 sport=12345 dport=443 src=172.30.0.1 dst=169.254.169.2 sport=443 dport=12345 [OFFLOAD] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=64001 use=3
ipv4 2 tcp 6 8 CLOSE src=192.168.111.26 dst=172.30.0.1 sport=12345 dport=443 src=172.30.0.1 dst=192.168.111.26 sport=443 dport=12345 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2

worker-3 - NOT working kernel: 4.18.0-330.el8.mr1127_210810_2027.x86_64

Use local port 12345: curl --insecure --local-port 12345 https://172.30.0.1:443

ipv4 2 tcp 6 src=192.168.111.27 dst=172.30.0.1 sport=12345 dport=443 src=172.30.0.1 dst=169.254.169.2 sport=443 dport=12345 [HW_OFFLOAD] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=64001 use=3
ipv4 2 tcp 6 431996 ESTABLISHED src=192.168.111.27 dst=172.30.0.1 sport=12345 dport=443 src=172.30.0.1 dst=192.168.111.27 sport=443 dport=12345 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2
ipv4 2 tcp 6 src=169.254.169.2 dst=192.168.111.20 sport=12345 dport=6443 src=192.168.111.20 dst=192.168.111.27 sport=6443 dport=12345 [HW_OFFLOAD] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=3
ipv4 2 tcp 6 src=169.254.169.2 dst=172.30.0.1 sport=12345 dport=443 src=192.168.111.20 dst=169.254.169.2 sport=6443 dport=12345 [HW_OFFLOAD] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=53 use=3
ipv4 2 tcp 6 src=192.168.111.27 dst=192.168.111.20 sport=12345 dport=6443 src=192.168.111.20 dst=192.168.111.27 sport=6443 dport=12345 [HW_OFFLOAD] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=64000 use=3
(In reply to Marcelo Ricardo Leitner from comment #43)
> I'm afraid we either need another test or more debug info. I can revert one
> of the patches to the previous version, the one on clearing CT info above,
> and see how it goes.
>
> Or, we need dumps of datapath flows and so, like we had on comment #0, for
> the good and the bad kernel, so we can spot differences. This one is more
> promising.
>
> Zenghui, please let me know which one you prefer.

Added comparisons of ovs datapath flows and CT between the working node (worker-2) and the non-working node (worker-3) in comment #45.

Full ovs datapath flows are attached as a tarball in comment #45:

bz1961063/worker-3-flows-m.txt    -- worker-3 ovs datapath dump with -m
bz1961063/worker-3-flows-name.txt -- worker-3 ovs datapath dump with --names
bz1961063/worker-2-flows-m.txt    -- worker-2 ovs datapath dump with -m
bz1961063/worker-2-flows-name.txt -- worker-2 ovs datapath dump with --names
The only difference I could spot so far between the working and non-working setups is that on the non-working one, the CT entries are HW_OFFLOAD while on the working setup they were just OFFLOAD (which is not in HW). This means that despite nearly all datapath flows being installed on dp:tc and (this one I didn't check one by one yet) offloaded:yes, they are not processed in HW because the ct action would be a miss on the very first ct action already.

Between both kernels, the differences in tc and netfilter are not big and quite known. The effect above reminded me of 0cc254e5aa37 ("net/sched: act_ct: Offload connections with commit action") but a) there are no commits in the reply direction and b) it's present in the working kernel (bz1965817). Point being, on the changes in tc and netfilter, I can't tell one that could lead to such a difference in the CT entries above.

One thing I had overlooked is that there is also a major driver rebase between these two kernels. Maybe something in it.
I'll rebase the non-working kernel on the same branch point as the working one, to minimize all changes to just these two commits, see how it goes and then go from there. So maybe we can say that the commits work, and something else broke it again, or maybe the new commits are simply not working well.

What I don't fully understand, and don't like, in BOTH setups, is a weird CT entry on zone 0. Both seem to have an extra CT entry on it.

Working:
ipv4 2 tcp 6 8 CLOSE src=192.168.111.26 dst=172.30.0.1 sport=12345 dport=443 src=172.30.0.1 dst=192.168.111.26 sport=443 dport=12345 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2

Non-working:
ipv4 2 tcp 6 431996 ESTABLISHED src=192.168.111.27 dst=172.30.0.1 sport=12345 dport=443 src=172.30.0.1 dst=192.168.111.27 sport=443 dport=12345 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2

It seems an entry created by netfilter itself. It would be best if OVN could refrain from using zone 0.
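As a quick way to redo the OFFLOAD vs HW_OFFLOAD comparison above on both workers, the conntrack table can be filtered for the test connection directly (a sketch; local port 12345 is the one used by the curl commands above):

  grep 'port=12345' /proc/net/nf_conntrack

  # or, with conntrack-tools installed
  conntrack -L | grep 12345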
(In reply to Marcelo Ricardo Leitner from comment #47)
> One thing I had overlooked is that there is also a major driver rebase
> between these two kernels. Maybe something in it.
> I'll rebase the non-working kernel on the same branch point as the working
> one, to minimize all changes to just these two commits, see how it goes and
> then go from there. So maybe we can say that the commits work, and something
> else broke it again, or maybe the new commits are simply not working well.

There it goes:
https://s3.upshift.redhat.com/DH-PROD-CKI/internal/356198025/repo-x86_64.repo
https://s3.upshift.redhat.com/DH-PROD-CKI/internal/356198025/x86_64/4.18.0-333.el8.mr1196_210820_0050.x86_64

This one is based on the same branch point as the working one, and really the only difference is the 2 refreshed tc patches.
(In reply to Marcelo Ricardo Leitner from comment #48)
> (In reply to Marcelo Ricardo Leitner from comment #47)
> > One thing I had overlooked is that there is also a major driver rebase
> > between these two kernels. Maybe something in it.
> > I'll rebase the non-working kernel on the same branch point as the working
> > one, to minimize all changes to just these two commits, see how it goes and
> > then go from there. So maybe we can say that the commits work, and something
> > else broke it again, or maybe the new commits are simply not working well.
>
> There it goes:
> https://s3.upshift.redhat.com/DH-PROD-CKI/internal/356198025/repo-x86_64.repo
> https://s3.upshift.redhat.com/DH-PROD-CKI/internal/356198025/x86_64/4.18.0-
> 333.el8.mr1196_210820_0050.x86_64

This new kernel works.

By just replacing the kernel with 4.18.0-333.el8.mr1196_210820_0050.x86_64, the issue disappeared. All other relevant components (ovn-k8s, ovn and ovs) stay unchanged in the testing.

> This one is based on the same branch point as the working one, and really
> the only difference is the 2 refreshed tc patches.
That's good. Thanks Zenghui. So the 8.4.z's of those bugs should still be sound. We still need to understand what broke it in 8.5 and fix it, though.
I reviewed the commits between the good and bad branching points again, and the few non-driver-related ones don't even seem related to me. I know this flow is not supposed to be offloaded, but it really seems that the driver update affected it.

Alaa, WDYT?

One way of checking it is with:

git log --oneline gl8u/merge-requests/1196..gl8u/merge-requests/1127 -- net/ drivers/net/ethernet/mellanox/mlx5

with 'gl8u' being your git remote for the gitlab rhel8 repository, and:

[remote "gl8u"]
	url = git:redhat/rhel/src/kernel/rhel-8.git
	fetch = +refs/heads/*:refs/remotes/gl8u/*
	fetch = +refs/merge-requests/*/head:refs/remotes/gl8u/merge-requests/*

MR 1196: test kernel (with backported upstream patches) using the original branch point, works.
MR 1127: test kernel (with backported upstream patches) using a refreshed branch point, doesn't work.
Latest testing revealed that it is now working with the latest 8.5 kernel and the 8.4.z MR kernel for bz1992230.

With that, as discussed with Zenghui and Ariel, the next steps here are to just wait for the z-stream above to be merged into an official build, re-test, make sure it works, and close this bz as CURRENTRELEASE. I'm taking the bz as a placeholder till then.

Thanks folks!
Zenghui confirmed that with the official build for bz1992230 this use case is working in 8.4.z. As it is also working on 8.5.0, let's finally close this bz! Thanks folks.

Side note: a new bug was found while trying this:
https://bugzilla.redhat.com/show_bug.cgi?id=2014027
Some conntrack entries linger after the test.
*** Bug 1908570 has been marked as a duplicate of this bug. ***