Description of problem:

When the check_pkt_larger action is used on an older kernel whose datapath does not support the action, userspace handles the packet. However, the packet never actually makes it to the next action, which sends it to conntrack.

Version-Release number of selected component (if applicable):

openvswitch2.13-2.13.0-90.el7fdp.x86_64
RHEL 7.9

sh-4.2# uname -a
Linux ip-10-0-58-75.us-east-2.compute.internal 3.10.0-1160.31.1.el7.x86_64 #1 SMP Wed May 26 20:18:08 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

The scenario here is OVN with OCP, where a packet arrives at a node and is destined to be DNAT'ed by OVN after it passes through the OCP bridge br-ex. The topology is:

client ----> eth0---br-ex----br-int--- pod

The client sends an HTTP request to the node IP 10.0.55.195, port 32721. br-ex contains this flow:

cookie=0xa360a49979cae3e0, duration=1397.161s, table=0, n_packets=0, n_bytes=0, idle_age=16891, priority=100,tcp,in_port=1,tp_dst=32721 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)

The check_pkt_larger action causes the packet to be punted to userspace, and the following flow in OVN sends it to CT zone 29:

12. ip,metadata=0x19,nw_dst=10.0.55.195, priority 100, cookie 0xc0e55757
    ct(table=13,zone=NXM_NX_REG11[0..15])
    drop
     -> A clone of the packet is forked to recirculate. The forked pipeline will be resumed at table 13.
     -> Sets the packet to an untracked state, and clears all the conntrack fields.

Final flow: unchanged
Megaflow: recirc_id=0,ct_state=-new-est-rel-rpl-inv-trk,ct_label=0/0x3,eth,tcp,in_port=1,dl_src=02:3b:8f:03:17:04,dl_dst=02:18:09:84:de:de,nw_src=10.0.64.0/18,nw_dst=10.0.55.195,nw_ttl=64,nw_frag=no,tp_dst=32721
Datapath actions: ct_clear,ct(zone=29),recirc(0x70)
This flow is handled by the userspace slow path because it:
  - Uses action(s) not supported by datapath.

===============================================================================
recirc(0x70) - resume conntrack with default ct_state=trk|new (use --ct-next to customize)
===============================================================================

Flow: recirc_id=0x70,dp_hash=0x1,ct_state=new|trk,ct_zone=29,eth,tcp,reg0=0x218,reg1=0x984dede,reg9=0x4,reg11=0x1d,reg13=0x21,reg14=0x2,metadata=0x19,in_port=9,vlan_tci=0x0000,dl_src=02:3b:8f:03:17:04,dl_dst=02:18:09:84:de:de,nw_src=10.0.74.171,nw_dst=10.0.55.195,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=2222,tp_dst=32721,tcp_flags=0

However, the packet never hits kernel conntrack. We see no conntrack error counters incremented, no event in the log, and no conntrack entry. If we override the flow in br-ex so that it does not perform the check_pkt_larger check, the traffic works as expected.
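
For readers less familiar with the action, here is a minimal Python model of what check_pkt_larger(8915)->NXM_NX_REG0[0] is documented to do in ovs-actions(7): store 1 in the 1-bit destination if the packet is larger than the given length, otherwise store 0. The function name and the packet lengths in the usage lines are hypothetical and only illustrate the semantics; this is not OVS code.

```python
def check_pkt_larger(pkt_len: int, threshold: int, reg: int, bit: int = 0) -> int:
    """Model of check_pkt_larger(threshold)->REG[bit].

    Returns the register value with `bit` set to 1 when pkt_len is strictly
    larger than threshold, and cleared to 0 otherwise. Other bits are kept.
    """
    mask = 1 << bit
    if pkt_len > threshold:
        return reg | mask
    return reg & ~mask


if __name__ == "__main__":
    # Hypothetical packet lengths checked against the 8915 threshold from the br-ex flow.
    for length in (1500, 8915, 9000):
        reg0 = check_pkt_larger(length, 8915, reg=0)
        print(f"len={length} -> reg0[0]={reg0 & 1}")
```

In the failing case reported here the interesting part is not the bit value itself but that the action forces the flow onto the userspace slow path on this kernel.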
Created attachment 1791919 [details] output showing the datapath and ofproto traces
I think the urgency of this bug is reduced: we have decided that ovn-kubernetes will detect whether the OVS datapath supports the action and, if it does not, disable those flows: https://github.com/ovn-org/ovn-kubernetes/pull/2267
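
A rough sketch of that kind of detection, in Python rather than the Go used by ovn-kubernetes, and not taken from the linked pull request. It assumes that `ovs-appctl dpif/show-dp-features <bridge>` prints a line of the form "Check pkt length action: Yes" or "... No"; the exact label is an assumption and should be verified against the installed OVS version. The helper names are hypothetical.

```python
import re
import subprocess

# Assumed label of the feature line; verify against your OVS build.
_FEATURE_RE = re.compile(
    r"^\s*Check pkt length action:\s*(Yes|No)\s*$", re.MULTILINE
)


def parse_check_pkt_len_support(features_text: str) -> bool:
    """Return True only if the feature line is present and says Yes."""
    match = _FEATURE_RE.search(features_text)
    return match is not None and match.group(1) == "Yes"


def datapath_supports_check_pkt_len(bridge: str) -> bool:
    """Query the running ovs-vswitchd for the datapath features of `bridge`."""
    result = subprocess.run(
        ["ovs-appctl", "dpif/show-dp-features", bridge],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_check_pkt_len_support(result.stdout)
```

A caller could then skip installing the check_pkt_larger flow on br-ex when `datapath_supports_check_pkt_len("br-ex")` returns False. Treating a missing line as unsupported is a deliberately conservative choice.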
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days