Description of problem:

I have a svc and a backend pod. Usually, curling the service from the backend pod works well (the service IP gets DNATted to the pod's own IP), but if I add an allow ACL like this, hairpinning doesn't work anymore:

_uuid               : 9c9da6b5-e8f7-4c8a-a72f-a8ef9ca82acf
action              : allow
direction           : from-lport
external_ids        : {egressFirewall=efDefaultACL}
label               : 0
log                 : false
match               : "ip4.dst == 10.244.0.0/16"
meter               : acl-logging
name                : egressFirewallDefaultAllowClusterSubnets
options             : {apply-after-lb="true"}
priority            : 9001
severity            : []

How reproducible:
Always

Steps to Reproduce:
1. Add the given ACL to the clusterPortGroup.
2. Create a svc and pod:

apiVersion: v1
kind: Pod
metadata:
  name: webserver
  labels:
    backend: "true"
spec:
  containers:
  - name: client
    image: k8s.gcr.io/e2e-test-images/agnhost:2.26
    args: ["netexec", "--http-port=8083", "--udp-port=8081"]
---
kind: Service
apiVersion: v1
metadata:
  name: svc1
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8083
  selector:
    backend: "true"
  type: ClusterIP

3. Run from the webserver container:

curl http://<svc cluster ip>:80/hostname

Actual results:
Times out.

Expected results:
Returns "webserver".

Additional info:
Slack thread: https://ovn-org.slack.com/archives/C010SQ5FSNL/p1656662501189769
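For reference, step 1 can be reproduced with plain ovn-nbctl. A minimal sketch, assuming the cluster-wide port group is named clusterPortGroup (priority and match are taken from the ACL record above; the extra metadata that ovn-kubernetes sets, such as the name, external_ids, and the acl-logging meter, is not needed to hit the bug):

# Sketch: attach an equivalent apply-after-lb allow ACL to the cluster port group
ovn-nbctl --type=port-group --apply-after-lb acl-add clusterPortGroup \
    from-lport 9001 'ip4.dst == 10.244.0.0/16' allow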
Hi Nadia, could you please provide the OVN northbound database from the cluster where this error is occurring? Thanks!

Also, just so we can see how it affects things, what happens if you remove "options:apply-after-lb=true" from the ACL? Does the traffic pass in this case?
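(A sketch of one way to test that, assuming the ACL UUID from the description above; clear the option, let ovn-northd/ovn-controller recompute the flows, and then retry the curl from the pod:)

# Sketch: drop only the apply-after-lb option from the existing ACL, then re-test
ovn-nbctl remove ACL 9c9da6b5-e8f7-4c8a-a72f-a8ef9ca82acf options apply-after-lb
ovn-nbctl --wait=hv sync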
Created attachment 1893982 [details] NB DB
Created attachment 1893983 [details] SB DB
Created attachment 1893984 [details] OVS DB
(In reply to Mark Michelson from comment #1)
> Hi Nadia, could you please provide the OVN northbound database from the
> cluster where this error is occurring? Thanks!
> 
> Also, just so we can see how it affects things, what happens if you remove
> "options:apply-after-lb=true" from the ACL? Does the traffic pass in this
> case?

I tried out the DBs from Nadia's setup, and indeed the problem is that the "apply-after-lb" ACL is executed before the ls_in_*_hairpin stages.

Diffing the "ofproto/trace | ovn-detrace" output between the non-working case and the working case (i.e., after the ACL was removed), we see:

A. working:
-----------
19. ct_state=+new+trk,tcp,metadata=0x5,nw_dst=10.96.43.56,tp_dst=80, priority 120, cookie 0x36a94ce6
    set_field:0/0x2000000000000000000000000->xxreg0
    set_field:0xa602b380000000000000000/0xffffffff0000000000000000->xxreg0
    set_field:0x5000000000/0xffff00000000->xxreg0
    group:5
     -> using bucket 0
    bucket 0
        ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=10.244.1.6:8083),exec(set_field:0x2/0x2->ct_mark))
        nat(dst=10.244.1.6:8083)
        set_field:0x2/0x2->ct_mark
         -> A clone of the packet is forked to recirculate. The forked pipeline will be resumed at table 20.
         -> Sets the packet to an untracked state, and clears all the conntrack fields.
  * Logical datapaths:
  *     "ovn-worker" (63ab56b7-59f7-42f6-b991-86462f39176a) [ingress]
  *     "ovn-control-plane" (defc7dfa-c1b7-4cd9-90e1-0bab83d2f44e) [ingress]
  *     "ovn-worker2" (f128ef00-66a0-477b-9aa2-9d72e26f2063) [ingress]
  * Logical flow: table=11 (ls_in_lb), priority=120, match=(ct.new && ip4.dst == 10.96.43.56 && tcp.dst == 80), actions=(reg0[1] = 0; reg1 = 10.96.43.56; reg2[0..15] = 80; ct_lb_mark(backends=10.244.1.6:8083);)
  * Load Balancer: Service_test/svc1_TCP_cluster protocol ['tcp'] vips {'10.96.43.56:80': '10.244.1.6:8083'} ip_port_mappings {}

Final flow: recirc_id=0x4e,dp_hash=0x1,ct_state=new|trk,eth,tcp,reg0=0x285,reg1=0xa602b38,reg2=0x50,reg11=0x7,reg12=0x3,reg14=0x1,metadata=0x5,in_port=5,vlan_tci=0x0000,dl_src=0a:58:0a:f4:01:06,dl_dst=0a:58:0a:f4:01:01,nw_src=10.244.1.6,nw_dst=10.96.43.56,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=46880,tp_dst=80,tcp_flags=syn
Megaflow: recirc_id=0x4e,dp_hash=0x1/0xf,ct_state=+new-est-rel-rpl-inv+trk,ct_mark=0/0x1,eth,tcp,in_port=5,dl_dst=0a:58:0a:f4:01:01,nw_dst=10.96.43.56,nw_frag=no,tp_dst=80
Datapath actions: ct(commit,mark=0x2/0x2,nat(dst=10.244.1.6:8083)),recirc(0x4f)

===============================================================================
recirc(0x4f) - resume conntrack with default ct_state=trk|new (use --ct-next to customize)
Replacing src/dst IP/ports to simulate NAT:
 Initial flow: nw_src=10.244.1.6,tp_src=46880,nw_dst=10.96.43.56,tp_dst=80
 Modified flow: nw_src=10.244.1.6,tp_src=46880,nw_dst=10.244.1.6,tp_dst=8083
===============================================================================

Flow: recirc_id=0x4f,dp_hash=0x1,ct_state=new|trk,ct_mark=0x2,eth,tcp,reg0=0x285,reg1=0xa602b38,reg2=0x50,reg11=0x7,reg12=0x3,reg14=0x1,metadata=0x5,in_port=5,vlan_tci=0x0000,dl_src=0a:58:0a:f4:01:06,dl_dst=0a:58:0a:f4:01:01,nw_src=10.244.1.6,nw_dst=10.244.1.6,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=46880,tp_dst=8083,tcp_flags=syn

bridge("br-int")
----------------
    thaw
        Resuming from table 20
[..]
21. metadata=0x5, priority 0, cookie 0x3de23d56
    resubmit(,22)
  * Logical datapaths:
  *     "ext_ovn-worker2" (1f39c1c7-f5a0-4db0-a11d-037447aa46c5) [ingress]
  *     "ext_ovn-control-plane" (5d7829d3-ede5-4a33-b105-6a857b4eec5a) [ingress]
  *     "ovn-worker" (63ab56b7-59f7-42f6-b991-86462f39176a) [ingress]
  *     "ext_ovn-worker" (c5b22ec7-2554-439a-8368-f335648f6327) [ingress]
  *     "join" (d29aa4e7-b1ec-4e3e-ae31-9abf5405b195) [ingress]
  *     "ovn-control-plane" (defc7dfa-c1b7-4cd9-90e1-0bab83d2f44e) [ingress]
  *     "ovn-worker2" (f128ef00-66a0-477b-9aa2-9d72e26f2063) [ingress]
  * Logical flow: table=13 (ls_in_stateful), priority=0, match=(1), actions=(next;)
22. ct_state=+trk,ip,metadata=0x5, priority 100, cookie 0xd26533d4
    set_field:0/0x80->reg10
    resubmit(,68)
  * Logical datapaths:
  *     "ovn-worker" (63ab56b7-59f7-42f6-b991-86462f39176a) [ingress]
  *     "ovn-control-plane" (defc7dfa-c1b7-4cd9-90e1-0bab83d2f44e) [ingress]
  *     "ovn-worker2" (f128ef00-66a0-477b-9aa2-9d72e26f2063) [ingress]
  * Logical flow: table=14 (ls_in_pre_hairpin), priority=100, match=(ip && ct.trk), actions=(reg0[6] = chk_lb_hairpin(); reg0[12] = chk_lb_hairpin_reply(); next;)
    68. ct_mark=0x2/0x2,tcp,reg1=0xa602b38,reg2=0x50/0xffff,nw_src=10.244.1.6,nw_dst=10.244.1.6,tp_dst=8083, priority 100, cookie 0x11a30fba
        set_field:0x80/0x80->reg10
        learn(table=69,delete_learned,cookie=0x11a30fba,OXM_OF_METADATA[],eth_type=0x800,NXM_OF_IP_SRC[],ip_dst=10.96.43.56,nw_proto=6,NXM_OF_TCP_SRC[]=NXM_OF_TCP_DST[],load:0x1->NXM_NX_REG10[7])
         -> table=69 tcp,metadata=0x5,nw_src=10.244.1.6,nw_dst=10.96.43.56,tp_src=8083 priority=32768 cookie=0x11a30fba actions=load:0x1->NXM_NX_REG10[7]
        move:NXM_NX_REG10[7]->NXM_NX_XXREG0[102]
         -> NXM_NX_XXREG0[102] is now 0x1
        set_field:0/0x80->reg10
        resubmit(,69)
      * Load Balancer: Service_test/svc1_TCP_cluster protocol ['tcp'] vips {'10.96.43.56:80': '10.244.1.6:8083'}
[...]
23. ct_state=+new+trk,ip,reg0=0x40/0x40,metadata=0x5, priority 100, cookie 0xcf0a3420
    resubmit(,70)
  * Logical datapaths:
  *     "ovn-worker" (63ab56b7-59f7-42f6-b991-86462f39176a) [ingress]
  *     "ovn-control-plane" (defc7dfa-c1b7-4cd9-90e1-0bab83d2f44e) [ingress]
  *     "ovn-worker2" (f128ef00-66a0-477b-9aa2-9d72e26f2063) [ingress]
  * Logical flow: table=15 (ls_in_nat_hairpin), priority=100, match=(ip && ct.new && ct.trk && reg0[6] == 1), actions=(ct_snat_to_vip; next;)
    <<<< This enables LB Hairpin to work, ensuring return traffic comes back via OVN.
    70. tcp,reg1=0xa602b38,reg2=0x50/0xffff, priority 100, cookie 0x11a30fba
        ct(commit,zone=NXM_NX_REG12[0..15],nat(src=10.96.43.56))
        nat(src=10.96.43.56)
         -> Sets the packet to an untracked state, and clears all the conntrack fields.
        resubmit(,24)
      * Load Balancer: Service_test/svc1_TCP_cluster protocol ['tcp'] vips {'10.96.43.56:80': '10.244.1.6:8083'}

B. non-working:
---------------
19. ct_state=+new+trk,tcp,metadata=0x5,nw_dst=10.96.43.56,tp_dst=80, priority 120, cookie 0x36a94ce6
    set_field:0/0x2000000000000000000000000->xxreg0
    set_field:0xa602b380000000000000000/0xffffffff0000000000000000->xxreg0
    set_field:0x5000000000/0xffff00000000->xxreg0
    group:5
     -> using bucket 0
    bucket 0
        ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=10.244.1.6:8083),exec(set_field:0x2/0x2->ct_mark))
        nat(dst=10.244.1.6:8083)
        set_field:0x2/0x2->ct_mark
         -> A clone of the packet is forked to recirculate. The forked pipeline will be resumed at table 20.
         -> Sets the packet to an untracked state, and clears all the conntrack fields.
  * Logical datapaths:
  *     "ovn-worker" (63ab56b7-59f7-42f6-b991-86462f39176a) [ingress]
  *     "ovn-control-plane" (defc7dfa-c1b7-4cd9-90e1-0bab83d2f44e) [ingress]
  *     "ovn-worker2" (f128ef00-66a0-477b-9aa2-9d72e26f2063) [ingress]
  * Logical flow: table=11 (ls_in_lb), priority=120, match=(ct.new && ip4.dst == 10.96.43.56 && tcp.dst == 80), actions=(reg0[1] = 0; reg1 = 10.96.43.56; reg2[0..15] = 80; ct_lb_mark(backends=10.244.1.6:8083);)
  * Load Balancer: Service_test/svc1_TCP_cluster protocol ['tcp'] vips {'10.96.43.56:80': '10.244.1.6:8083'} ip_port_mappings {}

Final flow: recirc_id=0x51,dp_hash=0x1,ct_state=new|trk,eth,tcp,reg0=0x285,reg1=0xa602b38,reg2=0x50,reg11=0x7,reg12=0x3,reg14=0x1,metadata=0x5,in_port=5,vlan_tci=0x0000,dl_src=0a:58:0a:f4:01:06,dl_dst=0a:58:0a:f4:01:01,nw_src=10.244.1.6,nw_dst=10.96.43.56,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=46880,tp_dst=80,tcp_flags=syn
Megaflow: recirc_id=0x51,dp_hash=0x1/0xf,ct_state=+new-est-rel-rpl-inv+trk,ct_mark=0/0x1,eth,tcp,in_port=5,dl_dst=0a:58:0a:f4:01:01,nw_dst=10.96.43.56,nw_frag=no,tp_dst=80
Datapath actions: ct(commit,mark=0x2/0x2,nat(dst=10.244.1.6:8083)),recirc(0x52)

===============================================================================
recirc(0x52) - resume conntrack with default ct_state=trk|new (use --ct-next to customize)
Replacing src/dst IP/ports to simulate NAT:
 Initial flow: nw_src=10.244.1.6,tp_src=46880,nw_dst=10.96.43.56,tp_dst=80
 Modified flow: nw_src=10.244.1.6,tp_src=46880,nw_dst=10.244.1.6,tp_dst=8083
===============================================================================

Flow: recirc_id=0x52,dp_hash=0x1,ct_state=new|trk,ct_mark=0x2,eth,tcp,reg0=0x285,reg1=0xa602b38,reg2=0x50,reg11=0x7,reg12=0x3,reg14=0x1,metadata=0x5,in_port=5,vlan_tci=0x0000,dl_src=0a:58:0a:f4:01:06,dl_dst=0a:58:0a:f4:01:01,nw_src=10.244.1.6,nw_dst=10.244.1.6,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=46880,tp_dst=8083,tcp_flags=syn

bridge("br-int")
----------------
    thaw
        Resuming from table 20
[...]
21. ip,reg0=0x2/0x2002,metadata=0x5, priority 100, cookie 0x98782521
    ct(commit,zone=NXM_NX_REG13[0..15],nat(src),exec(set_field:0/0x1->ct_mark))
    nat(src)
    set_field:0/0x1->ct_mark
     -> Sets the packet to an untracked state, and clears all the conntrack fields.
    resubmit(,22)
  * Logical datapaths:
  *     "ext_ovn-worker2" (1f39c1c7-f5a0-4db0-a11d-037447aa46c5) [ingress]
  *     "ext_ovn-control-plane" (5d7829d3-ede5-4a33-b105-6a857b4eec5a) [ingress]
  *     "ovn-worker" (63ab56b7-59f7-42f6-b991-86462f39176a) [ingress]
  *     "ext_ovn-worker" (c5b22ec7-2554-439a-8368-f335648f6327) [ingress]
  *     "join" (d29aa4e7-b1ec-4e3e-ae31-9abf5405b195) [ingress]
  *     "ovn-control-plane" (defc7dfa-c1b7-4cd9-90e1-0bab83d2f44e) [ingress]
  *     "ovn-worker2" (f128ef00-66a0-477b-9aa2-9d72e26f2063) [ingress]
  * Logical flow: table=13 (ls_in_stateful), priority=100, match=(reg0[1] == 1 && reg0[13] == 0), actions=(ct_commit { ct_mark.blocked = 0; }; next;)
    <<< this is an additional commit compared to the working case
[...]
22. metadata=0x5, priority 0, cookie 0xb0377ab3
    resubmit(,23)
  * Logical datapaths:
  *     "ext_ovn-worker2" (1f39c1c7-f5a0-4db0-a11d-037447aa46c5) [ingress]
  *     "ext_ovn-control-plane" (5d7829d3-ede5-4a33-b105-6a857b4eec5a) [ingress]
  *     "ovn-worker" (63ab56b7-59f7-42f6-b991-86462f39176a) [ingress]
  *     "ext_ovn-worker" (c5b22ec7-2554-439a-8368-f335648f6327) [ingress]
  *     "join" (d29aa4e7-b1ec-4e3e-ae31-9abf5405b195) [ingress]
  *     "ovn-control-plane" (defc7dfa-c1b7-4cd9-90e1-0bab83d2f44e) [ingress]
  *     "ovn-worker2" (f128ef00-66a0-477b-9aa2-9d72e26f2063) [ingress]
  * Logical flow: table=14 (ls_in_pre_hairpin), priority=0, match=(1), actions=(next;)
    <<< Here we don't hit the hairpin flow because ct_mark.natted is not set anymore!

Even though the zone is the same, we need to figure out why the second commit completely overwrites the mark. Alternatively, I think we could also see if it's possible to execute the "apply-after-lb" ACLs after the ls_in_*_hairpin stages.
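For anyone re-running this comparison: a trace like the above can be produced roughly as follows. This is a sketch, not the exact command used here; the bridge name, in_port=5, and the addresses/ports are taken from the "Final flow" lines above, and it assumes ovn-detrace can reach the NB/SB databases with its defaults.

# Sketch: trace the hairpinned SYN from the backend pod (10.244.1.6) to the
# service VIP (10.96.43.56:80) and annotate the OpenFlow tables with the
# corresponding logical flows. Add --ct-next options to steer the conntrack
# state at each recirculation if the defaults are not what you want.
ovs-appctl ofproto/trace br-int \
    'in_port=5,tcp,dl_src=0a:58:0a:f4:01:06,dl_dst=0a:58:0a:f4:01:01,nw_src=10.244.1.6,nw_dst=10.96.43.56,tp_src=46880,tp_dst=80,tcp_flags=+syn' \
    | ovn-detrace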
upstream patch: https://patchwork.ozlabs.org/project/ovn/patch/f8f3be89a8a63c728eefc7988a5fbc861ad7d32a.1666730286.git.lorenzo.bianconi@redhat.com/
ovn22.12 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2162195
The problem occurs in the hairpin scenario in the presence of an ACL with apply-after-lb=true.

Reproduced the bug in:

# rpm -qa | grep -E 'ovn|openvswitch'
ovn22.12-central-22.12.0-10.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-29.el8fdp.noarch
openvswitch2.17-2.17.0-74.el8fdp.x86_64
ovn22.12-22.12.0-10.el8fdp.x86_64
ovn22.12-host-22.12.0-10.el8fdp.x86_64

And verified in:

# rpm -qa | grep -E 'ovn|openvswitch'
ovn22.12-central-22.12.0-20.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-29.el8fdp.noarch
openvswitch2.17-2.17.0-74.el8fdp.x86_64
ovn22.12-22.12.0-20.el8fdp.x86_64
ovn22.12-host-22.12.0-20.el8fdp.x86_64

Here is the reproducer:

systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
systemctl start openvswitch
ovs-vsctl set open . external_ids:system-id=hv1
# IP address configuration to physical interface
#ifconfig ens1f0 42.42.42.1 netmask 255.0.0.0
ovs-vsctl set open . external_ids:ovn-remote=tcp:42.42.42.2:6642
ovs-vsctl set open . external_ids:ovn-encap-type=geneve
ovs-vsctl set open . external_ids:ovn-encap-ip=42.42.42.2
systemctl restart ovn-controller

ovn-nbctl lr-add R1
ovn-nbctl ls-add sw1
ovn-nbctl lsp-add sw1 sw11 -- lsp-set-addresses sw11 "f0:00:00:02:02:03 192.168.1.1 2002::1"
ovn-nbctl lrp-add R1 rp-sw1 00:00:03:01:02:03 192.168.1.254/24 2002::254/64
ovn-nbctl lsp-add sw1 sw1-rp -- set Logical_Switch_Port sw1-rp type=router options:router-port=rp-sw1 -- lsp-set-addresses sw1-rp router

ovs-vsctl add-port br-int sw11 -- set interface sw11 type=internal external_ids:iface-id=sw11
ip netns add sw11
ip link set sw11 netns sw11
ip netns exec sw11 ip link set sw11 address f0:00:00:02:02:03
ip netns exec sw11 ip link set sw11 up
ip netns exec sw11 ip link set lo up
ip netns exec sw11 ip addr add 192.168.1.1/24 dev sw11
ip netns exec sw11 ip route add default via 192.168.1.254 dev sw11
ip netns exec sw11 ip addr add 2002::1/64 dev sw11
ip netns exec sw11 ip -6 route add default via 2002::254 dev sw11

ovn-nbctl lb-add lb0 30.0.0.1 192.168.1.1
ovn-nbctl ls-lb-add sw1 lb0

ip netns exec sw11 nc -l 8000 -k &
ip netns exec sw11 nc 30.0.0.1 8000 <<< hello
ip netns exec sw12 nc 30.0.0.1 8000 <<< hello
ip netns exec sw11 nc 192.168.1.1 8000 <<< hello
ip netns exec sw12 nc 192.168.1.1 8000 <<< hello

ovn-nbctl --apply-after-lb acl-add sw1 from-lport 1002 "ip" allow
ovn-nbctl --wait=hv sync

# The following is the hairpin scenario, which fails in the non-fixed version but works fine in the fixed version
ip netns exec sw11 nc 30.0.0.1 8000 <<< hello

In the non-fixed version, the flows show that the hairpin stages are executed after the ls_in_acl_after_lb stage:

ovn-sbctl dump-flows | grep -E "ls_.*hairpin|ls_.*after_lb"
  table=14(ls_in_acl_after_lb ), priority=2002 , match=(reg0[7] == 1 && (ip)), action=(reg0[1] = 1; next;)
  table=14(ls_in_acl_after_lb ), priority=2002 , match=(reg0[8] == 1 && (ip)), action=(next;)
  table=14(ls_in_acl_after_lb ), priority=0    , match=(1), action=(next;)
  table=16(ls_in_pre_hairpin  ), priority=100  , match=(ip && ct.trk), action=(reg0[6] = chk_lb_hairpin(); reg0[12] = chk_lb_hairpin_reply(); next;)
  table=16(ls_in_pre_hairpin  ), priority=0    , match=(1), action=(next;)
  table=17(ls_in_nat_hairpin  ), priority=100  , match=(ip && ct.est && ct.trk && reg0[6] == 1), action=(ct_snat;)
  table=17(ls_in_nat_hairpin  ), priority=100  , match=(ip && ct.new && ct.trk && reg0[6] == 1), action=(ct_snat_to_vip; next;)
  table=17(ls_in_nat_hairpin  ), priority=90   , match=(ip && reg0[12] == 1), action=(ct_snat;)
  table=17(ls_in_nat_hairpin  ), priority=0    , match=(1), action=(next;)
  table=18(ls_in_hairpin      ), priority=1000 , match=(reg0[14] == 1), action=(next(pipeline=ingress, table=25);)
  table=18(ls_in_hairpin      ), priority=1    , match=((reg0[6] == 1 || reg0[12] == 1)), action=(eth.dst <-> eth.src; outport = inport; flags.loopback = 1; output;)
  table=18(ls_in_hairpin      ), priority=0    , match=(1), action=(next;)

On the other hand, in the fixed version the hairpin stages execute before the ls_in_acl_after_lb stage:

  table=14(ls_in_pre_hairpin  ), priority=100  , match=(ip && ct.trk), action=(reg0[6] = chk_lb_hairpin(); reg0[12] = chk_lb_hairpin_reply(); next;)
  table=14(ls_in_pre_hairpin  ), priority=0    , match=(1), action=(next;)
  table=15(ls_in_nat_hairpin  ), priority=100  , match=(ip && ct.est && ct.trk && reg0[6] == 1), action=(ct_snat;)
  table=15(ls_in_nat_hairpin  ), priority=100  , match=(ip && ct.new && ct.trk && reg0[6] == 1), action=(ct_snat_to_vip; next;)
  table=15(ls_in_nat_hairpin  ), priority=90   , match=(ip && reg0[12] == 1), action=(ct_snat;)
  table=15(ls_in_nat_hairpin  ), priority=0    , match=(1), action=(next;)
  table=16(ls_in_hairpin      ), priority=1000 , match=(reg0[14] == 1), action=(next(pipeline=ingress, table=25);)
  table=16(ls_in_hairpin      ), priority=1    , match=((reg0[6] == 1 || reg0[12] == 1)), action=(eth.dst <-> eth.src; outport = inport; flags.loopback = 1; output;)
  table=16(ls_in_hairpin      ), priority=0    , match=(1), action=(next;)
  table=17(ls_in_acl_after_lb ), priority=65532, match=(reg0[17] == 1), action=(next;)
  table=17(ls_in_acl_after_lb ), priority=2002 , match=(reg0[7] == 1 && (ip)), action=(reg0[1] = 1; next;)
  table=17(ls_in_acl_after_lb ), priority=2002 , match=(reg0[8] == 1 && (ip)), action=(next;)
  table=17(ls_in_acl_after_lb ), priority=0    , match=(1), action=(next;)
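As an extra sanity check (not part of the verification steps above, just a sketch), one can also confirm on the hypervisor that the hairpinned connection was SNATed to the VIP by the ls_in_nat_hairpin stage by inspecting the conntrack table:

# Sketch: after the hairpinned "nc 30.0.0.1 8000", the conntrack entry for the
# backend connection should reference the VIP 30.0.0.1 (result of ct_snat_to_vip).
ovs-appctl dpctl/dump-conntrack | grep 30.0.0.1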