Hi @dceara, could you please share the reproduction steps, or explain how to verify this bug?
@yingwang I think @trozet can help us out and point to an OCP test case that exercises the ECMP symmetric reply configuration. For the rest, it should be enough to provision a cluster service, access that service from OVN-networked and host-networked pods, and verify that, while traffic is running, there are no OVS datapath flows that match on a masked value of ct_label. To dump the OVS datapath flows on a node you can run (for example):
oc exec -n openshift-ovn-kubernetes ovnkube-node-xyzt -- ovs-appctl dpctl/dump-flows
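As a rough sketch of the check described above, the flow dump can be piped through a small filter that counts flows matching on ct_label. The helper name `count_ct_label_flows` is hypothetical, and the sketch assumes that a ct_label match is printed as `ct_label(value/mask)` in the match part of a datapath flow, while conntrack actions print `label=...`.

```shell
#!/bin/sh
# Hypothetical helper: read datapath flows on stdin and print how many of
# them contain a ct_label match field.
# Assumption: a match on ct_label is printed as "ct_label(value/mask)",
# whereas a ct() action that sets the label prints "label=...", so the
# pattern below should not count actions.
count_ct_label_flows() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the helper from reporting failure.
    grep -c 'ct_label(' || true
}

# Example usage (the pod name is a placeholder, as in the comment above):
#   oc exec -n openshift-ovn-kubernetes ovnkube-node-xyzt -- \
#       ovs-appctl dpctl/dump-flows | count_ct_label_flows
```

A result of 0 while traffic is running would be consistent with the fix. Any non-zero count is worth inspecting by hand, since this filter does not distinguish masked from exact matches.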
@trozet can you help guide @yingwang on how to reproduce the issue and verify?
@yingwang you can exercise ECMP symmetric reply by doing the following:
1. Make sure you have at least one external IP outside of your cluster, on the node subnet, that you can send traffic from. Let's say this IP is 1.1.1.1.
2. Create a namespace and annotate it with the following:
   k8s.ovn.org/routing-external-gws: 1.1.1.1
   This indicates that OVN should configure an ECMP route with symmetric reply on the gateway router where pods in this namespace live.
3. Create a pod in this namespace.
4. On your external IP server, add a route for the pod IP via the IP of the node where the pod lives.
5. Send traffic to the pod IP from 1.1.1.1 that requires a reply.
6. When the pod replies, the traffic should be returned to 1.1.1.1 (symmetric reply). Check the flows as Dumitru mentioned to make sure there are no flows matching ct_label.
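The steps above could be scripted roughly as follows. This is only a sketch: the function names, the `DRY_RUN` switch, the pod name `ecmp-test-pod`, and all arguments (namespace, gateway IP, image, pod IP, node IP) are placeholders that are not part of the original instructions. The `oc` commands run against the cluster, while the `ip` and `ping` commands are meant to run on the external server.

```shell
#!/bin/sh
# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Steps 2-3 (cluster side): create the namespace, annotate it with the
# external gateway IP, and start a pod in it.
# Arguments: <namespace> <external-gw-ip> <image>  (all placeholders)
setup_ecmp_namespace() {
    ns=$1; gw=$2; image=$3
    run oc create namespace "$ns"
    run oc annotate namespace "$ns" "k8s.ovn.org/routing-external-gws=$gw"
    run oc run ecmp-test-pod -n "$ns" --image="$image"
}

# Steps 4-5 (external server side): route the pod IP via the node that
# hosts the pod, then send traffic that requires a reply.
# Arguments: <pod-ip> <node-ip>  (placeholders; needs root for "ip route")
send_external_traffic() {
    pod_ip=$1; node_ip=$2
    run ip route add "$pod_ip" via "$node_ip"
    run ping -c 3 "$pod_ip"
}
```

Setting `DRY_RUN=1` before calling either function prints the commands instead of running them, which makes it easy to review them first. Step 6, checking the datapath flows, is then done on the node as described earlier in the thread.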
Hi @trozet, @dceara,
Thank you very much for sharing the verification steps. I verified on version 4.12.0-0.nightly-2022-10-25-121937: I created the ECMP route following the steps above, pinged the pod from the external IP, and checked the flows. I didn't see any flows matching ct_label. Please let me know if anything is wrong.

# oc version
Client Version: 4.12.0-ec.5
Kustomize Version: v4.5.7
Server Version: 4.12.0-0.nightly-2022-10-25-121937
Kubernetes Version: v1.25.2+4bd0702

# oc exec -n openshift-ovn-kubernetes -it ovnkube-master-ghkh4 -- ovn-nbctl --no-leader-only lr-route-list GR_dell-per740-14.rhts.eng.pek2.redhat.com
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
IPv4 Routes
Route Table <main>:
  10.128.2.47    10.73.116.56   src-ip rtoe-GR_dell-per740-14.rhts.eng.pek2.redhat.com ecmp-symmetric-reply
  10.128.0.0/14  100.64.0.1     dst-ip
  0.0.0.0/0      10.73.117.254  dst-ip rtoe-GR_dell-per740-14.rhts.eng.pek2.redhat.com

# oc exec -n openshift-ovn-kubernetes ovnkube-node-9q7rs ovs-appctl dpctl/dump-flows | grep 10.128.2.47
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Defaulted container "ovn-controller" out of: ovn-controller, ovn-acl-logging, kube-rbac-proxy, kube-rbac-proxy-ovn-metrics, ovnkube-node
recirc_id(0x28ba6),in_port(1),ct_state(-new+est-rel-rpl-inv+trk),ct_mark(0/0x3),eth(src=0a:58:64:40:00:06,dst=0a:58:64:40:00:01),eth_type(0x0800),ipv4(src=10.73.116.56/255.255.255.252,dst=10.128.2.47,proto=1,ttl=63,frag=no), packets:87, bytes:8526, used:5.539s, actions:ct_clear,set(eth(src=0a:58:0a:80:02:01,dst=0a:58:0a:80:02:2f)),set(ipv4(ttl=62)),ct(zone=23,nat),recirc(0x2cd96)
recirc_id(0x2cd97),in_port(18),ct_state(-new+est-rel+rpl-inv+trk),ct_mark(0/0x3),eth(src=0a:58:0a:80:02:2f,dst=0a:58:0a:80:02:01),eth_type(0x0800),ipv4(src=10.128.2.47,dst=10.73.116.56/255.255.255.254,proto=1,ttl=64,frag=no), packets:88, bytes:8624, used:5.543s, actions:ct_clear,set(eth(src=e4:43:4b:5b:6c:28,dst=dc:f4:01:e7:81:44)),set(ipv4(ttl=62)),ct(zone=9,nat),recirc(0x2cd98)
recirc_id(0x2cd96),in_port(1),ct_state(-new+est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=0a:58:0a:80:02:01,dst=0a:58:0a:80:02:2f),eth_type(0x0800),ipv4(src=10.0.0.0/255.128.0.0,dst=10.128.2.47,frag=no), packets:87, bytes:8526, used:5.543s, actions:18
recirc_id(0),in_port(18),ct_state(-new-est-trk),ct_mark(0/0x2),eth(src=0a:58:0a:80:02:2f,dst=0a:58:0a:80:02:01),eth_type(0x0800),ipv4(src=10.128.2.47,dst=0.0.0.0/128.0.0.0,proto=1,frag=no), packets:88, bytes:8624, used:5.543s, actions:ct(zone=23,nat),recirc(0x2cd97)

# oc exec -n openshift-ovn-kubernetes ovnkube-node-9q7rs ovs-appctl dpctl/dump-flows | grep 10.73.116.56
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Defaulted container "ovn-controller" out of: ovn-controller, ovn-acl-logging, kube-rbac-proxy, kube-rbac-proxy-ovn-metrics, ovnkube-node
recirc_id(0x6),in_port(1),ct_state(-new+est+trk),ct_mark(0x1),eth(src=e4:43:4b:5b:95:a4,dst=e4:43:4b:5b:6c:28),eth_type(0x0800),ipv4(src=10.73.116.56/255.255.255.252,dst=10.73.116.62,proto=6,ttl=64,frag=no), packets:161161, bytes:399092543, used:0.503s, flags:SFPR., actions:check_pkt_len(size=1414,gt(2),le(ct(nat),recirc(0x28ba5)))
recirc_id(0x28ba6),in_port(1),ct_state(-new+est-rel+rpl-inv+trk),ct_mark(0/0x3),eth(src=0a:58:64:40:00:06,dst=0a:58:64:40:00:01),eth_type(0x0800),ipv4(src=10.73.116.56/255.255.255.252,dst=10.128.2.38,proto=6,ttl=63,frag=no), packets:150180, bytes:390939527, used:1.572s, flags:SFP., actions:ct_clear,set(eth(src=0a:58:0a:80:02:01,dst=0a:58:0a:80:02:26)),set(ipv4(ttl=62)),ct(zone=12,nat),recirc(0x28be4)
recirc_id(0x28ba6),in_port(1),ct_state(-new+est-rel+rpl-inv+trk),ct_mark(0/0x3),eth(src=0a:58:64:40:00:06,dst=0a:58:64:40:00:01),eth_type(0x0800),ipv4(src=10.73.116.56/255.255.255.252,dst=10.128.2.7,proto=6,ttl=63,frag=no), packets:0, bytes:0, used:never, actions:ct_clear,set(eth(src=0a:58:0a:80:02:01,dst=0a:58:0a:80:02:07)),set(ipv4(ttl=62)),ct(zone=14,nat),recirc(0x2dac2)
recirc_id(0x28ba6),in_port(1),ct_state(-new+est-rel+rpl-inv+trk),ct_mark(0/0x3),eth(src=0a:58:64:40:00:06,dst=0a:58:64:40:00:01),eth_type(0x0800),ipv4(src=10.73.116.56/255.255.255.252,dst=10.128.2.40,proto=6,ttl=63,frag=no), packets:21, bytes:6668, used:2.849s, flags:P., actions:ct_clear,set(eth(src=0a:58:0a:80:02:01,dst=0a:58:0a:80:02:28)),set(ipv4(ttl=62)),ct(zone=19,nat),recirc(0x2dab0)
recirc_id(0x28ba6),in_port(1),ct_state(-new+est-rel+rpl-inv+trk),ct_mark(0/0x3),eth(src=0a:58:64:40:00:06,dst=0a:58:64:40:00:01),eth_type(0x0800),ipv4(src=10.73.116.56/255.255.255.252,dst=10.128.2.41,proto=6,ttl=63,frag=no), packets:1, bytes:230, used:0.512s, flags:P., actions:ct_clear,set(eth(src=0a:58:0a:80:02:01,dst=0a:58:0a:80:02:29)),set(ipv4(ttl=62)),ct(zone=18,nat),recirc(0x2dad0)
Also tried curl traffic from external to pod.

# oc exec -n openshift-ovn-kubernetes ovnkube-node-j9mm2 ovs-appctl dpctl/dump-flows | grep 10.131.0.127
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Defaulted container "ovn-controller" out of: ovn-controller, ovn-acl-logging, kube-rbac-proxy, kube-rbac-proxy-ovn-metrics, ovnkube-node
recirc_id(0xd9),in_port(1),ct_state(-new+est-rel-rpl-inv+trk),ct_mark(0/0x3),eth(src=0a:58:64:40:00:05,dst=0a:58:64:40:00:01),eth_type(0x0800),ipv4(src=10.73.116.56/255.255.255.248,dst=10.131.0.127,proto=6,ttl=63,frag=no), packets:3, bytes:198, used:4.416s, flags:F., actions:ct_clear,set(eth(src=0a:58:0a:83:00:01,dst=0a:58:0a:83:00:7f)),set(ipv4(ttl=62)),ct(zone=28,nat),recirc(0x45799)
recirc_id(0xd9),in_port(1),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x3),eth(src=0a:58:64:40:00:05,dst=0a:58:64:40:00:01),eth_type(0x0800),ipv4(src=10.73.116.56/255.255.255.248,dst=10.131.0.127,proto=6,ttl=63,frag=no), packets:0, bytes:0, used:never, actions:ct(commit,zone=12,nat(src)),set(eth(src=0a:58:0a:83:00:01,dst=0a:58:0a:83:00:7f)),set(ipv4(ttl=62)),ct(zone=28,nat),recirc(0x45799)
recirc_id(0),in_port(21),ct_state(-new-est-trk),ct_mark(0/0x2),eth(src=0a:58:0a:83:00:7f,dst=0a:58:0a:83:00:01),eth_type(0x0800),ipv4(src=10.131.0.127,dst=0.0.0.0/128.0.0.0,proto=6,frag=no), packets:3, bytes:332, used:4.418s, flags:FP., actions:ct(zone=28,nat),recirc(0x4579a)
recirc_id(0x45799),in_port(1),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=0a:58:0a:83:00:01,dst=0a:58:0a:83:00:7f),eth_type(0x0800),ipv4(src=10.0.0.0/255.128.0.0,dst=10.131.0.127,frag=no), packets:0, bytes:0, used:never, actions:ct(commit,zone=28,mark=0/0x1,nat(src)),21
recirc_id(0x45799),in_port(1),ct_state(-new+est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=0a:58:0a:83:00:01,dst=0a:58:0a:83:00:7f),eth_type(0x0800),ipv4(src=10.0.0.0/255.128.0.0,dst=10.131.0.127,frag=no), packets:3, bytes:198, used:4.421s, flags:F., actions:21
recirc_id(0x4579a),in_port(21),ct_state(-new+est-rel+rpl-inv+trk),ct_mark(0/0x3),eth(src=0a:58:0a:83:00:7f,dst=0a:58:0a:83:00:01),eth_type(0x0800),ipv4(src=10.131.0.127,dst=10.73.116.56/255.255.255.254,proto=6,ttl=64,frag=no), packets:3, bytes:332, used:4.422s, flags:FP., actions:ct_clear,set(eth(src=dc:f4:01:e7:5d:84,dst=dc:f4:01:e7:81:44)),set(ipv4(ttl=62)),ct(zone=12,nat),recirc(0x4579b)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:7399