Description of problem:

We saw an inconsistency in how egressIPs behave while fixing https://bugzilla.redhat.com/show_bug.cgi?id=2070929. Depending on where the pod is located (egressNode OR non-egressNode), the srcIP of the packet will be either the egressIP or the nodeIP. In addition to this, we also have a priority-101 policy created on the cluster router for every nodeIP in the cluster, which conveniently picks only the first default primary IP of the node. What about the other nodeIPs? We need to fix this to behave like the 1004 policies.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
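A quick way to observe the inconsistency by hand (a sketch only, using the names from the kind environment in the trace below: pod e2e-egressip-pod-1 in the egressIP-matched namespace egressip-7887, and 172.18.0.2 as another node's IP; substitute your own names):

$ oc rsh -n egressip-7887 e2e-egressip-pod-1 curl -s 172.18.0.2:80    # pod -> another node's IP
# meanwhile, on the target node, check which source IP the connection arrives with:
sh-5.1# tcpdump -i any -nn 'tcp and dst port 80'
# repeat with a pod scheduled on the egress node vs. a non-egress node;
# in SGW the captured srcIP flips between the egressIP and the nodeIP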
The behaviour also differs between gateway modes for pod->different-node traffic. In SGW, the srcIP is either the egressIP or the nodeIP, depending on whether the pod is on an egress node or a non-egress node; in LGW it is always the nodeIP. This traffic isn't considered egressIP traffic.

sh-5.1# ovn-trace --ct new 'inport=="egressip-7887_e2e-egressip-pod-1" && eth.src==0a:58:0a:f4:00:13 && eth.dst==0a:58:0a:f4:00:01 && ip4.src==10.244.0.19 && ip4.dst==172.18.0.2 && ip.ttl==64 && tcp && tcp.src==80 && tcp.dst==80'
# tcp,reg14=0x6,vlan_tci=0x0000,dl_src=0a:58:0a:f4:00:13,dl_dst=0a:58:0a:f4:00:01,nw_src=10.244.0.19,nw_dst=172.18.0.2,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=80,tp_dst=80,tcp_flags=0

ingress(dp="ovn-control-plane", inport="egressip-7887_e2e-egressip-pod-1")
--------------------------------------------------------------------------
 0. ls_in_port_sec_l2 (northd.c:5516): inport == "egressip-7887_e2e-egressip-pod-1" && eth.src == {0a:58:0a:f4:00:13}, priority 50, uuid f17158be
    next;
 1. ls_in_port_sec_ip (northd.c:5149): inport == "egressip-7887_e2e-egressip-pod-1" && eth.src == 0a:58:0a:f4:00:13 && ip4.src == {10.244.0.19}, priority 90, uuid 59c5fa9b
    next;
 5. ls_in_pre_acl (northd.c:5777): ip, priority 100, uuid 3c4e85db
    reg0[0] = 1;
    next;
 6. ls_in_pre_lb (northd.c:5909): ip, priority 100, uuid 89621558
    reg0[2] = 1;
    next;
 7. ls_in_pre_stateful (northd.c:5936): reg0[2] == 1 && ip4 && tcp, priority 120, uuid 1927646d
    reg1 = ip4.dst;
    reg2[0..15] = tcp.dst;
    ct_lb;

ct_lb
-----
 8. ls_in_acl_hint (northd.c:6007): ct.new && !ct.est, priority 7, uuid 3f0319d1
    reg0[7] = 1;
    reg0[9] = 1;
    next;
 9. ls_in_acl (northd.c:6508): ip && (!ct.est || (ct.est && ct_label.blocked == 1)), priority 1, uuid 467627c5
    reg0[1] = 1;
    next;
14. ls_in_stateful (northd.c:6854): reg0[1] == 1 && reg0[13] == 0, priority 100, uuid 6e341e16
    ct_commit { ct_label.blocked = 0; };
    next;
15. ls_in_pre_hairpin (northd.c:6879): ip && ct.trk, priority 100, uuid 28b3543b
    reg0[6] = chk_lb_hairpin();
    reg0[12] = chk_lb_hairpin_reply();
    *** chk_lb_hairpin_reply action not implemented
    next;
24. ls_in_l2_lkup (northd.c:8370): eth.dst == 0a:58:0a:f4:00:01, priority 50, uuid 61105717
    outport = "stor-ovn-control-plane";
    output;

egress(dp="ovn-control-plane", inport="egressip-7887_e2e-egressip-pod-1", outport="stor-ovn-control-plane")
-----------------------------------------------------------------------------------------------------------
 0. ls_out_pre_lb (northd.c:5666): ip && outport == "stor-ovn-control-plane", priority 110, uuid af2ab016
    next;
 1. ls_out_pre_acl (northd.c:5666): ip && outport == "stor-ovn-control-plane", priority 110, uuid 2691e886
    next;
 3. ls_out_acl_hint (northd.c:6007): ct.new && !ct.est, priority 7, uuid e0561d4b
    reg0[7] = 1;
    reg0[9] = 1;
    next;
 4. ls_out_acl (northd.c:6511): ip && (!ct.est || (ct.est && ct_label.blocked == 1)), priority 1, uuid 37b679a7
    reg0[1] = 1;
    next;
 7. ls_out_stateful (northd.c:6858): reg0[1] == 1 && reg0[13] == 0, priority 100, uuid 7f437e3c
    ct_commit { ct_label.blocked = 0; };
    next;
 9. ls_out_port_sec_l2 (northd.c:5613): outport == "stor-ovn-control-plane", priority 50, uuid 13e0a299
    output;
    /* output to "stor-ovn-control-plane", type "patch" */

ingress(dp="ovn_cluster_router", inport="rtos-ovn-control-plane")
-----------------------------------------------------------------
 0. lr_in_admission (northd.c:10601): eth.dst == 0a:58:0a:f4:00:01 && inport == "rtos-ovn-control-plane" && is_chassis_resident("cr-rtos-ovn-control-plane"), priority 50, uuid 28f2eee6
    xreg0[0..47] = 0a:58:0a:f4:00:01;
    next;
 1. lr_in_lookup_neighbor (northd.c:10745): 1, priority 0, uuid 30eb11a0
    reg9[2] = 1;
    next;
 2. lr_in_learn_neighbor (northd.c:10754): reg9[2] == 1 || reg9[3] == 0, priority 100, uuid ec8d6391
    next;
10. lr_in_ip_routing_pre (northd.c:11004): 1, priority 0, uuid d18586d7
    reg7 = 0;
    next;
11. lr_in_ip_routing (northd.c:9517): ip4.src == 10.244.0.0/24, priority 72, uuid 632c144e
    ip.ttl--;
    reg8[0..15] = 0;
    reg0 = 10.244.0.2;
    reg1 = 10.244.0.1;
    eth.src = 0a:58:0a:f4:00:01;
    outport = "rtos-ovn-control-plane";
    flags.loopback = 1;
    next;
12. lr_in_ip_routing_ecmp (northd.c:11079): reg8[0..15] == 0, priority 150, uuid 6c5046ad
    next;
13. lr_in_policy (northd.c:8750): ip4.src == 10.244.0.0/16 && ip4.dst == 172.18.0.2/32, priority 101, uuid 8ce10805
    reg8[0..15] = 0;
    next;
14. lr_in_policy_ecmp (northd.c:11214): reg8[0..15] == 0, priority 150, uuid 4f52831d
    next;
15. lr_in_arp_resolve (northd.c:11415): outport == "rtos-ovn-control-plane" && reg0 == 10.244.0.2, priority 100, uuid 47976c50
    eth.dst = 3e:db:a1:63:b7:06;
    next;
18. lr_in_gw_redirect (northd.c:11814): outport == "rtos-ovn-control-plane", priority 50, uuid a1de3ff1
    outport = "cr-rtos-ovn-control-plane";
    next;
19. lr_in_arp_request (northd.c:11895): 1, priority 0, uuid 8d41b73f
    output;
    /* Replacing type "chassisredirect" outport "cr-rtos-ovn-control-plane" with distributed port "rtos-ovn-control-plane". */

egress(dp="ovn_cluster_router", inport="rtos-ovn-control-plane", outport="rtos-ovn-control-plane")
--------------------------------------------------------------------------------------------------
 0. lr_out_chk_dnat_local (northd.c:13120): 1, priority 0, uuid 888127ac
    reg9[4] = 0;
    next;
 6. lr_out_delivery (northd.c:11942): outport == "rtos-ovn-control-plane", priority 100, uuid 4875b1e2
    output;
    /* output to "rtos-ovn-control-plane", type "patch" */

ingress(dp="ovn-control-plane", inport="stor-ovn-control-plane")
----------------------------------------------------------------
 0. ls_in_port_sec_l2 (northd.c:5516): inport == "stor-ovn-control-plane", priority 50, uuid aca7f8f6
    next;
 5. ls_in_pre_acl (northd.c:5663): ip && inport == "stor-ovn-control-plane", priority 110, uuid fc8809b8
    next;
 6. ls_in_pre_lb (northd.c:5663): ip && inport == "stor-ovn-control-plane", priority 110, uuid 348534c1
    next;
 8. ls_in_acl_hint (northd.c:6007): ct.new && !ct.est, priority 7, uuid 3f0319d1
    reg0[7] = 1;
    reg0[9] = 1;
    next;
 9. ls_in_acl (northd.c:6508): ip && (!ct.est || (ct.est && ct_label.blocked == 1)), priority 1, uuid 467627c5
    reg0[1] = 1;
    next;
14. ls_in_stateful (northd.c:6854): reg0[1] == 1 && reg0[13] == 0, priority 100, uuid 6e341e16
    ct_commit { ct_label.blocked = 0; };
    next;
15. ls_in_pre_hairpin (northd.c:6879): ip && ct.trk, priority 100, uuid 28b3543b
    reg0[6] = chk_lb_hairpin();
    reg0[12] = chk_lb_hairpin_reply();
    *** chk_lb_hairpin_reply action not implemented
    next;
24. ls_in_l2_lkup (northd.c:8299): eth.dst == 3e:db:a1:63:b7:06, priority 50, uuid 875c8e0f
    outport = "k8s-ovn-control-plane";
    output;

egress(dp="ovn-control-plane", inport="stor-ovn-control-plane", outport="k8s-ovn-control-plane")
------------------------------------------------------------------------------------------------
 0. ls_out_pre_lb (northd.c:5911): ip, priority 100, uuid 7873615a
    reg0[2] = 1;
    next;
 1. ls_out_pre_acl (northd.c:5779): ip, priority 100, uuid aa3eb857
    reg0[0] = 1;
    next;
 2. ls_out_pre_stateful (northd.c:5956): reg0[2] == 1, priority 110, uuid 0f11bec7
    ct_lb;

ct_lb /* default (use --ct to customize) */
-------------------------------------------
 3. ls_out_acl_hint (northd.c:6060): ct.est && ct_label.blocked == 0, priority 1, uuid 19de4846
    reg0[10] = 1;
    next;
 9. ls_out_port_sec_l2 (northd.c:5613): outport == "k8s-ovn-control-plane", priority 50, uuid d60692bf
    output;
    /* output to "k8s-ovn-control-plane", type "" */

sh-5.1# ovn-nbctl lr-policy-list ovn_cluster_router
Routing Policies
      1004 inport == "rtos-ovn-control-plane" && ip4.dst == 172.18.0.3 /* ovn-control-plane */   reroute 10.244.0.2
      1004 inport == "rtos-ovn-worker" && ip4.dst == 172.18.0.2 /* ovn-worker */                 reroute 10.244.2.2
      1004 inport == "rtos-ovn-worker2" && ip4.dst == 172.18.0.4 /* ovn-worker2 */               reroute 10.244.1.2
       101 ip4.src == 10.244.0.0/16 && ip4.dst == 10.244.0.0/16                                  allow
       101 ip4.src == 10.244.0.0/16 && ip4.dst == 100.64.0.0/16                                  allow
       101 ip4.src == 10.244.0.0/16 && ip4.dst == 172.18.0.2/32                                  allow
       101 ip4.src == 10.244.0.0/16 && ip4.dst == 172.18.0.3/32                                  allow
       101 ip4.src == 10.244.0.0/16 && ip4.dst == 172.18.0.4/32                                  allow
       100 ip4.src == 10.244.0.19                                                                reroute 100.64.0.2
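One note to make the trace easier to read: in lr_in_policy the highest-priority matching policy wins, so the priority-101 allow for ip4.dst == 172.18.0.2/32 matches before the priority-100 reroute for the egressIP pod (10.244.0.19) is ever considered; that is exactly what stage 13 above shows, and it is why this pod-to-nodeIP traffic is never steered to the egress node. A quick way to eyeball just that ordering on a live cluster (a sketch; the awk filter is only illustrative):

sh-5.1# # policies print in descending priority; the first match for a packet wins
sh-5.1# ovn-nbctl lr-policy-list ovn_cluster_router | awk '$1 >= 100'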
*** Bug 2090103 has been marked as a duplicate of this bug. ***
Hello Surya, do you have any update on this BZ? The customer is asking for an update.
Thanks for the needinfo, I didn't notice we had a case on this bug. Looking into what the issue is for the customer...
Surya, I'm a bit confused by your multiple comments. Is my understanding correct that this is the right BZ for the customer issue? By the way, I created BZ 2090103 and akaris closed it as a duplicate of this bug. Please correct me if I am missing anything.
No, all good Manish. This bug needs a design decision and we are in the process of fixing it. Please note that this bug is medium priority, so it will take some time for a definitive fix. We are very much working on it, and this is indeed the correct bug.
update: we had a meeting within the team last week: https://docs.google.com/document/d/1s5kwKImltuZdFUWxEeHd0KkOCNy4agrF5FtMztK9aGk/edit

The consensus is to make SGW behave like LGW in the one case where it exhibits different behaviour: a pod on the egressNode. This will be documented in the PR. Essentially, OVNK will fix the problem the user is facing, and we agree we shouldn't EIP-SNAT traffic towards a clusterIP service if the backend is within the cluster.

NOTE: SDN and OVNK are two different plugins, so OVNK has a different definition of what egress means, and we are trying to fix the bug in SDN where we went wrong. More details in the google doc.
Probably the best way here is to replace

 101 ip4.src == 10.244.0.0/16 && ip4.dst == 10.244.0.0/16   allow
 101 ip4.src == 10.244.0.0/16 && ip4.dst == 100.64.0.0/16   allow
 101 ip4.src == 10.244.0.0/16 && ip4.dst == 172.18.0.2/32   allow
 101 ip4.src == 10.244.0.0/16 && ip4.dst == 172.18.0.3/32   allow
 101 ip4.src == 10.244.0.0/16 && ip4.dst == 172.18.0.4/32   allow

with

 101 ip4.src == 10.244.0.0/16 && ip4.dst == 10.244.0.0/16   allow
 101 ip4.src == 10.244.0.0/16 && ip4.dst == 100.64.0.0/16   allow
 101 ip4.src == $eipAS && inport == "rtos-node1" && ip4.dst == $ovn-host-network_v4   reroute mp0-node1IPv4
 101 ip4.src == $eipAS && inport == "rtos-node2" && ip4.dst == $ovn-host-network_v4   reroute mp0-node2IPv4
 101 ip4.src == $eipAS && inport == "rtos-node3" && ip4.dst == $ovn-host-network_v4   reroute mp0-node3IPv4

and the same for v6. This is the only way we can ensure we don't change the traffic flows for non-egressIP pods.

NOTE: we have the 1004 policies for hairpin traffic, but I doubt we should touch those or mix this up with them, since we have an option to disable egressIP:

 1004 inport == "rtos-ovn-control-plane" && ip4.dst == 172.18.0.3 /* ovn-control-plane */   reroute 10.244.1.2
 1004 inport == "rtos-ovn-worker" && ip4.dst == 172.18.0.2 /* ovn-worker */                 reroute 10.244.0.2
 1004 inport == "rtos-ovn-worker2" && ip4.dst == 172.18.0.4 /* ovn-worker2 */               reroute 10.244.2.2
UPDATE: in "ip4.dst == $ovn-host-network_v4" above, ovn-host-network means a set of all nodeIPs in the cluster.
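Purely for illustration, in ovn-nbctl terms the proposed per-node policies would be created along these lines. Everything here is a placeholder: eipAS stands for the egressIP pods' address set, ovn-host-network_v4 for the nodeIP address set, rtos-node1 for the node's switch-to-router port, and 10.244.0.2 for node1's management-port (mp0) IP; the names OVNK actually generates will differ.

sh-5.1# ovn-nbctl lr-policy-add ovn_cluster_router 101 \
    'ip4.src == $eipAS && inport == "rtos-node1" && ip4.dst == $ovn-host-network_v4' \
    reroute 10.244.0.2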
In case someone is wondering why this bug is taking longer than intended: we are hitting some new bugs that are slowing the development process down: https://bugzilla.redhat.com/show_bug.cgi?id=2106444 and https://bugzilla.redhat.com/show_bug.cgi?id=2108026. Worked around these. Hitting another new bug now which I am investigating...
Hello Surya, do you have any update on this bug?
While testing the PR out, we figured out we can't do the policy-based routing based on conntrack states, since the OVN pipeline holds conntrack states locally: when the packet goes from the node switch into ovn_cluster_router, the ct_state is cleared. So we need to find an alternate way of steering this traffic.
A workaround for this bug is to use LGW mode, where the bug doesn't happen. It only affects SGW users.
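For reference, gateway mode is toggled through the cluster network operator; something like the following switches an OpenShift OVNK cluster to LGW (routingViaHost) mode. Verify against the docs for your release before applying:

$ oc patch network.operator.openshift.io cluster --type=merge \
    -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"routingViaHost":true}}}}}'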
Upstream Fix Merged: https://github.com/ovn-org/ovn-kubernetes/pull/3064
Downstream Merge Opened: https://github.com/openshift/ovn-kubernetes/pull/1493
https://github.com/openshift/ovn-kubernetes/pull/1496 downstream merge done, moving bug to MODIFIED.
Any update on this, please? The customer is asking for an update.
(In reply to Manish Pandey from comment #25)
> Any update on this please ? Customer is asking for update

Starting work on its verification.
Verified in 4.13.0-0.nightly-2023-02-07-064924

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2023-02-07-064924   True        False         133m    Cluster version is 4.13.0-0.nightly-2023-02-07-064924

$ oc get node -owide
NAME                                                        STATUS   ROLES                  AGE    VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
jechen-0207b-4v2jp-master-0.c.openshift-qe.internal         Ready    control-plane,master   124m   v1.26.0+9eb81c2   10.0.0.7      <none>        Red Hat Enterprise Linux CoreOS 413.86.202302061827-0 (Ootpa)   4.18.0-372.41.1.el8_6.x86_64   cri-o://1.26.1-6.rhaos4.13.git159cc9c.el8
jechen-0207b-4v2jp-master-1.c.openshift-qe.internal         Ready    control-plane,master   126m   v1.26.0+9eb81c2   10.0.0.5      <none>        Red Hat Enterprise Linux CoreOS 413.86.202302061827-0 (Ootpa)   4.18.0-372.41.1.el8_6.x86_64   cri-o://1.26.1-6.rhaos4.13.git159cc9c.el8
jechen-0207b-4v2jp-master-2.c.openshift-qe.internal         Ready    control-plane,master   126m   v1.26.0+9eb81c2   10.0.0.6      <none>        Red Hat Enterprise Linux CoreOS 413.86.202302061827-0 (Ootpa)   4.18.0-372.41.1.el8_6.x86_64   cri-o://1.26.1-6.rhaos4.13.git159cc9c.el8
jechen-0207b-4v2jp-worker-a-nq7br.c.openshift-qe.internal   Ready    worker                 113m   v1.26.0+9eb81c2   10.0.128.4    <none>        Red Hat Enterprise Linux CoreOS 413.86.202302061827-0 (Ootpa)   4.18.0-372.41.1.el8_6.x86_64   cri-o://1.26.1-6.rhaos4.13.git159cc9c.el8
jechen-0207b-4v2jp-worker-b-q9wnj.c.openshift-qe.internal   Ready    worker                 113m   v1.26.0+9eb81c2   10.0.128.2    <none>        Red Hat Enterprise Linux CoreOS 413.86.202302061827-0 (Ootpa)   4.18.0-372.41.1.el8_6.x86_64   cri-o://1.26.1-6.rhaos4.13.git159cc9c.el8
jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal   Ready    worker                 113m   v1.26.0+9eb81c2   10.0.128.3    <none>        Red Hat Enterprise Linux CoreOS 413.86.202302061827-0 (Ootpa)   4.18.0-372.41.1.el8_6.x86_64   cri-o://1.26.1-6.rhaos4.13.git159cc9c.el8

# label worker node jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal as egress-assignable node
$ oc label node jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal labeled

# create egressip object
$ cat config_egressip1_ovn_ns_team_blue_gcp.yaml
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip-blue
spec:
  egressIPs:
  - 10.0.128.101
  namespaceSelector:
    matchLabels:
      team: blue

$ oc create -f config_egressip1_ovn_ns_team_blue_gcp.yaml
egressip.k8s.ovn.org/egressip-blue created

$ oc get egressips.k8s.ovn.org
NAME            EGRESSIPS      ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip-blue   10.0.128.101   jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal   10.0.128.101

# create a test namespace, and label it with the same namespace selector as that in egressip-blue
$ oc new-project test
$ oc label ns test security.openshift.io/scc.podSecurityLabelSync=false pod-security.kubernetes.io/enforce=privileged pod-security.kubernetes.io/audit=privileged pod-security.kubernetes.io/warn=privileged --overwrite
namespace/test labeled
$ oc label ns test team=blue
namespace/test labeled

# create hostnetwork pod and service in the test namespace
$ cat hostnework-pod.yaml
---
apiVersion: v1
kind: List
items:
- kind: Pod
  apiVersion: v1
  metadata:
    name: hostnetwork-pod
    labels:
      name: hostnetwork-pod
  spec:
    containers:
    - name: hostnetwork-pod
      image: quay.io/openshifttest/hello-sdn@sha256:c89445416459e7adea9a5a416b3365ed3d74f2491beb904d61dc8d1eb89a72a4
    hostNetwork: true
- apiVersion: v1
  kind: Service
  metadata:
    labels:
      name: test-service
    name: test-service
  spec:
    ports:
    - name: http
      port: 27017
      protocol: TCP
      targetPort: 8080
    selector:
      name: hostnetwork-pod
    type: NodePort

$ oc create -f hostnework-pod.yaml
pod/hostnetwork-pod created

# create some test pods in the test namespace
$ oc get all -owide
NAME                  READY   STATUS    RESTARTS   AGE     IP            NODE                                                        NOMINATED NODE   READINESS GATES
pod/hostnetwork-pod   1/1     Running   0          9m32s   10.0.128.2    jechen-0207b-4v2jp-worker-b-q9wnj.c.openshift-qe.internal   <none>           <none>
pod/test-rc-b2vzk     1/1     Running   0          72m     10.128.2.15   jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal   <none>           <none>
pod/test-rc-bc27l     1/1     Running   0          72m     10.129.2.17   jechen-0207b-4v2jp-worker-a-nq7br.c.openshift-qe.internal   <none>           <none>
pod/test-rc-bhvfs     1/1     Running   0          72m     10.131.0.17   jechen-0207b-4v2jp-worker-b-q9wnj.c.openshift-qe.internal   <none>           <none>
pod/test-rc-mqb2m     1/1     Running   0          72m     10.131.0.18   jechen-0207b-4v2jp-worker-b-q9wnj.c.openshift-qe.internal   <none>           <none>

NAME                            DESIRED   CURRENT   READY   AGE   CONTAINERS   IMAGES                                                                                                    SELECTOR
replicationcontroller/test-rc   4         4         4       72m   test-pod     quay.io/openshifttest/hello-sdn@sha256:d5785550cf77b7932b090fcd1a2625472912fb3189d5973f177a5a2c347a1f95   name=test-pods

NAME                   TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)           AGE     SELECTOR
service/test-service   NodePort   172.30.79.29   <none>        27017:31459/TCP   9m32s   name=hostnetwork-pod

# in a separate terminal, start tcpdump on egress node jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal
$ oc debug node/jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal
Starting pod/jechen-0207b-4v2jp-worker-c-tl744copenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.128.3
If you don't see a command prompt, try pressing enter.
sh-4.4# tcpdump -n -i any -nneep "(src port 31661 and dst port 31459) or (src port 31459 and dst port 31661)"
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes

# from test pod test-rc-b2vzk that is on egress node jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal, curl the hostnetwork pod's IP with the port number
$ oc rsh test-rc-b2vzk
~ $ curl --local-port 31661 10.0.128.2:31459
Hello OpenShift!

# Verified the fix in the following tcpdump on egress node jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal
$ oc debug node/jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal
Starting pod/jechen-0207b-4v2jp-worker-c-tl744copenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.128.3
If you don't see a command prompt, try pressing enter.
sh-4.4# tcpdump -n -i any -nneep "(src port 31661 and dst port 31459) or (src port 31459 and dst port 31661)"
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
20:56:07.074110   P 0a:58:0a:80:02:0f ethertype IPv4 (0x0800), length 76: 10.128.2.15.31661 > 10.0.128.2.31459: Flags [S], seq 2082181755, win 26400, options [mss 1320,sackOK,TS val 2065482948 ecr 0,nop,wscale 7], length 0
20:56:07.075185 Out 42:01:0a:00:80:03 ethertype IPv4 (0x0800), length 76: 10.0.128.3.31661 > 10.0.128.2.31459: Flags [S], seq 2082181755, win 26400, options [mss 1320,sackOK,TS val 2065482948 ecr 0,nop,wscale 7], length 0
20:56:07.077916  In 42:01:0a:00:80:01 ethertype IPv4 (0x0800), length 76: 10.0.128.2.31459 > 10.0.128.3.31661: Flags [S.], seq 1636338806, ack 2082181756, win 26160, options [mss 1320,sackOK,TS val 1945200435 ecr 2065482948,nop,wscale 7], length 0
20:56:07.078851 Out 0a:58:0a:80:02:01 ethertype IPv4 (0x0800), length 76: 10.0.128.2.31459 > 10.128.2.15.31661: Flags [S.], seq 1636338806, ack 2082181756, win 26160, options [mss 1320,sackOK,TS val 1945200435 ecr 2065482948,nop,wscale 7], length 0
20:56:07.078906   P 0a:58:0a:80:02:0f ethertype IPv4 (0x0800), length 68: 10.128.2.15.31661 > 10.0.128.2.31459: Flags [.], ack 1, win 207, options [nop,nop,TS val 2065482953 ecr 1945200435], length 0
20:56:07.078975   P 0a:58:0a:80:02:0f ethertype IPv4 (0x0800), length 148: 10.128.2.15.31661 > 10.0.128.2.31459: Flags [P.], seq 1:81, ack 1, win 207, options [nop,nop,TS val 2065482953 ecr 1945200435], length 80
20:56:07.079503 Out 42:01:0a:00:80:03 ethertype IPv4 (0x0800), length 68: 10.0.128.3.31661 > 10.0.128.2.31459: Flags [.], ack 1, win 207, options [nop,nop,TS val 2065482953 ecr 1945200435], length 0
20:56:07.079563 Out 42:01:0a:00:80:03 ethertype IPv4 (0x0800), length 148: 10.0.128.3.31661 > 10.0.128.2.31459: Flags [P.], seq 1:81, ack 1, win 207, options [nop,nop,TS val 2065482953 ecr 1945200435], length 80
20:56:07.080127  In 42:01:0a:00:80:01 ethertype IPv4 (0x0800), length 68: 10.0.128.2.31459 > 10.0.128.3.31661: Flags [.], ack 81, win 205, options [nop,nop,TS val 1945200438 ecr 2065482953], length 0
20:56:07.080173 Out 0a:58:0a:80:02:01 ethertype IPv4 (0x0800), length 68: 10.0.128.2.31459 > 10.128.2.15.31661: Flags [.], ack 81, win 205, options [nop,nop,TS val 1945200438 ecr 2065482953], length 0

==> the packet coming out of the test pod 10.128.2.15 on the egress node (jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal) got NATed to its nodeIP 10.0.128.3, not to the egressIP address 10.0.128.101
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.13.0 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:1326