Bug 2078222
| Summary: | egressIPs behave inconsistently towards in-cluster traffic (hosts and services backed by host-networked pods) | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Surya Seetharaman <surya> |
| Component: | Networking | Assignee: | Surya Seetharaman <surya> |
| Networking sub component: | ovn-kubernetes | QA Contact: | jechen <jechen> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | | |
| Priority: | medium | CC: | akaris, bmehra, jechen, mapandey, talessio |
| Version: | 4.8 | Flags: | surya: needinfo-, mapandey: needinfo- |
| Target Milestone: | --- | | |
| Target Release: | 4.13.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-05-17 22:46:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 2125247 | | |
| Bug Blocks: | | | |
Description
Surya Seetharaman
2022-04-24 14:24:43 UTC
The behaviour also differs between gateway modes for pod -> different-node traffic. In SGW mode, the srcIP is either the egressIP or the nodeIP, depending on whether the pod is on an egress node or a non-egress node. In LGW mode, it is always the nodeIP. This traffic isn't considered egressIP traffic.
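For reference, a minimal sketch of how the gateway mode can be checked or switched through the cluster network operator config (assumption: routingViaHost is the gatewayConfig field that selects LGW mode):

$ oc get networks.operator.openshift.io cluster -o jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.gatewayConfig}'
$ # switch to LGW mode (routingViaHost: true assumed to select local gateway mode)
$ oc patch networks.operator.openshift.io cluster --type=merge -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"routingViaHost":true}}}}}'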
sh-5.1# ovn-trace --ct new 'inport=="egressip-7887_e2e-egressip-pod-1" && eth.src==0a:58:0a:f4:00:13 && eth.dst==0a:58:0a:f4:00:01 && ip4.src==10.244.0.19 && ip4.dst==172.18.0.2 && ip.ttl==64 && tcp && tcp.src==80 && tcp.dst==80'
# tcp,reg14=0x6,vlan_tci=0x0000,dl_src=0a:58:0a:f4:00:13,dl_dst=0a:58:0a:f4:00:01,nw_src=10.244.0.19,nw_dst=172.18.0.2,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=80,tp_dst=80,tcp_flags=0
ingress(dp="ovn-control-plane", inport="egressip-7887_e2e-egressip-pod-1")
--------------------------------------------------------------------------
0. ls_in_port_sec_l2 (northd.c:5516): inport == "egressip-7887_e2e-egressip-pod-1" && eth.src == {0a:58:0a:f4:00:13}, priority 50, uuid f17158be
next;
1. ls_in_port_sec_ip (northd.c:5149): inport == "egressip-7887_e2e-egressip-pod-1" && eth.src == 0a:58:0a:f4:00:13 && ip4.src == {10.244.0.19}, priority 90, uuid 59c5fa9b
next;
5. ls_in_pre_acl (northd.c:5777): ip, priority 100, uuid 3c4e85db
reg0[0] = 1;
next;
6. ls_in_pre_lb (northd.c:5909): ip, priority 100, uuid 89621558
reg0[2] = 1;
next;
7. ls_in_pre_stateful (northd.c:5936): reg0[2] == 1 && ip4 && tcp, priority 120, uuid 1927646d
reg1 = ip4.dst;
reg2[0..15] = tcp.dst;
ct_lb;
ct_lb
-----
8. ls_in_acl_hint (northd.c:6007): ct.new && !ct.est, priority 7, uuid 3f0319d1
reg0[7] = 1;
reg0[9] = 1;
next;
9. ls_in_acl (northd.c:6508): ip && (!ct.est || (ct.est && ct_label.blocked == 1)), priority 1, uuid 467627c5
reg0[1] = 1;
next;
14. ls_in_stateful (northd.c:6854): reg0[1] == 1 && reg0[13] == 0, priority 100, uuid 6e341e16
ct_commit { ct_label.blocked = 0; };
next;
15. ls_in_pre_hairpin (northd.c:6879): ip && ct.trk, priority 100, uuid 28b3543b
reg0[6] = chk_lb_hairpin();
reg0[12] = chk_lb_hairpin_reply();
*** chk_lb_hairpin_reply action not implemented
next;
24. ls_in_l2_lkup (northd.c:8370): eth.dst == 0a:58:0a:f4:00:01, priority 50, uuid 61105717
outport = "stor-ovn-control-plane";
output;
egress(dp="ovn-control-plane", inport="egressip-7887_e2e-egressip-pod-1", outport="stor-ovn-control-plane")
-----------------------------------------------------------------------------------------------------------
0. ls_out_pre_lb (northd.c:5666): ip && outport == "stor-ovn-control-plane", priority 110, uuid af2ab016
next;
1. ls_out_pre_acl (northd.c:5666): ip && outport == "stor-ovn-control-plane", priority 110, uuid 2691e886
next;
3. ls_out_acl_hint (northd.c:6007): ct.new && !ct.est, priority 7, uuid e0561d4b
reg0[7] = 1;
reg0[9] = 1;
next;
4. ls_out_acl (northd.c:6511): ip && (!ct.est || (ct.est && ct_label.blocked == 1)), priority 1, uuid 37b679a7
reg0[1] = 1;
next;
7. ls_out_stateful (northd.c:6858): reg0[1] == 1 && reg0[13] == 0, priority 100, uuid 7f437e3c
ct_commit { ct_label.blocked = 0; };
next;
9. ls_out_port_sec_l2 (northd.c:5613): outport == "stor-ovn-control-plane", priority 50, uuid 13e0a299
output;
/* output to "stor-ovn-control-plane", type "patch" */
ingress(dp="ovn_cluster_router", inport="rtos-ovn-control-plane")
-----------------------------------------------------------------
0. lr_in_admission (northd.c:10601): eth.dst == 0a:58:0a:f4:00:01 && inport == "rtos-ovn-control-plane" && is_chassis_resident("cr-rtos-ovn-control-plane"), priority 50, uuid 28f2eee6
xreg0[0..47] = 0a:58:0a:f4:00:01;
next;
1. lr_in_lookup_neighbor (northd.c:10745): 1, priority 0, uuid 30eb11a0
reg9[2] = 1;
next;
2. lr_in_learn_neighbor (northd.c:10754): reg9[2] == 1 || reg9[3] == 0, priority 100, uuid ec8d6391
next;
10. lr_in_ip_routing_pre (northd.c:11004): 1, priority 0, uuid d18586d7
reg7 = 0;
next;
11. lr_in_ip_routing (northd.c:9517): ip4.src == 10.244.0.0/24, priority 72, uuid 632c144e
ip.ttl--;
reg8[0..15] = 0;
reg0 = 10.244.0.2;
reg1 = 10.244.0.1;
eth.src = 0a:58:0a:f4:00:01;
outport = "rtos-ovn-control-plane";
flags.loopback = 1;
next;
12. lr_in_ip_routing_ecmp (northd.c:11079): reg8[0..15] == 0, priority 150, uuid 6c5046ad
next;
13. lr_in_policy (northd.c:8750): ip4.src == 10.244.0.0/16 && ip4.dst == 172.18.0.2/32, priority 101, uuid 8ce10805
reg8[0..15] = 0;
next;
14. lr_in_policy_ecmp (northd.c:11214): reg8[0..15] == 0, priority 150, uuid 4f52831d
next;
15. lr_in_arp_resolve (northd.c:11415): outport == "rtos-ovn-control-plane" && reg0 == 10.244.0.2, priority 100, uuid 47976c50
eth.dst = 3e:db:a1:63:b7:06;
next;
18. lr_in_gw_redirect (northd.c:11814): outport == "rtos-ovn-control-plane", priority 50, uuid a1de3ff1
outport = "cr-rtos-ovn-control-plane";
next;
19. lr_in_arp_request (northd.c:11895): 1, priority 0, uuid 8d41b73f
output;
/* Replacing type "chassisredirect" outport "cr-rtos-ovn-control-plane" with distributed port "rtos-ovn-control-plane". */
egress(dp="ovn_cluster_router", inport="rtos-ovn-control-plane", outport="rtos-ovn-control-plane")
--------------------------------------------------------------------------------------------------
0. lr_out_chk_dnat_local (northd.c:13120): 1, priority 0, uuid 888127ac
reg9[4] = 0;
next;
6. lr_out_delivery (northd.c:11942): outport == "rtos-ovn-control-plane", priority 100, uuid 4875b1e2
output;
/* output to "rtos-ovn-control-plane", type "patch" */
ingress(dp="ovn-control-plane", inport="stor-ovn-control-plane")
----------------------------------------------------------------
0. ls_in_port_sec_l2 (northd.c:5516): inport == "stor-ovn-control-plane", priority 50, uuid aca7f8f6
next;
5. ls_in_pre_acl (northd.c:5663): ip && inport == "stor-ovn-control-plane", priority 110, uuid fc8809b8
next;
6. ls_in_pre_lb (northd.c:5663): ip && inport == "stor-ovn-control-plane", priority 110, uuid 348534c1
next;
8. ls_in_acl_hint (northd.c:6007): ct.new && !ct.est, priority 7, uuid 3f0319d1
reg0[7] = 1;
reg0[9] = 1;
next;
9. ls_in_acl (northd.c:6508): ip && (!ct.est || (ct.est && ct_label.blocked == 1)), priority 1, uuid 467627c5
reg0[1] = 1;
next;
14. ls_in_stateful (northd.c:6854): reg0[1] == 1 && reg0[13] == 0, priority 100, uuid 6e341e16
ct_commit { ct_label.blocked = 0; };
next;
15. ls_in_pre_hairpin (northd.c:6879): ip && ct.trk, priority 100, uuid 28b3543b
reg0[6] = chk_lb_hairpin();
reg0[12] = chk_lb_hairpin_reply();
*** chk_lb_hairpin_reply action not implemented
next;
24. ls_in_l2_lkup (northd.c:8299): eth.dst == 3e:db:a1:63:b7:06, priority 50, uuid 875c8e0f
outport = "k8s-ovn-control-plane";
output;
egress(dp="ovn-control-plane", inport="stor-ovn-control-plane", outport="k8s-ovn-control-plane")
------------------------------------------------------------------------------------------------
0. ls_out_pre_lb (northd.c:5911): ip, priority 100, uuid 7873615a
reg0[2] = 1;
next;
1. ls_out_pre_acl (northd.c:5779): ip, priority 100, uuid aa3eb857
reg0[0] = 1;
next;
2. ls_out_pre_stateful (northd.c:5956): reg0[2] == 1, priority 110, uuid 0f11bec7
ct_lb;
ct_lb /* default (use --ct to customize) */
-------------------------------------------
3. ls_out_acl_hint (northd.c:6060): ct.est && ct_label.blocked == 0, priority 1, uuid 19de4846
reg0[10] = 1;
next;
9. ls_out_port_sec_l2 (northd.c:5613): outport == "k8s-ovn-control-plane", priority 50, uuid d60692bf
output;
/* output to "k8s-ovn-control-plane", type "" */
sh-5.1# ovn-nbctl lr-policy-list ovn_cluster_router
Routing Policies
1004 inport == "rtos-ovn-control-plane" && ip4.dst == 172.18.0.3 /* ovn-control-plane */ reroute 10.244.0.2
1004 inport == "rtos-ovn-worker" && ip4.dst == 172.18.0.2 /* ovn-worker */ reroute 10.244.2.2
1004 inport == "rtos-ovn-worker2" && ip4.dst == 172.18.0.4 /* ovn-worker2 */ reroute 10.244.1.2
101 ip4.src == 10.244.0.0/16 && ip4.dst == 10.244.0.0/16 allow
101 ip4.src == 10.244.0.0/16 && ip4.dst == 100.64.0.0/16 allow
101 ip4.src == 10.244.0.0/16 && ip4.dst == 172.18.0.2/32 allow
101 ip4.src == 10.244.0.0/16 && ip4.dst == 172.18.0.3/32 allow
101 ip4.src == 10.244.0.0/16 && ip4.dst == 172.18.0.4/32 allow
100 ip4.src == 10.244.0.19 reroute 100.64.0.2
*** Bug 2090103 has been marked as a duplicate of this bug. ***

Hello Surya, do you have any update on this BZ? The customer is asking for an update.

Thanks for the needinfo, I didn't notice we had a case on this bug. Looking into what the issue is for the customer.

Surya, I am a bit confused by your multiple comments. Is my understanding correct that this is the right BZ for the customer issue? By the way, I created BZ 2090103 and akaris closed it as a duplicate of this bug. Please correct me if I am missing anything.

No, all good Manish. This bug needs a design decision and we are in the process of fixing it. Please note that this bug is medium priority, so it will take some time for a definitive fix. We are very much working on it and this is indeed the correct bug.

Update: we had a meeting within the team last week: https://docs.google.com/document/d/1s5kwKImltuZdFUWxEeHd0KkOCNy4agrF5FtMztK9aGk/edit
Consensus is to make SGW behave like LGW in the one case of a pod on the egress node, where it exhibits different behaviour. This will be documented in the PR. Essentially OVNK will fix the problem the user is facing, and we agree we shouldn't EIP-SNAT traffic towards a clusterIP service if the backend is within the cluster. NOTE: SDN and OVNK are two different plugins, so OVNK has a different definition of what egress means and we are trying to fix the bug in SDN where we went wrong. More details in the google doc.

Probably the best way here is to replace

101 ip4.src == 10.244.0.0/16 && ip4.dst == 10.244.0.0/16 allow
101 ip4.src == 10.244.0.0/16 && ip4.dst == 100.64.0.0/16 allow
101 ip4.src == 10.244.0.0/16 && ip4.dst == 172.18.0.2/32 allow
101 ip4.src == 10.244.0.0/16 && ip4.dst == 172.18.0.3/32 allow
101 ip4.src == 10.244.0.0/16 && ip4.dst == 172.18.0.4/32 allow

with

101 ip4.src == 10.244.0.0/16 && ip4.dst == 10.244.0.0/16 allow
101 ip4.src == 10.244.0.0/16 && ip4.dst == 100.64.0.0/16 allow
101 ip4.src == $eipAS && inport == "rtos-node1" && ip4.dst == $ovn-host-network_v4 reroute mp0-node1IPv4
101 ip4.src == $eipAS && inport == "rtos-node2" && ip4.dst == $ovn-host-network_v4 reroute mp0-node2IPv4
101 ip4.src == $eipAS && inport == "rtos-node3" && ip4.dst == $ovn-host-network_v4 reroute mp0-node3IPv4

and the same for v6 (a rough ovn-nbctl sketch of these proposed policies appears after this comment). This is the only way we can ensure we don't change the traffic flows for non-egressIP pods. NOTE: we have the 1004 policies for hairpin traffic, but I doubt we should touch those or mix this up with them, since we have an option where we disable egressIP:

1004 inport == "rtos-ovn-control-plane" && ip4.dst == 172.18.0.3 /* ovn-control-plane */ reroute 10.244.1.2
1004 inport == "rtos-ovn-worker" && ip4.dst == 172.18.0.2 /* ovn-worker */ reroute 10.244.0.2
1004 inport == "rtos-ovn-worker2" && ip4.dst == 172.18.0.4 /* ovn-worker2 */ reroute 10.244.2.2

UPDATE: in "ip4.dst == $ovn-host-network_v4", ovn-host-network means a set of all nodeIPs in the cluster.

In case someone is wondering why this bug is taking longer than intended: we are hitting some new bugs that are slowing down the development process: https://bugzilla.redhat.com/show_bug.cgi?id=2106444 and https://bugzilla.redhat.com/show_bug.cgi?id=2108026. Worked around these. Hitting another new bug now which I am investigating...

Hello Surya, do you have any update on this bug?

While testing the PR out, we figured out we can't do the policy-based routing upon conntrack states, since in the OVN pipeline we hold conntrack states locally, so when the packet goes into ovn_cluster_router from the node switch, we clear the ct_state. So we need to find an alternate way of steering traffic.
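For illustration only, a rough sketch of how the proposed 101 policies could have been installed with ovn-nbctl. The address set names ($eipAS, $ovn-host-network_v4), port names (rtos-node1, ...) and the mp0 next-hop IPs are placeholders taken from the comment above, and this approach was ultimately dropped in favour of the merged fix referenced below:

sh-5.1# ovn-nbctl lr-policy-add ovn_cluster_router 101 'ip4.src == $eipAS && inport == "rtos-node1" && ip4.dst == $ovn-host-network_v4' reroute <mp0-node1IPv4>
sh-5.1# ovn-nbctl lr-policy-add ovn_cluster_router 101 'ip4.src == $eipAS && inport == "rtos-node2" && ip4.dst == $ovn-host-network_v4' reroute <mp0-node2IPv4>
sh-5.1# ovn-nbctl lr-policy-add ovn_cluster_router 101 'ip4.src == $eipAS && inport == "rtos-node3" && ip4.dst == $ovn-host-network_v4' reroute <mp0-node3IPv4>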
A workaround for this bug is to use LGW mode, where this bug doesn't happen. It only affects SGW users.

Upstream Fix Merged: https://github.com/ovn-org/ovn-kubernetes/pull/3064
Downstream Merge Opened: https://github.com/openshift/ovn-kubernetes/pull/1493 https://github.com/openshift/ovn-kubernetes/pull/1496

Downstream merge done, moving bug to MODIFIED.

Any update on this please? The customer is asking for an update.

(In reply to Manish Pandey from comment #25)
> Any update on this please ? Customer is asking for update

Starting to work on its verification.

Verified in 4.13.0-0.nightly-2023-02-07-064924
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.13.0-0.nightly-2023-02-07-064924 True False 133m Cluster version is 4.13.0-0.nightly-2023-02-07-064924
$ oc get node -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
jechen-0207b-4v2jp-master-0.c.openshift-qe.internal Ready control-plane,master 124m v1.26.0+9eb81c2 10.0.0.7 <none> Red Hat Enterprise Linux CoreOS 413.86.202302061827-0 (Ootpa) 4.18.0-372.41.1.el8_6.x86_64 cri-o://1.26.1-6.rhaos4.13.git159cc9c.el8
jechen-0207b-4v2jp-master-1.c.openshift-qe.internal Ready control-plane,master 126m v1.26.0+9eb81c2 10.0.0.5 <none> Red Hat Enterprise Linux CoreOS 413.86.202302061827-0 (Ootpa) 4.18.0-372.41.1.el8_6.x86_64 cri-o://1.26.1-6.rhaos4.13.git159cc9c.el8
jechen-0207b-4v2jp-master-2.c.openshift-qe.internal Ready control-plane,master 126m v1.26.0+9eb81c2 10.0.0.6 <none> Red Hat Enterprise Linux CoreOS 413.86.202302061827-0 (Ootpa) 4.18.0-372.41.1.el8_6.x86_64 cri-o://1.26.1-6.rhaos4.13.git159cc9c.el8
jechen-0207b-4v2jp-worker-a-nq7br.c.openshift-qe.internal Ready worker 113m v1.26.0+9eb81c2 10.0.128.4 <none> Red Hat Enterprise Linux CoreOS 413.86.202302061827-0 (Ootpa) 4.18.0-372.41.1.el8_6.x86_64 cri-o://1.26.1-6.rhaos4.13.git159cc9c.el8
jechen-0207b-4v2jp-worker-b-q9wnj.c.openshift-qe.internal Ready worker 113m v1.26.0+9eb81c2 10.0.128.2 <none> Red Hat Enterprise Linux CoreOS 413.86.202302061827-0 (Ootpa) 4.18.0-372.41.1.el8_6.x86_64 cri-o://1.26.1-6.rhaos4.13.git159cc9c.el8
jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal Ready worker 113m v1.26.0+9eb81c2 10.0.128.3 <none> Red Hat Enterprise Linux CoreOS 413.86.202302061827-0 (Ootpa) 4.18.0-372.41.1.el8_6.x86_64 cri-o://1.26.1-6.rhaos4.13.git159cc9c.el8
# label worker node jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal as egress-assignable node
$ oc label node jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal labeled
# create egressip object
$ cat config_egressip1_ovn_ns_team_blue_gcp.yaml
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip-blue
spec:
  egressIPs:
  - 10.0.128.101
  namespaceSelector:
    matchLabels:
      team: blue
$ oc create -f config_egressip1_ovn_ns_team_blue_gcp.yaml
egressip.k8s.ovn.org/egressip-blue created
$ oc get egressips.k8s.ovn.org
NAME EGRESSIPS ASSIGNED NODE ASSIGNED EGRESSIPS
egressip-blue 10.0.128.101 jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal 10.0.128.101
# create a test namespace, and label it with the same namespace selector as in egressip-blue
$ oc new-project test
$ oc label ns test security.openshift.io/scc.podSecurityLabelSync=false pod-security.kubernetes.io/enforce=privileged pod-security.kubernetes.io/audit=privileged pod-security.kubernetes.io/warn=privileged --overwrite
namespace/test labeled
$ oc label ns test team=blue
namespace/test labeled
# create hostnetwork pod and service in the test namespace
$ cat hostnework-pod.yaml
---
apiVersion: v1
kind: List
items:
- kind: Pod
  apiVersion: v1
  metadata:
    name: hostnetwork-pod
    labels:
      name: hostnetwork-pod
  spec:
    containers:
    - name: hostnetwork-pod
      image: quay.io/openshifttest/hello-sdn@sha256:c89445416459e7adea9a5a416b3365ed3d74f2491beb904d61dc8d1eb89a72a4
    hostNetwork: true
- apiVersion: v1
  kind: Service
  metadata:
    labels:
      name: test-service
    name: test-service
  spec:
    ports:
    - name: http
      port: 27017
      protocol: TCP
      targetPort: 8080
    selector:
      name: hostnetwork-pod
    type: NodePort
$ oc create -f hostnework-pod.yaml
pod/hostnetwork-pod created
# create some test pods in the test namespace
$ oc get all -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/hostnetwork-pod 1/1 Running 0 9m32s 10.0.128.2 jechen-0207b-4v2jp-worker-b-q9wnj.c.openshift-qe.internal <none> <none>
pod/test-rc-b2vzk 1/1 Running 0 72m 10.128.2.15 jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal <none> <none>
pod/test-rc-bc27l 1/1 Running 0 72m 10.129.2.17 jechen-0207b-4v2jp-worker-a-nq7br.c.openshift-qe.internal <none> <none>
pod/test-rc-bhvfs 1/1 Running 0 72m 10.131.0.17 jechen-0207b-4v2jp-worker-b-q9wnj.c.openshift-qe.internal <none> <none>
pod/test-rc-mqb2m 1/1 Running 0 72m 10.131.0.18 jechen-0207b-4v2jp-worker-b-q9wnj.c.openshift-qe.internal <none> <none>
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
replicationcontroller/test-rc 4 4 4 72m test-pod quay.io/openshifttest/hello-sdn@sha256:d5785550cf77b7932b090fcd1a2625472912fb3189d5973f177a5a2c347a1f95 name=test-pods
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/test-service NodePort 172.30.79.29 <none> 27017:31459/TCP 9m32s name=hostnetwork-pod
# on a separate terminal, start tcpdump on egress node jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal (31459 is the service NodePort, 31661 is the fixed client source port used below with curl --local-port)
$ oc debug node/jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal
Starting pod/jechen-0207b-4v2jp-worker-c-tl744copenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.128.3
If you don't see a command prompt, try pressing enter.
sh-4.4# tcpdump -n -i any -nneep "(src port 31661 and dst port 31459) or (src port 31459 and dst port 31661)"
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
# from test pod test-rc-b2vzk that is on egress node jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal, curl the hostnetwork pod's IP on the NodePort
$ oc rsh test-rc-b2vzk
~ $ curl --local-port 31661 10.0.128.2:31459
Hello OpenShift!
# Verified the fix with the following tcpdump on egress node jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal
$ oc debug node/jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal
Starting pod/jechen-0207b-4v2jp-worker-c-tl744copenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.128.3
If you don't see a command prompt, try pressing enter.
sh-4.4# tcpdump -n -i any -nneep "(src port 31661 and dst port 31459) or (src port 31459 and dst port 31661)"
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
20:56:07.074110 P 0a:58:0a:80:02:0f ethertype IPv4 (0x0800), length 76: 10.128.2.15.31661 > 10.0.128.2.31459: Flags [S], seq 2082181755, win 26400, options [mss 1320,sackOK,TS val 2065482948 ecr 0,nop,wscale 7], length 0
20:56:07.075185 Out 42:01:0a:00:80:03 ethertype IPv4 (0x0800), length 76: 10.0.128.3.31661 > 10.0.128.2.31459: Flags [S], seq 2082181755, win 26400, options [mss 1320,sackOK,TS val 2065482948 ecr 0,nop,wscale 7], length 0
20:56:07.077916 In 42:01:0a:00:80:01 ethertype IPv4 (0x0800), length 76: 10.0.128.2.31459 > 10.0.128.3.31661: Flags [S.], seq 1636338806, ack 2082181756, win 26160, options [mss 1320,sackOK,TS val 1945200435 ecr 2065482948,nop,wscale 7], length 0
20:56:07.078851 Out 0a:58:0a:80:02:01 ethertype IPv4 (0x0800), length 76: 10.0.128.2.31459 > 10.128.2.15.31661: Flags [S.], seq 1636338806, ack 2082181756, win 26160, options [mss 1320,sackOK,TS val 1945200435 ecr 2065482948,nop,wscale 7], length 0
20:56:07.078906 P 0a:58:0a:80:02:0f ethertype IPv4 (0x0800), length 68: 10.128.2.15.31661 > 10.0.128.2.31459: Flags [.], ack 1, win 207, options [nop,nop,TS val 2065482953 ecr 1945200435], length 0
20:56:07.078975 P 0a:58:0a:80:02:0f ethertype IPv4 (0x0800), length 148: 10.128.2.15.31661 > 10.0.128.2.31459: Flags [P.], seq 1:81, ack 1, win 207, options [nop,nop,TS val 2065482953 ecr 1945200435], length 80
20:56:07.079503 Out 42:01:0a:00:80:03 ethertype IPv4 (0x0800), length 68: 10.0.128.3.31661 > 10.0.128.2.31459: Flags [.], ack 1, win 207, options [nop,nop,TS val 2065482953 ecr 1945200435], length 0
20:56:07.079563 Out 42:01:0a:00:80:03 ethertype IPv4 (0x0800), length 148: 10.0.128.3.31661 > 10.0.128.2.31459: Flags [P.], seq 1:81, ack 1, win 207, options [nop,nop,TS val 2065482953 ecr 1945200435], length 80
20:56:07.080127 In 42:01:0a:00:80:01 ethertype IPv4 (0x0800), length 68: 10.0.128.2.31459 > 10.0.128.3.31661: Flags [.], ack 81, win 205, options [nop,nop,TS val 1945200438 ecr 2065482953], length 0
20:56:07.080173 Out 0a:58:0a:80:02:01 ethertype IPv4 (0x0800), length 68: 10.0.128.2.31459 > 10.128.2.15.31661: Flags [.], ack 81, win 205, options [nop,nop,TS val 1945200438 ecr 2065482953], length 0
==> the packet leaving test pod 10.128.2.15 on the egress node (jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal) was SNATed to its node IP 10.0.128.3, not to the egressIP 10.0.128.101, confirming that in-cluster traffic towards a host-networked backend is no longer egressIP-SNATed.
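As a complementary check (not part of the capture above), one could also confirm that genuine egress traffic still leaves with the egressIP; a minimal sketch, where EXTERNAL_IP is a placeholder for a host outside the cluster:

$ oc rsh test-rc-b2vzk
~ $ curl http://EXTERNAL_IP/
# a capture on the external host should show the connection arriving from egressIP 10.0.128.101 rather than from a node IP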
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.13.0 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:1326 |