Bug 2078222 - egressIPs behave inconsistently towards in-cluster traffic (hosts and services backed by host-networked pods)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.13.0
Assignee: Surya Seetharaman
QA Contact: jechen
URL:
Whiteboard:
Duplicates: 2090103
Depends On: 2125247
Blocks:
 
Reported: 2022-04-24 14:24 UTC by Surya Seetharaman
Modified: 2023-05-17 22:46 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-17 22:46:32 UTC
Target Upstream Version:
Embargoed:
surya: needinfo-
mapandey: needinfo-




Links
Github openshift/ovn-kubernetes pull 1493 (closed): Bug 2078222, OCPBUGS-4119: [DownstreamMerge] 1-31-23 (last updated 2023-02-06 12:49:50 UTC)
Github openshift/ovn-kubernetes pull 1496 (merged): Bug 2078222, OCPBUGS-4119, OCPBUGS-5930, OCPBUGS-4425: [DownstreamMerge] 1-31-23 (last updated 2023-02-01 10:50:50 UTC)
Github ovn-org/ovn-kubernetes pull 3064 (merged): EIP & ESVC inconsistency (last updated 2023-01-31 07:15:01 UTC)
Red Hat Product Errata RHSA-2023:1326 (last updated 2023-05-17 22:46:45 UTC)

Description Surya Seetharaman 2022-04-24 14:24:43 UTC
Description of problem:

We saw an inconsistency in how egressIPs behave while fixing https://bugzilla.redhat.com/show_bug.cgi?id=2070929: depending on where the pod is located (egress node or non-egress node), the srcIP of the packet will be either the egressIP or the nodeIP.

In addition, a priority-101 policy is created on the cluster router for every nodeIP in the cluster, but it conveniently picks only the first default primary IP of each node, leaving any other nodeIPs uncovered. We need to fix this to behave like the priority-1004 policies.
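(For reference, these policies can be listed from the ovnkube-master pod; a sketch, assuming the usual pod and container naming in the openshift-ovn-kubernetes namespace, with ovnkube-master-xxxxx as a placeholder pod name. The full listing appears at the end of comment 1:)

$ oc -n openshift-ovn-kubernetes exec -it ovnkube-master-xxxxx -c nbdb -- \
    ovn-nbctl lr-policy-list ovn_cluster_router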



Comment 1 Surya Seetharaman 2022-04-25 19:24:34 UTC
The behaviour also differs between gateway modes for pod -> different-node traffic. In shared gateway (SGW) mode, the srcIP is either the egressIP or the nodeIP, depending on whether the pod is on the egress node or a non-egress node. In local gateway (LGW) mode, it's always the nodeIP; this traffic isn't considered egressIP traffic there.
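(For reference, the cluster's gateway mode can be read from the network operator config on versions that expose the gatewayConfig field; a sketch:)

$ oc get network.operator.openshift.io cluster \
    -o jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.gatewayConfig}'
# routingViaHost: true means LGW mode; false or unset means SGW mode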

sh-5.1# ovn-trace --ct new 'inport=="egressip-7887_e2e-egressip-pod-1" && eth.src==0a:58:0a:f4:00:13 && eth.dst==0a:58:0a:f4:00:01 && ip4.src==10.244.0.19 && ip4.dst==172.18.0.2 && ip.ttl==64 && tcp && tcp.src==80 && tcp.dst==80'
# tcp,reg14=0x6,vlan_tci=0x0000,dl_src=0a:58:0a:f4:00:13,dl_dst=0a:58:0a:f4:00:01,nw_src=10.244.0.19,nw_dst=172.18.0.2,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=80,tp_dst=80,tcp_flags=0

ingress(dp="ovn-control-plane", inport="egressip-7887_e2e-egressip-pod-1")
--------------------------------------------------------------------------
 0. ls_in_port_sec_l2 (northd.c:5516): inport == "egressip-7887_e2e-egressip-pod-1" && eth.src == {0a:58:0a:f4:00:13}, priority 50, uuid f17158be
    next;
 1. ls_in_port_sec_ip (northd.c:5149): inport == "egressip-7887_e2e-egressip-pod-1" && eth.src == 0a:58:0a:f4:00:13 && ip4.src == {10.244.0.19}, priority 90, uuid 59c5fa9b
    next;
 5. ls_in_pre_acl (northd.c:5777): ip, priority 100, uuid 3c4e85db
    reg0[0] = 1;
    next;
 6. ls_in_pre_lb (northd.c:5909): ip, priority 100, uuid 89621558
    reg0[2] = 1;
    next;
 7. ls_in_pre_stateful (northd.c:5936): reg0[2] == 1 && ip4 && tcp, priority 120, uuid 1927646d
    reg1 = ip4.dst;
    reg2[0..15] = tcp.dst;
    ct_lb;

ct_lb
-----
 8. ls_in_acl_hint (northd.c:6007): ct.new && !ct.est, priority 7, uuid 3f0319d1
    reg0[7] = 1;
    reg0[9] = 1;
    next;
 9. ls_in_acl (northd.c:6508): ip && (!ct.est || (ct.est && ct_label.blocked == 1)), priority 1, uuid 467627c5
    reg0[1] = 1;
    next;
14. ls_in_stateful (northd.c:6854): reg0[1] == 1 && reg0[13] == 0, priority 100, uuid 6e341e16
    ct_commit { ct_label.blocked = 0; };
    next;
15. ls_in_pre_hairpin (northd.c:6879): ip && ct.trk, priority 100, uuid 28b3543b
    reg0[6] = chk_lb_hairpin();
    reg0[12] = chk_lb_hairpin_reply();
    *** chk_lb_hairpin_reply action not implemented
    next;
24. ls_in_l2_lkup (northd.c:8370): eth.dst == 0a:58:0a:f4:00:01, priority 50, uuid 61105717
    outport = "stor-ovn-control-plane";
    output;

egress(dp="ovn-control-plane", inport="egressip-7887_e2e-egressip-pod-1", outport="stor-ovn-control-plane")
-----------------------------------------------------------------------------------------------------------
 0. ls_out_pre_lb (northd.c:5666): ip && outport == "stor-ovn-control-plane", priority 110, uuid af2ab016
    next;
 1. ls_out_pre_acl (northd.c:5666): ip && outport == "stor-ovn-control-plane", priority 110, uuid 2691e886
    next;
 3. ls_out_acl_hint (northd.c:6007): ct.new && !ct.est, priority 7, uuid e0561d4b
    reg0[7] = 1;
    reg0[9] = 1;
    next;
 4. ls_out_acl (northd.c:6511): ip && (!ct.est || (ct.est && ct_label.blocked == 1)), priority 1, uuid 37b679a7
    reg0[1] = 1;
    next;
 7. ls_out_stateful (northd.c:6858): reg0[1] == 1 && reg0[13] == 0, priority 100, uuid 7f437e3c
    ct_commit { ct_label.blocked = 0; };
    next;
 9. ls_out_port_sec_l2 (northd.c:5613): outport == "stor-ovn-control-plane", priority 50, uuid 13e0a299
    output;
    /* output to "stor-ovn-control-plane", type "patch" */

ingress(dp="ovn_cluster_router", inport="rtos-ovn-control-plane")
-----------------------------------------------------------------
 0. lr_in_admission (northd.c:10601): eth.dst == 0a:58:0a:f4:00:01 && inport == "rtos-ovn-control-plane" && is_chassis_resident("cr-rtos-ovn-control-plane"), priority 50, uuid 28f2eee6
    xreg0[0..47] = 0a:58:0a:f4:00:01;
    next;
 1. lr_in_lookup_neighbor (northd.c:10745): 1, priority 0, uuid 30eb11a0
    reg9[2] = 1;
    next;
 2. lr_in_learn_neighbor (northd.c:10754): reg9[2] == 1 || reg9[3] == 0, priority 100, uuid ec8d6391
    next;
10. lr_in_ip_routing_pre (northd.c:11004): 1, priority 0, uuid d18586d7
    reg7 = 0;
    next;
11. lr_in_ip_routing (northd.c:9517): ip4.src == 10.244.0.0/24, priority 72, uuid 632c144e
    ip.ttl--;
    reg8[0..15] = 0;
    reg0 = 10.244.0.2;
    reg1 = 10.244.0.1;
    eth.src = 0a:58:0a:f4:00:01;
    outport = "rtos-ovn-control-plane";
    flags.loopback = 1;
    next;
12. lr_in_ip_routing_ecmp (northd.c:11079): reg8[0..15] == 0, priority 150, uuid 6c5046ad
    next;
13. lr_in_policy (northd.c:8750): ip4.src == 10.244.0.0/16 && ip4.dst == 172.18.0.2/32, priority 101, uuid 8ce10805
    reg8[0..15] = 0;
    next;
14. lr_in_policy_ecmp (northd.c:11214): reg8[0..15] == 0, priority 150, uuid 4f52831d
    next;
15. lr_in_arp_resolve (northd.c:11415): outport == "rtos-ovn-control-plane" && reg0 == 10.244.0.2, priority 100, uuid 47976c50
    eth.dst = 3e:db:a1:63:b7:06;
    next;
18. lr_in_gw_redirect (northd.c:11814): outport == "rtos-ovn-control-plane", priority 50, uuid a1de3ff1
    outport = "cr-rtos-ovn-control-plane";
    next;
19. lr_in_arp_request (northd.c:11895): 1, priority 0, uuid 8d41b73f
    output;
    /* Replacing type "chassisredirect" outport "cr-rtos-ovn-control-plane" with distributed port "rtos-ovn-control-plane". */

egress(dp="ovn_cluster_router", inport="rtos-ovn-control-plane", outport="rtos-ovn-control-plane")
--------------------------------------------------------------------------------------------------
 0. lr_out_chk_dnat_local (northd.c:13120): 1, priority 0, uuid 888127ac
    reg9[4] = 0;
    next;
 6. lr_out_delivery (northd.c:11942): outport == "rtos-ovn-control-plane", priority 100, uuid 4875b1e2
    output;
    /* output to "rtos-ovn-control-plane", type "patch" */

ingress(dp="ovn-control-plane", inport="stor-ovn-control-plane")
----------------------------------------------------------------
 0. ls_in_port_sec_l2 (northd.c:5516): inport == "stor-ovn-control-plane", priority 50, uuid aca7f8f6
    next;
 5. ls_in_pre_acl (northd.c:5663): ip && inport == "stor-ovn-control-plane", priority 110, uuid fc8809b8
    next;
 6. ls_in_pre_lb (northd.c:5663): ip && inport == "stor-ovn-control-plane", priority 110, uuid 348534c1
    next;
 8. ls_in_acl_hint (northd.c:6007): ct.new && !ct.est, priority 7, uuid 3f0319d1
    reg0[7] = 1;
    reg0[9] = 1;
    next;
 9. ls_in_acl (northd.c:6508): ip && (!ct.est || (ct.est && ct_label.blocked == 1)), priority 1, uuid 467627c5
    reg0[1] = 1;
    next;
14. ls_in_stateful (northd.c:6854): reg0[1] == 1 && reg0[13] == 0, priority 100, uuid 6e341e16
    ct_commit { ct_label.blocked = 0; };
    next;
15. ls_in_pre_hairpin (northd.c:6879): ip && ct.trk, priority 100, uuid 28b3543b
    reg0[6] = chk_lb_hairpin();
    reg0[12] = chk_lb_hairpin_reply();
    *** chk_lb_hairpin_reply action not implemented
    next;
24. ls_in_l2_lkup (northd.c:8299): eth.dst == 3e:db:a1:63:b7:06, priority 50, uuid 875c8e0f
    outport = "k8s-ovn-control-plane";
    output;

egress(dp="ovn-control-plane", inport="stor-ovn-control-plane", outport="k8s-ovn-control-plane")
------------------------------------------------------------------------------------------------
 0. ls_out_pre_lb (northd.c:5911): ip, priority 100, uuid 7873615a
    reg0[2] = 1;
    next;
 1. ls_out_pre_acl (northd.c:5779): ip, priority 100, uuid aa3eb857
    reg0[0] = 1;
    next;
 2. ls_out_pre_stateful (northd.c:5956): reg0[2] == 1, priority 110, uuid 0f11bec7
    ct_lb;

ct_lb /* default (use --ct to customize) */
-------------------------------------------
 3. ls_out_acl_hint (northd.c:6060): ct.est && ct_label.blocked == 0, priority 1, uuid 19de4846
    reg0[10] = 1;
    next;
 9. ls_out_port_sec_l2 (northd.c:5613): outport == "k8s-ovn-control-plane", priority 50, uuid d60692bf
    output;
    /* output to "k8s-ovn-control-plane", type "" */
sh-5.1# ovn-nbctl lr-policy-list ovn_cluster_router
Routing Policies
      1004 inport == "rtos-ovn-control-plane" && ip4.dst == 172.18.0.3 /* ovn-control-plane */         reroute                10.244.0.2
      1004 inport == "rtos-ovn-worker" && ip4.dst == 172.18.0.2 /* ovn-worker */         reroute                10.244.2.2
      1004 inport == "rtos-ovn-worker2" && ip4.dst == 172.18.0.4 /* ovn-worker2 */         reroute                10.244.1.2
       101 ip4.src == 10.244.0.0/16 && ip4.dst == 10.244.0.0/16           allow
       101 ip4.src == 10.244.0.0/16 && ip4.dst == 100.64.0.0/16           allow
       101 ip4.src == 10.244.0.0/16 && ip4.dst == 172.18.0.2/32           allow
       101 ip4.src == 10.244.0.0/16 && ip4.dst == 172.18.0.3/32           allow
       101 ip4.src == 10.244.0.0/16 && ip4.dst == 172.18.0.4/32           allow
       100                             ip4.src == 10.244.0.19         reroute                100.64.0.2

Comment 2 Andreas Karis 2022-05-25 13:34:17 UTC
*** Bug 2090103 has been marked as a duplicate of this bug. ***

Comment 3 Manish Pandey 2022-06-09 09:28:49 UTC
Hello Surya,

Do you have any update on this BZ?
The customer is asking for an update.

Comment 4 Surya Seetharaman 2022-06-13 11:16:50 UTC
Thanks for the needinfo; I didn't notice we had a case on this bug. Looking into what the issue is for the customer...

Comment 8 Manish Pandey 2022-06-14 03:16:16 UTC
Surya, I'm a bit confused by your multiple comments. Is my understanding correct that this is the correct BZ for the customer issue? By the way, I created BZ 2090103 and akaris closed it as a duplicate of this bug. Please correct me if I am missing anything.

Comment 9 Surya Seetharaman 2022-06-14 15:55:51 UTC
No, all good, Manish. This bug needs a design decision, and we are in the process of fixing it. Please note that the bug is medium priority, so it will take some time for a definitive fix. We are very much working on it, and this is indeed the correct bug.

Comment 10 Surya Seetharaman 2022-06-27 08:29:36 UTC
Update: we had a meeting within the team last week: https://docs.google.com/document/d/1s5kwKImltuZdFUWxEeHd0KkOCNy4agrF5FtMztK9aGk/edit

The consensus is to make SGW behave like LGW in the one case where it exhibits different behaviour: a pod on the egress node. This will be documented in the PR. Essentially, OVNK will fix the problem the user is facing; we agree we shouldn't SNAT traffic to the egressIP when it is destined for a clusterIP service whose backend is within the cluster.

NOTE: SDN and OVNK are two different plugins, so OVNK has a different definition of what egress means, and we are trying to fix the bug where we went wrong relative to SDN. More details are in the Google doc.

Comment 11 Surya Seetharaman 2022-06-29 10:41:31 UTC
Probably the best way here is to replace

101 ip4.src == 10.244.0.0/16 && ip4.dst == 10.244.0.0/16           allow
101 ip4.src == 10.244.0.0/16 && ip4.dst == 100.64.0.0/16           allow
101 ip4.src == 10.244.0.0/16 && ip4.dst == 172.18.0.2/32           allow
101 ip4.src == 10.244.0.0/16 && ip4.dst == 172.18.0.3/32           allow
101 ip4.src == 10.244.0.0/16 && ip4.dst == 172.18.0.4/32           allow

with 

101 ip4.src == 10.244.0.0/16 && ip4.dst == 10.244.0.0/16           allow
101 ip4.src == 10.244.0.0/16 && ip4.dst == 100.64.0.0/16           allow
101 ip4.src == $eipAS && inport == "rtos-node1" && ip4.dst == $ovn-host-network_v4 reroute mp0-node1IPv4
101 ip4.src == $eipAS && inport == "rtos-node2" && ip4.dst == $ovn-host-network_v4 reroute mp0-node2IPv4
101 ip4.src == $eipAS && inport == "rtos-node3" && ip4.dst == $ovn-host-network_v4 reroute mp0-node3IPv4

The same applies for v6. This is the only way we can ensure we don't change the traffic flows for non-egressIP pods.

NOTE: we have the 1004 policies for hairpin traffic, but I doubt we should touch those or mix this up with them, since we have an option where we disable egressIP:

1004 inport == "rtos-ovn-control-plane" && ip4.dst == 172.18.0.3 /* ovn-control-plane */    reroute              10.244.1.2                                     
1004 inport == "rtos-ovn-worker" && ip4.dst == 172.18.0.2 /* ovn-worker */         reroute                10.244.0.2                                                   
1004 inport == "rtos-ovn-worker2" && ip4.dst == 172.18.0.4 /* ovn-worker2 */         reroute                10.244.2.2

Comment 12 Surya Seetharaman 2022-06-29 10:45:39 UTC
UPDATE: on ip4.dst == $ovn-host-network_v4:

here, ovn-host-network means an address set containing all nodeIPs in the cluster.
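(Address sets live in the OVN northbound database and can be inspected with the generic DB commands; a sketch, with a_node_ips as a hypothetical set name:)

sh-5.1# ovn-nbctl --columns=name,addresses find address_set
sh-5.1# ovn-nbctl --columns=addresses find address_set name=a_node_ips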

Comment 13 Surya Seetharaman 2022-07-18 11:46:20 UTC
In case someone is wondering why this bug is taking longer than intended: we are hitting some new bugs that are slowing the development process: https://bugzilla.redhat.com/show_bug.cgi?id=2106444 and https://bugzilla.redhat.com/show_bug.cgi?id=2108026. Worked around these. Now hitting another new bug, which I am investigating...

Comment 14 Manish Pandey 2022-08-23 14:23:31 UTC
Hello Surya,

Do you have any update on this bug?

Comment 16 Surya Seetharaman 2022-09-05 12:44:49 UTC
While testing the PR out, we figured out we can't do the policy-based routing on conntrack states: in the OVN pipeline, conntrack states are held locally, so when the packet goes into ovn_cluster_router from the node switch, the ct_state is cleared. So we need to find an alternate way of steering this traffic.

Comment 18 Surya Seetharaman 2022-09-08 13:06:40 UTC
A workaround for this bug is to use LGW mode, where the bug doesn't happen. It only affects SGW users.
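(On OCP versions that expose the gatewayConfig field in the network operator config, LGW mode can be enabled with a merge patch; a sketch, noting that this changes the egress traffic path cluster-wide:)

$ oc patch networks.operator.openshift.io cluster --type=merge \
    -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"routingViaHost":true}}}}}'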

Comment 21 Surya Seetharaman 2023-01-31 07:14:43 UTC
Upstream Fix Merged: https://github.com/ovn-org/ovn-kubernetes/pull/3064
Downstream Merge Opened: https://github.com/openshift/ovn-kubernetes/pull/1493

Comment 22 Surya Seetharaman 2023-02-01 10:52:17 UTC
https://github.com/openshift/ovn-kubernetes/pull/1496 downstream merge done, moving bug to MODIFIED.

Comment 25 Manish Pandey 2023-02-06 09:50:34 UTC
Any update on this, please? The customer is asking for an update.

Comment 27 jechen 2023-02-06 20:43:05 UTC
(In reply to Manish Pandey from comment #25)
> Any update on this please ? Customer is asking for update

Starting to work on its verification.

Comment 28 jechen 2023-02-07 21:48:52 UTC
Verified in 4.13.0-0.nightly-2023-02-07-064924

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2023-02-07-064924   True        False         133m    Cluster version is 4.13.0-0.nightly-2023-02-07-064924

$ oc get node -owide
NAME                                                        STATUS   ROLES                  AGE    VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
jechen-0207b-4v2jp-master-0.c.openshift-qe.internal         Ready    control-plane,master   124m   v1.26.0+9eb81c2   10.0.0.7      <none>        Red Hat Enterprise Linux CoreOS 413.86.202302061827-0 (Ootpa)   4.18.0-372.41.1.el8_6.x86_64   cri-o://1.26.1-6.rhaos4.13.git159cc9c.el8
jechen-0207b-4v2jp-master-1.c.openshift-qe.internal         Ready    control-plane,master   126m   v1.26.0+9eb81c2   10.0.0.5      <none>        Red Hat Enterprise Linux CoreOS 413.86.202302061827-0 (Ootpa)   4.18.0-372.41.1.el8_6.x86_64   cri-o://1.26.1-6.rhaos4.13.git159cc9c.el8
jechen-0207b-4v2jp-master-2.c.openshift-qe.internal         Ready    control-plane,master   126m   v1.26.0+9eb81c2   10.0.0.6      <none>        Red Hat Enterprise Linux CoreOS 413.86.202302061827-0 (Ootpa)   4.18.0-372.41.1.el8_6.x86_64   cri-o://1.26.1-6.rhaos4.13.git159cc9c.el8
jechen-0207b-4v2jp-worker-a-nq7br.c.openshift-qe.internal   Ready    worker                 113m   v1.26.0+9eb81c2   10.0.128.4    <none>        Red Hat Enterprise Linux CoreOS 413.86.202302061827-0 (Ootpa)   4.18.0-372.41.1.el8_6.x86_64   cri-o://1.26.1-6.rhaos4.13.git159cc9c.el8
jechen-0207b-4v2jp-worker-b-q9wnj.c.openshift-qe.internal   Ready    worker                 113m   v1.26.0+9eb81c2   10.0.128.2    <none>        Red Hat Enterprise Linux CoreOS 413.86.202302061827-0 (Ootpa)   4.18.0-372.41.1.el8_6.x86_64   cri-o://1.26.1-6.rhaos4.13.git159cc9c.el8
jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal   Ready    worker                 113m   v1.26.0+9eb81c2   10.0.128.3    <none>        Red Hat Enterprise Linux CoreOS 413.86.202302061827-0 (Ootpa)   4.18.0-372.41.1.el8_6.x86_64   cri-o://1.26.1-6.rhaos4.13.git159cc9c.el8


# label worker node jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal as an egress-assignable node
$ oc label node jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal labeled

# create egressip object
$ cat config_egressip1_ovn_ns_team_blue_gcp.yaml
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip-blue
spec:
  egressIPs:
  - 10.0.128.101
  namespaceSelector:
    matchLabels:
      team: blue 


$ oc create -f config_egressip1_ovn_ns_team_blue_gcp.yaml
egressip.k8s.ovn.org/egressip-blue created

$ oc get egressips.k8s.ovn.org 
NAME            EGRESSIPS      ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip-blue   10.0.128.101   jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal   10.0.128.101


# create a test namespace and label it with the same namespace selector as in egressip-blue
$ oc new-project test
$ oc label ns test security.openshift.io/scc.podSecurityLabelSync=false pod-security.kubernetes.io/enforce=privileged pod-security.kubernetes.io/audit=privileged pod-security.kubernetes.io/warn=privileged --overwrite
namespace/test labeled

$ oc label ns test team=blue
namespace/test labeled
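(A quick sanity check that the namespace labels match egressip-blue's namespaceSelector:)

$ oc get ns test --show-labels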


# create a hostnetwork pod and a service in the test namespace
$ cat hostnework-pod.yaml
---
apiVersion: v1
kind: List
items:
- kind: Pod
  apiVersion: v1
  metadata:
    name: hostnetwork-pod
    labels:
      name: hostnetwork-pod
  spec:
    containers:
    - name: hostnetwork-pod
      image: quay.io/openshifttest/hello-sdn@sha256:c89445416459e7adea9a5a416b3365ed3d74f2491beb904d61dc8d1eb89a72a4
    hostNetwork: true
- apiVersion: v1
  kind: Service
  metadata:
    labels:
      name: test-service
    name: test-service
  spec:
    ports:
    - name: http
      port: 27017
      protocol: TCP
      targetPort: 8080
    selector:
      name: hostnetwork-pod
    type: NodePort


$ oc create -f hostnework-pod.yaml
pod/hostnetwork-pod created

# some test pods were also created in the test namespace; list all resources
$ oc get all -owide
NAME                  READY   STATUS    RESTARTS   AGE     IP            NODE                                                        NOMINATED NODE   READINESS GATES
pod/hostnetwork-pod   1/1     Running   0          9m32s   10.0.128.2    jechen-0207b-4v2jp-worker-b-q9wnj.c.openshift-qe.internal   <none>           <none>
pod/test-rc-b2vzk     1/1     Running   0          72m     10.128.2.15   jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal   <none>           <none>
pod/test-rc-bc27l     1/1     Running   0          72m     10.129.2.17   jechen-0207b-4v2jp-worker-a-nq7br.c.openshift-qe.internal   <none>           <none>
pod/test-rc-bhvfs     1/1     Running   0          72m     10.131.0.17   jechen-0207b-4v2jp-worker-b-q9wnj.c.openshift-qe.internal   <none>           <none>
pod/test-rc-mqb2m     1/1     Running   0          72m     10.131.0.18   jechen-0207b-4v2jp-worker-b-q9wnj.c.openshift-qe.internal   <none>           <none>

NAME                            DESIRED   CURRENT   READY   AGE   CONTAINERS   IMAGES                                                                                                    SELECTOR
replicationcontroller/test-rc   4         4         4       72m   test-pod     quay.io/openshifttest/hello-sdn@sha256:d5785550cf77b7932b090fcd1a2625472912fb3189d5973f177a5a2c347a1f95   name=test-pods

NAME                   TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)           AGE     SELECTOR
service/test-service   NodePort   172.30.79.29   <none>        27017:31459/TCP   9m32s   name=hostnetwork-pod
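(For reference, the same backend should also be reachable through the service's clusterIP from inside the cluster; a sketch using the values above:)

$ oc rsh test-rc-b2vzk
~ $ curl 172.30.79.29:27017    # expected: Hello OpenShift!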


# in a separate terminal, start tcpdump on egress node jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal
$ oc debug node/jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal
Starting pod/jechen-0207b-4v2jp-worker-c-tl744copenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.128.3
If you don't see a command prompt, try pressing enter.
sh-4.4# tcpdump -n -i any -nneep "(src port 31661 and  dst port 31459) or (src port 31459 and dst port 31661)"
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes


# from test pod test-rc-b2vzk, which is on egress node jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal, curl the hostnetwork pod's IP (node IP 10.0.128.2) on nodePort 31459, pinning the local port to 31661 to match the tcpdump filter
$ oc rsh test-rc-b2vzk
~ $  curl --local-port 31661 10.0.128.2:31459
Hello OpenShift!


# Verified the fix via the following tcpdump on egress node jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal

$ oc debug node/jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal
Starting pod/jechen-0207b-4v2jp-worker-c-tl744copenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.128.3
If you don't see a command prompt, try pressing enter.
sh-4.4# tcpdump -n -i any -nneep "(src port 31661 and  dst port 31459) or (src port 31459 and dst port 31661)"
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
20:56:07.074110   P 0a:58:0a:80:02:0f ethertype IPv4 (0x0800), length 76: 10.128.2.15.31661 > 10.0.128.2.31459: Flags [S], seq 2082181755, win 26400, options [mss 1320,sackOK,TS val 2065482948 ecr 0,nop,wscale 7], length 0
20:56:07.075185 Out 42:01:0a:00:80:03 ethertype IPv4 (0x0800), length 76: 10.0.128.3.31661 > 10.0.128.2.31459: Flags [S], seq 2082181755, win 26400, options [mss 1320,sackOK,TS val 2065482948 ecr 0,nop,wscale 7], length 0
20:56:07.077916  In 42:01:0a:00:80:01 ethertype IPv4 (0x0800), length 76: 10.0.128.2.31459 > 10.0.128.3.31661: Flags [S.], seq 1636338806, ack 2082181756, win 26160, options [mss 1320,sackOK,TS val 1945200435 ecr 2065482948,nop,wscale 7], length 0
20:56:07.078851 Out 0a:58:0a:80:02:01 ethertype IPv4 (0x0800), length 76: 10.0.128.2.31459 > 10.128.2.15.31661: Flags [S.], seq 1636338806, ack 2082181756, win 26160, options [mss 1320,sackOK,TS val 1945200435 ecr 2065482948,nop,wscale 7], length 0
20:56:07.078906   P 0a:58:0a:80:02:0f ethertype IPv4 (0x0800), length 68: 10.128.2.15.31661 > 10.0.128.2.31459: Flags [.], ack 1, win 207, options [nop,nop,TS val 2065482953 ecr 1945200435], length 0
20:56:07.078975   P 0a:58:0a:80:02:0f ethertype IPv4 (0x0800), length 148: 10.128.2.15.31661 > 10.0.128.2.31459: Flags [P.], seq 1:81, ack 1, win 207, options [nop,nop,TS val 2065482953 ecr 1945200435], length 80
20:56:07.079503 Out 42:01:0a:00:80:03 ethertype IPv4 (0x0800), length 68: 10.0.128.3.31661 > 10.0.128.2.31459: Flags [.], ack 1, win 207, options [nop,nop,TS val 2065482953 ecr 1945200435], length 0
20:56:07.079563 Out 42:01:0a:00:80:03 ethertype IPv4 (0x0800), length 148: 10.0.128.3.31661 > 10.0.128.2.31459: Flags [P.], seq 1:81, ack 1, win 207, options [nop,nop,TS val 2065482953 ecr 1945200435], length 80
20:56:07.080127  In 42:01:0a:00:80:01 ethertype IPv4 (0x0800), length 68: 10.0.128.2.31459 > 10.0.128.3.31661: Flags [.], ack 81, win 205, options [nop,nop,TS val 1945200438 ecr 2065482953], length 0
20:56:07.080173 Out 0a:58:0a:80:02:01 ethertype IPv4 (0x0800), length 68: 10.0.128.2.31459 > 10.128.2.15.31661: Flags [.], ack 81, win 205, options [nop,nop,TS val 1945200438 ecr 2065482953], length 0

==> The packet coming out of test pod 10.128.2.15 on the egress node (jechen-0207b-4v2jp-worker-c-tl744.c.openshift-qe.internal) got SNATed to its nodeIP 10.0.128.3, not to the egressIP 10.0.128.101, as expected after the fix.
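(For contrast, traffic from the same pod to a destination outside the cluster should still leave with the egressIP as source; a sketch, with 203.0.113.10 as a placeholder external host:)

$ oc rsh test-rc-b2vzk
~ $ curl 203.0.113.10:80
# a tcpdump on the egress node's external interface should then show src 10.0.128.101 (the egressIP), not the nodeIP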

Comment 37 errata-xmlrpc 2023-05-17 22:46:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.13.0 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:1326

