Bug 2083116

Summary: OVN-Kubernetes: EgressIP breaks access from a pod to outside depending on if its on an egressNode or not
Product: OpenShift Container Platform Reporter: Surya Seetharaman <surya>
Component: Networking Assignee: Surya Seetharaman <surya>
Networking sub component: ovn-kubernetes QA Contact: huirwang
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: cpassare, huirwang, rravaiol
Version: 4.10   
Target Milestone: ---   
Target Release: 4.10.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-05-23 13:25:12 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2070929    
Bug Blocks:    

Description Surya Seetharaman 2022-05-09 10:28:14 UTC
This bug was initially created as a copy of Bug #2070929

OVN-Kubernetes: EgressIP breaks access from a pod with EgressIP to other host networked pods on different nodes

Scenario:

* pod <podA> on node <nodeA> in namespace <nsA> accesses a host networked pod <podB> on node <nodeB>
* EgressIP <eipA> is assigned to namespace <nsA>
* Traffic from <podA> to <podB> breaks

Pods:
~~~
[akaris@linux 2070878]$ oc get pods -A -o wide | grep egress | grep -v debug
e2e-test-egressip-8wznm                            egressip-target-daemonset-sq27b                              1/1     Running     0               19m    10.0.135.215   ip-10-0-135-215.ec2.internal   <none>           <none>
e2e-test-egressip-pfvtq                            e2e-test-egressip-pfvtq-deployment-799497dc77-j9w2p          1/1     Running     0               19m    10.128.2.20    ip-10-0-144-143.ec2.internal   <none>           <none>
[akaris@linux 2070878]$ 
~~~

Before applying the EgressIP:
================================================

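curl from <podA> (10.128.2.20) to the host networked pod <podB> (10.0.135.215); the response is the client IP as seen by <podB>: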
~~~
~ $ curl 10.0.135.215:32667/clientip
10.0.144.143:49524~ $ 
~~~

Tcpdump on the source node, node/ip-10-0-144-143.ec2.internal: the traffic is SNAT'ed to the source node's IP (10.0.144.143) and then sent to the destination node:
~~~
sh-4.4# tcpdump -nne -i ens5 host 10.0.135.215 and port 32667
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens5, link-type EN10MB (Ethernet), capture size 262144 bytes
10:44:21.980235 0e:61:09:b2:50:0b > 0e:4c:94:8d:88:a3, ethertype IPv4 (0x0800), length 74: 10.0.144.143.49524 > 10.0.135.215.32667: Flags [S], seq 337255231, win 26583, options [mss 8861,sackOK,TS val 2215492582 ecr 0,nop,wscale 7], length 0
10:44:21.981733 0e:4c:94:8d:88:a3 > 0e:61:09:b2:50:0b, ethertype IPv4 (0x0800), length 74: 10.0.135.215.32667 > 10.0.144.143.49524: Flags [S.], seq 1450458968, ack 337255232, win 26847, options [mss 8961,sackOK,TS val 2693376292 ecr 2215492582,nop,wscale 7], length 0
10:44:21.982334 0e:61:09:b2:50:0b > 0e:4c:94:8d:88:a3, ethertype IPv4 (0x0800), length 156: 10.0.144.143.49524 > 10.0.135.215.32667: Flags [P.], seq 1:91, ack 1, win 208, options [nop,nop,TS val 2215492584 ecr 2693376292], length 90
10:44:21.982339 0e:61:09:b2:50:0b > 0e:4c:94:8d:88:a3, ethertype IPv4 (0x0800), length 66: 10.0.144.143.49524 > 10.0.135.215.32667: Flags [.], ack 1, win 208, options [nop,nop,TS val 2215492584 ecr 2693376292], length 0
(...)
~~~
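The SNAT observation above can also be checked mechanically. What follows is a minimal Python sketch and not part of the original report: the names `Flow`, `parse_tcpdump_line` and `classify_source` are hypothetical, and the regular expression is an assumption written against the `tcpdump -nne` IPv4 output shown above. It extracts the source and destination of one capture line and compares the source with the node IP (10.0.144.143) and the pod IP (10.128.2.20).

```python
import re
from typing import NamedTuple, Optional


class Flow(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


# Assumed pattern: "<src_ip>.<src_port> > <dst_ip>.<dst_port>:" as printed by
# tcpdump -nn for IPv4 traffic.
_FLOW_RE = re.compile(
    r"(\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)


def parse_tcpdump_line(line: str) -> Optional[Flow]:
    """Extract the IPv4 source and destination from one capture line."""
    m = _FLOW_RE.search(line)
    if m is None:
        return None
    return Flow(m.group(1), int(m.group(2)), m.group(3), int(m.group(4)))


def classify_source(flow: Flow, node_ip: str, pod_ip: str) -> str:
    """Say which known address, if any, the packet's source IP matches."""
    if flow.src_ip == node_ip:
        return "node-ip"
    if flow.src_ip == pod_ip:
        return "pod-ip"
    return "other"


# First line (SYN) of the capture above, copied verbatim.
SYN_LINE = (
    "10:44:21.980235 0e:61:09:b2:50:0b > 0e:4c:94:8d:88:a3, ethertype IPv4 "
    "(0x0800), length 74: 10.0.144.143.49524 > 10.0.135.215.32667: Flags [S], "
    "seq 337255231, win 26583, options [mss 8861,sackOK,TS val 2215492582 "
    "ecr 0,nop,wscale 7], length 0"
)

flow = parse_tcpdump_line(SYN_LINE)
print(flow)
print(classify_source(flow, node_ip="10.0.144.143", pod_ip="10.128.2.20"))
```

For the SYN line above, the source matches the node IP, which is the SNAT behaviour seen before the EgressIP is applied.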

We see the same traffic on the destination host ip-10-0-135-215.ec2.internal:
~~~
sh-4.4# tcpdump -nne -i ens5 host 10.0.135.215 and port 32667
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens5, link-type EN10MB (Ethernet), capture size 262144 bytes
10:44:21.980683 0a:a8:54:d9:eb:39 > 0a:71:d0:10:a3:bd, ethertype IPv4 (0x0800), length 74: 10.0.144.143.49524 > 10.0.135.215.32667: Flags [S], seq 337255231, win 26583, options [mss 8861,sackOK,TS val 2215492582 ecr 0,nop,wscale 7], length 0
10:44:21.981285 0a:71:d0:10:a3:bd > 0a:a8:54:d9:eb:39, ethertype IPv4 (0x0800), length 74: 10.0.135.215.32667 > 10.0.144.143.49524: Flags [S.], seq 1450458968, ack 337255232, win 26847, options [mss 8961,sackOK,TS val 2693376292 ecr 2215492582,nop,wscale 7], length 0
10:44:21.982748 0a:a8:54:d9:eb:39 > 0a:71:d0:10:a3:bd, ethertype IPv4 (0x0800), length 156: 10.0.144.143.49524 > 10.0.135.215.32667: Flags [P.], seq 1:91, ack 1, win 208, options [nop,nop,TS val 2215492584 ecr 2693376292], length 90
10:44:21.982796 0a:a8:54:d9:eb:39 > 0a:71:d0:10:a3:bd, ethertype IPv4 (0x0800), length 66: 10.0.144.143.49524 > 10.0.135.215.32667: Flags [.], ack 1, win 208, options [nop,nop,TS val 2215492584 ecr 2693376292], length 0
(...)
~~~
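To confirm that the two captures above show the same packets, the fields that differ per node (timestamps and MAC addresses) can be ignored and the rest compared. This is a rough Python sketch under that assumption, not something from the original report: `packet_signature` and `same_packets` are hypothetical helper names, and the regular expression is an assumption about the `tcpdump -nne` TCP line format shown above.

```python
import re
from collections import Counter

# Assumed pattern for the TCP part of a tcpdump -nne IPv4 line:
# "<src>.<sport> > <dst>.<dport>: Flags [<flags>]" with an optional ", seq <n>"
# or ", seq <n>:<m>" directly after the flags.
_PKT_RE = re.compile(
    r"(\d+\.\d+\.\d+\.\d+\.\d+) > (\d+\.\d+\.\d+\.\d+\.\d+): "
    r"Flags \[([^\]]+)\](?:, seq (\d+(?::\d+)?))?"
)


def packet_signature(line):
    """Return (src, dst, flags, seq) for a capture line, or None if no match.

    seq is None for segments that print no sequence number (pure ACKs).
    """
    m = _PKT_RE.search(line)
    return m.groups() if m else None


def same_packets(capture_a, capture_b):
    """True if both captures contain the same multiset of packet signatures."""
    sigs_a = Counter(s for s in map(packet_signature, capture_a) if s)
    sigs_b = Counter(s for s in map(packet_signature, capture_b) if s)
    return sigs_a == sigs_b


# SYN as captured on the source node, copied verbatim from the report.
SRC_SYN = (
    "10:44:21.980235 0e:61:09:b2:50:0b > 0e:4c:94:8d:88:a3, ethertype IPv4 "
    "(0x0800), length 74: 10.0.144.143.49524 > 10.0.135.215.32667: Flags [S], "
    "seq 337255231, win 26583, options [mss 8861,sackOK,TS val 2215492582 "
    "ecr 0,nop,wscale 7], length 0"
)
# The same SYN as captured on the destination node, copied verbatim.
DST_SYN = (
    "10:44:21.980683 0a:a8:54:d9:eb:39 > 0a:71:d0:10:a3:bd, ethertype IPv4 "
    "(0x0800), length 74: 10.0.144.143.49524 > 10.0.135.215.32667: Flags [S], "
    "seq 337255231, win 26583, options [mss 8861,sackOK,TS val 2215492582 "
    "ecr 0,nop,wscale 7], length 0"
)

print(packet_signature(SRC_SYN))
print(same_packets([SRC_SYN], [DST_SYN]))
```

A multiset comparison is used rather than a list comparison so that packets captured in a different order on the two nodes still compare equal.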

Comment 6 errata-xmlrpc 2022-05-23 13:25:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.15 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2258

Comment 7 Christian Passarelli 2022-06-13 14:13:33 UTC
*** Bug 2095713 has been marked as a duplicate of this bug. ***