Bug 1878071

Summary: [OVN] After configuring egressIP, outgoing traffic breaks
Product: OpenShift Container Platform Reporter: huirwang
Component: Networking Assignee: Alexander Constantinescu <aconstan>
Networking sub component: ovn-kubernetes QA Contact: huirwang
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: high    
Priority: high Keywords: TestBlocker
Version: 4.6   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-09-11 09:57:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description huirwang 2020-09-11 09:10:20 UTC
Description of problem:
After configuring an egressIP, outgoing traffic from the matching pods breaks.

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-09-10-195619

How reproducible:
Always

Steps to Reproduce:
1. Label one node as the egressIP node
oc label node compute-1 "k8s.ovn.org/egress-assignable"=""

2. Create namespace test and pods in it, then label the namespace.
oc label ns test team=red

oc get pods -o wide -n test
NAME            READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
hello-pod       1/1     Running   0          15m   10.128.2.100   compute-0   <none>           <none>
test-rc-mks8g   1/1     Running   0          18s   10.128.2.106   compute-0   <none>           <none>
test-rc-t26dw   1/1     Running   0          17s   10.131.0.9     compute-1   <none>           <none>

3. Apply egressIP object
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip
spec:
  egressIPs:
  - 139.178.76.20 
  namespaceSelector:
    matchLabels:
      team: red

oc get egressip
NAME       EGRESSIPS       ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip   139.178.76.20   compute-1       139.178.76.20

4. From the test pods, access external websites.



Actual results:
oc rsh -n test hello-pod     
/ # curl ifconfig.me --connect-timeout 5
curl: (7) Failed to connect to ifconfig.me port 80: Operation timed out
/ # 

Without the egressIP applied, outgoing traffic works:
oc rsh -n test hello-pod
~ $ curl ifconfig.me
139.178.76.9~ $ exit

Expected results:
The outgoing traffic should work with egressIP as source IP
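
The expected result can be checked mechanically. The following is a minimal sketch and not part of the original report: check_egress_source is a hypothetical helper that compares the source IP reported by an external echo service (ifconfig.me, as used above) with the configured egress IP. It assumes the observed IP was captured by running curl inside the test pod.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): compare the source IP
# seen by an external service with the IP configured in the EgressIP object.
# Prints a verdict and returns 0 on a match, 1 otherwise.
check_egress_source() {
    expected="$1"
    # Strip any whitespace or carriage returns around the curl output.
    observed=$(printf '%s' "$2" | tr -d '[:space:]')
    if [ -z "$observed" ]; then
        echo "FAIL: no response (traffic blocked or timed out)"
        return 1
    elif [ "$observed" = "$expected" ]; then
        echo "OK: traffic leaves with egress IP $expected"
        return 0
    else
        echo "FAIL: source IP is $observed, expected $expected"
        return 1
    fi
}

# Example invocation against the cluster in this report (requires oc access):
#   observed=$(oc rsh -n test hello-pod curl -s --connect-timeout 5 ifconfig.me)
#   check_egress_source 139.178.76.20 "$observed"
```

With the values from this report, the timed-out curl yields an empty response and the check fails, while 139.178.76.9 (the address seen without the egressIP) would be flagged as the wrong source IP.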



Additional info:

Comment 2 Alexander Constantinescu 2020-09-11 09:57:24 UTC
Hi Huiran

As I mentioned in comment https://bugzilla.redhat.com/show_bug.cgi?id=1872098#c25, don't use nightly versions for testing right now. The release process for nightly versions is broken, so they cannot be trusted.

I figured out what is wrong on your cluster: the OVN version in that nightly (4.6.0-0.nightly-2020-09-10-195619) is not correct. It contains:

ovn2.13-host-20.06.1-6.el7fdp.x86_64
ovn2.13-vtep-20.06.1-6.el7fdp.x86_64
ovn2.13-20.06.1-6.el7fdp.x86_64
ovn2.13-central-20.06.1-6.el7fdp.x86_64

It should, however, contain the OVN version referenced here:

https://github.com/openshift/cluster-network-operator/pull/767#issuecomment-686374055

Without that OVN fix, any pod matching an egress IP loses external connectivity, as you've now discovered.

You can use any of the latest green CI builds defined here: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/#4.6.0-0.ci

/Alex
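
To compare the OVN build in a given image against the one referenced in the linked PR comment, the package names listed above can be reduced to their version and release. This is a minimal sketch under stated assumptions: the comment does not say how the package list was obtained, so the rpm -qa invocation shown in the trailing comment is a guess, and ovn_rpm_version and ovn_versions are hypothetical helper names. The sketch only extracts versions and does not encode which version carries the fix.

```shell
#!/bin/sh
# Hypothetical helper: extract "version-release" from an RPM name of the form
# name-version-release.arch, e.g. ovn2.13-host-20.06.1-6.el7fdp.x86_64.
ovn_rpm_version() {
    nvra="$1"
    nvr="${nvra%.*}"       # drop the architecture suffix (".x86_64")
    release="${nvr##*-}"   # text after the last dash ("6.el7fdp")
    nv="${nvr%-*}"         # drop "-release"
    version="${nv##*-}"    # text after the remaining last dash ("20.06.1")
    echo "$version-$release"
}

# Hypothetical helper: read package names on stdin and print the distinct
# version-release strings of packages whose name starts with "ovn".
ovn_versions() {
    while IFS= read -r pkg; do
        case "$pkg" in
            ovn*) ovn_rpm_version "$pkg" ;;
        esac
    done | sort -u
}

# Assumed usage (pod name is a placeholder; the image may not ship rpm):
#   oc exec -n openshift-ovn-kubernetes <ovnkube-pod> -- rpm -qa | ovn_versions
```

Fed the four package names from this comment, ovn_versions prints a single line, 20.06.1-6.el7fdp, which can then be compared by eye with the version named in the PR comment.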