Bug 2075475 - OVN-Kubernetes: egress router pod (redirect mode), access from pod on different worker-node (redirect) doesn't work
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Target Release: 4.11.0
Assignee: Andreas Karis
QA Contact: Weibin Liang
Blocks: 2083593
Reported: 2022-04-14 10:33 UTC by Rainer Beyel
Modified: 2022-08-10 11:07 UTC

Doc Type: Bug Fix
Doc Text:
  Cause: The egress-router-cni relied on the gateway field of the CNI definition to delete the default route and inject its own. However, the CNI standard defines that the annotation k8s.v1.cni.cncf.io/networks: { "default-route" : ["{{.Gateway}}"] } should be used when a default route is to be injected via an additional network.
  Consequence: The egress-router-cni pods lacked some cluster-internal routes that are usually injected by the CNI plugin / SDN provider. As a result, the pods could not reach some cluster-internal destinations.
  Fix: Use k8s.v1.cni.cncf.io/networks: { "default-route" : ["{{.Gateway}}"] } to inject correct routing information into the egress-router-cni pods.
  Result: egress-router-cni pods can reach both external and cluster-internal destinations.
Last Closed: 2022-08-10 11:07:06 UTC
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-network-operator pull 1390 | 0 | None | open | Bug 2075475: Add default-route field to egress-router k8s.v1.cni.cncf.io/networks | 2022-04-22 10:39:40 UTC
Red Hat Product Errata | RHSA-2022:5069 | 0 | None | None | None | 2022-08-10 11:07:24 UTC
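
The fix recorded for this bug relies on the "default-route" key of the k8s.v1.cni.cncf.io/networks pod annotation. As a minimal sketch (not the cluster-network-operator's actual template), the annotation value could be assembled as below. The attachment name "egress-router-cni-nad" and the gateway 192.0.2.1 are illustrative assumptions; "rregress" is the reporter's test namespace, and in the operator the gateway comes from the {{.Gateway}} template field.

```python
import json

ANNOTATION_KEY = "k8s.v1.cni.cncf.io/networks"


def networks_annotation(name: str, namespace: str, gateway: str) -> dict:
    """Return pod annotations that select one additional network and ask
    for its gateway to become the pod's default route."""
    selection = [{
        "name": name,                 # network attachment name (assumed)
        "namespace": namespace,
        "default-route": [gateway],   # gateway on the additional network
    }]
    return {ANNOTATION_KEY: json.dumps(selection)}


if __name__ == "__main__":
    # All three values are placeholders for illustration only.
    ann = networks_annotation("egress-router-cni-nad", "rregress", "192.0.2.1")
    print(ann[ANNOTATION_KEY])
```

Because only the default route is requested through the annotation, the cluster-internal routes installed by the primary network provider are left in place.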

Description Rainer Beyel 2022-04-14 10:33:01 UTC
Description of problem:
  OVN-Kubernetes: egress router pod (redirect mode), access from pod on different worker-node (redirect) doesn't work

Version-Release number of selected component (if applicable):
  OCP 4.9.24, 4.9.26, 4.10.9

How reproducible:

- Installed
  - OCP 4.9.24 UPI bare-metal(libvirt) with OVN
    - 3 master, 3 worker
  - External application
    - while true; do cat netcat_curl.txt | nc -l 1234; done
- Created project/namespace "rregress"

- Added following label to all worker nodes
  - "k8s.ovn.org/egress-assignable" [1]
- Deployed EgressRouter
  - "... egress IP address ... must be in the same subnet as the primary IP address of the node ..." [1]
  - "... The additional IP address must not be assigned to any other node in the cluster ..." [1]
  - ip: "",
    gateway: ""
  - destinationIP: "",
    port: 1234
  - Pod egress-router-cni-deployment... running on worker3
- Created Service "egress-1" [2]

- # oc rsh egress-router-cni-deployment...      
  sh-4.4$ ip a s net1 | grep "inet "
  inet brd scope global net1

  sh-4.4$ curl 

- Created deployment for simple "test pods"
  - 3 replicas, so 1 pod on each worker node

  - pod on worker1
    sh-4.4$ curl --max-time 5 egress-1:1234
    curl: (28) Connection timed out after 5000 milliseconds

  - pod on worker2
    sh-4.4$ curl --max-time 5 egress-1:1234
    curl: (28) Connection timed out after 5000 milliseconds

  - pod on worker3
    sh-4.4$ curl --max-time 5 egress-1:1234
- If both pods (test-pod and egress-pod) are on the same worker node (in this scenario worker3) it's working
- I tried the pod-IP (egress-pod) instead of "egress-1" (service), same results

Actual results:
  - test-pod and egress-pod on same (worker) node:
    test-pod --curl--> service --> egress-pod --> external-app--> successful

  - test-pod and egress-pod on different (worker) nodes:
    No connection to external application

Expected results:
  Regardless of which (worker) nodes test-pod and egress-pod are running on, the connection (to the external-app) succeeds

Additional info:
  1) https://docs.openshift.com/container-platform/4.9/networking/ovn_kubernetes_network_provider/configuring-egress-ips-ovn.html
  2) https://docs.openshift.com/container-platform/4.9/networking/ovn_kubernetes_network_provider/using-an-egress-router-ovn.html
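
The symptom described here (same node works, different node times out) is consistent with an egress router pod that has lost its cluster-internal routes. A hedged sketch using longest-prefix matching illustrates the idea. Every address and route table below is an assumed example, not taken from the affected cluster: a 10.128.0.0/14 cluster network split into per-node /23 subnets, and 192.0.2.0/24 as the external network on net1.

```python
import ipaddress


def lookup(routes, dst):
    """Return the interface of the most specific route matching dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(prefix), dev)
               for prefix, dev in routes
               if addr in ipaddress.ip_network(prefix)]
    # Longest prefix wins, as in a kernel routing table.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Assumed routes of the egress router pod BEFORE the fix: the default route
# points at the external gateway and the cluster-wide route is missing.
before = [
    ("0.0.0.0/0", "net1"),       # default via external gateway
    ("10.128.2.0/23", "eth0"),   # connected: pod subnet of the local node
    ("192.0.2.0/24", "net1"),    # connected: external network
]
# Assumed routes AFTER the fix: the cluster network stays reachable via eth0.
after = before + [("10.128.0.0/14", "eth0")]

same_node_pod = "10.128.2.15"    # test pod on the egress router's node
other_node_pod = "10.129.0.20"   # test pod on a different node

for name, table in (("before", before), ("after", after)):
    print(name, lookup(table, same_node_pod), lookup(table, other_node_pod))
```

Under these assumptions, replies to the test pod on the other node leave through net1 before the fix and through eth0 afterwards, while replies to the same-node pod use eth0 in both cases.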

Comment 3 Mohamed Mahmoud 2022-04-18 16:11:21 UTC
this link has more details about this feature and its configuration and debugging 

I have the following questions:
- What is the platform? Is it bare metal?
- Was it intentional to have the external IP in the same subnet as the node IPs?
- Can you describe the svc? I want to make sure it was labelled correctly.
- Can you connect to the node where the egress router CNI is running and collect the logs (cat /tmp/egress-router-log) and the output of ip add?

The way this works is that the egress router acts as a bridge between the pods and the external system. The egress router has two interfaces: eth0 for cluster-internal networking, and macvlan0, which has an IP and gateway from the external physical network.

Comment 4 Rainer Beyel 2022-04-19 10:10:57 UTC
(In reply to Mohamed Mahmoud from comment #3)
> I have the following questions:
> - What is the platform? Is it bare metal?

  - Yes, I tested it with UPI (libvirt) "bare-metal" (comment 0)
  - The customer also observes the issue on bare-metal

> - Was it intentional to have the external IP in the same subnet as the
> node IPs?

  - Yes, I chose the same subnet (in my test environment) to keep it simple

> - Can you describe the svc? I want to make sure it was labelled correctly.

  - I'll attach "service_egress-1.txt"

> - Can you connect to the node where the egress router CNI is running and
> collect the logs (cat /tmp/egress-router-log) and the output of ip add?

  - I'll attach "worker1_egress-router-log.txt", "worker1_ip_address.txt", "egress-router-cni-deployment_ip_address.txt"

P.S. the "egress router pod" is currently running on worker1 (initially it was worker3)

Comment 5 Rainer Beyel 2022-04-19 10:12:08 UTC
Created attachment 1873486

Comment 6 Rainer Beyel 2022-04-19 10:13:14 UTC
Created attachment 1873487

Comment 7 Rainer Beyel 2022-04-19 10:21:11 UTC
Created attachment 1873490

Comment 8 Rainer Beyel 2022-04-19 10:21:48 UTC
Created attachment 1873492

Comment 9 Mohamed Mahmoud 2022-04-19 12:05:11 UTC
Just to be sure: the customer dropped into the shells of the different test pods and ran curl?
Have we tried creating the test pods first, scaling them to whatever number, then deploying the egress router, and then creating the ClusterIP svc?

I would like to collect pcap files for the working and non-working curl to make sure the iptables rules took effect and that DNAT and SNAT took place.
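
For reference when reading such captures, here is a minimal sketch of what redirect mode is expected to do to a forwarded connection: DNAT rewrites the destination to the configured destinationIP, and SNAT rewrites the source to the egress router's external IP. All addresses are made-up examples (the real values are not shown in this report), and the function only models the rewrite, not the actual iptables rules.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dport: int


def redirect(pkt: Packet, egress_ip: str, destination_ip: str) -> Packet:
    """Model the egress router's rewrite of a forwarded packet."""
    dnat = replace(pkt, dst=destination_ip)   # DNAT: router pod IP -> external app
    return replace(dnat, src=egress_ip)       # SNAT: client pod IP -> egress IP


if __name__ == "__main__":
    # Assumed example: test pod 10.129.0.20 connects to the egress router
    # pod at 10.128.2.10 on port 1234 (the port used in the reproducer).
    inbound = Packet(src="10.129.0.20", dst="10.128.2.10", dport=1234)
    print(redirect(inbound, egress_ip="192.0.2.50", destination_ip="192.0.2.100"))
```

In a working capture on the external interface, only the rewritten addresses should appear; if the original pod addresses show up there, or nothing arrives at all, the rewrite or the routing towards the egress router pod is the place to look.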

Comment 10 Weibin Liang 2022-04-19 14:22:05 UTC
QE reproduced this problem on local testing cluster:

[weliang@weliang tmp]$ oc get pod -o wide
NAME                                            READY   STATUS    RESTARTS   AGE    IP                NODE         NOMINATED NODE   READINESS GATES
egress-router-cni-deployment-5d659496ff-wn4rf   1/1     Running   0          14m       worker-0-0   <none>           <none>
test-pod-86879d8c8c-5jh5s                       1/1     Running   0          7m2s       worker-0-1   <none>           <none>
test-pod-86879d8c8c-c4cbv                       1/1     Running   0          7m2s       worker-0-0   <none>           <none>
test-pod-86879d8c8c-mc9xh                       1/1     Running   0          7m2s       worker-0-0   <none>           <none>
test-pod-86879d8c8c-n8dk8                       1/1     Running   0          7m2s       worker-0-1   <none>           <none>
test-pod-86879d8c8c-q97pj                       1/1     Running   0          7m2s       worker-0-0   <none>           <none>
test-pod-86879d8c8c-tzsqw                       1/1     Running   0          7m2s       worker-0-1   <none>           <none>
worker-0-0-debug                                1/1     Running   0          13m   worker-0-0   <none>           <none>
[weliang@weliang tmp]$ 

[weliang@weliang tmp]$ oc exec $pod --  curl
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:19 --:--:--     0^C
[weliang@weliang tmp]$ oc exec test-pod-86879d8c8c-mc9xh --  curl
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
100   219  100   219    0     0    793      0 --:--:-- --:--:-- --:--:--   796
[weliang@weliang tmp]$ oc exec test-pod-86879d8c8c-q97pj --  curl
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   219  100   219    0     0    820      0 --:--:-- --:--:-- --:--:--   817
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
[weliang@weliang tmp]$ oc exec test-pod-86879d8c8c-5jh5s --  curl
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:02:08 --:--:--     0
curl: (28) Failed to connect to port 80: Operation timed out
command terminated with exit code 28
[weliang@weliang tmp]$

Comment 11 Mohamed Mahmoud 2022-04-19 16:03:12 UTC
Do we have both egressIP and egress-router configs on the same cluster? Can we get a must-gather? I also want to know whether ovnk is running in shared-gateway or local-gateway mode.

In theory, a svc that is tagged with the egress router will be backed by the egress-router pod, so traffic from any pod anywhere should reach the egress-router pod and get redirected to the destination.

Comment 12 Weibin Liang 2022-04-19 19:04:13 UTC
Testing also failed on 4.8.33

[weliang@weliang tmp]$ oc exec test-pod-6686bd4977-z5lmm -- curl
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   219  100   219    0     0    793      0 --:--:-- --:--:-- --:--:--   793
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
[weliang@weliang tmp]$ oc exec test-pod-6686bd4977-7kclb -- curl
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:02:11 --:--:--     0
curl: (28) Failed to connect to port 80: Operation timed out
command terminated with exit code 28
[weliang@weliang tmp]$

Comment 24 Weibin Liang 2022-05-05 18:08:29 UTC
Tested and verified in 4.11.0-0.nightly-2022-05-05-015322

[weliang@weliang Test]$ oc get pod -o wide
NAME                                            READY   STATUS    RESTARTS   AGE    IP             NODE                                      NOMINATED NODE   READINESS GATES
dell-per740-14rhtsengpek2redhatcom-debug        1/1     Running   0          4m9s   dell-per740-14.rhts.eng.pek2.redhat.com   <none>           <none>
egress-router-cni-deployment-7f89795b59-jvxtb   1/1     Running   0          59s    dell-per740-35.rhts.eng.pek2.redhat.com   <none>           <none>
test-pod-86879d8c8c-87pbz                       1/1     Running   0          20s    dell-per740-35.rhts.eng.pek2.redhat.com   <none>           <none>
test-pod-86879d8c8c-9rsjl                       1/1     Running   0          20s    dell-per740-14.rhts.eng.pek2.redhat.com   <none>           <none>
test-pod-86879d8c8c-gw847                       1/1     Running   0          20s    dell-per740-14.rhts.eng.pek2.redhat.com   <none>           <none>
test-pod-86879d8c8c-nmtxd                       1/1     Running   0          20s    dell-per740-14.rhts.eng.pek2.redhat.com   <none>           <none>
test-pod-86879d8c8c-q462m                       1/1     Running   0          20s    dell-per740-35.rhts.eng.pek2.redhat.com   <none>           <none>
test-pod-86879d8c8c-x6zzk                       1/1     Running   0          20s    dell-per740-35.rhts.eng.pek2.redhat.com   <none>           <none>
[weliang@weliang Test]$ oc exec test-pod-86879d8c8c-9rsjl -- curl
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   219  100   219    0     0    361      0 --:--:-- --:--:-- --:--:-- <HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
[weliang@weliang Test]$ oc exec test-pod-86879d8c8c-q462m -- curl
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   219  100   219    0     0    347      0 --:--:-- --:--:-- --:--:--   347
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
[weliang@weliang Test]$ 
[weliang@weliang Test]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-05-05-015322   True        False         33m     Cluster version is 4.11.0-0.nightly-2022-05-05-015322
[weliang@weliang Test]$

Comment 30 errata-xmlrpc 2022-08-10 11:07:06 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.


