Bug 1937194 - Routes not accessible from projects with egress IP configured on cloud providers
Summary: Routes not accessible from projects with egress IP configured on cloud providers
Keywords:
Status: CLOSED DUPLICATE of bug 2092166
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Patryk Diak
QA Contact: huirwang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-03-10 06:11 UTC by Frederic Giloux
Modified: 2022-07-20 08:57 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-07-20 08:57:22 UTC
Target Upstream Version:
Embargoed:



Description Frederic Giloux 2021-03-10 06:11:14 UTC
Description of problem:
When an egress IP has been configured for a project, it is no longer possible to access routes from pods inside that project. This was observed on Azure but is expected to affect AWS as well.

Version-Release number of selected component (if applicable):
RHOCP 4.6

How reproducible:
Always

Steps to Reproduce:
1. Configure an egress IP for a project following the documentation
2. Start a pod inside the project
3. oc rsh the pod
4. Try to access a route from there (a hedged command sketch follows these steps)
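For reference, a minimal command sketch of these steps (reusing the project name and egress IP that appear later in this report; the node name and route hostname are placeholders, and the egress IP assignment follows the documented netnamespace/hostsubnet patches for openshift-sdn):

# 1. Assign the egress IP to the project and host it on a node
oc patch netnamespace test-src --type=merge -p '{"egressIPs": ["10.0.116.8"]}'
oc patch hostsubnet <node-name> --type=merge -p '{"egressIPs": ["10.0.116.8"]}'
# 2./3. Start a pod in the project and open a shell in it
oc -n test-src run test-pod --image=registry.access.redhat.com/ubi8/ubi --command -- sleep infinity
oc -n test-src rsh test-pod
# 4. From inside the pod, try a route served by the default ingress controller
curl -v https://<route-hostname>/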

Actual results:
The route is not accessible
curl -v https://myapp.apps.corporoate.com/
*   Trying 10.0.116.7...
* TCP_NODELAY set
(and nothing more)

Expected results:
The route is accessible and the curl gets an answer from the application behind it.


Additional info:
Here is what I suspect happens:
The packets first go through the iptables PREROUTING chain. In this chain there is an entry that jumps to the KUBE-SERVICES chain, where the OpenShift/Kubernetes services are implemented. The destination IP is changed to the IP of one of the router pods. Later on, in the POSTROUTING chain, the source IP of the packets is changed to the egress IP. The packets should then go back to OVS, but they are dropped there because they do not carry the source IP of the node.

Details below.


Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */

In this chain, among all the services, there are these entries:

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination
KUBE-FW-HEVFQXAKPPGAL4BV  tcp  --  0.0.0.0/0            10.0.116.7           /* openshift-ingress/router-default:http loadbalancer IP */ tcp dpt:80
KUBE-FW-MBAZS3WDHL45BPIZ  tcp  --  0.0.0.0/0            10.0.116.7           /* openshift-ingress/router-default:https loadbalancer IP */ tcp dpt:443

These two entries are added for the load balancer IP of the router; this is not the case with bare-metal deployments.
As for any other service, KUBE-FW-HEVFQXAKPPGAL4BV and KUBE-FW-MBAZS3WDHL45BPIZ change the destination IP to a pod IP, the IP of a router pod in this case. For instance:

Chain KUBE-FW-HEVFQXAKPPGAL4BV (1 references)
target     prot opt source               destination
KUBE-XLB-HEVFQXAKPPGAL4BV  all  --  0.0.0.0/0            0.0.0.0/0            /* openshift-ingress/router-default:http loadbalancer IP */
KUBE-MARK-DROP  all  --  0.0.0.0/0            0.0.0.0/0            /* openshift-ingress/router-default:http loadbalancer IP */

Chain KUBE-XLB-HEVFQXAKPPGAL4BV (2 references)
target     prot opt source               destination
KUBE-SVC-HEVFQXAKPPGAL4BV  all  --  10.128.0.0/14        0.0.0.0/0            /* Redirect pods trying to reach external loadbalancer VIP to clusterIP */
KUBE-MARK-MASQ  all  --  0.0.0.0/0            0.0.0.0/0            /* masquerade LOCAL traffic for openshift-ingress/router-default:http LB IP */ ADDRTYPE match src-type LOCAL
KUBE-SVC-HEVFQXAKPPGAL4BV  all  --  0.0.0.0/0            0.0.0.0/0            /* route LOCAL traffic for openshift-ingress/router-default:http LB IP to service chain */ ADDRTYPE match src-type LOCAL
KUBE-MARK-DROP  all  --  0.0.0.0/0            0.0.0.0/0            /* openshift-ingress/router-default:http has no local endpoints */

Chain KUBE-SVC-HEVFQXAKPPGAL4BV (3 references)
target     prot opt source               destination
KUBE-SEP-GMF6HD62GMXKGYYH  all  --  0.0.0.0/0            0.0.0.0/0            /* openshift-ingress/router-default:http */ statistic mode random probability 0.50000000000
KUBE-SEP-VHMDOZFN5UFV3U5O  all  --  0.0.0.0/0            0.0.0.0/0            /* openshift-ingress/router-default:http */

Chain KUBE-SEP-GMF6HD62GMXKGYYH (1 references)
target     prot opt source               destination
KUBE-MARK-MASQ  all  --  10.130.4.33          0.0.0.0/0            /* openshift-ingress/router-default:http */
DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            /* openshift-ingress/router-default:http */ tcp to:10.130.4.33:80

At this point the destination IP has been changed from the load balancer IP to the IP of a router pod.
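One hedged way to confirm this DNAT on the node (assuming the conntrack tool is available there; the addresses are the ones from the listings above):

sudo conntrack -L -p tcp -d 10.0.116.7
# the resulting entries should show the reply tuple pointing at the router pod IP (10.130.4.33)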

Looking at POSTROUTING

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
OPENSHIFT-MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* rules for masquerading OpenShift traffic */
KUBE-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */

Masquerading with the egress IP is applied to pod traffic that carries the mark 0x6e52f2, and packets not marked with 0x1/0x1 continue to the OPENSHIFT-MASQUERADE-2 chain:
Chain OPENSHIFT-MASQUERADE (1 references)
target     prot opt source               destination
SNAT       all  --  10.128.0.0/14        0.0.0.0/0            mark match 0x6e52f2 to:10.0.116.8
RETURN     all  --  0.0.0.0/0            0.0.0.0/0            mark match 0x1/0x1
OPENSHIFT-MASQUERADE-2  all  --  10.128.0.0/14        0.0.0.0/0            /* masquerade pod-to-external traffic */

In this chain masquerading is done with the IP of the node, except for packets whose destination is a pod IP:
Chain OPENSHIFT-MASQUERADE-2 (1 references)
target     prot opt source               destination
RETURN     all  --  0.0.0.0/0            10.128.0.0/14        /* masquerade pod-to-external traffic */
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0

Coming back to 0x6e52f2: this converts to 7230194 in decimal, which should be the netid of the project where the egress IP has been configured.
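A quick sketch to double-check the mark-to-project mapping (assuming the netid field of the NetNamespace object, as used by openshift-sdn):

# convert the packet mark to decimal
printf '%d\n' 0x6e52f2    # prints 7230194
# find the project (netnamespace) that owns this VNID
oc get netnamespaces -o custom-columns=NAME:.metadata.name,NETID:.netid | grep 7230194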

At this point the packets have the egress IP as source IP and the router pod IP as destination IP. They should go back into OVS, but OVS will not accept external traffic (the egress IP is not the IP of the node) being routed directly to a pod, so the packets get dropped.
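If useful, the drop can be checked directly in OVS with an ofproto trace (a hedged sketch; br0 and the tun0 port number are the usual openshift-sdn values and may differ per node):

# simulate the SNATed packet re-entering br0 from tun0 (commonly OF port 2)
sudo ovs-appctl ofproto/trace br0 'in_port=2,tcp,nw_src=10.0.116.8,nw_dst=10.130.4.33,tp_dst=80'
# per the analysis above, the trace should end in a drop because nw_src is neither a pod IP nor the node IP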

If my analysis is correct, a solution would be to change this rule from:
SNAT       all  --  10.128.0.0/14        0.0.0.0/0            mark match 0x6e52f2 to:10.0.116.8
to
SNAT       all  --  10.128.0.0/14        !10.128.0.0/14            mark match 0x6e52f2 to:10.0.116.8

This has the advantage of being a very local change (only impacting egress).

I am aware of:
bug 1890494
bug 1920232

Comment 1 Frederic Giloux 2021-03-25 06:17:19 UTC
I could verify that changing
SNAT       all  --  10.128.0.0/14        0.0.0.0/0            mark match 0x6e52f2 to:10.0.116.8
to
SNAT       all  --  10.128.0.0/14        !10.128.0.0/14            mark match 0x6e52f2 to:10.0.116.8

allows the pod in the project where an egress IP has been assigned (test-src) to communicate with a route.

I ran the following command on the node where the pod of the test-src project was running:
sudo iptables -t nat -R OPENSHIFT-MASQUERADE 1 -s 10.128.0.0/14 ! -d 10.128.0.0/14 -m mark --mark 0x1e1d286 -j SNAT --to-source 10.0.0.43

After running the command I was able to curl the destination route from the pod in test-src.
After reverting the change I was not able to curl the destination route from the pod in test-src.
sudo iptables -t nat -R OPENSHIFT-MASQUERADE 1 -s 10.128.0.0/14 -m mark --mark 0x1e1d286 -j SNAT --to-source 10.0.0.43

I will look at creating a pull request. It seems that only two lines in the source code need to be changed:
https://github.com/openshift/sdn/blob/release-4.6/pkg/network/node/iptables.go#L230
_, err := n.ipt.EnsureRule(iptables.Prepend, iptables.TableNAT, iptables.Chain("OPENSHIFT-MASQUERADE"), "-s", cidr, "-m", "mark", "--mark", mark, "-j", "SNAT", "--to-source", egressIP)
=>
_, err := n.ipt.EnsureRule(iptables.Prepend, iptables.TableNAT, iptables.Chain("OPENSHIFT-MASQUERADE"), "-s", cidr, "!", "-d", cidr, "-m", "mark", "--mark", mark, "-j", "SNAT", "--to-source", egressIP)

https://github.com/openshift/sdn/blob/release-4.6/pkg/network/node/iptables.go#L257
err := n.ipt.DeleteRule(iptables.TableNAT, iptables.Chain("OPENSHIFT-MASQUERADE"), "-s", cidr, "-m", "mark", "--mark", mark, "-j", "SNAT", "--to-source", egressIP)
=>
err := n.ipt.DeleteRule(iptables.TableNAT, iptables.Chain("OPENSHIFT-MASQUERADE"), "-s", cidr, "!", "-d", cidr, "-m", "mark", "--mark", mark, "-j", "SNAT", "--to-source", egressIP)

Comment 2 Frederic Giloux 2021-03-25 09:07:13 UTC
Created pull request: https://github.com/openshift/sdn/pull/280

Comment 31 Patryk Diak 2022-07-20 08:57:22 UTC

*** This bug has been marked as a duplicate of bug 2092166 ***

