Description of problem:

When an egress IP has been configured for a project, it is no longer possible to access routes from pods inside that project. This was seen on Azure but is expected to impact AWS as well.

Version-Release number of selected component (if applicable):
RHOCP 4.6

How reproducible:
Always

Steps to Reproduce:
1. Configure an egress IP for a project following the documentation
2. Start a pod inside the project
3. oc rsh into the pod
4. Try to access a route from there

Actual results:
The route is not accessible:

curl -v https://myapp.apps.corporoate.com/
*   Trying 10.0.116.7...
* TCP_NODELAY set

(and nothing more)

Expected results:
The route is accessible and curl gets an answer from the application behind it.

Additional info:
Here is what I suspect happens: the packets first go through the iptables PREROUTING chain. In this chain there is an entry that jumps to the KUBE-SERVICES chain, where the OpenShift/Kubernetes services are implemented. There the destination IP is changed to the IP of one of the router pods. Later, in the POSTROUTING chain, the source IP of the packets is changed to the egress IP. The packets should then go back through OVS, but they are dropped there because they do not have the source IP of the node. Details below.

Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  0.0.0.0/0        0.0.0.0/0       /* kubernetes service portals */

In this chain, among all the services, there is:

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination
KUBE-FW-HEVFQXAKPPGAL4BV  tcp  --  0.0.0.0/0   10.0.116.7   /* openshift-ingress/router-default:http loadbalancer IP */ tcp dpt:80
KUBE-FW-MBAZS3WDHL45BPIZ  tcp  --  0.0.0.0/0   10.0.116.7   /* openshift-ingress/router-default:https loadbalancer IP */ tcp dpt:443

These two entries are added for the load balancer IP of the router. This is not the case with bare-metal deployments. As for any other service, KUBE-FW-HEVFQXAKPPGAL4BV and KUBE-FW-MBAZS3WDHL45BPIZ change the destination IP to a pod IP, in this case the IP of a router pod.
For instance:

Chain KUBE-FW-HEVFQXAKPPGAL4BV (1 references)
target     prot opt source               destination
KUBE-XLB-HEVFQXAKPPGAL4BV  all  --  0.0.0.0/0   0.0.0.0/0   /* openshift-ingress/router-default:http loadbalancer IP */
KUBE-MARK-DROP  all  --  0.0.0.0/0   0.0.0.0/0   /* openshift-ingress/router-default:http loadbalancer IP */

Chain KUBE-XLB-HEVFQXAKPPGAL4BV (2 references)
target     prot opt source               destination
KUBE-SVC-HEVFQXAKPPGAL4BV  all  --  10.128.0.0/14   0.0.0.0/0   /* Redirect pods trying to reach external loadbalancer VIP to clusterIP */
KUBE-MARK-MASQ  all  --  0.0.0.0/0   0.0.0.0/0   /* masquerade LOCAL traffic for openshift-ingress/router-default:http LB IP */ ADDRTYPE match src-type LOCAL
KUBE-SVC-HEVFQXAKPPGAL4BV  all  --  0.0.0.0/0   0.0.0.0/0   /* route LOCAL traffic for openshift-ingress/router-default:http LB IP to service chain */ ADDRTYPE match src-type LOCAL
KUBE-MARK-DROP  all  --  0.0.0.0/0   0.0.0.0/0   /* openshift-ingress/router-default:http has no local endpoints */

Chain KUBE-SVC-HEVFQXAKPPGAL4BV (3 references)
target     prot opt source               destination
KUBE-SEP-GMF6HD62GMXKGYYH  all  --  0.0.0.0/0   0.0.0.0/0   /* openshift-ingress/router-default:http */ statistic mode random probability 0.50000000000
KUBE-SEP-VHMDOZFN5UFV3U5O  all  --  0.0.0.0/0   0.0.0.0/0   /* openshift-ingress/router-default:http */

Chain KUBE-SEP-GMF6HD62GMXKGYYH (1 references)
target     prot opt source               destination
KUBE-MARK-MASQ  all  --  10.130.4.33   0.0.0.0/0   /* openshift-ingress/router-default:http */
DNAT       tcp  --  0.0.0.0/0   0.0.0.0/0   /* openshift-ingress/router-default:http */ tcp to:10.130.4.33:80

At this point the destination IP has been changed from the load balancer IP to the IP of a router pod.

Looking at POSTROUTING:

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
OPENSHIFT-MASQUERADE  all  --  0.0.0.0/0   0.0.0.0/0   /* rules for masquerading OpenShift traffic */
KUBE-POSTROUTING  all  --  0.0.0.0/0   0.0.0.0/0   /* kubernetes postrouting rules */

In OPENSHIFT-MASQUERADE, pod traffic carrying the mark 0x6e52f2 is SNATed to the egress IP, and packets not marked with 0x1/0x1 continue to the chain OPENSHIFT-MASQUERADE-2:

Chain OPENSHIFT-MASQUERADE (1 references)
target     prot opt source               destination
SNAT       all  --  10.128.0.0/14   0.0.0.0/0   mark match 0x6e52f2 to:10.0.116.8
RETURN     all  --  0.0.0.0/0   0.0.0.0/0   mark match 0x1/0x1
OPENSHIFT-MASQUERADE-2  all  --  10.128.0.0/14   0.0.0.0/0   /* masquerade pod-to-external traffic */

In that chain masquerading is done with the IP of the node, except for packets whose destination is a pod IP:

Chain OPENSHIFT-MASQUERADE-2 (1 references)
target     prot opt source               destination
RETURN     all  --  0.0.0.0/0   10.128.0.0/14   /* masquerade pod-to-external traffic */
MASQUERADE  all  --  0.0.0.0/0   0.0.0.0/0

Coming back to 0x6e52f2: this converts to 7230194 in decimal, which should be the netid of the project where the egress IP has been configured (see the sketch at the end of this comment).

At this point we have packets with the egress IP as source IP and the router pod IP as destination IP. These should go back through OVS, but OVS will not accept external traffic (the egress IP is not the IP of the node) being routed directly to a pod, so the packets get dropped.

If my analysis is correct, a solution would be to change this rule from:

SNAT       all  --  10.128.0.0/14   0.0.0.0/0   mark match 0x6e52f2 to:10.0.116.8

to:

SNAT       all  --  10.128.0.0/14  !10.128.0.0/14   mark match 0x6e52f2 to:10.0.116.8

This has the advantage of being a very local change (only impacting egress).

I am aware of:
bug 1890494
bug 1920232
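As a sanity check on the mark value, here is a tiny standalone Go sketch (illustration only, not part of openshift/sdn) that converts the mark seen in the SNAT rule to its decimal VNID, which should match the NETID shown by 'oc get netnamespaces' for the project that owns the egress IP:

package main

import "fmt"

func main() {
        // Mark observed in the OPENSHIFT-MASQUERADE SNAT rule above.
        // The value is cluster-specific; substitute your own mark here.
        const egressMark = 0x6e52f2

        // Prints: mark 0x6e52f2 = VNID 7230194
        // This decimal value should match the NETID of the project's
        // NetNamespace (oc get netnamespaces).
        fmt.Printf("mark 0x%x = VNID %d\n", egressMark, egressMark)
}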
I could verify that changing

SNAT       all  --  10.128.0.0/14   0.0.0.0/0   mark match 0x6e52f2 to:10.0.116.8

to

SNAT       all  --  10.128.0.0/14  !10.128.0.0/14   mark match 0x6e52f2 to:10.0.116.8

allows the pod in the project where an egress IP has been assigned (test-src) to communicate with a route.

I ran the following command on the node where the pod of the test-src project was running:

sudo iptables -t nat -R OPENSHIFT-MASQUERADE 1 -s 10.128.0.0/14 ! -d 10.128.0.0/14 -m mark --mark 0x1e1d286 -j SNAT --to-source 10.0.0.43

After running the command I was able to curl the destination route from the pod in test-src. After reverting the change with

sudo iptables -t nat -R OPENSHIFT-MASQUERADE 1 -s 10.128.0.0/14 -m mark --mark 0x1e1d286 -j SNAT --to-source 10.0.0.43

I was no longer able to curl the destination route from the pod in test-src.

I will look at creating a pull request. It seems that only two lines in the source code need to be changed:

https://github.com/openshift/sdn/blob/release-4.6/pkg/network/node/iptables.go#L230

_, err := n.ipt.EnsureRule(iptables.Prepend, iptables.TableNAT, iptables.Chain("OPENSHIFT-MASQUERADE"), "-s", cidr, "-m", "mark", "--mark", mark, "-j", "SNAT", "--to-source", egressIP)

=>

_, err := n.ipt.EnsureRule(iptables.Prepend, iptables.TableNAT, iptables.Chain("OPENSHIFT-MASQUERADE"), "-s", cidr, "!", "-d", cidr, "-m", "mark", "--mark", mark, "-j", "SNAT", "--to-source", egressIP)

https://github.com/openshift/sdn/blob/release-4.6/pkg/network/node/iptables.go#L257

err := n.ipt.DeleteRule(iptables.TableNAT, iptables.Chain("OPENSHIFT-MASQUERADE"), "-s", cidr, "-m", "mark", "--mark", mark, "-j", "SNAT", "--to-source", egressIP)

=>

err := n.ipt.DeleteRule(iptables.TableNAT, iptables.Chain("OPENSHIFT-MASQUERADE"), "-s", cidr, "!", "-d", cidr, "-m", "mark", "--mark", mark, "-j", "SNAT", "--to-source", egressIP)
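For completeness, here is a minimal standalone Go sketch (not part of openshift/sdn; the cidr, mark and egress IP values are the ones from my test above) that prints the iptables rule the patched EnsureRule call would install, so the effect of adding "!", "-d", cidr is easy to see:

package main

import (
        "fmt"
        "strings"
)

func main() {
        // Values from the test-src verification above; purely illustrative.
        cidr := "10.128.0.0/14" // cluster network CIDR
        mark := "0x1e1d286"     // project VNID as a hex mark
        egressIP := "10.0.0.43" // egress IP assigned to the project

        // Same argument list as the proposed EnsureRule change: SNAT pod
        // traffic to the egress IP only when the destination is outside
        // the cluster network ("! -d <cidr>").
        args := []string{"-s", cidr, "!", "-d", cidr,
                "-m", "mark", "--mark", mark,
                "-j", "SNAT", "--to-source", egressIP}

        fmt.Println("iptables -t nat -A OPENSHIFT-MASQUERADE " + strings.Join(args, " "))
}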
Created pull request: https://github.com/openshift/sdn/pull/280
*** This bug has been marked as a duplicate of bug 2092166 ***