Description of problem:
Request to redesign the egress DNS architecture.
When a project uses an egress IP, all of its traffic is forwarded through Open vSwitch to the egress IP node. However, this is not ideal for DNS traffic.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Let's say we have one project that has an egress IP and one pod. The pod gets its DNS server address (dnsIP) from the node where its container is running. If that node's IP address is 192.168.1.5, the dnsIP is also 192.168.1.5. When the pod tries to resolve a name, the DNS query first goes to Open vSwitch, which forwards it to the egress node, and the egress node then forwards the query back to the original node.
This behaviour causes several problems:
- The customer needs to open essentially the whole node subnet to all projects (in production, the default EgressNetworkPolicy is deny-all), and also needs to open it in the firewalls. In practice, this means that a project with an egress IP can connect to, for instance, the node-exporter statistics on the node itself.
- The egress node is a single point of failure: if it goes down, DNS stops working in the pods as well.
- It generates unnecessary traffic on the overlay network and the node subnet.
The proposal is that DNS traffic should not be forwarded to the egress node through Open vSwitch; instead, it should follow the default behaviour used without an egress IP. The customer has now seen several times that OpenShift internal DNS in an egress IP project simply breaks because of the egress node.
Created attachment 1409753
What I am asking is the following: currently this rule https://github.com/openshift/origin/blob/master/pkg/network/node/ovscontroller.go#L714 is installed on all "normal" nodes when the egress IP is located somewhere else. Could we change this rule from "forward all traffic" to either "forward all traffic except DNS" or "forward all traffic except traffic to the current node IP"?
Currently we have this in Open vSwitch:
cookie=0x0, duration=617721.503s, table=100, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xd594ee actions=move:NXM_NX_REG0->NXM_NX_TUN_ID[0..31],set_field:192.168.0.199->tun_dst,output:1
Could we add the following before this rule:
table=99 priority=x,ip,reg0=0xd594ee nw_dst=nodeip actions=output:2
What should x be? Here nodeip is the node's local IP address, in this case 192.168.1.5. Would this rule then stop packets destined for 192.168.1.5 from being forwarded to tun0?
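Maybe it is better not to allow all ports on the local node. Because tp_dst only matches when the flow also specifies the transport protocol, and DNS uses both UDP and TCP, the new rules could be:
table=99 priority=x,udp,reg0=0xd594ee nw_dst=nodeip,tp_dst=53 actions=output:2
table=99 priority=x,tcp,reg0=0xd594ee nw_dst=nodeip,tp_dst=53 actions=output:2
What do you think?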
There are also problems with egress routers and using the node IP for DNS (bug 1552738). I was thinking that we could fix both of these problems at once by tweaking pods to use the node's tun0 IP address rather than its "eth0" IP address for DNS. I'm not sure if we can make that happen in *all* cases without a bunch of hacks though, so something like this might be better.
Actually, we have had dnsIP pinned in the node configuration in 3.5, 3.6, and in one cluster also in 3.7, where we used 172.17.0.1 as the dnsIP. In our case that IP address is the Docker interface address. It worked pretty well, but we stopped using it because openshift-ansible always overrides that setting, which causes cluster downtime every time we upgrade something.
Back then we only had to allow 172.17.0.1 in the EgressNetworkPolicy for all projects. Now we basically need to open 192.168.0.0/16, because the dnsIP can be any node IP address.
*** Bug 1560651 has been marked as a duplicate of this bug. ***
Tested on 3.10.0-0.47.0 with the steps in https://bugzilla.redhat.com/show_bug.cgi?id=1570398#c5
Issue has been fixed.
*** Bug 1582441 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
If the solution does not work for you, open a new bug report.