Bug 1557924 - Request for redesigning of Egress DNS architecture
Summary: Request for redesigning of Egress DNS architecture
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.7.1
Hardware: x86_64
OS: Linux
Target Milestone: ---
: 3.10.0
Assignee: Dan Winship
QA Contact: Meng Bo
: 1560651 1582441 (view as bug list)
Depends On:
Blocks: 1570398 1570399 1570400
Reported: 2018-03-19 09:27 UTC by Dmitry Zhukovski
Modified: 2018-10-18 10:36 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When using per-namespace static egress IPs, all external traffic is routed through the egress IP. "External" means all traffic that is not directed to another pod, and so includes traffic from the pod to the pod's own node.
Consequence: When pods are told to use the node's IP address for DNS and the pod is using a static egress IP, DNS traffic is routed to the egress node first and then back to the original node, which might be configured not to accept DNS requests from other hosts, leaving the pod unable to resolve DNS.
Fix: Pod-to-node DNS requests now bypass the egress IP and go directly to the node.
Result: DNS works.
Clone Of:
: 1570398 (view as bug list)
Last Closed: 2018-07-30 19:10:40 UTC
Target Upstream Version:

Attachments (Terms of Use)
Current architecture (49.91 KB, image/png)
2018-03-19 09:27 UTC, Dmitry Zhukovski
no flags Details

System ID Priority Status Summary Last Updated
Origin (Github) 19279 None None None 2018-04-09 17:19:19 UTC
Red Hat Product Errata RHBA-2018:1816 None None None 2018-07-30 19:11:12 UTC

Description Dmitry Zhukovski 2018-03-19 09:27:01 UTC
Description of problem:
Request for redesigning of Egress DNS architecture.
When using egressIPs in projects, all external traffic is forwarded through openvswitch to the egress IP node. However, this is not an ideal way to do it.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
Let's say we have one project which has an egressIP and one pod. The pod gets its dnsIP from the node where the container is running, so the dnsIP is the node's own IP address. When a pod running on this node tries to resolve something via DNS, the query first goes to openvswitch, which forwards it to the egress node, and the egress node then forwards the query back to the original node.
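The round trip described above can be sketched with a small toy model (plain Python, not OpenShift code; the hop names are illustrative labels, not real component identifiers):

```python
# Toy model of the DNS query path for a pod in a namespace with a
# static egress IP, as described in the reproduction steps above.

def dns_query_path(uses_egress_ip):
    """Return the hops a pod's DNS query takes to reach the node resolver."""
    if uses_egress_ip:
        # All "external" traffic -- including pod -> own node -- is first
        # tunneled to the egress node, which then routes it back.
        return ["pod", "openvswitch", "egress-node", "original-node"]
    # Without an egress IP, the query goes straight to the node.
    return ["pod", "original-node"]

# With an egress IP the query takes an extra detour through the egress node.
print(dns_query_path(True))
print(dns_query_path(False))
```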

Actual results:
There are several problems with this behaviour:
- The customer basically needs to open the whole node subnet to all projects (in production, the default egressnetworkpolicy is deny-all), and needs to open this in firewalls as well. In practice it means that if a project has an egressIP, that project can connect to, for instance, the node-exporter statistics on the node itself.
- It is a real single point of failure: if the egress node goes down, DNS stops working in the pods as well.
- It creates unnecessary traffic on the overlay network and subnet.

Expected results:
It is proposed that DNS traffic should not be forwarded to openvswitch; instead it should use the default behaviour, without the egressIP. The customer has now seen several times that OpenShift internal DNS in an egressIP project simply breaks because of the egress node.

Additional info:

Comment 1 Dmitry Zhukovski 2018-03-19 09:27:37 UTC
Created attachment 1409753 [details]
Current architecture

Comment 3 Jesse Haka 2018-03-19 10:29:29 UTC
I am essentially asking the following: currently this rule https://github.com/openshift/origin/blob/master/pkg/network/node/ovscontroller.go#L714 is present on all "normal nodes" when the egressIP is hosted somewhere else. Could we modify this rule from "forward all traffic" to "forward all traffic except DNS", or "forward all traffic except traffic to the current node IP"?

Comment 4 Jesse Haka 2018-03-19 10:38:39 UTC

Currently we have this rule in openvswitch:

cookie=0x0, duration=617721.503s, table=100, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xd594ee actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:>tun_dst,output:1

Could we add the following rule before it:

table=99 priority=x,ip,reg0=0xd594ee nw_dst=nodeip actions=output:2

where x is what? And nodeip is the node's local IP address. In that case it should not forward packets destined for the node to tun0, right?

Comment 5 Jesse Haka 2018-03-19 10:41:25 UTC
Maybe it's better not to allow all ports on that local node, so the correct new rule could be:

table=99 priority=x,ip,reg0=0xd594ee nw_dst=nodeip,tp_dst=53 actions=output:2

What do you think?
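One caveat on the rule above (a hypothetical sketch, not a tested configuration): in ovs-ofctl match syntax, tp_dst can only be matched together with an L4 protocol, and DNS uses both UDP and TCP, so two rules would likely be needed. The table, priority placeholder x, reg0 value, and nodeip below are carried over from the comments above, not verified values:

```
table=99, priority=x, udp, reg0=0xd594ee, nw_dst=nodeip, tp_dst=53, actions=output:2
table=99, priority=x, tcp, reg0=0xd594ee, nw_dst=nodeip, tp_dst=53, actions=output:2
```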

Comment 6 Dan Winship 2018-03-19 14:07:58 UTC
There are also problems with egress routers and using the node IP for DNS (bug 1552738). I was thinking that we could fix both of these problems at once by tweaking pods to use the node's tun0 IP address rather than its "eth0" IP address for DNS. I'm not sure if we can make that happen in *all* cases without a bunch of hacks though, so something like this might be better.

Comment 7 Jesse Haka 2018-03-19 14:50:33 UTC
Actually, we have had the dnsIP fixed in the node configuration in 3.5, 3.6, and in one cluster also in 3.7, where the address we used as the dnsIP was the docker interface IP address. It worked pretty well, but we ended up not using it because openshift-ansible always overrides it, which causes cluster downtime whenever we upgrade something.

Comment 8 Jesse Haka 2018-03-19 14:51:22 UTC
Then we opened only from the egressnetworkconfiguration to all projects. Now we basically need to open because the dnsIP can be any node IP address.

Comment 10 Dan Winship 2018-04-09 17:18:37 UTC
*** Bug 1560651 has been marked as a duplicate of this bug. ***

Comment 11 Dan Winship 2018-04-09 17:19:19 UTC

Comment 15 Meng Bo 2018-05-17 08:50:09 UTC
Tested on 3.10.0-0.47.0 with the steps in https://bugzilla.redhat.com/show_bug.cgi?id=1570398#c5

Issue has been fixed.

Comment 16 Dan Winship 2018-05-25 16:13:53 UTC
*** Bug 1582441 has been marked as a duplicate of this bug. ***

Comment 18 errata-xmlrpc 2018-07-30 19:10:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

