Bug 1557924

Summary: Request for redesigning of Egress DNS architecture
Product: OpenShift Container Platform Reporter: Dmitry Zhukovski <dzhukous>
Component: NetworkingAssignee: Dan Winship <danw>
Status: CLOSED ERRATA QA Contact: Meng Bo <bmeng>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.7.1    CC: aos-bugs, bbennett, danw, dmoessne, jesse.haka, pkanthal, stwalter, tibrahim
Target Milestone: ---    Keywords: NeedsTestCase
Target Release: 3.10.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:    Doc Type: Bug Fix
Doc Text:
Cause: When using per-namespace static egress IPs, all external traffic is routed through the egress IP. "External" means all traffic which isn't directed to another pod, and so includes traffic from the pod to the pod's node.
Consequence: When pods are told to use the node's IP address for DNS, and the pod is using a static egress IP, then DNS traffic will be routed to the egress node first, and then back to the original node, which might be configured to not accept DNS requests from other hosts, causing the pod to be unable to resolve DNS.
Fix: Pod-to-node DNS requests now bypass the egress IP and go directly to the node.
Result: DNS works.
Story Points: ---
Clone Of:
: 1570398 (view as bug list) Environment:
Last Closed: 2018-07-30 19:10:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1570398, 1570399, 1570400    
Attachments:
Description Flags
Current architecture none

Description Dmitry Zhukovski 2018-03-19 09:27:01 UTC
Description of problem:
Request for redesigning of Egress DNS architecture.
When a project uses an egress IP, all external traffic from its pods is forwarded through Open vSwitch to the egress IP node. However, this is not an ideal way to handle it.

Version-Release number of selected component (if applicable):
3.7

How reproducible:
Always

Steps to Reproduce:
Let's say we have one project with an egress IP and one pod. The pod gets its dnsIP from the node its container is running on; say that node's IP address is 192.168.1.5, so the dnsIP is 192.168.1.5. When the pod tries to resolve a name, the DNS query first goes to Open vSwitch, which forwards it to the egress node, and the egress node then forwards the query back to the original node.
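
For example (pod and project names here are hypothetical), the resolver handed to the pod can be checked like this:

# the node hosting the pod has IP 192.168.1.5
oc exec mypod -n myproject -- cat /etc/resolv.conf
# expected to show: nameserver 192.168.1.5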

Actual results:
There are several problems with this behaviour:
- The customer basically has to open the whole node subnet to every project (in production the default EgressNetworkPolicy is deny-all), and has to open it in external firewalls as well. In effect, any project with an egress IP can then reach, for example, node-exporter statistics on the nodes themselves. (See the example policy after this list.)
- It is a single point of failure: if the egress node goes down, DNS stops working in the pods as well.
- It creates unnecessary traffic on the overlay network and the node subnet.
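
To illustrate the first point, a rough sketch of the EgressNetworkPolicy a project ends up needing just so DNS keeps working (project name and CIDRs are examples; the apiVersion may differ between releases):

cat <<EOF | oc create -n myproject -f -
apiVersion: network.openshift.io/v1
kind: EgressNetworkPolicy
metadata:
  name: default
spec:
  egress:
  # the whole node subnet must be allowed only because the dnsIP can be any node IP
  - type: Allow
    to:
      cidrSelector: 192.168.0.0/16
  # everything else stays denied, as in production
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0
EOF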

Expected results:
The proposal is that DNS traffic should not be forwarded through Open vSwitch to the egress node; instead it should follow the default path, as if no egress IP were configured. The customer has now seen several times that OpenShift's internal DNS in an egress IP project simply breaks because of the egress node.

Additional info:

Comment 1 Dmitry Zhukovski 2018-03-19 09:27:37 UTC
Created attachment 1409753 [details]
Current architecture

Comment 3 Jesse Haka 2018-03-19 10:29:29 UTC
I am essentially asking the following: currently this rule https://github.com/openshift/origin/blob/master/pkg/network/node/ovscontroller.go#L714 is present on all "normal" nodes when the egress IP is hosted somewhere else. Could we change it from "forward all traffic" to "forward all traffic except DNS", or "forward all traffic except traffic to the current node's IP"?
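
For reference, the flow in question can be dumped on a non-egress node with something like this (br0 and OpenFlow 1.3 are what openshift-sdn uses by default; the output is only illustrative):

ovs-ofctl -O OpenFlow13 dump-flows br0 table=100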

Comment 4 Jesse Haka 2018-03-19 10:38:39 UTC

Currently we have this rule in Open vSwitch:

cookie=0x0, duration=617721.503s, table=100, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xd594ee actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:192.168.0.199->tun_dst,output:1

Could we add a rule before this one:

table=99 priority=x,ip,reg0=0xd594ee nw_dst=nodeip actions=output:2

where x is what? And nodeip is the node's local IP address, in this case 192.168.1.5? Then it should not forward packets destined for 192.168.1.5 to tun0?

Comment 5 Jesse Haka 2018-03-19 10:41:25 UTC
Maybe it is better not to allow all ports on the local node, so the correct new rule could be:

table=99 priority=x,ip,reg0=0xd594ee nw_dst=nodeip,tp_dst=53 actions=output:2
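
In practice, instead of a new table 99 this could probably just be a higher-priority rule in the existing table 100. An untested sketch (the priority value, reg0 value, and node IP are placeholders, and tp_dst needs the L4 protocol in the match, so UDP and TCP each get a rule):

# send DNS for this node's own IP straight to tun0 (port 2) instead of to the egress node
ovs-ofctl -O OpenFlow13 add-flow br0 "table=100,priority=150,udp,reg0=0xd594ee,nw_dst=192.168.1.5,tp_dst=53,actions=output:2"
ovs-ofctl -O OpenFlow13 add-flow br0 "table=100,priority=150,tcp,reg0=0xd594ee,nw_dst=192.168.1.5,tp_dst=53,actions=output:2"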

What do you think?

Comment 6 Dan Winship 2018-03-19 14:07:58 UTC
There are also problems with egress routers and using the node IP for DNS (bug 1552738). I was thinking that we could fix both of these problems at once by tweaking pods to use the node's tun0 IP address rather than its "eth0" IP address for DNS. I'm not sure if we can make that happen in *all* cases without a bunch of hacks though, so something like this might be better.
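
For reference, the tun0 address that pods would then use as their resolver can be checked on the node with:

ip -4 addr show tun0
# e.g. "inet 10.128.0.1/23" -- the SDN gateway address, which varies per node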

Comment 7 Jesse Haka 2018-03-19 14:50:33 UTC
Actually, we had the dnsIP fixed in the node configuration in 3.5, 3.6, and in one cluster also in 3.7, where we used 172.17.0.1 as the dnsIP. In our case that IP address is the docker interface address. It worked pretty well, but we stopped using it because openshift-ansible always overrides it, which caused cluster downtime every time we upgraded something.
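
For reference (the path below is the usual 3.x location and may vary per install), the override was just the dnsIP field in the node config:

grep dnsIP /etc/origin/node/node-config.yaml
# dnsIP: 172.17.0.1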

Comment 8 Jesse Haka 2018-03-19 14:51:22 UTC
Back then we only had to open 172.17.0.1 in the EgressNetworkPolicy for all projects. Now we basically need to open 192.168.0.0/16, because the dnsIP can be any node's IP address.

Comment 10 Dan Winship 2018-04-09 17:18:37 UTC
*** Bug 1560651 has been marked as a duplicate of this bug. ***

Comment 11 Dan Winship 2018-04-09 17:19:19 UTC
https://github.com/openshift/origin/pull/19279
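
Roughly, the change makes pod-to-node DNS bypass the egress IP redirect. Not copied from the PR, but after the fix a node hosting such pods should show higher-priority DNS flows along these lines (node IP 192.168.1.5 is an example):

ovs-ofctl -O OpenFlow13 dump-flows br0 table=100
# expected to include something like:
#   table=100, priority=300,udp,nw_dst=192.168.1.5,tp_dst=53 actions=output:2
#   table=100, priority=300,tcp,nw_dst=192.168.1.5,tp_dst=53 actions=output:2
# in addition to the lower-priority egress rule shown in comment 4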

Comment 15 Meng Bo 2018-05-17 08:50:09 UTC
Tested on 3.10.0-0.47.0 with the steps in https://bugzilla.redhat.com/show_bug.cgi?id=1570398#c5

Issue has been fixed.

Comment 16 Dan Winship 2018-05-25 16:13:53 UTC
*** Bug 1582441 has been marked as a duplicate of this bug. ***

Comment 18 errata-xmlrpc 2018-07-30 19:10:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816