Bug 1557924
Summary: Request for redesigning of Egress DNS architecture
Product: OpenShift Container Platform
Reporter: Dmitry Zhukovski <dzhukous>
Component: Networking
Assignee: Dan Winship <danw>
Status: CLOSED ERRATA
QA Contact: Meng Bo <bmeng>
Severity: medium
Priority: medium
Version: 3.7.1
CC: aos-bugs, bbennett, danw, dmoessne, jesse.haka, pkanthal, stwalter, tibrahim
Keywords: NeedsTestCase
Target Release: 3.10.0
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Doc Text:
Cause: When a namespace uses a per-namespace static egress IP, all external traffic from its pods is routed through the egress IP. "External" means all traffic that is not directed to another pod, so it includes traffic from a pod to the pod's own node.
Consequence: When pods are told to use their node's IP address for DNS and the pod uses a static egress IP, DNS traffic is routed to the egress node first and then back to the original node. The original node may be configured not to accept DNS requests from other hosts, in which case the pod cannot resolve DNS names.
Fix: Pod-to-node DNS requests now bypass the egress IP and go directly to the node.
Result: DNS resolution works for pods that use a static egress IP.
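A simplified model of the forwarding decision described in the Doc Text may help show why DNS broke and what the fix changes. This Python sketch is not OpenShift SDN code: the `next_hop` function, the `Packet` type, and the cluster network 10.128.0.0/14 are assumptions for illustration, and UDP versus TCP is ignored. The node IP 192.168.1.5 and the egress node IP 192.168.0.199 are taken from the flows quoted in the description of this report.

```python
import ipaddress
from dataclasses import dataclass

# 192.168.1.5 (the pod's node) and 192.168.0.199 (the egress node) come from
# this report; the cluster network CIDR is an assumed example value.
CLUSTER_NETWORK = ipaddress.ip_network("10.128.0.0/14")
LOCAL_NODE_IP = ipaddress.ip_address("192.168.1.5")
EGRESS_NODE_IP = ipaddress.ip_address("192.168.0.199")
DNS_PORT = 53


@dataclass(frozen=True)
class Packet:
    dst_ip: str
    dst_port: int


def next_hop(pkt: Packet, dns_bypass: bool) -> str:
    """Hypothetical model of where traffic from a pod in a namespace with a
    static egress IP is sent. dns_bypass=False models the old behavior and
    dns_bypass=True models the fix described in the Doc Text."""
    dst = ipaddress.ip_address(pkt.dst_ip)
    if dst in CLUSTER_NETWORK:
        # Pod-to-pod traffic is not "external", so the egress IP is not used.
        return "pod"
    if dns_bypass and dst == LOCAL_NODE_IP and pkt.dst_port == DNS_PORT:
        # After the fix: DNS to the pod's own node goes there directly.
        return "local node"
    # Everything else, including pod-to-own-node DNS before the fix,
    # counts as external and is sent to the egress node first.
    return f"egress node {EGRESS_NODE_IP}"


for bypass in (False, True):
    print(bypass, next_hop(Packet("192.168.1.5", 53), dns_bypass=bypass))
# False egress node 192.168.0.199
# True local node
```

Only DNS to the pod's own node is special-cased here, matching the Doc Text; other traffic to the node still goes through the egress node in this model.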
Clones: 1570398
Last Closed: 2018-07-30 19:10:40 UTC
Type: Bug
Bug Blocks: 1570398, 1570399, 1570400
Description
Dmitry Zhukovski
2018-03-19 09:27:01 UTC
Created attachment 1409753: Current architecture
What I am asking is the following. Currently, the rule at https://github.com/openshift/origin/blob/master/pkg/network/node/ovscontroller.go#L714 is installed on all "normal" nodes whenever the egress IP is hosted on another node. Could we change this rule from "forward all traffic" to either "forward all traffic except DNS" or "forward all traffic except traffic to the current node's IP"? This is the flow we currently have in Open vSwitch:

    cookie=0x0, duration=617721.503s, table=100, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xd594ee actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:192.168.0.199->tun_dst,output:1

Could we add a rule like this before it?

    table=99 priority=x,ip,reg0=0xd594ee nw_dst=nodeip actions=output:2

What should x be? Here nodeip is the node's local IP address, in this case 192.168.1.5. Would packets destined for 192.168.1.5 then no longer be forwarded to tun0? It may be better not to open all ports on the local node, so the new rule could instead be:

    table=99 priority=x,ip,reg0=0xd594ee nw_dst=nodeip,tp_dst=53 actions=output:2

What do you think?

There are also problems with egress routers and using the node IP for DNS (bug 1552738). I was thinking that we could fix both of these problems at once by changing pods to use the node's tun0 IP address rather than its "eth0" IP address for DNS. I'm not sure we can make that happen in *all* cases without a lot of hacks, though, so something like this might be better.

Actually, we have had dnsIP set to a fixed value in the node configuration in 3.5 and 3.6, and in one cluster also in 3.7, using 172.17.0.1 as dnsIP. In our case that is the IP address of the docker interface. It worked well, but we stopped using it because openshift-ansible always overrides the setting, which causes cluster downtime every time we upgrade something. At that time we only had to allow 172.17.0.1 in the egress network configuration of every project. Now we basically need to allow 192.168.0.0/16, because dnsIP can be any node's IP address.

*** Bug 1560651 has been marked as a duplicate of this bug. ***
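The DNS-only bypass proposed in the description can be sketched as a small Python helper that builds ovs-ofctl flow strings. This is a hypothetical sketch, not the upstream fix: the helper name `dns_bypass_flows`, priority 300, and placing the flows in table 100 (the same table as the existing egress rule, so that the higher priority takes precedence) are assumptions. The values reg0=0xd594ee, 192.168.1.5, port 53, and output:2 come from the proposal above. One detail differs from the proposed rule: in Open vSwitch, tp_dst only takes effect when the match also names the transport protocol, so the sketch emits separate udp and tcp flows instead of matching plain ip.

```python
import ipaddress

# reg0 value, node IP, DNS port and output port are taken from the proposal;
# table 100 and priority 300 are assumptions for this sketch.
VNID_REG0 = "0xd594ee"   # reg0 value from the existing egress flow
EXISTING_PRIORITY = 100  # priority of the existing "forward all traffic" flow
DNS_PORT = 53
LOCAL_PORT = 2           # "output:2", as in the proposal


def dns_bypass_flows(node_ip: str, table: int = 100, priority: int = 300) -> list[str]:
    """Hypothetical helper: build ovs-ofctl flow strings that send DNS traffic
    from one namespace (matched by reg0) to the local node instead of the
    egress tunnel."""
    ipaddress.IPv4Address(node_ip)  # raises ValueError on a malformed address
    if priority <= EXISTING_PRIORITY:
        # Within one table, the highest-priority matching flow wins.
        raise ValueError("bypass flows must outrank the existing egress flow")
    flows = []
    # tp_dst needs the transport protocol in the match, so DNS over UDP and
    # DNS over TCP each get their own flow.
    for proto in ("udp", "tcp"):
        flows.append(
            f"table={table}, priority={priority}, {proto}, reg0={VNID_REG0}, "
            f"nw_dst={node_ip}, tp_dst={DNS_PORT}, actions=output:{LOCAL_PORT}"
        )
    return flows


for flow in dns_bypass_flows("192.168.1.5"):
    print(flow)
```

Restricting the match to port 53 follows the description's preference not to open every port on the local node. For experiments on a test node, a generated string could be passed to `ovs-ofctl -O OpenFlow13 add-flow br0 '<flow>'`, but flows added by hand are not managed by the SDN and may be lost when it reprograms br0.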
Tested on 3.10.0-0.47.0 with the steps in https://bugzilla.redhat.com/show_bug.cgi?id=1570398#c5. The issue has been fixed.

*** Bug 1582441 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816