Bug 1490820

Summary:	dnsmasq listen-address unreachable by pods when using flannel
Product:	OpenShift Container Platform	Reporter:	Eduardo Minguez <eminguez>
Component:	Networking	Assignee:	Rajat Chopra <rchopra>
Status:	CLOSED DUPLICATE	QA Contact:	Meng Bo <bmeng>
Severity:	urgent	Docs Contact:
Priority:	unspecified
Version:	3.6.1	CC:	aos-bugs, bbennett, bdobreli, eminguez, erich, ghuang, jkaur
Target Milestone:	---
Target Release:	3.8.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2017-12-07 14:12:56 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Eduardo Minguez 2017-09-12 10:15:26 UTC

Description of problem:
When OCP is installed with flannel and a second network interface for container traffic, the /etc/resolv.conf file copied into the pods points them at a nameserver IP that they cannot reach.

Version-Release number of selected component (if applicable):
oc v3.6.173.0.21
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ocp.edu.flannel.com:8443
openshift v3.6.173.0.21
kubernetes v1.6.1+5115d708d7

How reproducible:
Create an OCP cluster following the reference architecture, using flannel (eth0 has address a.b.c.d and eth1 has address w.x.y.z).
The pod's /etc/resolv.conf lists the eth0 address as its nameserver. That address comes from the "listen-address=a.b.c.d" setting in /etc/dnsmasq.d/origin-dns.conf, and it is not reachable from the pod.


Steps to Reproduce:
1. Create the OCP cluster
2. Connect to any pod running in the cluster
3. Check /etc/resolv.conf
4. Try to resolve any hostname using that nameserver

Actual results:
$ oc rsh docker-registry-1-14tm0
sh-4.2$ curl canihazip.com
curl: (6) Could not resolve host: canihazip.com; Unknown error
sh-4.2$ cat /etc/resolv.conf 
nameserver 10.19.115.248
search default.svc.cluster.local svc.cluster.local cluster.local openstacklocal edu.flannel.com
options ndots:5

Expected results:
The DNS name resolves.

Additional info:
Adding the following iptables rules solves the issue:
# iptables -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 53 -j ACCEPT
# iptables -A OS_FIREWALL_ALLOW -p udp -m state --state NEW -m udp --dport 53 -j ACCEPT
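Because `-A` appends unconditionally, rerunning the two commands above adds duplicate rules. A minimal idempotent sketch follows, assuming the `OS_FIREWALL_ALLOW` chain already exists on the node (as the commands above imply). It uses `iptables -C` to check whether a rule is present before appending it; the function names `dns_rule` and `allow_dns_iptables` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the rule spec used in this report for one
# protocol (tcp or udp).
dns_rule() {
    echo "OS_FIREWALL_ALLOW -p $1 -m state --state NEW -m $1 --dport 53 -j ACCEPT"
}

# Hypothetical helper: append the tcp and udp DNS rules only if they are
# missing. Must run as root on the node.
allow_dns_iptables() {
    for proto in tcp udp; do
        # The unquoted $(dns_rule ...) is intentional: word splitting turns
        # the spec into separate iptables arguments.
        if ! iptables -C $(dns_rule "$proto") 2>/dev/null; then
            iptables -A $(dns_rule "$proto")
        fi
    done
}
```

Running `allow_dns_iptables` twice as root should leave exactly one tcp rule and one udp rule in the chain. Like the original commands, the rules still live only in the running kernel ruleset.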

However, I think there are two underlying issues:

* The dnsmasq listen-address should include the eth1 IP
* The resolv.conf copied into the pod should point to that IP (I think the relevant parameter is dnsIP in node-config.yaml)
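The two proposed changes can be sketched as small shell helpers. This assumes the pod-reachable address is the eth1 IP (w.x.y.z) and that the node reads `dnsIP` from `/etc/origin/node/node-config.yaml`; the function names are hypothetical, this is not the change made in openshift-ansible, and dnsmasq and the node service would need restarting afterwards.

```shell
#!/bin/sh
# Hypothetical helper: make dnsmasq also listen on a second address.
# dnsmasq accepts repeated listen-address lines, so the existing eth0
# entry is left untouched. Usage: add_listen_address CONF IP
add_listen_address() {
    grep -qxF "listen-address=$2" "$1" || echo "listen-address=$2" >> "$1"
}

# Hypothetical helper: point the pods' nameserver at IP by rewriting an
# existing top-level dnsIP line (it does not add one if absent).
# Usage: set_dns_ip CONF IP
set_dns_ip() {
    sed -i "s/^dnsIP:.*/dnsIP: $2/" "$1"
}

# Hypothetical helper: first IPv4 address on an interface, e.g. eth1.
iface_ipv4() {
    ip -4 -o addr show dev "$1" | awk '{ split($4, a, "/"); print a[1]; exit }'
}
```

For example, as root on a node: `ip=$(iface_ipv4 eth1)`, then `add_listen_address /etc/dnsmasq.d/origin-dns.conf "$ip"` and `set_dns_ip /etc/origin/node/node-config.yaml "$ip"`.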

Comment 1 Bogdan Dobrelya 2017-09-27 15:21:07 UTC

Just a side note: I do not think iptables rules would be a good fix at the end of the day, keeping in mind https://docs.openshift.com/container-platform/latest/admin_guide/iptables.html#iptables-service, which makes any manually added iptables rules ephemeral (they only last until the node is rebooted, IIUC). Perhaps a better fix would be to correct the dnsmasq config and/or better document the iptables persistence caveats for OpenShift, or to give poor users like me some help translating iptables rules into firewalld rules.
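On nodes running firewalld, the two iptables rules from the description roughly translate into permanent port openings. This is a sketch assuming pods only need port 53 over tcp and udp in the default zone; `run` and `open_dns_firewalld` are hypothetical names, and setting `DRY_RUN=1` only prints the commands instead of running them.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Open 53/tcp and 53/udp in the permanent firewalld configuration, then
# reload so the runtime configuration picks up the change.
open_dns_firewalld() {
    for proto in tcp udp; do
        run firewall-cmd --permanent --add-port="53/$proto"
    done
    run firewall-cmd --reload
}
```

firewalld also ships a predefined `dns` service covering 53/tcp and 53/udp, so `firewall-cmd --permanent --add-service=dns` is an equivalent choice. If the pod-facing interface belongs to a non-default zone, `--zone=<zone>` would have to be added.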

Comment 2 Bogdan Dobrelya 2017-09-27 16:12:39 UTC

Proposed openshift-ansible fix https://github.com/openshift/openshift-ansible/pull/5560

Comment 3 Bogdan Dobrelya 2017-09-27 16:24:46 UTC

As I understand it, the os_firewall_manage_iptables provider works fine for simple rules and can handle this case fully; hence the proposed fix is based on iptables rules.

However, I'm not sure how to handle the more advanced flannel configuration steps described in https://bugzilla.redhat.com/show_bug.cgi?id=1490960, such as masquerade rules. But that's another story.

Comment 4 Eduardo Minguez 2017-10-02 09:32:21 UTC

I think the DNS issue does not appear in the reference architecture because there the nodes have just one network interface, so the DNS iptables rules are not needed.

Comment 7 Rajat Chopra 2017-10-23 15:01:58 UTC

This requires an enhancement in Ansible so that dnsIP can be overridden when using flannel. Alternatively, the fact 'ansible_default_ipv4' can be made to point to the desired interface's IP address, either by setting it in the playbook or by changing the default route on the host (i.e. making that interface the one that routes 8.8.8.8).
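To see which interface and address currently route 8.8.8.8 on a node (and so, per the comment above, back `ansible_default_ipv4`), the kernel can be asked directly with `ip -4 route get 8.8.8.8`. The sketch below parses that output; the function names are hypothetical, and the sample lines use documentation addresses rather than output from the cluster in this report.

```shell
#!/bin/sh
# Hypothetical helper: given `ip -4 route get` output, print "<dev> <src>"
# taken from its first line.
route_dev_src() {
    printf '%s\n' "$1" | awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "dev") dev = $(i + 1)
            if ($i == "src") src = $(i + 1)
        }
        print dev, src
        exit
    }'
}

# Hypothetical helper: interface and source address the kernel would use
# to reach 8.8.8.8 on this host.
default_route_dev_src() {
    route_dev_src "$(ip -4 route get 8.8.8.8)"
}
```

For instance, `route_dev_src '8.8.8.8 via 192.0.2.1 dev eth0 src 192.0.2.10 uid 0'` prints `eth0 192.0.2.10`. If a node reports eth0 while pods can only reach eth1, the default route is what the alternative above proposes changing.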

Comment 8 Rajat Chopra 2017-10-23 15:21:10 UTC

The workaround fix will follow this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1493955
That is, the fixes for that bug will also cover this bug's problems.

Comment 9 Ben Bennett 2017-12-07 14:12:56 UTC


*** This bug has been marked as a duplicate of bug 1493955 ***