|Summary:||listen-address in dnsmasq when using flannel unreachable by pods|
|Product:||OpenShift Container Platform||Reporter:||Eduardo Minguez <eminguez>|
|Component:||Networking||Assignee:||Rajat Chopra <rchopra>|
|Status:||CLOSED DUPLICATE||QA Contact:||Meng Bo <bmeng>|
|Version:||3.6.1||CC:||aos-bugs, bbennett, bdobreli, eminguez, erich, ghuang, jkaur|
|Fixed In Version:||Doc Type:||If docs needed, set a value|
|Doc Text:||Story Points:||---|
|Last Closed:||2017-12-07 14:12:56 UTC||Type:||Bug|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
|Cloudforms Team:||---||Target Upstream Version:|
Description Eduardo Minguez 2017-09-12 10:15:26 UTC
Description of problem:
When installing OCP using flannel and a second network interface for container traffic, the /etc/resolv.conf file copied into the pods points to a nameserver IP that the pods cannot reach.

Version-Release number of selected component (if applicable):
oc v220.127.116.11.21
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ocp.edu.flannel.com:8443
openshift v18.104.22.168.21
kubernetes v1.6.1+5115d708d7

How reproducible:
Create an OCP cluster as in the reference architecture using flannel (so eth0 a.b.c.d and eth1 w.x.y.z). The /etc/resolv.conf in the pod shows the eth0 address, which comes from the "listen-address=a.b.c.d" setting in the /etc/dnsmasq.d/origin-dns.conf file, and that address is not reachable by the pod.

Steps to Reproduce:
1. Create the OCP cluster
2. Connect to any pod running in the cluster
3. Check /etc/resolv.conf
4. Try to resolve anything from that DNS

Actual results:
$ oc rsh docker-registry-1-14tm0
sh-4.2$ curl canihazip.com
curl: (6) Could not resolve host: canihazip.com; Unknown error
sh-4.2$ cat /etc/resolv.conf
nameserver 10.19.115.248
search default.svc.cluster.local svc.cluster.local cluster.local openstacklocal edu.flannel.com
options ndots:5

Expected results:
Resolve the DNS entry

Additional info:
Adding the following iptables rules solves the issue:
# iptables -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 53 -j ACCEPT
# iptables -A OS_FIREWALL_ALLOW -p udp -m state --state NEW -m udp --dport 53 -j ACCEPT

But I think there are two issues:
* Listen address should include the eth1 ip in the dnsmasq configuration
* The resolv.conf copied to the pod should point to that ip (I think this parameter is dnsIP in the node-config.yaml)
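To make the check in steps 3 and 4 repeatable, a minimal sketch like the following could be run inside a pod. The helper name list_nameservers is hypothetical and not part of OpenShift; it only extracts the nameserver entries from a resolv.conf-style file.

```shell
# Hypothetical helper: print the nameserver addresses listed in a
# resolv.conf-style file (defaults to /etc/resolv.conf).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

Each printed address can then be queried directly from the pod, for example with `dig @<address> <name>`, assuming the image ships dig (many images do not). With the resolv.conf shown in the actual results, the only address printed would be 10.19.115.248.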
Comment 1 Bogdan Dobrelya 2017-09-27 15:21:07 UTC
Just a side note: I do not think iptables rules would be a good fix at the end of the day, keeping in mind https://docs.openshift.com/container-platform/latest/admin_guide/iptables.html#iptables-service, which makes any manually added iptables rules ephemeral, i.e. they last only until the next node reboot (IIUC). Perhaps the better fix would be to fix the dnsmasq config and/or better document the iptables persistence caveats for openshift, or provide poor users like me some help with translating iptables rules for firewalld.
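As a rough illustration of the firewalld translation asked for above, the two port 53 rules from the description might be expressed as shown here. This is a sketch under assumptions: it opens the port in the default firewalld zone rather than in the OS_FIREWALL_ALLOW chain, so it is not an exact equivalent, and the function name and the FIREWALL_CMD override are hypothetical, added only to allow a dry run.

```shell
# Hypothetical helper: open port 53 (TCP and UDP) persistently via firewalld.
# FIREWALL_CMD can be overridden (e.g. with "echo") to print the arguments
# instead of changing the firewall.
open_dns_port_firewalld() {
    fw="${FIREWALL_CMD:-firewall-cmd}"
    "$fw" --permanent --add-port=53/tcp &&
    "$fw" --permanent --add-port=53/udp &&
    "$fw" --reload
}
```

Running `FIREWALL_CMD=echo open_dns_port_firewalld` prints the arguments that would be passed to firewall-cmd, which is a cheap way to review the change before applying it as root on a node.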
Comment 2 Bogdan Dobrelya 2017-09-27 16:12:39 UTC
Proposed openshift-ansible fix https://github.com/openshift/openshift-ansible/pull/5560
Comment 3 Bogdan Dobrelya 2017-09-27 16:24:46 UTC
As I understand it, the os_firewall_manage_iptables provider works fine for simple rules and can handle this case fully, hence the proposed fix is based on iptables rules. I'm not sure, though, how to handle the advanced flannel configuration steps described in https://bugzilla.redhat.com/show_bug.cgi?id=1490960, like masquerade rules. But that's another story.
Comment 4 Eduardo Minguez 2017-10-02 09:32:21 UTC
I think the DNS issue happens because the nodes have just one network interface as in the reference architecture, where the DNS iptables rules are not needed.
Comment 7 Rajat Chopra 2017-10-23 15:01:58 UTC
This requires an enhancement in Ansible so that dnsIP is overridable by flannel. Alternatively, the fact 'ansible_default_ipv4' can be set to the desired interface's IP address, either by the playbook or by changing the default route on the host (i.e. the interface that will route 22.214.171.124).
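To see which address the host's default route would select, and therefore which address 'ansible_default_ipv4' is likely to report, something like the sketch below may help. It assumes the fact follows the source address of the route to a public destination; 8.8.8.8 is used here only as an illustrative destination, and the helper name route_src is hypothetical.

```shell
# Hypothetical helper: read `ip route get` output on stdin and print the
# address that follows the "src" keyword.
route_src() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}

# Usage on a node (illustrative destination address):
#   ip -4 route get 8.8.8.8 | route_src
```

If the printed address belongs to eth0 rather than the interface carrying container traffic, that would be consistent with the behaviour reported in the description.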
Comment 8 Rajat Chopra 2017-10-23 15:21:10 UTC
The workaround will be as per this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1493955. That means the fixes for that bug will also cover this bug's problems.