Bug 1551499 - DNS resolution failed inside pod when using flannel
Summary: DNS resolution failed inside pod when using flannel
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.9.z
Assignee: Vadim Rutkovsky
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-03-05 10:04 UTC by Gaoyun Pei
Modified: 2018-05-17 06:43 UTC
CC: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-17 06:42:42 UTC
Target Upstream Version:
Embargoed:


Attachments
iptables rules (13.81 KB, text/plain)
2018-03-05 10:08 UTC, Gaoyun Pei


Links
Red Hat Product Errata RHBA-2018:1566 (last updated 2018-05-17 06:43:17 UTC)

Description Gaoyun Pei 2018-03-05 10:04:56 UTC
Description of problem:
After setting up an OCP 3.9 cluster that uses the flannel network, domain names could not be resolved from inside a pod.

[root@ip-172-18-7-127 ~]# oc rsh docker-registry-1-bhghf
sh-4.2$ curl www.google.com
curl: (6) Could not resolve host: www.google.com; Unknown error

This looks like the same issue as BZ#1490820, which was fixed along with BZ#1493955. I checked the iptables rules on the master and the node: the rules required by BZ#1493955 are present on both, but domain name resolution still does not work. The full iptables rule list from the master can be found in the attachment.

I also tried the workaround from https://bugzilla.redhat.com/show_bug.cgi?id=1490820#c0; adding the following iptables rules resolves the issue:
# iptables -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 53 -j ACCEPT
# iptables -A OS_FIREWALL_ALLOW -p udp -m state --state NEW -m udp --dport 53 -j ACCEPT
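
As a quick sanity check (a sketch only; the pod name is the one from the example above), the rules and the resolver path can be inspected like this.

Check that both DNS rules are now present on the node:
# iptables -nL OS_FIREWALL_ALLOW | grep 'dpt:53'

Then re-test resolution from inside the pod (its /etc/resolv.conf nameserver should normally point at the node IP, where dnsmasq listens):
# oc rsh docker-registry-1-bhghf
sh-4.2$ cat /etc/resolv.conf
sh-4.2$ curl -sI www.google.com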


Version-Release number of the following components:
openshift-ansible-3.9.2-1.git.0.1a855b3.el7.noarch.rpm

How reproducible:
Always

Steps to Reproduce:
1. Deploy OCP 3.9 with the flannel network enabled by setting the following variables in the Ansible inventory (a minimal example inventory is sketched below), then check DNS resolution from inside a pod:
openshift_use_openshift_sdn=false
openshift_use_flannel=true
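
For reference, a minimal inventory sketch with these settings (hostnames and the remaining variables are illustrative placeholders, not taken from the original inventory):

[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
ansible_user=root
openshift_deployment_type=openshift-enterprise
openshift_use_openshift_sdn=false
openshift_use_flannel=true

[masters]
master.example.com

[etcd]
master.example.com

[nodes]
master.example.com
node1.example.com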


Actual results:
DNS resolution fails inside the pod, e.g. curl reports "curl: (6) Could not resolve host: www.google.com".

Expected results:
Domain names resolve and external hosts are reachable from inside the pod.

Additional info:
The full iptables rule list from the master is attached.

Comment 1 Gaoyun Pei 2018-03-05 10:08:31 UTC
Created attachment 1404244 [details]
iptables rules

Comment 2 Vadim Rutkovsky 2018-03-05 13:11:14 UTC
Created https://github.com/openshift/openshift-ansible/pull/7380

Comment 3 Vadim Rutkovsky 2018-03-09 09:26:03 UTC
Fix is available in openshift-ansible-3.9.4-1
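
The installed build on the host running ansible-playbook can be confirmed before redeploying:

# rpm -q openshift-ansible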

Comment 4 Gaoyun Pei 2018-03-09 10:14:48 UTC
Verified this bug with openshift-ansible-3.9.4-1.git.0.a49cc04.el7.noarch.rpm.

After deploying a 3.9 cluster with the flannel network, pods can access the external network successfully.
[root@ip-172-18-4-216 ~]# oc rsh docker-registry-1-qbrkn
sh-4.2$ ping www.google.com
PING www.google.com (172.217.8.4) 56(84) bytes of data.
64 bytes from iad23s59-in-f4.1e100.net (172.217.8.4): icmp_seq=1 ttl=47 time=1.37 ms
64 bytes from iad23s59-in-f4.1e100.net (172.217.8.4): icmp_seq=2 ttl=47 time=1.45 ms

Comment 6 Gaoyun Pei 2018-04-23 05:53:11 UTC
Moving this bug to VERIFIED according to Comment 4.

Comment 9 errata-xmlrpc 2018-05-17 06:42:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1566

