1560489 – DNS issues needing restart of dnsmasq service - 'could not resolve host' error

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	3.7.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	3.10.0
Assignee:	Phil Cameron
QA Contact:	Meng Bo
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-03-26 09:57 UTC by Sudarshan Chaudhari
Modified:	2021-12-10 15:51 UTC
CC List:	13 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-07-30 19:11:31 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	3393141	0	None	None	None	2018-07-26 18:51:23 UTC
Red Hat Product Errata	RHBA-2018:1816	0	None	None	None	2018-07-30 19:12:11 UTC

Description Sudarshan Chaudhari 2018-03-26 09:57:35 UTC

Description of problem:

The dnsmasq service freezes at random, and it has to be restarted manually before name resolution works again.

No logs are being captured for the dnsmasq service on the openshift nodes. 

Actual results:
Hostname resolution fails on the nodes, and as a result pods are unable to resolve hostnames as well.

Expected results:
Even if the NetworkManager or dnsmasq service restarts, the config should remain correct and resolution should keep working.
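
The symptom described above can be probed from a node or a pod with a few lines of Python. This is only a minimal sketch, not part of the original report: the function name `can_resolve` and the injectable `resolver` parameter are hypothetical, and it assumes the failure surfaces as `socket.gaierror`, which is what the standard library raises when a hostname cannot be resolved.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if `hostname` resolves to at least one address.

    `resolver` is injectable (hypothetical parameter) so the check can be
    exercised without a network; it defaults to the system resolver.
    """
    try:
        return len(resolver(hostname, None)) > 0
    except socket.gaierror:
        # Resolution failed, e.g. the 'could not resolve host' condition.
        return False


if __name__ == "__main__":
    # 'localhost' is only a placeholder; substitute a host that the node
    # is expected to resolve through dnsmasq.
    print("resolves:", can_resolve("localhost"))
```

A check like this only detects the failure. It does not distinguish a frozen dnsmasq from any other cause of resolution failure.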

Comment 3 Ben Bennett 2018-03-26 13:50:41 UTC

The recent changes that made dnsmasq handle all traffic for a node may have overloaded the current limits we have set in our config.  We should see if it makes sense to increase the limits generally, or if we need to make them configurable.

Comment 7 Phil Cameron 2018-04-19 12:17:53 UTC

https://github.com/openshift/openshift-ansible/pull/8042
WIP until this change is verified to work for customer

Comment 9 openshift-github-bot 2018-04-30 12:52:47 UTC

Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/1ca76dd879cb6187fa7137a0e36c36461cea3776
dnsmasq - increase dns-forward-max, cache-size

bug 1560489
https://bugzilla.redhat.com/show_bug.cgi?id=1560489

Signed-off-by: Phil Cameron <pcameron>

Comment 12 Hongan Li 2018-05-17 02:41:57 UTC

Verified in atomic-openshift-3.10.0-0.47.0.git.0.2fffa04.el7. The options "dns-forward-max" and "cache-size" have been increased to 10000, as shown below:

[root@ip-172-18-5-139 ~]# cat /etc/dnsmasq.d/origin-dns.conf 
no-resolv
domain-needed
no-negcache
max-cache-ttl=1
enable-dbus
dns-forward-max=10000
cache-size=10000
bind-dynamic
except-interface=lo
# End of config

OS: Red Hat Enterprise Linux Server release 7.5 (Maipo)
kernel: Linux ip-172-18-5-139.ec2.internal 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
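
The manual check in this comment could be scripted roughly as follows. This is a sketch under stated assumptions, not the QA procedure actually used: the helper names `parse_dnsmasq_limits` and `limits_at_least` are hypothetical, the fallback value of 150 for both options is the default documented in the dnsmasq man page, and the parser handles only simple `key=value` lines like those shown above.

```python
# Fallbacks used when an option is absent from the file
# (150 is the documented dnsmasq default for both options).
DNSMASQ_DEFAULTS = {"dns-forward-max": 150, "cache-size": 150}


def parse_dnsmasq_limits(conf_text):
    """Extract dns-forward-max and cache-size from dnsmasq config text."""
    limits = dict(DNSMASQ_DEFAULTS)
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue  # flag-style options such as 'no-resolv'
        key, value = (part.strip() for part in line.split("=", 1))
        if key in limits:
            limits[key] = int(value)
    return limits


def limits_at_least(conf_text, minimum=10000):
    """Return True if both limits are at least `minimum`."""
    return all(v >= minimum for v in parse_dnsmasq_limits(conf_text).values())


# Contents of /etc/dnsmasq.d/origin-dns.conf as quoted in this comment.
SAMPLE = """no-resolv
domain-needed
no-negcache
max-cache-ttl=1
enable-dbus
dns-forward-max=10000
cache-size=10000
bind-dynamic
except-interface=lo
# End of config
"""

if __name__ == "__main__":
    print(parse_dnsmasq_limits(SAMPLE))
    print("limits raised:", limits_at_least(SAMPLE))
```

On a real node, the text would be read from the config file rather than from the embedded sample.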

Comment 17 errata-xmlrpc 2018-07-30 19:11:31 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816

Comment 19 Alasdair Kergon 2019-11-12 13:29:57 UTC

Clearing all the ignored old needinfo requests, picking a random required subcomponent.
