Bug 1560489 - DNS issues needing restart of dnsmasq service - 'could not resolve host' error
Summary: DNS issues needing restart of dnsmasq service - 'could not resolve host' error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.10.0
Assignee: Phil Cameron
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-03-26 09:57 UTC by Sudarshan Chaudhari
Modified: 2021-12-10 15:51 UTC (History)
CC List: 13 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-30 19:11:31 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 3393141 | 0 | None | None | None | 2018-07-26 18:51:23 UTC
Red Hat Product Errata | RHBA-2018:1816 | 0 | None | None | None | 2018-07-30 19:12:11 UTC

Description Sudarshan Chaudhari 2018-03-26 09:57:35 UTC
Description of problem:

The dnsmasq service freezes at random, and name resolution does not resume until the service is restarted manually.

No logs are captured for the dnsmasq service on the OpenShift nodes.

Actual results:
Host name resolution fails on the nodes, so pods are unable to resolve hosts as well.

Expected results:
Even when the NetworkManager or dnsmasq service restarts, the config should remain correct and resolution should keep working.
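
A minimal sketch of a probe for the symptom described above, useful when no dnsmasq logs are available. It only checks whether the node's configured resolver answers for a given name. The function name `can_resolve` and the injectable `resolver` parameter are illustrative assumptions, not part of any OpenShift or dnsmasq tooling.

```python
import socket


def can_resolve(host, resolver=socket.getaddrinfo):
    """Return True if `host` resolves through the system resolver.

    `resolver` is injectable so the check can be exercised without
    network access; by default it is socket.getaddrinfo.
    """
    try:
        # getaddrinfo returns a non-empty list of address tuples on success.
        return bool(resolver(host, None))
    except socket.gaierror:
        # Raised when the name cannot be resolved, which matches the
        # 'could not resolve host' symptom seen from nodes and pods.
        return False
```

Calling `can_resolve()` with a name the node normally resolves, before and after `systemctl restart dnsmasq`, would show whether the restart is what restores resolution. Note that if dnsmasq hangs instead of refusing queries, the call blocks until the resolver's own timeout expires.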

Comment 3 Ben Bennett 2018-03-26 13:50:41 UTC
The recent changes that made dnsmasq handle all traffic for a node may have overloaded the limits currently set in our config. We should see whether it makes sense to increase the limits generally, or whether we need to make them configurable.

Comment 7 Phil Cameron 2018-04-19 12:17:53 UTC
https://github.com/openshift/openshift-ansible/pull/8042
WIP until this change is verified to work for the customer.

Comment 9 openshift-github-bot 2018-04-30 12:52:47 UTC
Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/1ca76dd879cb6187fa7137a0e36c36461cea3776
dnsmasq - increase dns-forward-max, cache-size

bug 1560489
https://bugzilla.redhat.com/show_bug.cgi?id=1560489

Signed-off-by: Phil Cameron <pcameron>

Comment 12 Hongan Li 2018-05-17 02:41:57 UTC
Verified in atomic-openshift-3.10.0-0.47.0.git.0.2fffa04.el7; the options "dns-forward-max" and "cache-size" have been increased to 10000, as shown below:

[root@ip-172-18-5-139 ~]# cat /etc/dnsmasq.d/origin-dns.conf 
no-resolv
domain-needed
no-negcache
max-cache-ttl=1
enable-dbus
dns-forward-max=10000
cache-size=10000
bind-dynamic
except-interface=lo
# End of config

OS: Red Hat Enterprise Linux Server release 7.5 (Maipo)
kernel: Linux ip-172-18-5-139.ec2.internal 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
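
For anyone re-checking a node, the verification above can be sketched as a small script. This is only an illustration, not part of the fix: the helper names `parse_dnsmasq_conf` and `check_limits` are hypothetical, the threshold of 10000 is taken from this comment, and the sample text is the config shown above. Options that may legitimately repeat in a dnsmasq config are not handled; the last occurrence wins.

```python
def parse_dnsmasq_conf(text):
    """Parse dnsmasq config text into {option: value}; bare flags map to True."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, sep, value = line.partition("=")
        options[key.strip()] = value.strip() if sep else True
    return options


def check_limits(options, minimum=10000):
    """Return the names of limits that are missing or below `minimum`."""
    problems = []
    for name in ("dns-forward-max", "cache-size"):
        value = options.get(name)
        if not isinstance(value, str) or not value.isdigit() or int(value) < minimum:
            problems.append(name)
    return problems


# Config content copied from the verification output above.
SAMPLE = """\
no-resolv
domain-needed
no-negcache
max-cache-ttl=1
enable-dbus
dns-forward-max=10000
cache-size=10000
bind-dynamic
except-interface=lo
# End of config
"""

if __name__ == "__main__":
    low = check_limits(parse_dnsmasq_conf(SAMPLE))
    print("limits OK" if not low else "below threshold: " + ", ".join(low))
```

On a node, the text of /etc/dnsmasq.d/origin-dns.conf would be passed in place of `SAMPLE`. A file that sets neither option is reported as failing both checks, since dnsmasq then falls back to its much lower built-in defaults.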

Comment 17 errata-xmlrpc 2018-07-30 19:11:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816

Comment 19 Alasdair Kergon 2019-11-12 13:29:57 UTC
Clearing all the ignored old needinfo requests, picking a random required subcomponent.

