Description of problem:
The dnsmasq service freezes at random and has to be restarted manually before name resolution works again.
No logs are captured for the dnsmasq service on the OpenShift nodes.
Name resolution fails on the nodes, and as a result pods cannot resolve names either.
Even when NetworkManager or the dnsmasq service restarts, the configuration should remain correct and resolution should keep working.
The recent changes that made dnsmasq handle all DNS traffic for a node may have exceeded the limits currently set in our config. We should decide whether it makes sense to raise the limits for everyone or whether they need to be configurable.
WIP until this change is verified to work for the customer.
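For context, both limits are ordinary dnsmasq options. The fragment below is a minimal sketch of what raising them looks like, assuming the 10000 values reported in the verification comment; it is not the exact file written by openshift-ansible.

```
# Sketch only: values taken from the verification comment, not the full installer-generated file.
# Maximum number of concurrent DNS queries dnsmasq will forward upstream.
dns-forward-max=10000
# Number of names dnsmasq keeps in its cache.
cache-size=10000
```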
Commit pushed to master at https://github.com/openshift/openshift-ansible
dnsmasq - increase dns-forward-max, cache-size
Signed-off-by: Phil Cameron <email@example.com>
Verified in atomic-openshift-3.10.0-0.47.0.git.0.2fffa04.el7: the options "dns-forward-max" and "cache-size" have been increased to 10000, as shown below:
[root@ip-172-18-5-139 ~]# cat /etc/dnsmasq.d/origin-dns.conf
# End of config
OS: Red Hat Enterprise Linux Server release 7.5 (Maipo)
kernel: Linux ip-172-18-5-139.ec2.internal 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
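To re-check a node after upgrading, a POSIX shell function like the one below can confirm both values. It is a sketch: the name `check_dnsmasq_limits` is hypothetical, it assumes plain `option=value` lines with no spaces around `=`, and it reads only the first uncommented occurrence of each option. An unset option is reported as a failure because dnsmasq then falls back to its built-in default.

```shell
#!/bin/sh
# Hypothetical helper: report dns-forward-max and cache-size from a dnsmasq
# config file and fail if either is missing or below a minimum.
# Usage: check_dnsmasq_limits [config-file] [minimum]
check_dnsmasq_limits() {
    conf="${1:-/etc/dnsmasq.d/origin-dns.conf}"
    min="${2:-10000}"
    status=0
    for opt in dns-forward-max cache-size; do
        # First uncommented "opt=NUMBER" line; assumes no spaces around "=".
        val=$(sed -n "s/^${opt}=\([0-9][0-9]*\).*/\1/p" "$conf" | head -n 1)
        if [ -z "$val" ]; then
            echo "$opt: not set"
            status=1
        elif [ "$val" -lt "$min" ]; then
            echo "$opt=$val (below $min)"
            status=1
        else
            echo "$opt=$val OK"
        fi
    done
    return "$status"
}
```

On a node, run `check_dnsmasq_limits` with no arguments to inspect /etc/dnsmasq.d/origin-dns.conf; a non-zero exit status means at least one limit is missing or too low.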
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.