Bug 1469546 - dns becomes slow after running for a few days
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: dnsmasq
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Petr Menšík
QA Contact: qe-baseos-daemons
Depends On:
Blocks:
Reported: 2017-07-11 09:25 EDT by Gerd Hoffmann
Modified: 2017-07-31 22:28 EDT
CC List: 4 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-07-31 22:28:47 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Gerd Hoffmann 2017-07-11 09:25:35 EDT
Description of problem:
dns becomes slow after running for a few days

Version-Release number of selected component (if applicable):
dnsmasq-2.76-2.el7.x86_64

How reproducible:
100%

I'm running "RHEL-7 Workstation" on my machine, with NetworkManager configured to use dnsmasq for split-dns configuration.  Works fine for a few days, but then DNS becomes very slow and dnsmasq fills the log with "Maximum number of concurrent DNS queries reached" messages.
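
For reference, a split-DNS setup of this kind is usually enabled by pointing NetworkManager at its own dnsmasq instance; a minimal sketch (the file paths and the example domain are assumptions, not values from this report):

  # /etc/NetworkManager/NetworkManager.conf
  [main]
  dns=dnsmasq

  # /etc/NetworkManager/dnsmasq.d/split-dns.conf (hypothetical drop-in)
  server=/example.org/192.168.105.1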
Comment 3 Petr Menšík 2017-07-25 11:49:21 EDT
Hi Gerd, thanks for your report. Can you please share your configuration?

You can increase the maximum number of concurrent queries with the dns-forward-max=300 directive. Each query should be freed after 40 seconds, so as long as your workstation's clock does not jump backwards, the message should appear only for a short time.
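
For example, in a NetworkManager-managed setup a drop-in along these lines should raise the limit (the path is an assumption; adjust to wherever your dnsmasq reads its configuration):

  # e.g. /etc/NetworkManager/dnsmasq.d/forward-max.conf (hypothetical path)
  dns-forward-max=300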

Can you please check with tcpdump or a similar tool that queries are not actually being sent in high numbers?
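
Something like the following should show the query rate on the loopback interface (the interface name is an assumption for this setup):

  # watch DNS traffic to the local dnsmasq
  tcpdump -ni lo port 53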

Can you also try sending

$ killall -s USR1 dnsmasq

and include the resulting output from journalctl? Ideally both before the lags start and after, if possible.
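
A sketch of the sequence, assuming the process logs to the journal under the "dnsmasq" syslog identifier:

  # SIGUSR1 makes dnsmasq dump cache/forwarding statistics to its log
  killall -s USR1 dnsmasq
  journalctl -t dnsmasq --since "10 minutes ago"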
Comment 8 Petr Menšík 2017-07-28 13:46:12 EDT
I think I have found a fix that would work.

The summary from the private comments is this:
The main dnsmasq instance listening on localhost uses forwarders for specific domains. One of those domains is forwarded to another dnsmasq instance, run by libvirt, which handles the names of the virtual machines. Known hosts work well. However, unknown names are forwarded back to the system forwarder, which is the dnsmasq listening on localhost. That forwards them to the libvirt instance again, creating an endless loop.

The localhost configuration includes something like:
server=/sirius.example.org/192.168.105.1
rev-server=192.168.105.0/24,192.168.105.1

The libvirt dnsmasq instance is listening on 192.168.105.1.

You want to prevent unhandled DNS names from looping back. I think you can fix it by changing the libvirt configuration.

Edit the libvirt network definition with
$ virsh net-edit default

then add something like this inside the <network> tag (not inside <ip>):
  <dns>
    <forwarder addr='127.0.0.1'/>
    <forwarder domain='sirius.example.org'/>
    <forwarder domain='105.168.192.in-addr.arpa'/>
  </dns>

That forwards all queries to the localhost instance, but does not forward anything for the domains listed without an addr. Registered hosts are handled correctly; other names get the correct NXDOMAIN answer from libvirt's dnsmasq instead of being forwarded.
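
For illustration, a sketch of how the whole edited definition might look with that <dns> section in place (the network name, bridge and addressing are assumptions based on the values mentioned in this report):

  <network>
    <name>default</name>
    <forward mode='nat'/>
    <bridge name='virbr0'/>
    <dns>
      <forwarder addr='127.0.0.1'/>
      <forwarder domain='sirius.example.org'/>
      <forwarder domain='105.168.192.in-addr.arpa'/>
    </dns>
    <ip address='192.168.105.1' netmask='255.255.255.0'/>
  </network>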
Comment 9 Laine Stump 2017-07-31 22:28:47 EDT
I haven't checked if Petr's suggestion will work, but jdenemar solved this in his similar setup by adding the "localPtr" attribute to libvirt networks' <ip> element. For example:

   <ip address='192.168.105.1' netmask='255.255.255.0' localPtr='yes'/>

dnsmasq only responds to PTR record requests for IP addresses that are *currently assigned to a client* and will normally forward requests for currently unassigned addresses to the upstream DNS server. But when localPtr='yes' is set for an address range, it will respond with a failure rather than forwarding the request upstream. libvirt achieves this (in the case of the above example) by adding the following to the dnsmasq.conf file:

   local=/105.168.192.in-addr.arpa/

Whenever you have the upstream DNS server on the host set up to potentially forward to a libvirt network's DNS, you should always add localPtr='yes' to the libvirt network's <ip> elements (and also add localOnly='yes' to any <domain> element defined in the network).
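
For illustration, a sketch of a network definition with both attributes applied; the domain and addressing follow the values mentioned earlier in this report, the rest is assumed:

   <network>
     <name>default</name>
     <forward mode='nat'/>
     <bridge name='virbr0'/>
     <domain name='sirius.example.org' localOnly='yes'/>
     <ip address='192.168.105.1' netmask='255.255.255.0' localPtr='yes'/>
   </network>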

I'm assuming that making these changes to your network will solve the problem, so I'm closing this as NOTABUG. If the problem persists, please re-open.
