Bug 1756201

Summary: lots of resolver priming query complete messages
Product: Red Hat Enterprise Linux 7 Reporter: SHAURYA <sshaurya>
Component: bind Assignee: Petr Menšík <pemensik>
Status: CLOSED ERRATA QA Contact: Robin Hack <rhack>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.7 CC: anon.amish, extras-qa, matt.castelein, mruprich, msehnout, pemensik, pzhukov, rhack, tbskyd, thozza, vonsch, zdohnal
Target Milestone: rc Keywords: Patch, Regression, TestCaseProvided, Triaged
Target Release: --- Flags: thozza: mirror+
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: bind-9.11.4-22.P2.el7 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1680028 Environment:
Last Closed: 2020-09-29 19:25:38 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1680028, 1773116    
Bug Blocks: 1757052, 1780577    

Description SHAURYA 2019-09-27 04:30:08 UTC
+++ This bug was initially created as a clone of Bug #1680028 +++


We are seeing this strange status message: "resolver priming query complete"


[root@ns-infr ~]# uname -r
3.10.0-1062.1.2.el7.x86_64

[root@ns-infr ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.7 (Maipo)

[root@ns-infr ~]# systemctl status named -l
● named.service - Berkeley Internet Name Domain (DNS)
   Loaded: loaded (/usr/lib/systemd/system/named.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-09-26 08:28:19 UTC; 3min 22s ago
  Process: 1299 ExecStart=/usr/sbin/named -u named -c ${NAMEDCONF} $OPTIONS (code=exited, status=0/SUCCESS)
  Process: 1283 ExecStartPre=/bin/bash -c if [ ! "$DISABLE_ZONE_CHECKING" == "yes" ]; then /usr/sbin/named-checkconf -z "$NAMEDCONF"; else echo "Checking of zone files is disabled"; fi (code=exited, status=0/SUCCESS)
 Main PID: 1308 (named)
   CGroup: /system.slice/named.service
           └─1308 /usr/sbin/named -u named -c /etc/named.conf

Sep 26 08:30:56 ns-infr.mitreap.org named[1308]: resolver priming query complete
Sep 26 08:31:07 ns-infr.mitreap.org named[1308]: resolver priming query complete
Sep 26 08:31:08 ns-infr.mitreap.org named[1308]: resolver priming query complete
Sep 26 08:31:13 ns-infr.mitreap.org named[1308]: resolver priming query complete
Sep 26 08:31:17 ns-infr.mitreap.org named[1308]: resolver priming query complete
Sep 26 08:31:19 ns-infr.mitreap.org named[1308]: resolver priming query complete
Sep 26 08:31:34 ns-infr.mitreap.org named[1308]: resolver priming query complete
Sep 26 08:31:38 ns-infr.mitreap.org named[1308]: resolver priming query complete
Sep 26 08:31:39 ns-infr.mitreap.org named[1308]: resolver priming query complete
Sep 26 08:31:40 ns-infr.mitreap.org named[1308]: resolver priming query complete

Description of problem:
named appears to log
  named[XXXX]: resolver priming query complete
for almost every name lookup.

Version-Release number of selected component (if applicable):
bind-9.11.5-2.P1.fc29.x86_64

How reproducible:
Always

Steps to Reproduce:
1. configure and run a local recursive caching resolver

Actual results:
Lots of root priming messages in the logs

Expected results:
Very few of those, if any 

Additional info:
I run a recursive caching server, currently configured with forward first to OpenDNS, and have these hints in named.conf:

zone "." IN {
        type hint;
        file "named.ca";
};

I just compared the named.ca file to the output of a dig query for the root servers, and they are the same (except for the comments in the header and footer).

What am I missing?  Attaching my (sanitized) named.conf.

--- Additional comment from Henrique Martins on 2019-03-23 14:59:33 UTC ---

Explanation of this behaviour here:
  https://kb.isc.org/docs/aa-01537.

Another ISC page:
  https://kb.isc.org/docs/aa-01309
claims:
  Fixed in 9.12.0, 9.11.3, 9.10.7 and 9.9.12, bug RT #45241 could cause named to send unnecessary and frequent priming queries.
However, I now have
  bind-9.11.5-4.P4.fc29.x86_64
and I'm still getting lots of messages logged.

The messages can be eliminated by looking up the A and AAAA records of all root servers (dig -t A and -t AAAA, or -t ANY). I've added that to a crontab job that runs every 30 minutes for now.
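
A minimal sketch of what such a cron job could look like, not the reporter's actual script. It assumes the local named listens on 127.0.0.1 and that dig is installed; the function names, the RESOLVER and DIG variables, and the script path in the crontab comment are hypothetical.

```shell
#!/bin/bash
# Sketch of a root-address refresh job. Assumptions: the local resolver
# listens on 127.0.0.1 and dig (bind-utils) is installed.
RESOLVER="${RESOLVER:-127.0.0.1}"
DIG="${DIG:-dig}"

# Print the host names of the 13 root servers, a through m.
root_server_names() {
  local letter
  for letter in a b c d e f g h i j k l m; do
    printf '%s.root-servers.net.\n' "$letter"
  done
}

# Look up A and AAAA for every root server through the local resolver,
# so that the addresses end up in its cache. Output is discarded.
refresh_root_addresses() {
  local name rrtype
  root_server_names | while read -r name; do
    for rrtype in A AAAA; do
      "$DIG" +short "@$RESOLVER" -t "$rrtype" "$name" >/dev/null
    done
  done
}

# Hypothetical crontab entry, every 30 minutes:
#   */30 * * * * /usr/local/bin/refresh-root-addresses.sh
```

To use it from cron, the script would end with a call to refresh_root_addresses. Setting RESOLVER in the environment points the lookups at a different address.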

--- Additional comment from Petr Menšík on 2019-09-11 11:02:31 UTC ---

Hi Henrique,

Sorry for the long delay. This is a strange issue, but I think I read about something similar in the release notes.

Have you tried updated versions? Does this issue persist with more recent updates?

--- Additional comment from Henrique Martins on 2019-09-11 17:12:46 UTC ---

Yup, it's been a while. 
I'm now on:
  bind-9.11.10-1.fc30.x86_64

Disabled my root priming crontab job earlier today and the logs seem clean of that annoying message.

I'll report back tomorrow and if still clean, it is time to close the bug.

--- Additional comment from Henrique Martins on 2019-09-12 11:56:05 UTC ---

Seems to be fixed in the current version.
Time to close this.
Thanks

Comment 2 Henrique Martins 2019-09-27 12:50:03 UTC
I see this is in RHEL 7.7, which I don't use.
On Fedora 30, as stated on the cited bug 1680028, this is fixed by a newer release of bind.
You'll have to wait for RH to update it on the EL side, or build from source.

Comment 4 Petr Menšík 2019-10-07 10:17:17 UTC
It seems to me the only related upstream commit was [1]. It fixes glue usage when the response to the NS query is missing the addresses.

That would be the issue when A and AAAA records are not present in the output of this command:

dig +tcp -t NS -q .

As a workaround, it should help to request them once in a while, maybe even right after named starts.
This bash command should get rid of the messages until the bug is fixed.

# Look up A and AAAA for all 13 root servers (a through m).
for NS in a b c d e f g h i j k l m;
do
  dig -t A +short "$NS.root-servers.net";
  dig -t AAAA +short "$NS.root-servers.net";
done

1. https://gitlab.isc.org/isc-projects/bind9/commit/77bc37b6160d31f62aa68bb176917bd2f0736775

Comment 11 Petr Menšík 2020-03-18 13:26:43 UTC
Oh, finally found it. The fix [1] is to stop sending priming queries to forwarders at all and to query the root servers directly. Then the additional section is always provided.

1. https://gitlab.isc.org/isc-projects/bind9/commit/aa9866c390a21d6984aa75cdb84d7bc77e114c2f

Comment 12 Petr Menšík 2020-03-18 13:54:50 UTC
(In reply to Petr Menšík from comment #11)
> Oh, found it finally. The fix[1] is to stop querying forwarders at all and
> query directly root servers. Then additional section is always provided.
> 
> 1.
> https://gitlab.isc.org/isc-projects/bind9/commit/
> aa9866c390a21d6984aa75cdb84d7bc77e114c2f

This commit was part of 9.11.5, so RHEL 8.2 already has the fix. It only needs fixing in RHEL 7.

Reproduction only requires a delay between queries for new names that are not already in the cache. After a restart, querying any names with a 3-second delay between them works. The queries have to be directed to a forwarder that does not include IP addresses in its response to dig -t NS -q . That might be the default configuration of unbound, or of bind with minimal answers configured.
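
To check whether a given forwarder matches this condition, one could count the address records in the additional section of its response to the NS query for the root. A rough sketch, assuming dig's default output format with ";; ... SECTION:" header lines; the function names are hypothetical and the forwarder address is whatever is passed in.

```shell
#!/bin/bash
# Count A/AAAA records in the ADDITIONAL section of dig output read
# from stdin. Assumes dig's default presentation format, where each
# section starts with a ";; ... SECTION:" comment line.
count_additional_addresses() {
  awk '
    /^;; ADDITIONAL SECTION:/ { in_additional = 1; next }
    /^;;/ || /^$/             { in_additional = 0 }
    in_additional && ($4 == "A" || $4 == "AAAA") { count++ }
    END { print count + 0 }
  '
}

# Report whether the forwarder given as $1 returns root server addresses
# along with the NS records for the root zone.
check_forwarder() {
  local n
  n=$(dig +tcp "@$1" -t NS -q . | count_additional_addresses)
  if [ "$n" -eq 0 ]; then
    echo "no addresses in additional section: matches the reproduction condition"
  else
    echo "$n address records in additional section"
  fi
}
```

Calling check_forwarder with the forwarder's IP address as its only argument prints one of the two messages.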

Root priming is required only because the server might query the root servers directly when forwarders fail. If only forwarders should ever be used, make sure forward only; is set in options.
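
One way to see which forward policy a configuration uses is to scan the configuration text for the forward statement. A small sketch with a hypothetical function name; it assumes the input has comments already removed (for example the output of named-checkconf -p) and reports only the first forward statement it finds, which may belong to a zone rather than to options.

```shell
#!/bin/bash
# Print the first forward policy ("only" or "first") found in named.conf
# text read from stdin, or "unset" if there is none.
# Limitation: does not tell options-level and zone-level statements apart.
forward_policy() {
  local policy
  policy=$(sed -nE 's/^(.*[[:space:]])?forward[[:space:]]+(only|first)[[:space:]]*;.*$/\2/p' | head -n 1)
  echo "${policy:-unset}"
}
```

For example, named-checkconf -p /etc/named.conf | forward_policy would print only when forward only; is in effect at the first place a forward statement appears.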

Comment 24 errata-xmlrpc 2020-09-29 19:25:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (bind bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3871