Bug 2328476 - DNS lookup problems
Summary: DNS lookup problems
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: bind
Version: 41
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Petr Menšík
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-11-23 13:21 UTC by Thomas Köller
Modified: 2025-09-23 09:55 UTC

Fixed In Version: bind-9.18.39-3.fc41.1
Clone Of:
Environment:
Last Closed: 2025-09-16 01:24:14 UTC
Type: ---
Embargoed:
thomas: needinfo-


Attachments
Bind9 configuration file (2.77 KB, text/plain)
2024-11-23 15:00 UTC, Thomas Köller
Bind9 configuration file (2.82 KB, text/plain)
2025-01-28 00:50 UTC, Thomas Köller
query log file (471.75 KB, application/x-xz)
2025-01-28 00:53 UTC, Thomas Köller
output of 'rndc checkconf -px' (1.04 KB, application/x-xz)
2025-01-28 00:56 UTC, Thomas Köller


Links
Internet Systems Consortium (ISC) bind9 issue 3885 (closed): "Loop detected resolving..." and different query-behavior after flushing a cache entry between BIND-9.19.10 and BIND-9.1... (last updated 2025-09-09 16:19:01 UTC)
Internet Systems Consortium (ISC) bind9 merge request 10025 (merged): fix: usr: fix "rndc flushname" for longer name server names (last updated 2025-09-09 16:19:01 UTC)

Description Thomas Köller 2024-11-23 13:21:48 UTC
I am experiencing lots of DNS resolution failures. Typically, a DNS lookup fails with status SERVFAIL but, after a certain number of retries, eventually succeeds. Here is an example:

[thomas@sarkovy ~]$ rndc flush
[thomas@sarkovy ~]$ dig www.hpd.de

; <<>> DiG 9.18.30 <<>> www.hpd.de
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 9627
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 250950fb60d39df7010000006741cb6eada763856cac1066 (good)
;; QUESTION SECTION:
;www.hpd.de.			IN	A

;; Query time: 868 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Sat Nov 23 13:32:46 CET 2024
;; MSG SIZE  rcvd: 67

[thomas@sarkovy ~]$ dig www.hpd.de

; <<>> DiG 9.18.30 <<>> www.hpd.de
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 44143
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 417004fdba8bd8b1010000006741cb7272f08c6bd8689e13 (good)
;; QUESTION SECTION:
;www.hpd.de.			IN	A

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Sat Nov 23 13:32:50 CET 2024
;; MSG SIZE  rcvd: 67

[thomas@sarkovy ~]$ dig www.hpd.de

; <<>> DiG 9.18.30 <<>> www.hpd.de
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 35475
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 7a05a1f7531f0acc010000006741cb756995db09b1817752 (good)
;; QUESTION SECTION:
;www.hpd.de.			IN	A

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Sat Nov 23 13:32:53 CET 2024
;; MSG SIZE  rcvd: 67

[thomas@sarkovy ~]$ dig www.hpd.de

; <<>> DiG 9.18.30 <<>> www.hpd.de
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 64440
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 99bf5895589e1236010000006741cb785dc1fcf50c0e246f (good)
;; QUESTION SECTION:
;www.hpd.de.			IN	A

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Sat Nov 23 13:32:56 CET 2024
;; MSG SIZE  rcvd: 67

[thomas@sarkovy ~]$ dig www.hpd.de

; <<>> DiG 9.18.30 <<>> www.hpd.de
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2989
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 68ea16fa4cae36b4010000006741cb7ac89f228c5fe8ccca (good)
;; QUESTION SECTION:
;www.hpd.de.			IN	A

;; ANSWER SECTION:
www.hpd.de.		86400	IN	A	87.106.14.91

;; Query time: 147 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Sat Nov 23 13:32:58 CET 2024
;; MSG SIZE  rcvd: 83

This is what named logs (filtered to only contain relevant records):

23-Nov-2024 13:32:45.492 queries: info: client @0x7fc3d7427168 127.0.0.1#49760 (www.hpd.de): query: www.hpd.de IN A +E(0)K (127.0.0.1)
23-Nov-2024 13:32:46.360 query-errors: info: client @0x7fc3d7427168 127.0.0.1#49760 (www.hpd.de): query failed (failure) for www.hpd.de/IN/A at ../../../lib/ns/query.c:7837
23-Nov-2024 13:32:50.141 queries: info: client @0x7fc3c21d0168 127.0.0.1#45711 (www.hpd.de): query: www.hpd.de IN A +E(0)K (127.0.0.1)
23-Nov-2024 13:32:50.141 query-errors: info: client @0x7fc3c21d0168 127.0.0.1#45711 (www.hpd.de): query failed (failure) for www.hpd.de/IN/A at ../../../lib/ns/query.c:7837
23-Nov-2024 13:32:53.565 queries: info: client @0x7fc3d0a27168 127.0.0.1#44695 (www.hpd.de): query: www.hpd.de IN A +E(0)K (127.0.0.1)
23-Nov-2024 13:32:53.565 query-errors: info: client @0x7fc3d0a27168 127.0.0.1#44695 (www.hpd.de): query failed (failure) for www.hpd.de/IN/A at ../../../lib/ns/query.c:7837
23-Nov-2024 13:32:56.487 queries: info: client @0x7fc3d7427168 127.0.0.1#35484 (www.hpd.de): query: www.hpd.de IN A +E(0)K (127.0.0.1)
23-Nov-2024 13:32:56.487 query-errors: info: client @0x7fc3d7427168 127.0.0.1#35484 (www.hpd.de): query failed (failure) for www.hpd.de/IN/A at ../../../lib/ns/query.c:7837
23-Nov-2024 13:32:58.599 queries: info: client @0x7fc3ce811168 127.0.0.1#59963 (www.hpd.de): query: www.hpd.de IN A +E(0)K (127.0.0.1)

Interestingly, after a successful lookup, further lookups also succeed, even if the domain is flushed from the cache using 'rndc flushtree hpd.de'. However, after a full flush (rndc flush), lookups start failing again:

[thomas@sarkovy ~]$ dig www.hpd.de a

; <<>> DiG 9.18.30 <<>> www.hpd.de a
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31300
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: e9b931d2487a4208010000006741d139bfd4b604fde8b724 (good)
;; QUESTION SECTION:
;www.hpd.de.			IN	A

;; ANSWER SECTION:
www.hpd.de.		85912	IN	A	87.106.14.91

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Sat Nov 23 13:57:29 CET 2024
;; MSG SIZE  rcvd: 83

[thomas@sarkovy ~]$ rndc flushtree hpd.de
[thomas@sarkovy ~]$ dig www.hpd.de a

; <<>> DiG 9.18.30 <<>> www.hpd.de a
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6745
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 8da0b89a7418c465010000006741d14a59bf63809df99310 (good)
;; QUESTION SECTION:
;www.hpd.de.			IN	A

;; ANSWER SECTION:
www.hpd.de.		86400	IN	A	87.106.14.91

;; Query time: 58 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Sat Nov 23 13:57:46 CET 2024
;; MSG SIZE  rcvd: 83

[thomas@sarkovy ~]$ rndc flush
[thomas@sarkovy ~]$ dig www.hpd.de a

; <<>> DiG 9.18.30 <<>> www.hpd.de a
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 65533
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 2158472ab06b9051010000006741d184ca514ab30350ad21 (good)
;; QUESTION SECTION:
;www.hpd.de.			IN	A

;; Query time: 63 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Sat Nov 23 13:58:44 CET 2024
;; MSG SIZE  rcvd: 67


Reproducible: Always
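The retry-until-NOERROR pattern visible in the transcripts above can be sketched as a small shell helper. This is only an illustration: dig_status and retry_lookup are made-up names, not existing tools, and the sketch assumes dig and a resolver listening on 127.0.0.1.

```shell
# Hypothetical sketch of the retry behaviour described above.
# dig_status and retry_lookup are illustrative names only.

# Extract the status field (NOERROR, SERVFAIL, ...) from dig output on stdin.
dig_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p'
}

# Query the local resolver repeatedly until the answer is NOERROR.
retry_lookup() {
  local name=$1 tries=${2:-10} i status
  for ((i = 1; i <= tries; i++)); do
    status=$(dig @127.0.0.1 "$name" A +noall +comments | dig_status)
    if [ "$status" = "NOERROR" ]; then
      echo "NOERROR after $i attempt(s)"
      return 0
    fi
    sleep 1
  done
  echo "still $status after $tries attempts"
  return 1
}

# Example (on the affected host):
#   rndc flush && retry_lookup www.hpd.de
```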

Comment 1 Thomas Köller 2024-11-23 15:00:09 UTC
Created attachment 2059517 [details]
Bind9 configuration file

Comment 2 Petr Menšík 2025-01-27 16:54:55 UTC
I am not able to give you any definite tips for this behaviour. I have seen strange behaviour on single-address-family networks; adding the -4 or -6 option to named's startup parameters often helps in that case.

I suspect named eventually works around it by caching the failing name servers and no longer using them. rndc dumpdb with the right parameter should help you find out more.

Can you run:
rndc dumpdb -fail
or
rndc dumpdb -bad

and then look at the contents of /var/named/data/cache_dump.db to see which servers are listed there and whether any of them have suspicious cache flags in the comments? The file path may have been changed by the dump-file option.

I think rndc flush also clears the bad-servers cache, whereas flushtree clears only cached resource records, not the cache of remote-server capability information.
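A quick way to see whether such a dump actually contains anything is to count its non-comment lines, since an empty SERVFAIL or bad cache dump consists only of ';'-prefixed comments. dump_entries below is a made-up helper name, and the path is the default mentioned above.

```shell
# Hypothetical helper to count non-comment lines in a BIND cache dump.
# An empty SERVFAIL/bad cache dump contains only ';'-prefixed comment lines.
dump_entries() {
  grep -cv '^;' "$1" || true   # grep exits 1 when the count is 0; keep going
}

# Example (on the server; path may differ if dump-file is set):
#   rndc dumpdb -fail
#   dump_entries /var/named/data/cache_dump.db   # 0 => nothing cached as failing
```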

Unless you share what you see for the servers, I doubt I can help you.

An additional problem might be forwarder(s) stripping DNSSEC signatures while DNSSEC validation is enabled.

Comment 3 Thomas Köller 2025-01-27 18:23:00 UTC
(In reply to Petr Menšík from comment #2)
> I am not able to give you any definite tips for this behaviour. I have seen
> strange behaviour on single-address-family networks; adding the -4 or -6
> option to named's startup parameters often helps in that case.
> 
> I suspect named eventually works around it by caching the failing name
> servers and no longer using them. rndc dumpdb with the right parameter
> should help you find out more.
> 
> Can you run:
> rndc dumpdb -fail
> or
> rndc dumpdb -bad

Neither of these commands turned up anything of interest, or indeed anything at all:

;
; Start view _default
;
;
; SERVFAIL cache
;
;
; Start view _bind
;
;
; SERVFAIL cache
;
; Dump complete

and

;
; Start view _default
;
;
; Bad cache
;
;
; Start view _bind
;
;
; Bad cache
;
; Dump complete

> An additional problem might be forwarder(s) stripping DNSSEC signatures
> while DNSSEC validation is enabled.
It is not.

Meanwhile, I had a closer look at the server log (journalctl -l -b -u named.service) and found many occurrences of sequences like this one:

Jan 26 18:34:43 sarkovy named[1544]: loop detected resolving 'ns1.zdns.google/A'
Jan 26 18:34:43 sarkovy named[1544]: loop detected resolving 'ns4.zdns.google/A'
Jan 26 18:34:43 sarkovy named[1544]: loop detected resolving 'ns2.zdns.google/A'
Jan 26 18:34:43 sarkovy named[1544]: loop detected resolving 'ns3.zdns.google/A'
Jan 26 18:34:43 sarkovy named[1544]: loop detected resolving 'ns4.zdns.google/AAAA'
Jan 26 18:34:43 sarkovy named[1544]: loop detected resolving 'ns2.zdns.google/AAAA'
Jan 26 18:34:43 sarkovy named[1544]: loop detected resolving 'ns1.zdns.google/AAAA'
Jan 26 18:34:43 sarkovy named[1544]: loop detected resolving 'ns3.zdns.google/AAAA'

These occur quite frequently. I tried to reproduce the error using the dig command, but had no success so far.

Over a time span of two days, there were quite a number of those events:

[root@sarkovy cups]# journalctl -l -b -u named.service | grep -Fe 'loop detected resolving' | grep -e 'ns[1234]\.zdns\.google' | wc -l
184

Almost all of them involved the Google name servers, but there were also a few others:

[root@sarkovy cups]# journalctl -l -b -u named.service | grep -Fe 'loop detected resolving' | grep -ve 'ns[1234]\.zdns\.google'
Jan 26 22:22:52 sarkovy named[1544]: loop detected resolving 'ns5.pinterest.com/A'
Jan 26 22:22:52 sarkovy named[1544]: loop detected resolving 'ns6.pinterest.com/A'
Jan 26 22:22:52 sarkovy named[1544]: loop detected resolving 'ns6.pinterest.com/AAAA'
Jan 26 22:22:52 sarkovy named[1544]: loop detected resolving 'ns5.pinterest.com/AAAA'
Jan 27 17:12:44 sarkovy named[1544]: loop detected resolving 'ns4.apnic.net/A'
Jan 27 17:12:44 sarkovy named[1544]: loop detected resolving 'ns4.apnic.net/AAAA'

Comment 4 Petr Menšík 2025-01-27 22:48:11 UTC
'loop detected' should mean that you are forwarding to (at least some) servers which, in some cases, forward back to your server, creating an endless loop. That would be a configuration issue on some server in the chain, not necessarily on yours.

Try enabling query logging on your server and query the domain you used. You should see there who queries back at you. Without knowing more about your forwarding configuration, it is hard to tell.

rndc querylog on

Output of named-checkconf -px might help, though without your server being publicly reachable, probably not.

Comment 5 Thomas Köller 2025-01-28 00:50:52 UTC
Created attachment 2074093 [details]
Bind9 configuration file

Comment 6 Thomas Köller 2025-01-28 00:53:07 UTC
Created attachment 2074094 [details]
query log file

Comment 7 Thomas Köller 2025-01-28 00:56:43 UTC
Created attachment 2074095 [details]
output of 'rndc checkconf -px'

Comment 8 Thomas Köller 2025-01-28 01:00:32 UTC
(In reply to Petr Menšík from comment #4)
> Try enabling query logging on your server and query the domain you used.
> You should see there who queries back at you. Without knowing more about
> your forwarding configuration, it is hard to tell.
> 
> rndc querylog on
I have been logging queries for quite a while now - output attached. AFAICT there is nothing interesting there.

> Output of named-checkconf -px might help, though without your server being
> publicly reachable, probably not.
It is of course reachable as sarkovy.krokinole.de. Output of 'named-checkconf -px' is attached. Again, I do not see anything of interest there.

Comment 9 Petr Menšík 2025-09-09 16:19:02 UTC
Okay, no forwarders to create any loops. Some overrides to blackhole common zones, but nothing that should make it fail.

Can you try the command:
dig -4 @localhost +trace www.hpd.de.

Does it remain broken this way? I would guess it could be an issue in the handling of flushing just the domain info, as mentioned in this upstream issue:

https://gitlab.isc.org/isc-projects/bind9/-/issues/3885

It seems like it could be the same problem you have, because the NS record addresses for hpd.de. might not be flushed when you flush just that domain.

This has been fixed in 9.18.33, which is already available in the f41 updates, since this update:
https://bodhi.fedoraproject.org/updates/FEDORA-2025-3551f3ba1b

Can you please confirm that the problem is fixed or whether it remains?

If it remains, could you try the rawhide-branch rebuilds from COPR https://copr.fedorainfracloud.org/coprs/pemensik/bind/? Those should contain the most recent bind package we have. I do not yet have a regular testing update for f41.

Comment 10 Fedora Update System 2025-09-10 20:55:15 UTC
FEDORA-2025-4922878d8c (bind-9.18.39-3.fc41.1 and bind-dyndb-ldap-11.10-34.fc41) has been submitted as an update to Fedora 41.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-4922878d8c

Comment 11 Fedora Update System 2025-09-11 02:00:16 UTC
FEDORA-2025-4922878d8c has been pushed to the Fedora 41 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-4922878d8c`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-4922878d8c

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 12 Fedora Update System 2025-09-16 01:24:14 UTC
FEDORA-2025-4922878d8c (bind-9.18.39-3.fc41.1 and bind-dyndb-ldap-11.10-34.fc41) has been pushed to the Fedora 41 stable repository.
If the problem still persists, please make a note of it in this bug report.

