I am experiencing lots of DNS resolution failures. Typically, a DNS lookup fails (with status SERVFAIL) but, when retried a number of times, eventually succeeds. Here is an example:

[thomas@sarkovy ~]$ rndc flush
[thomas@sarkovy ~]$ dig www.hpd.de

; <<>> DiG 9.18.30 <<>> www.hpd.de
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 9627
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 250950fb60d39df7010000006741cb6eada763856cac1066 (good)
;; QUESTION SECTION:
;www.hpd.de.			IN	A

;; Query time: 868 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Sat Nov 23 13:32:46 CET 2024
;; MSG SIZE  rcvd: 67

[thomas@sarkovy ~]$ dig www.hpd.de

; <<>> DiG 9.18.30 <<>> www.hpd.de
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 44143
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 417004fdba8bd8b1010000006741cb7272f08c6bd8689e13 (good)
;; QUESTION SECTION:
;www.hpd.de.			IN	A

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Sat Nov 23 13:32:50 CET 2024
;; MSG SIZE  rcvd: 67

[thomas@sarkovy ~]$ dig www.hpd.de

; <<>> DiG 9.18.30 <<>> www.hpd.de
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 35475
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 7a05a1f7531f0acc010000006741cb756995db09b1817752 (good)
;; QUESTION SECTION:
;www.hpd.de.			IN	A

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Sat Nov 23 13:32:53 CET 2024
;; MSG SIZE  rcvd: 67

[thomas@sarkovy ~]$ dig www.hpd.de

; <<>> DiG 9.18.30 <<>> www.hpd.de
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 64440
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 99bf5895589e1236010000006741cb785dc1fcf50c0e246f (good)
;; QUESTION SECTION:
;www.hpd.de.			IN	A

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Sat Nov 23 13:32:56 CET 2024
;; MSG SIZE  rcvd: 67

[thomas@sarkovy ~]$ dig www.hpd.de

; <<>> DiG 9.18.30 <<>> www.hpd.de
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2989
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 68ea16fa4cae36b4010000006741cb7ac89f228c5fe8ccca (good)
;; QUESTION SECTION:
;www.hpd.de.			IN	A

;; ANSWER SECTION:
www.hpd.de.		86400	IN	A	87.106.14.91

;; Query time: 147 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Sat Nov 23 13:32:58 CET 2024
;; MSG SIZE  rcvd: 83

This is what named logs (filtered to only contain the relevant records):

23-Nov-2024 13:32:45.492 queries: info: client @0x7fc3d7427168 127.0.0.1#49760 (www.hpd.de): query: www.hpd.de IN A +E(0)K (127.0.0.1)
23-Nov-2024 13:32:46.360 query-errors: info: client @0x7fc3d7427168 127.0.0.1#49760 (www.hpd.de): query failed (failure) for www.hpd.de/IN/A at ../../../lib/ns/query.c:7837
23-Nov-2024 13:32:50.141 queries: info: client @0x7fc3c21d0168 127.0.0.1#45711 (www.hpd.de): query: www.hpd.de IN A +E(0)K (127.0.0.1)
23-Nov-2024 13:32:50.141 query-errors: info: client @0x7fc3c21d0168 127.0.0.1#45711 (www.hpd.de): query failed (failure) for www.hpd.de/IN/A at ../../../lib/ns/query.c:7837
23-Nov-2024 13:32:53.565 queries: info: client @0x7fc3d0a27168 127.0.0.1#44695 (www.hpd.de): query: www.hpd.de IN A +E(0)K (127.0.0.1)
23-Nov-2024 13:32:53.565 query-errors: info: client @0x7fc3d0a27168 127.0.0.1#44695 (www.hpd.de): query failed (failure) for www.hpd.de/IN/A at ../../../lib/ns/query.c:7837
23-Nov-2024 13:32:56.487 queries: info: client @0x7fc3d7427168 127.0.0.1#35484 (www.hpd.de): query: www.hpd.de IN A +E(0)K (127.0.0.1)
23-Nov-2024 13:32:56.487 query-errors: info: client @0x7fc3d7427168 127.0.0.1#35484 (www.hpd.de): query failed (failure) for www.hpd.de/IN/A at ../../../lib/ns/query.c:7837
23-Nov-2024 13:32:58.599 queries: info: client @0x7fc3ce811168 127.0.0.1#59963 (www.hpd.de): query: www.hpd.de IN A +E(0)K (127.0.0.1)

Interestingly, after a successful lookup, further lookups also succeed, even if the domain is flushed from the cache using 'rndc flushtree hpd.de'.
However, after a full flush (rndc flush), lookups start failing again:

[thomas@sarkovy ~]$ dig www.hpd.de a

; <<>> DiG 9.18.30 <<>> www.hpd.de a
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31300
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: e9b931d2487a4208010000006741d139bfd4b604fde8b724 (good)
;; QUESTION SECTION:
;www.hpd.de.			IN	A

;; ANSWER SECTION:
www.hpd.de.		85912	IN	A	87.106.14.91

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Sat Nov 23 13:57:29 CET 2024
;; MSG SIZE  rcvd: 83

[thomas@sarkovy ~]$ rndc flushtree hpd.de
[thomas@sarkovy ~]$ dig www.hpd.de a

; <<>> DiG 9.18.30 <<>> www.hpd.de a
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6745
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 8da0b89a7418c465010000006741d14a59bf63809df99310 (good)
;; QUESTION SECTION:
;www.hpd.de.			IN	A

;; ANSWER SECTION:
www.hpd.de.		86400	IN	A	87.106.14.91

;; Query time: 58 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Sat Nov 23 13:57:46 CET 2024
;; MSG SIZE  rcvd: 83

[thomas@sarkovy ~]$ rndc flush
[thomas@sarkovy ~]$ dig www.hpd.de a

; <<>> DiG 9.18.30 <<>> www.hpd.de a
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 65533
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 2158472ab06b9051010000006741d184ca514ab30350ad21 (good)
;; QUESTION SECTION:
;www.hpd.de.			IN	A

;; Query time: 63 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Sat Nov 23 13:58:44 CET 2024
;; MSG SIZE  rcvd: 67

Reproducible: Always
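The retry-until-success pattern shown in the transcripts can be scripted. Below is a minimal sketch; the resolver address 127.0.0.1 is taken from the transcripts, while the helper name, the retry limit, and the one-second pause are my own assumptions, not anything from the report:

```shell
#!/bin/sh
# Sketch: re-issue the query until the status is no longer SERVFAIL.
# Resolver address is from the transcripts above; everything else is assumed.
retry_dig() {
    name=$1
    max=${2:-10}    # arbitrary retry limit
    i=1
    while [ "$i" -le "$max" ]; do
        # Extract "SERVFAIL"/"NOERROR"/... from dig's HEADER comment line.
        status=$(dig @127.0.0.1 "$name" A +noall +comments 2>/dev/null |
                 sed -n 's/.*status: \([A-Z]*\),.*/\1/p')
        echo "attempt $i: ${status:-no response}"
        [ "$status" = NOERROR ] && return 0
        i=$((i + 1))
        sleep 1
    done
    return 1
}
# usage: retry_dig www.hpd.de 10
```

With the behaviour described above, this would typically print several SERVFAIL attempts before the first NOERROR.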
Created attachment 2059517 [details] Bind9 configuration file
I am not able to give you any tips for this behaviour. I have seen strange behaviour on single-address-family networks; adding the -4 or -6 option to the named startup parameters often helps in that case.

I suspect named eventually copes by caching the failing name servers and no longer using them, which would explain why later lookups succeed. 'rndc dumpdb' with the right parameter should help you find out more. Can you run:

rndc dumpdb -fail
or
rndc dumpdb -bad

and then look at the contents of /var/named/data/cache_dump.db to see which servers are listed there and whether any of them carry suspicious cache flags in the comments? The file path might be changed by the dump-file option.

I think 'rndc flush' also clears the bad-servers cache, whereas 'rndc flushtree' clears only cached resource records, not the cache of remote-server feature information. Unless you share what you see for those servers, I doubt I can help you.

An additional problem might be a forwarder stripping DNSSEC signatures while DNSSEC validation is enabled.
(In reply to Petr Menšík from comment #2)
> I am not able to give you any tips for this behaviour. I have seen strange
> behaviour on single address family networks. Adding -4 or -6 option to named
> startup parameters helps often in that case.
>
> I suspect it resolves it by caching failing name servers eventually,
> stopping using them later. rndc dumpdb with some parameter should help you
> finding more.
>
> Can you use:
> rndc dumpdb -fail
> or
> rndc dumpdb -bad

Neither of these commands turned up anything of interest, or indeed anything at all:

;
; Start view _default
;
;
; SERVFAIL cache
;
;
; Start view _bind
;
;
; SERVFAIL cache
;
; Dump complete

and

;
; Start view _default
;
;
; Bad cache
;
;
; Start view _bind
;
;
; Bad cache
;
; Dump complete

> Additional problem might be using forwarder(s) stripping DNSSEC signatures
> while signing is enabled.

It is not.

Meanwhile, I had a closer look at the server log (journalctl -l -b -u named.service) and found many occurrences of sequences like this one:

Jan 26 18:34:43 sarkovy named[1544]: loop detected resolving 'ns1.zdns.google/A'
Jan 26 18:34:43 sarkovy named[1544]: loop detected resolving 'ns4.zdns.google/A'
Jan 26 18:34:43 sarkovy named[1544]: loop detected resolving 'ns2.zdns.google/A'
Jan 26 18:34:43 sarkovy named[1544]: loop detected resolving 'ns3.zdns.google/A'
Jan 26 18:34:43 sarkovy named[1544]: loop detected resolving 'ns4.zdns.google/AAAA'
Jan 26 18:34:43 sarkovy named[1544]: loop detected resolving 'ns2.zdns.google/AAAA'
Jan 26 18:34:43 sarkovy named[1544]: loop detected resolving 'ns1.zdns.google/AAAA'
Jan 26 18:34:43 sarkovy named[1544]: loop detected resolving 'ns3.zdns.google/AAAA'

These occur quite frequently. I tried to reproduce the error using the dig command, but have had no success so far.
Over a time span of two days, there were quite a number of those events:

[root@sarkovy cups]# journalctl -l -b -u named.service | grep -Fe 'loop detected resolving' | grep -e 'ns[1234]\.zdns\.google' | wc -l
184

Almost all of them involved the Google name servers, but there were also a few others:

[root@sarkovy cups]# journalctl -l -b -u named.service | grep -Fe 'loop detected resolving' | grep -ve 'ns[1234]\.zdns\.google'
Jan 26 22:22:52 sarkovy named[1544]: loop detected resolving 'ns5.pinterest.com/A'
Jan 26 22:22:52 sarkovy named[1544]: loop detected resolving 'ns6.pinterest.com/A'
Jan 26 22:22:52 sarkovy named[1544]: loop detected resolving 'ns6.pinterest.com/AAAA'
Jan 26 22:22:52 sarkovy named[1544]: loop detected resolving 'ns5.pinterest.com/AAAA'
Jan 27 17:12:44 sarkovy named[1544]: loop detected resolving 'ns4.apnic.net/A'
Jan 27 17:12:44 sarkovy named[1544]: loop detected resolving 'ns4.apnic.net/AAAA'
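For what it's worth, a per-name tally of these events can be extracted in one pipeline; a small sketch (the helper name is my own, everything else is plain grep/sed/sort/uniq over the journal output):

```shell
# Sketch: count 'loop detected resolving' events per name, most frequent
# first. Reads journal output on stdin; the function name is made up.
tally_loops() {
    grep -Fe "loop detected resolving" |
        sed -n "s/.*loop detected resolving '\([^']*\)'.*/\1/p" |
        sort | uniq -c | sort -rn
}
# usage:
#   journalctl -l -b -u named.service | tally_loops
```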
"loop detected" should mean that you are forwarding to (at least some) servers which, in some cases, forward back to your server, creating an endless loop. That would be a configuration issue on some server in the chain, not necessarily on yours.

Try enabling query logging on your server and then query the domain you used; the log should show you who is querying back at you. Without knowing more about your forwarding configuration, it is hard to tell:

rndc querylog on

The output of 'named-checkconf -px' might also help, but without your server being publicly reachable, probably not.
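If query logging should survive a restart (rather than being toggled per boot with 'rndc querylog on'), a logging channel can be configured in named.conf instead. A sketch of such a fragment, where the file path, rotation count, and size limit are arbitrary choices, not values from this report:

```
logging {
    channel query_log {
        file "/var/named/data/query.log" versions 3 size 20m;
        print-time yes;
        severity info;
    };
    category queries { query_log; };
    category query-errors { query_log; };
};
```

Routing the query-errors category to the same channel keeps the SERVFAIL lines next to the queries that triggered them.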
Created attachment 2074093 [details] Bind9 configuration file
Created attachment 2074094 [details] query log file
Created attachment 2074095 [details] output of 'named-checkconf -px'
(In reply to Petr Menšík from comment #4)
> Try enabling query logging on your server and try domain you have used. You
> should see there who queries back on you. Without knowing more about your
> forwarding configuration, it is hard to tell.
>
> rndc querylog on

I have been logging queries for quite a while now; the output is attached. AFAICT there is nothing interesting there.

> named-checkconf -px output might help, but without your server publicly
> reachable, probably not.

It is of course reachable as sarkovy.krokinole.de. Output of 'named-checkconf -px' is attached. Again, I do not see anything of interest there.
Okay, no forwarders that could create any loops. There are some overrides to blackhole common zones, but nothing that should make resolution fail.

Can you try the command:

dig -4 @localhost +trace www.hpd.de.

Does it remain broken this way?

I would guess it could be an issue in the handling of flushing just one domain's information, like the one described in this upstream issue: https://gitlab.isc.org/isc-projects/bind9/-/issues/3885

It seems like it could be the same problem you have, because the addresses of the NS records for hpd.de. might not be flushed when you flush just that domain. This has been fixed in 9.18.33, which has been available in f41 updates since this update: https://bodhi.fedoraproject.org/updates/FEDORA-2025-3551f3ba1b

Can you please confirm whether the problem is fixed or whether it remains? If it remains, could you try the rawhide-branch rebuilds from the COPR at https://copr.fedorainfracloud.org/coprs/pemensik/bind/? That should contain the most recent bind package we have. I do not yet have a regular testing update for f41.
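To check whether the installed package already contains the 9.18.33 fix, a dotted-version comparison with GNU 'sort -V' is enough; a sketch (the helper name is mine, and the rpm query in the comment is the usual Fedora invocation, assumed to be available):

```shell
# Sketch: true if version $1 >= version $2 (GNU sort -V does the ordering).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}
# usage on Fedora (assumes rpm is installed):
#   version_ge "$(rpm -q --qf '%{VERSION}' bind)" 9.18.33 && echo "fix present"
```

With the 9.18.30 shown in the dig transcripts above, this check would report the fix as absent.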
FEDORA-2025-4922878d8c (bind-9.18.39-3.fc41.1 and bind-dyndb-ldap-11.10-34.fc41) has been submitted as an update to Fedora 41. https://bodhi.fedoraproject.org/updates/FEDORA-2025-4922878d8c
FEDORA-2025-4922878d8c has been pushed to the Fedora 41 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-4922878d8c` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-4922878d8c See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2025-4922878d8c (bind-9.18.39-3.fc41.1 and bind-dyndb-ldap-11.10-34.fc41) has been pushed to the Fedora 41 stable repository. If problem still persists, please make note of it in this bug report.