I'm filing this bug based on some testing with dnsmasq 2.80. It is my belief that dnsmasq is incorrectly converting NXDOMAIN responses from authoritative DNS servers into NODATA. This can result in bad resolution of IPv4-only or IPv6-only hostnames when searching through the search path. A correct DNS client library aborts the search at NODATA but continues with the next search-path element at NXDOMAIN; any other behaviour results in bugs (flakiness) in the case of server timeouts and other errors.

tea6.foo. and tea7.foo. don't exist (127.0.0.1 is dnsmasq 2.80):

athina:~$ for i in srv txt aaaa a aaaa a txt srv; do host -t $i tea6.foo. 127.0.0.1 | tail -n 1; done
Host tea6.foo. not found: 3(NXDOMAIN)
Host tea6.foo. not found: 3(NXDOMAIN)
Host tea6.foo. not found: 3(NXDOMAIN)
tea6.foo has no A record
Host tea6.foo. not found: 3(NXDOMAIN)
tea6.foo has no A record
tea6.foo has no TXT record
tea6.foo has no SRV record

athina:~$ for i in srv txt a aaaa a aaaa txt srv; do host -t $i tea7.foo. 127.0.0.1 | tail -n 1; done
Host tea7.foo. not found: 3(NXDOMAIN)
Host tea7.foo. not found: 3(NXDOMAIN)
Host tea7.foo. not found: 3(NXDOMAIN)
tea7.foo has no AAAA record
Host tea7.foo. not found: 3(NXDOMAIN)
tea7.foo has no AAAA record
tea7.foo has no TXT record
tea7.foo has no SRV record

Somehow A/AAAA are special: whichever of the two is queried first keeps returning NXDOMAIN, the other one returns NODATA, and after that the TXT and SRV queries return NODATA as well.

Here's some more detail: https://umbrella.cisco.com/blog/2014/06/23/nxdomain-nodata-debugging-dns-dual-stacked-hosts/

I'm guessing (unverified) that this bug was introduced by:

commit b6f926fbefcd2471699599e44f32b8d25b87b471
Author: Simon Kelley <simon.uk>
Date:   Tue Aug 21 17:46:52 2018 +0100

    Don't return NXDOMAIN to empty non-terminals.

    When a record is defined locally, eg an A record for one.two.example
    then we already know that if we forward, eg an AAAA query for
    one.two.example, and get back NXDOMAIN, then we need to alter that to
    NODATA. This is handled by check_for_local_domain().

    But, if we forward two.example, because one.two.example exists, then
    the answer to two.example should also be a NODATA. For most local
    records this is easy, just do substring matching. For A, AAAA and
    CNAME records that are in the cache, it's more difficult. The cache
    has no efficient way to find such records. The fix is to insert empty
    (none of F_IPV4, F_IPV6, F_CNAME set) records for each non-terminal.
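To illustrate why the NXDOMAIN/NODATA distinction matters on the client side, here is a minimal C sketch of the search-path rule stated in the report: only NXDOMAIN moves on to the next search-list suffix, anything else ends the walk. Everything in it is hypothetical: lookup_result, search_path(), the two stub lookup functions and the names tea.corp.example / tea.lab.example are made up for illustration and are not a real resolver API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical outcome of one lookup; not a real resolver API. */
enum lookup_result {
    LOOKUP_ANSWER,   /* records of the requested type exist */
    LOOKUP_NODATA,   /* the name exists, but has no records of this type */
    LOOKUP_NXDOMAIN, /* the name does not exist at all */
    LOOKUP_ERROR     /* timeout, SERVFAIL, ... */
};

typedef enum lookup_result (*lookup_fn)(const char *fqdn);

/*
 * Try "name.suffix." for each search-list suffix. Only NXDOMAIN moves on to
 * the next suffix; any other result ends the walk. Returns the index of the
 * suffix that ended it, or -1 if every candidate was NXDOMAIN.
 */
static int search_path(const char *name, const char *const *suffixes, size_t n,
                       lookup_fn lookup, enum lookup_result *final)
{
    char fqdn[256];

    assert(name != NULL && lookup != NULL && final != NULL);
    for (size_t i = 0; i < n; i++) {
        snprintf(fqdn, sizeof fqdn, "%s.%s.", name, suffixes[i]);
        enum lookup_result r = lookup(fqdn);
        if (r != LOOKUP_NXDOMAIN) {
            *final = r;
            return (int)i;
        }
    }
    *final = LOOKUP_NXDOMAIN;
    return -1;
}

/* Stub of correct upstream data: only tea.lab.example. has an AAAA record. */
static enum lookup_result correct_aaaa(const char *fqdn)
{
    return strcmp(fqdn, "tea.lab.example.") == 0 ? LOOKUP_ANSWER : LOOKUP_NXDOMAIN;
}

/* Stub of a forwarder with the bug: NXDOMAIN for tea.corp.example. became NODATA. */
static enum lookup_result buggy_aaaa(const char *fqdn)
{
    if (strcmp(fqdn, "tea.corp.example.") == 0)
        return LOOKUP_NODATA;
    return correct_aaaa(fqdn);
}
```

With the search list {"corp.example", "lab.example"}, correct_aaaa lets the walk skip corp.example and find the host under lab.example, while buggy_aaaa stops the search at the first suffix with NODATA, so the host is never found.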
Norman Rasmussen says:

diff --git a/src/cache.c b/src/cache.c
index 713e58c..2ff05f7 100644
--- a/src/cache.c
+++ b/src/cache.c
@@ -790,6 +790,7 @@ int cache_find_non_terminal(char *name, time_t now)
       if (!is_outdated_cname_pointer(crecp) &&
           !is_expired(now, crecp) &&
           (crecp->flags & F_FORWARD) &&
+          !(crecp->flags & F_NXDOMAIN) &&
           hostname_isequal(name, cache_get_name(crecp)))
         return 1;

This seems to fix the bug, and doesn't seem to break the logic that the method was introduced for.
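The commit quoted in the report handles A, AAAA and CNAME names in the cache by adding empty entries for each non-terminal, so that a check like cache_find_non_terminal() only needs an exact-name match. The toy model below sketches that idea in C. The flag values, the fixed-size array and the helper names (toy_insert(), toy_name_exists(), parent_name()) are assumptions for illustration and do not mirror dnsmasq's real cache structures; names are compared with strcmp() for brevity (unlike the case-insensitive hostname_isequal()), and escaped dots in labels are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Invented type bits; a placeholder entry has none of them set. */
#define T_IPV4  0x1u
#define T_IPV6  0x2u
#define T_CNAME 0x4u

struct toy_entry {
    const char *name;
    unsigned flags;
};

#define TOY_MAX 16
static struct toy_entry toy_cache[TOY_MAX];
static size_t toy_len;

/* Next ancestor of a dotted name ("one.two.example" -> "two.example"), or NULL. */
static const char *parent_name(const char *name)
{
    const char *dot = strchr(name, '.');
    return (dot != NULL && dot[1] != '\0') ? dot + 1 : NULL;
}

static void toy_add(const char *name, unsigned flags)
{
    assert(toy_len < TOY_MAX);
    toy_cache[toy_len].name = name;
    toy_cache[toy_len].flags = flags;
    toy_len++;
}

/* Store a real record, plus an empty placeholder for every ancestor name. */
static void toy_insert(const char *name, unsigned flags)
{
    toy_add(name, flags);
    for (const char *p = parent_name(name); p != NULL; p = parent_name(p))
        toy_add(p, 0);
}

/* Non-terminal check: any entry, real or placeholder, with exactly this name? */
static int toy_name_exists(const char *name)
{
    for (size_t i = 0; i < toy_len; i++)
        if (strcmp(toy_cache[i].name, name) == 0)
            return 1;
    return 0;
}
```

After toy_insert("one.two.example", T_IPV4), the placeholder for two.example lets a forwarded NXDOMAIN for two.example be recognised as an empty non-terminal and answered as NODATA, which is the behaviour the commit wanted.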
And some additional comments from Norman:

I have more information about the trigger (using tcpdump, Wireshark, dnsmasq --log-queries=extra -d -q --port 5553, and pkill -USR1 dnsmasq):

When the upstream server replies NXDOMAIN, that entry is cached. For example, the response for A is cached with flags "4F NX" (v4, forwarded, no reply, NXDOMAIN).

The follow-up request sees a cached entry for the same name and thinks it MUST NOT return NXDOMAIN, !!!because there is another cache entry for the same name!!! I'm guessing there's a missing check for whether the other cached entries for the same name are themselves NXDOMAIN replies. So the second entry gets flags of, for example, "6F N " (v6, forwarded, no reply).

(Switching the order of A and AAAA only swaps the 4 with the 6, so it's symmetric.)
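The trigger described above, and the effect of the one-line patch, can be modelled in a few lines of C. This is a simplified sketch under stated assumptions: has_non_terminal() stands in for cache_find_non_terminal() but omits the expiry and CNAME checks, the X_* flag values are invented (dnsmasq's real F_* constants differ), and cache_flags_for_nxdomain() is a hypothetical helper for the point where an upstream NXDOMAIN is either kept or downgraded to NODATA.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Invented flag bits; comments give the matching letter in the cache dump. */
#define X_IPV4     0x01u  /* "4" */
#define X_IPV6     0x02u  /* "6" */
#define X_FORWARD  0x04u  /* "F": learned from an upstream reply */
#define X_NEG      0x08u  /* "N": negative answer, no records */
#define X_NXDOMAIN 0x10u  /* "X": upstream said the name does not exist */

struct entry {
    const char *name;
    unsigned flags;
};

/* Case-insensitive name comparison, in the spirit of hostname_isequal(). */
static int name_equal(const char *a, const char *b)
{
    for (; *a && *b; a++, b++)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
    return *a == *b;
}

/*
 * Stand-in for cache_find_non_terminal(): is there a forwarded entry for
 * exactly this name? When `patched` is set, entries that are themselves
 * NXDOMAIN replies are skipped, as in the diff above.
 */
static int has_non_terminal(const struct entry *cache, size_t n,
                            const char *name, int patched)
{
    assert(cache != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        unsigned f = cache[i].flags;
        if (!(f & X_FORWARD))
            continue;
        if (patched && (f & X_NXDOMAIN))
            continue;
        if (name_equal(name, cache[i].name))
            return 1;
    }
    return 0;
}

/*
 * Hypothetical helper: flags cached for an upstream NXDOMAIN reply of the
 * given type. If the name appears to exist, the X bit is dropped, i.e. the
 * reply is downgraded to NODATA.
 */
static unsigned cache_flags_for_nxdomain(const struct entry *cache, size_t n,
                                         const char *name, unsigned type,
                                         int patched)
{
    unsigned f = type | X_FORWARD | X_NEG;
    if (!has_non_terminal(cache, n, name, patched))
        f |= X_NXDOMAIN;
    return f;
}
```

With patched set to 0, the "4F NX" entry left by the A query makes the later AAAA NXDOMAIN look like an empty non-terminal, producing a "6F N" entry. With the patch, entries that are themselves NXDOMAIN are ignored and the AAAA reply keeps its X flag, while a genuine forwarded record for the same name still suppresses NXDOMAIN.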
Fixed upstream in: http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=162e5e0062ce923c494cc64282f293f0ed64fc10
Thanks for the fix pushed into upstream!
FEDORA-2019-b0b2b9b380 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-b0b2b9b380
FEDORA-2019-8ad16085e2 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-8ad16085e2
dnsmasq-2.80-7.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-b0b2b9b380
dnsmasq-2.79-9.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-8ad16085e2
dnsmasq-2.80-7.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.
dnsmasq-2.79-9.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.