Bug 1674067

Summary: dnsmasq 2.80 falsifies NXDOMAIN into NODATA
Product: [Fedora] Fedora
Reporter: Maciej Żenczykowski <zenczykowski>
Component: dnsmasq
Assignee: Petr Menšík <pemensik>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: rawhide
CC: code, dougsland, itamar, jima, laine, p, pemensik, thozza, veillard
Hardware: Unspecified
OS: Linux
Fixed In Version: dnsmasq-2.80-7.fc30, dnsmasq-2.79-9.fc29
Last Closed: 2019-08-03 01:17:05 UTC
Type: Bug

Description Maciej Żenczykowski 2019-02-08 22:52:31 UTC
I'm filing this bug based on some testing with dnsmasq 2.80.

It is my belief that dnsmasq is incorrectly converting NXDOMAIN responses from authoritative DNS servers into NODATA.

This can break resolution of IPv4-only or IPv6-only hostnames when searching through the search path. A correct DNS client library aborts the search at NODATA but continues with the next search-path element at NXDOMAIN; any other behaviour results in bugs (flakiness) in the case of server timeouts and other errors.
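The search-path argument can be made concrete with a minimal C sketch of a client walking its search list for an AAAA lookup. Everything in it is hypothetical and for illustration only: `query_aaaa` is a stub rather than a real resolver call, the `falsify` switch stands in for the reported forwarder behaviour, and the `example.` names are invented. Real resolver libraries also deal with timeouts and server failures, which the sketch leaves out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified outcome of one query (hypothetical, not a real resolver API). */
enum outcome { FOUND, NODATA, NXDOMAIN };

/* Hypothetical stub for an AAAA query: only "host.b.example." has an AAAA
 * record; every other name does not exist. When falsify is non-zero the
 * NXDOMAIN is reported as NODATA, mimicking the behaviour in this report. */
static enum outcome query_aaaa(const char *fqdn, int falsify)
{
    if (strcmp(fqdn, "host.b.example.") == 0)
        return FOUND;
    return falsify ? NODATA : NXDOMAIN;
}

/* Try "name.suffix" for each suffix in order. Only NXDOMAIN moves on to the
 * next suffix; NODATA or an answer ends the search. Returns the index of the
 * suffix that ended the search, or -1 if every suffix gave NXDOMAIN; the
 * final outcome is stored in *result. */
static int search(const char *name, const char *const *suffixes, int n,
                  int falsify, enum outcome *result)
{
    char fqdn[256];

    assert(name != NULL && suffixes != NULL && result != NULL);
    *result = NXDOMAIN;
    for (int i = 0; i < n; i++) {
        snprintf(fqdn, sizeof fqdn, "%s.%s", name, suffixes[i]);
        *result = query_aaaa(fqdn, falsify);
        if (*result != NXDOMAIN)
            return i;
    }
    return -1;
}
```

With `falsify` off, `host.a.example.` gives NXDOMAIN and the walk moves on to find the AAAA record under `b.example.`. With it on, the first suffix already looks like NODATA, so the walk stops there and the record is never found.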

tea6.foo. and tea7.foo. don't exist.

athina:~$ for i in srv txt aaaa a aaaa a txt srv; do host -t $i tea6.foo. 127.0.0.1 | tail -n 1; done
Host tea6.foo. not found: 3(NXDOMAIN)
Host tea6.foo. not found: 3(NXDOMAIN)
Host tea6.foo. not found: 3(NXDOMAIN)
tea6.foo has no A record
Host tea6.foo. not found: 3(NXDOMAIN)
tea6.foo has no A record
tea6.foo has no TXT record
tea6.foo has no SRV record

athina:~$ for i in srv txt a aaaa a aaaa txt srv; do host -t $i tea7.foo. 127.0.0.1 | tail -n 1; done
Host tea7.foo. not found: 3(NXDOMAIN)
Host tea7.foo. not found: 3(NXDOMAIN)
Host tea7.foo. not found: 3(NXDOMAIN)
tea7.foo has no AAAA record
Host tea7.foo. not found: 3(NXDOMAIN)
tea7.foo has no AAAA record
tea7.foo has no TXT record
tea7.foo has no SRV record

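Somehow A and AAAA are special: once an NXDOMAIN answer for one of them is cached, queries for the other types of the same name come back as NODATA. (127.0.0.1 is dnsmasq 2.80.)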
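On the wire, the two results `host` prints above differ in the response header. NXDOMAIN is RCODE 3, while NODATA is RCODE 0 (NOERROR) with an empty answer section; RFC 2308 describes both. What follows is a minimal C sketch of that classification using the hypothetical names `classify_reply` and `reply_kind`. It looks only at the header, so it does not recognise an answer section holding just a CNAME as NODATA, and it would misreport a referral as NODATA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum reply_kind { REPLY_ANSWER, REPLY_NODATA, REPLY_NXDOMAIN, REPLY_OTHER };

/* Classify a DNS response by its 12-byte header (RFC 1035, section 4.1.1). */
static enum reply_kind classify_reply(const uint8_t *msg, size_t len)
{
    assert(msg != NULL);
    if (len < 12)
        return REPLY_OTHER;                     /* no complete header */

    unsigned rcode = msg[3] & 0x0Fu;            /* RCODE: low 4 bits of byte 3 */
    unsigned ancount = ((unsigned)msg[6] << 8) | msg[7];

    if (rcode == 3)
        return REPLY_NXDOMAIN;                  /* the name does not exist */
    if (rcode == 0)                             /* NOERROR */
        return ancount == 0 ? REPLY_NODATA : REPLY_ANSWER;
    return REPLY_OTHER;                         /* SERVFAIL, REFUSED, ... */
}
```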

Here's some more detail:

https://umbrella.cisco.com/blog/2014/06/23/nxdomain-nodata-debugging-dns-dual-stacked-hosts/

I'm guessing (unverified) that this bug was introduced by:

commit b6f926fbefcd2471699599e44f32b8d25b87b471
Author: Simon Kelley <simon.uk>
Date:   Tue Aug 21 17:46:52 2018 +0100

    Don't return NXDOMAIN to empty non-terminals.
    
    When a record is defined locally, eg an A record for one.two.example then
    we already know that if we forward, eg an AAAA query for one.two.example,
    and get back NXDOMAIN, then we need to alter that to NODATA. This is handled
    by check_for_local_domain(). But, if we forward two.example, because
    one.two.example exists, then the answer to two.example should also be
    a NODATA.
    
    For most local records this is easy, just do substring matching.
    For A, AAAA and CNAME records that are in the cache, it's more difficult.
    The cache has no efficient way to find such records. The fix is to
    insert empty (none of F_IPV4, F_IPV6, F_CNAME set) records for each
    non-terminal.
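The "substring matching" the commit message describes for local records can be sketched as a label-aware suffix test. Under this test, a queried name is an empty non-terminal if some locally defined name ends in `.` followed by it. This is a hypothetical illustration, not dnsmasq's `check_for_local_domain()`. It assumes names are plain dotted strings without a trailing dot, and it only covers the non-terminal case, not an exact match on the name itself.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of n bytes (DNS names compare without case). */
static int equal_nocase(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* 1 if qname is a proper ancestor of name at a label boundary, e.g.
 * "two.example" is an ancestor of "one.two.example" but "wo.example" is not. */
static int is_ancestor(const char *qname, const char *name)
{
    size_t ql = strlen(qname), nl = strlen(name);

    if (nl <= ql + 1)
        return 0;                       /* too short to have an extra label */
    const char *tail = name + (nl - ql);
    return tail[-1] == '.' && equal_nocase(tail, qname, ql);
}

/* Hypothetical helper: 1 if an upstream NXDOMAIN for qname should become
 * NODATA because a locally defined name lies below it. */
static int is_empty_non_terminal(const char *qname,
                                 const char *const *local, size_t n)
{
    assert(qname != NULL && local != NULL);
    for (size_t i = 0; i < n; i++)
        if (is_ancestor(qname, local[i]))
            return 1;
    return 0;
}
```

The label-boundary check (`tail[-1] == '.'`) is what keeps a plain substring match from treating `wo.example` as a parent of `one.two.example`.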

Comment 1 Maciej Żenczykowski 2019-02-12 05:14:47 UTC
Norman Rasmussen says:

diff --git a/src/cache.c b/src/cache.c
index 713e58c..2ff05f7 100644
--- a/src/cache.c
+++ b/src/cache.c
@@ -790,6 +790,7 @@ int cache_find_non_terminal(char *name, time_t now)
     if (!is_outdated_cname_pointer(crecp) &&
        !is_expired(now, crecp) &&
        (crecp->flags & F_FORWARD) &&
+       !(crecp->flags & F_NXDOMAIN) &&
        hostname_isequal(name, cache_get_name(crecp)))
       return 1;
 
Seems to fix the bug, and doesn't seem to break the logic that the function was introduced for.

Comment 2 Maciej Żenczykowski 2019-02-12 05:20:21 UTC
And some additional comments from Norman:

I have more information about the trigger (using tcpdump, wireshark, dnsmasq --log-queries=extra -d -q --port 5553, and pkill -USR1 dnsmasq):

When the upstream server replies with NXDOMAIN, that entry is cached:
e.g. the response for A is cached with flags "4F   NX" (v4, forwarded, no reply, nxdomain)

The follow-up request sees a cached entry for the same name and concludes that it MUST NOT return NXDOMAIN, simply because another cache entry exists for that name!

I'm guessing that a check is missing on whether the other cached entries for the same name are themselves NXDOMAIN replies. As a result, the second entry gets flags of, e.g., "6F   N " (v6, forwarded, no reply).

(Switching the order of A and AAAA only swaps the 4 and the 6, so it's symmetric.)
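The trigger described above can be replayed with a toy C model of the cache. The model keeps only the two conditions visible in the patch: a forwarded entry with the same name, optionally excluding NXDOMAIN entries. It assumes that the upstream server answers NXDOMAIN for every query and that only A and AAAA answers are cached. The `T_*` flag values, `toy_cache`, `find_non_terminal` and `query` are hypothetical and do not match dnsmasq's real `F_*` flags or functions. Expiry, CNAME handling and case-insensitive name comparison are omitted.

```c
#include <assert.h>
#include <string.h>

/* Toy flag bits: hypothetical values, not dnsmasq's real F_* constants. */
#define T_IPV4      0x1u
#define T_IPV6      0x2u
#define T_FORWARD   0x4u
#define T_NXDOMAIN  0x8u
#define MAX_ENTRIES 8

enum rr { RR_A, RR_AAAA, RR_TXT, RR_SRV };
enum reply { R_NXDOMAIN, R_NODATA };

struct entry { const char *name; unsigned flags; };

struct toy_cache {
    struct entry e[MAX_ENTRIES];
    int n;       /* number of entries in use */
    int fixed;   /* non-zero: apply the !NXDOMAIN check from the patch */
};

/* A forwarded entry with the same name makes the name "exist"; with the
 * fix, an entry that is itself an NXDOMAIN answer no longer counts. */
static int find_non_terminal(const struct toy_cache *c, const char *name)
{
    for (int i = 0; i < c->n; i++) {
        const struct entry *e = &c->e[i];
        if ((e->flags & T_FORWARD) &&
            !(c->fixed && (e->flags & T_NXDOMAIN)) &&
            strcmp(e->name, name) == 0)
            return 1;
    }
    return 0;
}

/* Answer a query for a name the (simulated) upstream says does not exist. */
static enum reply query(struct toy_cache *c, const char *name, enum rr type)
{
    unsigned tflag = type == RR_A ? T_IPV4 : type == RR_AAAA ? T_IPV6 : 0;

    assert(c != NULL && name != NULL);
    /* Cache hit: only A/AAAA answers are cached in this model. */
    for (int i = 0; tflag && i < c->n; i++)
        if ((c->e[i].flags & tflag) && strcmp(c->e[i].name, name) == 0)
            return (c->e[i].flags & T_NXDOMAIN) ? R_NXDOMAIN : R_NODATA;

    /* Cache miss: upstream says NXDOMAIN, rewritten if the name "exists". */
    enum reply r = find_non_terminal(c, name) ? R_NODATA : R_NXDOMAIN;
    if (tflag && c->n < MAX_ENTRIES) {
        c->e[c->n].name = name;
        c->e[c->n].flags = tflag | T_FORWARD | (r == R_NXDOMAIN ? T_NXDOMAIN : 0u);
        c->n++;
    }
    return r;
}
```

The description's `srv txt aaaa a aaaa a txt srv` sequence can be fed into the model with `fixed = 0`. The replies are NXDOMAIN, NXDOMAIN, NXDOMAIN, NODATA, NXDOMAIN, NODATA, NODATA, NODATA, which is the same pattern as the `tea6.foo.` transcript. With `fixed = 1`, every reply stays NXDOMAIN.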

Comment 4 Petr Menšík 2019-04-12 08:22:21 UTC
Thanks for the fix pushed into upstream!

Comment 5 Fedora Update System 2019-07-31 18:46:27 UTC
FEDORA-2019-b0b2b9b380 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-b0b2b9b380

Comment 6 Fedora Update System 2019-07-31 19:36:59 UTC
FEDORA-2019-8ad16085e2 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-8ad16085e2

Comment 7 Fedora Update System 2019-08-01 03:28:47 UTC
dnsmasq-2.80-7.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-b0b2b9b380

Comment 8 Fedora Update System 2019-08-01 05:33:52 UTC
dnsmasq-2.79-9.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-8ad16085e2

Comment 9 Fedora Update System 2019-08-03 01:17:05 UTC
dnsmasq-2.80-7.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

Comment 10 Fedora Update System 2019-08-15 18:51:39 UTC
dnsmasq-2.79-9.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.