Bug 1943826

Summary: CoreDNS caches NXDOMAIN responses for up to 900 seconds
Product: OpenShift Container Platform Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Networking Assignee: Stephen Greene <sgreene>
Networking sub component: DNS QA Contact: Arvind iyengar <aiyengar>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent CC: aiyengar, amcdermo, aos-bugs, dofinn, hongli, jeder, otuchfel
Version: 4.6 Keywords: ServiceDeliveryBlocker, ServiceDeliveryImpact
Target Milestone: ---   
Target Release: 4.7.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Bug 1936587 set the global CoreDNS cache max TTL to 900 seconds.
Consequence: NXDOMAIN records received from upstream resolvers are cached for up to 900 seconds.
Fix: Explicitly cache negative DNS response records for a maximum of 30 seconds.
Result: Resolving domains that are in the process of being published no longer takes up to 15 minutes.
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-04-12 23:22:57 UTC Type: ---
Bug Depends On: 1943578    
Bug Blocks: 1944245    

Comment 1 Arvind iyengar 2021-03-31 05:57:43 UTC
Verified in "4.7.0-0.ci.test-2021-03-31-042927-ci-ln-sv7y39b". With this payload, the cache plugin section sets a 30-second TTL for negative records by default, alongside the 900-second TTL for positive records:
-----
Defaulting container name to dns.
Use 'oc describe pod/dns-default-chmn9 -n openshift-dns' to see all of the containers in this pod.
.:5353 {
    errors
    health {
        lameduck 20s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        upstream
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus 127.0.0.1:9153
    forward . /etc/resolv.conf {
        policy sequential
    }
    cache 900 {
        denial 9984 30
    }
    reload
}
-----
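
For readers unfamiliar with the cache stanza above, the effect of "cache 900" combined with "denial 9984 30" can be sketched as a simple TTL cap. This is a simplified illustration and not CoreDNS source code: the function name cached_ttl and the constants are hypothetical, and the model ignores the plugin's minimum TTL and its capacity handling (the 9984 value is the denial cache capacity, not a TTL).

```python
# Simplified model of the TTL caps configured by:
#   cache 900 {
#       denial 9984 30
#   }
# Illustration only; names are hypothetical and this is not CoreDNS code.

POSITIVE_MAX_TTL = 900  # seconds, from "cache 900"
NEGATIVE_MAX_TTL = 30   # seconds, from the last field of "denial 9984 30"


def cached_ttl(upstream_ttl: int, negative: bool) -> int:
    """Return how long a response may stay cached under the caps above."""
    cap = NEGATIVE_MAX_TTL if negative else POSITIVE_MAX_TTL
    return min(upstream_ttl, cap)


if __name__ == "__main__":
    # NXDOMAIN whose upstream TTL would otherwise allow one hour of caching.
    print(cached_ttl(3600, negative=True))   # 30
    # Positive answer with the same upstream TTL.
    print(cached_ttl(3600, negative=False))  # 900
    # A TTL already below the cap is left unchanged.
    print(cached_ttl(10, negative=True))     # 10
```

Under this model, a name that returned NXDOMAIN while it was still being published becomes resolvable again after at most 30 seconds, instead of up to 900 seconds when only the global cap applied.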

Comment 3 Hongan Li 2021-04-02 00:45:52 UTC
Merged in 4.7.0-0.nightly-2021-04-01-052823; moving to verified per Comment 1.

(This should have been verified by the bot, but it seems to have missed this one.)

Comment 6 errata-xmlrpc 2021-04-12 23:22:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.6 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1075