Bug 1838994

Summary: Bind9 stale-answer-enable default is ignored
Product: Red Hat Enterprise Linux 8
Reporter: Emmanuel Kasper <ekasprzy>
Component: bind
Assignee: Petr Menšík <pemensik>
Status: CLOSED WONTFIX
QA Contact: qe-baseos-daemons
Severity: medium
Priority: unspecified
Version: 8.2
CC: aegorenk
Target Milestone: rc
Keywords: Reproducer
Target Release: 8.0
Hardware: x86_64
OS: Linux
Last Closed: 2021-11-22 07:27:02 UTC
Type: Bug

Description Emmanuel Kasper 2020-05-22 09:49:08 UTC
Description of problem:
The BIND 9 serve-stale default appears to be ignored.

Version-Release number of selected component (if applicable): bind-9.11.13-3.el8


How reproducible: always


I am using the following named.conf configuration:

```
# cat /etc/named.conf


options {
  listen-on port 53 { 127.0.0.1; };
  use-v4-udp-ports { range 9000 65535; };
  use-v6-udp-ports { range 9000 65535; };

  directory       "/var/named";
  dump-file       "/var/named/data/cache_dump.db";
  statistics-file "/var/named/data/named_stats.txt";
  memstatistics-file "/var/named/data/named_mem_stats.txt";
  secroots-file   "/var/named/data/named.secroots";
  recursing-file  "/var/named/data/named.recursing";

  allow-query     { localhost; };
  allow-query-cache { localhost; };

  // Only forward and cache requests
  forward only;
  forwarders { 8.8.4.4; 8.8.8.8; };

  // DNSsec
  dnssec-enable yes;
  dnssec-validation yes;
  managed-keys-directory "/var/named/dynamic";

  max-cache-ttl 3600;
  max-ncache-ttl 3600;

  pid-file "/run/named/named.pid";
  session-keyfile "/run/named/session.key";

};

view localhost_resolver {
  match-clients      { localhost; };
  match-destinations { localhost; };
  recursion yes;
  include "/etc/named.rfc1912.zones";
};

logging {
  channel default_debug {
    print-time yes;
    print-category yes;
    print-severity yes;
    file "data/named.run";
    severity dynamic;
  };
};
```

and serve-stale is not enabled, either in the configuration or via `rndc`:

```
# rndc serve-stale status 

localhost_resolver: off (stale-answer-ttl=1 max-stale-ttl=604800)
_bind: off (stale-answer-ttl=1 max-stale-ttl=604800)
```
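To compare these settings with the TTLs in the cache dump, it helps to pull the numbers out of the status lines. Below is a minimal Python sketch that parses only the exact line format shown above; the names `StaleStatus` and `parse_serve_stale_status` are hypothetical, and any other wording `rndc` might print is not covered.

```python
import re
from typing import Dict, NamedTuple


class StaleStatus(NamedTuple):
    enabled: bool
    stale_answer_ttl: int
    max_stale_ttl: int


# Assumed line format, copied from the output above:
#   <view>: <on|off> (stale-answer-ttl=<n> max-stale-ttl=<n>)
# Other wordings that rndc may print are not handled by this sketch.
STATUS_RE = re.compile(
    r"^(?P<view>[^:\s]+): (?P<state>on|off) "
    r"\(stale-answer-ttl=(?P<answer>\d+) max-stale-ttl=(?P<max>\d+)\)$"
)


def parse_serve_stale_status(text: str) -> Dict[str, StaleStatus]:
    """Map each view name to its serve-stale settings, skipping other lines."""
    result: Dict[str, StaleStatus] = {}
    for line in text.splitlines():
        match = STATUS_RE.match(line.strip())
        if match:
            result[match["view"]] = StaleStatus(
                enabled=match["state"] == "on",
                stale_answer_ttl=int(match["answer"]),
                max_stale_ttl=int(match["max"]),
            )
    return result


output = """
localhost_resolver: off (stale-answer-ttl=1 max-stale-ttl=604800)
_bind: off (stale-answer-ttl=1 max-stale-ttl=604800)
"""
status = parse_serve_stale_status(output)
for view, settings in status.items():
    print(view, settings)
```

Both views report serve-stale as off, yet both still carry a max-stale-ttl of 604800 seconds, the same stale TTL that appears in the cache dump header.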

However, a cache dump of this running server gives the following output:

```
# cat /var/named/data/cache_dump.db 
;
; Start view localhost_resolver
;
;
; Cache dump of view 'localhost_resolver' (cache localhost_resolver)
;
; using a 604800 second stale ttl
$DATE 20200515091102
; answer
access.redhat.com.	608272	IN CNAME access.redhat.com.edgekey.net.
; answer
downloads.redhat.com.	608360	CNAME	downloads.redhat.com.edgekey.net.
; answer
e133.b.akamaiedge.net.	604756	A	104.96.151.22
; answer
e1890.dscd.akamaiedge.net. 604780 A	172.227.162.153
; answer
access.redhat.com.edgekey.net. 608337 CNAME e133.b.akamaiedge.net.
; answer
downloads.redhat.com.edgekey.net. 608361 CNAME e1890.dscd.akamaiedge.net.
```
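Every TTL in this dump exceeds 604800, the stale TTL named in the dump header. Assuming each dumped TTL is the record's remaining TTL plus that stale window (an interpretation of this output, not something stated in the documentation), subtracting the window gives values that can be compared against `max-cache-ttl 3600`. The helper `effective_ttls` below is hypothetical, and it only handles the simple one-record-per-line form shown here.

```python
import re
from typing import List, Tuple

# Stale window announced in the dump header, e.g. "; using a 604800 second stale ttl"
HEADER_RE = re.compile(r"^; using a (\d+) second stale ttl$")


def effective_ttls(dump: str) -> List[Tuple[str, str, int]]:
    """Return (owner, rrtype, dumped TTL minus stale window) per record line.

    ASSUMPTION: dumped TTL = remaining TTL + stale window, so a negative
    result is read as a record that has expired and is only kept as stale.
    """
    stale_ttl = 0
    records = []
    for raw in dump.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            stale_ttl = int(header.group(1))
            continue
        if not line or line.startswith((";", "$")):
            continue  # skip comments and the $DATE directive
        fields = line.split()
        owner, ttl = fields[0], int(fields[1])
        # The class "IN" is printed on some lines of the dump but not others
        rrtype = fields[3] if fields[2] == "IN" else fields[2]
        records.append((owner, rrtype, ttl - stale_ttl))
    return records


dump = """
; using a 604800 second stale ttl
$DATE 20200515091102
; answer
access.redhat.com.  608272  IN CNAME access.redhat.com.edgekey.net.
; answer
downloads.redhat.com.  608360  CNAME  downloads.redhat.com.edgekey.net.
; answer
e133.b.akamaiedge.net.  604756  A  104.96.151.22
; answer
e1890.dscd.akamaiedge.net. 604780 A  172.227.162.153
; answer
access.redhat.com.edgekey.net. 608337 CNAME e133.b.akamaiedge.net.
; answer
downloads.redhat.com.edgekey.net. 608361 CNAME e1890.dscd.akamaiedge.net.
"""

for owner, rrtype, remaining in effective_ttls(dump):
    state = "fresh" if remaining > 0 else "expired, kept as stale"
    print(f"{owner:35} {rrtype:5} {remaining:6} {state}")
```

Under this assumption, the four CNAME records come out between 3472 and 3561 seconds, below max-cache-ttl 3600, while the two A records come out at -44 and -20, meaning they expired shortly before the dump and are retained only because of the stale window.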

According to the BIND 9 documentation, serve-stale should be off by default.
Here is the relevant part of bind-9.11-serve-stale.patch:

```
+             <listitem>
+               <para>
+                 Enable the returning of stale answers when the
+                 nameservers for the zone are not answering.  This
+                 is off by default but can be enabled/disabled via
+                 <command>rndc server-stale on</command> and
+                 <command>rndc server-stale off</command> which
+                 override the named.conf setting.  <command>rndc
+                 server-stale reset</command> will restore control
+                 via named.conf.
+               </para>
+             </listitem>
```

Unless I am missing something, the default does not seem to be respected, since BIND 9 includes the serve-stale TTL in every record of the cache dump.
I expected the TTLs to be bounded by max-cache-ttl, counting down from 3600 to 0, as in older BIND 9 releases (for instance 9.11.4-9.P2.el7).

Comment 1 Petr Menšík 2020-09-22 12:39:23 UTC
Okay, I confirm that the TTLs stored in the dumpdb output are wrong. A 9.16.6 server behaves the same way; it just uses a lower default max-stale-ttl.

```
$ dig @::1 +short access.redhat.com && rndc dumpdb && grep access.redhat.com /var/named/data/cache_dump.db
# latest RHEL8 BIND
access.redhat.com.	604810	CNAME	access.redhat.com2.edgekey.net.
access.redhat.com2.edgekey.net.globalredir.akadns.net. 605108 CNAME e40408.dsca.akamaiedge.net.
access.redhat.com2.edgekey.net.	610483 CNAME access.redhat.com2.edgekey.net.globalredir.akadns.net.

# Fedora BIND 9.16.6
access-chinacdn.redhat.com.edgekey.net.
access-chinacdn.redhat.com.edgekey.net.globalredir.akadns.net.
e133.a.akamaiedge.net.
104.103.104.37
access.redhat.com.	46354	CNAME	access-chinacdn.redhat.com.edgekey.net.
```


But it seems that only the max-stale-ttl default (604800 on RHEL 8) differs; otherwise both behave the same way. The TTLs provided by the upstream server are much lower than any of these values.
I don't know why the default max-stale-ttl is still added to the stored TTLs and printed in the cache dump when serve-stale is turned off.

Comment 2 Petr Menšík 2020-09-22 12:57:31 UTC
The default is not ignored: rndc serve-stale status reports off by default.

Also, stale-answer-enable is the correct option to enable or disable serving stale records.

Newer versions have a stale-cache-enable no; option, which would restore the previous behaviour. It is not supported by the RHEL 8 serve-stale backport. It is on by default in BIND 9.16.

Comment 6 RHEL Program Management 2021-11-22 07:27:02 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.