Bug 1893761

Summary: SEGFAULT in libdns
Product: [Fedora] Fedora
Reporter: Fritz Elfert <fritz>
Component: bind
Assignee: Petr Menšík <pemensik>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 32
CC: aegorenk, anon.amish, dns-sig, mruprich, msehnout, pemensik, pzhukov, vonsch, zdohnal
Target Milestone: ---
Keywords: Patch, Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Fixed In Version: bind-9.11.24-2.fc34 bind-9.11.24-2.fc33 bind-9.11.24-2.fc32
Clones: 1894531
Last Closed: 2020-11-09 01:12:16 UTC
Type: Bug
Bug Blocks: 1894531
Attachments:
  Description                    Flags
  Config which triggers crash    none
  config before changes          none
  Output of coredumpctl debug    none

Description Fritz Elfert 2020-11-02 14:34:02 UTC
Description of problem:
I recently enabled DNSSEC validation on my local caching server, and now it
segfaults approximately every 5 minutes.

Version-Release number of selected component (if applicable):
bind-9.11.23-1.fc32.x86_64 + bind-libs-9.11.23-1.fc32.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Use attached named.conf.crashing (partially redacted)

Actual results:
named crashes approximately every 5 minutes. Example lines in dmesg:

[1883332.180292] isc-worker0018[2182205]: segfault at 8 ip 00007fc1f082c312 sp 00007fc1e6b4dd60 error 4 in libdns.so.1110.1.0[7fc1f07a9000+1a2000]
[1883332.180315] Code: 48 8b 44 24 18 48 8d b8 a8 00 00 00 e8 f7 74 f8 ff e9 24 ff ff ff 66 90 4c 89 e7 e8 18 83 f8 ff 48 8b 44 24 20 48 8d 74 24 18 <48> 8b 78 08 e8 f5 fa ff ff eb 8c 0f 1f 00 48 8d 0d 4b ed 11 00 31

The "Code: ..." line is identical for all crashes.

Expected results:
Should not crash

Additional info:
I have also attached the old, working named.conf.good.
The changes that led to the crash were the following:

1. Removed the global forward-only and forwarders settings, because the router
   at 192.168.101.1 does not support DNSSEC.
2. Changed the global dnssec-enable and dnssec-validation settings.
3. Moved 3 forward-only zones into a new view with dnssec-validation
   disabled, because those forwarders do not support DNSSEC.

None of the internal zones are signed yet.

I also installed all necessary debuginfo packages and ran coredumpctl debug.
Its output is attached as debug.out.

Comment 1 Fritz Elfert 2020-11-02 14:36:38 UTC
Created attachment 1725839 [details]
Config which triggers crash

Comment 2 Fritz Elfert 2020-11-02 14:37:31 UTC
Created attachment 1725840 [details]
config before changes

Comment 3 Fritz Elfert 2020-11-02 14:38:16 UTC
Created attachment 1725841 [details]
Output of coredumpctl debug

Comment 4 Fritz Elfert 2020-11-02 14:43:33 UTC
Some additional gdb output:

(gdb) print *nta
$2 = {magic = 1314144622, refcount = {refs = 2}, ntatable = 0x7fc1da7f6110, forced = false, timer = 0x7fc1da0f5b58, fetch = 0x0, rdataset = {magic = 1145983826, methods = 0x0, link = {
      prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, rdclass = 0, type = 0, ttl = 0, trust = 0, covers = 0, attributes = 0, count = 4294967295, resign = 0, private1 = 0x0, private2 = 0x0, 
    private3 = 0x0, privateuint4 = 0, private5 = 0x0, private6 = 0x0, private7 = 0x0, stale_ttl = 3200171710}, sigrdataset = {magic = 1145983826, methods = 0x0, link = {prev = 0xffffffffffffffff, 
      next = 0xffffffffffffffff}, rdclass = 0, type = 0, ttl = 0, trust = 0, covers = 0, attributes = 0, count = 4294967295, resign = 0, private1 = 0x0, private2 = 0x0, private3 = 0x0, 
    privateuint4 = 0, private5 = 0x0, private6 = 0x0, private7 = 0x0, stale_ttl = 3200171710}, fn = {name = {magic = 1145983854, ndata = 0x7fc1d9e8b240 "\001\061\002\061\060\ain-addr\004arpa", 
      length = 19, labels = 5, attributes = 1, offsets = 0x7fc1d9e8b180 "", buffer = 0x7fc1d9e8b200, link = {prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, list = {head = 0x0, tail = 0x0}}, 
    offsets = "\000\002\005\r\022", '\276' <repeats 123 times>, buffer = {magic = 1114990113, base = 0x7fc1d9e8b240, length = 255, used = 19, current = 0, active = 0, link = {
        prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, mctx = 0x0, autore = false}, data = "\001\061\002\061\060\ain-addr\004arpa\000", '\276' <repeats 236 times>}, name = 0x7fc1d9e8b130, 
  expiry = 1604494611}
(gdb) print *view
Cannot access memory at address 0x0

So it looks like this function is called with a nullptr view.
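
The fault address in dmesg ("segfault at 8") is consistent with a read through a NULL view: reading a member through a NULL pointer touches the address equal to that member's offset. The sketch below illustrates this with a hypothetical, cut-down struct. layout_view and fault_address are made-up names, and the layout (a 32-bit magic followed by a pointer-sized field) is an assumption, not BIND's actual definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the first two members of a view structure.
 * Assumption: a 32-bit magic number followed by a pointer-sized field. */
struct layout_view {
	unsigned int magic;
	void *mctx;
};

/* Address the CPU would touch when reading base->mctx.
 * Pure arithmetic; nothing is dereferenced here. */
uintptr_t
fault_address(const struct layout_view *base) {
	/* The first member always sits at offset 0. */
	assert(offsetof(struct layout_view, magic) == 0);
	return ((uintptr_t)base + offsetof(struct layout_view, mctx));
}
```

On a typical 64-bit Linux ABI the pointer member is aligned to 8 bytes, so fault_address(NULL) yields 8 under this assumed layout.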

Comment 5 Fritz Elfert 2020-11-02 14:58:52 UTC
Ahhh, and that reveals my fault:

Before creating the new view and moving three zones into it, I experimented with

rndc nta add ZONE (where ZONE is one of those moved zones)

I also found in the ISC docs that negative trust anchors are retained across restarts.
So now, I ran

rndc nta -remove ZONE

after that, no crashes happen anymore :-)

Still, this should not lead to a crash. Attempting to check NTA expiry with a nullptr view should be logged
as an error, and the corresponding NTA entry should be removed. Therefore, I am leaving this open.

Thanks
 -Fritz
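
The behaviour suggested above (log an error and drop the entry instead of crashing) could look roughly like the following. This is only a sketch under stated assumptions: sketch_view, sketch_nta and sketch_nta_check are hypothetical names, and none of this is BIND's actual code or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical miniature types, for illustration only. */
struct sketch_view {
	const char *name;
};

struct sketch_nta {
	const char *zone;
	struct sketch_view *view; /* may be NULL if the view went away */
	bool active;
};

/* Returns true if the entry can be checked for expiry.
 * An entry without a view is logged and retired instead of dereferenced. */
bool
sketch_nta_check(struct sketch_nta *nta) {
	assert(nta != NULL && nta->zone != NULL);
	if (nta->view == NULL) {
		fprintf(stderr, "nta for %s has no view; removing entry\n",
			nta->zone);
		nta->active = false;
		return (false);
	}
	return (true);
}
```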

Comment 6 Fritz Elfert 2020-11-02 15:28:54 UTC
I have reported this upstream: https://gitlab.isc.org/isc-projects/bind9/-/issues/2244

Comment 7 Petr Menšík 2020-11-02 17:45:11 UTC
lib/dns/nta.c:285

	if (result != ISC_R_SUCCESS) {
		dns_view_weakdetach(&view);
		nta_detach(view->mctx, &nta);
	}

It seems to me that simply reversing the two calls might help, since dns_view_weakdetach() clears view before view->mctx is read:

	if (result != ISC_R_SUCCESS) {
		nta_detach(view->mctx, &nta);
		dns_view_weakdetach(&view);
	}
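
Why the order matters can be shown with a small, self-contained model. This is a sketch, not BIND code: the demo_* types and functions are hypothetical, and the only behaviour assumed from the real helpers is that a detach call clears the caller's pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature types; not BIND's real definitions. */
struct demo_mctx { int frees; };
struct demo_view { struct demo_mctx *mctx; int weakrefs; };
struct demo_nta  { int refs; };

/* Drops one weak reference and clears the caller's pointer. */
void
demo_view_weakdetach(struct demo_view **viewp) {
	assert(viewp != NULL && *viewp != NULL);
	(*viewp)->weakrefs--;
	*viewp = NULL;
}

/* Drops one reference, using the memory context supplied by the caller. */
void
demo_nta_detach(struct demo_mctx *mctx, struct demo_nta **ntap) {
	assert(mctx != NULL && ntap != NULL && *ntap != NULL);
	(*ntap)->refs--;
	mctx->frees++;
	*ntap = NULL;
}

/* Safe order: read view->mctx while the pointer is still valid,
 * and only then detach the view. Swapping the two calls would
 * evaluate (*viewp)->mctx after *viewp has been set to NULL. */
void
demo_cleanup(struct demo_view **viewp, struct demo_nta **ntap) {
	demo_nta_detach((*viewp)->mctx, ntap);
	demo_view_weakdetach(viewp);
}
```

After demo_cleanup() both caller pointers are NULL and each reference count has dropped by one, without the view pointer ever being read after it was cleared.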

Comment 8 Petr Menšík 2020-11-04 12:55:40 UTC
Thank you for your report. The fix has already been merged.

Comment 9 Petr Menšík 2020-11-04 12:58:42 UTC
Lowering priority, because createfetch must fail first. That is not a common situation.

Comment 10 Fedora Update System 2020-11-04 15:41:07 UTC
FEDORA-2020-4fb5288e2c has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-4fb5288e2c

Comment 11 Fedora Update System 2020-11-04 15:42:05 UTC
FEDORA-2020-10f706cd37 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-10f706cd37

Comment 12 Fedora Update System 2020-11-05 02:02:48 UTC
FEDORA-2020-4fb5288e2c has been pushed to the Fedora 32 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-4fb5288e2c`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-4fb5288e2c

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 13 Fedora Update System 2020-11-05 03:28:14 UTC
FEDORA-2020-10f706cd37 has been pushed to the Fedora 33 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-10f706cd37`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-10f706cd37

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 14 Fedora Update System 2020-11-09 01:12:16 UTC
FEDORA-2020-10f706cd37 has been pushed to the Fedora 33 stable repository.
If the problem still persists, please make a note of it in this bug report.

Comment 15 Fedora Update System 2020-11-28 02:10:04 UTC
FEDORA-2020-4fb5288e2c has been pushed to the Fedora 32 stable repository.
If the problem still persists, please make a note of it in this bug report.