Bug 1893761 - SEGFAULT in libdns
Summary: SEGFAULT in libdns
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: bind
Version: 32
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Petr Menšík
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1894531
 
Reported: 2020-11-02 14:34 UTC by Fritz Elfert
Modified: 2020-11-28 02:10 UTC
CC: 9 users

Fixed In Version: bind-9.11.24-2.fc34 bind-9.11.24-2.fc33 bind-9.11.24-2.fc32
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1894531 (view as bug list)
Environment:
Last Closed: 2020-11-09 01:12:16 UTC
Type: Bug


Attachments (Terms of Use)
Config which triggers crash (2.93 KB, text/plain), 2020-11-02 14:36 UTC, Fritz Elfert
config before changes (3.37 KB, text/plain), 2020-11-02 14:37 UTC, Fritz Elfert
Output of coredumpctl debug (11.79 KB, text/plain), 2020-11-02 14:38 UTC, Fritz Elfert


Links
Internet Systems Consortium (ISC) isc-projects bind9 issue 2244, last updated 2020-11-02 17:03:37 UTC
Internet Systems Consortium (ISC) isc-projects bind9 merge request 4353, last updated 2020-11-04 12:55:39 UTC

Description Fritz Elfert 2020-11-02 14:34:02 UTC
Description of problem:
I recently enabled DNSSEC validation in my local caching server, and now it
segfaults approximately every five minutes.

Version-Release number of selected component (if applicable):
bind-9.11.23-1.fc32.x86_64 + bind-libs-9.11.23-1.fc32.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Use the attached named.conf.crashing (partially redacted)

Actual results:
named crashes approximately every 5 minutes. Example line in dmesg:

[1883332.180292] isc-worker0018[2182205]: segfault at 8 ip 00007fc1f082c312 sp 00007fc1e6b4dd60 error 4 in libdns.so.1110.1.0[7fc1f07a9000+1a2000]
[1883332.180315] Code: 48 8b 44 24 18 48 8d b8 a8 00 00 00 e8 f7 74 f8 ff e9 24 ff ff ff 66 90 4c 89 e7 e8 18 83 f8 ff 48 8b 44 24 20 48 8d 74 24 18 <48> 8b 78 08 e8 f5 fa ff ff eb 8c 0f 1f 00 48 8d 0d 4b ed 11 00 31

The Code: line is identical for all crashes.
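The dmesg line above already hints at the cause: "segfault at 8" means the faulting address was 0x8, i.e. a read through a near-NULL pointer. The "error N" value encodes the fault cause; a small decoder (a sketch, using the documented x86 page-fault error-code bits, not part of any kernel tooling):

```python
# Decoder for the x86 page-fault error code printed in kernel
# "segfault at ADDR ... error N" lines. Bit layout (Intel SDM /
# Linux arch/x86 fault handling): bit 0 = protection fault vs.
# page not present, bit 1 = write vs. read, bit 2 = user vs.
# kernel mode.
def decode_segfault_error(code: int) -> str:
    cause = "protection fault" if code & 1 else "page not present"
    access = "write" if code & 2 else "read"
    mode = "user" if code & 4 else "kernel"
    return f"{mode}-mode {access}, {cause}"

# The crash above, "segfault at 8 ... error 4": a user-mode read
# of an unmapped page at address 0x8, i.e. a near-NULL dereference.
print(decode_segfault_error(4))
```

So each crash is reading a field at a small offset from a NULL pointer, consistent with the gdb findings below.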

Expected results:
named should not crash.

Additional info:
I also attach the old working named.conf.good.
Basically, the changes that led to the crash were the following:

1. Remove global forward-only and forwarders, because the router
   at 192.168.101.1 does not support DNSSEC.
2. Change global dnssec-enable and dnssec-validation.
3. Move three forward-only zones into a new view with dnssec-validation
   disabled, because those forwarders do not support DNSSEC.

None of the internal zones are signed yet.

I also installed all necessary debuginfo packages and ran coredumpctl debug;
its output is attached as debug.out.

Comment 1 Fritz Elfert 2020-11-02 14:36:38 UTC
Created attachment 1725839 [details]
Config which triggers crash

Comment 2 Fritz Elfert 2020-11-02 14:37:31 UTC
Created attachment 1725840 [details]
config before changes

Comment 3 Fritz Elfert 2020-11-02 14:38:16 UTC
Created attachment 1725841 [details]
Output of coredumpctl debug

Comment 4 Fritz Elfert 2020-11-02 14:43:33 UTC
Some additional gdb output:

(gdb) print *nta
$2 = {magic = 1314144622, refcount = {refs = 2}, ntatable = 0x7fc1da7f6110, forced = false, timer = 0x7fc1da0f5b58, fetch = 0x0, rdataset = {magic = 1145983826, methods = 0x0, link = {
      prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, rdclass = 0, type = 0, ttl = 0, trust = 0, covers = 0, attributes = 0, count = 4294967295, resign = 0, private1 = 0x0, private2 = 0x0, 
    private3 = 0x0, privateuint4 = 0, private5 = 0x0, private6 = 0x0, private7 = 0x0, stale_ttl = 3200171710}, sigrdataset = {magic = 1145983826, methods = 0x0, link = {prev = 0xffffffffffffffff, 
      next = 0xffffffffffffffff}, rdclass = 0, type = 0, ttl = 0, trust = 0, covers = 0, attributes = 0, count = 4294967295, resign = 0, private1 = 0x0, private2 = 0x0, private3 = 0x0, 
    privateuint4 = 0, private5 = 0x0, private6 = 0x0, private7 = 0x0, stale_ttl = 3200171710}, fn = {name = {magic = 1145983854, ndata = 0x7fc1d9e8b240 "\001\061\002\061\060\ain-addr\004arpa", 
      length = 19, labels = 5, attributes = 1, offsets = 0x7fc1d9e8b180 "", buffer = 0x7fc1d9e8b200, link = {prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, list = {head = 0x0, tail = 0x0}}, 
    offsets = "\000\002\005\r\022", '\276' <repeats 123 times>, buffer = {magic = 1114990113, base = 0x7fc1d9e8b240, length = 255, used = 19, current = 0, active = 0, link = {
        prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, mctx = 0x0, autore = false}, data = "\001\061\002\061\060\ain-addr\004arpa\000", '\276' <repeats 236 times>}, name = 0x7fc1d9e8b130, 
  expiry = 1604494611}
(gdb) print *view
Cannot access memory at address 0x0

So it looks like this function is called with a NULL view.

Comment 5 Fritz Elfert 2020-11-02 14:58:52 UTC
Ahhh, and that reveals my mistake:

Before creating the new view and moving three zones into it, I experimented with

rndc nta add ZONE (where ZONE is one of those moved zones)

I also found in the ISC docs that any negative trust anchor is retained across restarts.
So now, I ran

rndc nta -remove ZONE

and after that, no crashes happen anymore :-)

Still, this should not lead to a crash: attempting to check NTA expiry with a NULL view should be logged
as an error, and the corresponding NTA entry should be removed. Therefore, I'm leaving this open.

Thanks
 -Fritz
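For reference, the NTA workflow described above can be sketched as the following rndc session (run against a live named; the zone name is hypothetical):

```shell
# List the currently configured negative trust anchors in all views.
rndc nta -dump

# Add an NTA for a zone, temporarily suspending DNSSEC validation
# below it (one hour by default; NTAs persist across restarts).
rndc nta example.internal

# Remove the NTA again, e.g. once the zone has moved to a view
# with dnssec-validation disabled.
rndc nta -remove example.internal
```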

Comment 6 Fritz Elfert 2020-11-02 15:28:54 UTC
I have reported this upstream: https://gitlab.isc.org/isc-projects/bind9/-/issues/2244

Comment 7 Petr Menšík 2020-11-02 17:45:11 UTC
lib/dns/nta.c:285

	if (result != ISC_R_SUCCESS) {
		dns_view_weakdetach(&view);
		nta_detach(view->mctx, &nta);
	}

It seems to me that simply reversing the two calls would help, since dns_view_weakdetach() clears the view pointer before view->mctx is read:

	if (result != ISC_R_SUCCESS) {
		nta_detach(view->mctx, &nta);
		dns_view_weakdetach(&view);
	}

Comment 8 Petr Menšík 2020-11-04 12:55:40 UTC
Thank you for your report; a fix has already been merged upstream.

Comment 9 Petr Menšík 2020-11-04 12:58:42 UTC
Lowering priority, because createfetch must fail first; that is not a common situation.

Comment 10 Fedora Update System 2020-11-04 15:41:07 UTC
FEDORA-2020-4fb5288e2c has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-4fb5288e2c

Comment 11 Fedora Update System 2020-11-04 15:42:05 UTC
FEDORA-2020-10f706cd37 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-10f706cd37

Comment 12 Fedora Update System 2020-11-05 02:02:48 UTC
FEDORA-2020-4fb5288e2c has been pushed to the Fedora 32 testing repository.
Shortly, you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-4fb5288e2c`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-4fb5288e2c

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 13 Fedora Update System 2020-11-05 03:28:14 UTC
FEDORA-2020-10f706cd37 has been pushed to the Fedora 33 testing repository.
Shortly, you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-10f706cd37`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-10f706cd37

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 14 Fedora Update System 2020-11-09 01:12:16 UTC
FEDORA-2020-10f706cd37 has been pushed to the Fedora 33 stable repository.
If the problem still persists, please make a note of it in this bug report.

Comment 15 Fedora Update System 2020-11-28 02:10:04 UTC
FEDORA-2020-4fb5288e2c has been pushed to the Fedora 32 stable repository.
If the problem still persists, please make a note of it in this bug report.

