Bug 1894531 - SEGFAULT in libdns
Summary: SEGFAULT in libdns
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: bind
Version: 8.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 8.4
Assignee: Petr Menšík
QA Contact: Petr Sklenar
URL:
Whiteboard:
Depends On: bind_rebase_el840 1893761
Blocks:
 
Reported: 2020-11-04 13:01 UTC by Petr Menšík
Modified: 2021-05-18 14:59 UTC
CC List: 6 users

Fixed In Version: bind-9.11.26-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1893761
Environment:
Last Closed: 2021-05-18 14:59:05 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2021:1645  0        None      None    None     2021-05-18 14:59:31 UTC

Description Petr Menšík 2020-11-04 13:01:54 UTC
+++ This bug was initially created as a clone of Bug #1893761 +++

Description of problem:
I recently enabled DNSSEC validation on my local caching server, and now it
segfaults approximately every 5 minutes.

Version-Release number of selected component (if applicable):
bind-9.11.23-1.fc32.x86_64 + bind-libs-9.11.23-1.fc32.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Use attached named.conf.crashing (partially redacted)

Actual results:
named crashes approximately every 5 minutes. Example line in dmesg:

[1883332.180292] isc-worker0018[2182205]: segfault at 8 ip 00007fc1f082c312 sp 00007fc1e6b4dd60 error 4 in libdns.so.1110.1.0[7fc1f07a9000+1a2000]
[1883332.180315] Code: 48 8b 44 24 18 48 8d b8 a8 00 00 00 e8 f7 74 f8 ff e9 24 ff ff ff 66 90 4c 89 e7 e8 18 83 f8 ff 48 8b 44 24 20 48 8d 74 24 18 <48> 8b 78 08 e8 f5 fa ff ff eb 8c 0f 1f 00 48 8d 0d 4b ed 11 00 31

The "Code: ..." output is identical for all crashes.

Expected results:
Should not crash

Additional info:
I am also attaching the old, working named.conf.good.
Basically, the changes that led to the crash were the following:

1. Remove global forward-only and forwarders, because the router
   at 192.168.101.1 does not support DNSSEC.
2. Change global dnssec-enable and dnssec-validation.
3. Move 3 forward-only zones into a new view with dnssec-validation
   disabled, because those forwarders do not support DNSSEC either
   (a rough sketch follows below).

None of the internal zones are signed yet.
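
For illustration only, here is a rough sketch of what such a view might look like. The
zone name, client range, and forwarder address are placeholders; the actual, partially
redacted configuration is in the attached named.conf.crashing:

    view "forwarded" {
        match-clients { 192.168.0.0/16; };   // placeholder client range
        dnssec-validation no;                // validation disabled only in this view
        zone "example.internal" {            // placeholder forward-only zone
            type forward;
            forward only;
            forwarders { 192.0.2.53; };      // placeholder forwarder without DNSSEC support
        };
    };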

I also installed all necessary debuginfo packages and ran coredumpctl debug.
Its output is attached as debug.out.

--- Additional comment from Fritz Elfert on 2020-11-02 15:43:33 CET ---

Some additional gdb output:

(gdb) print *nta
$2 = {magic = 1314144622, refcount = {refs = 2}, ntatable = 0x7fc1da7f6110, forced = false, timer = 0x7fc1da0f5b58, fetch = 0x0, rdataset = {magic = 1145983826, methods = 0x0, link = {
      prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, rdclass = 0, type = 0, ttl = 0, trust = 0, covers = 0, attributes = 0, count = 4294967295, resign = 0, private1 = 0x0, private2 = 0x0, 
    private3 = 0x0, privateuint4 = 0, private5 = 0x0, private6 = 0x0, private7 = 0x0, stale_ttl = 3200171710}, sigrdataset = {magic = 1145983826, methods = 0x0, link = {prev = 0xffffffffffffffff, 
      next = 0xffffffffffffffff}, rdclass = 0, type = 0, ttl = 0, trust = 0, covers = 0, attributes = 0, count = 4294967295, resign = 0, private1 = 0x0, private2 = 0x0, private3 = 0x0, 
    privateuint4 = 0, private5 = 0x0, private6 = 0x0, private7 = 0x0, stale_ttl = 3200171710}, fn = {name = {magic = 1145983854, ndata = 0x7fc1d9e8b240 "\001\061\002\061\060\ain-addr\004arpa", 
      length = 19, labels = 5, attributes = 1, offsets = 0x7fc1d9e8b180 "", buffer = 0x7fc1d9e8b200, link = {prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, list = {head = 0x0, tail = 0x0}}, 
    offsets = "\000\002\005\r\022", '\276' <repeats 123 times>, buffer = {magic = 1114990113, base = 0x7fc1d9e8b240, length = 255, used = 19, current = 0, active = 0, link = {
        prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, mctx = 0x0, autore = false}, data = "\001\061\002\061\060\ain-addr\004arpa\000", '\276' <repeats 236 times>}, name = 0x7fc1d9e8b130, 
  expiry = 1604494611}
(gdb) print *view
Cannot access memory at address 0x0

So it looks like this function is called with a NULL view pointer.

--- Additional comment from Fritz Elfert on 2020-11-02 15:58:52 CET ---

Ahhh, and that reveals my mistake:

Before creating the new view and moving three zones into it, I experimented with

rndc nta add ZONE (where ZONE is one of those moved zones)

I also found in the ISC docs that any negative trust anchor is retained across restarts.
So I then ran

rndc nta -remove ZONE

After that, no crashes happen anymore :-)
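
(For reference, the negative trust anchors currently in effect should be listable with

rndc nta -dump

which makes it easy to confirm that no stale entry is left over after a -remove.)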

Still, this should not lead to a crash; attempting to check NTA expiry with a NULL view should be logged
as an error and the corresponding NTA entry should be removed. Therefore, I am leaving this open.

Thanks
 -Fritz

--- Additional comment from Fritz Elfert on 2020-11-02 16:28:54 CET ---

I have reported this upstream: https://gitlab.isc.org/isc-projects/bind9/-/issues/2244

--- Additional comment from Petr Menšík on 2020-11-02 18:45:11 CET ---

lib/dns/nta.c:285

	if (result != ISC_R_SUCCESS) {
		dns_view_weakdetach(&view);
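		/* note: dns_view_weakdetach() sets view to NULL here, so the
		 * nta_detach() call below dereferences view->mctx through a NULL pointer */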
		nta_detach(view->mctx, &nta);
	}

It seems to me that simply reversing the two calls would help:

	if (result != ISC_R_SUCCESS) {
		nta_detach(view->mctx, &nta);
		dns_view_weakdetach(&view);
	}
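
(In general, the detach helpers in BIND set the pointer they are given to NULL, so anything
that still needs the object, such as view->mctx above, has to be used before the corresponding
detach call.)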

--- Additional comment from Petr Menšík on 2020-11-04 13:55:40 CET ---

Thank you for your report; the fix has already been merged upstream.

--- Additional comment from Petr Menšík on 2020-11-04 13:58:42 CET ---

Lowering priority, because createfetch must fail first; that is not a common situation.

Comment 6 errata-xmlrpc 2021-05-18 14:59:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (bind bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1645

