Bug 2131184

Summary: named-pkcs11: exiting (due to assertion failure) dumping core: REQUIRE(rdataset->methods != ((void *)0))
Product: Red Hat Enterprise Linux 7 Reporter: Aleksandr Sharov <asharov>
Component: bindAssignee: Petr Menšík <pemensik>
Status: ASSIGNED --- QA Contact: rhel-cs-infra-services-qe <rhel-cs-infra-services-qe>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.9CC: pemensik, rcritten, sam, tscherf
Target Milestone: rcKeywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1854473, 1891410    
Bug Blocks: 2216096    
Attachments:
Description Flags
named.run none

Description Aleksandr Sharov 2022-09-30 08:26:44 UTC
Created attachment 1915194 [details]
named.run

Description of problem:
This is reopening of BZ https://bugzilla.redhat.com/show_bug.cgi?id=1935152 , packages updated


Sep 28 03:07:04 ipa-1 named-pkcs11[7587]: ../../../lib/dns-pkcs11/rdataset.c:263: REQUIRE(rdataset->methods != ((void *)0)) failed, back trace
Sep 28 03:07:04 ipa-1 named-pkcs11[7587]: #0 0x5630812cf120 in ??
Sep 28 03:07:04 ipa-1 named-pkcs11[7587]: #1 0x7fbf1a50246a in ??
Sep 28 03:07:04 ipa-1 named-pkcs11[7587]: #2 0x7fbf1a8897c8 in ??
Sep 28 03:07:04 ipa-1 named-pkcs11[7587]: #3 0x7fbf1a89b8e0 in ??
Sep 28 03:07:04 ipa-1 named-pkcs11[7587]: #4 0x7fbf1a89df79 in ??
Sep 28 03:07:04 ipa-1 named-pkcs11[7587]: #5 0x7fbf1a8a113c in ??
Sep 28 03:07:04 ipa-1 named-pkcs11[7587]: #6 0x7fbf1a526f6b in ??
Sep 28 03:07:04 ipa-1 named-pkcs11[7587]: #7 0x7fbf18601ea5 in ??
Sep 28 03:07:04 ipa-1 named-pkcs11[7587]: #8 0x7fbf17674b0d in ??
Sep 28 03:07:04 ipa-1 named-pkcs11[7587]: exiting (due to assertion failure)
Sep 28 03:07:04 ipa-1 abrt-hook-ccpp[7808]: Process 7587 (named-pkcs11) of user 25 killed by SIGABRT - dumping core
Sep 28 03:07:06 ipa-1 systemd[1]: named-pkcs11.service: main process exited, code=killed, status=6/ABRT
Sep 28 03:07:06 ipa-1 systemd[1]: Unit named-pkcs11.service entered failed state.
Sep 28 03:07:06 ipa-1 systemd[1]: named-pkcs11.service failed.


Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux Server release 7.9 (Maipo)
bind-pkcs11-9.11.4-26.P2.el7_9.9.x86_64                     Wed Sep 21 18:36:44 2022

How reproducible:
-

Expected results:
named not failing

Additional info:
Core file can be found in attached case

Comment 4 Petr Menšík 2022-09-30 18:46:05 UTC
I don't see any real concurrency in attached core dump. It terminates with REQUIRE(rdataset->methods != ((void *)0)) failed, on line lib/dns-pkcs11/rdataset.c:263, which suggests rdataset in iteration were disassociated. But I don't see any real concurrency. Thread 1 is doing fctx_try and Thread 5 processing ldap change. Other threads seems to be idling.

Comment 6 Petr Menšík 2022-09-30 19:01:06 UTC
It seems the crash logged at beginning of named.run happens during managed-keys priming, the failing query was ./DNSKEY.

28-Sep-2022 03:07:02.458 managed-keys-zone: journal file is out of date: removing journal file
28-Sep-2022 03:07:02.458 managed-keys-zone: loaded serial 54749
28-Sep-2022 03:07:02.458 zone 0.in-addr.arpa/IN: loaded serial 0
28-Sep-2022 03:07:02.459 zone localhost/IN: loaded serial 0
28-Sep-2022 03:07:02.459 zone 1.0.0.127.in-addr.arpa/IN: loaded serial 0
28-Sep-2022 03:07:02.459 zone 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa/IN: loaded serial 0
28-Sep-2022 03:07:02.459 zone localhost.localdomain/IN: loaded serial 0
28-Sep-2022 03:07:02.459 all zones loaded
28-Sep-2022 03:07:02.460 running
28-Sep-2022 03:07:02.480 LDAP configuration for instance 'ipa' synchronized
28-Sep-2022 03:07:02.498 LDAP data for instance 'ipa' are being synchronized, please ignore message 'all zones loaded'
28-Sep-2022 03:07:03.261 network unreachable resolving './NS/IN': 2001:7fd::1#53
28-Sep-2022 03:07:03.261 network unreachable resolving './NS/IN': 2001:500:2f::f#53
28-Sep-2022 03:07:03.261 network unreachable resolving './NS/IN': 2001:500:1::53#53
28-Sep-2022 03:07:03.261 network unreachable resolving './NS/IN': 2001:500:200::b#53
28-Sep-2022 03:07:03.261 network unreachable resolving './NS/IN': 2001:503:c27::2:30#53
28-Sep-2022 03:07:03.261 network unreachable resolving './NS/IN': 2001:dc3::35#53
28-Sep-2022 03:07:03.261 network unreachable resolving './NS/IN': 2001:503:ba3e::2:30#53
28-Sep-2022 03:07:03.261 network unreachable resolving './NS/IN': 2001:7fe::53#53
28-Sep-2022 03:07:03.261 network unreachable resolving './NS/IN': 2001:500:9f::42#53
28-Sep-2022 03:07:03.261 network unreachable resolving './NS/IN': 2001:500:2d::d#53
28-Sep-2022 03:07:03.261 network unreachable resolving './NS/IN': 2001:500:a8::e#53
28-Sep-2022 03:07:03.261 network unreachable resolving './NS/IN': 2001:500:2::c#53
28-Sep-2022 03:07:03.261 network unreachable resolving './NS/IN': 2001:500:12::d0d#53
28-Sep-2022 03:07:04.862 ../../../lib/dns-pkcs11/rdataset.c:263: REQUIRE(rdataset->methods != ((void *)0)) failed, back trace

Comment 7 Petr Menšík 2022-09-30 19:06:19 UTC
This issue might be related to bind-dyndb-ldap, but seems primary in bind component, named-pkcs11 service.

I would recommend the customer using OPTIONS="-4" in /etc/sysconfig/named. When IPv6 is not available on the host, it would save named trying from accessing it. It would make all outgoing queries IPv4 only, because it seems all IPv6 queries are failing anyway on this host. Probably would not avoid hitting this issue, but would save some CPU cycles for important stuff. And log less warnings.

Comment 8 Petr Menšík 2022-09-30 19:11:20 UTC
I would suggest workaround recommended in https://bugzilla.redhat.com/show_bug.cgi?id=1935152#c36

It would not prevent those crashes, but would make them less noticeable.

Comment 9 Petr Menšík 2022-09-30 19:18:45 UTC
There has been bug #1891410 filled for this issue, which has been closed as a duplicate of bug #1891410, which is still open.

Comment 10 Petr Menšík 2022-09-30 19:22:32 UTC
Bug #1854473 is the correct one still open for bind-dyndb-ldap.