Bug 2048235

Summary: freeipa: bind coredumps with systemctl restart named
Product: [Fedora] Fedora Reporter: Florence Blanc-Renaud <frenaud>
Component: bindAssignee: Petr Menšík <pemensik>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 36CC: abokovoy, aegorenk, anon.amish, dns-sig, mruprich, msehnout, pemensik, vonsch, zdohnal
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: bind-9.16.27-1.fc35 bind-9.16.27-1.fc36 bind-9.16.27-1.fc34 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-03-22 03:42:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2057493    

Description Florence Blanc-Renaud 2022-01-30 12:35:21 UTC
Description of problem:
With the update to bind 9.16.25-2 and bind-dyndb-ldap 11.9-13, bind coredumps in my IPA server.


Version-Release number of selected component (if applicable):
freeipa-server-4.10.0.dev-0.fc36.x86_64
bind-9.16.25-2.fc36.x86_64
bind-dyndb-ldap-11.9-13.fc36.x86_64

How reproducible:
90% of the time

Steps to Reproduce:
1. install freeipa-server-dns package on rawhide
2. configure the server with ipa-server-install --domain ipa.test --realm IPA.TEST --setup-dns --auto-forwarders -a Secret123 -p Secret123 -U
3. restart named with systemctl restart named.service

Actual results:
Sometimes the coredump happens during ipa-server-install (in an internal step restarting all the services with "ipactl restart"), sometimes the coredump happens with "systemctl restart named.service"

Expected results:
The installation and restart should work without any problem.

Additional info:

The issue has already been reported in IPA project, https://pagure.io/freeipa/issue/9102.

Comment 2 Petr Menšík 2022-01-31 13:09:58 UTC
Were able to reproduce it on rawhide. It even crashed gdb partially.

#0  0x00007feed4d94796 in isc_hp_protect (hp=0x560ad4ccc1f0, ihp=<optimized out>, atom=0x560ad4cd2680) at ../../../lib/isc/hp.c:158
        __atomic_store_ptr = <optimized out>
        __atomic_store_tmp = 94604519898192
        n = <optimized out>
        ret = 94604519898192
#1  0x00007feed4dc022a in isc_queue_enqueue (item=<optimized out>, queue=<optimized out>) at ../../../lib/isc/queue.c:142
        lt = 0x0
        idx = <optimized out>
        n = 0
#2  isc_queue_enqueue (queue=0x560ad4cd2600, item=140663467457264) at ../../../lib/isc/queue.c:134
        lt = <optimized out>
        idx = <optimized out>
        n = <optimized out>
        lnext = <optimized out>
        __atomic_load_ptr = <optimized out>
        __atomic_load_tmp = <optimized out>
        __atomic_load_ptr = <optimized out>
        __atomic_load_tmp = <optimized out>
        newnode = <optimized out>
        __atomic_compare_exchange_ptr = <optimized out>
        __atomic_compare_exchange_tmp = <optimized out>
#3  0x00007feed4dab8d5 in isc__nm_enqueue_ievent (worker=0x560ad4ccc710, event=<optimized out>)
    at netmgr/../../../../lib/isc/netmgr/netmgr.c:1116
No locals.
#4  0x00007feed4dc8972 in task_ready (task=0x560ad4cd5490) at ../../../lib/isc/task.c:344
        manager = 0x560ad4cd3890
        manager = <optimized out>
        __v = <optimized out>
#5  isc_task_sendto (task=task@entry=0x560ad4cd5490, eventp=eventp@entry=0x7feed11b5118, c=c@entry=-1) at ../../../lib/isc/task.c:493
        was_idle = <optimized out>
#6  0x00007feed4dc8a6e in isc_task_send (task=task@entry=0x560ad4cd5490, eventp=eventp@entry=0x7feed11b5118)
    at ../../../lib/isc/task.c:449
No locals.
#7  0x00007feed139b988 in sync_event_send (synchronous=true, ev=0x7feed11b5118, task=0x560ad4cd5490, sctx=0x7feecc416b10)
    at /usr/src/debug/bind-dyndb-ldap-11.9-13.fc36.x86_64/src/syncrepl.c:614
        result = <optimized out>
        abs_timeout = {seconds = 3426842816, nanoseconds = 32750}
        seqid = 3
        cleanup = <optimized out>
        result = <optimized out>
        abs_timeout = <optimized out>
        seqid = <optimized out>
        cleanup = <optimized out>
#8  syncrepl_update (inst=inst@entry=0x7feecc416a00, entryp=entryp@entry=0x7feed11b51c8, chgtype=chgtype@entry=1)
    at /usr/src/debug/bind-dyndb-ldap-11.9-13.fc36.x86_64/src/ldap_helper.c:4231
--Type <RET> for more, q to quit, c to continue without paging--
        result = <optimized out>
        pevent = 0x0
        entry = 0x7feec4009160
        zone_name = <optimized out>
        zone_ptr = 0x0
        action = <optimized out>
        task = 0x560ad4cd5490
        synchronous = true
        cleanup = <optimized out>
        __func__ = <optimized out>
#9  0x00007feed139c695 in ldap_sync_search_entry (ls=<optimized out>, msg=<optimized out>, entryUUID=0x7feed11b5330, 
    phase=LDAP_SYNC_CAPI_ADD) at /usr/src/debug/bind-dyndb-ldap-11.9-13.fc36.x86_64/src/ldap_helper.c:4409
        inst = <optimized out>
        old_entry = 0x0
        new_entry = 0x7feec4009160
        result = 0
        node = 0x0
        mldap_open = false
        modrdn = <optimized out>
        cleanup = <optimized out>
        __func__ = "ldap_sync_search_entry"
#10 0x00007feed135e361 in ldap_sync_search_entry (ls=ls@entry=0x7feec40019a0, res=res@entry=0x7feec4009090)
    at /usr/src/debug/openldap-2.4.59-6.fc36.x86_64/openldap-2.4.59/libraries/libldap_r/ldap_sync.c:225
        ctrls = 0x7feec4004ec0
        rc = 80
        i = <optimized out>
        ber = 0x7feec4009100
        entryUUID = {bv_len = 16, bv_val = 0x7feec400ba67 "\363\222݊\202\221\021\354\224|\274\233\264i\003\231"}
        cookie = {bv_len = 0, bv_val = 0x0}
        state = 1
        len = 0
        phase = <optimized out>
        __PRETTY_FUNCTION__ = "ldap_sync_search_entry"
#11 0x00007feed135f0bb in ldap_sync_init (ls=ls@entry=0x7feec40019a0, mode=mode@entry=3)
    at /usr/src/debug/openldap-2.4.59-6.fc36.x86_64/openldap-2.4.59/libraries/libldap_r/ldap_sync.c:790
        refreshDone = 0
        msg = 0x7feec4009090
        ctrl = {ldctl_oid = 0x7feed1365946 "1.3.6.1.4.1.4203.1.9.1.1", ldctl_value = {bv_len = 8, 
            bv_val = 0x7feec4009a80 "0\006\n\001\003\001\001"}, ldctl_iscritical = 1 '\001'}
        ctrls = {0x7feed11b53b0, 0x0}
        ber = 0x7feec4002210
        rc = <optimized out>
        tv = {tv_sec = 0, tv_usec = 100000}
        tvp = <optimized out>
        res = 0x7feec4009090
--Type <RET> for more, q to quit, c to continue without paging--
        __PRETTY_FUNCTION__ = "ldap_sync_init"
#12 0x00007feed139d45d in ldap_sync_doit (inst=inst@entry=0x7feecc416a00, conn=conn@entry=0x7feecc418430, 
    filter_objcs=filter_objcs@entry=0x7feed13aa778 "(|(objectClass=idnsZone)  (objectClass=idnsForwardZone)  (objectClass=idnsRecord))", mode=mode@entry=3) at /usr/src/debug/bind-dyndb-ldap-11.9-13.fc36.x86_64/src/ldap_helper.c:4693
        result = 0
        ret = <optimized out>
        s_len = <optimized out>
        ldap_sync = 0x7feec40019a0
        err_hint = 0x7feed13a5fb8 ""
        filter = "(|  (objectClass=idnsConfigObject)    (&(objectClass=idnsServerConfigObject)    (idnsServerId=ci-vm-10-0-137-13.hosted.upshift.rdu2.redhat.com))(|(objectClass=idnsZone)  (objectClass=idnsForwardZone)  "...
        config_template = "(|  (objectClass=idnsConfigObject)  %s%s%s%s)"
        server_id = 0x7feecc419c28 "ci-vm-10-0-137-13.hosted.upshift.rdu2.redhat.com"
        __func__ = <optimized out>
#13 0x00007feed139d8e6 in ldap_syncrepl_watcher (arg=0x7feecc416a00)
    at /usr/src/debug/bind-dyndb-ldap-11.9-13.fc36.x86_64/src/ldap_helper.c:4785
        inst = 0x7feecc416a00
        conn = 0x7feecc418430
        ret = <optimized out>
        result = <optimized out>
        sigset = {__val = {512, 0 <repeats 15 times>}}
        reconnect_interval = 0
        state = sync_datainit
        retry = <optimized out>
        __func__ = <optimized out>
#14 0x00007feed4dc7ef9 in isc__trampoline_run (arg=0x7feecc469430) at ../../../lib/isc/trampoline.c:198
        trampoline = 0x7feecc469430
        result = <optimized out>
#15 0x00007feed45ad862 in start_thread (arg=<optimized out>) at pthread_create.c:442
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140663723370656, 1579087791515376470, 140663687177792, 0, 140663741666704, 0, 
                -1569599249955546282, -1569606384089687210}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {
              prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#16 0x00007feed4632670 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

I cannot say it is obvious, what is wrong here.

(gdb) disassemble 
Dump of assembler code for function isc_hp_protect:
   0x00007feed4d94750 <+0>:	endbr64 
   0x00007feed4d94754 <+4>:	push   %r13
   0x00007feed4d94756 <+6>:	movslq %esi,%rsi
   0x00007feed4d94759 <+9>:	mov    %rdi,%r13
   0x00007feed4d9475c <+12>:	xor    %eax,%eax
   0x00007feed4d9475e <+14>:	push   %r12
   0x00007feed4d94760 <+16>:	mov    %rdx,%r12
   0x00007feed4d94763 <+19>:	push   %rbp
   0x00007feed4d94764 <+20>:	lea    0x0(,%rsi,8),%rbp
   0x00007feed4d9476c <+28>:	push   %rbx
   0x00007feed4d9476d <+29>:	sub    $0x8,%rsp
   0x00007feed4d94771 <+33>:	jmp    0x7feed4d9479d <isc_hp_protect+77>
   0x00007feed4d94773 <+35>:	nopl   0x0(%rax,%rax,1)
   0x00007feed4d94778 <+40>:	data16 lea 0x6b7b8(%rip),%rdi        # 0x7feed4dfff38
   0x00007feed4d94780 <+48>:	data16 data16 rex.W call 0x7feed4d8f160 <__tls_get_addr@plt>
   0x00007feed4d94788 <+56>:	mov    %rbx,%rcx
   0x00007feed4d9478b <+59>:	movslq (%rax),%rdx
   0x00007feed4d9478e <+62>:	mov    0x10(%r13),%rax
   0x00007feed4d94792 <+66>:	mov    (%rax,%rdx,8),%rax
=> 0x00007feed4d94796 <+70>:	xchg   %rcx,(%rax,%rbp,1)
   0x00007feed4d9479a <+74>:	mov    %rbx,%rax
   0x00007feed4d9479d <+77>:	mov    (%r12),%rbx
   0x00007feed4d947a1 <+81>:	cmp    %rax,%rbx
   0x00007feed4d947a4 <+84>:	jne    0x7feed4d94778 <isc_hp_protect+40>
   0x00007feed4d947a6 <+86>:	add    $0x8,%rsp
   0x00007feed4d947aa <+90>:	pop    %rbx
   0x00007feed4d947ab <+91>:	pop    %rbp
   0x00007feed4d947ac <+92>:	pop    %r12
   0x00007feed4d947ae <+94>:	pop    %r13
   0x00007feed4d947b0 <+96>:	ret    
End of assembler dump.
(gdb) info registers
rax            0xbe                190
rbx            0x560ad4cd7c50      94604519898192
rcx            0x560ad4cd7c50      94604519898192
rdx            0x4                 4
rsi            0x0                 0
rdi            0x7feed4dfff38      140663750393656
rbp            0x0                 0x0
rsp            0x7feed11b5020      0x7feed11b5020
r8             0x0                 0
r9             0x0                 0
r10            0x7feec402be40      140663467458112
r11            0x7feec4000030      140663467278384
r12            0x560ad4cd2680      94604519876224
r13            0x560ad4ccc1f0      94604519850480
r14            0x7feec4009160      140663467315552
r15            0x7feecc416b10      140663605783312
rip            0x7feed4d94796      0x7feed4d94796 <isc_hp_protect+70>
eflags         0x10202             [ IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
k0             0x0                 0
k1             0xf                 15
k2             0xff                255
k3             0x20                32
k4             0x0                 0
k5             0x0                 0
k6             0x0                 0
k7             0x0                 0

It seems some variable is uninitialized 0xbe seems uninitialized constant, rbp is 0

Could be some hp structure uninitialized properly in ldap worker thread?

Comment 3 Petr Menšík 2022-01-31 17:46:41 UTC
Might be caused by recent modification of taskmgr [1]. It is well handled by named, but might not be handled accordingly by bind-dyndb-ldap plugin. It is possible change is required. Mentioned also in later commit 5be356760 [2].

1. https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/4918
2. https://gitlab.isc.org/isc-projects/bind9/commit/5be356760dd6855944234980f2fc0f13130267fb

Comment 4 Petr Menšík 2022-01-31 18:28:49 UTC
I tried copr build [1] on f35 release. It crashes the same way. That means code changes are responsible, not gcc and libraries changes as I thought originally. Comment #3 lists primary candidates. It seems bind-dyndb-ldap needs to be fixed to work with modified code.

Comment 5 Petr Menšík 2022-01-31 23:35:47 UTC
This happens when another thread is loading a zone. Maybe exclusive task semantics has changed? It seems some locks are not at proper places.

(gdb) info threads
  Id   Target Id                         Frame 
  1    Thread 0x7ff863bfe640 (LWP 10357) 0x00007ff86b33a886 in isc_hp_protect (hp=0x55f85dec4c60, ihp=<optimized out>, 
    atom=0x55f85dec3180) at ../../../lib/isc/hp.c:158
  2    Thread 0x7ff86a803dc0 (LWP 10353) 0x00007ff86ac3743a in __GI___sigtimedwait (set=set@entry=0x7ffcf7859e00, 
    info=info@entry=0x7ffcf7859d30, timeout=timeout@entry=0x0) at ../sysdeps/unix/sysv/linux/sigtimedwait.c:61
  3    Thread 0x7ff869327640 (LWP 10355) __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x7ff869326b00, op=393, 
    expected=0, futex_word=0x55f85debdf98) at futex-internal.c:57
  4    Thread 0x7ff868b26640 (LWP 10356) 0x00007ff86ad05c1e in epoll_wait (epfd=16, events=events@entry=0x55f85df38750, 
    maxevents=maxevents@entry=2048, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
* 5    Thread 0x7ff869b28640 (LWP 10354) isc_lex_isfile (lex=lex@entry=0x7ff85c015a30) at ../../../lib/isc/lex.c:1104


(gdb) bt full
#0  isc_lex_isfile (lex=lex@entry=0x7ff85c015a30) at ../../../lib/isc/lex.c:1104
        source = <optimized out>
#1  0x00007ff86b4f79bf in fromtext_warneof (callbacks=0x7ff864072ac8, lexer=0x7ff85c015a30) at ../../../lib/dns/rdata.c:2080
No locals.
#2  dns_rdata_fromtext (rdata=0x0, rdclass=<optimized out>, type=<optimized out>, lexer=0x7ff85c015a30, origin=<optimized out>, 
    options=<optimized out>, mctx=0x55f85df46080, target=0x7ff869b22480, callbacks=0x7ff864072ac8) at ../../../lib/dns/rdata.c:1051
        result = 0
        region = {base = 0x302d7265444e536e <error: Cannot access memory at address 0x302d7265444e536e>, length = 1677734048}
        st = <optimized out>
        token = {type = isc_tokentype_eof, value = {as_char = 48 '0', as_ulong = 140704806808112, as_region = {
              base = 0x7ff864072a30 "", length = 1}, as_textregion = {base = 0x7ff864072a30 "", length = 1}, 
            as_pointer = 0x7ff864072a30}}
        lexoptions = 291
        name = 0x7ff864220978 "buffer-0x7ff869b224c0"
        line = <optimized out>
        callback = 0x7ff86b44a610 <isclog_error_callback>
        tresult = 0
        length = <optimized out>
        unknown = <optimized out>
#3  0x00007ff86b53066f in dns_sdb_putrr (lookup=0x7ff864072a80, type=<optimized out>, ttl=0, data=0x7ff864220808 "@")
    at ../../../lib/dns/sdb.c:374
        datalen = <optimized out>
        typeval = 2
        r = {base = 0x55f85d025c65 "ns", length = 2}
        lex = 0x7ff85c015a30
        result = <optimized out>
        p = 0x7ff8640030a0 "\003\061\061\063\003\061\060\060\aIN-ADDR\004ARPA"
        size = 1024
        mctx = 0x55f85df46080
        imp = <optimized out>
        origin = 0x7ff8642208a8
        b = {magic = 1114990113, base = 0x7ff864220808, length = 1, used = 1, current = 1, active = 0, link = {
            prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, mctx = 0x0, autore = false}
        rb = {magic = 1114990113, base = 0x7ff8640030a0, length = 1024, used = 22, current = 0, active = 0, link = {
            prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, mctx = 0x0, autore = false}
        failure = <optimized out>
#4  0x000055f85cfda93d in builtin_authority (zone=<optimized out>, dbdata=<optimized out>, lookup=0x7ff864072a80)
    at ../../../bin/named/builtin.c:461
        result = <optimized out>
        contact = <optimized out>
        server = 0x7ff864220808 "@"
        b = <optimized out>
#5  0x00007ff86b52c239 in findnodeext (db=0x7ff864220890, name=<optimized out>, create=<optimized out>, methods=<optimized out>, 
    clientinfo=<optimized out>, nodep=0x7ff869b22ac8) at ../../../lib/dns/sdb.c:829
        sdb = 0x7ff864220890
--Type <RET> for more, q to quit, c to continue without paging--
        node = 0x7ff864072a80
        result = 0
        b = {magic = 1114990113, base = 0x7ff869b22630, length = 1024, used = 2, current = 0, active = 0, link = {
            prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, mctx = 0x0, autore = false}
        namestr = "@\000\000\000\000\000\000\000a\305Rk\370\177", '\000' <repeats 11 times>, "\363nI}g\301\300\260&\262i\370\177\000\000\210\376\377\377\377\377\377\377\000\000\000\000\000\000\000\000\000\363nI}g\301\300p\b\"d\370\177\000\000\210\376\377\377\377\377\377\377\000\000\000\000\000\000\000\000\250\000\000\000\000\000\000\000\200,\ad\370\177\000\000\215\353^k\370\177\000\000\243\002\000\000\000\000\000\000\325\033\311j\370\177\000\000\200`\364]\370U\000\000}\337\064k\370\177\000\000\000\000\000\000\000\000\000\000\230`\364]\370U\000\000\200`\364]\370U\000\000}\337\064k\370\177\000\000\352L8k\370\177\000\000\230'\262i\370\177\000\000\060'\262i\370\177\000\000"...
        isorigin = true
        imp = 0x55f85dec0e40
        relname = {magic = 1145983854, ndata = 0x7ff864479ad0 "\003\061\061\063\003\061\060\060\aIN-ADDR\004ARPA", length = 0, 
          labels = 0, attributes = 0, offsets = 0x0, buffer = 0x0, link = {prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, 
          list = {head = 0x0, tail = 0x0}}
        labels = <optimized out>
#6  0x00007ff86b5622a3 in zone_get_from_db (zone=zone@entry=0x7ff864478cc0, db=db@entry=0x7ff864220890, 
    nscount=nscount@entry=0x7ff869b22c30, soacount=soacount@entry=0x7ff869b22c34, soattl=soattl@entry=0x7ff869b22c14, 
    serial=serial@entry=0x7ff869b22c28, refresh=0x7ff869b22c24, retry=0x7ff869b22c20, expire=0x7ff869b22c1c, minimum=0x7ff869b22c18, 
    errors=0x7ff869b22c2c) at ../../../lib/dns/zone.c:5632
        result = <optimized out>
        answer = 0
        version = 0x7ff86b63a774 <dummy>
        node = 0x0
#7  0x00007ff86b562891 in zone_postload (zone=0x7ff864478cc0, db=0x7ff864220890, loadtime=..., result=<optimized out>)
    at ../../../lib/dns/zone.c:4891
        soacount = 0
        nscount = 0
        errors = 0
        serial = 0
        oldserial = 1773285968
        refresh = 0
        retry = 0
        expire = 0
        minimum = 0
        soattl = 0
        now = {seconds = 1643652943, nanoseconds = 618605591}
        needdump = false
        fixjournal = false
        hasinclude = false
        nomaster = <optimized out>
        had_db = false
        inc = <optimized out>
        is_dynamic = false
        done = <optimized out>
--Type <RET> for more, q to quit, c to continue without paging--
#8  0x00007ff86b55bdd9 in zone_load (zone=0x7ff864478cc0, flags=0, locked=locked@entry=true) at ../../../lib/dns/zone.c:2333
        result = <optimized out>
        now = {seconds = 1643652943, nanoseconds = 618605591}
        loadtime = {seconds = 1643652943, nanoseconds = 618605591}
        db = 0x7ff864220890
        rbt = <optimized out>
        hasraw = <optimized out>
        is_dynamic = <optimized out>
#9  0x00007ff86b55c2f5 in zone_asyncload (task=0x55f85df45ea0, event=<optimized out>) at ../../../lib/dns/zone.c:2366
        asl = 0x7ff8640605b0
        zone = 0x7ff864478cc0
        result = <optimized out>
#10 0x00007ff86b37071d in task_run (task=0x55f85df45ea0) at ../../../lib/isc/task.c:850
        dispatch_count = 0
        finished = false
        event = 0x7ff8640605e0
        result = 0
        dispatch_count = <optimized out>
        finished = <optimized out>
        event = <optimized out>
        result = <optimized out>
        done = <optimized out>
        __atomic_load_ptr = <optimized out>
        __atomic_load_tmp = <optimized out>
        __atomic_load_ptr = <optimized out>
        __atomic_load_tmp = <optimized out>
        __atomic_load_ptr = <optimized out>
        __atomic_load_tmp = <optimized out>
        __atomic_load_ptr = <optimized out>
        __atomic_load_tmp = <optimized out>
        __v = <optimized out>
#11 isc_task_run (task=0x55f85df45ea0) at ../../../lib/isc/task.c:943
No locals.
#12 0x00007ff86b35bc09 in isc__nm_async_task (worker=0x55f85debd100, ev0=0x7ff864063de0)
    at netmgr/../../../../lib/isc/netmgr/netmgr.c:855
        ievent = 0x7ff864063de0
        result = <optimized out>
#13 process_netievent (worker=worker@entry=0x55f85debd100, ievent=0x7ff864063de0) at netmgr/../../../../lib/isc/netmgr/netmgr.c:940
No locals.
#14 0x00007ff86b35bd85 in process_queue (worker=worker@entry=0x55f85debd100, type=type@entry=NETIEVENT_TASK)
    at netmgr/../../../../lib/isc/netmgr/netmgr.c:1009
        stop = <optimized out>
        waiting = 17
        ievent = <optimized out>
#15 0x00007ff86b35c577 in process_all_queues (worker=0x55f85debd100) at netmgr/../../../../lib/isc/netmgr/netmgr.c:780
--Type <RET> for more, q to quit, c to continue without paging--
        result = <optimized out>
        type = 2
        reschedule = false
        reschedule = <optimized out>
        type = <optimized out>
        result = <optimized out>
#16 async_cb (handle=0x55f85debd460) at netmgr/../../../../lib/isc/netmgr/netmgr.c:809
        worker = 0x55f85debd100
#17 0x00007ff86b0f6b7d in uv__async_io (loop=0x55f85debd110, w=<optimized out>, events=<optimized out>) at src/unix/async.c:163
        buf = "\001", '\000' <repeats 967 times>...
        r = <optimized out>
        queue = {0x7ff869b245f0, 0x7ff869b245f0}
        q = 0x55f85debd4c8
        h = 0x55f85debd460
        __PRETTY_FUNCTION__ = <optimized out>
#18 0x00007ff86b112a7e in uv__io_poll (loop=0x55f85debd110, timeout=<optimized out>) at src/unix/epoll.c:374
        no_epoll_pwait = <optimized out>
        no_epoll_wait = <optimized out>
        events = {{events = 1, data = {ptr = 0xd, fd = 13, u32 = 13, u64 = 13}}, {events = 0, data = {ptr = 0x0, fd = 0, u32 = 0, 
              u64 = 0}} <repeats 1018 times>, {events = 0, data = {ptr = 0x55f85debd110, fd = 1575735568, u32 = 1575735568, 
              u64 = 94525215985936}}, {events = 0, data = {ptr = 0x5debd2b000000000, fd = 0, u32 = 0, u64 = 6767734518410379264}}, {
            events = 22008, data = {ptr = 0x55f85debd290, fd = 1575735952, u32 = 1575735952, u64 = 94525215986320}}, {events = 1, 
            data = {ptr = 0x600000000, fd = 0, u32 = 0, u64 = 25769803776}}, {events = 0, data = {
              ptr = 0x7ff86accc27d <__GI___clock_gettime+29>, fd = 1791804029, u32 = 1791804029, u64 = 140704920420989}}}
        pe = 0x7ff869b24b20
        e = {events = 0, data = {ptr = 0x0, fd = 0, u32 = 0, u64 = 0}}
        real_timeout = <optimized out>
        q = <optimized out>
        w = 0x55f85debd2d8
        sigset = {__val = {0 <repeats 16 times>}}
        sigmask = <optimized out>
        base = <optimized out>
        have_signals = 0
        nevents = 0
        count = <optimized out>
        nfds = <optimized out>
        fd = <optimized out>
        op = <optimized out>
        i = 0
        user_timeout = <optimized out>
        reset_timeout = <optimized out>
        max_safe_timeout = <optimized out>
        no_epoll_pwait_cached = <optimized out>
        no_epoll_wait_cached = <optimized out>
        __PRETTY_FUNCTION__ = <optimized out>
--Type <RET> for more, q to quit, c to continue without paging--
#19 0x00007ff86b0fc618 in uv__io_poll (timeout=<optimized out>, loop=0x55f85debd110) at src/unix/udp.c:122
        events = <optimized out>
        real_timeout = <optimized out>
        q = <optimized out>
        sigmask = <optimized out>
        no_epoll_pwait = <optimized out>
        w = <optimized out>
        nevents = <optimized out>
        fd = <optimized out>
        op = <optimized out>
        pe = <optimized out>
        user_timeout = <optimized out>
        no_epoll_wait = <optimized out>
        sigset = <optimized out>
        i = <optimized out>
        e = <optimized out>
        base = <optimized out>
        have_signals = <optimized out>
        count = <optimized out>
        nfds = <optimized out>
        reset_timeout = <optimized out>
        update_timeout = <optimized out>
        max_safe_timeout = <optimized out>
        no_epoll_pwait_cached = <optimized out>
        no_epoll_wait_cached = <optimized out>
        no_epoll_pwait = <optimized out>
        no_epoll_wait = <optimized out>
        events = <optimized out>
        pe = <optimized out>
        e = <optimized out>
        real_timeout = <optimized out>
        q = <optimized out>
        w = <optimized out>
        sigset = <optimized out>
        sigmask = <optimized out>
        base = <optimized out>
        have_signals = <optimized out>
        nevents = <optimized out>
        count = <optimized out>
        nfds = <optimized out>
        fd = <optimized out>
        op = <optimized out>
        i = <optimized out>
        user_timeout = <optimized out>
        reset_timeout = <optimized out>
--Type <RET> for more, q to quit, c to continue without paging--
        __PRETTY_FUNCTION__ = <optimized out>
        update_timeout = <optimized out>
        _saved_errno = <optimized out>
        x = <optimized out>
#20 uv_run (loop=loop@entry=0x55f85debd110, mode=mode@entry=UV_RUN_DEFAULT) at src/unix/core.c:389
        timeout = <optimized out>
        r = <optimized out>
        ran_pending = <optimized out>
#21 0x00007ff86b35be3b in nm_thread (worker0=0x55f85debd100) at netmgr/../../../../lib/isc/netmgr/netmgr.c:715
        r = 1
        __atomic_load_ptr = <optimized out>
        __atomic_load_tmp = <optimized out>
        worker = 0x55f85debd100
        mgr = 0x55f85debcfb0
#22 0x00007ff86b36e5da in isc__trampoline_run (arg=0x55f85deb0060) at ../../../lib/isc/trampoline.c:198
        trampoline = 0x55f85deb0060
        result = <optimized out>
#23 0x00007ff86ac81a87 in start_thread (arg=<optimized out>) at pthread_create.c:435
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140704901924416, -3602452523273357970, 140704901924416, 1, 140704920115184, 0, 
                3598754200031655278, 3598746725591348590}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {
              prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#24 0x00007ff86ad06640 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
No locals.

Comment 6 Alexander Bokovoy 2022-02-01 08:52:58 UTC
What I don't understand is how those commits could be responsible if v9_16 branch does not have them -- they only appeared in v9_17 and later.

Comment 7 Petr Menšík 2022-02-01 10:59:13 UTC
(In reply to Alexander Bokovoy from comment #6)
> What I don't understand is how those commits could be responsible if v9_16
> branch does not have them -- they only appeared in v9_17 and later.

Oh, correct. Only commit that references it is present, not the refactoring itself.
There are only two commits touching tasks subsystem in 9.16.25:
https://gitlab.isc.org/isc-projects/bind9/commit/9ec7d78d164991b668cd50371ffe8f9b7b4b5ac3
https://gitlab.isc.org/isc-projects/bind9/commit/5be356760dd6855944234980f2fc0f13130267fb
Which were merged in:
https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/5698

Another candidate might be memory pools changes, which also removed lock support.
https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/5637

Comment 9 Petr Menšík 2022-02-02 14:33:09 UTC
Prepared candidate MR on upstream [1], which would allow additional threads to be used with hazard pointers implementation. It would initialize just first half of possible threads, previous maximum of initialized threads. bind-dyndb-ldap would have to call isc_nm_thread_check() on used thread to have isc_hp_* functions ready. It is intended to be run from thread initializer just single time, instead of checking this every time atomic should be used.

1. https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/5795

Comment 10 Petr Menšík 2022-02-02 14:49:00 UTC
Core of detected issue is with unbound tid() identifier in isc_hp_init(4 * workers);

It initializes only 4 threads, meaning ids 0-3. syncrepl thread is not spawned by task manager itself, so this is different thread. it has isc_tid_v == 4, which reaches behind initialized pointers. It is a bit surprising there does not exist any REQUIRE() check ensuring it would fail in controlled manner. So we crash by using uninitialized memory as pointer.

Comment 11 Ben Cotton 2022-02-08 20:19:55 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 36 development cycle.
Changing version to 36.

Comment 12 Petr Menšík 2022-02-11 16:35:34 UTC
I have built testing build in copr [1].

It requires modification of /etc/sysconfig/named:
OPTIONS=" -E pkcs11 -H 200"

Then systemctl restart ipa passed. It seems to be possible way, but requires also modification of freeipa package to autodetect -H parameter support. Then add new parameter to named options. Unfortunately this cannot be done just in bind-dyndb-ldap, which usually is the only dependency build together with bind. Not yet accepted or refused by upstream.

1. https://copr.fedorainfracloud.org/coprs/pemensik/bind-9.16/build/3440794/

Comment 13 Fedora Update System 2022-03-18 16:13:28 UTC
FEDORA-2022-14e36aac0c has been submitted as an update to Fedora 36. https://bodhi.fedoraproject.org/updates/FEDORA-2022-14e36aac0c

Comment 14 Fedora Update System 2022-03-18 17:09:58 UTC
FEDORA-2022-427cfc50f8 has been submitted as an update to Fedora 35. https://bodhi.fedoraproject.org/updates/FEDORA-2022-427cfc50f8

Comment 15 Fedora Update System 2022-03-18 17:11:00 UTC
FEDORA-2022-042d9c6146 has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2022-042d9c6146

Comment 16 Fedora Update System 2022-03-19 19:27:09 UTC
FEDORA-2022-14e36aac0c has been pushed to the Fedora 36 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-14e36aac0c`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-14e36aac0c

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 17 Fedora Update System 2022-03-19 22:54:50 UTC
FEDORA-2022-042d9c6146 has been pushed to the Fedora 34 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-042d9c6146`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-042d9c6146

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 18 Fedora Update System 2022-03-19 23:16:42 UTC
FEDORA-2022-427cfc50f8 has been pushed to the Fedora 35 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-427cfc50f8`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-427cfc50f8

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 19 Fedora Update System 2022-03-22 03:42:24 UTC
FEDORA-2022-427cfc50f8 has been pushed to the Fedora 35 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 20 Fedora Update System 2022-03-26 15:25:20 UTC
FEDORA-2022-14e36aac0c has been pushed to the Fedora 36 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 21 Fedora Update System 2022-03-27 01:40:29 UTC
FEDORA-2022-042d9c6146 has been pushed to the Fedora 34 stable repository.
If problem still persists, please make note of it in this bug report.