Bug 2048235 - freeipa: bind coredumps with systemctl restart named
Summary: freeipa: bind coredumps with systemctl restart named
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: bind
Version: 36
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
Assignee: Petr Menšík
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 2057493
TreeView+ depends on / blocked
 
Reported: 2022-01-30 12:35 UTC by Florence Blanc-Renaud
Modified: 2022-03-27 01:40 UTC (History)
9 users (show)

Fixed In Version: bind-9.16.27-1.fc35 bind-9.16.27-1.fc36 bind-9.16.27-1.fc34
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-22 03:42:24 UTC
Type: Bug


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Fedora Pagure freeipa issue 9102 0 None None None 2022-01-31 18:05:57 UTC
Internet Systems Consortium (ISC) isc-projects bind9 merge_requests 5637 0 None None None 2022-02-01 10:59:13 UTC
Internet Systems Consortium (ISC) isc-projects bind9 merge_requests 5698 0 None None None 2022-02-01 10:59:13 UTC
Internet Systems Consortium (ISC) isc-projects bind9 merge_requests 5795 0 None None None 2022-02-02 14:33:09 UTC

Description Florence Blanc-Renaud 2022-01-30 12:35:21 UTC
Description of problem:
With the update to bind 9.16.25-2 and bind-dyndb-ldap 11.9-13, bind coredumps in my IPA server.


Version-Release number of selected component (if applicable):
freeipa-server-4.10.0.dev-0.fc36.x86_64
bind-9.16.25-2.fc36.x86_64
bind-dyndb-ldap-11.9-13.fc36.x86_64

How reproducible:
90% of the time

Steps to Reproduce:
1. install freeipa-server-dns package on rawhide
2. configure the server with ipa-server-install --domain ipa.test --realm IPA.TEST --setup-dns --auto-forwarders -a Secret123 -p Secret123 -U
3. restart named with systemctl restart named.service

Actual results:
Sometimes the coredump happens during ipa-server-install (in an internal step restarting all the services with "ipactl restart"), sometimes the coredump happens with "systemctl restart named.service"

Expected results:
The installation and restart should work without any problem.

Additional info:

The issue has already been reported in IPA project, https://pagure.io/freeipa/issue/9102.

Comment 2 Petr Menšík 2022-01-31 13:09:58 UTC
Were able to reproduce it on rawhide. It even crashed gdb partially.

#0  0x00007feed4d94796 in isc_hp_protect (hp=0x560ad4ccc1f0, ihp=<optimized out>, atom=0x560ad4cd2680) at ../../../lib/isc/hp.c:158
        __atomic_store_ptr = <optimized out>
        __atomic_store_tmp = 94604519898192
        n = <optimized out>
        ret = 94604519898192
#1  0x00007feed4dc022a in isc_queue_enqueue (item=<optimized out>, queue=<optimized out>) at ../../../lib/isc/queue.c:142
        lt = 0x0
        idx = <optimized out>
        n = 0
#2  isc_queue_enqueue (queue=0x560ad4cd2600, item=140663467457264) at ../../../lib/isc/queue.c:134
        lt = <optimized out>
        idx = <optimized out>
        n = <optimized out>
        lnext = <optimized out>
        __atomic_load_ptr = <optimized out>
        __atomic_load_tmp = <optimized out>
        __atomic_load_ptr = <optimized out>
        __atomic_load_tmp = <optimized out>
        newnode = <optimized out>
        __atomic_compare_exchange_ptr = <optimized out>
        __atomic_compare_exchange_tmp = <optimized out>
#3  0x00007feed4dab8d5 in isc__nm_enqueue_ievent (worker=0x560ad4ccc710, event=<optimized out>)
    at netmgr/../../../../lib/isc/netmgr/netmgr.c:1116
No locals.
#4  0x00007feed4dc8972 in task_ready (task=0x560ad4cd5490) at ../../../lib/isc/task.c:344
        manager = 0x560ad4cd3890
        manager = <optimized out>
        __v = <optimized out>
#5  isc_task_sendto (task=task@entry=0x560ad4cd5490, eventp=eventp@entry=0x7feed11b5118, c=c@entry=-1) at ../../../lib/isc/task.c:493
        was_idle = <optimized out>
#6  0x00007feed4dc8a6e in isc_task_send (task=task@entry=0x560ad4cd5490, eventp=eventp@entry=0x7feed11b5118)
    at ../../../lib/isc/task.c:449
No locals.
#7  0x00007feed139b988 in sync_event_send (synchronous=true, ev=0x7feed11b5118, task=0x560ad4cd5490, sctx=0x7feecc416b10)
    at /usr/src/debug/bind-dyndb-ldap-11.9-13.fc36.x86_64/src/syncrepl.c:614
        result = <optimized out>
        abs_timeout = {seconds = 3426842816, nanoseconds = 32750}
        seqid = 3
        cleanup = <optimized out>
        result = <optimized out>
        abs_timeout = <optimized out>
        seqid = <optimized out>
        cleanup = <optimized out>
#8  syncrepl_update (inst=inst@entry=0x7feecc416a00, entryp=entryp@entry=0x7feed11b51c8, chgtype=chgtype@entry=1)
    at /usr/src/debug/bind-dyndb-ldap-11.9-13.fc36.x86_64/src/ldap_helper.c:4231
--Type <RET> for more, q to quit, c to continue without paging--
        result = <optimized out>
        pevent = 0x0
        entry = 0x7feec4009160
        zone_name = <optimized out>
        zone_ptr = 0x0
        action = <optimized out>
        task = 0x560ad4cd5490
        synchronous = true
        cleanup = <optimized out>
        __func__ = <optimized out>
#9  0x00007feed139c695 in ldap_sync_search_entry (ls=<optimized out>, msg=<optimized out>, entryUUID=0x7feed11b5330, 
    phase=LDAP_SYNC_CAPI_ADD) at /usr/src/debug/bind-dyndb-ldap-11.9-13.fc36.x86_64/src/ldap_helper.c:4409
        inst = <optimized out>
        old_entry = 0x0
        new_entry = 0x7feec4009160
        result = 0
        node = 0x0
        mldap_open = false
        modrdn = <optimized out>
        cleanup = <optimized out>
        __func__ = "ldap_sync_search_entry"
#10 0x00007feed135e361 in ldap_sync_search_entry (ls=ls@entry=0x7feec40019a0, res=res@entry=0x7feec4009090)
    at /usr/src/debug/openldap-2.4.59-6.fc36.x86_64/openldap-2.4.59/libraries/libldap_r/ldap_sync.c:225
        ctrls = 0x7feec4004ec0
        rc = 80
        i = <optimized out>
        ber = 0x7feec4009100
        entryUUID = {bv_len = 16, bv_val = 0x7feec400ba67 "\363\222݊\202\221\021\354\224|\274\233\264i\003\231"}
        cookie = {bv_len = 0, bv_val = 0x0}
        state = 1
        len = 0
        phase = <optimized out>
        __PRETTY_FUNCTION__ = "ldap_sync_search_entry"
#11 0x00007feed135f0bb in ldap_sync_init (ls=ls@entry=0x7feec40019a0, mode=mode@entry=3)
    at /usr/src/debug/openldap-2.4.59-6.fc36.x86_64/openldap-2.4.59/libraries/libldap_r/ldap_sync.c:790
        refreshDone = 0
        msg = 0x7feec4009090
        ctrl = {ldctl_oid = 0x7feed1365946 "1.3.6.1.4.1.4203.1.9.1.1", ldctl_value = {bv_len = 8, 
            bv_val = 0x7feec4009a80 "0\006\n\001\003\001\001"}, ldctl_iscritical = 1 '\001'}
        ctrls = {0x7feed11b53b0, 0x0}
        ber = 0x7feec4002210
        rc = <optimized out>
        tv = {tv_sec = 0, tv_usec = 100000}
        tvp = <optimized out>
        res = 0x7feec4009090
--Type <RET> for more, q to quit, c to continue without paging--
        __PRETTY_FUNCTION__ = "ldap_sync_init"
#12 0x00007feed139d45d in ldap_sync_doit (inst=inst@entry=0x7feecc416a00, conn=conn@entry=0x7feecc418430, 
    filter_objcs=filter_objcs@entry=0x7feed13aa778 "(|(objectClass=idnsZone)  (objectClass=idnsForwardZone)  (objectClass=idnsRecord))", mode=mode@entry=3) at /usr/src/debug/bind-dyndb-ldap-11.9-13.fc36.x86_64/src/ldap_helper.c:4693
        result = 0
        ret = <optimized out>
        s_len = <optimized out>
        ldap_sync = 0x7feec40019a0
        err_hint = 0x7feed13a5fb8 ""
        filter = "(|  (objectClass=idnsConfigObject)    (&(objectClass=idnsServerConfigObject)    (idnsServerId=ci-vm-10-0-137-13.hosted.upshift.rdu2.redhat.com))(|(objectClass=idnsZone)  (objectClass=idnsForwardZone)  "...
        config_template = "(|  (objectClass=idnsConfigObject)  %s%s%s%s)"
        server_id = 0x7feecc419c28 "ci-vm-10-0-137-13.hosted.upshift.rdu2.redhat.com"
        __func__ = <optimized out>
#13 0x00007feed139d8e6 in ldap_syncrepl_watcher (arg=0x7feecc416a00)
    at /usr/src/debug/bind-dyndb-ldap-11.9-13.fc36.x86_64/src/ldap_helper.c:4785
        inst = 0x7feecc416a00
        conn = 0x7feecc418430
        ret = <optimized out>
        result = <optimized out>
        sigset = {__val = {512, 0 <repeats 15 times>}}
        reconnect_interval = 0
        state = sync_datainit
        retry = <optimized out>
        __func__ = <optimized out>
#14 0x00007feed4dc7ef9 in isc__trampoline_run (arg=0x7feecc469430) at ../../../lib/isc/trampoline.c:198
        trampoline = 0x7feecc469430
        result = <optimized out>
#15 0x00007feed45ad862 in start_thread (arg=<optimized out>) at pthread_create.c:442
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140663723370656, 1579087791515376470, 140663687177792, 0, 140663741666704, 0, 
                -1569599249955546282, -1569606384089687210}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {
              prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#16 0x00007feed4632670 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

I cannot say it is obvious, what is wrong here.

(gdb) disassemble 
Dump of assembler code for function isc_hp_protect:
   0x00007feed4d94750 <+0>:	endbr64 
   0x00007feed4d94754 <+4>:	push   %r13
   0x00007feed4d94756 <+6>:	movslq %esi,%rsi
   0x00007feed4d94759 <+9>:	mov    %rdi,%r13
   0x00007feed4d9475c <+12>:	xor    %eax,%eax
   0x00007feed4d9475e <+14>:	push   %r12
   0x00007feed4d94760 <+16>:	mov    %rdx,%r12
   0x00007feed4d94763 <+19>:	push   %rbp
   0x00007feed4d94764 <+20>:	lea    0x0(,%rsi,8),%rbp
   0x00007feed4d9476c <+28>:	push   %rbx
   0x00007feed4d9476d <+29>:	sub    $0x8,%rsp
   0x00007feed4d94771 <+33>:	jmp    0x7feed4d9479d <isc_hp_protect+77>
   0x00007feed4d94773 <+35>:	nopl   0x0(%rax,%rax,1)
   0x00007feed4d94778 <+40>:	data16 lea 0x6b7b8(%rip),%rdi        # 0x7feed4dfff38
   0x00007feed4d94780 <+48>:	data16 data16 rex.W call 0x7feed4d8f160 <__tls_get_addr@plt>
   0x00007feed4d94788 <+56>:	mov    %rbx,%rcx
   0x00007feed4d9478b <+59>:	movslq (%rax),%rdx
   0x00007feed4d9478e <+62>:	mov    0x10(%r13),%rax
   0x00007feed4d94792 <+66>:	mov    (%rax,%rdx,8),%rax
=> 0x00007feed4d94796 <+70>:	xchg   %rcx,(%rax,%rbp,1)
   0x00007feed4d9479a <+74>:	mov    %rbx,%rax
   0x00007feed4d9479d <+77>:	mov    (%r12),%rbx
   0x00007feed4d947a1 <+81>:	cmp    %rax,%rbx
   0x00007feed4d947a4 <+84>:	jne    0x7feed4d94778 <isc_hp_protect+40>
   0x00007feed4d947a6 <+86>:	add    $0x8,%rsp
   0x00007feed4d947aa <+90>:	pop    %rbx
   0x00007feed4d947ab <+91>:	pop    %rbp
   0x00007feed4d947ac <+92>:	pop    %r12
   0x00007feed4d947ae <+94>:	pop    %r13
   0x00007feed4d947b0 <+96>:	ret    
End of assembler dump.
(gdb) info registers
rax            0xbe                190
rbx            0x560ad4cd7c50      94604519898192
rcx            0x560ad4cd7c50      94604519898192
rdx            0x4                 4
rsi            0x0                 0
rdi            0x7feed4dfff38      140663750393656
rbp            0x0                 0x0
rsp            0x7feed11b5020      0x7feed11b5020
r8             0x0                 0
r9             0x0                 0
r10            0x7feec402be40      140663467458112
r11            0x7feec4000030      140663467278384
r12            0x560ad4cd2680      94604519876224
r13            0x560ad4ccc1f0      94604519850480
r14            0x7feec4009160      140663467315552
r15            0x7feecc416b10      140663605783312
rip            0x7feed4d94796      0x7feed4d94796 <isc_hp_protect+70>
eflags         0x10202             [ IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
k0             0x0                 0
k1             0xf                 15
k2             0xff                255
k3             0x20                32
k4             0x0                 0
k5             0x0                 0
k6             0x0                 0
k7             0x0                 0

It seems some variable is uninitialized 0xbe seems uninitialized constant, rbp is 0

Could be some hp structure uninitialized properly in ldap worker thread?

Comment 3 Petr Menšík 2022-01-31 17:46:41 UTC
Might be caused by recent modification of taskmgr [1]. It is well handled by named, but might not be handled accordingly by bind-dyndb-ldap plugin. It is possible change is required. Mentioned also in later commit 5be356760 [2].

1. https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/4918
2. https://gitlab.isc.org/isc-projects/bind9/commit/5be356760dd6855944234980f2fc0f13130267fb

Comment 4 Petr Menšík 2022-01-31 18:28:49 UTC
I tried copr build [1] on f35 release. It crashes the same way. That means code changes are responsible, not gcc and libraries changes as I thought originally. Comment #3 lists primary candidates. It seems bind-dyndb-ldap needs to be fixed to work with modified code.

Comment 5 Petr Menšík 2022-01-31 23:35:47 UTC
This happens when another thread is loading a zone. Maybe exclusive task semantics has changed? It seems some locks are not at proper places.

(gdb) info threads
  Id   Target Id                         Frame 
  1    Thread 0x7ff863bfe640 (LWP 10357) 0x00007ff86b33a886 in isc_hp_protect (hp=0x55f85dec4c60, ihp=<optimized out>, 
    atom=0x55f85dec3180) at ../../../lib/isc/hp.c:158
  2    Thread 0x7ff86a803dc0 (LWP 10353) 0x00007ff86ac3743a in __GI___sigtimedwait (set=set@entry=0x7ffcf7859e00, 
    info=info@entry=0x7ffcf7859d30, timeout=timeout@entry=0x0) at ../sysdeps/unix/sysv/linux/sigtimedwait.c:61
  3    Thread 0x7ff869327640 (LWP 10355) __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x7ff869326b00, op=393, 
    expected=0, futex_word=0x55f85debdf98) at futex-internal.c:57
  4    Thread 0x7ff868b26640 (LWP 10356) 0x00007ff86ad05c1e in epoll_wait (epfd=16, events=events@entry=0x55f85df38750, 
    maxevents=maxevents@entry=2048, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
* 5    Thread 0x7ff869b28640 (LWP 10354) isc_lex_isfile (lex=lex@entry=0x7ff85c015a30) at ../../../lib/isc/lex.c:1104


(gdb) bt full
#0  isc_lex_isfile (lex=lex@entry=0x7ff85c015a30) at ../../../lib/isc/lex.c:1104
        source = <optimized out>
#1  0x00007ff86b4f79bf in fromtext_warneof (callbacks=0x7ff864072ac8, lexer=0x7ff85c015a30) at ../../../lib/dns/rdata.c:2080
No locals.
#2  dns_rdata_fromtext (rdata=0x0, rdclass=<optimized out>, type=<optimized out>, lexer=0x7ff85c015a30, origin=<optimized out>, 
    options=<optimized out>, mctx=0x55f85df46080, target=0x7ff869b22480, callbacks=0x7ff864072ac8) at ../../../lib/dns/rdata.c:1051
        result = 0
        region = {base = 0x302d7265444e536e <error: Cannot access memory at address 0x302d7265444e536e>, length = 1677734048}
        st = <optimized out>
        token = {type = isc_tokentype_eof, value = {as_char = 48 '0', as_ulong = 140704806808112, as_region = {
              base = 0x7ff864072a30 "", length = 1}, as_textregion = {base = 0x7ff864072a30 "", length = 1}, 
            as_pointer = 0x7ff864072a30}}
        lexoptions = 291
        name = 0x7ff864220978 "buffer-0x7ff869b224c0"
        line = <optimized out>
        callback = 0x7ff86b44a610 <isclog_error_callback>
        tresult = 0
        length = <optimized out>
        unknown = <optimized out>
#3  0x00007ff86b53066f in dns_sdb_putrr (lookup=0x7ff864072a80, type=<optimized out>, ttl=0, data=0x7ff864220808 "@")
    at ../../../lib/dns/sdb.c:374
        datalen = <optimized out>
        typeval = 2
        r = {base = 0x55f85d025c65 "ns", length = 2}
        lex = 0x7ff85c015a30
        result = <optimized out>
        p = 0x7ff8640030a0 "\003\061\061\063\003\061\060\060\aIN-ADDR\004ARPA"
        size = 1024
        mctx = 0x55f85df46080
        imp = <optimized out>
        origin = 0x7ff8642208a8
        b = {magic = 1114990113, base = 0x7ff864220808, length = 1, used = 1, current = 1, active = 0, link = {
            prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, mctx = 0x0, autore = false}
        rb = {magic = 1114990113, base = 0x7ff8640030a0, length = 1024, used = 22, current = 0, active = 0, link = {
            prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, mctx = 0x0, autore = false}
        failure = <optimized out>
#4  0x000055f85cfda93d in builtin_authority (zone=<optimized out>, dbdata=<optimized out>, lookup=0x7ff864072a80)
    at ../../../bin/named/builtin.c:461
        result = <optimized out>
        contact = <optimized out>
        server = 0x7ff864220808 "@"
        b = <optimized out>
#5  0x00007ff86b52c239 in findnodeext (db=0x7ff864220890, name=<optimized out>, create=<optimized out>, methods=<optimized out>, 
    clientinfo=<optimized out>, nodep=0x7ff869b22ac8) at ../../../lib/dns/sdb.c:829
        sdb = 0x7ff864220890
--Type <RET> for more, q to quit, c to continue without paging--
        node = 0x7ff864072a80
        result = 0
        b = {magic = 1114990113, base = 0x7ff869b22630, length = 1024, used = 2, current = 0, active = 0, link = {
            prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, mctx = 0x0, autore = false}
        namestr = "@\000\000\000\000\000\000\000a\305Rk\370\177", '\000' <repeats 11 times>, "\363nI}g\301\300\260&\262i\370\177\000\000\210\376\377\377\377\377\377\377\000\000\000\000\000\000\000\000\000\363nI}g\301\300p\b\"d\370\177\000\000\210\376\377\377\377\377\377\377\000\000\000\000\000\000\000\000\250\000\000\000\000\000\000\000\200,\ad\370\177\000\000\215\353^k\370\177\000\000\243\002\000\000\000\000\000\000\325\033\311j\370\177\000\000\200`\364]\370U\000\000}\337\064k\370\177\000\000\000\000\000\000\000\000\000\000\230`\364]\370U\000\000\200`\364]\370U\000\000}\337\064k\370\177\000\000\352L8k\370\177\000\000\230'\262i\370\177\000\000\060'\262i\370\177\000\000"...
        isorigin = true
        imp = 0x55f85dec0e40
        relname = {magic = 1145983854, ndata = 0x7ff864479ad0 "\003\061\061\063\003\061\060\060\aIN-ADDR\004ARPA", length = 0, 
          labels = 0, attributes = 0, offsets = 0x0, buffer = 0x0, link = {prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, 
          list = {head = 0x0, tail = 0x0}}
        labels = <optimized out>
#6  0x00007ff86b5622a3 in zone_get_from_db (zone=zone@entry=0x7ff864478cc0, db=db@entry=0x7ff864220890, 
    nscount=nscount@entry=0x7ff869b22c30, soacount=soacount@entry=0x7ff869b22c34, soattl=soattl@entry=0x7ff869b22c14, 
    serial=serial@entry=0x7ff869b22c28, refresh=0x7ff869b22c24, retry=0x7ff869b22c20, expire=0x7ff869b22c1c, minimum=0x7ff869b22c18, 
    errors=0x7ff869b22c2c) at ../../../lib/dns/zone.c:5632
        result = <optimized out>
        answer = 0
        version = 0x7ff86b63a774 <dummy>
        node = 0x0
#7  0x00007ff86b562891 in zone_postload (zone=0x7ff864478cc0, db=0x7ff864220890, loadtime=..., result=<optimized out>)
    at ../../../lib/dns/zone.c:4891
        soacount = 0
        nscount = 0
        errors = 0
        serial = 0
        oldserial = 1773285968
        refresh = 0
        retry = 0
        expire = 0
        minimum = 0
        soattl = 0
        now = {seconds = 1643652943, nanoseconds = 618605591}
        needdump = false
        fixjournal = false
        hasinclude = false
        nomaster = <optimized out>
        had_db = false
        inc = <optimized out>
        is_dynamic = false
        done = <optimized out>
--Type <RET> for more, q to quit, c to continue without paging--
#8  0x00007ff86b55bdd9 in zone_load (zone=0x7ff864478cc0, flags=0, locked=locked@entry=true) at ../../../lib/dns/zone.c:2333
        result = <optimized out>
        now = {seconds = 1643652943, nanoseconds = 618605591}
        loadtime = {seconds = 1643652943, nanoseconds = 618605591}
        db = 0x7ff864220890
        rbt = <optimized out>
        hasraw = <optimized out>
        is_dynamic = <optimized out>
#9  0x00007ff86b55c2f5 in zone_asyncload (task=0x55f85df45ea0, event=<optimized out>) at ../../../lib/dns/zone.c:2366
        asl = 0x7ff8640605b0
        zone = 0x7ff864478cc0
        result = <optimized out>
#10 0x00007ff86b37071d in task_run (task=0x55f85df45ea0) at ../../../lib/isc/task.c:850
        dispatch_count = 0
        finished = false
        event = 0x7ff8640605e0
        result = 0
        dispatch_count = <optimized out>
        finished = <optimized out>
        event = <optimized out>
        result = <optimized out>
        done = <optimized out>
        __atomic_load_ptr = <optimized out>
        __atomic_load_tmp = <optimized out>
        __atomic_load_ptr = <optimized out>
        __atomic_load_tmp = <optimized out>
        __atomic_load_ptr = <optimized out>
        __atomic_load_tmp = <optimized out>
        __atomic_load_ptr = <optimized out>
        __atomic_load_tmp = <optimized out>
        __v = <optimized out>
#11 isc_task_run (task=0x55f85df45ea0) at ../../../lib/isc/task.c:943
No locals.
#12 0x00007ff86b35bc09 in isc__nm_async_task (worker=0x55f85debd100, ev0=0x7ff864063de0)
    at netmgr/../../../../lib/isc/netmgr/netmgr.c:855
        ievent = 0x7ff864063de0
        result = <optimized out>
#13 process_netievent (worker=worker@entry=0x55f85debd100, ievent=0x7ff864063de0) at netmgr/../../../../lib/isc/netmgr/netmgr.c:940
No locals.
#14 0x00007ff86b35bd85 in process_queue (worker=worker@entry=0x55f85debd100, type=type@entry=NETIEVENT_TASK)
    at netmgr/../../../../lib/isc/netmgr/netmgr.c:1009
        stop = <optimized out>
        waiting = 17
        ievent = <optimized out>
#15 0x00007ff86b35c577 in process_all_queues (worker=0x55f85debd100) at netmgr/../../../../lib/isc/netmgr/netmgr.c:780
--Type <RET> for more, q to quit, c to continue without paging--
        result = <optimized out>
        type = 2
        reschedule = false
        reschedule = <optimized out>
        type = <optimized out>
        result = <optimized out>
#16 async_cb (handle=0x55f85debd460) at netmgr/../../../../lib/isc/netmgr/netmgr.c:809
        worker = 0x55f85debd100
#17 0x00007ff86b0f6b7d in uv__async_io (loop=0x55f85debd110, w=<optimized out>, events=<optimized out>) at src/unix/async.c:163
        buf = "\001", '\000' <repeats 967 times>...
        r = <optimized out>
        queue = {0x7ff869b245f0, 0x7ff869b245f0}
        q = 0x55f85debd4c8
        h = 0x55f85debd460
        __PRETTY_FUNCTION__ = <optimized out>
#18 0x00007ff86b112a7e in uv__io_poll (loop=0x55f85debd110, timeout=<optimized out>) at src/unix/epoll.c:374
        no_epoll_pwait = <optimized out>
        no_epoll_wait = <optimized out>
        events = {{events = 1, data = {ptr = 0xd, fd = 13, u32 = 13, u64 = 13}}, {events = 0, data = {ptr = 0x0, fd = 0, u32 = 0, 
              u64 = 0}} <repeats 1018 times>, {events = 0, data = {ptr = 0x55f85debd110, fd = 1575735568, u32 = 1575735568, 
              u64 = 94525215985936}}, {events = 0, data = {ptr = 0x5debd2b000000000, fd = 0, u32 = 0, u64 = 6767734518410379264}}, {
            events = 22008, data = {ptr = 0x55f85debd290, fd = 1575735952, u32 = 1575735952, u64 = 94525215986320}}, {events = 1, 
            data = {ptr = 0x600000000, fd = 0, u32 = 0, u64 = 25769803776}}, {events = 0, data = {
              ptr = 0x7ff86accc27d <__GI___clock_gettime+29>, fd = 1791804029, u32 = 1791804029, u64 = 140704920420989}}}
        pe = 0x7ff869b24b20
        e = {events = 0, data = {ptr = 0x0, fd = 0, u32 = 0, u64 = 0}}
        real_timeout = <optimized out>
        q = <optimized out>
        w = 0x55f85debd2d8
        sigset = {__val = {0 <repeats 16 times>}}
        sigmask = <optimized out>
        base = <optimized out>
        have_signals = 0
        nevents = 0
        count = <optimized out>
        nfds = <optimized out>
        fd = <optimized out>
        op = <optimized out>
        i = 0
        user_timeout = <optimized out>
        reset_timeout = <optimized out>
        max_safe_timeout = <optimized out>
        no_epoll_pwait_cached = <optimized out>
        no_epoll_wait_cached = <optimized out>
        __PRETTY_FUNCTION__ = <optimized out>
--Type <RET> for more, q to quit, c to continue without paging--
#19 0x00007ff86b0fc618 in uv__io_poll (timeout=<optimized out>, loop=0x55f85debd110) at src/unix/udp.c:122
        events = <optimized out>
        real_timeout = <optimized out>
        q = <optimized out>
        sigmask = <optimized out>
        no_epoll_pwait = <optimized out>
        w = <optimized out>
        nevents = <optimized out>
        fd = <optimized out>
        op = <optimized out>
        pe = <optimized out>
        user_timeout = <optimized out>
        no_epoll_wait = <optimized out>
        sigset = <optimized out>
        i = <optimized out>
        e = <optimized out>
        base = <optimized out>
        have_signals = <optimized out>
        count = <optimized out>
        nfds = <optimized out>
        reset_timeout = <optimized out>
        update_timeout = <optimized out>
        max_safe_timeout = <optimized out>
        no_epoll_pwait_cached = <optimized out>
        no_epoll_wait_cached = <optimized out>
        no_epoll_pwait = <optimized out>
        no_epoll_wait = <optimized out>
        events = <optimized out>
        pe = <optimized out>
        e = <optimized out>
        real_timeout = <optimized out>
        q = <optimized out>
        w = <optimized out>
        sigset = <optimized out>
        sigmask = <optimized out>
        base = <optimized out>
        have_signals = <optimized out>
        nevents = <optimized out>
        count = <optimized out>
        nfds = <optimized out>
        fd = <optimized out>
        op = <optimized out>
        i = <optimized out>
        user_timeout = <optimized out>
        reset_timeout = <optimized out>
--Type <RET> for more, q to quit, c to continue without paging--
        __PRETTY_FUNCTION__ = <optimized out>
        update_timeout = <optimized out>
        _saved_errno = <optimized out>
        x = <optimized out>
#20 uv_run (loop=loop@entry=0x55f85debd110, mode=mode@entry=UV_RUN_DEFAULT) at src/unix/core.c:389
        timeout = <optimized out>
        r = <optimized out>
        ran_pending = <optimized out>
#21 0x00007ff86b35be3b in nm_thread (worker0=0x55f85debd100) at netmgr/../../../../lib/isc/netmgr/netmgr.c:715
        r = 1
        __atomic_load_ptr = <optimized out>
        __atomic_load_tmp = <optimized out>
        worker = 0x55f85debd100
        mgr = 0x55f85debcfb0
#22 0x00007ff86b36e5da in isc__trampoline_run (arg=0x55f85deb0060) at ../../../lib/isc/trampoline.c:198
        trampoline = 0x55f85deb0060
        result = <optimized out>
#23 0x00007ff86ac81a87 in start_thread (arg=<optimized out>) at pthread_create.c:435
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140704901924416, -3602452523273357970, 140704901924416, 1, 140704920115184, 0, 
                3598754200031655278, 3598746725591348590}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {
              prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#24 0x00007ff86ad06640 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
No locals.

Comment 6 Alexander Bokovoy 2022-02-01 08:52:58 UTC
What I don't understand is how those commits could be responsible if v9_16 branch does not have them -- they only appeared in v9_17 and later.

Comment 7 Petr Menšík 2022-02-01 10:59:13 UTC
(In reply to Alexander Bokovoy from comment #6)
> What I don't understand is how those commits could be responsible if v9_16
> branch does not have them -- they only appeared in v9_17 and later.

Oh, correct. Only commit that references it is present, not the refactoring itself.
There are only two commits touching tasks subsystem in 9.16.25:
https://gitlab.isc.org/isc-projects/bind9/commit/9ec7d78d164991b668cd50371ffe8f9b7b4b5ac3
https://gitlab.isc.org/isc-projects/bind9/commit/5be356760dd6855944234980f2fc0f13130267fb
Which were merged in:
https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/5698

Another candidate might be memory pools changes, which also removed lock support.
https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/5637

Comment 9 Petr Menšík 2022-02-02 14:33:09 UTC
Prepared candidate MR on upstream [1], which would allow additional threads to be used with hazard pointers implementation. It would initialize just first half of possible threads, previous maximum of initialized threads. bind-dyndb-ldap would have to call isc_nm_thread_check() on used thread to have isc_hp_* functions ready. It is intended to be run from thread initializer just single time, instead of checking this every time atomic should be used.

1. https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/5795

Comment 10 Petr Menšík 2022-02-02 14:49:00 UTC
Core of detected issue is with unbound tid() identifier in isc_hp_init(4 * workers);

It initializes only 4 threads, meaning ids 0-3. syncrepl thread is not spawned by task manager itself, so this is different thread. it has isc_tid_v == 4, which reaches behind initialized pointers. It is a bit surprising there does not exist any REQUIRE() check ensuring it would fail in controlled manner. So we crash by using uninitialized memory as pointer.

Comment 11 Ben Cotton 2022-02-08 20:19:55 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 36 development cycle.
Changing version to 36.

Comment 12 Petr Menšík 2022-02-11 16:35:34 UTC
I have built testing build in copr [1].

It requires modification of /etc/sysconfig/named:
OPTIONS=" -E pkcs11 -H 200"

Then systemctl restart ipa passed. It seems to be possible way, but requires also modification of freeipa package to autodetect -H parameter support. Then add new parameter to named options. Unfortunately this cannot be done just in bind-dyndb-ldap, which usually is the only dependency build together with bind. Not yet accepted or refused by upstream.

1. https://copr.fedorainfracloud.org/coprs/pemensik/bind-9.16/build/3440794/

Comment 13 Fedora Update System 2022-03-18 16:13:28 UTC
FEDORA-2022-14e36aac0c has been submitted as an update to Fedora 36. https://bodhi.fedoraproject.org/updates/FEDORA-2022-14e36aac0c

Comment 14 Fedora Update System 2022-03-18 17:09:58 UTC
FEDORA-2022-427cfc50f8 has been submitted as an update to Fedora 35. https://bodhi.fedoraproject.org/updates/FEDORA-2022-427cfc50f8

Comment 15 Fedora Update System 2022-03-18 17:11:00 UTC
FEDORA-2022-042d9c6146 has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2022-042d9c6146

Comment 16 Fedora Update System 2022-03-19 19:27:09 UTC
FEDORA-2022-14e36aac0c has been pushed to the Fedora 36 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-14e36aac0c`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-14e36aac0c

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 17 Fedora Update System 2022-03-19 22:54:50 UTC
FEDORA-2022-042d9c6146 has been pushed to the Fedora 34 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-042d9c6146`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-042d9c6146

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 18 Fedora Update System 2022-03-19 23:16:42 UTC
FEDORA-2022-427cfc50f8 has been pushed to the Fedora 35 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-427cfc50f8`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-427cfc50f8

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 19 Fedora Update System 2022-03-22 03:42:24 UTC
FEDORA-2022-427cfc50f8 has been pushed to the Fedora 35 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 20 Fedora Update System 2022-03-26 15:25:20 UTC
FEDORA-2022-14e36aac0c has been pushed to the Fedora 36 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 21 Fedora Update System 2022-03-27 01:40:29 UTC
FEDORA-2022-042d9c6146 has been pushed to the Fedora 34 stable repository.
If problem still persists, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.