In Fedora-Rawhide-20201119.n.0 , both the FreeIPA upgrade tests (from 32 and 33) are failing. bind on the server crashes shortly after startup, leaving the clients unable to resolve names. This is the backtrace of the crash: === Thread 6 (Thread 0x7f50c7b28640 (LWP 1019)): #0 futex_wait_cancelable (private=0, expected=0, futex_word=0x7f50c832f0cc) at ../sysdeps/nptl/futex-internal.h:187 __ret = -512 oldtype = 0 err = <optimized out> oldtype = <optimized out> err = <optimized out> __ret = <optimized out> resultvar = <optimized out> __arg4 = <optimized out> __arg3 = <optimized out> __arg2 = <optimized out> __arg1 = <optimized out> _a4 = <optimized out> _a3 = <optimized out> _a2 = <optimized out> _a1 = <optimized out> #1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7f50c832f028, cond=0x7f50c832f0a0) at pthread_cond_wait.c:507 spin = 0 buffer = {__routine = 0x7f50ca06b1e0 <__condvar_cleanup_waiting>, __arg = 0x7f50c7b27ca0, __canceltype = 0, __prev = 0x0} cbuffer = {wseq = 55, cond = 0x7f50c832f0a0, mutex = 0x7f50c832f028, private = 0} err = <optimized out> g = 3350363264 flags = <optimized out> g1_start = <optimized out> maxspin = 0 signals = <optimized out> result = 0 wseq = 55 seq = 27 private = 0 maxspin = <optimized out> err = <optimized out> result = <optimized out> wseq = <optimized out> g = <optimized out> seq = <optimized out> flags = <optimized out> private = <optimized out> signals = <optimized out> done = <optimized out> g1_start = <optimized out> spin = <optimized out> buffer = {__routine = <optimized out>, __arg = <optimized out>, __canceltype = <optimized out>, __prev = <optimized out>} cbuffer = {wseq = <optimized out>, cond = <optimized out>, mutex = <optimized out>, private = <optimized out>} s = <optimized out> #2 __pthread_cond_wait (cond=0x7f50c832f0a0, mutex=0x7f50c832f028) at pthread_cond_wait.c:637 No locals. #3 0x00007f50ca41efe2 in dispatch (manager=0x7f50c832f010) at ../../../lib/isc/task.c:1103 task = <optimized out> task = <optimized out> dispatch_count = <optimized out> done = <optimized out> requeue = <optimized out> finished = <optimized out> event = <optimized out> was_idle = <optimized out> #4 run (uap=0x7f50c832f010) at ../../../lib/isc/task.c:1331 manager = 0x7f50c832f010 #5 0x00007f50ca0653f9 in start_thread (arg=0x7f50c7b28640) at pthread_create.c:473 ret = <optimized out> pd = 0x7f50c7b28640 unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139984924476992, 1023172103268482320, 140726763984366, 140726763984367, 0, 139984924476992, -1111677725453042416, -1111652881951867632}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = 0 #6 0x00007f50c9e07bb3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 No locals. Thread 5 (Thread 0x7f50c57b1640 (LWP 1022)): #0 0x00007f50c9dfc96f in __GI___poll (fds=0x7f50b8004b04, nfds=1, timeout=100) at ../sysdeps/unix/sysv/linux/poll.c:29 sc_ret = -516 sc_cancel_oldtype = 0 sc_ret = <optimized out> #1 0x00007f50c5ad19dd in wait4msg (result=0x7f50c57b04b8, timeout=<optimized out>, all=<optimized out>, msgid=<optimized out>, ld=0x7f50b8000bf0) at result.c:315 err = <optimized out> lc_ready = 0 tv = {tv_sec = 0, tv_usec = 100000} start_time_tv = {tv_sec = 1605786573, tv_usec = 766085} rc = -2 tv0 = {tv_sec = 0, tv_usec = 100000} tvp = 0x7f50c57b02c0 lc = 0x0 rc = <optimized out> tv = {tv_sec = <optimized out>, tv_usec = <optimized out>} tv0 = {tv_sec = <optimized out>, tv_usec = <optimized out>} start_time_tv = {tv_sec = <optimized out>, tv_usec = <optimized out>} tvp = <optimized out> lc = <optimized out> __PRETTY_FUNCTION__ = {<optimized out>, <optimized out>, <optimized out>, <optimized out>, <optimized out>, <optimized out>, <optimized out>, <optimized out>, <optimized out>} lc_ready = <optimized out> err = <optimized out> lnext = <optimized out> serviced = <optimized out> curr_time_tv = {tv_sec = <optimized out>, tv_usec = <optimized out>} delta_time_tv = {tv_sec = <optimized out>, tv_usec = <optimized out>} #2 ldap_result (ld=0x7f50b8000bf0, msgid=<optimized out>, all=<optimized out>, timeout=<optimized out>, result=0x7f50c57b04b8) at result.c:120 rc = <optimized out> __PRETTY_FUNCTION__ = "ldap_result" #3 0x00007f50c5af83b5 in ldap_sync_init (ls=ls@entry=0x7f50b800a360, mode=mode@entry=3) at ldap_sync.c:760 msg = 0x0 ctrl = {ldctl_oid = 0x7f50c5afeb69 "1.3.6.1.4.1.4203.1.9.1.1", ldctl_value = {bv_len = 8, bv_val = 0x7f50b800a3f0 "0\006\n\001\003\001\001"}, ldctl_iscritical = 1 '\001'} ctrls = {0x7f50c57b04d0, 0x0} ber = 0x7f50b80033f0 rc = <optimized out> tv = {tv_sec = 0, tv_usec = 100000} tvp = 0x0 res = 0x0 done = <optimized out> __PRETTY_FUNCTION__ = "ldap_sync_init" #4 0x00007f50c5b3b5a5 in ldap_sync_doit (inst=inst@entry=0x7f50c8343520, conn=<optimized out>, filter_objcs=filter_objcs@entry=0x7f50c5b48880 "(|(objectClass=idnsZone) (objectClass=idnsForwardZone) (objectClass=idnsRecord))", mode=mode@entry=3) at ldap_helper.c:4687 result = 0 ret = <optimized out> s_len = <optimized out> ldap_sync = 0x7f50b800a360 err_hint = 0x7f50c5b43f41 "" filter = "(| (objectClass=idnsConfigObject) (&(objectClass=idnsServerConfigObject) (idnsServerId=ipa001.domain.local))(|(objectClass=idnsZone) (objectClass=idnsForwardZone) (objectClass=idnsRecord)))\000\000"... config_template = "(| (objectClass=idnsConfigObject) %s%s%s%s)" server_id = 0x7f50c832d1f8 "ipa001.domain.local" cleanup = <optimized out> __func__ = {<optimized out> <repeats 15 times>} #5 0x00007f50c5b3ba54 in ldap_syncrepl_watcher (arg=0x7f50c8343520) at ldap_helper.c:4779 inst = 0x7f50c8343520 conn = 0x7f50c832a7a8 ret = <optimized out> result = <optimized out> sigset = {__val = {512, 0 <repeats 15 times>}} reconnect_interval = 0 state = sync_datainit retry = <optimized out> __func__ = {<optimized out> <repeats 22 times>} #6 0x00007f50ca0653f9 in start_thread (arg=0x7f50c57b1640) at pthread_create.c:473 ret = <optimized out> pd = 0x7f50c57b1640 unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139984887289408, 1023172103268482320, 139984924464910, 139984924464911, 0, 139984887289408, -1111681652663763696, -1111652881951867632}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = 0 #7 0x00007f50c9e07bb3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 No locals. Thread 4 (Thread 0x7f50c6afc640 (LWP 1021)): #0 0x00007f50c9e07efe in epoll_wait (epfd=11, events=0x5561b7cfa5a0, maxevents=2048, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30 sc_ret = -4 sc_cancel_oldtype = 0 sc_ret = <optimized out> #1 0x00007f50ca4385dc in watcher (uap=0x7f50c832ff30) at unix/../../../../lib/isc/unix/socket.c:4314 manager = 0x7f50c832ff30 done = false cc = <optimized out> fnname = 0x7f50ca44ac06 "epoll_wait()" strbuf = '\000' <repeats 127 times> #2 0x00007f50ca0653f9 in start_thread (arg=0x7f50c6afc640) at pthread_create.c:473 ret = <optimized out> pd = 0x7f50c6afc640 unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139984907519552, 1023172103268482320, 140726763984254, 140726763984255, 0, 139984907519552, -1111680180026852080, -1111652881951867632}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = 0 #3 0x00007f50c9e07bb3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 No locals. Thread 3 (Thread 0x7f50c7327640 (LWP 1020)): #0 0x00007f50ca071d5e in __futex_abstimed_wait_cancelable64 (futex_word=0x7f50c83300a4, expected=0, clockid=<optimized out>, abstime=0x7f50c7326c20, private=<optimized out>) at ../sysdeps/nptl/futex-internal.c:116 sc_cancel_oldtype = 0 sc_ret = <optimized out> clockbit = <optimized out> err = <optimized out> op = 393 #1 0x00007f50ca06b796 in __pthread_cond_wait_common (abstime=0x7f50c7326c20, clockid=0, mutex=0x7f50c8330028, cond=0x7f50c8330078) at pthread_cond_wait.c:519 spin = 0 buffer = {__routine = 0x7f50ca06b1e0 <__condvar_cleanup_waiting>, __arg = 0x7f50c7326bb0, __canceltype = 0, __prev = 0x0} cbuffer = {wseq = 199, cond = 0x7f50c8330078, mutex = 0x7f50c8330028, private = 0} err = <optimized out> g = 1 flags = <optimized out> g1_start = <optimized out> maxspin = 0 signals = <optimized out> result = 0 wseq = 199 seq = 99 private = 0 maxspin = <optimized out> err = <optimized out> result = <optimized out> wseq = <optimized out> g = <optimized out> seq = <optimized out> flags = <optimized out> private = <optimized out> signals = <optimized out> done = <optimized out> g1_start = <optimized out> spin = <optimized out> buffer = {__routine = <optimized out>, __arg = <optimized out>, __canceltype = <optimized out>, __prev = <optimized out>} cbuffer = {wseq = <optimized out>, cond = <optimized out>, mutex = <optimized out>, private = <optimized out>} s = <optimized out> #2 __pthread_cond_timedwait (cond=0x7f50c8330078, mutex=0x7f50c8330028, abstime=0x7f50c7326c20) at pthread_cond_wait.c:655 flags = <optimized out> clockid = 0 #3 0x00007f50ca430980 in isc_condition_waituntil (c=0x7f50c8330078, m=0x7f50c8330028, t=0x7f50c833006c) at pthreads/../../../../lib/isc/pthreads/condition.c:59 presult = <optimized out> result = <optimized out> ts = {tv_sec = 1605786597, tv_nsec = 717388000} strbuf = "\315[\266_\000\000\000\000\235q\v\000\000\000\000\000\020\000\266\305P\177\000\000+\000\000\000\000\000\000\000\366\276C\312P\177\000\000\001\251@\312P\177\000\000\340\261\006\312P\177\000\000\200l2\307P\177\000\000\000\000\000\000\000\000\000\000\177\nB\312P\177\000\000\200Qd\312P\177\000\000\210-?\312P\177\000\000\002\000\000\000\000\000\000\000\004\000\000\000\002\000\000\000\003\000\000\000\000\000\000\000\030\000\000\000\000\000\000" #4 0x00007f50ca421faf in run (uap=0x7f50c8330010) at ../../../lib/isc/timer.c:811 manager = <optimized out> now = {seconds = 1605786573, nanoseconds = 749981000} result = <optimized out> #5 0x00007f50ca0653f9 in start_thread (arg=0x7f50c7327640) at pthread_create.c:473 ret = <optimized out> pd = 0x7f50c7327640 unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139984916084288, 1023172103268482320, 140726763984414, 140726763984415, 0, 139984916084288, -1111676626478285552, -1111652881951867632}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = 0 #6 0x00007f50c9e07bb3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 No locals. Thread 2 (Thread 0x7f50c9946840 (LWP 1017)): #0 0x00007f50c9d4371a in __GI___sigsuspend (set=0x7ffd80c71730) at ../sysdeps/unix/sysv/linux/sigsuspend.c:26 sc_ret = -514 sc_cancel_oldtype = 0 sc_ret = <optimized out> #1 0x00007f50ca423063 in isc__app_ctxrun (ctx0=0x7f50ca45c220 <isc_g_appctx.lto_priv>) at unix/../../../../lib/isc/unix/app.c:754 result = <optimized out> ctx = 0x7f50ca45c220 <isc_g_appctx.lto_priv> event = 0x0 next_event = <optimized out> task = 0x0 sset = {__val = {0, 139984960562097, 0, 93878184220816, 0, 1859295532631526656, 139984932878392, 139984967682512, 140726763984800, 93878169943640, 93878169943792, 139984967467484, 140726763984952, 139984963555081, 139984963555152, 139984967682512}} strbuf = "\340\311E\312P\177\000\000@lD\312P\177\000\000#\000\000\000\000\000\000\000_\363B\312P\177\000\000z.\000\000\316\v\000\000\005\000\000\000\000\000\000\000.\000\000\000\000\000\000\000s\001\000\000\000\000\000\000\316\001\000\000|\000\000\000\234\000\000\000|\000\000\000\060\347ʷaU\000\000x.\000\000\000\000\000\000\220\347\002\000\000\000\000\000\000\252\354\311P\177\000\000p+F\312P\177\000\000y.\000\000\000\000\000" #2 0x00007f50ca423caf in isc_app_run () at unix/../../../../lib/isc/unix/../app_api.c:207 result = <optimized out> #3 isc_app_run () at unix/../../../../lib/isc/unix/../app_api.c:198 No locals. #4 0x00005561b6ede337 in main (argc=<optimized out>, argv=0x7ffd80c71978) at ../../../bin/named/main.c:1563 result = <optimized out> Thread 1 (Thread 0x7f50c8329640 (LWP 1018)): #0 __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:49 set = {__val = {16387, 139984961043373, 4611686018427387904, 0 <repeats 13 times>}} pid = <optimized out> tid = <optimized out> ret = <optimized out> #1 0x00007f50c9d2c8a4 in __GI_abort () at abort.c:79 save_stage = 1 act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {7, 0, 139984961597632, 0, 139984932865664, 139984932865644, 139984969995595, 152, 139984969998552, 139984932865568, 139984961096710, 139984932865664, 139984932867592, 549755813898, 1859295532631526656, 139984932865664}}, sa_flags = -1225872388, sa_restorer = 0x98} sigs = {__val = {32, 93878169943632, 139984967676688, 139984932865648, 139984932865656, 139984967340006, 0, 0, 93878169759840, 139984932865392, 206158430248, 139984932865616, 139984932865424, 1859295532631526656, 0, 139984969995595}} #2 0x00005561b6edd415 in assertion_failed (file=<optimized out>, line=-936212880, type=<optimized out>, cond=0x7f50c8328678 "") at ../../../bin/named/main.c:238 tracebuf = {0x5561b6eeab91 <assertion_failed+97>, 0x7f50ca3f22c0 <isc_assertion_failed+16>, 0x7f50ca51096a <dns_db_attach+90>, 0x7f50c5b3f873 <zr_get_zone_dbs+243>, 0x7f50c5b3958f <update_record+911>, 0x7f50ca41f1f3 <run+707>, 0x7f50ca0653f9 <start_thread+233>, 0x7f50c9e07bb3 <clone+67>, 0x7f50c9e07bb3 <clone+67>, 0x0 <repeats 112 times>, 0x19cd8b6f9af90d00, 0xffffffffffffffff, 0x7f50c8328ad0, 0x0, 0x7f50c8328b10, 0x7f50c8328bb8, 0x7f50c8328bc0} i = <optimized out> nframes = 8 result = <optimized out> logsuffix = <optimized out> fname = 0x0 #3 0x00007f50ca3f22c0 in isc_assertion_failed (file=<optimized out>, line=<optimized out>, type=<optimized out>, cond=<optimized out>) at ../../../lib/isc/assertions.c:51 No locals. #4 0x00007f50ca51096a in dns_db_attach (source=<optimized out>, targetp=targetp@entry=0x7f50c8328b08) at ../../../lib/dns/db.c:152 No locals. #5 0x00007f50c5b3f873 in zr_get_zone_dbs (zr=0x7f50c5b56010, name=<optimized out>, ldapdbp=0x7f50c8328bb8, rbtdbp=0x7f50c8328bc0) at zone_register.c:456 result = 0 zinfo = 0x7f50c4ef8fe8 ldapdb = 0x0 __func__ = {<optimized out> <repeats 16 times>} #6 0x00007f50c5b3958f in update_record (task=<optimized out>, event=<optimized out>) at ldap_helper.c:3921 pevent = 0x7f50c5b5ea28 result = <optimized out> inst = 0x7f50c8343520 mctx = 0x5561b7cae730 zone_settings = 0x0 raw = 0x7f50c0044930 secure = 0x0 zone_found = true zone_reloaded = false serial = 32592 entry = 0x7f50c5b6f3d0 rbtdb = 0x0 ldapdb = 0x0 diff = {magic = 1145652806, mctx = 0x5561b7cae730, tuples = {head = 0x0, tail = 0x0}} version = 0x0 node = 0x0 rbt_rds_iterator = 0x0 sync_state = 3358755856 rdatalist = {head = 0x0, tail = 0x0} prevname = {magic = 1145983854, ndata = 0x0, length = 0, labels = 0, attributes = 0, offsets = 0x0, buffer = 0x0, link = {prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, list = {head = 0x0, tail = 0x0}} prevorigin = {magic = 1145983854, ndata = 0x0, length = 0, labels = 0, attributes = 0, offsets = 0x0, buffer = 0x0, link = {prev = 0xffffffffffffffff, next = 0xffffffffffffffff}, list = {head = 0x0, tail = 0x0}} __func__ = {<optimized out> <repeats 14 times>} #7 0x00007f50ca41f1f3 in dispatch (manager=0x7f50c832f010) at ../../../lib/isc/task.c:1157 dispatch_count = 0 done = false finished = false requeue = false event = 0x7f50c5b5ea28 task = 0x7f50c8342330 task = <optimized out> dispatch_count = <optimized out> done = <optimized out> requeue = <optimized out> finished = <optimized out> event = <optimized out> was_idle = <optimized out> #8 run (uap=0x7f50c832f010) at ../../../lib/isc/task.c:1331 manager = 0x7f50c832f010 #9 0x00007f50ca0653f9 in start_thread (arg=0x7f50c8329640) at pthread_create.c:473 ret = <optimized out> pd = 0x7f50c8329640 unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139984932869696, 1023172103268482320, 140726763984366, 140726763984367, 0, 139984932869696, -1111656840637694704, -1111652881951867632}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = 0 #10 0x00007f50c9e07bb3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 No locals. === I'm assigning to freeipa for now, but this could also have been caused by any of these: Package: bind-dyndb-ldap-11.5-1.fc34 Old package: bind-dyndb-ldap-11.3-5.fc34 Package: krb5-1.18.3-2.fc34 Old package: krb5-1.18.2-30.fc34 Package: openldap-2.4.56-1.fc34 Old package: openldap-2.4.55-1.fc34 Package: python-ldap-3.3.1-2.fc34 Old package: python-ldap-3.3.1-1.fc34 Package: tomcat-1:9.0.40-1.fc34 Old package: tomcat-1:9.0.39-1.fc34 There was a different bind issue with the tests that deploy FreeIPA directly on Rawhide - https://bugzilla.redhat.com/show_bug.cgi?id=1899661 - which ab says he thinks he fixed in https://koji.fedoraproject.org/koji/taskinfo?taskID=55896678 ; I'm not sure if he thinks that will also fix this.
Proposing as a Beta blocker as a violation of "It must be possible to successfully complete a direct upgrade from a fully updated installation of each of the last two stable Fedora Server releases with the system configured as a FreeIPA domain controller or postgresql server as specified in the relevant criteria. The upgraded system must meet all relevant release criteria, including criteria relating to functionality of the server software." - https://fedoraproject.org/wiki/Fedora_34_Beta_Release_Criteria#Server_upgrade_requirements
Note that I think this issue is not an issue now that https://koji.fedoraproject.org/koji/taskinfo?taskID=55896678 is in Rawhide. The whole flow is that named exits due to a PKCS11 engine it is asked to start with is missing. This makes execution of other threads cancelled and irrelevant as the main named process is winding down. So this 'crash' is a side-effect of the other bug.
Ok, I reproduced it locally. The issue is that we merged 9.16 support and then made it working with 9.11 version. However, refcounting code has a difference between the two BIND versions and it wasn't taken into account when we made the same code base working with both 9.16/9.11. commit 542096278ba605a5800348a6c31dc058685860de Author: Stanislav Levin <slev> Date: Mon May 25 06:45:49 2020 +0000 bind-9.16: Correct reference counting With [0] BIND atomic functions for reference counting have the identical interface as their C11 counter parts. This means that these functions return the _previous_ refcount instead of the current one. [0]: https://gitlab.isc.org/isc-projects/bind9/-/commit/bef8ac5bae0f95dbff76ec99a7994024a5d6c64e I need to review all refcounting use and make sure we detach objects on '0' count in 9.11 and on '1' in 9.16. By fixing ldap_driver.c's detach() for 9.11 I've got rid of the crash already, so this is the way to go.
Fixed in https://bodhi.fedoraproject.org/updates/FEDORA-2020-1bf607bf69
We got a pass on the Rawhide that completed yesterday.