Whew, so this one's a doozy! In testing of today's Rawhide compose, Fedora-Rawhide-20181005.n.0, FreeIPA server deployment fails. (This seems consistent, BTW: there are two tests that do this and both tests failed the same way). The logs show an awful lot of stuff going on. First, 'ipa-submit' crashes: Oct 05 13:34:37 ipa001.domain.local systemd-coredump[29951]: Process 29935 (ipa-submit) of user 0 dumped core. Stack trace of thread 29935: #0 0x00007f3cb8dc55cf raise (libc.so.6) #1 0x00007f3cb8daf895 abort (libc.so.6) #2 0x00007f3cb8daf769 __assert_fail_base.cold.0 (libc.s> #3 0x00007f3cb8dbd9f6 __assert_fail (libc.so.6) #4 0x00007f3cb92d0c95 n/a (libkrb5.so.3) #5 0x00007f3cb92d16ef n/a (libkrb5.so.3) #6 0x00007f3cb92dad0c n/a (libkrb5.so.3) #7 0x00007f3cb92d48b3 n/a (libkrb5.so.3) #8 0x00007f3cb92d39dd krb5_cc_select (libkrb5.so.3) #9 0x00007f3cb8906959 n/a (libgssapi_krb5.so.2) #10 0x00007f3cb890fa82 n/a (libgssapi_krb5.so.2) #11 0x00007f3cb8910b06 n/a (libgssapi_krb5.so.2) #12 0x00007f3cb88fa0bb gss_init_sec_context (libgssapi_k> #13 0x00007f3cb8443667 gssapi_client_mech_step (libgssap> #14 0x00007f3cb8cc9471 sasl_client_step (libsasl2.so.3) #15 0x00007f3cb8cc95fa sasl_client_start (libsasl2.so.3) #16 0x00007f3cb93a71db ldap_int_sasl_bind (libldap-2.4.s> #17 0x00007f3cb93aa87c ldap_sasl_interactive_bind (libld> #18 0x00007f3cb93aaa9a ldap_sasl_interactive_bind_s (lib> #19 0x000055b7f8321d6a n/a (ipa-submit) #20 0x000055b7f832249c n/a (ipa-submit) #21 0x000055b7f8320cb2 n/a (ipa-submit) #22 0x00007f3cb8db1413 __libc_start_main (libc.so.6) #23 0x000055b7f8321c0e n/a (ipa-submit) then, some kind of kernel locking issue involving xfs shows up: Oct 05 13:34:43 ipa001.domain.local kernel: ====================================================== Oct 05 13:34:43 ipa001.domain.local kernel: WARNING: possible circular locking dependency detected Oct 05 13:34:43 ipa001.domain.local kernel: 4.19.0-0.rc6.git3.1.fc30.x86_64 #1 Not tainted Oct 05 13:34:43 ipa001.domain.local kernel: ------------------------------------------------------ Oct 05 13:34:43 ipa001.domain.local kernel: kswapd0/41 is trying to acquire lock: Oct 05 13:34:43 ipa001.domain.local kernel: 000000006d6da44f (sb_internal){.+.+}, at: xfs_trans_alloc+0x18f/0x200 [xfs] Oct 05 13:34:43 ipa001.domain.local kernel: but task is already holding lock: Oct 05 13:34:43 ipa001.domain.local kernel: 000000007ecb1cd3 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30 Oct 05 13:34:43 ipa001.domain.local kernel: which lock already depends on the new lock. then ipa-submit crashes a bunch more times: Oct 05 13:36:17 ipa001.domain.local systemd-coredump[30961]: Process 30951 (ipa-submit) of user 0 dumped core. Oct 05 13:36:47 ipa001.domain.local systemd-coredump[31547]: Process 31539 (ipa-submit) of user 0 dumped core. Oct 05 13:36:50 ipa001.domain.local systemd-coredump[31621]: Process 31619 (ipa-submit) of user 0 dumped core. Oct 05 13:37:28 ipa001.domain.local systemd-coredump[32129]: Process 32127 (ipa-submit) of user 0 dumped core. then named-pkcs11 joins the party, with a fairly similar-looking trace: Oct 05 13:38:47 ipa001.domain.local systemd-coredump[32664]: Process 32657 (named-pkcs11) of user 25 dumped core. Stack trace of thread 32658: #0 0x00007f648712e5cf raise (libc.so.6) #1 0x00007f6487118895 abort (libc.so.6) #2 0x00007f6487118769 __assert_fail_base.cold.0 (libc.so.6) #3 0x00007f64871269f6 __assert_fail (libc.so.6) #4 0x00007f6486e77c95 n/a (libkrb5.so.3) #5 0x00007f6486e786ef n/a (libkrb5.so.3) #6 0x00007f6486e81d0c n/a (libkrb5.so.3) #7 0x00007f6486e7b8b3 n/a (libkrb5.so.3) #8 0x00007f6486e7a9dd krb5_cc_select (libkrb5.so.3) #9 0x00007f6486f5a959 n/a (libgssapi_krb5.so.2) #10 0x00007f6486f63a82 n/a (libgssapi_krb5.so.2) #11 0x00007f6486f64b06 n/a (libgssapi_krb5.so.2) #12 0x00007f6486f4e0bb gss_init_sec_context (libgssapi_krb5.so.2) #13 0x00007f6482b37667 gssapi_client_mech_step (libgssapiv2.so) #14 0x00007f6482def471 sasl_client_step (libsasl2.so.3) #15 0x00007f6482def5fa sasl_client_start (libsasl2.so.3) #16 0x00007f6482e2e1db ldap_int_sasl_bind (libldap-2.4.so.2) #17 0x00007f6482e3187c ldap_sasl_interactive_bind (libldap-2.4.so.2) #18 0x00007f6482e31a9a ldap_sasl_interactive_bind_s (libldap-2.4.so.2) #19 0x00007f6482e7ac6f ldap_connect (ldap.so) #20 0x00007f6482e87a90 new_ldap_instance (ldap.so) #21 0x00007f6482e78292 dyndb_init (ldap.so) #22 0x00007f64878cf75a dns_dyndb_load (libdns-pkcs11.so.1102) #23 0x000055b2e8357728 n/a (named-pkcs11) #24 0x000055b2e8364ee3 n/a (named-pkcs11) #25 0x000055b2e8365f57 n/a (named-pkcs11) #26 0x00007f648781d899 n/a (libisc-pkcs11.so.169) #27 0x00007f648748e58e start_thread (libpthread.so.0) #28 0x00007f64871f3293 __clone (libc.so.6) I have asked an expensive team of highly-trained scientists to consider this data and their conclusion was "stuff is definitely broke". Thanks, scientists! Will attach more detailed logs and so on.
Created attachment 1490966 [details] tarball of /var/log from the failed test
Wow, this crash is *contagious* - if you try to run it through gdb...gdb crashes. Core was generated by `/usr/libexec/certmonger/ipa-submit'. Program terminated with signal SIGABRT, Aborted. #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 50 return ret; (gdb) set logging on Copying output to gdb.txt. (gdb) thread apply all bt full Segmentation fault (core dumped)
Here's the best I can do: (gdb) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007f3cb8daf895 in __GI_abort () at abort.c:79 #2 0x00007f3cb8daf769 in __assert_fail_base (fmt=0x7f3cb8f16eb0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7f3cb9336f25 "r == 0", file=0x7f3cb9336f08 "../../../include/k5-thread.h", line=376, function=<optimized out>) at assert.c:92 #3 0x00007f3cb8dbd9f6 in __GI___assert_fail (assertion=assertion@entry=0x7f3cb9336f25 "r == 0", file=file@entry=0x7f3cb9336f08 "../../../include/k5-thread.h", line=line@entry=376, function=function@entry=0x7f3cb9336ff0 <__PRETTY_FUNCTION__.5598> "k5_mutex_lock") at assert.c:101 #4 0x00007f3cb92d0c95 in k5_mutex_lock (m=0x55b7f90409b8) at ../../../include/k5-thread.h:376 #5 0x00007f3cb92d16ef in k5_mutex_lock (m=0x55b7f90409b8) at ../../../include/k5-thread.h:376 #6 k5_cc_mutex_lock (context=context@entry=0x55b7f9051b20, m=m@entry=0x55b7f90409b8) at ccbase.c:460 Segmentation fault (core dumped)
Similar story with the named-pkcs11 crash, backtracing it also crashes gdb: (gdb) bt full #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 set = {__val = {16387, 140069578765808, 140069712060800, 140069739891035, 140071695908864, 140069578765808, 140069578765808, 140069578765808, 140069578765808, 140069578765898, 140069578765908, 140069578765808, 140069578765908, 0, 0, 0}} pid = <optimized out> tid = <optimized out> ret = <optimized out> #1 0x00007f6487118895 in __GI_abort () at abort.c:79 save_stage = 1 act = {__sigaction_handler = {sa_handler = 0x7f647d7c75f0, sa_sigaction = 0x7f647d7c75f0}, sa_mask = { __val = {0, 140069553831968, 140069712058688, 0, 0, 0, 140069712060248, 21474836480, 140069712060096, 140069741017680, 140069741002416, 0, 9654014966396037376, 140069740987315, 0, 140069741002416}}, sa_flags = -2031231224, sa_restorer = 0x7f6486eddf25} sigs = {__val = {32, 0 <repeats 15 times>}} #2 0x00007f6487118769 in __assert_fail_base (fmt=0x7f648727feb0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7f6486eddf25 "r == 0", file=0x7f6486eddf08 "../../../include/k5-thread.h", line=376, function=<optimized out>) at assert.c:92 str = 0x7f647d7c75f0 "p\235y}d\177" total = 4096 #3 0x00007f64871269f6 in __GI___assert_fail (assertion=assertion@entry=0x7f6486eddf25 "r == 0", file=file@entry=0x7f6486eddf08 "../../../include/k5-thread.h", line=line@entry=376, function=function@entry=0x7f6486eddff0 <__PRETTY_FUNCTION__.5598> "k5_mutex_lock") at assert.c:101 No locals. #4 0x00007f6486e77c95 in k5_mutex_lock (m=0x7f647d799d78) at ../../../include/k5-thread.h:376 r = <optimized out> __PRETTY_FUNCTION__ = "k5_mutex_lock" #5 0x00007f6486e786ef in k5_mutex_lock (m=0x7f647d799d78) at ../../../include/k5-thread.h:376 r = 22 r = <optimized out> __PRETTY_FUNCTION__ = "k5_mutex_lock" #6 k5_cc_mutex_lock (context=context@entry=0x7f647d7a3ff0, m=m@entry=0x7f647d799d78) at ccbase.c:460 No locals. Segmentation fault (core dumped)
Let's add some gdb maintainers to the CC party!
Created attachment 1490969 [details] tarball containing the crash coredumps (the whole abrt directory for each, in fact)
The failure in both cases is an assertion in annobin in glibc within libkrb5. Thus moving to krb5 component -- I guess there are gaps which annobin detects and which need to be fixed at krb5 side.
This looks like possibly the same bug as https://bugzilla.redhat.com/show_bug.cgi?id=1633089 . I think I may have fixed it in krb5-1.16.1-20 ; can you try with that?
it'll happen automatically with the next successful Rawhide compose.
That build seems to be broken for the releases it was submitted as an update for. See openQA failures: FEDORA-2018-fde188a269 - krb5-1.16.1-20.fc28 https://openqa.fedoraproject.org/tests/290634 https://openqa.fedoraproject.org/tests/290622 FEDORA-2018-0408a72d39 - krb5-1.16.1-20.fc29 https://openqa.fedoraproject.org/tests/290599 https://openqa.fedoraproject.org/tests/290611 The logs - see "Logs & Assets" tab - show very similar problems to those reported in this bug, multiple crashes running through libkrb, e.g.: { "signal": 6 , "executable": "/usr/libexec/certmonger/ipa-submit" , "stacktrace": [ { "crash_thread": true , "frames": [ { "address": 139810602745515 , "build_id": "4677f12c31fa6c18c9925096f32e71afd4c71327" , "build_id_offset": 224939 , "function_name": "raise" , "file_name": "/lib64/libc.so.6" } , { "address": 139810602657209 , "build_id": "4677f12c31fa6c18c9925096f32e71afd4c71327" , "build_id_offset": 136633 , "function_name": "abort" , "file_name": "/lib64/libc.so.6" } , { "address": 139810602656913 , "build_id": "4677f12c31fa6c18c9925096f32e71afd4c71327" , "build_id_offset": 136337 , "function_name": "_nl_load_domain.cold.0" , "file_name": "/lib64/libc.so.6" } , { "address": 139810602714642 , "build_id": "4677f12c31fa6c18c9925096f32e71afd4c71327" , "build_id_offset": 194066 , "file_name": "/lib64/libc.so.6" } , { "address": 139810626620997 , "build_id": "a97d31f22a47cfc282aacaca99672bf4f3984900" , "build_id_offset": 196165 , "file_name": "/lib64/libkrb5.so.3" } , { "address": 139810626623647 , "build_id": "a97d31f22a47cfc282aacaca99672bf4f3984900" , "build_id_offset": 198815 , "function_name": "k5_cc_mutex_lock" , "file_name": "/lib64/libkrb5.so.3" } , { "address": 139810626662252 , "build_id": "a97d31f22a47cfc282aacaca99672bf4f3984900" , "build_id_offset": 237420 , "function_name": "krb5_mcc_close" , "file_name": "/lib64/libkrb5.so.3" } , { "address": 139810626636547 , "build_id": "a97d31f22a47cfc282aacaca99672bf4f3984900" , "build_id_offset": 211715 , "function_name": "realm_choose" , "file_name": "/lib64/libkrb5.so.3" } , { "address": 139810626632749 , "build_id": "a97d31f22a47cfc282aacaca99672bf4f3984900" , "build_id_offset": 207917 , "function_name": "krb5_cc_select" , "file_name": "/lib64/libkrb5.so.3" } , { "address": 139810547787593 , "build_id": "19dc0f068054682836700e0df269e7002f2d7abb" , "build_id_offset": 141129 , "function_name": "kg_cred_resolve" , "file_name": "/lib64/libgssapi_krb5.so.2" } , { "address": 139810547824754 , "build_id": "19dc0f068054682836700e0df269e7002f2d7abb" , "build_id_offset": 178290 , "function_name": "krb5_gss_init_sec_context_ext" , "file_name": "/lib64/libgssapi_krb5.so.2" } , { "address": 139810547828982 , "build_id": "19dc0f068054682836700e0df269e7002f2d7abb" , "build_id_offset": 182518 , "function_name": "krb5_gss_init_sec_context" , "file_name": "/lib64/libgssapi_krb5.so.2" } , { "address": 139810547736235 , "build_id": "19dc0f068054682836700e0df269e7002f2d7abb" , "build_id_offset": 89771 , "function_name": "gss_init_sec_context" , "file_name": "/lib64/libgssapi_krb5.so.2" } , { "address": 139810519338711 , "build_id": "d4421f649df90b99f09feaafb5e2c16d51490966" , "build_id_offset": 11991 , "file_name": "/usr/lib64/sasl2/libgssapiv2.so" } , { "address": 139810593379985 , "build_id": "3dd431575f610904df6e7f6f708931118693bc2c" , "build_id_offset": 38545 , "function_name": "sasl_client_step" , "file_name": "/lib64/libsasl2.so.3" } , { "address": 139810593380378 , "build_id": "3dd431575f610904df6e7f6f708931118693bc2c" , "build_id_offset": 38938 , "function_name": "sasl_client_start" , "file_name": "/lib64/libsasl2.so.3" } , { "address": 139810629588027 , "build_id": "83bfd387acf651d85146865b67720f35f5b804cf" , "build_id_offset": 95291 , "function_name": "ldap_int_sasl_bind" , "file_name": "/lib64/libldap-2.4.so.2" } , { "address": 139810629602012 , "build_id": "83bfd387acf651d85146865b67720f35f5b804cf" , "build_id_offset": 109276 , "function_name": "ldap_sasl_interactive_bind" , "file_name": "/lib64/libldap-2.4.so.2" } , { "address": 139810629602554 , "build_id": "83bfd387acf651d85146865b67720f35f5b804cf" , "build_id_offset": 109818 , "function_name": "ldap_sasl_interactive_bind_s" , "file_name": "/lib64/libldap-2.4.so.2" } , { "address": 94513532705626 , "build_id": "9d767fd01f98945efad349bb9b3ead7116899d94" , "build_id_offset": 23386 , "file_name": "/usr/libexec/certmonger/ipa-submit" } , { "address": 94513532707468 , "build_id": "9d767fd01f98945efad349bb9b3ead7116899d94" , "build_id_offset": 25228 , "file_name": "/usr/libexec/certmonger/ipa-submit" } , { "address": 94513532701410 , "build_id": "9d767fd01f98945efad349bb9b3ead7116899d94" , "build_id_offset": 19170 , "file_name": "/usr/libexec/certmonger/ipa-submit" } ] } ] } Unfortunately, the F29 update was pushed stable with positive karma from folks who clearly didn't test FreeIPA, or look at the automated test results :( That means this is now a clear F29 Final blocker.
This is probably the same bug as https://bugzilla.redhat.com/show_bug.cgi?id=1633089 .
Well...it looks the same as the crash in https://bugzilla.redhat.com/show_bug.cgi?id=1633089#c26 , yes. But if I'm following things correctly, that's not the same as the *initial* problem in that bug. The initial problem in that bug was a crash that occurred at a later point in a specific test suite. You then investigated and attempted to fix that crash with the -20 build. But it seems like the build that attempts to fix it introduces this *other* crash during initial FreeIPA deployment - so when your -20 build which tried to fix #1633089 showed up in Rawhide on 2018-10-02, then later in F28 and F29 updates-testing yesterday, it *introduced* this crash to Fedora.
krb5-1.16.1-21.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2018-b1048ce2df
krb5-1.16.1-21.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-e2b9bccef0
Note, -21 should just revert to the state of -19, before Robbie tried to fix 1633089 and this crash showed up. 1633089 will still likely be present in it, but hopefully this crash will not be present and openQA tests will pass.
krb5-1.16.1-21.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-e2b9bccef0
hey folks, can we get some blocker votes on this? it's a pretty obvious blocker and the fix is ready but I cannot push it stable until it's accepted...
+1 given adam's comments.
+1 blocker
+1 Blocker
That's +4, setting accepted.
krb5-1.16.1-21.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.
krb5-1.16.1-21.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.