Description of problem: The nsupdate utility crashes when invoked by sssd. Stack trace of thread 2224: #0 0x00007fc4ec6f291f raise (libc.so.6) #1 0x00007fc4ec6f451a abort (libc.so.6) #2 0x00007fc4ee941b9f isc_assertion_failed (libisc.so.160) #3 0x00007fc4ef26e532 dns_name_clone (libdns.so.165) #4 0x00005601df4296e0 setzone (nsupdate) #5 0x00005601df42b1ed send_update (nsupdate) #6 0x00005601df42c7ba update_completed (nsupdate) #7 0x00007fc4ee96656c run (libisc.so.160) #8 0x00007fc4ed7476ca start_thread (libpthread.so.0) #9 0x00007fc4ec7c4f7f __clone (libc.so.6) Stack trace of thread 2221: #0 0x00007fc4ec6f2d06 __sigsuspend (libc.so.6) #1 0x00007fc4ee96f981 isc__app_ctxrun (libisc.so.160) #2 0x00007fc4ee97048b isc_app_run (libisc.so.160) #3 0x00005601df4279de main (nsupdate) #4 0x00007fc4ec6dd401 __libc_start_main (libc.so.6) #5 0x00005601df427d0a _start (nsupdate) Stack trace of thread 2223: #0 0x00007fc4ed74d460 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0) #1 0x00007fc4ee96c680 run (libisc.so.160) #2 0x00007fc4ed7476ca start_thread (libpthread.so.0) #3 0x00007fc4ec7c4f7f __clone (libc.so.6) Stack trace of thread 2222: #0 0x00007fc4ec7c5573 epoll_wait (libc.so.6) #1 0x00007fc4ee97a4e3 watcher (libisc.so.160) #2 0x00007fc4ed7476ca start_thread (libpthread.so.0) #3 0x00007fc4ec7c4f7f __clone (libc.so.6) Version-Release number of selected component (if applicable): bind-utils-9.10.4-4.P8.fc25.x86_64 sssd-1.15.2-2.fc25.x86_64 How reproducible: Happens intermittently one of 10 times running through a ipa-client-install process immediately followed by a 'getent passwd' Steps to Reproduce: 1. LANG=C /usr/sbin/ipa-client-install --domain cockpit.lan --realm COCKPIT.LAN --mkhomedir --enable-dns-updates --unattended --force-join --principal admin -W --force-ntpd 2. getent passwd admin Actual results: No output from getent. Due to the fact that nsupdate has crashed when invoked by sssd. Expected results: Successful domain join.
More SSSD journal logs related to this crash: Mai 02 05:19:12 x0.cockpit.lan sssd[2022]: ; Communication with fec0::ddfe:a6b7:3d90:17b7#53 failed: operation canceled Mai 02 05:19:12 x0.cockpit.lan sssd[2022]: name.c:1023: REQUIRE((__builtin_expect(!!((source) != ((void *)0)), 1) && __builtin_expect(!!(((const isc__magic_t *)(source))->magic == ((('D') << 24 | ('N') << 16 | ('S') << 8 | ('n')))), 1))) failed, back trace Mai 02 05:19:12 x0.cockpit.lan sssd[2022]: #0 0x7fd75d5f4c47 in ?? Mai 02 05:19:12 x0.cockpit.lan sssd[2022]: #1 0x7fd75d5f4b9a in ?? Mai 02 05:19:12 x0.cockpit.lan sssd[2022]: #2 0x7fd75df21532 in ?? Mai 02 05:19:12 x0.cockpit.lan sssd[2022]: #3 0x556f76fa36e0 in ?? Mai 02 05:19:12 x0.cockpit.lan sssd[2022]: #4 0x556f76fa51ed in ?? Mai 02 05:19:12 x0.cockpit.lan sssd[2022]: #5 0x556f76fa67ba in ?? Mai 02 05:19:12 x0.cockpit.lan sssd[2022]: #6 0x7fd75d61956c in ?? Mai 02 05:19:12 x0.cockpit.lan sssd[2022]: #7 0x7fd75c3fa6ca in ?? Mai 02 05:19:12 x0.cockpit.lan sssd[2022]: #8 0x7fd75b477f7f in ??
Backtrace with debug info: Missing separate debuginfos, use: dnf debuginfo-install gssproxy-0.7.0-3.fc25.x86_64 libgcc-6.3.1-1.fc25.x86_64 sssd-client-1.15.2-2.fc25.x86_64 (gdb) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58 #1 0x00007fb8c7c1d51a in __GI_abort () at abort.c:89 #2 0x00007fb8c9e6ab9f in isc_assertion_failed ( file=file@entry=0x7fb8c9eb1af8 "mem.c", line=line@entry=1080, type=type@entry=isc_assertiontype_insist, cond=cond@entry=0x7fb8c9eb1bdd "ctx->stats[i].gets == 0U") at assertions.c:59 #3 0x00007fb8c9e7ba88 in destroy (ctx=ctx@entry=0x55a946e31030) at mem.c:1080 #4 0x00007fb8c9e7c00c in isc__mem_destroy ( ctxp=ctxp@entry=0x55a945a0aa40 <gmctx>) at mem.c:1233 #5 0x00007fb8c9e7fb48 in isc_mem_destroy (mctxp=0x55a945a0aa40 <gmctx>) at mem.c:2760 #6 0x000055a9457ffba7 in cleanup () at ./nsupdate.c:3128 #7 main (argc=2, argv=0x7ffd83b1e258) at ./nsupdate.c:3180 ... (gdb) p i $6 = 13 (gdb) p ctx->stats[i].gets $1 = 1 (gdb) p ctx->stats[0] $7 = {gets = 0, totalgets = 0, blocks = 0, freefrags = 0} (gdb) p ctx->stats[1] $8 = {gets = 0, totalgets = 1, blocks = 0, freefrags = 0} (gdb) p ctx->stats[2] $9 = {gets = 0, totalgets = 0, blocks = 0, freefrags = 0} (gdb) p ctx->stats[3] $10 = {gets = 0, totalgets = 0, blocks = 0, freefrags = 0} (gdb) p ctx->stats[4] $11 = {gets = 0, totalgets = 0, blocks = 0, freefrags = 0} (gdb) p ctx->stats[5] $12 = {gets = 0, totalgets = 0, blocks = 0, freefrags = 0} (gdb) p ctx->stats[6] $13 = {gets = 0, totalgets = 0, blocks = 0, freefrags = 0} (gdb) p ctx->stats[7] $14 = {gets = 0, totalgets = 0, blocks = 0, freefrags = 0} (gdb) p ctx->stats[8] $15 = {gets = 0, totalgets = 1, blocks = 1, freefrags = 513} (gdb) p ctx->stats[9] $16 = {gets = 0, totalgets = 0, blocks = 0, freefrags = 0} (gdb) p ctx->stats[10] $17 = {gets = 0, totalgets = 0, blocks = 0, freefrags = 0} (gdb) p ctx->stats[11] $18 = {gets = 0, totalgets = 0, blocks = 0, freefrags = 0} (gdb) p ctx->stats[12] $19 = {gets = 0, totalgets = 2, blocks = 0, freefrags = 0} (gdb) p ctx->stats[13] $20 = {gets = 1, totalgets = 3, blocks = 0, freefrags = 0} (gdb) p ctx->stats[14] $21 = {gets = 0, totalgets = 0, blocks = 0, freefrags = 0} (gdb) p *ctx $22 = {common = {impmagic = 0, magic = 0, methods = 0x7fb8ca0ca220 <memmethods>}, ondestroy = {magic = 1147491188, events = {head = 0x0, tail = 0x0}}, flags = 2, lock = {__data = { __lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 3, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 16 times>, "\003", '\000' <repeats 22 times>, __align = 0}, memalloc = 0x7fb8c9e7a970 <default_memalloc>, memfree = 0x7fb8c9e7a960 <default_memfree>, arg = 0x0, max_size = 1100, checkfree = isc_boolean_true, stats = 0x55a946e311a0, references = 0, name = '\000' <repeats 15 times>, tag = 0x0, quota = 0, total = 3365389, inuse = 32, maxinuse = 2919503, hi_water = 0, lo_water = 0, hi_called = isc_boolean_false, is_overmem = isc_boolean_false, water = 0x0, water_arg = 0x0, pools = {head = 0x0, tail = 0x0}, poolcnt = 0, mem_target = 4096, freelists = 0x55a946e39b50, basic_blocks = 0x7fb8caf1b010, basic_table = 0x55a946e3df40, basic_table_count = 1, basic_table_size = 1024, lowest = 0x7fb8caeec010 "X\300\356ʸ\177", highest = 0x7fb8caf2c00f "", debuglist = 0x0, debuglistcnt = 0, memalloc_failures = 0, link = { prev = 0xffffffffffffffff, next = 0xffffffffffffffff}}
Created attachment 1275635 [details] Core dump of nsupdate
This bug was found by the Cockpit integration tests.
Cockpit is tracking cases of this bug as a known issue: https://github.com/cockpit-project/cockpit/issues/6422
Stef, can you get debug logs at level 9 for that interaction? They include the exact request sent to nsupdate Since it sounds like a race condition, I wonder if we are trying to update the DNS entry before the network interface is fully ready or something and sending bad data into nsupdate. Of course, nsupdate still should not crash, but maybe we can get a consistent reproducer this way.
Not sure what that entails. This is integration tests, not a manual process. Would it make for you to submit a change to the Cockpit integration tests to add the debug log level stuff? Somewhere around here? https://github.com/cockpit-project/cockpit/blob/master/test/verify/check-realms#L177
(In reply to Stef Walter from comment #7) > Not sure what that entails. This is integration tests, not a manual process. > Would it make for you to submit a change to the Cockpit integration tests to > add the debug log level stuff? > > Somewhere around here? > > https://github.com/cockpit-project/cockpit/blob/master/test/verify/check- > realms#L177 Sorry, this fell off my radar until today. Stef, can you tweak your CI to set `debug_level=9` in the [domain/DOMAINNAME] section of sssd.conf when enrolling (but before the service starts) and attach /var/log/sssd/sssd_DOMAINNAME.log ?
Hi Stef, Hi Stephen. I tried to debug coredump from comment #3, however its backtrace is something completely different than I see in bug report description, but matches comment #2. (gdb) bt #0 0x00007fb8c7c1b91f in __libc_signal_restore_set (set=0x7ffd83b1ddc0) at ../sysdeps/unix/sysv/linux/nptl-signals.h:79 #1 __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:55 #2 0x00007fb8c7c1d51a in __GI_abort () at abort.c:87 #3 0x00007fb8c9e7c00c in isc__mem_destroy (ctxp=0xd, ctxp@entry=0x55a945a0aa40 <gmctx>) at mem.c:1233 #4 0x00007fb8c9e7fb48 in isc_mem_destroy (mctxp=0x55a945a0aa40 <gmctx>) at mem.c:2760 #5 0x000055a9457ffba7 in cleanup () at ./nsupdate.c:3128 #6 main (argc=2, argv=0x55a946e31030) at ./nsupdate.c:3180 Bad thing is, it is memory checking at the end of program, after everything was freed but single item was not. Problem is I have not enough information to guess what did went wrong. It is memory leak detection, more or less harmless here. Could you add some parameters to nsupdate and catch its stderr output? Parameters that will enable more debug information AND very verbose memory trace logging are these: nsupdate -d -D -M Are you able to reproduce crash at least once with these parameters and saved stderr?
I suspect this bug occurs only when kerberos is used and something unusual happens. If you could provide exact input passed into nsupdate, it would be really useful. Are there simple steps to install cockpit prepared with ipa? Is there docker image available?
Hey Petr, Yes you can do the following to have the Cockpit integration tests start, and run such a test: $ git clone https://github.com/cockpit-project/cockpit $ cd cockpit $ bots/image-prepare $ test/verify/check-realms -tvs See test/README for the dependencies involved that are necessary to be installed.
This message is a reminder that Fedora 25 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 25. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '25'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 25 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 25 changed to end-of-life (EOL) status on 2017-12-12. Fedora 25 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.