Bug 1447268 - nsupdate crash causes sssd to fail after joining a domain
Summary: nsupdate crash causes sssd to fail after joining a domain
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: bind
Version: 25
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Petr Menšík
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-05-02 09:16 UTC by Stef Walter
Modified: 2017-12-12 10:06 UTC (History)
8 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-12-12 10:06:39 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Core dump of nsupdate (470.63 KB, application/octet-stream)
2017-05-02 09:34 UTC, Stef Walter
no flags Details

Description Stef Walter 2017-05-02 09:16:28 UTC
Description of problem:

The nsupdate utility crashes when invoked by sssd.

Stack trace of thread 2224:
#0  0x00007fc4ec6f291f raise (libc.so.6)
#1  0x00007fc4ec6f451a abort (libc.so.6)
#2  0x00007fc4ee941b9f isc_assertion_failed (libisc.so.160)
#3  0x00007fc4ef26e532 dns_name_clone (libdns.so.165)
#4  0x00005601df4296e0 setzone (nsupdate)
#5  0x00005601df42b1ed send_update (nsupdate)
#6  0x00005601df42c7ba update_completed (nsupdate)
#7  0x00007fc4ee96656c run (libisc.so.160)
#8  0x00007fc4ed7476ca start_thread (libpthread.so.0)
#9  0x00007fc4ec7c4f7f __clone (libc.so.6)
                                                       
Stack trace of thread 2221:
#0  0x00007fc4ec6f2d06 __sigsuspend (libc.so.6)
#1  0x00007fc4ee96f981 isc__app_ctxrun (libisc.so.160)
#2  0x00007fc4ee97048b isc_app_run (libisc.so.160)
#3  0x00005601df4279de main (nsupdate)
#4  0x00007fc4ec6dd401 __libc_start_main (libc.so.6)
#5  0x00005601df427d0a _start (nsupdate)
                                                       
Stack trace of thread 2223:
#0  0x00007fc4ed74d460 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1  0x00007fc4ee96c680 run (libisc.so.160)
#2  0x00007fc4ed7476ca start_thread (libpthread.so.0)
#3  0x00007fc4ec7c4f7f __clone (libc.so.6)

Stack trace of thread 2222:
#0  0x00007fc4ec7c5573 epoll_wait (libc.so.6)
#1  0x00007fc4ee97a4e3 watcher (libisc.so.160)
#2  0x00007fc4ed7476ca start_thread (libpthread.so.0)
#3  0x00007fc4ec7c4f7f __clone (libc.so.6)

Version-Release number of selected component (if applicable):

bind-utils-9.10.4-4.P8.fc25.x86_64
sssd-1.15.2-2.fc25.x86_64

How reproducible:

Happens intermittently one of 10 times running through a ipa-client-install process immediately followed by a 'getent passwd'

Steps to Reproduce:
1. LANG=C /usr/sbin/ipa-client-install --domain cockpit.lan --realm COCKPIT.LAN --mkhomedir --enable-dns-updates --unattended --force-join --principal admin -W --force-ntpd
2. getent passwd admin

Actual results:

No output from getent. Due to the fact that nsupdate has crashed when invoked by sssd.

Expected results:

Successful domain join.

Comment 1 Stef Walter 2017-05-02 09:26:20 UTC
More SSSD journal logs related to this crash:

Mai 02 05:19:12 x0.cockpit.lan sssd[2022]: ; Communication with fec0::ddfe:a6b7:3d90:17b7#53 failed: operation canceled
Mai 02 05:19:12 x0.cockpit.lan sssd[2022]: name.c:1023: REQUIRE((__builtin_expect(!!((source) != ((void *)0)), 1) && __builtin_expect(!!(((const isc__magic_t *)(source))->magic == ((('D') << 24 | ('N') << 16 | ('S') << 8 | ('n')))), 1))) failed, back trace
Mai 02 05:19:12 x0.cockpit.lan sssd[2022]: #0 0x7fd75d5f4c47 in ??
Mai 02 05:19:12 x0.cockpit.lan sssd[2022]: #1 0x7fd75d5f4b9a in ??
Mai 02 05:19:12 x0.cockpit.lan sssd[2022]: #2 0x7fd75df21532 in ??
Mai 02 05:19:12 x0.cockpit.lan sssd[2022]: #3 0x556f76fa36e0 in ??
Mai 02 05:19:12 x0.cockpit.lan sssd[2022]: #4 0x556f76fa51ed in ??
Mai 02 05:19:12 x0.cockpit.lan sssd[2022]: #5 0x556f76fa67ba in ??
Mai 02 05:19:12 x0.cockpit.lan sssd[2022]: #6 0x7fd75d61956c in ??
Mai 02 05:19:12 x0.cockpit.lan sssd[2022]: #7 0x7fd75c3fa6ca in ??
Mai 02 05:19:12 x0.cockpit.lan sssd[2022]: #8 0x7fd75b477f7f in ??

Comment 2 Stef Walter 2017-05-02 09:33:19 UTC
Backtrace with debug info:

Missing separate debuginfos, use: dnf debuginfo-install gssproxy-0.7.0-3.fc25.x86_64 libgcc-6.3.1-1.fc25.x86_64 sssd-client-1.15.2-2.fc25.x86_64
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
#1  0x00007fb8c7c1d51a in __GI_abort () at abort.c:89
#2  0x00007fb8c9e6ab9f in isc_assertion_failed (
    file=file@entry=0x7fb8c9eb1af8 "mem.c", line=line@entry=1080, 
    type=type@entry=isc_assertiontype_insist, 
    cond=cond@entry=0x7fb8c9eb1bdd "ctx->stats[i].gets == 0U")
    at assertions.c:59
#3  0x00007fb8c9e7ba88 in destroy (ctx=ctx@entry=0x55a946e31030) at mem.c:1080
#4  0x00007fb8c9e7c00c in isc__mem_destroy (
    ctxp=ctxp@entry=0x55a945a0aa40 <gmctx>) at mem.c:1233
#5  0x00007fb8c9e7fb48 in isc_mem_destroy (mctxp=0x55a945a0aa40 <gmctx>)
    at mem.c:2760
#6  0x000055a9457ffba7 in cleanup () at ./nsupdate.c:3128
#7  main (argc=2, argv=0x7ffd83b1e258) at ./nsupdate.c:3180
...
(gdb) p i
$6 = 13
(gdb) p ctx->stats[i].gets
$1 = 1
(gdb) p ctx->stats[0]
$7 = {gets = 0, totalgets = 0, blocks = 0, freefrags = 0}
(gdb) p ctx->stats[1]
$8 = {gets = 0, totalgets = 1, blocks = 0, freefrags = 0}
(gdb) p ctx->stats[2]
$9 = {gets = 0, totalgets = 0, blocks = 0, freefrags = 0}
(gdb) p ctx->stats[3]
$10 = {gets = 0, totalgets = 0, blocks = 0, freefrags = 0}
(gdb) p ctx->stats[4]
$11 = {gets = 0, totalgets = 0, blocks = 0, freefrags = 0}
(gdb) p ctx->stats[5]
$12 = {gets = 0, totalgets = 0, blocks = 0, freefrags = 0}
(gdb) p ctx->stats[6]
$13 = {gets = 0, totalgets = 0, blocks = 0, freefrags = 0}
(gdb) p ctx->stats[7]
$14 = {gets = 0, totalgets = 0, blocks = 0, freefrags = 0}
(gdb) p ctx->stats[8]
$15 = {gets = 0, totalgets = 1, blocks = 1, freefrags = 513}
(gdb) p ctx->stats[9]
$16 = {gets = 0, totalgets = 0, blocks = 0, freefrags = 0}
(gdb) p ctx->stats[10]
$17 = {gets = 0, totalgets = 0, blocks = 0, freefrags = 0}
(gdb) p ctx->stats[11]
$18 = {gets = 0, totalgets = 0, blocks = 0, freefrags = 0}
(gdb) p ctx->stats[12]
$19 = {gets = 0, totalgets = 2, blocks = 0, freefrags = 0}
(gdb) p ctx->stats[13]
$20 = {gets = 1, totalgets = 3, blocks = 0, freefrags = 0}
(gdb) p ctx->stats[14]
$21 = {gets = 0, totalgets = 0, blocks = 0, freefrags = 0}
(gdb) p *ctx
$22 = {common = {impmagic = 0, magic = 0, 
    methods = 0x7fb8ca0ca220 <memmethods>}, ondestroy = {magic = 1147491188, 
    events = {head = 0x0, tail = 0x0}}, flags = 2, lock = {__data = {
      __lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 3, 
      __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, 
    __size = '\000' <repeats 16 times>, "\003", '\000' <repeats 22 times>, 
    __align = 0}, memalloc = 0x7fb8c9e7a970 <default_memalloc>, 
  memfree = 0x7fb8c9e7a960 <default_memfree>, arg = 0x0, max_size = 1100, 
  checkfree = isc_boolean_true, stats = 0x55a946e311a0, references = 0, 
  name = '\000' <repeats 15 times>, tag = 0x0, quota = 0, total = 3365389, 
  inuse = 32, maxinuse = 2919503, hi_water = 0, lo_water = 0, 
  hi_called = isc_boolean_false, is_overmem = isc_boolean_false, water = 0x0, 
  water_arg = 0x0, pools = {head = 0x0, tail = 0x0}, poolcnt = 0, 
  mem_target = 4096, freelists = 0x55a946e39b50, 
  basic_blocks = 0x7fb8caf1b010, basic_table = 0x55a946e3df40, 
  basic_table_count = 1, basic_table_size = 1024, 
  lowest = 0x7fb8caeec010 "X\300\356ʸ\177", highest = 0x7fb8caf2c00f "", 
  debuglist = 0x0, debuglistcnt = 0, memalloc_failures = 0, link = {
    prev = 0xffffffffffffffff, next = 0xffffffffffffffff}}

Comment 3 Stef Walter 2017-05-02 09:34:55 UTC
Created attachment 1275635 [details]
Core dump of nsupdate

Comment 4 Stef Walter 2017-05-02 09:35:21 UTC
This bug was found by the Cockpit integration tests.

Comment 5 Stef Walter 2017-05-02 09:35:46 UTC
Cockpit is tracking cases of this bug as a known issue: https://github.com/cockpit-project/cockpit/issues/6422

Comment 6 Stephen Gallagher 2017-05-02 09:55:04 UTC
Stef, can you get debug logs at level 9 for that interaction? They include the exact request sent to nsupdate 

Since it sounds like a race condition, I wonder if we are trying to update the DNS entry before the network interface is fully ready or something and sending bad data into nsupdate.

Of course, nsupdate still should not crash, but maybe we can get a consistent reproducer this way.

Comment 7 Stef Walter 2017-05-02 10:43:09 UTC
Not sure what that entails. This is integration tests, not a manual process. Would it make for you to submit a change to the Cockpit integration tests to add the debug log level stuff?

Somewhere around here?

https://github.com/cockpit-project/cockpit/blob/master/test/verify/check-realms#L177

Comment 8 Stephen Gallagher 2017-05-10 12:23:23 UTC
(In reply to Stef Walter from comment #7)
> Not sure what that entails. This is integration tests, not a manual process.
> Would it make for you to submit a change to the Cockpit integration tests to
> add the debug log level stuff?
> 
> Somewhere around here?
> 
> https://github.com/cockpit-project/cockpit/blob/master/test/verify/check-
> realms#L177

Sorry, this fell off my radar until today.

Stef, can you tweak your CI to set `debug_level=9` in the [domain/DOMAINNAME] section of sssd.conf when enrolling (but before the service starts) and attach /var/log/sssd/sssd_DOMAINNAME.log ?

Comment 9 Petr Menšík 2017-05-17 18:28:42 UTC
Hi Stef, Hi Stephen.

I tried to debug coredump from comment #3, however its backtrace is something completely different than I see in bug report description, but matches comment #2.

(gdb) bt
#0  0x00007fb8c7c1b91f in __libc_signal_restore_set (set=0x7ffd83b1ddc0)
    at ../sysdeps/unix/sysv/linux/nptl-signals.h:79
#1  __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:55
#2  0x00007fb8c7c1d51a in __GI_abort () at abort.c:87
#3  0x00007fb8c9e7c00c in isc__mem_destroy (ctxp=0xd, 
    ctxp@entry=0x55a945a0aa40 <gmctx>) at mem.c:1233
#4  0x00007fb8c9e7fb48 in isc_mem_destroy (mctxp=0x55a945a0aa40 <gmctx>)
    at mem.c:2760
#5  0x000055a9457ffba7 in cleanup () at ./nsupdate.c:3128
#6  main (argc=2, argv=0x55a946e31030) at ./nsupdate.c:3180

Bad thing is, it is memory checking at the end of program, after everything was freed but single item was not. Problem is I have not enough information to guess what did went wrong. It is memory leak detection, more or less harmless here.

Could you add some parameters to nsupdate and catch its stderr output?
Parameters that will enable more debug information AND very verbose memory trace logging are these:
nsupdate -d -D -M

Are you able to reproduce crash at least once with these parameters and saved stderr?

Comment 10 Petr Menšík 2017-05-17 19:34:44 UTC
I suspect this bug occurs only when kerberos is used and something unusual happens. If you could provide exact input passed into nsupdate, it would be really useful.

Are there simple steps to install cockpit prepared with ipa? Is there docker image available?

Comment 11 Stef Walter 2017-07-18 08:16:01 UTC
Hey Petr,

Yes you can do the following to have the Cockpit integration tests start, and run such a test:

$ git clone https://github.com/cockpit-project/cockpit
$ cd cockpit
$ bots/image-prepare
$ test/verify/check-realms -tvs

See test/README for the dependencies involved that are necessary to be installed.

Comment 12 Fedora End Of Life 2017-11-16 18:44:37 UTC
This message is a reminder that Fedora 25 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 25. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '25'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 25 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged  change the 'version' to a later Fedora
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 13 Fedora End Of Life 2017-12-12 10:06:39 UTC
Fedora 25 changed to end-of-life (EOL) status on 2017-12-12. Fedora 25 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.