Description of problem:
While trying to debug an issue with 389-ds-base, I tried running:

  /usr/bin/gdb -ex 'set confirm off' -ex 'set pagination off' -ex 'thread apply all bt full' -ex 'quit' /usr/sbin/ns-slapd `/sbin/pidof ns-slapd`

This crashes with:

  GNU gdb (GDB) Red Hat Enterprise Linux (7.2-64.el6_5.2)
  Copyright (C) 2010 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
  and "show warranty" for details.
  This GDB was configured as "x86_64-redhat-linux-gnu".
  For bug reporting instructions, please see:
  <http://www.gnu.org/software/gdb/bugs/>...
  Reading symbols from /usr/sbin/ns-slapd...Reading symbols from /usr/lib/debug/usr/sbin/ns-slapd.debug...done.
  done.
  Attaching to program: /usr/sbin/ns-slapd, process 32109
  ../../gdb/linux-nat.c:1411: internal-error: linux_nat_post_attach_wait: Assertion `pid == new_pid' failed.
  A problem internal to GDB has been detected,
  further debugging may prove unreliable.
  Quit this debugging session? (y or n) [answered Y; input not from terminal]
  ../../gdb/linux-nat.c:1411: internal-error: linux_nat_post_attach_wait: Assertion `pid == new_pid' failed.
  A problem internal to GDB has been detected,
  further debugging may prove unreliable.
  Create a core file of GDB? (y or n) [answered Y; input not from terminal]

Version-Release number of selected component (if applicable):
gdb-7.2-64.el6_5.2.x86_64

How reproducible:
This happened when ns-slapd was in a bad state. I cannot reproduce now that it isn't.
Thanks for the report, Orion. It is a bit odd that this assertion is triggering. Would you be able to provide a reproducer for this? You said that the problem manifested when ns-slapd was in a bad state, so perhaps you could say how to reach this bad state. Thank you.
(gdb) thr appl all bt

Thread 1 (Thread 0x7f7e7001b700 (LWP 28636)):
#0  0x00000033aa832625 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00000033aa833e05 in abort () at abort.c:92
#2  0x0000000000415ba6 in dump_core () at ../../gdb/utils.c:1038
#3  0x0000000000417cfa in internal_vproblem (problem=0xa37e60, file=<value optimized out>, line=<value optimized out>, fmt=<value optimized out>, ap=<value optimized out>) at ../../gdb/utils.c:1204
#4  0x0000000000417f79 in internal_verror (file=<value optimized out>, line=<value optimized out>, fmt=<value optimized out>, ap=<value optimized out>) at ../../gdb/utils.c:1219
#5  0x0000000000418011 in internal_error (file=0x6fdc <Address 0x6fdc out of bounds>, line=28636, string=0x6 <Address 0x6 out of bounds>) at ../../gdb/utils.c:1229
#6  0x000000000044fd2e in linux_nat_post_attach_wait (ptid=..., first=1, cloned=0x1d84368, signalled=0x1d8436c) at ../../gdb/linux-nat.c:1411
#7  0x000000000044fe3b in linux_nat_attach (ops=<value optimized out>, args=<value optimized out>, from_tty=<value optimized out>) at ../../gdb/linux-nat.c:1578
#8  0x00000000005364d8 in target_attach (args=0x7fffaec48f70 "32109", from_tty=1) at ../../gdb/target.c:3018
#9  0x0000000000502e52 in attach_command (args=0x7fffaec48f70 "32109", from_tty=1) at ../../gdb/infcmd.c:2459
#10 0x0000000000516ef7 in catch_command_errors (command=0x502d90 <attach_command>, arg=0x7fffaec48f70 "32109", from_tty=1, mask=<value optimized out>) at ../../gdb/exceptions.c:534
#11 0x000000000040acb8 in captured_main (data=<value optimized out>) at ../../gdb/main.c:977
#12 0x0000000000516f8b in catch_errors (func=0x409c60 <captured_main>, func_args=0x7fffaec486f0, errstring=0x69112f "", mask=<value optimized out>) at ../../gdb/exceptions.c:518
#13 0x00000000004098d4 in gdb_main (args=<value optimized out>) at ../../gdb/main.c:1076
#14 0x00000000004098a9 in main (argc=<value optimized out>, argv=<value optimized out>) at ../../gdb/gdb.c:48

#6  0x000000000044fd2e in linux_nat_post_attach_wait (ptid=..., first=1, cloned=0x1d84368, signalled=0x1d8436c) at ../../gdb/linux-nat.c:1411
1411          gdb_assert (pid == new_pid);
(gdb) list
1406          /* Try again with __WCLONE to check cloned processes.  */
1407          new_pid = my_waitpid (pid, &status, __WCLONE);
1408          *cloned = 1;
1409        }
1410
1411      gdb_assert (pid == new_pid);
1412
1413      if (!WIFSTOPPED (status))
1414        {
1415          /* The pid we tried to attach has apparently just exited.  */
(gdb) print pid
$1 = 32109
(gdb) print new_pid
$2 = -1
Here are the steps to reproduce with ns-slapd:

- Install 389-ds-base that has the bug introduced with the initial fix for https://fedorahosted.org/389/ticket/47748. Not sure where such a build would be anymore.
- Enable password expiration checking.
- Attempt to authenticate on a sssd system as a user with an expiring password (within the password warning period, but not yet expired).
- sssd will start creating lots of ldap connections to the server.
- Eventually, gdb stack traces will crash with this bug.
An update: I tried attaching gdb to a program that has many threads spawning and exiting constantly in quick succession, and that revealed issues that I'm looking at addressing (upstream). I haven't managed to trigger the assertion, though. It may be a kernel bug.

> (gdb) print pid
> $1 = 32109
> (gdb) print new_pid
> $2 = -1

Could you also print errno?
(gdb) print errno
$1 = 13
Thanks. Hmm, that's:

  #define EACCES 13 /* Permission denied */

Given that your app does authentication things, I wonder if it is dropping privileges or some such just as GDB is attaching to the thread, and we hit some kernel-side race.

I realized that frames #0-#5 may have clobbered errno by the time you print it, though. To be certain, could you put a breakpoint in gdb's internal_error, and print errno then? Sorry for the extra roundtrip.
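To illustrate the errno-clobbering concern: errno only reliably describes a failure immediately after the failing call, because later library calls (the error reporting and abort path, for instance) are free to overwrite it. A minimal sketch of the usual defence, copying errno into a local before doing anything else. The helper name waitpid_saved_errno is made up for this example and is not part of GDB.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper, not GDB code.  Calls waitpid and returns 0 on
   success, or the errno value captured right after the failure.  */
static int
waitpid_saved_errno (pid_t pid, int *status, int options)
{
  pid_t ret = waitpid (pid, status, options);

  if (ret == -1)
    {
      /* Capture errno first: fprintf and strerror below are allowed
         to modify it.  */
      int saved_errno = errno;

      fprintf (stderr, "waitpid (%ld) failed: %s\n",
               (long) pid, strerror (saved_errno));

      /* Restore it so the caller sees the original error too.  */
      errno = saved_errno;
      return saved_errno;
    }

  return 0;
}
```

Calling this on a pid that is not a child of the caller (pid 1, say) should report ECHILD regardless of what the logging does to errno in between.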
Orion, would you be willing to help us verify the fix? For now I'm setting QE Conditional NAK: reproducer.
I'm afraid I don't have the ability to easily reproduce this condition any more and so can't check.
(In reply to Orion Poplawski from comment #10)
> I'm afraid I don't have the ability to easily reproduce this condition any
> more and so can't check.

Thanks Orion,

FYI, trying to reproduce this resulted in many "attach" bugs being fixed in upstream GDB. You can see the main ones here if you're curious:

  https://sourceware.org/ml/gdb-patches/2014-12/msg00447.html

(Note that those are all too invasive for backporting to RHEL.)

Unfortunately, none of those explained your bug's case; it all indicates that the kernel managed to let gdb attach to the process, but then interacting with the process fails with EACCES... So the fix just has GDB detect the situation gracefully instead of crashing.
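For readers wondering what "detect the situation gracefully" can look like, here is a rough sketch. It is not the actual patch. It assumes that the lines elided just above the linux-nat.c:1406 listing first try a plain waitpid and fall back to __WCLONE on ECHILD, and the helper name wait_for_attached_pid is made up. Instead of asserting that the returned pid matches, it hands the failure back to the caller with errno intact.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper, not the code from the erratum.  Waits for PID
   after an attach.  Returns PID on success; returns -1 with errno set
   if the wait fails, rather than aborting on an assertion.
   Linux-specific because of __WCLONE.  */
static pid_t
wait_for_attached_pid (pid_t pid, int *status, int *cloned)
{
  pid_t new_pid;

  *cloned = 0;

  /* Plain wait first, retrying if a signal interrupts it.  */
  do
    new_pid = waitpid (pid, status, 0);
  while (new_pid == -1 && errno == EINTR);

  if (new_pid == -1 && errno == ECHILD)
    {
      /* Try again with __WCLONE to check cloned processes.  */
      do
        new_pid = waitpid (pid, status, __WCLONE);
      while (new_pid == -1 && errno == EINTR);
      *cloned = 1;
    }

  /* Where the original asserted pid == new_pid, report the failure
     and let the caller print an error and give up on the attach.  */
  if (new_pid == -1)
    return -1;

  return new_pid;
}
```

A caller would check for -1 and print something like "Cannot attach to process" with strerror (errno), so a wait that fails with EACCES or ECHILD ends the attach attempt instead of producing an internal error and a core file of GDB.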
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-1325.html