Bug 1162264
| Summary: | gdb/linux-nat.c:1411: internal-error: linux_nat_post_attach_wait: Assertion `pid == new_pid' failed. |
|---|---|
| Product: | Red Hat Enterprise Linux 6 |
| Component: | gdb |
| Version: | 6.5 |
| Hardware: | x86_64 |
| OS: | Linux |
| Status: | CLOSED ERRATA |
| Severity: | medium |
| Priority: | unspecified |
| Target Milestone: | rc |
| Reporter: | Orion Poplawski <orion> |
| Assignee: | Sergio Durigan Junior <sergiodj> |
| QA Contact: | Miroslav Franc <mfranc> |
| CC: | gdb-bugs, jan.kratochvil, mcermak, mfranc, ohudlick, orion, palves |
| Fixed In Version: | gdb-7.2-82.el6 |
| Doc Type: | Bug Fix |
| : | 1210135 |
| Last Closed: | 2015-07-22 06:34:34 UTC |
| Type: | Bug |

Doc Text:
Cause: Under some conditions, while attaching to a process, GDB can perform the initial low-level ptrace attach request, but the Linux kernel then refuses to let the debugger finish the attach sequence (the 'waitpid' system call fails with EACCES).
Consequence: GDB did not expect this scenario and would terminate unexpectedly with an internal error.
Fix: GDB now handles the described scenario gracefully, simply reporting back to the user that the attach request failed.
Result: The user receives a warning that GDB was unable to attach because permission was denied, but the debugging session is not otherwise affected.

Description
Orion Poplawski
2014-11-10 16:55:51 UTC
Thanks for the report, Orion. It is a bit odd that this assertion is triggering. Would you be able to provide a reproducer for this? You said that the problem manifested when ns-slapd was in a bad state, so perhaps you could say how to reach this bad state. Thank you.

(gdb) thr appl all bt
Thread 1 (Thread 0x7f7e7001b700 (LWP 28636)):
#0 0x00000033aa832625 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00000033aa833e05 in abort () at abort.c:92
#2 0x0000000000415ba6 in dump_core () at ../../gdb/utils.c:1038
#3 0x0000000000417cfa in internal_vproblem (problem=0xa37e60, file=<value optimized out>,
line=<value optimized out>, fmt=<value optimized out>, ap=<value optimized out>)
at ../../gdb/utils.c:1204
#4 0x0000000000417f79 in internal_verror (file=<value optimized out>,
line=<value optimized out>, fmt=<value optimized out>, ap=<value optimized out>)
at ../../gdb/utils.c:1219
#5 0x0000000000418011 in internal_error (file=0x6fdc <Address 0x6fdc out of bounds>,
line=28636, string=0x6 <Address 0x6 out of bounds>) at ../../gdb/utils.c:1229
#6 0x000000000044fd2e in linux_nat_post_attach_wait (ptid=..., first=1, cloned=0x1d84368,
signalled=0x1d8436c) at ../../gdb/linux-nat.c:1411
#7 0x000000000044fe3b in linux_nat_attach (ops=<value optimized out>,
args=<value optimized out>, from_tty=<value optimized out>) at ../../gdb/linux-nat.c:1578
#8 0x00000000005364d8 in target_attach (args=0x7fffaec48f70 "32109", from_tty=1)
at ../../gdb/target.c:3018
#9 0x0000000000502e52 in attach_command (args=0x7fffaec48f70 "32109", from_tty=1)
at ../../gdb/infcmd.c:2459
#10 0x0000000000516ef7 in catch_command_errors (command=0x502d90 <attach_command>, arg=
0x7fffaec48f70 "32109", from_tty=1, mask=<value optimized out>)
at ../../gdb/exceptions.c:534
#11 0x000000000040acb8 in captured_main (data=<value optimized out>) at ../../gdb/main.c:977
#12 0x0000000000516f8b in catch_errors (func=0x409c60 <captured_main>, func_args=
0x7fffaec486f0, errstring=0x69112f "", mask=<value optimized out>)
at ../../gdb/exceptions.c:518
#13 0x00000000004098d4 in gdb_main (args=<value optimized out>) at ../../gdb/main.c:1076
#14 0x00000000004098a9 in main (argc=<value optimized out>, argv=<value optimized out>)
at ../../gdb/gdb.c:48
#6 0x000000000044fd2e in linux_nat_post_attach_wait (ptid=..., first=1, cloned=0x1d84368,
signalled=0x1d8436c) at ../../gdb/linux-nat.c:1411
1411 gdb_assert (pid == new_pid);
(gdb) list
1406 /* Try again with __WCLONE to check cloned processes. */
1407 new_pid = my_waitpid (pid, &status, __WCLONE);
1408 *cloned = 1;
1409 }
1410
1411 gdb_assert (pid == new_pid);
1412
1413 if (!WIFSTOPPED (status))
1414 {
1415 /* The pid we tried to attach has apparently just exited. */
(gdb) print pid
$1 = 32109
(gdb) print new_pid
$2 = -1
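
The listing above shows a retry with __WCLONE followed by an assertion that the wait returned the expected pid; here the wait returned -1, so the assertion fired. As a rough illustration only (this is not GDB's code), the same wait sequence can hand the failure back to its caller instead of asserting. The names `wait_after_attach` and `try_attach_wait` are hypothetical, the warning text is made up, and the ECHILD condition guarding the retry is an assumption, since the listing starts after that check.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

#ifndef __WCLONE
#define __WCLONE 0x80000000 /* Linux flag: wait for cloned children. */
#endif

/* Hypothetical helper, not GDB's code.  Waits for PID to report a stop
   after an attach request.  Returns 0 on success; on failure returns -1
   and stores the errno of the failed wait in *SAVED_ERRNO.  */
static int
wait_after_attach (pid_t pid, int *status, int *cloned, int *saved_errno)
{
  pid_t new_pid = waitpid (pid, status, 0);

  *cloned = 0;
  if (new_pid == -1 && errno == ECHILD)
    {
      /* Assumed condition: retry for cloned (thread-like) children.  */
      new_pid = waitpid (pid, status, __WCLONE);
      *cloned = 1;
    }

  if (new_pid != pid)
    {
      /* Where the listing asserts, report the error to the caller.  */
      *saved_errno = errno;
      return -1;
    }
  return 0;
}

/* Hypothetical caller: warn and carry on instead of aborting.  */
static int
try_attach_wait (pid_t pid)
{
  int status = 0, cloned = 0, err = 0;

  if (wait_after_attach (pid, &status, &cloned, &err) != 0)
    {
      fprintf (stderr, "warning: unable to attach to %ld: %s\n",
               (long) pid, strerror (err));
      return -1;
    }
  return 0;
}
```

Calling `try_attach_wait` on a pid that is not a child of the caller (for example pid 1) takes the failure path and prints the warning rather than terminating the program.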
Here are the steps to reproduce with ns-slapd:
- Install a 389-ds-base build that has the bug introduced with the initial fix for https://fedorahosted.org/389/ticket/47748. Not sure where such a build would be anymore.
- Enable password expiration checking.
- Attempt to authenticate on an sssd system as a user with an expiring password (within the password warning period, but not yet expired).
- sssd will start creating lots of ldap connections to the server.
- Eventually gdb stack traces will crash with this bug.

An update: I tried attaching gdb to a program that has many threads spawn and exit constantly in quick succession, and that reveals issues that I'm looking at addressing (upstream). I haven't managed to trigger the assertion though. It may be a kernel bug.

> (gdb) print pid
> $1 = 32109
> (gdb) print new_pid
> $2 = -1
Could you also print errno?

(gdb) print errno
$1 = 13

Thanks. Hmm, that's:

#define EACCES 13 /* Permission denied */

Given your app does authentication things, I wonder if it is dropping privileges or some such just as GDB is attaching to the thread, and we hit some kernel-side race.

I realized that frames #0-#5 may have clobbered errno by the time you go print it, though. To be certain, could you put a breakpoint in gdb's internal_error, and print errno then? Sorry for the extra roundtrip.

Orion, would you be willing to help us verify the fix? For now I'm setting QE Conditional NAK: reproducer.

I'm afraid I don't have the ability to easily reproduce this condition any more and so can't check.

(In reply to Orion Poplawski from comment #10)
> I'm afraid I don't have the ability to easily reproduce this condition any
> more and so can't check.

Thanks Orion. FYI, trying to reproduce this resulted in many "attach" bugs being fixed in upstream GDB. You can see the main ones here if you're curious:

https://sourceware.org/ml/gdb-patches/2014-12/msg00447.html

(Note that those are all too invasive for backporting to RHEL.)

Unfortunately, none of those explained your bug's case; it all indicates that the kernel managed to let gdb attach to the process, but then interacting with the process fails with EACCES... So the fix just has GDB detect the situation gracefully instead of crashing.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1325.html
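
The concern raised in the thread, that frames #0-#5 may have overwritten errno before it was printed, can be illustrated with a small sketch. It is not taken from GDB: `errno_clobber_demo` is a hypothetical name, and `close (-1)` merely stands in for whatever error-reporting code runs between the failed call and the moment errno is inspected.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo, not GDB code.  Stores the errno seen right after a
   failing waitpid in *AT_FAILURE and the errno seen after a later,
   unrelated failing call in *LATER.  */
static void
errno_clobber_demo (int *at_failure, int *later)
{
  int status;

  *at_failure = 0;
  *later = 0;

  /* A process is never its own child, so this fails with ECHILD.  */
  if (waitpid (getpid (), &status, 0) == -1)
    *at_failure = errno;   /* Save before anything else can run.  */

  /* Stand-in for code that runs before errno is inspected: closing an
     invalid descriptor fails and sets errno to EBADF.  */
  (void) close (-1);
  *later = errno;
}
```

The value saved immediately after the failing call still identifies the original error, while the value read later reflects the most recent failure. This is why a breakpoint at the entry of internal_error gives a more trustworthy reading than printing errno from the aborted process.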