Bug 1162264

Summary:	gdb/linux-nat.c:1411: internal-error: linux_nat_post_attach_wait: Assertion `pid == new_pid' failed.
Product:	Red Hat Enterprise Linux 6	Reporter:	Orion Poplawski <orion>
Component:	gdb	Assignee:	Sergio Durigan Junior <sergiodj>
Status:	CLOSED ERRATA	QA Contact:	Miroslav Franc <mfranc>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	6.5	CC:	gdb-bugs, jan.kratochvil, mcermak, mfranc, ohudlick, orion, palves
Target Milestone:	rc
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	gdb-7.2-82.el6	Doc Type:	Bug Fix
Doc Text:	Cause: Under some conditions, while attaching to a process, GDB can perform the initial low-level ptrace attach request, but the Linux kernel then refuses to let the debugger finish the attach sequence (the 'waitpid' system call fails with EACCES). Consequence: GDB did not expect this scenario and would terminate unexpectedly with an internal error. Fix: GDB now handles the described scenario gracefully, simply reporting back to the user that the attach request failed. Result: The user receives a warning that GDB was unable to attach because permission was denied, but the debugging session is not otherwise affected.	Story Points:	---
Clone Of:
Clones:	1210135		Environment:
Last Closed:	2015-07-22 06:34:34 UTC	Type:	Bug

Description Orion Poplawski 2014-11-10 16:55:51 UTC

Description of problem:

While trying to debug an issue with 389-ds-base, I tried running:

/usr/bin/gdb -ex 'set confirm off' -ex 'set pagination off' -ex 'thread apply all bt full' -ex 'quit' /usr/sbin/ns-slapd `/sbin/pidof ns-slapd`

This crashes with:

GNU gdb (GDB) Red Hat Enterprise Linux (7.2-64.el6_5.2)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/ns-slapd...Reading symbols from /usr/lib/debug/usr/sbin/ns-slapd.debug...done.
done.
Attaching to program: /usr/sbin/ns-slapd, process 32109
../../gdb/linux-nat.c:1411: internal-error: linux_nat_post_attach_wait: Assertion `pid == new_pid' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) [answered Y; input not from terminal]
../../gdb/linux-nat.c:1411: internal-error: linux_nat_post_attach_wait: Assertion `pid == new_pid' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) [answered Y; input not from terminal]

Version-Release number of selected component (if applicable):
gdb-7.2-64.el6_5.2.x86_64

How reproducible:
This happened when ns-slapd was in a bad state.  I cannot reproduce now that it isn't.

Comment 2 Sergio Durigan Junior 2014-11-10 19:03:26 UTC

Thanks for the report, Orion.  It is a bit odd that this assertion is triggering.  Would you be able to provide a reproducer for this?  You said that the problem manifested when ns-slapd was in a bad state, so perhaps you could say how to reach this bad state.  Thank you.

Comment 3 Orion Poplawski 2014-11-14 23:21:58 UTC

(gdb) thr appl all bt

Thread 1 (Thread 0x7f7e7001b700 (LWP 28636)):
#0  0x00000033aa832625 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00000033aa833e05 in abort () at abort.c:92
#2  0x0000000000415ba6 in dump_core () at ../../gdb/utils.c:1038
#3  0x0000000000417cfa in internal_vproblem (problem=0xa37e60, file=<value optimized out>,
    line=<value optimized out>, fmt=<value optimized out>, ap=<value optimized out>)
    at ../../gdb/utils.c:1204
#4  0x0000000000417f79 in internal_verror (file=<value optimized out>,
    line=<value optimized out>, fmt=<value optimized out>, ap=<value optimized out>)
    at ../../gdb/utils.c:1219
#5  0x0000000000418011 in internal_error (file=0x6fdc <Address 0x6fdc out of bounds>,
    line=28636, string=0x6 <Address 0x6 out of bounds>) at ../../gdb/utils.c:1229
#6  0x000000000044fd2e in linux_nat_post_attach_wait (ptid=..., first=1, cloned=0x1d84368,
    signalled=0x1d8436c) at ../../gdb/linux-nat.c:1411
#7  0x000000000044fe3b in linux_nat_attach (ops=<value optimized out>,
    args=<value optimized out>, from_tty=<value optimized out>) at ../../gdb/linux-nat.c:1578
#8  0x00000000005364d8 in target_attach (args=0x7fffaec48f70 "32109", from_tty=1)
    at ../../gdb/target.c:3018
#9  0x0000000000502e52 in attach_command (args=0x7fffaec48f70 "32109", from_tty=1)
    at ../../gdb/infcmd.c:2459
#10 0x0000000000516ef7 in catch_command_errors (command=0x502d90 <attach_command>, arg=
    0x7fffaec48f70 "32109", from_tty=1, mask=<value optimized out>)
    at ../../gdb/exceptions.c:534
#11 0x000000000040acb8 in captured_main (data=<value optimized out>) at ../../gdb/main.c:977
#12 0x0000000000516f8b in catch_errors (func=0x409c60 <captured_main>, func_args=
    0x7fffaec486f0, errstring=0x69112f "", mask=<value optimized out>)
    at ../../gdb/exceptions.c:518
#13 0x00000000004098d4 in gdb_main (args=<value optimized out>) at ../../gdb/main.c:1076
#14 0x00000000004098a9 in main (argc=<value optimized out>, argv=<value optimized out>)
    at ../../gdb/gdb.c:48


#6  0x000000000044fd2e in linux_nat_post_attach_wait (ptid=..., first=1, cloned=0x1d84368,
    signalled=0x1d8436c) at ../../gdb/linux-nat.c:1411
1411      gdb_assert (pid == new_pid);
(gdb) list
1406          /* Try again with __WCLONE to check cloned processes.  */
1407          new_pid = my_waitpid (pid, &status, __WCLONE);
1408          *cloned = 1;
1409        }
1410
1411      gdb_assert (pid == new_pid);
1412
1413      if (!WIFSTOPPED (status))
1414        {
1415          /* The pid we tried to attach has apparently just exited.  */
(gdb) print pid
$1 = 32109
(gdb) print new_pid
$2 = -1

Comment 4 Orion Poplawski 2014-11-14 23:28:21 UTC

Here are the steps to reproduce with ns-slapd:

- Install a 389-ds-base build that has the bug introduced by the initial fix for https://fedorahosted.org/389/ticket/47748.  I am not sure where such a build could be found anymore.
- Enable password expiration checking.
- Attempt to authenticate on a system running sssd as a user with an expiring password (within the password warning period, but not yet expired).
- sssd will start creating lots of ldap connections to the server.
- Eventually, taking gdb stack traces of ns-slapd will crash gdb with this bug.

Comment 5 Pedro Alves 2014-11-19 11:05:22 UTC

An update: I tried attaching gdb to a program that has many threads spawning and exiting constantly in quick succession, and that reveals issues that I'm looking at addressing (upstream).  I haven't managed to trigger the assertion though.  It may be a kernel bug.

> (gdb) print pid
> $1 = 32109
> (gdb) print new_pid
> $2 = -1

Could you also print errno?

Comment 6 Orion Poplawski 2014-11-19 16:13:31 UTC

(gdb) print errno
$1 = 13

Comment 7 Pedro Alves 2014-11-20 09:08:22 UTC

Thanks.  Hmm, that's:

  #define EACCES          13      /* Permission denied */

Given your app does authentication things, I wonder if it is dropping privileges or some such just as GDB is attaching to the thread, and we hit some kernel-side race.

I realized that frames #0-#5 may have clobbered errno by the time you printed it, though.  To be certain, could you put a breakpoint on gdb's internal_error and print errno when it is hit?

Sorry for the extra roundtrip.
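The errno concern above can be illustrated with a small standalone C sketch. It is not GDB code: wait_capturing_errno is a hypothetical helper, and the close (-1) call is only a stand-in for later error-reporting code that happens to overwrite errno. The point is that errno has to be copied immediately after the failing call if it is to be trusted later.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of GDB): wait on PID and store in
   *SAVED_ERRNO the errno value as it was right after waitpid
   returned, or 0 if waitpid succeeded.  */
static pid_t
wait_capturing_errno (pid_t pid, int *saved_errno)
{
  int status;
  pid_t ret;

  assert (saved_errno != NULL);

  ret = waitpid (pid, &status, 0);

  /* Copy errno before anything else gets a chance to change it.  */
  *saved_errno = (ret == -1) ? errno : 0;

  /* Stand-in for later code on the error path: this call fails and
     sets errno to EBADF, so reading errno from here on no longer
     tells us why waitpid failed.  */
  close (-1);

  return ret;
}
```

Calling the helper with a pid that is not a child of the caller (for example the caller's own pid) makes waitpid fail with ECHILD, and the saved copy still holds ECHILD even though the global errno has since been overwritten.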

Comment 9 Martin Cermak 2015-02-20 06:52:50 UTC

Orion, would you be willing to help us verify the fix? For now I'm setting QE Conditional NAK: reproducer.

Comment 10 Orion Poplawski 2015-02-20 15:57:29 UTC

I'm afraid I don't have the ability to easily reproduce this condition any more and so can't check.

Comment 13 Pedro Alves 2015-02-21 12:24:51 UTC

(In reply to Orion Poplawski from comment #10)
> I'm afraid I don't have the ability to easily reproduce this condition any
> more and so can't check.

Thanks Orion,

FYI, trying to reproduce this resulted in many "attach" bugs being fixed in upstream GDB.  You can see the main ones here if you're curious:

  https://sourceware.org/ml/gdb-patches/2014-12/msg00447.html

(note those are all too invasive for backporting to rhel.)

Unfortunately, none of those explained your bug's case; it all indicates that the kernel managed to let gdb attach to the process, but then interacting with the process fails with EACCES...  So the fix just has GDB detect the situation gracefully instead of crashing.
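As a rough illustration of "detect the situation gracefully instead of crashing", here is a minimal standalone sketch. It is not the actual RHEL patch: post_attach_wait_sketch and its parameters are hypothetical, and the retry with __WCLONE only mirrors what the listing in comment #3 shows. Where the original code asserted pid == new_pid, the sketch reports the failure and returns an error the caller can act on.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the post-attach wait (not the real GDB
   function).  Returns 0 if PID was successfully waited for, or -1
   with *ERR set to the errno of the failing waitpid call.  */
static int
post_attach_wait_sketch (pid_t pid, int *status, int *err)
{
  pid_t new_pid;

  assert (status != NULL && err != NULL);

  new_pid = waitpid (pid, status, 0);
#ifdef __WCLONE
  /* Try again with __WCLONE to check cloned processes.  */
  if (new_pid == -1 && errno == ECHILD)
    new_pid = waitpid (pid, status, __WCLONE);
#endif

  if (new_pid != pid)
    {
      /* Previously an assertion; now a warning plus an error return,
         so the caller can abandon the attach and keep running.  */
      *err = (new_pid == -1) ? errno : 0;
      fprintf (stderr, "warning: unable to attach to process %ld: %s\n",
               (long) pid, strerror (*err));
      return -1;
    }

  *err = 0;
  return 0;
}
```

In the reported case waitpid returned -1 with errno apparently set to EACCES; with this shape the caller would see -1 and could report "permission denied" rather than abort. The failure path can be exercised without ptrace by passing a pid that is not a child of the caller, such as getpid (), which fails with ECHILD.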

Comment 18 errata-xmlrpc 2015-07-22 06:34:34 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1325.html