Bug 808404

Summary: [abrt] gdb-7.2-51.fc14: dump_core: Process /usr/bin/gdb was killed by signal 6 (SIGABRT)
Product: Red Hat Enterprise Linux 6
Reporter: David Jaša <djasa>
Component: kernel
Assignee: Oleg Nesterov <onestero>
Status: CLOSED NOTABUG
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Priority: medium
Version: 6.3
CC: anton, djasa, jan.kratochvil, sergiodj
Target Milestone: rc
Hardware: x86_64
OS: Unspecified
Whiteboard: abrt_hash:bf3ef9266883a16e9a8b9845671f2d65a0bdf2ed
Doc Type: Bug Fix
Clone Of: 716627
Last Closed: 2014-02-02 15:31:27 UTC
Attachments: yet another backtrace of gdb hang (no flags)

Description David Jaša 2012-03-30 10:42:46 UTC
Cloning to RHEL and proposing as an exception for 6.3, as this prevents debugging of some hangs.


+++ This bug was initially created as a clone of Bug #716627 +++

abrt version: 1.1.18
architecture: x86_64
Attached file: backtrace, 12520 bytes
cmdline: gdb -q -nw -i mi --cd=/opt/ubuntu/home/muelli/ubuntu-maverick --command=.gdbinit /opt/ubuntu/home/muelli/ubuntu-maverick/vmlinux
component: gdb
Attached file: coredump, 47783936 bytes
crash_function: dump_core
executable: /usr/bin/gdb
kernel: 2.6.35.13-92.fc14.x86_64
package: gdb-7.2-51.fc14
rating: 4
reason: Process /usr/bin/gdb was killed by signal 6 (SIGABRT)
release: Fedora release 14 (Laughlin)
time: 1309027305
uid: 1000

How to reproduce
-----
1. I tried to debug Linux via remote GDB and QEmu
2. I clicked stop in the debugger
3.

--- Additional comment from fedora-bugs on 2011-06-25 20:48:42 CEST ---

Created attachment 509913 [details]
File: backtrace

--- Additional comment from fedora-bugs on 2011-06-26 01:15:27 CEST ---

Package: gdb-7.2-51.fc14
Architecture: x86_64
OS Release: Fedora release 14 (Laughlin)


How to reproduce
-----
1. I tried to debug Linux via remote GDB and QEmu
2. I clicked stop in the debugger
3.

--- Additional comment from jacob.oursland on 2011-09-08 07:38:06 CEST ---

Package: gdb-7.2-51.fc14
Architecture: i686
OS Release: Fedora release 14 (Laughlin)


How to reproduce
-----
1. Install Eclipse.
2. Create a New C++ Project with Hello World.
3. Debug the newly created project.
4. Crash!

--- Additional comment from abrt-bot on 2012-03-20 16:36:09 CET ---

*** Bug 612572 has been marked as a duplicate of this bug. ***

--- Additional comment from djasa on 2012-03-29 22:17:36 CEST ---

gdb hangs when attached to a hung firefox (gdb --pid `pidof firefox`)

backtrace_rating: 4
Package: gdb-7.2-52.el6
OS Release: Red Hat Enterprise Linux Workstation release 6.3 Beta (Santiago)

--- Additional comment from djasa on 2012-03-29 22:17:50 CEST ---

Created attachment 573783 [details]
File: backtrace

Comment 1 David Jaša 2012-03-30 11:12:17 UTC
gdb hangs after it says it's attaching to the process:
Attaching to process 2067

and nothing happens until I send it SIGTERM from another console. Then it prints:
../../gdb/linux-nat.c:2701: internal-error: stop_wait_callback: Assertion `lp->resumed' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.

Comment 2 David Jaša 2012-04-02 09:06:54 UTC
(In reply to https://bugzilla.redhat.com/show_bug.cgi?id=716627#c7)
> Do you have any reproducibility of gdb hang when attaching to anything?
> That is for example how to reproduce that "hung firefox".

(moving to RHEL bug)

I got the hang by using "Save link as" on binary files, but I cannot reproduce it anymore. Could you please provide steps on what to do to gather all the necessary info if I hit it again in the future?

Comment 3 Jan Kratochvil 2012-04-02 13:18:51 UTC
The best thing would be a qemu "savevm" at that time. :-) But you probably do not run in a VM.
I do not know; you can at least store /proc/PID/status.
I was rather hoping you had the Firefox hang reproducible.

Comment 4 David Jaša 2012-04-02 13:24:48 UTC
(In reply to comment #3)
> The best thing would be a qemu "savevm" at that time. :-) But you probably do not run in a VM.

I don't, unfortunately.

> I was rather hoping you had the Firefox hang reproducible.

It was - at the time I was contributing to the bug. Now, after a few reboots and a move to faster networks, I cannot reproduce it anymore.

Comment 5 David Jaša 2012-04-09 20:32:24 UTC
Created attachment 576301 [details]
yet another backtrace of gdb hang

In addition to the backtrace, I'm attaching the corresponding /proc/$PID/status of both firefox and gdb.


[djasa@dhcp-29-7 ~]$ cat /proc/`pidof firefox`/status
Name:	firefox
State:	S (sleeping)
Tgid:	3996
Pid:	3996
PPid:	1
TracerPid:	8249
Uid:	501	501	501	501
Gid:	100	100	100	100
Utrace:	1f0f
FDSize:	256
Groups:	0 10 36 100 484 486 498 
VmPeak:	 1509100 kB
VmSize:	 1358480 kB
VmLck:	       0 kB
VmHWM:	  673400 kB
VmRSS:	  386252 kB
VmData:	  790012 kB
VmStk:	     504 kB
VmExe:	      64 kB
VmLib:	   76488 kB
VmPTE:	    2812 kB
VmSwap:	    1568 kB
Threads:	21
SigQ:	0/61197
SigPnd:	0000000000040000
ShdPnd:	0000000000000000
SigBlk:	fffffffffffffef9
SigIgn:	0000000000001000
SigCgt:	00000001800144af
CapInh:	0000000000000000
CapPrm:	0000000000000000
CapEff:	0000000000000000
CapBnd:	ffffffffffffffff
Cpus_allowed:	f
Cpus_allowed_list:	0-3
Mems_allowed:	00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:	0
voluntary_ctxt_switches:	26193758
nonvoluntary_ctxt_switches:	2363491
[djasa@dhcp-29-7 ~]$ cat /proc/`pidof gdb`/status
Name:	gdb
State:	S (sleeping)
Tgid:	8249
Pid:	8249
PPid:	3065
TracerPid:	0
Uid:	501	501	501	501
Gid:	100	100	100	100
Utrace:	0
FDSize:	256
Groups:	0 10 36 100 484 486 498 
VmPeak:	  131440 kB
VmSize:	  131412 kB
VmLck:	       0 kB
VmHWM:	    6160 kB
VmRSS:	    6160 kB
VmData:	    2544 kB
VmStk:	     104 kB
VmExe:	    4288 kB
VmLib:	    4524 kB
VmPTE:	     128 kB
VmSwap:	       0 kB
Threads:	1
SigQ:	0/61197
SigPnd:	0000000000000000
ShdPnd:	0000000000000000
SigBlk:	0000000000000000
SigIgn:	0000000001001000
SigCgt:	0000000188034087
CapInh:	0000000000000000
CapPrm:	0000000000000000
CapEff:	0000000000000000
CapBnd:	ffffffffffffffff
Cpus_allowed:	f
Cpus_allowed_list:	0-3
Mems_allowed:	00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:	0
voluntary_ctxt_switches:	18
nonvoluntary_ctxt_switches:	12

Comment 7 Jan Kratochvil 2012-05-15 13:33:30 UTC
The GDB hang - a lost SIGSTOP notification - also happens on:
  3.2.0-0.bpo.1-amd64 (Debian Squeeze backports)

(gdb) bt
#0 in waitpid () from /lib/libpthread.so.0
#1 in my_waitpid (pid=561, statusp=0x7fff5b79ba8c, flags=0) at linux-nat.c:364
#2 in linux_nat_post_attach_wait (ptid=..., first=1, cloned=0x162dbc8, signalled=0x162dbcc) at linux-nat.c:1403 
#3 in linux_nat_attach (ops=<optimized out>, args=<optimized out>, from_tty=<optimized out>) at linux-nat.c:1654 
#4 in target_attach (args=0x7fff5b79d4d8 "561", from_tty=1) at target.c:3791
#5 in attach_command (args=0x7fff5b79d4d8 "561", from_tty=1) at infcmd.c:2534

Comment 10 Suzanne Logcher 2012-05-18 20:53:16 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 11 Jan Kratochvil 2012-07-24 14:04:38 UTC
I do not have any reproducer, but it does not seem to be caused by GDB.
The problem is that the kernel lost the SIGSTOP notification after PTRACE_ATTACH.

It is up to the kernel side / Oleg whether there is an idea how that can happen, or whether the kernel has to close this as CANTFIX.
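
For illustration, here is a minimal stand-alone sketch of the attach sequence involved (not GDB's actual code): PTRACE_ATTACH queues a SIGSTOP for the tracee and the tracer then waits for the stop notification, so if that notification is lost, waitpid() blocks forever, which is exactly where the backtraces show GDB stuck.

#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    pid_t pid = (pid_t)atoi(argv[1]);

    /* Queues a SIGSTOP for the tracee and makes us its tracer. */
    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) {
        perror("PTRACE_ATTACH");
        return 1;
    }

    /* GDB's my_waitpid()/linux_nat_post_attach_wait() boil down to this;
       if the tracee never enters the stopped state, this call hangs. */
    int status;
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return 1;
    }
    printf("tracee stopped, status 0x%x\n", status);

    ptrace(PTRACE_DETACH, pid, NULL, NULL);
    return 0;
}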

Comment 12 Oleg Nesterov 2012-07-24 15:02:25 UTC
(In reply to comment #5)
> In addition to the backtrace, I'm attaching the corresponding /proc/$PID/status of
> both firefox and gdb.

And thanks for this...

I agree with Jan, this does look like a kernel problem.
But it doesn't look like a ptrace/etc. problem, AFAICS.

> [djasa@dhcp-29-7 ~]$ cat /proc/`pidof firefox`/status
> Name:	firefox
> State:	S (sleeping)

TASK_INTERRUPTIBLE

> SigPnd:	0000000000040000

SIGSTOP is pending. So why does it sleep?

because,

> SigBlk:	fffffffffffffef9

someone blocked SIGSTOP. Only the kernel can block it,
but nobody should ever do this; it is always wrong.

Please show us /proc/`pidof firefox`/stack?


OK... 0xef9 leaves SIGINT + SIGQUIT + SIGKILL unblocked.
grep-grep-grep... fs/ncpfs/sock.c does exactly this,
and fs/autofs4/waitq.c seems to do something similar.

Do you use them?
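
(For reference, a tiny throwaway decoder for the hex signal masks in /proc/PID/status, written just for this bug, nothing official: bit n-1 corresponds to signal n, and running it on the SigBlk value above prints exactly signals 2, 3 and 9, i.e. SIGINT, SIGQUIT and SIGKILL as the only unblocked ones.)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <hex mask from /proc/PID/status>\n", argv[0]);
        return 1;
    }
    unsigned long long mask = strtoull(argv[1], NULL, 16);

    /* Bit n-1 set means signal n is in the set; print the cleared bits,
       i.e. the signals a SigBlk mask leaves unblocked.  Classic signals
       only; the RT signals (32+) do not matter for this analysis. */
    for (int sig = 1; sig < 32; sig++)
        if (!(mask & (1ULL << (sig - 1))))
            printf("signal %2d (%s) is unblocked\n", sig, strsignal(sig));
    return 0;
}

$ ./sigmask fffffffffffffef9
signal  2 (Interrupt) is unblocked
signal  3 (Quit) is unblocked
signal  9 (Killed) is unblocked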

Comment 13 David Jaša 2012-07-24 15:27:57 UTC
(In reply to comment #12)
> (In reply to comment #5)
> > In addition to the backtrace, I'm attaching the corresponding /proc/$PID/status of
> > both firefox and gdb.
> 
> And thanks for this...
> 
> I agree with Jan, this does look like a kernel problem.
> But, it doesn't look like ptrace/etc problem afaics.
> 
> > [djasa@dhcp-29-7 ~]$ cat /proc/`pidof firefox`/status
> > Name:	firefox
> > State:	S (sleeping)
> 
> TASK_INTERRUPTIBLE
> 
> > SigPnd:	0000000000040000
> 
> SIGSTOP is pending. So why does it sleep?
> 
> because,
> 
> > SigBlk:	fffffffffffffef9
> 
> someone blocked SIGSTOP. Only the kernel can block it,
> but nobody should ever do this; it is always wrong.
> 
> Please show us /proc/`pidof firefox`/stack?
> 

I haven't hit the hang for quite some time, so I'm afraid I can't get one until I encounter it again.

Should I try using an older kernel to increase the chances of getting one?

> 
> OK... 0xef9 leaves SIGINT + SIGQUIT + SIGKILL unblocked.
> grep-grep-grep... fs/ncpfs/sock.c does exactly this,
> and fs/autofs4/waitq.c seems to do something similar.
> 
> Do you use them?

I don't use any of these.

Comment 14 Oleg Nesterov 2012-07-24 15:44:30 UTC
(In reply to comment #13)
>
> > Please show us /proc/`pidof firefox`/stack?
> > 
> 
> I haven't hit the hang for quite some time, so I'm afraid I can't get one
> until I encounter it again.
> 
> Should I try using an older kernel to increase the chances of getting one?

Not sure this will help, but any additional info is very much
appreciated ;)
 
> > OK... 0xef9 leaves SIGINT + SIGQUIT + SIGKILL unblocked.
> > grep-grep-grep... fs/ncpfs/sock.c does exactly this,
> > and fs/autofs4/waitq.c seems to do something similar.
> > 
> > Do you use them?
> 
> I don't use any of these.

OK, thanks.

Comment 15 David Jaša 2012-08-13 20:38:38 UTC
I _think_ I got a reliable FF hang reproducer on current RHEL 6:
1) have autofs up and running
2) make a symlink to some automounted NFS directory:
   ln -s /net/server/path/to/dir /mnt/server_dir
3) in firefox, save some page to /mnt/server_dir/path
4) kill connectivity to the server (kill the VPN, iptables -I OUTPUT -d server -j DROP, ...)
5) try to save another page <-- autofs also does something wrong,
                                and firefox freezes
6) attach gdb to firefox: gdb -pid $(pidof firefox)
   gdb will hang as well (see the minimal victim sketch right below)
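
A minimal stand-alone victim along these lines (hypothetical and untested; the path is just whatever was symlinked in step 2) should get parked in autofs4_wait via the same sys_access path the firefox stack below shows, after which attaching gdb should hang identically:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Equivalent of step 5: touch a path served by the now-unreachable
       automounted server.  With the server dead, this blocks inside
       autofs4_wait with most signals (including SIGSTOP) blocked. */
    if (access("/mnt/server_dir/some/file", F_OK) == -1)
        perror("access");
    return 0;
}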

so I actually _do_ use autofs4, I just didn't realize it when writing #c13. :(

(and there is something rotten in the user-space part of autofs, as I was getting gdb reports about it a few months ago)


(writing all of this from a VM, so all info is inline below)
======================================================


# cat /proc/$(pidof firefox)/stack
[<ffffffffa06bf831>] autofs4_wait+0x311/0x900 [autofs4]
[<ffffffffa06be492>] autofs4_d_automount+0x232/0x2f0 [autofs4]
[<ffffffff81189a39>] follow_managed+0x219/0x2d0
[<ffffffff81189b8f>] do_lookup+0x9f/0x230
[<ffffffff8118a02d>] __link_path_walk+0x20d/0x1030
[<ffffffff8118abb7>] __link_path_walk+0xd97/0x1030
[<ffffffff8118b0da>] path_walk+0x6a/0xe0
[<ffffffff8118b2ab>] do_path_lookup+0x5b/0xa0
[<ffffffff8118bf17>] user_path_at+0x57/0xa0
[<ffffffff81179b90>] sys_faccessat+0xd0/0x1d0
[<ffffffff81179ca8>] sys_access+0x18/0x20
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

# cat /proc/$(pidof firefox)/status
Name:	firefox
State:	S (sleeping)
Tgid:	1426
Pid:	1426
PPid:	1
TracerPid:	5349
Uid:	501	501	501	501
Gid:	100	100	100	100
Utrace:	1f0f
FDSize:	256
Groups:	0 10 36 100 484 486 498 
VmPeak:	 1442072 kB
VmSize:	 1380240 kB
VmLck:	       0 kB
VmHWM:	  636296 kB
VmRSS:	  563144 kB
VmData:	  853932 kB
VmStk:	     512 kB
VmExe:	      64 kB
VmLib:	   72236 kB
VmPTE:	    2612 kB
VmSwap:	       0 kB
Threads:	20
SigQ:	0/61196
SigPnd:	0000000000040000
ShdPnd:	0000000000000000
SigBlk:	fffffffffffffef9
SigIgn:	0000000000001000
SigCgt:	00000001800044af
CapInh:	0000000000000000
CapPrm:	0000000000000000
CapEff:	0000000000000000
CapBnd:	ffffffffffffffff
Cpus_allowed:	f
Cpus_allowed_list:	0-3
Mems_allowed:	00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:	0
voluntary_ctxt_switches:	8686227
nonvoluntary_ctxt_switches:	556257


# cat /proc/$(pidof gdb)/stack
[<ffffffff8106fb24>] do_wait+0x1e4/0x240
[<ffffffff8106fc23>] sys_wait4+0xa3/0x100
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

# cat /proc/$(pidof gdb)/status
Name:	gdb
State:	S (sleeping)
Tgid:	5349
Pid:	5349
PPid:	3336
TracerPid:	0
Uid:	501	501	501	501
Gid:	100	100	100	100
Utrace:	0
FDSize:	256
Groups:	0 10 36 100 484 486 498 
VmPeak:	  131440 kB
VmSize:	  131436 kB
VmLck:	       0 kB
VmHWM:	    6184 kB
VmRSS:	    6184 kB
VmData:	    2560 kB
VmStk:	     104 kB
VmExe:	    4288 kB
VmLib:	    4528 kB
VmPTE:	     136 kB
VmSwap:	       0 kB
Threads:	1
SigQ:	0/61196
SigPnd:	0000000000000000
ShdPnd:	0000000000000000
SigBlk:	0000000000000000
SigIgn:	0000000001001000
SigCgt:	0000000188034087
CapInh:	0000000000000000
CapPrm:	0000000000000000
CapEff:	0000000000000000
CapBnd:	ffffffffffffffff
Cpus_allowed:	f
Cpus_allowed_list:	0-3
Mems_allowed:	00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:	0
voluntary_ctxt_switches:	58
nonvoluntary_ctxt_switches:	6


GDB backtrace:

Thread 1 (Thread 0x7f88cc288700 (LWP 5349)):
#0  0x0000003a2180f05e in __libc_waitpid (pid=<value optimized out>, stat_loc=0x7fff0aaea36c, options=<value optimized out>) at ../sysdeps/unix/sysv/linux/waitpid.c:32
#1  0x000000000044c1e5 in my_waitpid (pid=1426, status=0x7fff0aaea36c, flags=0) at ../../gdb/linux-nat.c:426
#2  0x000000000044f243 in linux_nat_post_attach_wait (ptid=..., first=1, cloned=0x14f8f98, signalled=0x14f8f9c) at ../../gdb/linux-nat.c:1400
#3  0x000000000044f50b in linux_nat_attach (ops=<value optimized out>, args=<value optimized out>, from_tty=<value optimized out>) at ../../gdb/linux-nat.c:1578
#4  0x0000000000533068 in target_attach (args=0x7fff0aaec46b "1426", from_tty=1) at ../../gdb/target.c:3016
#5  0x00000000004ffb92 in attach_command (args=0x7fff0aaec46b "1426", from_tty=1) at ../../gdb/infcmd.c:2459
#6  0x0000000000513c37 in catch_command_errors (command=0x4ffad0 <attach_command>, arg=0x7fff0aaec46b "1426", from_tty=1, mask=<value optimized out>) at ../../gdb/exceptions.c:534
#7  0x000000000040a7a5 in captured_main (data=<value optimized out>) at ../../gdb/main.c:924
#8  0x0000000000513ccb in catch_errors (func=0x409c00 <captured_main>, func_args=0x7fff0aaea6b0, errstring=0x68d52f "", mask=<value optimized out>) at ../../gdb/exceptions.c:518
#9  0x0000000000409894 in gdb_main (args=<value optimized out>) at ../../gdb/main.c:1016
#10 0x0000000000409869 in main (argc=<value optimized out>, argv=<value optimized out>) at ../../gdb/gdb.c:48

Thread 1 (Thread 0x7f88cc288700 (LWP 5349)):
#0  0x0000003a2180f05e in __libc_waitpid (pid=<value optimized out>, stat_loc=0x7fff0aaea36c, options=<value optimized out>) at ../sysdeps/unix/sysv/linux/waitpid.c:32
        resultvar = 18446744073709551104
        oldtype = <value optimized out>
        result = <value optimized out>
#1  0x000000000044c1e5 in my_waitpid (pid=1426, status=0x7fff0aaea36c, flags=0) at ../../gdb/linux-nat.c:426
        ret = <value optimized out>
#2  0x000000000044f243 in linux_nat_post_attach_wait (ptid=..., first=1, cloned=0x14f8f98, signalled=0x14f8f9c) at ../../gdb/linux-nat.c:1400
        new_pid = <value optimized out>
        pid = 1426
        status = <value optimized out>
        __PRETTY_FUNCTION__ = "linux_nat_post_attach_wait"
#3  0x000000000044f50b in linux_nat_attach (ops=<value optimized out>, args=<value optimized out>, from_tty=<value optimized out>) at ../../gdb/linux-nat.c:1578
        lp = 0x14f8f80
        status = <value optimized out>
        ptid = {pid = 1426, lwp = 1426, tid = 0}
#4  0x0000000000533068 in target_attach (args=0x7fff0aaec46b "1426", from_tty=1) at ../../gdb/target.c:3016
        t = <value optimized out>
#5  0x00000000004ffb92 in attach_command (args=0x7fff0aaec46b "1426", from_tty=1) at ../../gdb/infcmd.c:2459
        async_exec = <value optimized out>
        back_to = 0x0
#6  0x0000000000513c37 in catch_command_errors (command=0x4ffad0 <attach_command>, arg=0x7fff0aaec46b "1426", from_tty=1, mask=<value optimized out>) at ../../gdb/exceptions.c:534
        e = {reason = 0, error = GDB_NO_ERROR, message = 0x0}
#7  0x000000000040a7a5 in captured_main (data=<value optimized out>) at ../../gdb/main.c:924
        context = <value optimized out>
        argc = 3
        argv = 0x7fff0aaea7b8
        quiet = 0
        set_args = 0
        symarg = 0x0
        execarg = 0x0
        pidarg = <value optimized out>
        corearg = 0x0
        pid_or_core_arg = <value optimized out>
        cdarg = <value optimized out>
        ttyarg = <value optimized out>
        print_help = 0
        print_version = 0
        cmdarg = <value optimized out>
        cmdsize = <value optimized out>
        ncmd = <value optimized out>
        dirarg = <value optimized out>
        dirsize = <value optimized out>
        ndir = <value optimized out>
        system_gdbinit = 0x0
        home_gdbinit = 0x0
        local_gdbinit = <value optimized out>
        i = <value optimized out>
        save_auto_load = 1
        objfile = <value optimized out>
        pre_stat_chain = 0x0
#8  0x0000000000513ccb in catch_errors (func=0x409c00 <captured_main>, func_args=0x7fff0aaea6b0, errstring=0x68d52f "", mask=<value optimized out>) at ../../gdb/exceptions.c:518
        val = 0
        exception = {reason = 0, error = GDB_NO_ERROR, message = 0x0}
#9  0x0000000000409894 in gdb_main (args=<value optimized out>) at ../../gdb/main.c:1016
No locals.
#10 0x0000000000409869 in main (argc=<value optimized out>, argv=<value optimized out>) at ../../gdb/gdb.c:48
        args = {argc = 3, argv = 0x7fff0aaea7b8, use_windows = 0, interpreter_p = 0x67c030 "console"}

Comment 17 David Jaša 2012-08-13 21:27:50 UTC
(In reply to comment #15)
...
> (and there is something rotten in the user-space part of autofs, as I was
> getting gdb reports about it a few months ago)
...

Filed as bug 847873.

Comment 18 Oleg Nesterov 2012-08-14 16:11:55 UTC
(In reply to comment #15)
>
> so I actually _do_ use autofs4, I just didn't realize it when writing #c13.
> :(
>
> ...
> # cat /proc/$(pidof firefox)/stack
> [<ffffffffa06bf831>] autofs4_wait+0x311/0x900 [autofs4]

Great, thanks, this confirms the theory.

Now that you opened bug 847873, we can probably close this one?

Comment 19 RHEL Program Management 2012-12-14 07:02:22 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.