Bug 808404
| Summary: | [abrt] gdb-7.2-51.fc14: dump_core: Process /usr/bin/gdb was killed by signal 6 (SIGABRT) | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | David Jaša <djasa> | ||||
| Component: | kernel | Assignee: | Oleg Nesterov <onestero> | ||||
| Status: | CLOSED NOTABUG | QA Contact: | Red Hat Kernel QE team <kernel-qe> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 6.3 | CC: | anton, djasa, jan.kratochvil, sergiodj | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | abrt_hash:bf3ef9266883a16e9a8b9845671f2d65a0bdf2ed | ||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | 716627 | Environment: | |||||
| Last Closed: | 2014-02-02 15:31:27 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
Description
David Jaša
2012-03-30 10:42:46 UTC
gdb hangs after it says it's attaching to the process:

Attaching to process 2067

and nothing happens until I send it SIGTERM from another console. Then it prints:

../../gdb/linux-nat.c:2701: internal-error: stop_wait_callback: Assertion `lp->resumed' failed.
A problem internal to GDB has been detected, further debugging may prove unreliable.

(In reply to https://bugzilla.redhat.com/show_bug.cgi?id=716627#c7)
> Do you have any reproducibility of gdb hang when attaching to anything?
> That is for example how to reproduce that "hung firefox".

(moving to RHEL bug)

I got the hang by using "Save link as" on binary files but I cannot get it anymore. Could you please provide steps for gathering all the necessary info in case I hit it again in the future?

Best do qemu "savevm" at that time. :-) But you probably do not run in a VM. I do not know, you can at least store /proc/PID/status. I more hoped you have the Firefox hang reproducible.

(In reply to comment #3)
> Best do qemu "savevm" at that time. :-) But you probably do not run in a VM.

I don't, unfortunately.

> I more hoped you have the Firefox hang reproducible.

It was - at the time I was contributing to the bug. Now, after a few reboots and a move to faster networks, I cannot reproduce it anymore.

Created attachment 576301 [details]
yet another backtrace of gdb hang
In addition to the backtrace, I'm attaching the corresponding /proc/$PID/status of both firefox and gdb.
[djasa@dhcp-29-7 ~]$ cat /proc/`pidof firefox`/status
Name: firefox
State: S (sleeping)
Tgid: 3996
Pid: 3996
PPid: 1
TracerPid: 8249
Uid: 501 501 501 501
Gid: 100 100 100 100
Utrace: 1f0f
FDSize: 256
Groups: 0 10 36 100 484 486 498
VmPeak: 1509100 kB
VmSize: 1358480 kB
VmLck: 0 kB
VmHWM: 673400 kB
VmRSS: 386252 kB
VmData: 790012 kB
VmStk: 504 kB
VmExe: 64 kB
VmLib: 76488 kB
VmPTE: 2812 kB
VmSwap: 1568 kB
Threads: 21
SigQ: 0/61197
SigPnd: 0000000000040000
ShdPnd: 0000000000000000
SigBlk: fffffffffffffef9
SigIgn: 0000000000001000
SigCgt: 00000001800144af
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
Cpus_allowed: f
Cpus_allowed_list: 0-3
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 26193758
nonvoluntary_ctxt_switches: 2363491
[djasa@dhcp-29-7 ~]$ cat /proc/`pidof gdb`/status
Name: gdb
State: S (sleeping)
Tgid: 8249
Pid: 8249
PPid: 3065
TracerPid: 0
Uid: 501 501 501 501
Gid: 100 100 100 100
Utrace: 0
FDSize: 256
Groups: 0 10 36 100 484 486 498
VmPeak: 131440 kB
VmSize: 131412 kB
VmLck: 0 kB
VmHWM: 6160 kB
VmRSS: 6160 kB
VmData: 2544 kB
VmStk: 104 kB
VmExe: 4288 kB
VmLib: 4524 kB
VmPTE: 128 kB
VmSwap: 0 kB
Threads: 1
SigQ: 0/61197
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000001001000
SigCgt: 0000000188034087
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
Cpus_allowed: f
Cpus_allowed_list: 0-3
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 18
nonvoluntary_ctxt_switches: 12
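The two status dumps above carry the whole diagnosis: firefox has SIGSTOP set in SigPnd while SigBlk masks it out, so the tracee never stops and gdb's waitpid() never returns. A minimal Python sketch (a hypothetical helper for illustration, not part of any tool mentioned in this report) that automates this check on /proc/PID/status output:

```python
import signal

def pending_but_blocked(status_text):
    """Given the contents of /proc/PID/status, return the names of signals
    that are pending (SigPnd or ShdPnd) yet blocked (SigBlk).
    Bit n-1 of each hex mask corresponds to signal number n."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    pending = int(fields.get("SigPnd", "0"), 16) | int(fields.get("ShdPnd", "0"), 16)
    blocked = int(fields.get("SigBlk", "0"), 16)
    stuck = pending & blocked
    names = []
    for n in range(1, 65):
        if stuck & (1 << (n - 1)):
            try:
                names.append(signal.Signals(n).name)
            except ValueError:
                names.append("SIG%d" % n)  # real-time or unnamed signal
    return names
```

Fed the firefox dump above (SigPnd 0000000000040000, SigBlk fffffffffffffef9) it reports SIGSTOP stuck; fed the gdb dump it reports nothing, which matches the hang: the debugger waits for a stop that can never be delivered.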
The GDB hang - lost SIGSTOP notification - happens also on 3.2.0-0.bpo.1-amd64 (Debian Squeeze backports):

(gdb) bt
#0  in waitpid () from /lib/libpthread.so.0
#1  in my_waitpid (pid=561, statusp=0x7fff5b79ba8c, flags=0) at linux-nat.c:364
#2  in linux_nat_post_attach_wait (ptid=..., first=1, cloned=0x162dbc8, signalled=0x162dbcc) at linux-nat.c:1403
#3  in linux_nat_attach (ops=<optimized out>, args=<optimized out>, from_tty=<optimized out>) at linux-nat.c:1654
#4  in target_attach (args=0x7fff5b79d4d8 "561", from_tty=1) at target.c:3791
#5  in attach_command (args=0x7fff5b79d4d8 "561", from_tty=1) at infcmd.c:2534

This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

I do not have any reproducer, but it does not seem to be caused by GDB. The problem is that the kernel lost the SIGSTOP notification after PTRACE_ATTACH. It is up to the kernel / Oleg whether there is an idea how that can happen or whether the kernel has to CANTFIX it.

(In reply to comment #5)
> In addition to the backtrace, I'm attaching the corresponding /proc/$PID/status of
> both firefox and gdb.

And thanks for this...

I agree with Jan, this does look like a kernel problem.
But, it doesn't look like a ptrace/etc problem afaics.

> [djasa@dhcp-29-7 ~]$ cat /proc/`pidof firefox`/status
> Name: firefox
> State: S (sleeping)

TASK_INTERRUPTIBLE

> SigPnd: 0000000000040000

SIGSTOP is pending. So why does it sleep? Because,

> SigBlk: fffffffffffffef9

someone blocked SIGSTOP. Only the kernel can block it, but nobody should do this; this is always wrong.

Please show us /proc/`pidof firefox`/stack?

OK... 0xef9 leaves SIGINT + SIGQUIT + SIGKILL unblocked.
grep-grep-grep... fs/ncpfs/sock.c does exactly this,
and fs/autofs4/waitq.c seems to do something similar.

Do you use them?
(In reply to comment #12)
> (In reply to comment #5)
> > In addition to the backtrace, I'm attaching the corresponding /proc/$PID/status of
> > both firefox and gdb.
>
> And thanks for this...
>
> I agree with Jan, this does look like a kernel problem.
> But, it doesn't look like a ptrace/etc problem afaics.
>
> > [djasa@dhcp-29-7 ~]$ cat /proc/`pidof firefox`/status
> > Name: firefox
> > State: S (sleeping)
>
> TASK_INTERRUPTIBLE
>
> > SigPnd: 0000000000040000
>
> SIGSTOP is pending. So why does it sleep? Because,
>
> > SigBlk: fffffffffffffef9
>
> someone blocked SIGSTOP. Only the kernel can block it,
> but nobody should do this; this is always wrong.
>
> Please show us /proc/`pidof firefox`/stack?

I didn't get the hang for quite some time, so I'm afraid I can't get one until I encounter it again. Should I try using an older kernel to increase the chances of getting one?

> OK... 0xef9 leaves SIGINT + SIGQUIT + SIGKILL unblocked.
> grep-grep-grep... fs/ncpfs/sock.c does exactly this,
> and fs/autofs4/waitq.c seems to do something similar.
>
> Do you use them?

I don't use any of these.

(In reply to comment #13)
> > Please show us /proc/`pidof firefox`/stack?
>
> I didn't get the hang for quite some time, so I'm afraid I can't get one until I
> encounter it again.
>
> Should I try using an older kernel to increase the chances of getting one?

Not sure this will help, but any additional info is very much appreciated ;)

> > OK... 0xef9 leaves SIGINT + SIGQUIT + SIGKILL unblocked.
> > grep-grep-grep... fs/ncpfs/sock.c does exactly this,
> > and fs/autofs4/waitq.c seems to do something similar.
> >
> > Do you use them?
>
> I don't use any of these.

OK, thanks.

I _think_ I got a reliable FF hang reproducer - on current RHEL 6:
1) have autofs up'n'running
2) make symlink to some automounted NFS directory:
ln -s /net/server/path/to/dir /mnt/server_dir
3) in firefox, save some page to /mnt/server_dir/path
4) kill connectivity to the server (kill VPN, iptables -I OUTPUT -d server -j DROP, ...)
5) try to save another page <-- autofs also does something wrong,
firefox freezes
6) attach with gdb to firefox: gdb -pid $(pidof firefox)
gdb will also hang
so I actually _do_ use autofs4, I just didn't realize it when writing #c13. :(
(and there is something rotten in the user-space part of autofs, as I was getting gdb reports about it a few months ago)
(writing all of this from VM so all info inline below)
======================================================
# cat /proc/$(pidof firefox)/stack
[<ffffffffa06bf831>] autofs4_wait+0x311/0x900 [autofs4]
[<ffffffffa06be492>] autofs4_d_automount+0x232/0x2f0 [autofs4]
[<ffffffff81189a39>] follow_managed+0x219/0x2d0
[<ffffffff81189b8f>] do_lookup+0x9f/0x230
[<ffffffff8118a02d>] __link_path_walk+0x20d/0x1030
[<ffffffff8118abb7>] __link_path_walk+0xd97/0x1030
[<ffffffff8118b0da>] path_walk+0x6a/0xe0
[<ffffffff8118b2ab>] do_path_lookup+0x5b/0xa0
[<ffffffff8118bf17>] user_path_at+0x57/0xa0
[<ffffffff81179b90>] sys_faccessat+0xd0/0x1d0
[<ffffffff81179ca8>] sys_access+0x18/0x20
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
# cat /proc/$(pidof firefox)/status
Name: firefox
State: S (sleeping)
Tgid: 1426
Pid: 1426
PPid: 1
TracerPid: 5349
Uid: 501 501 501 501
Gid: 100 100 100 100
Utrace: 1f0f
FDSize: 256
Groups: 0 10 36 100 484 486 498
VmPeak: 1442072 kB
VmSize: 1380240 kB
VmLck: 0 kB
VmHWM: 636296 kB
VmRSS: 563144 kB
VmData: 853932 kB
VmStk: 512 kB
VmExe: 64 kB
VmLib: 72236 kB
VmPTE: 2612 kB
VmSwap: 0 kB
Threads: 20
SigQ: 0/61196
SigPnd: 0000000000040000
ShdPnd: 0000000000000000
SigBlk: fffffffffffffef9
SigIgn: 0000000000001000
SigCgt: 00000001800044af
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
Cpus_allowed: f
Cpus_allowed_list: 0-3
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 8686227
nonvoluntary_ctxt_switches: 556257
# cat /proc/$(pidof gdb)/stack
[<ffffffff8106fb24>] do_wait+0x1e4/0x240
[<ffffffff8106fc23>] sys_wait4+0xa3/0x100
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
# cat /proc/$(pidof gdb)/status
Name: gdb
State: S (sleeping)
Tgid: 5349
Pid: 5349
PPid: 3336
TracerPid: 0
Uid: 501 501 501 501
Gid: 100 100 100 100
Utrace: 0
FDSize: 256
Groups: 0 10 36 100 484 486 498
VmPeak: 131440 kB
VmSize: 131436 kB
VmLck: 0 kB
VmHWM: 6184 kB
VmRSS: 6184 kB
VmData: 2560 kB
VmStk: 104 kB
VmExe: 4288 kB
VmLib: 4528 kB
VmPTE: 136 kB
VmSwap: 0 kB
Threads: 1
SigQ: 0/61196
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000001001000
SigCgt: 0000000188034087
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
Cpus_allowed: f
Cpus_allowed_list: 0-3
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 58
nonvoluntary_ctxt_switches: 6
GDB backtrace:
Thread 1 (Thread 0x7f88cc288700 (LWP 5349)):
#0 0x0000003a2180f05e in __libc_waitpid (pid=<value optimized out>, stat_loc=0x7fff0aaea36c, options=<value optimized out>) at ../sysdeps/unix/sysv/linux/waitpid.c:32
#1 0x000000000044c1e5 in my_waitpid (pid=1426, status=0x7fff0aaea36c, flags=0) at ../../gdb/linux-nat.c:426
#2 0x000000000044f243 in linux_nat_post_attach_wait (ptid=..., first=1, cloned=0x14f8f98, signalled=0x14f8f9c) at ../../gdb/linux-nat.c:1400
#3 0x000000000044f50b in linux_nat_attach (ops=<value optimized out>, args=<value optimized out>, from_tty=<value optimized out>) at ../../gdb/linux-nat.c:1578
#4 0x0000000000533068 in target_attach (args=0x7fff0aaec46b "1426", from_tty=1) at ../../gdb/target.c:3016
#5 0x00000000004ffb92 in attach_command (args=0x7fff0aaec46b "1426", from_tty=1) at ../../gdb/infcmd.c:2459
#6 0x0000000000513c37 in catch_command_errors (command=0x4ffad0 <attach_command>, arg=0x7fff0aaec46b "1426", from_tty=1, mask=<value optimized out>) at ../../gdb/exceptions.c:534
#7 0x000000000040a7a5 in captured_main (data=<value optimized out>) at ../../gdb/main.c:924
#8 0x0000000000513ccb in catch_errors (func=0x409c00 <captured_main>, func_args=0x7fff0aaea6b0, errstring=0x68d52f "", mask=<value optimized out>) at ../../gdb/exceptions.c:518
#9 0x0000000000409894 in gdb_main (args=<value optimized out>) at ../../gdb/main.c:1016
#10 0x0000000000409869 in main (argc=<value optimized out>, argv=<value optimized out>) at ../../gdb/gdb.c:48
Thread 1 (Thread 0x7f88cc288700 (LWP 5349)):
#0 0x0000003a2180f05e in __libc_waitpid (pid=<value optimized out>, stat_loc=0x7fff0aaea36c, options=<value optimized out>) at ../sysdeps/unix/sysv/linux/waitpid.c:32
resultvar = 18446744073709551104
oldtype = <value optimized out>
result = <value optimized out>
#1 0x000000000044c1e5 in my_waitpid (pid=1426, status=0x7fff0aaea36c, flags=0) at ../../gdb/linux-nat.c:426
ret = <value optimized out>
#2 0x000000000044f243 in linux_nat_post_attach_wait (ptid=..., first=1, cloned=0x14f8f98, signalled=0x14f8f9c) at ../../gdb/linux-nat.c:1400
new_pid = <value optimized out>
pid = 1426
status = <value optimized out>
__PRETTY_FUNCTION__ = "linux_nat_post_attach_wait"
#3 0x000000000044f50b in linux_nat_attach (ops=<value optimized out>, args=<value optimized out>, from_tty=<value optimized out>) at ../../gdb/linux-nat.c:1578
lp = 0x14f8f80
status = <value optimized out>
ptid = {pid = 1426, lwp = 1426, tid = 0}
#4 0x0000000000533068 in target_attach (args=0x7fff0aaec46b "1426", from_tty=1) at ../../gdb/target.c:3016
t = <value optimized out>
#5 0x00000000004ffb92 in attach_command (args=0x7fff0aaec46b "1426", from_tty=1) at ../../gdb/infcmd.c:2459
async_exec = <value optimized out>
back_to = 0x0
#6 0x0000000000513c37 in catch_command_errors (command=0x4ffad0 <attach_command>, arg=0x7fff0aaec46b "1426", from_tty=1, mask=<value optimized out>) at ../../gdb/exceptions.c:534
e = {reason = 0, error = GDB_NO_ERROR, message = 0x0}
#7 0x000000000040a7a5 in captured_main (data=<value optimized out>) at ../../gdb/main.c:924
context = <value optimized out>
argc = 3
argv = 0x7fff0aaea7b8
quiet = 0
set_args = 0
symarg = 0x0
execarg = 0x0
pidarg = <value optimized out>
corearg = 0x0
pid_or_core_arg = <value optimized out>
cdarg = <value optimized out>
ttyarg = <value optimized out>
print_help = 0
print_version = 0
cmdarg = <value optimized out>
cmdsize = <value optimized out>
ncmd = <value optimized out>
dirarg = <value optimized out>
dirsize = <value optimized out>
ndir = <value optimized out>
system_gdbinit = 0x0
home_gdbinit = 0x0
local_gdbinit = <value optimized out>
i = <value optimized out>
save_auto_load = 1
objfile = <value optimized out>
pre_stat_chain = 0x0
#8 0x0000000000513ccb in catch_errors (func=0x409c00 <captured_main>, func_args=0x7fff0aaea6b0, errstring=0x68d52f "", mask=<value optimized out>) at ../../gdb/exceptions.c:518
val = 0
exception = {reason = 0, error = GDB_NO_ERROR, message = 0x0}
#9 0x0000000000409894 in gdb_main (args=<value optimized out>) at ../../gdb/main.c:1016
No locals.
#10 0x0000000000409869 in main (argc=<value optimized out>, argv=<value optimized out>) at ../../gdb/gdb.c:48
args = {argc = 3, argv = 0x7fff0aaea7b8, use_windows = 0, interpreter_p = 0x67c030 "console"}
(In reply to comment #15)
...
> (and there is something rotten in the user-space part of autofs, as I was getting
> gdb reports about it a few months ago)
...

Filed as bug 847873.

(In reply to comment #15)
> so I actually _do_ use autofs4, I just didn't realize it when writing #c13.
> :(
>
> ...
> # cat /proc/$(pidof firefox)/stack
> [<ffffffffa06bf831>] autofs4_wait+0x311/0x900 [autofs4]

Great, thanks, this confirms the theory.

Now that you opened bug 847873, probably we can close this one?

This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.