Bug 808404
| Summary: | [abrt] gdb-7.2-51.fc14: dump_core: Process /usr/bin/gdb was killed by signal 6 (SIGABRT) | | |
| --- | --- | --- | --- |
| Product: | Red Hat Enterprise Linux 6 | Reporter: | David Jaša <djasa> |
| Component: | kernel | Assignee: | Oleg Nesterov <onestero> |
| Status: | CLOSED NOTABUG | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 6.3 | CC: | anton, djasa, jan.kratochvil, sergiodj |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Unspecified | | |
| Whiteboard: | abrt_hash:bf3ef9266883a16e9a8b9845671f2d65a0bdf2ed | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 716627 | Environment: | |
| Last Closed: | 2014-02-02 15:31:27 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
David Jaša
2012-03-30 10:42:46 UTC
gdb hangs after it says it's attaching to the process:

Attaching to process 2067

and nothing happens until I send it SIGTERM from another console. Then it prints:

../../gdb/linux-nat.c:2701: internal-error: stop_wait_callback: Assertion `lp->resumed' failed.
A problem internal to GDB has been detected, further debugging may prove unreliable.

(In reply to https://bugzilla.redhat.com/show_bug.cgi?id=716627#c7)
> Do you have any reproducibility of gdb hang when attaching to anything?
> That is for example how to reproduce that "hung firefox".

(moving to RHEL bug)

I got the hang by using "Save link as" on binary files, but I cannot get it anymore. Could you please provide steps on what to do in order to gather all necessary info if I hit it again in the future?

Best do qemu "savevm" at that time. :-) But you probably do not run in a VM. I do not know, you can at least store /proc/PID/status.

I more hoped you have the Firefox hang reproducible.

(In reply to comment #3)
> Best do qemu "savevm" at that time. :-) But you probably do not run in a VM.

I don't, unfortunately.

> I more hoped you have the Firefox hang reproducible.

It was - at the time I was contributing to the bug. Now, after a few reboots and a move to faster networks, I cannot reproduce it anymore.

Created attachment 576301 [details]
yet another backtrace of gdb hang
In addition to the backtrace, I'm attaching the corresponding /proc/$PID/status of both firefox and gdb.
[djasa@dhcp-29-7 ~]$ cat /proc/`pidof firefox`/status
Name: firefox
State: S (sleeping)
Tgid: 3996
Pid: 3996
PPid: 1
TracerPid: 8249
Uid: 501 501 501 501
Gid: 100 100 100 100
Utrace: 1f0f
FDSize: 256
Groups: 0 10 36 100 484 486 498
VmPeak: 1509100 kB
VmSize: 1358480 kB
VmLck: 0 kB
VmHWM: 673400 kB
VmRSS: 386252 kB
VmData: 790012 kB
VmStk: 504 kB
VmExe: 64 kB
VmLib: 76488 kB
VmPTE: 2812 kB
VmSwap: 1568 kB
Threads: 21
SigQ: 0/61197
SigPnd: 0000000000040000
ShdPnd: 0000000000000000
SigBlk: fffffffffffffef9
SigIgn: 0000000000001000
SigCgt: 00000001800144af
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
Cpus_allowed: f
Cpus_allowed_list: 0-3
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 26193758
nonvoluntary_ctxt_switches: 2363491
[djasa@dhcp-29-7 ~]$ cat /proc/`pidof gdb`/status
Name: gdb
State: S (sleeping)
Tgid: 8249
Pid: 8249
PPid: 3065
TracerPid: 0
Uid: 501 501 501 501
Gid: 100 100 100 100
Utrace: 0
FDSize: 256
Groups: 0 10 36 100 484 486 498
VmPeak: 131440 kB
VmSize: 131412 kB
VmLck: 0 kB
VmHWM: 6160 kB
VmRSS: 6160 kB
VmData: 2544 kB
VmStk: 104 kB
VmExe: 4288 kB
VmLib: 4524 kB
VmPTE: 128 kB
VmSwap: 0 kB
Threads: 1
SigQ: 0/61197
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000001001000
SigCgt: 0000000188034087
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
Cpus_allowed: f
Cpus_allowed_list: 0-3
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 18
nonvoluntary_ctxt_switches: 12
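
For context, the attach path the backtraces in this bug show gdb blocked in boils down to a PTRACE_ATTACH followed by a blocking waitpid() for the stop notification. Below is a minimal standalone sketch of that sequence; it is not GDB's actual code, and the target PID passed on the command line is a hypothetical example:

    /* attach-sketch.c: illustrate where the attach hangs.
     * PTRACE_ATTACH sends SIGSTOP to the tracee; the tracer then waits for
     * the resulting stop report.  If the tracee never acts on that SIGSTOP
     * (e.g. it sleeps in the kernel with SIGSTOP blocked, as analyzed in
     * the following comments), the waitpid() below never returns. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <pid>\n", argv[0]);
            return 1;
        }
        pid_t pid = atoi(argv[1]);
        int status;

        if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) {
            perror("PTRACE_ATTACH");
            return 1;
        }

        /* This is the step linux_nat_post_attach_wait() is stuck in:
         * waiting for the tracee to report the attach SIGSTOP. */
        if (waitpid(pid, &status, 0) == -1) {
            perror("waitpid");
            return 1;
        }

        printf("tracee stopped, status 0x%x\n", status);
        ptrace(PTRACE_DETACH, pid, NULL, NULL);
        return 0;
    }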
The GDB hang - lost SIGSTOP notification - happens also on 3.2.0-0.bpo.1-amd64 (Debian Squeeze backports):

(gdb) bt
#0  in waitpid () from /lib/libpthread.so.0
#1  in my_waitpid (pid=561, statusp=0x7fff5b79ba8c, flags=0) at linux-nat.c:364
#2  in linux_nat_post_attach_wait (ptid=..., first=1, cloned=0x162dbc8, signalled=0x162dbcc) at linux-nat.c:1403
#3  in linux_nat_attach (ops=<optimized out>, args=<optimized out>, from_tty=<optimized out>) at linux-nat.c:1654
#4  in target_attach (args=0x7fff5b79d4d8 "561", from_tty=1) at target.c:3791
#5  in attach_command (args=0x7fff5b79d4d8 "561", from_tty=1) at infcmd.c:2534

This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

I do not have any reproducer, but it does not seem to be caused by GDB. The problem is that the kernel lost the SIGSTOP notification after PTRACE_ATTACH. It is up to the kernel / Oleg whether there is an idea how that can happen or whether the kernel has to CANTFIX it.

(In reply to comment #5)
> In addition to the backtrace, I'm attaching the corresponding /proc/$PID/status of
> both firefox and gdb.

And thanks for this...

I agree with Jan, this does look like a kernel problem. But it doesn't look like a ptrace/etc problem afaics.

> [djasa@dhcp-29-7 ~]$ cat /proc/`pidof firefox`/status
> Name: firefox
> State: S (sleeping)

TASK_INTERRUPTIBLE

> SigPnd: 0000000000040000

SIGSTOP is pending. So why does it sleep?

because,

> SigBlk: fffffffffffffef9

someone blocked SIGSTOP. Only the kernel can block it, but nobody should do this; this is always wrong.

Please show us /proc/`pidof firefox`/stack?

OK... 0xef9 leaves SIGINT + SIGQUIT + SIGKILL unblocked. grep-grep-grep... fs/ncpfs/sock.c does exactly this, and fs/autofs4/waitq.c seems to do something similar.

Do you use them?

(In reply to comment #12)
> (In reply to comment #5)
> > In addition to the backtrace, I'm attaching the corresponding /proc/$PID/status of
> > both firefox and gdb.
>
> And thanks for this...
>
> I agree with Jan, this does look like a kernel problem.
> But it doesn't look like a ptrace/etc problem afaics.
>
> > [djasa@dhcp-29-7 ~]$ cat /proc/`pidof firefox`/status
> > Name: firefox
> > State: S (sleeping)
>
> TASK_INTERRUPTIBLE
>
> > SigPnd: 0000000000040000
>
> SIGSTOP is pending. So why does it sleep?
>
> because,
>
> > SigBlk: fffffffffffffef9
>
> someone blocked SIGSTOP. Only the kernel can block it,
> but nobody should do this; this is always wrong.
>
> Please show us /proc/`pidof firefox`/stack?

I didn't get the hang for quite some time, so I'm afraid I can't get one till I encounter it again.

Should I try using an older kernel to increase the chances of getting one?

> OK... 0xef9 leaves SIGINT + SIGQUIT + SIGKILL unblocked.
> grep-grep-grep... fs/ncpfs/sock.c does exactly this,
> and fs/autofs4/waitq.c seems to do something similar.
>
> Do you use them?

I don't use any of these.

(In reply to comment #13)
> > Please show us /proc/`pidof firefox`/stack?
>
> I didn't get the hang for quite some time, so I'm afraid I can't get one till I
> encounter it again.
>
> Should I try using an older kernel to increase the chances of getting one?

Not sure this will help, but any additional info is very much appreciated ;)

> > OK... 0xef9 leaves SIGINT + SIGQUIT + SIGKILL unblocked.
> > grep-grep-grep... fs/ncpfs/sock.c does exactly this,
> > and fs/autofs4/waitq.c seems to do something similar.
> >
> > Do you use them?
>
> I don't use any of these.

OK, thanks.
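
The SigPnd/SigBlk reading above can be checked mechanically. A small helper is sketched below (an illustrative example, not part of the original report); it decodes the hex masks from /proc/PID/status, where bit N of a mask corresponds to signal number N+1:

    /* sigmask-decode.c: print the signals set in /proc/PID/status masks. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static void decode(const char *label, const char *hex)
    {
        unsigned long long mask = strtoull(hex, NULL, 16);
        printf("%s %s:\n", label, hex);
        for (int sig = 1; sig <= 64; sig++)
            if (mask & (1ULL << (sig - 1)))
                printf("  %2d %s\n", sig, strsignal(sig));
    }

    int main(void)
    {
        /* Values from the hung firefox process above. */
        decode("SigPnd", "0000000000040000");  /* bit 18 set -> signal 19, SIGSTOP pending */
        decode("SigBlk", "fffffffffffffef9");  /* all bits set except 1, 2 and 8 ->
                                                  only SIGINT (2), SIGQUIT (3) and
                                                  SIGKILL (9) are left unblocked */
        return 0;
    }

Running it matches the analysis in this comment: SIGSTOP is pending but sits in the blocked set, which is why the task keeps sleeping and the tracer's waitpid() never completes.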
I _think_ I got a reliable FF hang reproducer - on current RHEL 6:

1) have autofs up'n'running
2) make a symlink to some automounted NFS directory: ln -s /net/server/path/to/dir /mnt/server_dir
3) in firefox, save some page to /mnt/server_dir/path
4) kill connectivity to the server (kill VPN, iptables -I OUTPUT -d server -j DROP, ...)
5) try to save another page <-- autofs also does something wrong, firefox freezes
6) attach with gdb to firefox: gdb -pid $(pidof firefox)

gdb will also hang

so I actually _do_ use autofs4, I just didn't realize it when writing #c13. :(

(and there is something rotten in the user-space part of autofs as I was getting gdb reports about it a few months ago)

(writing all of this from a VM, so all info inline below)

======================================================

# cat /proc/$(pidof firefox)/stack
[<ffffffffa06bf831>] autofs4_wait+0x311/0x900 [autofs4]
[<ffffffffa06be492>] autofs4_d_automount+0x232/0x2f0 [autofs4]
[<ffffffff81189a39>] follow_managed+0x219/0x2d0
[<ffffffff81189b8f>] do_lookup+0x9f/0x230
[<ffffffff8118a02d>] __link_path_walk+0x20d/0x1030
[<ffffffff8118abb7>] __link_path_walk+0xd97/0x1030
[<ffffffff8118b0da>] path_walk+0x6a/0xe0
[<ffffffff8118b2ab>] do_path_lookup+0x5b/0xa0
[<ffffffff8118bf17>] user_path_at+0x57/0xa0
[<ffffffff81179b90>] sys_faccessat+0xd0/0x1d0
[<ffffffff81179ca8>] sys_access+0x18/0x20
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

# cat /proc/$(pidof firefox)/status
Name: firefox
State: S (sleeping)
Tgid: 1426
Pid: 1426
PPid: 1
TracerPid: 5349
Uid: 501 501 501 501
Gid: 100 100 100 100
Utrace: 1f0f
FDSize: 256
Groups: 0 10 36 100 484 486 498
VmPeak: 1442072 kB
VmSize: 1380240 kB
VmLck: 0 kB
VmHWM: 636296 kB
VmRSS: 563144 kB
VmData: 853932 kB
VmStk: 512 kB
VmExe: 64 kB
VmLib: 72236 kB
VmPTE: 2612 kB
VmSwap: 0 kB
Threads: 20
SigQ: 0/61196
SigPnd: 0000000000040000
ShdPnd: 0000000000000000
SigBlk: fffffffffffffef9
SigIgn: 0000000000001000
SigCgt: 00000001800044af
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
Cpus_allowed: f
Cpus_allowed_list: 0-3
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 8686227
nonvoluntary_ctxt_switches: 556257

# cat /proc/$(pidof gdb)/stack
[<ffffffff8106fb24>] do_wait+0x1e4/0x240
[<ffffffff8106fc23>] sys_wait4+0xa3/0x100
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

# cat /proc/$(pidof gdb)/status
Name: gdb
State: S (sleeping)
Tgid: 5349
Pid: 5349
PPid: 3336
TracerPid: 0
Uid: 501 501 501 501
Gid: 100 100 100 100
Utrace: 0
FDSize: 256
Groups: 0 10 36 100 484 486 498
VmPeak: 131440 kB
VmSize: 131436 kB
VmLck: 0 kB
VmHWM: 6184 kB
VmRSS: 6184 kB
VmData: 2560 kB
VmStk: 104 kB
VmExe: 4288 kB
VmLib: 4528 kB
VmPTE: 136 kB
VmSwap: 0 kB
Threads: 1
SigQ: 0/61196
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000001001000
SigCgt: 0000000188034087
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
Cpus_allowed: f
Cpus_allowed_list: 0-3
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 58
nonvoluntary_ctxt_switches: 6

GDB backtrace:

Thread 1 (Thread 0x7f88cc288700 (LWP 5349)):
#0  0x0000003a2180f05e in __libc_waitpid (pid=<value optimized out>, stat_loc=0x7fff0aaea36c, options=<value optimized out>) at ../sysdeps/unix/sysv/linux/waitpid.c:32
#1  0x000000000044c1e5 in my_waitpid (pid=1426, status=0x7fff0aaea36c, flags=0) at ../../gdb/linux-nat.c:426
#2  0x000000000044f243 in linux_nat_post_attach_wait (ptid=..., first=1, cloned=0x14f8f98, signalled=0x14f8f9c) at ../../gdb/linux-nat.c:1400
#3  0x000000000044f50b in linux_nat_attach (ops=<value optimized out>, args=<value optimized out>, from_tty=<value optimized out>) at ../../gdb/linux-nat.c:1578
#4  0x0000000000533068 in target_attach (args=0x7fff0aaec46b "1426", from_tty=1) at ../../gdb/target.c:3016
#5  0x00000000004ffb92 in attach_command (args=0x7fff0aaec46b "1426", from_tty=1) at ../../gdb/infcmd.c:2459
#6  0x0000000000513c37 in catch_command_errors (command=0x4ffad0 <attach_command>, arg=0x7fff0aaec46b "1426", from_tty=1, mask=<value optimized out>) at ../../gdb/exceptions.c:534
#7  0x000000000040a7a5 in captured_main (data=<value optimized out>) at ../../gdb/main.c:924
#8  0x0000000000513ccb in catch_errors (func=0x409c00 <captured_main>, func_args=0x7fff0aaea6b0, errstring=0x68d52f "", mask=<value optimized out>) at ../../gdb/exceptions.c:518
#9  0x0000000000409894 in gdb_main (args=<value optimized out>) at ../../gdb/main.c:1016
#10 0x0000000000409869 in main (argc=<value optimized out>, argv=<value optimized out>) at ../../gdb/gdb.c:48

Thread 1 (Thread 0x7f88cc288700 (LWP 5349)):
#0  0x0000003a2180f05e in __libc_waitpid (pid=<value optimized out>, stat_loc=0x7fff0aaea36c, options=<value optimized out>) at ../sysdeps/unix/sysv/linux/waitpid.c:32
        resultvar = 18446744073709551104
        oldtype = <value optimized out>
        result = <value optimized out>
#1  0x000000000044c1e5 in my_waitpid (pid=1426, status=0x7fff0aaea36c, flags=0) at ../../gdb/linux-nat.c:426
        ret = <value optimized out>
#2  0x000000000044f243 in linux_nat_post_attach_wait (ptid=..., first=1, cloned=0x14f8f98, signalled=0x14f8f9c) at ../../gdb/linux-nat.c:1400
        new_pid = <value optimized out>
        pid = 1426
        status = <value optimized out>
        __PRETTY_FUNCTION__ = "linux_nat_post_attach_wait"
#3  0x000000000044f50b in linux_nat_attach (ops=<value optimized out>, args=<value optimized out>, from_tty=<value optimized out>) at ../../gdb/linux-nat.c:1578
        lp = 0x14f8f80
        status = <value optimized out>
        ptid = {pid = 1426, lwp = 1426, tid = 0}
#4  0x0000000000533068 in target_attach (args=0x7fff0aaec46b "1426", from_tty=1) at ../../gdb/target.c:3016
        t = <value optimized out>
#5  0x00000000004ffb92 in attach_command (args=0x7fff0aaec46b "1426", from_tty=1) at ../../gdb/infcmd.c:2459
        async_exec = <value optimized out>
        back_to = 0x0
#6  0x0000000000513c37 in catch_command_errors (command=0x4ffad0 <attach_command>, arg=0x7fff0aaec46b "1426", from_tty=1, mask=<value optimized out>) at ../../gdb/exceptions.c:534
        e = {reason = 0, error = GDB_NO_ERROR, message = 0x0}
#7  0x000000000040a7a5 in captured_main (data=<value optimized out>) at ../../gdb/main.c:924
        context = <value optimized out>
        argc = 3
        argv = 0x7fff0aaea7b8
        quiet = 0
        set_args = 0
        symarg = 0x0
        execarg = 0x0
        pidarg = <value optimized out>
        corearg = 0x0
        pid_or_core_arg = <value optimized out>
        cdarg = <value optimized out>
        ttyarg = <value optimized out>
        print_help = 0
        print_version = 0
        cmdarg = <value optimized out>
        cmdsize = <value optimized out>
        ncmd = <value optimized out>
        dirarg = <value optimized out>
        dirsize = <value optimized out>
        ndir = <value optimized out>
        system_gdbinit = 0x0
        home_gdbinit = 0x0
        local_gdbinit = <value optimized out>
        i = <value optimized out>
        save_auto_load = 1
        objfile = <value optimized out>
        pre_stat_chain = 0x0
#8  0x0000000000513ccb in catch_errors (func=0x409c00 <captured_main>, func_args=0x7fff0aaea6b0, errstring=0x68d52f "", mask=<value optimized out>) at ../../gdb/exceptions.c:518
        val = 0
        exception = {reason = 0, error = GDB_NO_ERROR, message = 0x0}
#9  0x0000000000409894 in gdb_main (args=<value optimized out>) at ../../gdb/main.c:1016
No locals.
#10 0x0000000000409869 in main (argc=<value optimized out>, argv=<value optimized out>) at ../../gdb/gdb.c:48
        args = {argc = 3, argv = 0x7fff0aaea7b8, use_windows = 0, interpreter_p = 0x67c030 "console"}

(In reply to comment #15)
...
> (and there is something rotten in the user-space part of autofs as I was getting
> gdb reports about it a few months ago)
...

filed as bug 847873.

(In reply to comment #15)
>
> so I actually _do_ use autofs4, I just didn't realize it when writing #c13. :(
>
> ...
> # cat /proc/$(pidof firefox)/stack
> [<ffffffffa06bf831>] autofs4_wait+0x311/0x900 [autofs4]

Great, thanks, this confirms the theory.

Now that you opened bug 847873, probably we can close this one?

This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.