Bug 144805
Summary: | New kernel causes unexpected SIGTRAPs when making inferior calls in gdb | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Diego Novillo <dnovillo> |
Component: | kernel | Assignee: | Dave Jones <davej> |
Status: | CLOSED ERRATA | Last Closed: | 2006-10-17 23:04:16 UTC |
Severity: | high | Priority: | high |
Version: | 5 | CC: | bkoz, cagney, ezannoni, ian, jjohnstn, jreiser, pfrields, pinskia, roland, wtogami |
Hardware: | i686 | OS: | Linux |
Doc Type: | Bug Fix | | |
Description
Diego Novillo
2005-01-11 17:23:51 UTC
Seems to be fine with the rawhide kernel and my own upstream build, which means it's something that was fixed post 2.6.10.

Hmm, any ideas which cset(s) may be involved? ISTR that 2.6.10-ac8, on which the FC3 update is based, had some backports of some of the signal work done post 2.6.10. Is it possible there are some bits of that missing?

I am also seeing this problem.

This seems to be working better in 2.6.10-1.741_FC3.

It takes a bit longer, but I can still reproduce this problem with 2.6.10-1.741_FC3smp after several inferior calls in one session.

Still running into this problem after making several inferior calls in one debugging session.
2.6.10-1.760_FC3smp #1 SMP Wed Feb 2 00:29:03 EST 2005 i686 i686 i386 GNU/Linux

Can you try the rawhide kernel and see if the bug still shows up there? I suspect that we only have a problem with a botched backport, but we should be positive that the upstream code is good before trying to sort that out.

(In reply to comment #7)
> Can you try the rawhide kernel and see if the bug still shows up there?

Sure. URL? Thanks. Diego.

*** Bug 149432 has been marked as a duplicate of this bug. ***

From bug 149432; still occurs in:
Version-Release number of selected component (if applicable): kernel-2.6.10-1.760_FC3smp

That kernel is already out of date. Have you tried the current fc3-updates kernel? We also haven't gotten an answer from Diego about whether the rawhide kernel has any problems, which will help us understand if we have a real bug or just a patch merging botch.

This is not a machine that I can reboot at will. Several weeks may go by before I get a chance to kill everything I'm doing and start from scratch. I'll try to test the rawhide kernel in the next few days.

For the record, the problem does still happen with kernel-2.6.10-1.766_FC3.

I tried installing the rawhide kernel. It gives me a slew of warnings. Are these ignorable?

$ sudo rpm -ivh kernel-smp-2.6.10-1.1153_FC4.i686.rpm
Preparing...
########################################### [100%]
1:kernel-smp ########################################### [100%]
WARNING: /lib/modules/2.6.10-1.1153_FC4smp/kernel/sound/pcmcia/pdaudiocf/snd-pdaudiocf.ko needs unknown symbol print_tainted
WARNING: /lib/modules/2.6.10-1.1153_FC4smp/kernel/sound/drivers/vx/snd-vx-lib.ko needs unknown symbol print_tainted
WARNING: /lib/modules/2.6.10-1.1153_FC4smp/kernel/sound/isa/snd-azt2320.ko needs unknown symbol print_tainted
WARNING: /lib/modules/2.6.10-1.1153_FC4smp/kernel/sound/isa/cs423x/snd-cs4231-lib.ko needs unknown symbol print_tainted
WARNING: /lib/modules/2.6.10-1.1153_FC4smp/kernel/sound/pci/cs46xx/snd-cs46xx.ko needs unknown symbol print_tainted
WARNING: /lib/modules/2.6.10-1.1153_FC4smp/kernel/sound/pci/snd-azt3328.ko needs unknown symbol print_tainted
WARNING: /lib/modules/2.6.10-1.1153_FC4smp/kernel/sound/pci/ice1712/snd-ice1724.ko needs unknown symbol print_tainted
WARNING: /lib/modules/2.6.10-1.1153_FC4smp/kernel/sound/pci/ice1712/snd-ice1712.ko needs unknown symbol print_tainted
WARNING: /lib/modules/2.6.10-1.1153_FC4smp/kernel/sound/pci/korg1212/snd-korg1212.ko needs unknown symbol print_tainted
WARNING: /lib/modules/2.6.10-1.1153_FC4smp/kernel/sound/usb/usx2y/snd-usb-usx2y.ko needs unknown symbol print_tainted
WARNING: /lib/modules/2.6.10-1.1153_FC4smp/kernel/sound/usb/snd-usb-lib.ko needs unknown symbol print_tainted
WARNING: /lib/modules/2.6.10-1.1153_FC4smp/kernel/arch/i386/kernel/cpu/cpufreq/speedstep-smi.ko needs unknown symbol print_tainted
[ ..... ]

That particular problem should be fixed in 1.1154_FC4.

Still at issue with kernel-2.6.11-1.14_FC3.

Diego, in between sniffs into kleenex, says kernel-2.6.9-1.681_FC3 works. Accept no substitute! I'm bumping up priority on this sucker.
-benjamin

An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which may contain a fix for your problem. Please update to this new kernel, and report whether or not it fixes your problem. If you have updated to Fedora Core 4 since this bug was opened, and the problem still occurs with the latest updates for that release, please change the version field of this bug to 'fc4'. Thank you.

I upgraded to Fedora Core 4 a couple of weeks ago, and I have not seen the problem since. I'll report back if it happens again.

(In reply to comment #21)
> I upgraded to Fedora Core 4 a couple of weeks ago, and I have not seen the
> problem since. I'll report back if it happens again.

Likewise. I do have other machines running FC3. I'll see if I can reproduce there with the new kernel.

I can reproduce this on x86_64 running kernel-2.6.12-1.1398_FC4. The following 8-instruction program just execs itself over and over:

-----execve.S
#include <asm/unistd.h>
/*
gcc -o execve -nostartfiles -nostdlib execve.S
gdb ./execve
run
p/x $ps  # 0x202
c
p/x $ps  # 0x302  TF (0x100) set, but should not be
*/
_start: .globl _start
        nop; int3
        popq %rcx  # argc
        movq (%rsp),%rdi  # same filename from argv[0]
        movq %rsp,%rsi  # same argv
        lea 8(%rsp,%rcx,8),%rdx  # same envp
        movl $__NR_execve,%eax  # here we go 'round the mulberry bush, ...
        syscall
-----end of execve.S

When run under gdb, the TF trace flag gets set on the 2nd verse. Where did that come from? Also, if the 'int3' is replaced with 'nop', so that there is no reason at all to trap, then there is a trap under gdb anyway. Of course, when run from bash without gdb, the program just spins merrily. Also, "strace -f gdb ./execve" then "run\r" spins while spewing one line per execve. So using strace has altered functional behavior in an unexpected way: it "fixed" the bug.

$ gdb ./execve  # after replacing 'int3'==>'nop', then re-compiling
GNU gdb Red Hat Linux (6.3.0.0-1.21rh) ...
This GDB was configured as "x86_64-redhat-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib64/libthread_db.so.1".

(gdb) run
Starting program: /home/jreiser/execve

Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000000004000b2 in _start ()
(gdb) p/x $ps
$1 = 0x302  # TF bit (0x100) set
(gdb) x/4i _start
0x4000b0 <_start>: nop
0x4000b1 <_start+1>: nop  # there is no 'int3' here!
0x4000b2 <_start+2>: pop %rcx
0x4000b3 <_start+3>: mov (%rsp),%rdi
(gdb)
-----
-----/proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 47
model name : AMD Athlon(tm) 64 Processor 3200+
stepping : 0
-----

Same problems on i686, kernel-2.6.12-1.1398_FC4.

-----execve.S
#include <asm/unistd.h>
/*
gcc -o execve -nostartfiles -nostdlib execve.S
*/
_start: .globl _start
        nop; nop
        popl %ebp  # argc
        movl (%esp),%ebx  # same filename from argv[0]
        movl %esp,%ecx  # same argv
        lea 4(%esp,%ebp,4),%edx  # same envp
        movl $__NR_execve,%eax  # here we go 'round the mulberry bush, ...
        int $0x80
-----
-----/proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 1.60GHz
stepping : 4
-----

Hmm, this has been around for a while, and seems to still be present (though rawhide x86-64 now immediately gets a SIGSEGV). This is probably going to get more traction if you bring it up upstream on linux-kernel.org. I just double-checked a vanilla kernel, and it's present in plain 2.6.13.2 too.

Transcribed to Linux kernel mailing list:
Message-ID: <433C0F21.8070104>
Date: Thu, 29 Sep 2005 08:58:25 -0700
From: John Reiser <jreiser>
Subject: ptrace unexpected SIGTRAP (trace bit) on x86, x86_64 kernel 2.6.13.2

This is a mass-update to all currently open Fedora Core 3 kernel bugs. Fedora Core 3 support has transitioned to the Fedora Legacy project. Due to the limited resources of this project, typically only updates for new security issues are released.
As this bug isn't security related, it has been migrated to a Fedora Core 4 bug. Please upgrade to this newer release, and test if this bug is still present there. This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. Thank you.

There was essentially no response on the LKML (Comment #26.) The bad behavior still exists on i686 Fedora Core 5 test 2, kernel-2.6.15-1.1863_FC5. [x86_64 not yet tested.] So: somebody with enough authority could change the Version of this bugzilla report to "fc5test2".

(gdb) run
Starting program: execve-spin

Program received signal SIGTRAP, Trace/breakpoint trap.
0x08048076 in _start ()
(gdb) p/x $ps
$1 = 0x200302  ## Trace bit (0x100) set
(gdb) x/5i $pc
0x8048076 <_start+2>: pop %ebp
0x8048077 <_start+3>: mov (%esp),%ebx
0x804807a <_start+6>: mov %esp,%ecx
0x804807c <_start+8>: lea 0x4(%esp,%ebp,4),%edx
0x8048080 <_start+12>: mov $0xb,%eax
(gdb) x/5i _start  ## shows no 'int3'; SIGTRAP was entirely the kernel's idea.
0x8048074 <_start>: nop
0x8048075 <_start+1>: nop
0x8048076 <_start+2>: pop %ebp

This is a mass-update to all currently open kernel bugs. A new kernel update has been released (Version: 2.6.15-1.1830_FC4) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time.
Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. Thank you.

The extraneous Trace bit (bit value 0x100) is still seen under kernel-2.6.15-1.1826.2.10_FC5 on i686; program in Comment #24.

The extraneous Trace bit (bit value 0x100) is still seen under kernel-2.6.15-1.1884_FC5 on amd64 (x86_64); program in Comment #23. So, someone with enough privileges: please change the Version to fc5test2.

A new kernel update has been released (Version: 2.6.18-1.2200.fc5) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem.

This bug has been placed in NEEDINFO state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug.

In the last few updates, some users upgrading from FC4->FC5 have reported that installing a kernel update has left their systems unbootable. If you have been affected by this problem please check you only have one version of device-mapper & lvm2 installed. See bug 207474 for further details.

If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. If this bug has been fixed, but you are now experiencing a different problem, please file a separate bug for the new problem. Thank you.

The testcase of Comment #24 works for me on i686 under kernel-2.6.18-1.2200.fc5; namely, when run under gdb the testcase spins (execve-ing itself over and over) with no SIGTRAP reported by gdb.
[gdb address space grows by 8MB/s, but that's a different problem.] I suspect that general work in utrace, and/or the fix for bug #205659 "SIGTRAP cannot be caught" may be related.

Great. Thanks for retesting.
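
For readers following the gdb transcripts in this report, the `$ps` values quoted there (0x202, 0x302 and 0x200302) can be decoded bit by bit to confirm that the only unexpected bit is TF (0x100). The sketch below is not part of the original report; the helper names `decode_eflags` and `trap_flag_set` are hypothetical, and the bit masks are the architecturally defined x86 EFLAGS positions.

```python
# Architectural x86 EFLAGS bit masks. The two-bit IOPL field (bits 12-13)
# is omitted because it is not a single flag.
EFLAGS_BITS = [
    (0x000001, "CF"),
    (0x000002, "bit1"),  # reserved, always reads as 1
    (0x000004, "PF"),
    (0x000010, "AF"),
    (0x000040, "ZF"),
    (0x000080, "SF"),
    (0x000100, "TF"),    # trap (single-step) flag, the bit at issue here
    (0x000200, "IF"),
    (0x000400, "DF"),
    (0x000800, "OF"),
    (0x004000, "NT"),
    (0x010000, "RF"),
    (0x020000, "VM"),
    (0x040000, "AC"),
    (0x080000, "VIF"),
    (0x100000, "VIP"),
    (0x200000, "ID"),
]

TF = 0x100


def decode_eflags(value):
    """Return the names of the EFLAGS bits set in value, lowest bit first."""
    return [name for mask, name in EFLAGS_BITS if value & mask]


def trap_flag_set(value):
    """Report whether the trap flag (TF, 0x100) is set in value."""
    return bool(value & TF)


if __name__ == "__main__":
    # The three $ps values that appear in the gdb sessions of this report.
    for ps in (0x202, 0x302, 0x200302):
        state = "TF set" if trap_flag_set(ps) else "TF clear"
        print(hex(ps), decode_eflags(ps), state)
```

Decoded this way, 0x202 is just the reserved bit plus IF, 0x302 adds TF, and 0x200302 (the i686 case) additionally has ID set, so TF is the only difference between the expected and the observed state.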