Bug 483814
| Summary: | kernel BUG at kernel/ptrace.c:1068 | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Denys Vlasenko <dvlasenk> | ||||||
| Component: | kernel | Assignee: | Jerome Marchand <jmarchan> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> | ||||||
| Severity: | urgent | Docs Contact: | |||||||
| Priority: | urgent | ||||||||
| Version: | 5.3 | CC: | anton, dchapman, dhoward, dzickus, jpirko, jplans, ktokunag, ltroan, lwang, roland, tao, tis, vfalico | ||||||
| Target Milestone: | rc | Keywords: | ZStream | ||||||
| Target Release: | --- | ||||||||
| Hardware: | All | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2009-09-02 08:08:20 UTC | Type: | --- | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | |||||||||
| Bug Blocks: | 487394 | ||||||||
| Attachments: |
|
||||||||
|
Description
Denys Vlasenko
2009-02-03 18:29:59 UTC
Created attachment 330763 [details]
Reproducer for ia64
Recipe: on an ia64 machine:
* run ./mk.sh to compile test_select.c to test_select binary
* run ./run.sh
* watch "starting strace N, pid P..." / "...end of iteration N" messages
* after ~4000 iterations, machine freezes (tested twice on different machines)
NB: tarball includes a new strace binary, not yet released in RHEL.
I saw it on: intel-s6e5132-01.rhts.bos.redhat.com hp-diablo-01.rhts.bos.redhat.com Created attachment 331125 [details]
Reproducer for ia64, this time with correct strace binary (old one had x86 one!)
Un-private-ing comment #3 for dchapman (on-site HP partner engineer). P. Denys, what kernel rev was this seen on? P. I was finally able to reproduce on 2.6.18-130, it took over 6000 iters, I have other systems running the test that are over 8000 iters so evidently this is a rather narrow race condition. I will investigate. Easily reproduced on 2.6.18-128 (63 iters). (In reply to comment #9) > Easily reproduced on 2.6.18-128 (63 iters). Interesting, which system are you using that hits this so quickly? I have not seen it in less than 6000 iters and in bug 478419 they mention it usually taking at least 4000 iters. It would be easier to experiment if I could find the magic system that hits this quickly. Using itanium 2 with 32 cores. AFAICS, as more cpu's you have, as quicker it hits.
I think it's coming form the following code:
utrace.c:760
760 spin_lock(&utrace->lock);
761 if (unlikely(rcu_dereference(target->utrace) != utrace)
762 || unlikely(rcu_dereference(engine->ops)
763 == &dead_engine_ops)) {
764 /*
765 * By the time we got the utrace lock,
766 * it had been reaped or detached already.
767 */
768 spin_unlock(&utrace->lock);
769 utrace = ERR_PTR(-ESRCH);
770 }
And while getting the spinlock, something happens with the utrace. Will examine it further, if any updates appear - I'll write here.
And, veeery strange, but the vmcore's bt shows that it's not exiting from ptrace:1068, but from ptrace:449...
(In reply to comment #11) > Using itanium 2 with 32 cores. AFAICS, as more cpu's you have, as quicker it > hits. That's what I expected also but when I ran on the 64 cpu system back at HP it took 35000 iterations, very odd. > > And, veeery strange, but the vmcore's bt shows that it's not exiting from > ptrace:1068, but from ptrace:449... Ok, not just me. I have had the vmcore point to other places than expected also. however each time it was pointing to a BUG_ON(ret != -ESRCH) I added some printk's to ptrace.c to see what ret was set to (since it is obvioulsy not -ESRCH) I found that it is set to -114 or -EALREADY this comes from ptrace_update which calls utrace_set_flags. It appears -EALREADY is the "default" return value for utrace_set_flags. So... not sure why this is an error, and if it is an error I don't understand why it is something we would panic on. I see there are a few places where utrace_set_flags can do the return, adding some code to figure out which one we are hitting. What appears to be happening is we are trying to do a PTRACE_SINGLESTEP just as the target process exits.
in kernel/utrace.c in utrace_set_flags we have
981 int ret = -EALREADY;
....
1042 if (unlikely(target->exit_state)) {
1043 read_unlock(&tasklist_lock);
1044 spin_unlock(&utrace->lock);
1045 return ret;
1046 }
So, if the target process has exited we return -EALREADY
then in kernel/ptrace.c:
1066 ret = ptrace_update(child, state, flags, 1);
1067 if (ret)
1068 BUG_ON(ret != -ESRCH);
(ptrace_update gets its return value from utrace_set_flags)
So, my guess is this BUG_ON is invalid, unless there is some reason why this state really is invalid. It seems we should just change:
1067 if (ret)
1068 BUG_ON(ret != -ESRCH);
with
1067 if (ret)
1068 return ret;
or something along those lines.
until people who actually understand utrace comment on it, I propose replacing BUG_ON with WARN_ON(ret != ESRCH && ret != EALREADY), and making ret == EALREADY act the same as ret == ESRCH. I have confirmed that this is not ia64 specific. I was able to reproduce this on an x86_64 system running kernel-2.6.18-128.el5 Kernel BUG at kernel/ptrace.c:1068 invalid opcode: 0000 [1] SMP last sysfs file: /devices/pci0000:00/0000:00:00.0/class CPU 1 Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac lp sg floppy i2c_i801 i2c_core e1000 ide_cd parport_pc cdrom shpchp parport serio_raw pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd Pid: 23377, comm: strace Not tainted 2.6.18-128.el5 #1 RIP: 0010:[<ffffffff800bfa61>] [<ffffffff800bfa61>] ptrace_common+0x146/0x19c RSP: 0018:ffff810004a9bf08 EFLAGS: 00010283 RAX: 00000000ffffff8e RBX: 0000000000000018 RCX: 0000000000000000 RDX: 0000000000000410 RSI: 0000000000000028 RDI: ffffffff803c3e80 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: ffff81001ca400c0 R11: ffff81001cc76b20 R12: ffff81001a38b840 R13: ffff8100050b6fc0 R14: ffff8100035cb860 R15: 000000000e1e6920 FS: 00002b7991a49250(0000) GS:ffff81001f31a7c0(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00002aaaf420d008 CR3: 000000001c2dc000 CR4: 00000000000006e0 Process strace (pid: 23377, threadinfo ffff810004a9a000, task ffff810003e700c0) Stack: 0000000000000001 0000000000000018 000000000e1e6910 0000000000000000 0000000000000000 ffffffff800c0570 0000000000000000 0000000000000018 000000000e1e6920 ffffffff8005d229 0000000000000000 ffff81001a38b840 Call Trace: [<ffffffff800c0570>] sys_ptrace+0xf0/0x1f7 [<ffffffff8005d229>] tracesys+0x71/0xe0 [<ffffffff8005d28d>] tracesys+0xd5/0xe0 Code: 0f 0b 68 10 b6 29 80 c2 2c 04 eb 3f 48 f7 c1 80 ff ff ff bd RIP [<ffffffff800bfa61>] ptrace_common+0x146/0x19c RSP <ffff810004a9bf08> <0>Kernel panic - not syncing: Fatal exception Also on i386, with -128.el5 kernel. http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=6730459 (please see the EXTERNALWATCHDOG.log) Last sub-test or RHTS test '/tools/strace/Regression/478419-threads-stop-when-strace--f-is-executed-on-multi-thread-process' caused that panic. Any updates on this? This is an urgent bug, but I've not seen any patch posted to rhkernel.
By the way, looking at kernel/ptrace.c, -EALREADY is handled in some places like this:
if (unlikely(ret == -EALREADY)) // some are not wrapped with unlikely()
ret = -ESRCH;
I'm not familiar with the ptrace code, but I'd imagine we could use the same lines for this bug case as well? We might also need it for "case PTRACE_SETOPTIONS".
In essence, this approach does the same thing as the Doug's idea unless there is a possibility ptrace_update() returns neither 0, -EALREADY nor -ESRCH.
(In reply to comment #11) > And, veeery strange, but the vmcore's bt shows that it's not exiting from > ptrace:1068, but from ptrace:449... Strange indeed as here -EALREADY is treated like -ESEARCH. Do you know which value was returned. Updating PM score. (In reply to comment #17) > I have confirmed that this is not ia64 specific. I was able to reproduce this > on an x86_64 system running kernel-2.6.18-128.el5 Did you use the same testcase, or something more specific? (In reply to comment #24) > (In reply to comment #17) > > I have confirmed that this is not ia64 specific. I was able to reproduce this > > on an x86_64 system running kernel-2.6.18-128.el5 > > Did you use the same testcase, or something more specific? I used the test case attached here that has the x86_64 version of strace. (In reply to comment #22) > (In reply to comment #11) > > And, veeery strange, but the vmcore's bt shows that it's not exiting from > > ptrace:1068, but from ptrace:449... > Strange indeed as here -EALREADY is treated like -ESEARCH. Do you know which > value was returned. Look at comment #15 where I explain this in detail including showing line numbers and values returned. (In reply to comment #26) > (In reply to comment #22) > > (In reply to comment #11) > > > And, veeery strange, but the vmcore's bt shows that it's not exiting from > > > ptrace:1068, but from ptrace:449... > > Strange indeed as here -EALREADY is treated like -ESEARCH. Do you know which > > value was returned. > > Look at comment #15 where I explain this in detail including showing line > numbers and values returned. That explain the bug at ptrace:1068, not at ptrace:449 where the case -EALREADY is handled correctely. (In reply to comment #25) > (In reply to comment #24) > > (In reply to comment #17) > > > I have confirmed that this is not ia64 specific. I was able to reproduce this > > > on an x86_64 system running kernel-2.6.18-128.el5 > > > > Did you use the same testcase, or something more specific? > > I used the test case attached here that has the x86_64 version of strace. How many cores did you use? Did you start several strace's in paralel? ...end of iteration 679334 And nothing :( on a 2.6.18-128 x86_64. And, btw, did you get SEGFAULT's on strace binary in strace? something like Feb 18 13:46:00 dell-pe1950-03 kernel: test_select[32024]: segfault at 0000000041bb4a08 rip 00000038414074f0 rsp 00007fffbfc36300 error 4 RIP is always the same, and the offset is always ending with a08 (didn't dig into it). Thank you! (In reply to comment #28) > > How many cores did you use? Did you start several strace's in paralel? I just asked rhts to give me a random system and tried it so I don't know the details. The system was nec-em3. I don't recall if I did several test instances in parallel but I have found that reproduces the failure more quickly on ia64 so I may have. > > ...end of iteration 679334 > > And nothing :( on a 2.6.18-128 x86_64. > > And, btw, did you get SEGFAULT's on strace binary in strace? > something like > Feb 18 13:46:00 dell-pe1950-03 kernel: test_select[32024]: segfault at > 0000000041bb4a08 rip 00000038414074f0 rsp 00007fffbfc36300 error 4 > RIP is always the same, and the offset is always ending with a08 (didn't dig > into it). I don't recall seeing this. Sorry for a long and somewhat stupid comment, but it's really hard to describe utrace workflow with my english level...
The bug occurs only when the process is in EXIT_ZOMBIE exit_state. Current implementation of utrace doesn't properly handle it.
Feb 20 20:13:25 dell-pe1950-03 kernel: 1) child->exit_state = 16, request = 24, data = 0
Feb 20 20:13:25 dell-pe1950-03 kernel: 2) child->exit_state = 16, request = 24, data = 0
Feb 20 20:13:25 dell-pe1950-03 kernel: 3) child->exit_state = 16, request = 24, data = 0
Feb 20 20:13:25 dell-pe1950-03 kernel: ZOMBIE detected.
Feb 20 20:13:25 dell-pe1950-03 kernel: set_utrace_flags = 523624, old_utrace_flags = 40, set_utrace_flags &~ old_utrace_flags = 523584, target->exit_state = 16
Feb 20 20:13:25 dell-pe1950-03 kernel: BUG_ON 1068, ret=-114, request=24, EEE=2, exit_state before ptrace_induce_signal = 16, signal number = 0
Feb 21 20:13:25 dell-pe1950-03 kernel:
Feb 21 20:13:25 dell-pe1950-03 kernel: 4) child->exit_state = 16, request = 24, data = 0
request = 24 = PTRACE_SYSCALL
exit_state = 16 = EXIT_ZOMBIE
(set_utrace_flags &~ old_state_flags) & _UTRACE_EVENT_DEATH == 1
where DEATH_EVENTS = (UTRACE_EVENT(DEATH) | UTRACE_EVENT(QUIESCE) = _UTRACE_EVENT_DEATH | _UTRACE_EVENT_QUIESCE
Relevant parts of code:
kernel/utrace.c
973 int EEE;
974 int
975 utrace_set_flags(struct task_struct *target,
...
994 if (unlikely(IS_ERR(utrace))){
995 EEE = 3;
996 return PTR_ERR(utrace);
997 }
...
1004 if (target->exit_state
1005 && (((flags &~ old_flags) & (UTRACE_ACTION_QUIESCE
1006 | UTRACE_ACTION_NOREAP
1007 | DEATH_EVENTS))
1008 || ((utrace->u.exit.flags & EXIT_FLAG_DEATH)
1009 && ((old_flags &~ flags) & DEATH_EVENTS))
1010 || ((utrace->u.exit.flags & EXIT_FLAG_REAP)
1011 && ((old_flags &~ flags) & UTRACE_EVENT(REAP))))) {
1012 spin_unlock(&utrace->lock);
1013 EEE = 1;
1014 return ret;
...
1046 if (unlikely(target->exit_state)) {
1047 read_unlock(&tasklist_lock);
1048 spin_unlock(&utrace->lock);
1049 printk("set_utrace_flags = %lu, old_utrace_flags = %lu, set_utrace_flags &~ old_utrace_flags = %lu, target->exit_state = %ld\n",set_utrace_flags,old_utrace_flags,set_utrace_flags &~ old_utrace_flags,target->exit_state);
1050 EEE = 2;
1051 return ret;
...
1138 EEE = 4;
1139 return ret;
1140 }
kernel/ptrace.c
995
996 extern int EEE;
997
998 static int
999 ptrace_common(long request, struct task_struct *child,
...
1050 if(pis_exit_state == EXIT_ZOMBIE){
1051 printk("ZOMBIE detected.\n");
1052 }
...
1071 if (ret){
1072 printk("BUG_ON 1068, ret=%d, request=%ld, EEE=%d, exit_state before ptrace_induce_signal = %ld, signal number = %ld\n\n",ret,request,EEE,pis_exit_state,data);
1073 /* BUG_ON(ret != -ESRCH); */
1074 }
...
1095 return ret;
1096 }
1097
1098
1099 asmlinkage long sys_ptrace(long request, long pid, long addr, long data)
1100 {
1101 struct task_struct *child = NULL;
1102 struct utrace_attached_engine *engine = NULL;
1103 struct ptrace_state *state = NULL;
1104 long ret, val;
1105
1106 pr_debug("%d sys_ptrace(%ld, %ld, %lx, %lx)\n",
1107 current->pid, request, pid, addr, data);
1108
1109 ret = ptrace_start(pid, request, &child, &engine, &state);
1110 if (ret != -EIO)
1111 goto out;
1112 if(child->exit_state){
1113 printk("1) child->exit_state = %ld, request = %ld, data = %ld\n",child->exit_state, request, data);
1114 }
1115 val = 0;
1116 ret = arch_ptrace(&request, child, engine, addr, data, &val);
1117 if(child->exit_state){
1118 printk("2) child->exit_state = %ld, request = %ld, data = %ld\n",child->exit_state, request, data);
1119 }
...
1128 switch (request) {
1129 default:
1130 if(child->exit_state){
1131 printk("3) child->exit_state = %ld, request = %ld, data = %ld\n",child->exit_state, request, data);
1132 }
1133 ret = ptrace_common(request, child, engine, state, addr, data);
1134 if(child->exit_state){
1135 printk("4) child->exit_state = %ld, request = %ld, data = %ld\n",child->exit_state, request, data);
1136 }
1137 break;
There are also a lot of other small cases, when the process to which we attach is already EXIT_ZOMBIE or EXIT_DEAD, but they are handled smoothly.
So, it happens only when, in sys_ptrace, the process we get is already in EXIT_ZOMBIE state and there is a PTRACE_SYSCALL request.
From the ptrace_common, we already see that a bug was fixed against PTRACE_DETACH event. However, there is a possibility that we ptrace_update from PTRACE_SETOPTIONS and PTRACE_OLDSETOPTIONS, where also the bug will be triggered (ptrace_common -> ptrace_update -> utrace_update -> return -EALREADY on zombie).
So, from how I understand the code, we should also change the BUG_ON from PTRACE_SETOPTIONS and PTRACE_OLDSETOPTIONS. The best option, from my point of view, would be to verify if the process is in EXIT_ZOMBIE state in sys_ptrace, and to call ptrace_update on EXIT_ZOMBIE only when there is PTRACE_KILL request. However, I don't know the code enough to say it for sure.
Could somebody, who knows better the code, affirm it?
ptrace_update has no useful effect when the task is already dead (EXIT_ZOMBIE). in kernel-2.6.18-133.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5 Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However feel free to provide a comment indicating that this fix has been verified. An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1243.html |