Bug 217112
Summary: | PTRACE_SETOPTIONS mysterious behavior | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Tom Horsley <horsley1953>
Component: | kernel | Assignee: | Roland McGrath <roland>
Status: | CLOSED WORKSFORME | QA Contact: | Brian Brock <bbrock>
Severity: | high | Docs Contact: |
Priority: | medium | |
Version: | 6 | CC: | bugsy, davej, wtogami
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2007-09-26 11:08:39 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Tom Horsley
2006-11-24 03:46:12 UTC
Or maybe it isn't intentional. I just noticed this in the <sys/ptrace.h> file:

    PTRACE_SETOPTIONS = 0x4200,

That's the definition that no longer works.

Things are more mysterious than I at first assumed. It appears as though the PTRACE_SETOPTIONS behavior is more random than anything else. I'm attaching a test program and a transcript of running it several times, with different results each time.

Created attachment 142030
test-setoptions.c program (c++ source code).
I compiled this program on both FC5 and FC6 (x86_64), and running both versions
on FC6 produces very strange behavior (see next attachment).
Created attachment 142031 [details]
transcript of test runs with commentary
Looks like PTRACE_SETOPTIONS behaves in a completely random fashion.
I think I cut too much out of the test program while trying to make a smaller example. I'll be replacing the test later today with one that provides more info about exactly what is going on.

Created attachment 142076
improved test-setoptions.c test program
OK. I admit it! I have no idea if this is a bug or not. The new test program
shows that the source of my confusion is that the fork event message for the
parent is sometimes (randomly) delivered after the initial status for the
child has already shown up. This certainly feels like a bug: what is the
point in getting the fork event at all if not to warn you that the child is
coming soon, so you can do things like avoid modifying the parent until you
have control of the child and are sure of what was in the image that forked?
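To make the two competing reports concrete: once PTRACE_O_TRACEFORK has been set with PTRACE_SETOPTIONS, ptrace(2) documents that the parent's fork event shows up in waitpid() as a stop whose status satisfies `status>>8 == (SIGTRAP | (PTRACE_EVENT_FORK << 8))`. The new child's first report is an ordinary SIGSTOP stop, assuming the tracer used PTRACE_TRACEME or PTRACE_ATTACH rather than the later PTRACE_SEIZE. Below is a minimal sketch of how a tracer might tell the two apart. It is not the reporter's test program, and `classify_stop` and `enum stop_kind` are hypothetical names.

```c
#include <signal.h>     /* SIGTRAP, SIGSTOP */
#include <sys/ptrace.h> /* PTRACE_EVENT_FORK */
#include <sys/wait.h>   /* WIFSTOPPED, WSTOPSIG */

/* Hypothetical labels for the waitpid() statuses a fork-tracing tracer cares about. */
enum stop_kind {
    STOP_OTHER,      /* exit, termination, other signals or other events */
    STOP_FORK_EVENT, /* parent stopped to report PTRACE_EVENT_FORK */
    STOP_SIGSTOP     /* plain SIGSTOP, e.g. a new child's initial stop */
};

/* Classify a status word returned by waitpid() for a traced process. */
static enum stop_kind classify_stop(int status)
{
    if (!WIFSTOPPED(status))
        return STOP_OTHER;

    /* ptrace(2): an event stop has status>>8 == (SIGTRAP | (event << 8)). */
    if ((status >> 8) == (SIGTRAP | (PTRACE_EVENT_FORK << 8)))
        return STOP_FORK_EVENT;

    /* No event bits above the signal number: an ordinary signal stop. */
    if (WSTOPSIG(status) == SIGSTOP && (status >> 16) == 0)
        return STOP_SIGSTOP;

    return STOP_OTHER;
}
```

In a tracer loop, one would pass each status from `waitpid(-1, &status, __WALL)` to this helper. For STOP_FORK_EVENT, the new child's pid can then be fetched from the stopped parent with PTRACE_GETEVENTMSG. Nothing in this helper assumes which of the two stops arrives first.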
However, running this on FC5 seems to indicate that the same random behavior
exists there as well (but maybe the latest FC5 kernel also has the utrace
code?)
On a RHEL 4 system at work (4 CPUs, x86_64 arch), the test program never once showed the random behavior, so I suspect this really is a bug and that the random status delivery order was somehow introduced by the recent ptrace changes.

Even more interesting information: I ran this test program at work on a Fedora Core 5 machine with the 2.6.18-1.2239.fc5smp kernel (a dual Xeon system, hyperthreaded to look like 4 CPUs) in a shell script loop like so:

    while true
    do
        ./test-setoptions 2>&1 | fgrep ERR
        ./test-setoptions new 2>&1 | fgrep ERR
    done

It ran for a second with no errors, then spewed a block of about 7 ERR lines all at once, and then the system was dead :-(. It froze up completely, with no response to the keyboard. I had to hit the reset button.

It has always been a race between the clone event report in the parent and the initial SIGSTOP report in the child. It was exceedingly rare in the old implementation, probably only seen with lots of preemption or really, really fast SMP. It is much less rare now. Is there in fact any SETOPTIONS issue, or just the report order?

Up until comment #8, I would have said the wrong report order was the issue. I'd still call that a bug, since I don't see any value in stopping the parent at all if it isn't going to stop first. But in #8 the test program crashed my system when running in a loop over and over, so that is definitely a bug (though perhaps a hard one to reproduce).

So far I've only seen the crash on the one machine, but I'm pretty sure the attached test program is what crashed it.

I just ran the loop from comment #8 again on the same machine. It crashed again after running for a bit and then getting an ERR output line from one of the test runs. It seems to be a pretty reliable crash on this machine, but I haven't been able to crash my home system.

The machine that crashes is running Fedora Core 5, kernel 2.6.18-1.2239.fc5smp. It has dual Intel(R) XEON(TM) CPU 2.20GHz CPUs (hyperthreading enabled, so it looks like 4 CPUs to the kernel) and 1 gig of Rambus (bleh :-) memory on a Super P4DC6 or P4DC6+ (not sure which) motherboard. In two tries it has only taken about a minute to crash each time. (And the machine normally stays up forever; it has been very stable under normal use.)

The machine that doesn't seem to crash is an AMD Athlon 64 X2 4400+ dual-core CPU with 2 gig of memory and a BIOSTAR TForce4U socket 939 motherboard. I've got lots of different boot partitions on it and have tried both 32- and 64-bit Fedora Core 6 and Core 5. There was no crash on this machine with any of those kernels (at least not in the time I was willing to wait).

Tom, can you retest this? Fedora 5/6 is on kernel 2.6.20, and I can't reproduce any kind of hang on a 2 x dual-core Xeon on Fedora 6 with kernel 2.6.20-1.2962.fc6.

The machine I previously ran this on has just been regenned to Fedora 7, so I tested it there, and the crash no longer happens. The out-of-sequence parent/child status certainly still does, though, and I'd still call that a bug :-). uname -a gives:

    Linux tweety 2.6.21-1.3228.fc7 #1 SMP Tue Jun 12 15:37:31 EDT 2007 i686 i686 i386 GNU/Linux

It ran a few minutes without crashing, and since it only took a few seconds before, I assume it is working now.

Such ordering has never been guaranteed, and it is only luck that you have never seen it with a vanilla kernel.
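The order of the parent's fork event and the child's initial SIGSTOP is not guaranteed, so a tracer has to accept them in either order. Below is a minimal sketch of one way to do that, using a hypothetical bookkeeping table. `struct pending_table`, `on_fork_event` and `on_initial_stop` are illustrative names, not taken from the test program or any library. A new child is treated as ready only once both of its reports have arrived, whichever came first. In a real tracer, the pid passed to `on_fork_event` would come from PTRACE_GETEVENTMSG on the stopped parent. `on_initial_stop` would be called for a SIGSTOP stop from a pid the tracer does not yet know.

```c
#include <stddef.h>    /* size_t */
#include <sys/types.h> /* pid_t */

#define MAX_PENDING 64 /* arbitrary bound chosen for this sketch */

/* One entry per new child whose two reports have not both arrived yet. */
struct pending_child {
    pid_t pid;
    int saw_fork_event;   /* parent's fork event named this pid */
    int saw_initial_stop; /* child's own initial SIGSTOP was reported */
};

struct pending_table {
    struct pending_child slots[MAX_PENDING];
    size_t count;
};

/* Find the entry for pid, creating it if needed; NULL if the table is full. */
static struct pending_child *lookup(struct pending_table *t, pid_t pid)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->slots[i].pid == pid)
            return &t->slots[i];
    if (t->count == MAX_PENDING)
        return NULL;
    struct pending_child *c = &t->slots[t->count++];
    c->pid = pid;
    c->saw_fork_event = 0;
    c->saw_initial_stop = 0;
    return c;
}

/* Drop a completed entry by moving the last slot into its place. */
static void forget(struct pending_table *t, struct pending_child *c)
{
    *c = t->slots[--t->count];
}

/* Record one of the two reports for pid.
   Returns 1 once both have arrived (in either order), 0 while still
   waiting for the other one, -1 if the table is full. */
static int record(struct pending_table *t, pid_t pid, int is_fork_event)
{
    struct pending_child *c = lookup(t, pid);
    if (c == NULL)
        return -1;
    if (is_fork_event)
        c->saw_fork_event = 1;
    else
        c->saw_initial_stop = 1;
    if (c->saw_fork_event && c->saw_initial_stop) {
        forget(t, c);
        return 1;
    }
    return 0;
}

/* Parent reported PTRACE_EVENT_FORK; child is the pid from PTRACE_GETEVENTMSG. */
static int on_fork_event(struct pending_table *t, pid_t child)
{
    return record(t, child, 1);
}

/* A not-yet-known pid reported its initial SIGSTOP. */
static int on_initial_stop(struct pending_table *t, pid_t pid)
{
    return record(t, pid, 0);
}
```

The fixed-size array is a simplification for the sketch. A real tracer would probably use a growable structure. It would also need to handle a child that exits or is killed before both reports arrive.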