Red Hat Bugzilla – Bug 217112
PTRACE_SETOPTIONS mysterious behavior
Last modified: 2007-11-30 17:11:49 EST
Description of problem:
Back when ptrace was ptrace (instead of a layer on utrace), there were two
PTRACE_SETOPTIONS function codes. The "old" code was 21 and did almost
nothing useful; the "new" code was 0x4200 and implemented all sorts of
fabulous features for following forks and clones and wot-not.
I notice that in Fedora Core 6, the "old" code of 21 now implements all the
fabulous new features, and the "new" code of 0x4200 doesn't work at all.
Just wondering if this is intentional or an oversight in the new ptrace
layer (certainly the "new" version of setoptions was never documented
anywhere other than the kernel source code).
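A minimal sketch of how a tracer might probe which request code the running kernel accepts, in the spirit of the probing described in this report. It is not the attached test program: the function name probe_setoptions and the two SETOPTIONS_* macros are hypothetical, and the probe assumes a Linux system where a freshly forked child can be traced with PTRACE_TRACEME.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The two request codes discussed in this report, spelled numerically.
 * These macro names are hypothetical, chosen for the example only. */
#define SETOPTIONS_OLD 21
#define SETOPTIONS_NEW 0x4200

/* Try one request code against a stopped, traced child.
 * Returns 0 if the kernel accepted it, a positive errno value if the
 * kernel rejected it, or -1 if the probe itself could not be set up. */
int probe_setoptions(long request)
{
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* Child: ask to be traced, then stop so the parent can probe. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGSTOP);
        _exit(0);
    }

    int status = 0;
    if (waitpid(child, &status, 0) != child)
        return -1;
    if (!WIFSTOPPED(status))
        return -1;      /* child exited instead of stopping; already reaped */

    errno = 0;
    long rc = ptrace(request, child, NULL, (void *)(long)PTRACE_O_TRACEFORK);
    int result = (rc == 0) ? 0 : (errno != 0 ? errno : -1);

    /* Clean up the probe child regardless of the outcome. */
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return result;
}
```

Calling probe_setoptions(SETOPTIONS_OLD) and probe_setoptions(SETOPTIONS_NEW) at debugger startup and passing a nonzero result to strerror() would show which code the current kernel honours. Note that success only means the request was accepted, not that the options behave as expected.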
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Try to use ptrace function code 0x4200.
2. Try again with 21.
0x4200 no longer works in FC6.
I was hoping for backward compatibility, not that it matters a lot: I long
ago produced an insanely complex test program to probe the ptrace service
call and describe to my debugger how it works on the current kernel.
With remarkable foresight (or luck :-), it already happens to be
checking both versions of PTRACE_SETOPTIONS to see which one works,
so I shouldn't have a problem. I just thought someone might want to know.
Since PTRACE_SETOPTIONS is actually documented in the ptrace(2) man page
now, I'm guessing the change is intentional.
Or maybe it isn't intentional. I just noticed this in the <sys/ptrace.h> file:
PTRACE_SETOPTIONS = 0x4200,
That's the definition that no longer works.
Things are more mysterious than I at first assumed. It appears as though the
PTRACE_SETOPTIONS behavior is more random than anything else. I'm attaching a
test program and a transcript of running it several times with different
results each time.
Created attachment 142030
test-setoptions.c program (C++ source code).
I compiled this program on both FC5 and FC6 (x86_64), and running both versions
on FC6 produces very strange behavior (see next attachment).
Created attachment 142031
transcript of test runs with commentary
Looks like PTRACE_SETOPTIONS behaves in a completely random fashion.
I think I cut too much stuff out of the test program trying to make a smaller
example. I'll be replacing the test later today with one that provides more
info about exactly what is going on.
Created attachment 142076
improved test-setoptions.c test program
OK. I admit it! I have no idea if this is a bug or not. The new test prog
shows that the source of my confusion is that the fork event message for the
parent is sometimes (randomly) delivered after the initial status for the
child has already shown up. This certainly feels like a bug (since what is
the point in getting the fork event at all if not to warn you that the child
is coming soon so you can do stuff like avoid modifying the parent till you
have control of the child and you are sure of what was in the image that
However, running this on FC5 seems to indicate that the same random behavior
exists there as well (but maybe the latest FC5 kernel also has the utrace
On a RHEL 4 system at work (4 cpus, x86_64 arch), the test program never
once showed the random behavior, so I'm suspecting this really is a bug,
and somehow the random status delivery order got introduced by recent
Even more interesting information: I ran this test program at work on
a Fedora Core 5 machine with the 2.6.18-1.2239.fc5smp kernel (dual
Xeon system hyperthreaded to look like 4 cpus). In a shell script loop
./test-setoptions 2>&1 | fgrep ERR
./test-setoptions new 2>&1 | fgrep ERR
It ran for a second with no errors, then spewed a block of about 7 ERR
lines all at once, then the system was dead :-(. Completely froze up. No
response to keyboard. Had to hit the reset button.
It has always been a race between clone event report in parent and starting
SIGSTOP report in child. It was exceedingly rare in the old implementation,
probably only seen with lots of preemption or really really fast SMP. It is
much less rare now.
Is there in fact any SETOPTIONS issue, or just the report order?
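To make the two competing reports concrete, here is a small sketch of how a tracer could tell them apart from the status returned by waitpid(). The enum and function names are hypothetical. It assumes the glibc <sys/ptrace.h> event constants and the usual Linux encoding, in which a ptrace event stop carries the event number in bits 16 and up of the wait status.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Hypothetical classification of a wait status seen by a tracer. */
enum stop_kind {
    STOP_OTHER,             /* exit, kill, or some other stop */
    STOP_NEW_CHILD_EVENT,   /* fork/vfork/clone event reported in the parent */
    STOP_SIGSTOP            /* plain SIGSTOP stop, e.g. a new child's first report */
};

enum stop_kind classify_stop(int status)
{
    if (!WIFSTOPPED(status))
        return STOP_OTHER;

    /* Event stops look like SIGTRAP with the event code in the next byte. */
    int event = status >> 16;
    if (WSTOPSIG(status) == SIGTRAP &&
        (event == PTRACE_EVENT_FORK ||
         event == PTRACE_EVENT_VFORK ||
         event == PTRACE_EVENT_CLONE))
        return STOP_NEW_CHILD_EVENT;

    if (WSTOPSIG(status) == SIGSTOP && event == 0)
        return STOP_SIGSTOP;

    return STOP_OTHER;
}
```

A SIGSTOP stop from a pid the tracer has not seen before is the child's starting report. Whether it arrives before or after the parent's STOP_NEW_CHILD_EVENT is exactly the race described above.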
Up until comment #8, I would have said the report order being wrong was
the issue (and I'd still call that a bug - I don't see any value in
stopping the parent at all if it isn't going to stop first).
But in #8 the test program crashed my system when running in a loop
over and over, so that is definitely a bug (though perhaps a hard
one to reproduce).
So far I've only seen the crash on the one machine, but I'm pretty sure the
attached test program is what crashed it.
I just ran the loop from comment #8 again on the same machine and it crashed
again after running for a bit then getting an ERR output line from one of the
test runs. It seems to be a pretty reliable crash on this machine, but I haven't
been able to crash my home system.
The machine that crashes is running Fedora Core 5, kernel 2.6.18-1.2239.fc5smp
with dual Intel(R) XEON(TM) CPU 2.20GHz CPUs (hyperthreading enabled, so it
looks like 4 cpus to the kernel). 1 gig of rambus (bleh :-) memory on a
Super P4DC6 or P4DC6+ (not sure which) motherboard. In two tries it has only
taken about a minute to crash each time. (And the machine normally stays up
forever - it has been very stable under normal use).
The machine that doesn't seem to crash is an AMD Athlon 64 X2 4400+ dual core
cpu, 2 gig of memory, and a BIOSTAR TForce4U socket 939 motherboard. I've got
lots of different boot partitions on it and have tried both 32 and 64 bit
Fedora Core 6 and Core 5, and no crash on this machine with any of those
kernels (at least not in the time I was willing to wait).
Tom, can you retest this? Fedora 5/6 are on kernel 2.6.20, and I can't reproduce
any kind of hang on a 2 x dual-core Xeon on Fedora 6 with kernel 2.6.20-1.2962.fc6.
The machine I previously ran this on has just been regenned to Fedora 7,
so I tested it there, and the crash no longer happens (though the
out-of-sequence parent/child status certainly still does, which I'd
still call a bug :-).
uname -a gives:
Linux tweety 2.6.21-1.3228.fc7 #1 SMP Tue Jun 12 15:37:31 EDT 2007 i686 i686
It ran for a few minutes without crashing, and since it only took a few
seconds to crash before, I assume it is working now.
Such ordering has never been guaranteed and it is only luck that you have never
seen it with a vanilla kernel.
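Since the order is not guaranteed, a tracer has to accept the two reports in either order. One way to do that is sketched below, under stated assumptions: the struct and function names are hypothetical, the fixed-size table is a simplification, and the child pid in the event case is assumed to come from PTRACE_GETEVENTMSG on the stopped parent.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define MAX_PENDING 64  /* arbitrary limit for this sketch */

/* Children for which only one of the two reports has arrived so far. */
struct pending_children {
    pid_t early_stops[MAX_PENDING]; /* child stop seen, parent event not yet */
    size_t n_early;
    pid_t announced[MAX_PENDING];   /* parent event seen, child stop not yet */
    size_t n_announced;
};

/* Remove pid from list if present; report whether it was there. */
static bool take(pid_t *list, size_t *n, pid_t pid)
{
    for (size_t i = 0; i < *n; i++) {
        if (list[i] == pid) {
            list[i] = list[--*n];
            return true;
        }
    }
    return false;
}

/* Record pid; silently dropped if the table is full (sketch only). */
static void put(pid_t *list, size_t *n, pid_t pid)
{
    if (*n < MAX_PENDING)
        list[(*n)++] = pid;
}

/* The parent reported a fork/clone event naming new_child.
 * Returns true once both reports for that child have been seen. */
bool on_fork_event(struct pending_children *p, pid_t new_child)
{
    if (take(p->early_stops, &p->n_early, new_child))
        return true;
    put(p->announced, &p->n_announced, new_child);
    return false;
}

/* A not-yet-tracked child reported its starting stop.
 * Returns true once both reports for that child have been seen. */
bool on_child_stop(struct pending_children *p, pid_t child)
{
    if (take(p->announced, &p->n_announced, child))
        return true;
    put(p->early_stops, &p->n_early, child);
    return false;
}
```

With this bookkeeping the tracer leaves both processes stopped until one of the two functions returns true, and only then resumes them. That gives the "parent is held until the child is under control" property regardless of which report the kernel delivers first.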