Description of problem:
I run a program on one virtual machine which generates gazillions of
SIGTRAPs by using the DBn debug registers to do address traps.
I run a completely separate virtual machine on the same host, and random
processes on that 2nd virtual machine get SIGTRAPs as though either the
traps are being delivered to the wrong process, or the contents of the
debug registers are leaking across virtual machines and causing traps
in the wrong process.
Version-Release number of selected component (if applicable):
How reproducible:
Somewhat random, but with the program I'll attach to generate vast
numbers of address traps, it does seem to eventually happen every time.
Steps to Reproduce:
1. boot one VM, unpack watchme.tar.gz, run make
2. boot another VM
3. watch random processes complain about traps during boot
Actual results:
leakage across virtual machines
Expected results:
no leakage across virtual machines
Additional info:
I only see this behavior on my opteron based host system. I have another
host running xeon chips where the sigtraps have never appeared.
This may also be a very old problem. I was previously using an ancient
version of xen to host my virtual machines on the same hardware, and
was getting the spurious SIGTRAP problems then as well (which is one of
the motives I had to upgrade to shiny new KVM and fedora 12).
I'd also point out this could be considered a nasty security problem with
one user on one virtual machine being able to disrupt other virtual
machines at random.
Created attachment 384628 [details]
gzipped tar archive with test program to generate address traps
This program consists of a custom "debugger" (watcher) which debugs the
watchme program, using the debug registers to generate zillions of
address traps in multiple threads.
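For readers unfamiliar with the mechanism the watcher relies on: on x86, an
address trap is armed by putting the address in one of DR0-DR3 and setting
the matching enable, R/W and LEN fields in DR7. The following is a minimal
sketch of that DR7 bit layout only. It is not taken from the attached
watcher, and the names dr7_watch_bits, dr7_rw and dr7_len are hypothetical.

```c
#include <stdint.h>

/* R/W field of DR7: which kind of access raises the trap. */
enum dr7_rw { DR7_RW_EXEC = 0, DR7_RW_WRITE = 1, DR7_RW_READWRITE = 3 };

/* LEN field of DR7: size of the watched region (note 8 bytes is encoded as 2). */
enum dr7_len { DR7_LEN_1 = 0, DR7_LEN_2 = 1, DR7_LEN_8 = 2, DR7_LEN_4 = 3 };

/*
 * Hypothetical helper: return the DR7 bits that locally enable debug
 * register slot `slot` (0..3) with the given access type and length.
 * Bits 0,2,4,6 are the local enables L0..L3; the 4-bit LENn:R/Wn group
 * for slot n starts at bit 16 + 4*n.
 */
uint64_t dr7_watch_bits(unsigned slot, enum dr7_rw rw, enum dr7_len len)
{
    uint64_t enable = 1ULL << (slot * 2);
    uint64_t ctrl = ((uint64_t)len << 2) | (uint64_t)rw;
    return enable | (ctrl << (16 + slot * 4));
}
```

For example, dr7_watch_bits(0, DR7_RW_WRITE, DR7_LEN_4) yields 0xD0001, which
is slot 0 locally enabled and trapping on 4-byte writes. A debugger would
typically write such a value into the traced thread with
ptrace(PTRACE_POKEUSER, ...) at the offset of u_debugreg[7] in struct user;
whether the attached program does exactly that is an assumption.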
Created attachment 384630 [details]
normal boot on 2nd virtual machine
Here is a screenshot of the virtual machine booting normally. This was the
very first virtual machine booted after a reboot of the host.
Created attachment 384631 [details]
screenshot of another boot showing sigtrap disruption
In this screen shot, another virtual machine is running the test program, and
you can see the sigtrap abort being reported during this boot of the same
vm that had no problem booting before the test program was started.
Created attachment 384633 [details]
sles10i virtual machine xml definition
The sles10i virtual machine was the one I built and ran the test program
on where the debug registers were modified and the address traps generated.
My impression is that the specific virtual machines don't really make any
difference, but I include this for a complete description of the problem.
Created attachment 384635 [details]
the 2nd virtual machine sles10x xml definition
This is the sles10x virtual machine which the screen shots were from. Booting
this machine worked before I ran the test over on sles10i, and failed after
I shut it down, started the test over on sles10i, and tried booting it again.
The kvm host is an 8 core opteron with 8 gig of memory, smolt profile:
>I run a program on one virtual machine which generates gazillions of
>SIGTRAPs by using the DBn debug registers to do address traps.
I meant to say the DRn registers, not DBn.
More data (but I don't know what it means):
I've been trying to use the test program to trigger this bug on my home
system (a 4 core single chip intel box), and I have not yet seen it there.
The intel system at work, however, has exhibited this symptom by having
a kernel build fail with the compiler aborting due to a SIGTRAP, but I
haven't explicitly run my test program to see it trigger the failure on
an intel box.
The two systems I have now seen show this symptom are both dual socket
motherboards with two 4 core chips (opterons in one, xeon in the other),
so perhaps something about multiple cpu chips makes this more likely.
Here's the smolt profile for the dual cpu intel box:
(It is running a somewhat old fedora 11).
I have just installed fedora 13 (and grabbed all updates) on the host machine
described in the original bug. I then ran the exact same test with the
exact same test program and virtual machines hosted on the new fedora 13
host kernel, and I am happy to say that after multiple reboots of the sles10x
virtual machine while sles10i was running my test program, I did not see
a single spurious SIGTRAP interfere with the sles10x virtual machine.
It will be a while before I have lots of results from running my testbeds
continuously, but it certainly looks like this problem may be fixed in the
latest KVM code. Rpms currently on system for this test are:
Right, the fix for this went into the upstream kernel and should be available in an update for the F-12 kernel.