Description of problem:
I run a program on one virtual machine which generates gazillions of SIGTRAPs by using the DBn debug registers to do address traps. I run a completely separate virtual machine on the same host, and random processes on that 2nd virtual machine get SIGTRAPs, as though either the traps are being delivered to the wrong process, or the contents of the debug registers are leaking across virtual machines and causing traps in the wrong process.

Version-Release number of selected component (if applicable):
qemu-kvm-0.11.0-12.fc12.x86_64
kernel-2.6.31.9-174.fc12.x86_64

How reproducible:
Somewhat random, but with the program I'll attach to generate vast numbers of address traps, it does seem to eventually happen every time.

Steps to Reproduce:
1. boot one VM, unpack watchme.tar.gz, run make
2. boot another VM
3. watch random processes complain about traps during boot

Actual results:
leakage across virtual machines

Expected results:
no leakage across virtual machines

Additional info:
I only see this behavior on my Opteron based host system. I have another host running Xeon chips where the SIGTRAPs have never appeared.

This may also be a very old problem. I was previously using an ancient version of Xen to host my virtual machines on the same hardware, and was getting the spurious SIGTRAP problems then as well (which is one of the motives I had to upgrade to shiny new KVM and Fedora 12).

I'd also point out this could be considered a nasty security problem, with one user on one virtual machine being able to disrupt other virtual machines at random.
Created attachment 384628
gzipped tar archive with test program to generate address traps

This program consists of a custom "debugger" (watcher) which debugs the watchme program, using the debug registers to generate zillions of address traps in multiple threads.
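For readers without the attachment, here is a rough sketch of the kind of bookkeeping a watcher like this has to do. It is not taken from watchme.tar.gz; the function and enum names are hypothetical, and only the DR7/DR6 bit layout comes from the x86 architecture (Ln enable bit at position 2n, R/Wn and LENn fields starting at bit 16 + 4n, hit bits B0..B3 in the low four bits of DR6).

```c
#include <assert.h>
#include <stdint.h>

/* R/Wn field encodings in DR7 (architectural values). */
enum dr_rw { DR_RW_EXECUTE = 0x0, DR_RW_WRITE = 0x1, DR_RW_READWRITE = 0x3 };

/* LENn field encodings in DR7 (architectural values; note 8 bytes is 0x2). */
enum dr_len { DR_LEN_1 = 0x0, DR_LEN_2 = 0x1, DR_LEN_8 = 0x2, DR_LEN_4 = 0x3 };

/* Hypothetical helper: return dr7 with watchpoint slot 0..3 locally enabled
 * for the given access type and length. */
uint64_t dr7_enable_slot(uint64_t dr7, unsigned slot, enum dr_rw rw, enum dr_len len)
{
    assert(slot < 4);
    unsigned ctrl_shift = 16 + slot * 4;

    dr7 &= ~((uint64_t)0xF << ctrl_shift);                       /* clear old R/Wn and LENn */
    dr7 |= ((uint64_t)rw | ((uint64_t)len << 2)) << ctrl_shift;  /* set new R/Wn and LENn */
    dr7 |= (uint64_t)1 << (slot * 2);                            /* set local enable bit Ln */
    return dr7;
}

/* Hypothetical helper: return dr7 with the given slot fully disabled. */
uint64_t dr7_disable_slot(uint64_t dr7, unsigned slot)
{
    assert(slot < 4);
    dr7 &= ~((uint64_t)0x3 << (slot * 2));       /* clear Ln and Gn */
    dr7 &= ~((uint64_t)0xF << (16 + slot * 4));  /* clear R/Wn and LENn */
    return dr7;
}

/* Hypothetical helper: return the lowest slot whose hit bit (B0..B3) is set
 * in dr6, or -1 if none of the four address traps fired. */
int dr6_first_hit(uint64_t dr6)
{
    for (int slot = 0; slot < 4; slot++) {
        if (dr6 & ((uint64_t)1 << slot))
            return slot;
    }
    return -1;
}
```

On Linux x86, a tracer would typically put the watched address in one of DR0..DR3 and the computed control value in DR7 using ptrace(PTRACE_POKEUSER, ...) at offsetof(struct user, u_debugreg[n]). The traced thread then gets a SIGTRAP on each matching access, which is the trap traffic this bug is about.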
Created attachment 384630
normal boot on 2nd virtual machine

Here is a screenshot of the virtual machine booting normally. This was the very first virtual machine booted after a reboot of the host.
Created attachment 384631
screenshot of another boot showing sigtrap disruption

In this screenshot, another virtual machine is running the test program, and you can see the SIGTRAP abort being reported during this boot of the same VM that had no problem booting before the test program was started.
Created attachment 384633
sles10i virtual machine xml definition

The sles10i virtual machine is the one on which I built and ran the test program, so it is where the debug registers were modified and the address traps generated. My impression is that the specific virtual machines don't really make any difference, but I include this for a complete description of the problem.
Created attachment 384635
the 2nd virtual machine sles10x xml definition

This is the sles10x virtual machine which the screenshots were from. Booting this machine worked before I ran the test over on sles10i. It failed after I shut it down, started the test over on sles10i, and tried booting it again.
The KVM host is an 8 core Opteron with 8 GB of memory, smolt profile: http://www.smolts.org/client/show/pub_fde3037c-3bfd-4a28-9f73-601d559b6567
> I run a program on one virtual machine which generates gazillions of
> SIGTRAPs by using the DBn debug registers to do address traps.

I meant to say the DRn registers, not DBn.
More data (but I don't know what it means):

I've been trying to use the test program to trigger this bug on my home system (a 4 core, single chip Intel box), and I have not yet seen it fail.

The Intel system at work, however, has exhibited this symptom: a kernel build failed with the compiler aborting due to a SIGTRAP. I haven't explicitly run my test program there to see whether it triggers the failure on an Intel box.

The two systems I have now seen show this symptom are both dual socket motherboards with two 4 core chips (Opterons in one, Xeons in the other), so perhaps something about multiple CPU chips makes this more likely.

Here's the smolt profile for the dual CPU Intel box: http://www.smolts.org/client/show/pub_7cfd66c1-0166-4f15-92ca-739b774c9559 (It is running a somewhat old Fedora 11).
I have just installed Fedora 13 (and grabbed all updates) on the host machine described in the original bug. I then ran the exact same test, with the exact same test program and virtual machines, hosted on the new Fedora 13 host kernel. I am happy to say that after multiple reboots of the sles10x virtual machine while sles10i was running my test program, I did not see a single spurious SIGTRAP interfere with the sles10x virtual machine.

It will be a while before I have lots of results from running my testbeds continuously, but it certainly looks like this problem may be fixed in the latest KVM code.

Rpms currently on system for this test are:

qemu-kvm-0.12.3-8.fc13.x86_64
qemu-img-0.12.3-8.fc13.x86_64
qemu-system-x86-0.12.3-8.fc13.x86_64
qemu-common-0.12.3-8.fc13.x86_64
gpxe-roms-qemu-1.0.0-1.fc13.noarch
kernel-2.6.33.5-112.fc13.x86_64
Right, the fix for this went into the upstream kernel and should be available in an update for the F-12 kernel.