Bug 555788
Summary: | SIGTRAP leakage between separate virtual machines | |||
---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Tom Horsley <horsley1953> | |
Component: | kvm | Assignee: | Glauber Costa <gcosta> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | |
Severity: | high | Docs Contact: | ||
Priority: | low | |||
Version: | 12 | CC: | berrange, clalance, ehabkost, gcosta, jforbes, markmc, quintela, virt-maint | |
Hardware: | x86_64 | |||
OS: | Linux | |||
Fixed In Version: | Doc Type: | Bug Fix | ||
Clone Of: | ||||
: | 555889 | Environment: | |
Last Closed: | 2010-06-08 15:08:41 UTC | Type: | --- | |
Bug Depends On: | 546327 | |||
Bug Blocks: | 555889 | |||
Description
Tom Horsley
2010-01-15 14:39:47 UTC
Created attachment 384628
gzipped tar archive with test program to generate address traps
This program consists of a custom "debugger" (watcher) which debugs the
watchme program, using the debug registers to generate zillions of
address traps in multiple threads.
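
For readers unfamiliar with how a debugger arms an address trap, here is a minimal sketch of the DR7 control value that enables one hardware watchpoint on x86. It is not taken from the attached test program; the helper and constant names are hypothetical, and only the bit layout (enable bits in the low byte, R/W and LEN fields starting at bit 16) comes from the x86 architecture. On Linux a debugger would typically write the address into DR0..DR3 and this value into DR7 of the traced thread with `ptrace(PTRACE_POKEUSER, ...)` at the `u_debugreg` offsets of `struct user`.

```python
# Sketch only: compute the DR7 value for one x86 hardware watchpoint.
# Helper and constant names are hypothetical, not from the attachment.

# R/W field encodings (what kind of access triggers the trap)
RW_EXECUTE = 0b00
RW_WRITE = 0b01
RW_READ_WRITE = 0b11

# LEN field encodings, keyed by watched length in bytes
LEN_ENCODING = {1: 0b00, 2: 0b01, 8: 0b10, 4: 0b11}


def dr7_for_watchpoint(slot, rw, length, dr7=0):
    """Return dr7 with watchpoint `slot` (0..3) locally enabled."""
    if slot not in range(4):
        raise ValueError("x86 has four address slots, DR0..DR3")
    if length not in LEN_ENCODING:
        raise ValueError("length must be 1, 2, 4 or 8 bytes")

    # Each slot owns a 4-bit control field starting at bit 16:
    # two R/W bits followed by two LEN bits.
    ctrl_shift = 16 + 4 * slot
    dr7 &= ~(0xF << ctrl_shift)
    dr7 |= (rw | (LEN_ENCODING[length] << 2)) << ctrl_shift

    # Local-enable bit for slot n is bit 2*n.
    dr7 |= 1 << (2 * slot)
    return dr7


if __name__ == "__main__":
    # Trap on 4-byte writes to the address held in DR0.
    print(hex(dr7_for_watchpoint(0, RW_WRITE, 4)))
```

Passing the previous value back in through `dr7` lets several slots be armed at once, which is how a program watching multiple addresses would build up the register.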
Created attachment 384630
normal boot on 2nd virtual machine
Here is a screenshot of the virtual machine booting normally. This was the
very first virtual machine booted after a reboot of the host.
Created attachment 384631
screenshot of another boot showing sigtrap disruption
In this screenshot, another virtual machine is running the test program, and
you can see the SIGTRAP abort being reported during this boot of the same
VM that had no problem booting before the test program was started.
Created attachment 384633
sles10i virtual machine xml definition
The sles10i virtual machine was the one I built and ran the test program
on where the debug registers were modified and the address traps generated.
My impression is that the specific virtual machines don't really make any
difference, but I include this for a complete description of the problem.
Created attachment 384635
the 2nd virtual machine sles10x xml definition
This is the sles10x virtual machine that the screenshots came from. Booting
this machine worked before I ran the test on sles10i; it failed after
I shut it down, started the test on sles10i, and tried booting it again.
The kvm host is an 8 core Opteron with 8 GB of memory, smolt profile: http://www.smolts.org/client/show/pub_fde3037c-3bfd-4a28-9f73-601d559b6567

>I run a program on one virtual machine which generates gazillions of
>SIGTRAPs by using the DBn debug registers to do address traps.
I meant to say the DRn registers, not DBn.

More data (but I don't know what it means): I've been trying to use the test program to trigger this bug on my home system (a 4 core single chip Intel box), and I have not yet seen it fail. The Intel system at work, however, has exhibited this symptom: a kernel build failed with the compiler aborting due to a SIGTRAP. I haven't explicitly run my test program to see whether it triggers the failure on an Intel box. The two systems I have now seen show this symptom are both dual socket motherboards with two 4 core chips (Opterons in one, Xeons in the other), so perhaps something about multiple CPU chips makes this more likely. Here's the smolt profile for the dual CPU Intel box: http://www.smolts.org/client/show/pub_7cfd66c1-0166-4f15-92ca-739b774c9559 (It is running a somewhat old Fedora 11).

I have just installed Fedora 13 (and grabbed all updates) on the host machine described in the original bug. I then ran the exact same test with the exact same test program and virtual machines hosted on the new Fedora 13 host kernel, and I am happy to say that after multiple reboots of the sles10x virtual machine while sles10i was running my test program, I did not see a single spurious SIGTRAP interfere with the sles10x virtual machine. It will be a while before I have lots of results from running my testbeds continuously, but it certainly looks like this problem may be fixed in the latest KVM code.

Rpms currently on system for this test are:
qemu-kvm-0.12.3-8.fc13.x86_64
qemu-img-0.12.3-8.fc13.x86_64
qemu-system-x86-0.12.3-8.fc13.x86_64
qemu-common-0.12.3-8.fc13.x86_64
gpxe-roms-qemu-1.0.0-1.fc13.noarch
kernel-2.6.33.5-112.fc13.x86_64

Right, the fix for this went into the upstream kernel and should be available in an update for the F-12 kernel.