Description of problem:
Hard crash due to BUG: unable to handle kernel paging request

Version-Release number of selected component (if applicable):
3.6.9-2.fc17.x86_64

How reproducible:
unknown

Steps to Reproduce:
I am using nested kvm on an Intel(R) Core(TM) i7-2860QM CPU @ 2.50GHz Lenovo W520.
I added a config file under /etc/modprobe.d with
options kvm_intel nested=1

I have set up oVirt on it:
1- dns VM 1GB memory
2- oVirt management VM 4GB memory
3- iscsi target VM 1GB memory
4- oVirt hypervisor node VM 4GB memory 4 CPUs (copied host CPU information), topology: 1 socket, 4 CPUs, 4 threads
5- oVirt hypervisor node VM 4GB memory 4 CPUs (copied host CPU information), topology: 1 socket, 4 CPUs, 4 threads

I then set up a VM in oVirt for Fedora 17 x86_64 with a Spice session. I was installing this (nested) F17 VM (which was running on hypervisor 2; VM 5) when the crash occurred.
My machine has 16GB memory, so plenty of memory, and enough for all these VMs.

Actual results:
crash

Expected results:
no crash, duh ;-)

Additional info:
Will attach the backtrace photos in a moment, sorry for the bad (phone camera) quality.
Created attachment 660425 backtrace photo 2
Created attachment 660426 backtrace photo 1
Marcelo, any thoughts on this one?
To confirm: are both the hypervisor (level 1 guest) and the level 2 guest 64-bit?
yes
Hi Ferry, two questions: 1) Is the problem reproducible? Can you attempt to reproduce it, please? 2) Can you do an overnight memtest86 run to verify the memory? Thanks
Marcelo, sorry but I really have no time this week to try to reproduce it. I thought it was reasonably reproducible for the setup I described. Maybe next week, when I have some time in the evenings.
Ferry, can you at least run memtest86, please? (It's quite easy to do; no need to set up VMs or anything.) The error is an access to an invalid shadow page table pointer at 0xffff87ffffffffff. This is -1 with bits 43-46 cleared. Either the hardware or the software is corrupting memory.
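For anyone who wants to double-check the bit arithmetic in the comment above, here is a minimal sketch in Python. The helper name cleared_bits is made up for illustration and is not part of any kernel tooling; bits are numbered from 0 at the least significant end.

```python
# Sketch: verify that 0xffff87ffffffffff is -1 (all ones, 64-bit)
# with bits 43-46 cleared. Helper name is hypothetical.
MASK64 = (1 << 64) - 1          # 64-bit representation of -1
ADDR = 0xffff87ffffffffff       # faulting pointer from the backtrace


def cleared_bits(value, width=64):
    """Return the positions (0 = least significant) of all zero bits."""
    return [bit for bit in range(width) if not (value >> bit) & 1]


# Which bits differ from an all-ones pattern?
print(cleared_bits(ADDR))                    # [43, 44, 45, 46]

# The other direction: clear four bits starting at bit 43 in -1.
print(hex(MASK64 & ~(0xF << 43)))            # 0xffff87ffffffffff
```

Four adjacent bits flipped inside an otherwise all-ones word is the pattern being described here; the sketch only confirms the arithmetic and says nothing about what caused it.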
In fact it has to be memory corruption, because it's not possible for KVM to set up an spte pointer with the value 0xffff87ffffffffff. It could be a driver, or hardware. You can try the slub_debug=ZFPU kernel option to track down software-induced memory corruption.
Created attachment 694984 memtest overnight run
In reply to comment 8: I finally ran memtest overnight, see the attachment. No errors, so it's probably not a hardware problem.
OK, can you add the following to your kernel boot options: slub_debug=ZFPU, and attempt to reproduce? Hopefully that will catch the corruptor.
note: that slub_debug option will only take effect if you install the kernel-debug package. The regular kernel build doesn't have that enabled.
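As a rough aid for checking that the option actually reached the kernel, the sketch below pulls the slub_debug flags out of a command line string and spells out what each letter asks for. The function name and the sample command line are assumptions made for illustration; the flag meanings follow the kernel's SLUB documentation. On a running system the string would come from /proc/cmdline.

```python
# Sketch: extract slub_debug flags from a kernel command line string.
# Function name and sample string are hypothetical.
SLUB_FLAGS = {
    "Z": "red zoning",
    "F": "sanity checks",
    "P": "poisoning",
    "U": "user tracking (alloc/free owner)",
}


def slub_debug_flags(cmdline):
    """Return the flag letters given to slub_debug, "" for a bare
    slub_debug, or None if the option is absent."""
    for token in cmdline.split():
        if token == "slub_debug":
            return ""                      # bare option, kernel defaults
        if token.startswith("slub_debug="):
            value = token.split("=", 1)[1]
            return value.split(",", 1)[0]  # drop optional ",<slab>" part
    return None


sample = "ro quiet slub_debug=ZFPU"        # stand-in for /proc/cmdline
flags = slub_debug_flags(sample)
if flags is None:
    print("slub_debug not set")
else:
    for letter in flags:
        print(letter, "=", SLUB_FLAGS.get(letter, "unknown flag"))
```

This only confirms that the option is on the command line; whether the running kernel honours it still depends on how that kernel was built.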
Closing bug as insufficient data, as this is provably not a KVM bug but memory corruption (see comment #8), caused either by software or hardware. Please reopen the bug if output from a slub_debug-enabled kernel-debug kernel package is available for the crash.