Bug 885497 - kvm: spte pointer is 0xffff87fffffffffff in __direct_map
Summary: kvm: spte pointer is 0xffff87fffffffffff in __direct_map
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 17
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Marcelo Tosatti
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-12-09 21:00 UTC by Ferry Huberts
Modified: 2013-03-10 21:38 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-03-10 21:38:53 UTC
Type: Bug
Embargoed:


Attachments
backtrace photo 2 (1.03 MB, image/jpeg), 2012-12-09 21:01 UTC, Ferry Huberts
backtrace photo 1 (1.08 MB, image/jpeg), 2012-12-09 21:02 UTC, Ferry Huberts
memtest overnight run (112.45 KB, image/jpeg), 2013-02-08 08:21 UTC, Ferry Huberts

Description Ferry Huberts 2012-12-09 21:00:54 UTC
Description of problem:
hard crash due to BUG: unable to handle kernel paging request

Version-Release number of selected component (if applicable):
3.6.9-2.fc17.x86_64

How reproducible:
unknown

Steps to Reproduce:

I am using nested kvm on an Intel(R) Core(TM) i7-2860QM CPU @ 2.50GHz Lenovo W520.
I added a config file under /etc/modprobe.d with
  options kvm_intel nested=1
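
A quick way to confirm the option actually took effect after reloading kvm_intel is to read the parameter back from sysfs (cat /sys/module/kvm_intel/parameters/nested). The same check as a minimal C sketch, assuming the parameter is exported read-only under /sys/module as it is in stock kernels:

  /* Sketch: report whether the kvm_intel "nested" module parameter is set.
   * Assumes kvm_intel is loaded; the parameter reads back as Y/N (or 1/0). */
  #include <stdio.h>

  int main(void)
  {
      FILE *f = fopen("/sys/module/kvm_intel/parameters/nested", "r");
      if (!f) {
          perror("kvm_intel nested parameter not available");
          return 1;
      }
      int c = fgetc(f);
      fclose(f);
      printf("kvm_intel nested = %c\n", c);
      return (c == 'Y' || c == '1') ? 0 : 1;
  }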

I have set up oVirt on it:
1- dns VM
    1GB memory
2- oVirt management VM
    4GB memory
3- iscsi target VM
    1GB memory
4- oVirt hypervisor node VM
    4GB memory
    4 CPUs (copied host CPU information), topology: 1 socket, 4 CPUs, 4 threads
5- oVirt hypervisor node VM
    4GB memory
    4 CPUs (copied host CPU information), topology: 1 socket, 4 CPUs, 4 threads

I then set up a VM in oVirt for Fedora 17 x86_64 with a Spice session.
I was installing this (nested) F17 VM (which was running on hypervisor 2; VM 5)
when the crash occurred.

My machine has 16GB of memory, which is plenty for all of these VMs.
  
Actual results:
crash

Expected results:
no crash, duh ;-)

Additional info:
I will attach the backtrace photos in a moment; sorry for the bad (phone-camera)
quality.

Comment 1 Ferry Huberts 2012-12-09 21:01:41 UTC
Created attachment 660425 [details]
backtrace photo 2

Comment 2 Ferry Huberts 2012-12-09 21:02:34 UTC
Created attachment 660426 [details]
backtrace photo 1

Comment 3 Josh Boyer 2013-01-02 14:21:42 UTC
Marcelo, any thoughts on this one?

Comment 4 Marcelo Tosatti 2013-01-03 00:11:43 UTC
To confirm: the hypervisor (level 1 guest) and also the level 2 guest are 64-bit?

Comment 5 Ferry Huberts 2013-01-03 01:47:50 UTC
yes

Comment 6 Marcelo Tosatti 2013-01-11 22:01:50 UTC
Hi Ferry,

Two questions:

1) Is the problem reproducible? Can you attempt to reproduce it, please?

2) Can you do an overnight memtest86 run to verify the memory?

Thanks

Comment 7 Ferry Huberts 2013-01-15 07:02:18 UTC
Marcelo, sorry, but I really have no time this week to try to reproduce it.
I thought it was reasonably reproducible with the setup I described.
Maybe next week, when I have some time in the evenings.

Comment 8 Marcelo Tosatti 2013-01-16 14:12:43 UTC
Ferry, can you at least run memtest86, please? (It's quite easy to do; no need to set up VMs or anything.)

The error is an access to an invalid shadow page table pointer at 0xffff87ffffffffff, which is -1 with bits 43-46 cleared.

Either the hardware or the software is corrupting memory.
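
As a quick sanity check on that arithmetic, a small standalone C program (just a sketch of the bit math, nothing KVM-specific) reproduces the value:

  /* Sketch: show that -1 with bits 43-46 cleared is the pointer from the oops. */
  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
      uint64_t val = UINT64_MAX;            /* -1: all 64 bits set */
      val &= ~(0xFULL << 43);               /* clear bits 43-46 */
      printf("0x%llx\n", (unsigned long long)val);  /* prints 0xffff87ffffffffff */
      return 0;
  }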

Comment 9 Marcelo Tosatti 2013-01-17 20:08:23 UTC
In fact it has to be memory corruption, because it's not possible for KVM to set up an spte pointer of 0xffff87ffffffffff. It could be a driver, or hardware. You can try the

slub_debug=ZFPU

kernel option to track software-induced memory corruption.
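
For context, the Z in slub_debug=ZFPU turns on red zoning: guard bytes are placed around each slab object and checked when the object is freed or reallocated, so an out-of-bounds write gets detected and reported. A rough userspace analogue of that technique (a sketch only, not how SLUB itself is implemented) looks like this:

  /* Sketch: red-zoning in miniature.  Guard bytes follow the usable buffer
   * and are verified on free; an overflow disturbs them and gets reported. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  #define REDZONE_BYTE 0xbb
  #define REDZONE_LEN  8

  static unsigned char *guarded_alloc(size_t size)
  {
      unsigned char *p = malloc(size + REDZONE_LEN);
      if (p)
          memset(p + size, REDZONE_BYTE, REDZONE_LEN);   /* paint the red zone */
      return p;
  }

  static void guarded_free(unsigned char *p, size_t size)
  {
      for (size_t i = 0; i < REDZONE_LEN; i++)
          if (p[size + i] != REDZONE_BYTE)
              fprintf(stderr, "red zone clobbered at offset %zu\n", size + i);
      free(p);
  }

  int main(void)
  {
      unsigned char *buf = guarded_alloc(16);
      if (!buf)
          return 1;
      memset(buf, 0, 17);        /* deliberate one-byte overflow */
      guarded_free(buf, 16);     /* reports the clobbered guard byte */
      return 0;
  }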

Comment 10 Ferry Huberts 2013-02-08 08:21:16 UTC
Created attachment 694984 [details]
memtest overnight run

In reply to comment 8:
I finally ran memtest overnight; see the attachment.

No errors, so it's probably not a hardware problem.

Comment 11 Marcelo Tosatti 2013-02-11 20:53:32 UTC
OK, can you add the following to your kernel boot options

slub_debug=ZFPU

and attempt to reproduce?

Hopefully that will catch the corruptor.

Comment 12 Dave Jones 2013-02-12 16:23:58 UTC
Note: that slub_debug option will only take effect if you install the kernel-debug package; the regular kernel build doesn't have that support enabled.

Comment 13 Marcelo Tosatti 2013-03-10 21:38:53 UTC
Closing this bug as INSUFFICIENT_DATA, since this is provably not a KVM bug but memory corruption (see comment #8), caused either by software or by hardware.

Please reopen the bug if output from a slub_debug-enabled kernel-debug kernel is available for the crash.

