885497 – kvm: spte pointer is 0xffff87ffffffffff in __direct_map

Summary: kvm: spte pointer is 0xffff87ffffffffff in __direct_map

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	17
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Marcelo Tosatti
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-12-09 21:00 UTC by Ferry Huberts
Modified:	2013-03-10 21:38 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2013-03-10 21:38:53 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
backtrace photo 2 (1.03 MB, image/jpeg) 2012-12-09 21:01 UTC, Ferry Huberts
backtrace photo 1 (1.08 MB, image/jpeg) 2012-12-09 21:02 UTC, Ferry Huberts
memtest overnight run (112.45 KB, image/jpeg) 2013-02-08 08:21 UTC, Ferry Huberts

Description Ferry Huberts 2012-12-09 21:00:54 UTC

Description of problem:
hard crash due to BUG: unable to handle kernel paging request

Version-Release number of selected component (if applicable):
3.6.9-2.fc17.x86_64

How reproducible:
unknown

Steps to Reproduce:

I am using nested kvm on an Intel(R) Core(TM) i7-2860QM CPU @ 2.50GHz Lenovo W520.
I added a config file under /etc/modprobe.d with
  options kvm_intel nested=1
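
Whether the option took effect can be checked once the module has been reloaded. A minimal Python sketch, assuming the parameter is exposed at /sys/module/kvm_intel/parameters/nested and reported as Y/N or 1/0; the helper names are hypothetical:

```python
from pathlib import Path

# Assumed sysfs location of the kvm_intel "nested" module parameter.
NESTED_PARAM = Path("/sys/module/kvm_intel/parameters/nested")


def parse_nested_flag(text):
    """Interpret the parameter file contents (Y/N or 1/0)."""
    return text.strip().upper() in ("Y", "1")


def nested_enabled(path=NESTED_PARAM):
    """Return True/False, or None if the parameter file cannot be read
    (for example because kvm_intel is not loaded)."""
    try:
        return parse_nested_flag(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("nested virtualization enabled:", nested_enabled())
```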

I have set up oVirt on it:
1- dns VM
    1GB memory
2- oVirt management VM
    4GB memory
3- iscsi target VM
    1GB memory
4- oVirt hypervisor node VM
    4GB memory
    4 CPUs (copied host CPU information), topology: 1 socket, 4 CPUs, 4 threads
5- oVirt hypervisor node VM
    4GB memory
    4 CPUs (copied host CPU information), topology: 1 socket, 4 CPUs, 4 threads

I then set up a VM in oVirt for Fedora 17 x86_64 with a Spice session.
I was installing this (nested) F17 VM (which was running on hypervisor 2; VM 5)
when the crash occurred.

My machine has 16GB memory, so plenty of memory, and enough for all these VMs.
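
For reference, the allocations listed above can be tallied as follows. This is only a rough Python sketch: it assumes the nested F17 guest lives inside the 4GB of VM 5 and ignores per-VM overhead on the host.

```python
# Memory assigned to each first-level VM, in GB, as listed above.
vm_memory_gb = {
    "dns": 1,
    "oVirt management": 4,
    "iscsi target": 1,
    "oVirt hypervisor node (VM 4)": 4,
    "oVirt hypervisor node (VM 5)": 4,
}
host_memory_gb = 16

allocated_gb = sum(vm_memory_gb.values())
headroom_gb = host_memory_gb - allocated_gb
print(f"allocated {allocated_gb} GB, host headroom {headroom_gb} GB")
```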
  
Actual results:
crash

Expected results:
no crash, duh ;-)

Additional info:
Will attach the backtrace photos in a moment, sorry for the bad (phone camera)
quality.

Comment 1 Ferry Huberts 2012-12-09 21:01:41 UTC

Created attachment 660425
backtrace photo 2

Comment 2 Ferry Huberts 2012-12-09 21:02:34 UTC

Created attachment 660426
backtrace photo 1

Comment 3 Josh Boyer 2013-01-02 14:21:42 UTC

Marcelo, any thoughts on this one?

Comment 4 Marcelo Tosatti 2013-01-03 00:11:43 UTC

To confirm, the hypervisor (level 1 guest) and also level 2 guest are 64 bit?

Comment 5 Ferry Huberts 2013-01-03 01:47:50 UTC

yes

Comment 6 Marcelo Tosatti 2013-01-11 22:01:50 UTC

Hi Ferry,

Two questions:

1) Is the problem reproducible? Can you attempt to reproduce it, please?

2) Can you have an overnight memtest86 run to verify memory?

Thanks

Comment 7 Ferry Huberts 2013-01-15 07:02:18 UTC

Marcelo, sorry, but I really have no time this week to try to reproduce it.
I thought it was reasonably reproducible for the setup I described.
Maybe next week, when I have some time in the evenings.

Comment 8 Marcelo Tosatti 2013-01-16 14:12:43 UTC

Ferry, can you at least run memtest86, please? (It's quite easy to do; no need to set up VMs or anything.)

The error is an access to an invalid shadow page table pointer at 0xffff87ffffffffff. This is -1 with bits 43-46 cleared.

Either hardware or software is corrupting memory.
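
The bit pattern can be checked with a few lines of Python. This is just an illustrative sketch of the arithmetic; the helper name is made up:

```python
MASK64 = (1 << 64) - 1  # keep values within 64 bits


def clear_bits(value, low, high):
    """Return value with bits low..high (inclusive) cleared."""
    width = high - low + 1
    field = ((1 << width) - 1) << low
    return value & ~field & MASK64


all_ones = -1 & MASK64                       # -1 as an unsigned 64-bit value
spte_pointer = clear_bits(all_ones, 43, 46)  # clear bits 43-46
print(hex(spte_pointer))                     # prints 0xffff87ffffffffff
```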

Comment 9 Marcelo Tosatti 2013-01-17 20:08:23 UTC

In fact it has to be memory corruption, because it's not possible for KVM to set up an spte pointer with 0xffff87ffffffffff. It could be a driver, or hardware. You can try the

slub_debug=ZFPU 

kernel option to track software induced memory corruption.
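
As I understand the kernel's SLUB documentation, the letters select sanity checks (F), red zoning (Z), poisoning (P) and user tracking (U). A small Python sketch for decoding the option from a kernel command line; the helper names and the sample command line are hypothetical, and on a running system the contents of /proc/cmdline would be passed in instead:

```python
# Flag letters, per the kernel's SLUB documentation (only the ones used here).
SLUB_DEBUG_FLAGS = {
    "F": "sanity checks",
    "Z": "red zoning",
    "P": "poisoning",
    "U": "user tracking (allocation/free owner)",
}


def slub_debug_flags(cmdline):
    """Return the slub_debug flag letters found on a kernel command line.

    Returns None if slub_debug is absent; a bare "slub_debug" yields "".
    """
    for token in cmdline.split():
        if token == "slub_debug":
            return ""
        if token.startswith("slub_debug="):
            # Anything after a comma restricts the flags to named caches.
            return token.split("=", 1)[1].split(",", 1)[0]
    return None


def describe(flags):
    """Map each flag letter to a short description."""
    return [SLUB_DEBUG_FLAGS.get(letter, "unknown flag") for letter in flags]


if __name__ == "__main__":
    sample = "ro quiet slub_debug=ZFPU"  # hypothetical command line
    flags = slub_debug_flags(sample)
    print("slub_debug not set" if flags is None else describe(flags))
```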

Comment 10 Ferry Huberts 2013-02-08 08:21:16 UTC

Created attachment 694984
memtest overnight run

In reply to comment 8:
I finally ran memtest overnight, see the attachment.

No errors, so it's probably not a hardware problem.

Comment 11 Marcelo Tosatti 2013-02-11 20:53:32 UTC

OK, can you add the following to your kernel boot options

slub_debug=ZFPU

And attempt to reproduce?

Hopefully that will catch the corruptor.

Comment 12 Dave Jones 2013-02-12 16:23:58 UTC

Note: that slub_debug option will only take effect if you install the kernel-debug package. The regular kernel build doesn't have that enabled.

Comment 13 Marcelo Tosatti 2013-03-10 21:38:53 UTC

Closing the bug as insufficient data: this is provably not a KVM bug but memory corruption (see comment #8), caused by either software or hardware.

Please reopen the bug if output for the crash becomes available from a kernel-debug kernel booted with slub_debug enabled.
