Bug 1008401 - win 7 guests BSOD: 0x50: PAGE_FAULT_IN_NONPAGED_AREA
Summary: win 7 guests BSOD: 0x50: PAGE_FAULT_IN_NONPAGED_AREA
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
: ---
Assignee: Vadim Rozenfeld
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-09-16 10:10 UTC by Chao Yang
Modified: 2015-01-08 07:12 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-01-08 07:12:18 UTC
Target Upstream Version:



Description Chao Yang 2013-09-16 10:10:57 UTC
Description of problem:
I have many VMs running on my Dell R415 system, which is managed by rhevm. One of them, a Windows 7 x86_64 guest, was running CPU_burn-in. I hit a BSOD when connecting to the guest with remote-viewer; I noticed it happened while I was trying to adjust the screen.
Windows finished writing the memory dump, but I didn't find it under C:\Windows. I am sure I had set up memory dumps correctly, because I managed to trigger a BSOD before hitting this one.

Version-Release number of selected component (if applicable):
2.6.32-414.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.398.el6.x86_64
qxl driver version in use:
Driver Date: 10/15/2012
Driver Version: 6.1.0.10016

How reproducible:
1/1

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Chao Yang 2013-09-16 10:15:52 UTC
# free -m
             total       used       free     shared    buffers     cached
Mem:         32077      31860        217          0        138      15007
-/+ buffers/cache:      16714      15363
Swap:        16111       1134      14977

qemu-kvm cli:

/usr/libexec/qemu-kvm -name win7_64_amd-2 -S -M rhel6.5.0 -cpu Opteron_G4 -enable-kvm -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid ca5ee619-caf0-4dbc-b08a-47211176f9e1 -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=6Server-6.5.0.0.el6,serial=4C4C4544-0039-3610-8047-B9C04F463358,uuid=ca5ee619-caf0-4dbc-b08a-47211176f9e1 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/win7_64_amd-2.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2013-09-05T14:36:33,driftfix=slew -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/run/vdsm/payload/ca5ee619-caf0-4dbc-b08a-47211176f9e1.207b9d224b399cb56d8484c9b007a63c.img,if=none,id=drive-fdc0-0-0,readonly=on,format=raw,serial= -global isa-fdc.driveA=drive-fdc0-0-0 -drive file=/rhev/data-center/mnt/hp-z800-03.qe.lab.eng.nay.redhat.com:_var_lib_exports_iso/00f85ffd-5e9e-4e2b-8bc7-a903586aac8b/images/11111111-1111-1111-1111-111111111111/RHEV-toolsSetup_3.2_13.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,serial= -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/rhev/data-center/654da72f-84a4-4471-b50c-fe7868b0830f/8078a8df-ceca-4af7-8354-41220fc90680/images/ca4c0aec-d97f-48de-89a0-41d7e9b01f8a/0bb4605b-246f-432e-9cd2-10491983d682,if=none,id=drive-virtio-disk0,format=qcow2,serial=ca4c0aec-d97f-48de-89a0-41d7e9b01f8a,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=33,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=00:1a:4a:42:48:c4,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/win7_64_amd-2.com.redhat.rhevm.vdsm,server,nowait -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/win7_64_amd-2.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -spice port=5906,tls-port=5907,addr=0,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -k en-us -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7

Comment 3 Vadim Rozenfeld 2013-09-16 23:36:33 UTC
Unfortunately, we cannot move forward without a crash dump file.
By default, when a crash occurs, a minidump is created under %SystemRoot%\Minidump (for example, C:\Windows\Minidump). If your system is configured to generate a full dump, a MEMORY.DMP file will be created under %SystemRoot%\

Vadim.
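
For anyone hunting for the dump files mentioned above, here is a minimal Python sketch of the lookup. It is not part of the original thread: the function name and parameters are hypothetical, and the "Minidump" directory name is an assumption based on the Windows default, so adjust it if the guest is configured differently.

```python
from pathlib import Path


def find_crash_dumps(system_root, minidump_dir="Minidump"):
    """Return crash dump files found in the usual locations under system_root.

    system_root  -- the guest's %SystemRoot%, e.g. r"C:\\Windows"
    minidump_dir -- assumed default name of the minidump directory
    """
    root = Path(system_root)
    found = []

    # A kernel or full dump is written as a single MEMORY.DMP file.
    full_dump = root / "MEMORY.DMP"
    if full_dump.is_file():
        found.append(full_dump)

    # Minidumps are written one file per crash into their own directory.
    mini_root = root / minidump_dir
    if mini_root.is_dir():
        found.extend(sorted(mini_root.glob("*.dmp")))

    return found


if __name__ == "__main__":
    for dump in find_crash_dumps(r"C:\Windows"):
        print(dump)
```

If the list comes back empty after a BSOD, the dump settings or the page file size in the guest are the first things worth re-checking.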

Comment 4 Ronen Hod 2013-09-24 13:23:01 UTC
BTW, since this test's objective is to heat the CPU, can you also get the temperatures of the CPUs (# sensors)? It might even be a hardware failure (not that I think so).

Comment 5 Thomas Manninger 2014-01-23 12:27:50 UTC
I also have a problem with a Windows 7 machine that does not use the qemu cpu. This machine freezes every day with a PAGE_FAULT_IN_NONPAGED_AREA bluescreen.

/usr/bin/kvm -S -M pc-1.1 -enable-kvm -m 5940 -smp 2,sockets=2,cores=1,threads=1 -name s-vwin06 -uuid 2d7abd1f-3840-d82a-cdce-33960adcf8f3 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/s-vwin06.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/vms/s-vwin06,if=none,id=drive-ide0-0-0,format=raw,cache=none -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=2 -netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:8f:07:d8,bus=pci.0,addr=0x3 -device usb-tablet,id=input0 -vnc 0.0.0.0:5 -k de -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4

root@srv:~# /usr/bin/kvm -version
QEMU emulator version 1.1.2 (qemu-kvm-1.1.2+dfsg-6.28.201307262155, Debian), Copyright (c) 2003-2008 Fabrice Bellard

minidump is attached.

Comment 6 Thomas Manninger 2014-01-23 12:47:25 UTC
(In reply to Thomas Manninger from comment #5)
> I also have a problem with a Windows 7 machine that does not use the qemu cpu.
> This machine freezes every day with a PAGE_FAULT_IN_NONPAGED_AREA bluescreen.
> 
> /usr/bin/kvm -S -M pc-1.1 -enable-kvm -m 5940 -smp
> 2,sockets=2,cores=1,threads=1 -name s-vwin06 -uuid
> 2d7abd1f-3840-d82a-cdce-33960adcf8f3 -nodefconfig -nodefaults -chardev
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/s-vwin06.monitor,server,
> nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime
> -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
> file=/vms/s-vwin06,if=none,id=drive-ide0-0-0,format=raw,cache=none -device
> ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=2 -netdev
> tap,fd=20,id=hostnet0,vhost=on,vhostfd=26 -device
> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:8f:07:d8,bus=pci.0,
> addr=0x3 -device usb-tablet,id=input0 -vnc 0.0.0.0:5 -k de -vga cirrus
> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
> 
> root@srv:~# /usr/bin/kvm -version
> QEMU emulator version 1.1.2 (qemu-kvm-1.1.2+dfsg-6.28.201307262155, Debian),
> Copyright (c) 2003-2008 Fabrice Bellard
> 
> minidump is attached.

Sorry, the right kvm command is:
/usr/bin/kvm -S -M pc-0.14 -cpu host -enable-kvm -m 21197 -smp 7,sockets=1,cores=4,threads=2 -name win7machine -uuid cda792cc-7f72-38b9-5395-62b96ef22dbb -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/win7machine.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/vms/win7machine,if=none,id=drive-ide0-0-0,format=raw,cache=none -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=2 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=23 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:3e:15:2a:ef,bus=pci.0,addr=0x3 -device usb-tablet,id=input0 -vnc 0.0.0.0:0 -k de -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4

minidumps:
http://ftp.siedl.net/Transfer/MEMORY%20(2).DMP.tar.gz
http://ftp.siedl.net/Transfer/MEMORY.DMP.tar.gz

Comment 15 Chao Yang 2014-09-17 01:57:08 UTC
Hi Jun,

Please provide above info. Thanks.

Comment 16 Jun Li 2014-09-22 09:43:26 UTC
(In reply to Chao Yang from comment #15)
> Hi Jun,
> 
> Please provide above info. Thanks.

Retest this bz:
Version of some components:
Guest:
windows-7-x86_64
Host and qemu-kvm:
# rpm -qa|grep qemu-kvm && uname -r
qemu-kvm-rhev-tools-0.12.1.2-2.445.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.445.el6.x86_64
2.6.32-502.el6.x86_64


Steps:
1, boot a windows 7 x86_64 guest with following cli:
# /usr/libexec/qemu-kvm -name win7_64_amd-2 -S -M rhel6.5.0 -cpu Opteron_G4 -enable-kvm -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid ca5ee619-caf0-4dbc-b08a-47211176f9e1 -smbios type=1,manufacturer=Red_Hat,product=RHEV_Hypervisor,version=6Server-6.5.0.0.el6,serial=4C4C4544-0039-3610-8047-B9C04F463358,uuid=ca5ee619-caf0-4dbc-b08a-47211176f9e1 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/win7_64_amd-2.monitor,server,nowait -mon chardev=charmonitor,id=monitor1,mode=control -rtc base=2013-09-05T14:36:33,driftfix=slew -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/run/vdsm/payload/ca5ee619-caf0-4dbc-b08a-47211176f9e1.207b9d224b399cb56d8484c9b007a63c.img,if=none,id=drive-fdc0-0-0,readonly=off,format=raw,serial= -global isa-fdc.driveA=drive-fdc0-0-0 -drive file=/rhev/data-center/mnt/hp-z800-03.qe.lab.eng.nay.redhat.com:_var_lib_exports_iso/00f85ffd-5e9e-4e2b-8bc7-a903586aac8b/images/11111111-1111-1111-1111-111111111111/RHEV-toolsSetup_3.2_13.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=off,format=raw,serial= -device virtio-blk-pci,bus=pci.0,drive=drive-ide0-1-0,id=ide0-1-0,physical_block_size=4096,logical_block_size=4096 -drive file=/home/win7-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=ca4c0aec-d97f-48de-89a0-41d7e9b01f8a,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,script=/etc/qemu-ifup,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=00:1a:4a:42:48:c4,bus=pci.0,addr=0x8 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/win7_64_amd-2.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev 
socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/win7_64_amd-2.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -spice port=5906,disable-ticketing -k en-us -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -monitor stdio -boot menu=on

2, run 8 CPU_burn-in inside guest for 72 hours.


Results:
After step 2, the guest, the host, and qemu-kvm all work well. No BSOD.

Comment 17 Vadim Rozenfeld 2014-09-22 11:37:54 UTC
(In reply to Jun Li from comment #16)
> 2, run 8 CPU_burn-in inside guest for 72 hours.
> 
> 
> Results:
> After step 2, the guest, the host, and qemu-kvm all work well. No BSOD.

Thanks,
Can we close it?

Best regards,
Vadim.

Comment 18 Chao Yang 2014-09-23 02:18:41 UTC
Jun, 

Please also provide the version of the virtio-win driver used in your test, so that we can state which exact packages it works on and then close this bug.

Comment 19 Jun Li 2014-09-24 05:14:26 UTC
(In reply to Chao Yang from comment #18)
> Jun, 
> 
> Please also provide the version of the virtio-win driver used in your test,
> so that we can state which exact packages it works on and then close this bug.

Version of components:
virtio-win-prewhql-0.1-71

Comment 20 paul.leveille 2014-10-09 20:34:33 UTC
I'm not sure how Thomas determined his attached Windows crashes were the same as Chao's since Chao did not find a core dump, but I have seen 5 additional cases over the past several months of a crash matching exactly with Thomas' samples.

Here is what I know:
- running Centos-6.5 host
- seen only on w2k8-r2 and win7 (64-bit) guests
- have occurred unpredictably (one time only on each of these 5 systems)
- all 5 cases (7 if you include Thomas') have occurred on IvyBridge systems
- has not been seen on SandyBridge (we have many such systems running similar loads)

I'm now actively trying to reproduce this problem and will update you on any progress I make.

The top-level signature is a crash where Windows reports a page-fault while trying to fetch an instruction from the middle of a page it was very recently executing from. A call occurs out of this page and when it returns (ret) the fetch fails with a page fault. See sample below.

PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced.  This cannot be protected by try-except,
it must be protected by a Probe.  Typically the address is just plain bad or it
is pointing at freed memory.
Arguments:
Arg1: fffff9600023c3e4, memory referenced.
Arg2: 0000000000000008, value 0 = read operation, 1 = write operation.
Arg3: fffff9600023c3e4, If non-zero, the instruction address which referenced
the bad memory address.
Arg4: 0000000000000007, (reserved)
…
win32k!GreAcquireSemaphore:
fffff960`0023c3c8 4885c9          test    rcx,rcx
fffff960`0023c3cb 741c            je      win32k!GreAcquireSemaphore+0x21 (fffff960`0023c3e9)
fffff960`0023c3cd 53              push    rbx
fffff960`0023c3ce 4883ec20        sub     rsp,20h
fffff960`0023c3d2 488bd9          mov     rbx,rcx
fffff960`0023c3d5 ff15a5c01100    call    qword ptr [win32k!_imp_PsEnterPriorityRegion (fffff960`00358480)]
fffff960`0023c3db 488bcb          mov     rcx,rbx
fffff960`0023c3de ff1554bd1100    call    qword ptr [win32k!_imp_ExEnterCriticalRegionAndAcquireResourceExclusive (fffff960`00358138)]
fffff960`0023c3e4 4883c420        add     rsp,20h   <<<<<<<<<<<<<< could not execute this
fffff960`0023c3e8 5b              pop     rbx
fffff960`0023c3e9 c3              ret
…
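
The claim that the fault hits the return address of the second call can be checked with a little address arithmetic. The Python sketch below is an illustration only (the helper names are made up); it uses the addresses and instruction bytes from the listing above and assumes the standard x86-64 encoding in which FF 15 is followed by a 32-bit RIP-relative displacement.

```python
FAULT_ADDRESS = 0xFFFFF9600023C3E4  # Arg1 and Arg3 of the bugcheck


def next_instruction(address, encoding_hex):
    """Address of the instruction that follows the one at `address`."""
    return address + len(bytes.fromhex(encoding_hex))


def rip_relative_target(address, encoding_hex):
    """Memory operand of `call qword ptr [rip+disp32]` (opcode FF 15)."""
    raw = bytes.fromhex(encoding_hex)
    if raw[:2] != b"\xff\x15":
        raise ValueError("not an FF 15 indirect call")
    # The displacement is a signed little-endian value relative to the
    # address of the next instruction.
    disp = int.from_bytes(raw[2:6], "little", signed=True)
    return next_instruction(address, encoding_hex) + disp


# Second call in win32k!GreAcquireSemaphore, taken from the disassembly.
call_addr = 0xFFFFF9600023C3DE
call_bytes = "ff1554bd1100"

return_addr = next_instruction(call_addr, call_bytes)
import_slot = rip_relative_target(call_addr, call_bytes)

print(hex(return_addr))                # address the callee returns to
print(hex(import_slot))                # import table slot the call reads
print(return_addr == FAULT_ADDRESS)    # True when the fault is on the return
```

The return address comes out equal to both Arg1 (memory referenced) and Arg3 (faulting instruction), which is what you would expect when the failing access is the instruction fetch itself rather than a data read or write.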
2: kd> !pte fffff960`0023c3e4
                                           VA fffff9600023c3e4
PXE at FFFFF6FB7DBEDF90    PPE at FFFFF6FB7DBF2C00    PDE at FFFFF6FB7E580008    PTE at FFFFF6FCB00011E0
contains 0000000110932863  contains 0000000110983863  contains 0000000110781863  contains 02400001111CE021
pfn 110932    ---DA--KWEV  pfn 110983    ---DA--KWEV  pfn 110781    ---DA--KWEV  pfn 1111ce    ----A--KREV
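
To read the raw PTE value without the debugger, the Python sketch below decodes the architectural x86-64 page-table bits. It is an illustration, not something from the original comment: the function name is hypothetical, and the bits Windows uses for its own bookkeeping are ignored.

```python
def decode_pte(value):
    """Decode the architectural bits of a 64-bit x86-64 page-table entry."""
    return {
        "valid": bool(value & (1 << 0)),        # P: entry is present
        "writable": bool(value & (1 << 1)),     # R/W
        "user": bool(value & (1 << 2)),         # U/S: clear means kernel-only
        "accessed": bool(value & (1 << 5)),     # A
        "dirty": bool(value & (1 << 6)),        # D
        "no_execute": bool(value & (1 << 63)),  # NX
        # Physical frame number lives in bits 12..51.
        "pfn": (value >> 12) & ((1 << 40) - 1),
    }


pte = decode_pte(0x02400001111CE021)  # the PTE value from the !pte output
print(hex(pte["pfn"]), pte)
```

For the value above this gives a valid, accessed, read-only, kernel-only entry with NX clear and a frame number matching the `pfn 1111ce` that !pte prints. In other words, the mapping looks present and executable at the time of the dump, which fits the description of a page fault that should not have happened.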

Comment 22 paul.leveille 2014-11-17 22:15:56 UTC
We've been unable to reproduce this problem after a few weeks of IvyBridge-based testing, which is not too surprising given the spotty nature of the observed crashes. During this period we also noticed some fairly important looking microcode updates from Intel on this processor type. As a precaution, and with some hope, those microcode updates have been added to a customer site where this problem was seen 15 times in a 3-4 month period. After 30+ days we have seen no cases (with microcode in place) and will continue to monitor, expecting 60+ days before we have reasonable confidence the microcode is working better.

The microcode updates, which bring the processor up to version 0x428 with errata fixes for CA1, CA22, CA29, CA30, CA38, CA48, CA90, CA94, CA95, CA104, CA119, CA122, CA129, CA130, CA135, CA137, CA143, are described here:

http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-v2-spec-update.pdf
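
A quick way to check whether a host is already at the revision mentioned above is to read the microcode field from /proc/cpuinfo. The Python sketch below is an assumption-laden illustration: it presumes the host kernel exposes a `microcode` line per CPU (not every kernel does), the function names are hypothetical, and 0x428 is simply the revision named in this comment.

```python
import re

MINIMUM_REVISION = 0x428  # revision named in the comment above


def microcode_revisions(cpuinfo_text):
    """Collect the distinct microcode revisions in /proc/cpuinfo-style text."""
    revisions = set()
    pattern = r"^microcode\s*:\s*(\S+)"
    for match in re.finditer(pattern, cpuinfo_text, re.MULTILINE):
        # Base 0 accepts both "0x428" and plain decimal values.
        revisions.add(int(match.group(1), 0))
    return revisions


def outdated(revisions, minimum=MINIMUM_REVISION):
    """Return the revisions that are older than `minimum`, sorted."""
    return sorted(r for r in revisions if r < minimum)


if __name__ == "__main__":
    with open("/proc/cpuinfo") as handle:
        found = microcode_revisions(handle.read())
    print("revisions:", [hex(r) for r in sorted(found)])
    print("older than 0x428:", [hex(r) for r in outdated(found)])
```

Comparing revisions numerically only makes sense between CPUs of the same model and stepping, so treat the result as a hint rather than proof that a host is affected.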

Comment 23 paul.leveille 2015-01-07 16:43:57 UTC
No new cases of this 'bogus page-fault' have occurred since the newer Ivy-Bridge microcode was put into place. We now believe this has resolved the problem.

Comment 24 Vadim Rozenfeld 2015-01-07 20:18:17 UTC
(In reply to paul.leveille from comment #23)
> No new cases of this 'bogus page-fault' have occurred since the newer
> Ivy-Bridge microcode was put into place. We now believe this has resolved
> the problem.

Thank you for keeping us updated, Paul.

Chao, are you okay with closing this bug?

Cheers,
Vadim.

Comment 25 Chao Yang 2015-01-08 05:42:22 UTC
(In reply to Vadim Rozenfeld from comment #24)
> (In reply to paul.leveille from comment #23)
> > No new cases of this 'bogus page-fault' have occurred since the newer
> > Ivy-Bridge microcode was put into place. We now believe this has resolved
> > the problem.
> 
> Thank you for keeping us updated, Paul.
> 
> Chao, are you okay with closing this bug?
> 
Yes, please go ahead.

> Cheers,
> Vadim.

