Bug 1019666

Summary: Windows 2012 R2 BSOD while installing Intel 82599 driver
Product: Red Hat Enterprise Linux 6
Reporter: Chao Yang <chayang>
Component: qemu-kvm
Assignee: Alex Williamson <alex.williamson>
Status: CLOSED CURRENTRELEASE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 6.5
CC: acathrow, alex.williamson, bcao, bdas, bsarathy, chayang, dfleytma, flang, hhuang, juzhang, knoel, michen, mkenneth, qzhang, rhod, virt-bugs, virt-maint, vrozenfe, yvugenfi
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-05-13 21:29:06 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Chao Yang 2013-10-16 08:33:51 UTC
Description of problem:
While testing device assignment, I assigned both PFs of an Intel dual-port 82599 adapter to a Windows 2012 R2 guest. The guest hit a BSOD when I tried to install the driver downloaded from Intel's website.

Version-Release number of selected component (if applicable):
2.6.32-423.el6.x86_64
qemu-kvm-0.12.1.2-2.412.el6.x86_64
virtio-win-prewhql-0.1-72

How reproducible:
100%

Additional info:
CLI:
/usr/libexec/qemu-kvm -name win8_64_amd-1 -M rhel6.5.0 -cpu host -enable-kvm -m 4096 -realtime mlock=off \
    -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
    -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
    -drive file=/home/en_windows_server_2012_r2_datacenter_preview_x64_dvd_2358570.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,serial= \
    -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 \
    -drive file=/home/win2012r2.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=native \
    -device virtio-blk-pci,scsi=off,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
    -netdev tap,id=hostnet0,vhost=on \
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:42:48:cd,bus=pci.0 \
    -spice port=5900,disable-ticketing -k en-us -vga qxl \
    -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0 \
    -monitor stdio -boot menu=on \
    -device pci-assign,host=05:00.0,id=PF-1 \
    -device pci-assign,host=05:00.1,id=PF-2



Microsoft (R) Windows Debugger Version 6.2.9200.20512 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [C:\Windows\MEMORY.DMP]
Kernel Bitmap Dump File: Full address space is available

Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols
Executable search path is: 
Windows 8 Kernel Version 9431 UP Free x64
Product: Server, suite: TerminalServer DataCenter SingleUserTS
Built by: 9431.0.amd64fre.winmain_bluemp.130615-1214
Machine Name:
Kernel base = 0xfffff801`cfc77000 PsLoadedModuleList = 0xfffff801`cff41990
Debug session time: Wed Oct 16 03:34:11.205 2013 (UTC - 7:00)
System Uptime: 0 days 0:08:10.963
Loading Kernel Symbols
...............................................................
................................................................
.
Loading User Symbols
.............................................
Loading unloaded module list
......
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 1A, {1233, 54445, 1, 0}

*** ERROR: Module load completed but symbols could not be loaded for iqvw64e.sys
*** ERROR: Module load completed but symbols could not be loaded for NcsColib.dll
Probably caused by : iqvw64e.sys ( iqvw64e+29fa )

Followup: MachineOwner
---------

kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

MEMORY_MANAGEMENT (1a)
    # Any other values for parameter 1 must be individually examined.
Arguments:
Arg1: 0000000000001233, The subtype of the bugcheck.
Arg2: 0000000000054445
Arg3: 0000000000000001
Arg4: 0000000000000000

Debugging Details:
------------------


BUGCHECK_STR:  0x1a_1233

DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT

PROCESS_NAME:  ncs2prov.exe

CURRENT_IRQL:  0

LAST_CONTROL_TRANSFER:  from fffff801cfddbf5e to fffff801cfdc7da0

STACK_TEXT:  
ffffd000`21a597e8 fffff801`cfddbf5e : 00000000`0000001a 00000000`00001233 00000000`00054445 00000000`00000001 : nt!KeBugCheckEx
ffffd000`21a597f0 fffff801`cfc9bc6c : 00000000`00000000 00000000`00000000 e0000127`76908957 00000000`c86a8004 : nt! ?? ::FNODOBFM::`string'+0x391e
ffffd000`21a598f0 fffff800`02a529fa : 00000000`00000000 00000000`80862007 00000000`00cfcd10 00000000`00000001 : nt!MmMapIoSpace+0xc
ffffd000`21a59920 fffff800`02a518cb : 00000000`00000001 ffffd000`21a59cc0 ffffe000`00eeb240 00000000`00cfa1f0 : iqvw64e+0x29fa
ffffd000`21a59950 fffff800`02a511a7 : 00000000`00000003 ffffe000`012776c0 00000000`00000001 ffffe000`00eeb240 : iqvw64e+0x18cb
ffffd000`21a59990 fffff801`d0050bb3 : 00000000`00000001 ffffd000`21a59cc0 00000000`00000001 00000000`00000001 : iqvw64e+0x11a7
ffffd000`21a599c0 fffff801`d0051daa : 00000000`00000001 00000000`00000000 00000000`00000000 00000000`00000000 : nt!IopXxxControlFile+0x8c3
ffffd000`21a59b60 fffff801`cfdd36b3 : 00000000`00000001 00000000`00cfa708 fffff801`cfc5b900 ffffd000`21a59cc0 : nt!NtDeviceIoControlFile+0x56
ffffd000`21a59bd0 00007ffc`34a4b12a : 00007ffc`32192f83 00000000`80862007 00800103`00000000 00000000`9d000895 : nt!KiSystemServiceCopyEnd+0x13
00000000`00cfa0d8 00007ffc`32192f83 : 00000000`80862007 00800103`00000000 00000000`9d000895 00000000`00000895 : ntdll!NtDeviceIoControlFile+0xa
00000000`00cfa0e0 00007ffc`33fe14f0 : 00000000`80862007 00000000`9d000895 00000000`00cfa3b8 00000000`00000000 : KERNELBASE!DeviceIoControl+0x73
00000000`00cfa150 00000000`012620fa : 00000000`00000108 00000000`00000008 00000000`00cfc9e0 00000000`00cfa878 : KERNEL32!DeviceIoControlImplementation+0x74
00000000`00cfa1a0 00000000`00000108 : 00000000`00000008 00000000`00cfc9e0 00000000`00cfa878 00000000`00000000 : NcsColib+0xa20fa
00000000`00cfa1a8 00000000`00000008 : 00000000`00cfc9e0 00000000`00cfa878 00000000`00000000 00000000`00000000 : 0x108
00000000`00cfa1b0 00000000`00cfc9e0 : 00000000`00cfa878 00000000`00000000 00000000`00000000 00000000`00cfa428 : 0x8
00000000`00cfa1b8 00000000`00cfa878 : 00000000`00000000 00000000`00000000 00000000`00cfa428 00000000`00000000 : 0xcfc9e0
00000000`00cfa1c0 00000000`00000000 : 00000000`00000000 00000000`00cfa428 00000000`00000000 00000000`00000028 : 0xcfa878


STACK_COMMAND:  kb

FOLLOWUP_IP: 
iqvw64e+29fa
fffff800`02a529fa 488d0d8f260000  lea     rcx,[iqvw64e+0x5090 (fffff800`02a55090)]

SYMBOL_STACK_INDEX:  3

SYMBOL_NAME:  iqvw64e+29fa

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: iqvw64e

IMAGE_NAME:  iqvw64e.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  508659cf

FAILURE_BUCKET_ID:  0x1a_1233_iqvw64e+29fa

BUCKET_ID:  0x1a_1233_iqvw64e+29fa

Followup: MachineOwner
---------

Comment 1 Chao Yang 2013-10-16 08:41:58 UTC
A similar bug, Bug 947791, was fixed in virtio-win-prewhql-0.1-68, but this issue also reproduces with -68.

When VFs of an Intel dual-port 82576 are assigned instead, with virtio-win-prewhql-72, this issue does not reproduce.

Comment 9 langfang 2013-10-25 11:13:24 UTC
Tested this bug on both the latest build and the GA version, and hit the same problem (BSOD) on both.

Scenario 1) RHEL 6.5

Host:
kernel-2.6.32-425.el6.x86_64.rpm
qemu-kvm-0.12.1.2-2.414.el6.x86_64.rpm

Guest: win2012r2

Scenario 2) RHEL 6.4 GA

Host:
2.6.32-358.el6.x86_64
qemu-kvm-0.12.1.2-2.355.el6.x86_64

Guest: win2012r2


So this bug is not a regression.

Comment 10 Ronen Hod 2013-10-25 23:49:31 UTC
Since this concerns device assignment, is not really a regression, and we are out of time, we will have to defer it.
I also removed the blocker.
Yan, do you see anything in the dump that can give Alex a hint?

Comment 11 Yvugenfi@redhat.com 2013-10-27 15:56:14 UTC
I cannot download the dump file. Can you please compress it and re-upload?

Thanks.

Comment 12 Chao Yang 2013-10-28 01:23:41 UTC
(In reply to Yan Vugenfirer from comment #11)
> I cannot download the dump file. Can you please compress it and re-upload?
> 
Please retry.

> Thanks.

Comment 15 Yvugenfi@redhat.com 2013-10-29 16:50:14 UTC
The crash happens when the driver tries to map an I/O range with MmMapIoSpace:

http://msdn.microsoft.com/en-us/library/windows/hardware/ff554618(v=vs.85).aspx

According to the parameters of the BSOD:
MEMORY_MANAGEMENT (1a) ( http://msdn.microsoft.com/en-us/library/windows/hardware/ff557391(v=vs.85).aspx )

Arg1: 0000000000001233: A driver tried to map a physical memory page that was not locked. This is illegal because the contents or attributes of the page can change at any time. This is a bug in the code that made the mapping call. Parameter 2 is the page frame number of the physical page that the driver attempted to map.

Arg2: 0000000000054445 - the PFN (page frame number)


Looking at the PFN:

kd> !pfn 0000000000054445
    PFN 00054445 at address FFFFFA8000FCCCF0
    flink       00000000  blink / share count 00000000  pteaddress 00000000
    reference count 0000    used entry count  0000      Cached    color 0   Priority 0
    restore pte 00000000  containing page        FFFFFFFFE  Free               
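Arg2 is a page frame number rather than an address. A minimal sketch, assuming the standard 4 KiB x64 page size (so the physical address is the PFN shifted left by 12 bits), recovers the start of the physical page the driver tried to map:

```python
PAGE_SHIFT = 12  # assumption: standard 4 KiB pages on x64


def pfn_to_phys(pfn: int) -> int:
    """Return the physical address of the first byte of page frame `pfn`."""
    return pfn << PAGE_SHIFT


# Arg2 from "BugCheck 1A, {1233, 54445, 1, 0}"
addr = pfn_to_phys(0x54445)
print(hex(addr))  # 0x54445000
```

The !pfn output above lists this page as Free in the PFN database, i.e. an ordinary RAM page tracked by the memory manager and not locked by anyone, which is consistent with the 0x1233 subtype.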



In case it helps, below is a WinDbg trace showing the resource allocations for the assigned Intel devices:

kd> !pcitree
Bus 0x0 (FDO Ext ffffe00000535ae0)
  (d=0,  f=0) 80861237 devext 0xffffe0000052a9d0 devstack 0xffffe0000052a880 0600 Bridge/HOST to PCI
  (d=1,  f=0) 80867000 devext 0xffffe000005291b0 devstack 0xffffe00000529060 0601 Bridge/PCI to ISA
  (d=1,  f=1) 80867010 devext 0xffffe000005299d0 devstack 0xffffe00000529880 0101 Mass Storage Controller/IDE
  (d=1,  f=2) 80867020 devext 0xffffe000005181b0 devstack 0xffffe00000518060 0c03 Serial Bus Controller/USB
  (d=2,  f=0) 1b360100 devext 0xffffe0000056d1b0 devstack 0xffffe0000056d060 0300 Display Controller/VGA
>> Red Hat virtio devices:
  (d=3,  f=0) 1af41001 devext 0xffffe0000056d9d0 devstack 0xffffe0000056d880 0100 Mass Storage Controller/SCSI
  (d=4,  f=0) 1af41000 devext 0xffffe0000056c1b0 devstack 0xffffe0000056c060 0200 Network Controller/Ethernet
  (d=5,  f=0) 1af41003 devext 0xffffe0000056c9d0 devstack 0xffffe0000056c880 0780 Simple Serial Communications Controller/'Other'
  (d=6,  f=0) 1af41002 devext 0xffffe0000056b1b0 devstack 0xffffe0000056b060 0500 Memory Controller/RAM
>> Intel 82599 PFs (device ID 10FB):
  (d=7,  f=0) 808610fb devext 0xffffe0000056b9d0 devstack 0xffffe0000056b880 0200 Network Controller/Ethernet
  (d=8,  f=0) 808610fb devext 0xffffe0000056a1b0 devstack 0xffffe0000056a060 0200 Network Controller/Ethernet
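In the !pcitree listing, the 8-digit hex value after the device/function numbers packs the PCI vendor ID (upper 16 bits) and device ID (lower 16 bits). A minimal decoder, shown only as a reading aid for the listing:

```python
def split_pci_id(combined: str) -> tuple:
    """Split a combined !pcitree ID such as '808610fb' into (vendor, device)."""
    value = int(combined, 16)
    vendor = value >> 16        # upper 16 bits: vendor ID
    device = value & 0xFFFF     # lower 16 bits: device ID
    return vendor, device


for entry in ["80861237", "1af41001", "808610fb"]:
    v, d = split_pci_id(entry)
    print(f"{v:04x}:{d:04x}")
# 8086:1237
# 1af4:1001
# 8086:10fb
```

The last entry matches the "Vendor ID 8086 (INTEL)  Device ID 10FB" reported by !devext for both Intel functions.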


The two Intel devices (these are the assigned 82599 PFs, not VFs):

kd> !devext 0xffffe0000056b9d0 pci
PDO Extension, Bus 0x0, Device 7, Function 0.
  DevObj 0xffffe0000056b880  Parent FDO DevExt 0xffffe00000535ae0
  Device State = PciStarted
  Vendor ID 8086 (INTEL)  Device ID 10FB
  Subsystem Vendor ID 8086 (INTEL)  Subsystem ID 7A11
  Header Type 0, Class Base/Sub 02/00  (Network Controller/Ethernet)
  Programming Interface: 00, Revision: 01, IntPin: 01, RawLine 00
  Possible Decodes ((cmd & 7) = 7): BMI
  Capabilities: Ptr=e0, power msi msix express 
  Express capabilities: (BIOS controlled) 
  Logical Device Power State: D0
  Device Wake Level:          Unspecified
  WaitWakeIrp:                <none>
  Requirements:     Alignment Length    Minimum          Maximum
    BAR0    Mem:    00080000  00080000  0000000000000000 00000000ffffffff
    BAR2     Io:    00000020  00000020  0000000000000000 00000000ffffffff
    BAR4    Mem:    00004000  00004000  0000000000000000 00000000ffffffff
      ROM BAR:      00080000  00080000  0000000000000000 00000000ffffffff
    VF BAR0 Mem:    00080000  00080000  0000000000000000 00000000ffffffff
  Resources:        Start            Length
    BAR0    Mem:    00000000f4080000 00080000
    BAR4    Mem:    00000000f4100000 00004000
  Interrupt Requirement:
    Line Based - Min Vector = 0x0, Max Vector = 0xffffffff
    Message Based: Type - Msi-X, 0x40 messages requested
  Interrupt Resource:    Type - MSI-X, 0x13 Messages Granted


kd> !devext 0xffffe0000056a1b0  pci
PDO Extension, Bus 0x0, Device 8, Function 0.
  DevObj 0xffffe0000056a060  Parent FDO DevExt 0xffffe00000535ae0
  Device State = PciStarted
  Vendor ID 8086 (INTEL)  Device ID 10FB
  Subsystem Vendor ID 8086 (INTEL)  Subsystem ID 7A11
  Header Type 0, Class Base/Sub 02/00  (Network Controller/Ethernet)
  Programming Interface: 00, Revision: 01, IntPin: 02, RawLine 00
  Possible Decodes ((cmd & 7) = 7): BMI
  Capabilities: Ptr=e0, power msi msix express 
  Express capabilities: (BIOS controlled) 
  Logical Device Power State: D0
  Device Wake Level:          Unspecified
  WaitWakeIrp:                <none>
  Requirements:     Alignment Length    Minimum          Maximum
    BAR0    Mem:    00080000  00080000  0000000000000000 00000000ffffffff
    BAR2     Io:    00000020  00000020  0000000000000000 00000000ffffffff
    BAR4    Mem:    00004000  00004000  0000000000000000 00000000ffffffff
      ROM BAR:      00080000  00080000  0000000000000000 00000000ffffffff
    VF BAR0 Mem:    00080000  00080000  0000000000000000 00000000ffffffff
  Resources:        Start            Length
    BAR0    Mem:    00000000f4200000 00080000
    BAR4    Mem:    00000000f4280000 00004000
  Interrupt Requirement:
    Line Based - Min Vector = 0x0, Max Vector = 0xffffffff
    Message Based: Type - Msi-X, 0x40 messages requested
  Interrupt Resource:    Type - MSI-X, 0x13 Messages Granted
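A short sketch that checks whether the faulting page falls inside any of the MMIO windows reported above. The (start, length) pairs are copied from the two Resources sections; the dictionary labels are made up for illustration, and the 4 KiB page size is an assumption:

```python
from typing import List

# (start, length) pairs copied from the !devext "Resources" sections.
BARS = {
    "dev7 BAR0": (0xF4080000, 0x80000),
    "dev7 BAR4": (0xF4100000, 0x4000),
    "dev8 BAR0": (0xF4200000, 0x80000),
    "dev8 BAR4": (0xF4280000, 0x4000),
}


def bars_containing(addr: int) -> List[str]:
    """Return the labels of all BAR windows [start, start + length) holding addr."""
    return [name for name, (start, length) in BARS.items()
            if start <= addr < start + length]


fault_addr = 0x54445 << 12  # Arg2 (PFN) converted to a physical address
print(bars_containing(fault_addr))   # []
print(bars_containing(0xF4100010))   # ['dev7 BAR4']
```

The empty result shows that the page the driver tried to map is not part of either function's BAR0 or BAR4 space. The same listings also show each function requesting 0x40 (64) MSI-X messages and being granted 0x13 (19).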

Comment 17 Alex Williamson 2014-05-12 21:39:45 UTC
(In reply to Yan Vugenfirer from comment #15)
> The crash is when the driver is trying to map IO range with MmMapIoSpace
> 
> http://msdn.microsoft.com/en-us/library/windows/hardware/ff554618(v=vs.85).aspx
> 
> According to the parameters of the BSOD:
> MEMORY_MANAGEMENT (1a) ( http://msdn.microsoft.com/en-us/library/windows/hardware/ff557391(v=vs.85).aspx )
> 
> Arg1: 0000000000001233: A driver tried to map a physical memory page that
> was not locked. This is illegal because the contents or attributes of the
> page can change at any time. This is a bug in the code that made the mapping
> call. Parameter 2 is the page frame number of the physical page that the
> driver attempted to map.

This sounds like a driver bug. Intel has told us in the past that they don't support assignment of PFs that support SR-IOV. Does the BSOD go away if only function 0 of the PF is assigned, or if both functions are assigned with guest function numbers matching the host function numbers? For example:

-device pci-assign,host=05:00.0,multifunction=on,addr=6.0,id=PF-1 \
-device pci-assign,host=05:00.1,addr=6.1,id=PF-2
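Both layouts suggested above can be generated with a small helper. This is a hypothetical sketch (the function name and the PF-N id scheme are made up); it places each host function at the same function number in the chosen guest slot and sets multifunction=on on function 0 only when more than one function shares the slot:

```python
from typing import List


def pci_assign_args(host_bdfs: List[str], guest_slot: int) -> List[str]:
    """Return qemu-kvm -device arguments that put each host function at the
    same function number in guest slot `guest_slot` (hypothetical helper)."""
    args: List[str] = []
    for i, bdf in enumerate(host_bdfs):
        func = int(bdf.rsplit(".", 1)[1], 16)  # "05:00.1" -> 1
        opts = [f"pci-assign,host={bdf}"]
        # Function 0 needs multifunction=on so the guest scans the other functions.
        if func == 0 and len(host_bdfs) > 1:
            opts.append("multifunction=on")
        opts.append(f"addr={guest_slot:x}.{func:x}")
        opts.append(f"id=PF-{i + 1}")
        args += ["-device", ",".join(opts)]
    return args


# Both functions, guest function numbers matching the host:
print(" ".join(pci_assign_args(["05:00.0", "05:00.1"], 6)))
# Only function 0:
print(" ".join(pci_assign_args(["05:00.0"], 6)))
```

The first call reproduces the two -device options shown above; the second gives the single-function variant.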

Comment 18 Chao Yang 2014-05-13 09:59:15 UTC
I am not able to reproduce this bug on an Intel system with an Intel Corporation 82599ES 10-Gigabit adapter, using the latest qemu-kvm, kernel, and Windows driver for the 82599. The only difference is that the guest Device Manager shows the adapters as "Intel(R) Ethernet Server Adapter X520-2" and "Intel(R) Ethernet Server Adapter X520-2 #2".

Packages tested:
qemu-kvm-0.12.1.2-2.425.el6.x86_64
2.6.32-464.el6.x86_64

Driver version:
Operating Systems: Windows Server 2012 R2*
Date: 2014/04/10
Version: 19.1 

CLI:
# /usr/libexec/qemu-kvm -M rhel6.5.0 -cpu host -enable-kvm -m 4096 -realtime mlock=off \
    -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -nodefaults \
    -drive file=en_windows_server_2012_r2_x64_dvd_2707946.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,serial= \
    -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 \
    -drive file=win2012r2.qcow2,if=none,id=drive-ide-disk0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=native \
    -device ide-drive,bus=ide.0,unit=0,drive=drive-ide-disk0,id=ide-disk0,bootindex=1 \
    -netdev tap,id=hostnet0 \
    -device e1000,netdev=hostnet0,id=net0,mac=00:1a:4a:42:48:cd,bus=pci.0 \
    -k en-us -vga cirrus -vnc :1 -monitor stdio -boot menu=on \
    -device pci-assign,host=05:00.0,id=pf-1 \
    -device pci-assign,host=05:00.1,id=pf-2

Comment 19 Alex Williamson 2014-05-13 21:29:06 UTC
Marking this closed, then; the fix may have come from the Intel driver. X520 is Intel's newer marketing name for the 82599.