Bug 612460 - [WHQL] [vhost:on]W2k8-32 guest hang during virtio-win NDISTest6.5(MPE) testing
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: virtio-win
Version: 6.0
Hardware: All Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Yan Vugenfirer
QA Contact: Virtualization Bugs
Depends On:
Blocks:
Reported: 2010-07-08 05:45 EDT by Qunfang Zhang
Modified: 2013-01-09 17:50 EST
CC List: 13 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-11-11 11:31:00 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
dmesg of host when win2k8 guest hang (79.13 KB, text/plain)
2010-07-08 05:45 EDT, Qunfang Zhang

Description Qunfang Zhang 2010-07-08 05:45:16 EDT
Created attachment 430284
dmesg of host when win2k8 guest hang

Description of problem:
When I run the WHQL virtio-nic NDISTest6.5 (MPE) test, the guest hangs at the "Start NDISTest client" job and consumes 100% CPU.

Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.90.el6.x86_64
virtio-win-1.1.7-2
2.6.32-37.el6.x86_64
Tried two different seabios versions:
seabios-0.5.1-0.5.20100108git669c991.el6.x86_64
seabios-0.5.1-2.el6

How reproducible:
100%

Steps to Reproduce:
1. Boot the win2k8-32 guest with the command line:
 /usr/libexec/qemu-kvm -m 6G -smp 4 -cpu qemu64,+x2apic -usbdevice tablet -drive file=win2k8-32-nic1.qcow2,if=none,id=drive-ide0-0-0,werror=stop,rerror=stop,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup-private -device virtio-net-pci,netdev=hostnet0,mac=00:1a:08:09:02:01,id=229-nic1-1,bus=pci.0,addr=0x4 -netdev tap,id=hostnet1,vhost=on,script=/etc/qemu-ifup-private -device virtio-net-pci,netdev=hostnet1,mac=00:1a:08:09:02:02,id=229-nic1-2,bus=pci.0,addr=0x5 -netdev tap,id=hostnet2,script=/etc/qemu-ifup -device e1000,netdev=hostnet2,mac=00:1a:08:09:04:01,id=229-nic1-3,bus=pci.0,addr=0x6 -boot c -uuid a4f39443-bdc8-4171-a8df-ba981fa58643 -rtc-td-hack -no-kvm-pit-reinjection -monitor stdio -name win2k8x86NIC1-229 -spice port=5930,disable-ticketing -vga qxl

2. Run the virtio-win NDISTest6.5 (MPE) test.
  
Actual results:
Guest hangs at the "Start NDISTest client" job.

Expected results:
Test passed.

Additional info:
dmesg of host will be attached.

#kvm_stat

kvm statistics

 efer_reload                  0       0
 exits                276231562    4985
 fpu_reload            35596534    1748
 halt_exits             4187565     315
 halt_wakeup            3581909     316
 host_state_reload     35712126    1774
 hypercalls                   0       0
 insn_emulation       148743765    1283
 insn_emulation_fail          0       0
 invlpg                       0       0
 io_exits              32028828    1174
 irq_exits             88847512    2119
 irq_injections        85466480     516
 irq_window                   0       0
 largepages                5626       0
 mmio_exits              392044      19
 mmu_cache_miss             973       0
 mmu_flooded                  0       0
 mmu_pde_zapped               0       0
 mmu_pte_updated              0       0
 mmu_pte_write                0       0
 mmu_recycled                 0       0
 mmu_shadow_zapped         1097       0
 mmu_unsync                   0       0
 nmi_injections               6       0
 nmi_window                   0       0
 pf_fixed                 87247       0
 pf_guest                     0       0
 remote_tlb_flush           582       0
 request_irq                  0       0
 signal_exits                10       0
 tlb_flush                    0       0
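
A minimal sketch for reading the kvm_stat table above, assuming the second column is the cumulative count and the third is the change over the most recent sampling interval (the usual kvm_stat layout, but the header row is not shown here). The function names `parse_kvm_stat` and `top_by_rate` are hypothetical helpers; the sample rows are copied from the output above.

```python
def parse_kvm_stat(text):
    """Parse '<name> <total> <rate>' rows from kvm_stat text into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only rows shaped like: name, integer, integer.
        if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
            counters[parts[0]] = (int(parts[1]), int(parts[2]))
    return counters


def top_by_rate(counters, n=3):
    """Return the n counter names with the highest last-interval value."""
    ranked = sorted(counters, key=lambda name: counters[name][1], reverse=True)
    return ranked[:n]


# Rows copied from the kvm_stat output in this report.
sample = """
 exits                276231562    4985
 irq_exits             88847512    2119
 host_state_reload     35712126    1774
 fpu_reload            35596534    1748
 insn_emulation       148743765    1283
 io_exits              32028828    1174
"""

stats = parse_kvm_stat(sample)
print(top_by_rate(stats))  # prints ['exits', 'irq_exits', 'host_state_reload']
```

Ranking by the last-interval column rather than the total shows what the guest is doing while it is hung, not what it did during boot.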
Comment 1 Qunfang Zhang 2010-07-08 05:48:22 EDT
Hi, Yan, Vadim and Michael

I am not sure which component this bug belongs to, so I filed it against qemu-kvm. Please correct it if I am wrong.

If any additional information is needed, please ping me or add a comment.
Comment 3 Michael S. Tsirkin 2010-07-08 05:57:32 EDT
As I see it there are two issues here:

1. host error seen in dmesg. I think this is same as bug 602607.
Please test with this host kernel to verify:
https://brewweb.devel.redhat.com/taskinfo?taskID=2574750
(fixes host error when guest ring is corrupted)
and report. We have bz 602607 to track that.

2. some bug which corrupts the ring. Possibly virtio win.
Assigning to that component for examination.
Comment 4 Qunfang Zhang 2010-07-08 23:00:50 EDT
(In reply to comment #3)
> As I see it there are two issues here:
> 
> 1. host error seen in dmesg. I think this is same as bug 602607.
> Please test with this host kernel to verify:
> https://brewweb.devel.redhat.com/taskinfo?taskID=2574750
> (fixes host error when guest ring is corrupted)
> and report. We have bz 602607 to track that.

Re-tested with this kernel. During the test the guest no longer hung, but it hit a BSOD with error code 0x7e. Memory dump files:

http://10.66.65.120/mem-dump/MEMORY-win2k8-32-mstkernel.DMP
http://10.66.65.120/mem-dump/Mini070910-01win2k8-32-mstkernel.dmp


> 
> 2. some bug which corrupts the ring. Possibly virtio win.
> Assigning to that component for examination.
Comment 5 Yan Vugenfirer 2010-07-12 16:58:18 EDT
(In reply to comment #3)

> 2. some bug which corrupts the ring. Possibly virtio win.
> Assigning to that component for examination.    

Ring management hasn't changed for ages. I suggest testing without vhost first.
Comment 6 Dor Laor 2010-07-12 17:23:14 EDT
Maybe it's the published used one?
Comment 7 Michael S. Tsirkin 2010-07-12 18:51:35 EDT
we have it in userspace too.
You can try disabling with:
-global virtio-net-pci.publish_used=off
Comment 8 Qunfang Zhang 2010-07-14 01:13:55 EDT
(In reply to comment #7)
> we have it in userspace too.
> You can try disabling with:
> -global virtio-net-pci.publish_used=off    

Tested with -global virtio-net-pci.publish_used=off: the guest no longer hangs, but it gets the same BSOD as in Comment 4.
Comment 9 Dor Laor 2010-07-18 05:10:18 EDT
Yan asks to test it w/o vhost. Can you please do it for isolating the issue?
Comment 10 Qunfang Zhang 2010-07-19 01:47:36 EDT
(In reply to comment #9)
> Yan asks to test it w/o vhost. Can you please do it for isolating the issue?    

The test is running now, will update result later.
Comment 11 Qunfang Zhang 2010-07-20 02:45:35 EDT
Tested without vhost=on: the client guest got a BSOD with error code 0x7e.
A screenshot will be attached.
However, win2k8-32 cannot produce a dump file when using 6G of memory. Should I re-test with less memory to get the dump file?
Comment 12 Yan Vugenfirer 2010-07-20 03:02:46 EDT
Please use the following registry settings:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl]
"AlwaysKeepMemoryDump"=dword:00000001

reboot after applying.

Also please keep MPE memory dumps, if this is a dump that falls under MS errata - we might need to send it to MS for review in order to pass a test.
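
A minimal sketch, assuming the setting is applied by importing a .reg file inside the guest: the snippet below only renders the file text for the key and value quoted in this comment. `render_reg` is a hypothetical helper, not part of any Windows tooling; the first line is the standard .reg file header.

```python
def render_reg(key, dword_values):
    """Render .reg file text for one registry key with DWORD values."""
    lines = ["Windows Registry Editor Version 5.00", "", "[%s]" % key]
    for name, number in dword_values.items():
        # DWORD values are written as eight lowercase hex digits.
        lines.append('"%s"=dword:%08x' % (name, number))
    # .reg files conventionally use CRLF line endings.
    return "\r\n".join(lines) + "\r\n"


KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl"
reg_text = render_reg(KEY, {"AlwaysKeepMemoryDump": 1})
print(reg_text)
```

The resulting text can be saved as a .reg file and imported in the guest; a reboot is still needed afterwards, as noted in this comment.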
Comment 13 Qunfang Zhang 2010-07-20 03:06:24 EDT
(In reply to comment #12)
> Please use the following registry settings:
> [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl]
> "AlwaysKeepMemoryDump"=dword:00000001
> 
> reboot after applying.
> 
> Also please keep MPE memory dumps, if this is a dump that falls under MS errata
> - we might need to send it to MS for review in order to pass a test.    

OK, I will update the bz after I get the result.
Comment 14 Qunfang Zhang 2010-07-20 06:47:17 EDT
(In reply to comment #13)
> (In reply to comment #12)
> > Please use the following registry settings:
> > [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl]
> > "AlwaysKeepMemoryDump"=dword:00000001
> > 
> > reboot after applying.
> > 
> > Also please keep MPE memory dumps, if this is a dump that falls under MS errata
> > - we might need to send it to MS for review in order to pass a test.    
> 
> OK, will update bz after get result.    



Microsoft (R) Windows Debugger Version 6.10.0003.233 X86
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [C:\Users\DTMLLUAdminUser\Desktop\MEMORY-#612460-without-vhost.DMP]
Kernel Summary Dump File: Only kernel address space is available

Symbol search path is: SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols
Executable search path is: 
Windows Server 2008/Windows Vista SP1 Kernel Version 6001 (Service Pack 1) MP (4 procs) Free x86 compatible
Product: Server, suite: TerminalServer DataCenter SingleUserTS
Built by: 6001.18427.x86fre.vistasp1_gdr.100218-0019
Machine Name:
Kernel base = 0x8163b000 PsLoadedModuleList = 0x81752c70
Debug session time: Tue Jul 20 15:12:52.403 2010 (GMT-7)
System Uptime: 0 days 0:51:33.966
Loading Kernel Symbols
...............................................................
........................................................
Loading User Symbols

Loading unloaded module list
.......
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 7E, {80000003, 98e337b3, 94d60c1c, 94d60918}

*** ERROR: Module load completed but symbols could not be loaded for ndprot61.sys
Probably caused by : ndprot61.sys ( ndprot61+2d7b3 )

Followup: MachineOwner
---------

0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common bugcheck.  Usually the exception address pinpoints
the driver/function that caused the problem.  Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: 80000003, The exception code that was not handled
Arg2: 98e337b3, The address that the exception occurred at
Arg3: 94d60c1c, Exception Record Address
Arg4: 94d60918, Context Record Address

Debugging Details:
------------------


EXCEPTION_CODE: (HRESULT) 0x80000003 (2147483651) - One or more arguments are invalid

FAULTING_IP: 
ndprot61+2d7b3
98e337b3 cc              int     3

EXCEPTION_RECORD:  94d60c1c -- (.exr 0xffffffff94d60c1c)
ExceptionAddress: 98e337b3 (ndprot61+0x0002d7b3)
   ExceptionCode: 80000003 (Break instruction exception)
  ExceptionFlags: 00000000
NumberParameters: 3
   Parameter[0]: 00000000
   Parameter[1]: 940b1d78
   Parameter[2]: 00000023

CONTEXT:  94d60918 -- (.cxr 0xffffffff94d60918)
eax=00000001 ebx=00000000 ecx=816f91de edx=00000023 esi=940b1d78 edi=00000000
eip=98e337b3 esp=94d60ce4 ebp=94d60cec iopl=0         nv up ei ng nz na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00000286
ndprot61+0x2d7b3:
98e337b3 cc              int     3
Resetting default scope

DEFAULT_BUCKET_ID:  INTEL_CPU_MICROCODE_ZERO

BUGCHECK_STR:  0x7E

PROCESS_NAME:  System

CURRENT_IRQL:  0

ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION}  Breakpoint  A breakpoint has been reached.

EXCEPTION_PARAMETER1:  00000000

EXCEPTION_PARAMETER2:  940b1d78

EXCEPTION_PARAMETER3:  00000023

LAST_CONTROL_TRANSFER:  from 98e99232 to 98e337b3

STACK_TEXT:  
WARNING: Stack unwind information not available. Following frames may be wrong.
94d60cec 98e99232 00000000 9861d624 0000000c ndprot61+0x2d7b3
94d60d10 98e996a6 9863e700 0000000f 00000064 ndprot61+0x93232
94d60d68 98eb47f0 00000000 9861d624 9861d640 ndprot61+0x936a6
94d60d7c 81810b54 9861d640 010c6c87 00000000 ndprot61+0xae7f0
94d60dc0 81669a5e 98eb47b0 9861d640 00000000 nt!PspSystemThreadStartup+0x9d
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16


FOLLOWUP_IP: 
ndprot61+2d7b3
98e337b3 cc              int     3

SYMBOL_STACK_INDEX:  0

SYMBOL_NAME:  ndprot61+2d7b3

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: ndprot61

IMAGE_NAME:  ndprot61.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  4b08150e

STACK_COMMAND:  .cxr 0xffffffff94d60918 ; kb

FAILURE_BUCKET_ID:  0x7E_VRF_ndprot61+2d7b3

BUCKET_ID:  0x7E_VRF_ndprot61+2d7b3

Followup: MachineOwner
---------
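
A minimal sketch for cross-checking the debugger output above. It parses the `BugCheck` summary line and derives the load address of ndprot61.sys from the faulting address and its module offset. The helper names `parse_bugcheck` and `module_base` are hypothetical; all addresses and offsets are copied from the analysis above.

```python
import re


def parse_bugcheck(line):
    """Split 'BugCheck XX, {a, b, c, d}' into (code, [args]) as integers."""
    match = re.match(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", line)
    if match is None:
        raise ValueError("not a BugCheck line")
    code = int(match.group(1), 16)
    args = [int(arg.strip(), 16) for arg in match.group(2).split(",")]
    return code, args


def module_base(address, offset):
    """Module load address implied by an absolute address and its offset."""
    return address - offset


code, args = parse_bugcheck(
    "BugCheck 7E, {80000003, 98e337b3, 94d60c1c, 94d60918}"
)
# Arg2 is the faulting address, reported as ndprot61+2d7b3.
base = module_base(args[1], 0x2D7B3)
print(hex(code), hex(args[0]), hex(base))  # prints 0x7e 0x80000003 0x98e06000

# Return addresses and offsets from STACK_TEXT should imply the same base.
frames = [(0x98E99232, 0x93232), (0x98E996A6, 0x936A6), (0x98EB47F0, 0xAE7F0)]
print(all(module_base(addr, off) == base for addr, off in frames))  # prints True
```

Arg1 (0x80000003) is the break instruction exception shown in the exception record, and the faulting instruction is `int 3` inside ndprot61.sys, so the bugcheck comes from a breakpoint in that module rather than from an access violation.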
Comment 15 Qunfang Zhang 2010-07-20 06:53:38 EDT
Memory dump file:
http://10.66.65.120/mem-dump/MEMORY-%23612460-without-vhost.DMP
Comment 26 Qunfang Zhang 2010-07-22 05:35:46 EDT
Summarize my test results here:

Test with "-global  virtio-net-pci.publish_used=off" for both vhost=on and off.
1. vhost=on
Guest met a BSOD (error code is 0x7e)
http://10.66.65.120/mem-dump/MEMORY-2k8-32-6.5MPE-vhostON-usedoff-612460.DMP

2. vhost=off
For the first time, the test passed.
For the second time, got BSOD (error code is 0x7e)
http://10.66.65.120/mem-dump/MEMORY-2k8-32-6.5MPE-vhostoff-usedoff-612460-7E.DMP
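
A minimal sketch of the test matrix discussed in this thread, assuming each run differs only in whether `vhost=on` is present on the tap netdev and whether `-global virtio-net-pci.publish_used=off` is added. `nic_args` is a hypothetical helper, and the fragments omit the script, MAC and PCI address options of the full command line in the description. Per the thread, the publish_used property exists only in some qemu-kvm builds.

```python
from itertools import product


def nic_args(vhost, publish_used, netdev_id="hostnet0"):
    """Build the qemu-kvm argument fragment for one virtio-net NIC."""
    netdev = "tap,id=%s" % netdev_id
    if vhost:
        netdev += ",vhost=on"
    args = ["-netdev", netdev,
            "-device", "virtio-net-pci,netdev=%s" % netdev_id]
    if not publish_used:
        # Option quoted from Comment 7; only valid where the property exists.
        args += ["-global", "virtio-net-pci.publish_used=off"]
    return args


# Enumerate all four combinations of the two switches.
for vhost, publish_used in product([True, False], repeat=2):
    print("vhost=%s publish_used=%s: %s"
          % (vhost, publish_used, " ".join(nic_args(vhost, publish_used))))
```

Recording the exact fragment used for each run makes it easier to tell which of the two switches a given hang or BSOD correlates with.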
Comment 27 Qunfang Zhang 2010-07-23 03:13:22 EDT
Hi, all
To make things clearer, I will change the status to ASSIGNED because the bug can be reproduced with virtio-win-1.1.8-0. After it changes to ON_QA again, I will verify it with the new version.

Thanks
Qunfang
Comment 28 Amit Shah 2010-07-23 03:26:05 EDT
PUBLISH_USED was removed in qemu-kvm-0.12.1.2-2.99.el6. Can you try that package?
Comment 29 Yan Vugenfirer 2010-07-24 04:51:39 EDT
Just to be clear: a BSOD in the MPE test does not necessarily mean the test failed. We have errata from MS covering a bug on their side that might cause a BSOD.

Each crash dump from the MPE test should be investigated to see whether it is related to the MS bug or to something on our side.

I will check those dumps.
Comment 30 Qunfang Zhang 2010-07-26 00:32:36 EDT
(In reply to comment #28)
> PUBLISH_USED was removed in qemu-kvm-0.12.1.2-2.99.el6. Can you try that
> package?    

The spice-related bug 617463 is blocking me, so I will verify this bug after 617463 is fixed.
Comment 33 Qunfang Zhang 2010-07-27 03:22:54 EDT
Test with the qemu-kvm build provided by Alex in bug 617463:
https://brewweb.devel.redhat.com/taskinfo?taskID=2625606

Booted the guest with vhost=on and without the "publish_used" option.

The guest does not hang any more.
The guest got a BSOD with error code 0x7e.
Memory dump file:
http://10.66.65.120/mem-dump/MEMORY-2k8-32-MPE-fixed.DMP

qzhang -> Yan
Could you help check whether it falls under the MS errata? Then I can change the status to VERIFIED. :-)

Thanks~
Comment 37 Qunfang Zhang 2010-08-03 22:54:59 EDT
Hi, Yan 

As described in Comment 33, does the BSOD fall into MS errata? And could I change the status to VERIFIED?

Thanks~
Comment 38 Yan Vugenfirer 2010-08-04 13:51:00 EDT
(In reply to comment #37)
> Hi, Yan 
> 
> As described in Comment 33, does the BSOD fall into MS errata? And could I
> change the status to VERIFIED?
> 
> Thanks~    

No, this is a traffic hang crash. Please retest without vhost.
Comment 39 Yan Vugenfirer 2010-08-04 14:21:11 EDT
See comment #26 - it was already tested without vhost and passed.
Comment 40 Qunfang Zhang 2010-08-05 05:04:38 EDT
According to Comment 38 and Comment 39, I re-tested with vhost=on using virtio-win-1.1.10-0, and this issue still exists with vhost=on. So I will change the status to ASSIGNED.
Comment 50 Qunfang Zhang 2010-08-23 07:59:01 EDT
Update:
Finished testing win7-64, win2k8-R2 and win2k8-64 without vhost, NDISTest6.5 passed. (In fact, all jobs passed.)
I will change the status to VERIFIED after finishing all guests.

Package versions:
virtio-win-1.1.12.0
kernel-2.6.32-66.el6
qemu-kvm-0.12.1.2-2.112.el6
Comment 51 Qunfang Zhang 2010-08-26 02:09:36 EDT
(In reply to comment #50)
> Update:
> Finished testing win7-64, win2k8-R2 and win2k8-64 without vhost, NDISTest6.5
> passed. (In fact, all jobs passed.)
> I will change the status to VERIFIED after finish all guests.
Sorry, for win2k3 and winxp, there's no NDISTest6.5(MPE), so this job is passed without vhost=on.
> 
> Packages version:
> virtio-win-1.1.12.0
> kernel-2.6.32-66.el6
> qemu-kvm-0.12.1.2-2.112.el6
Comment 52 Qunfang Zhang 2010-08-26 04:03:58 EDT
According to Comment 51, I will change the status to VERIFIED.
Comment 53 releng-rhel@redhat.com 2010-11-11 11:31:00 EST
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.
