Bug 1599631
| Summary: | [virtio-win][netkvm][whql] Job "NDISTest 6.0 - [2 Machine] - 2c_Mini6RSSSendRecv (Multi-Group Win8+)" BSOD with build154/156 | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Yu Wang <wyu> | ||||
| Component: | virtio-win | Assignee: | Sameeh Jubran <sjubran> | ||||
| virtio-win sub component: | virtio-win-prewhql | QA Contact: | Virtualization Bugs <virt-bugs> | ||||
| Status: | CLOSED ERRATA | Docs Contact: | |||||
| Severity: | high | ||||||
| Priority: | high | CC: | ailan, ddepaula, lijin, michen, phou, sjubran, vrozenfe, xiagao, yvugenfi | ||||
| Version: | 7.6 | Keywords: | Regression | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | NO_DOCS | Story Points: | --- | ||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2018-10-30 16:21:51 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
Description
Yu Wang
2018-07-10 08:58:57 UTC
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: ffffffff80000003, The exception code that was not handled
Arg2: fffff800cc4e3d29, The address that the exception occurred at
Arg3: ffffd000b87b10d8, Exception Record Address
Arg4: ffffd000b87b08e0, Context Record Address
Debugging Details:
------------------
EXCEPTION_CODE: (HRESULT) 0x80000003 (2147483651) - One or more arguments are invalid
FAULTING_IP:
NDProt630+a5d29
fffff800`cc4e3d29 cc int 3
EXCEPTION_RECORD: ffffd000b87b10d8 -- (.exr 0xffffd000b87b10d8)
ExceptionAddress: fffff800cc4e3d29 (NDProt630+0x00000000000a5d29)
ExceptionCode: 80000003 (Break instruction exception)
ExceptionFlags: 00000000
NumberParameters: 1
Parameter[0]: 0000000000000000
CONTEXT: ffffd000b87b08e0 -- (.cxr 0xffffd000b87b08e0;r)
rax=0000000000000000 rbx=ffffe0018e6c8080 rcx=a42658efbf840000
rdx=0000000000000000 rsi=ffffe0018e6c8080 rdi=ffffe0018c85d580
rip=fffff800cc4e3d29 rsp=ffffd000b87b1310 rbp=0000000000000080
r8=0000000000000000 r9=ffffd000b87b0d00 r10=00000000fffffffd
r11=0000000000000000 r12=0000000000000000 r13=fffff802dd81e000
r14=ffffe0018ebafad8 r15=fffff800cc50a0a0
iopl=0 nv up ei ng nz na pe nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00000282
NDProt630+0xa5d29:
fffff800`cc4e3d29 cc int 3
Last set context:
rax=0000000000000000 rbx=ffffe0018e6c8080 rcx=a42658efbf840000
rdx=0000000000000000 rsi=ffffe0018e6c8080 rdi=ffffe0018c85d580
rip=fffff800cc4e3d29 rsp=ffffd000b87b1310 rbp=0000000000000080
r8=0000000000000000 r9=ffffd000b87b0d00 r10=00000000fffffffd
r11=0000000000000000 r12=0000000000000000 r13=fffff802dd81e000
r14=ffffe0018ebafad8 r15=fffff800cc50a0a0
iopl=0 nv up ei ng nz na pe nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00000282
NDProt630+0xa5d29:
fffff800`cc4e3d29 cc int 3
Resetting default scope
DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT
BUGCHECK_STR: AV
PROCESS_NAME: System
CURRENT_IRQL: 0
ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION} Breakpoint A breakpoint has been reached.
EXCEPTION_PARAMETER1: 0000000000000000
ANALYSIS_VERSION: 6.3.9600.16520 (debuggers(dbg).140127-0329) amd64fre
LAST_CONTROL_TRANSFER: from fffff800cc4ea600 to fffff800cc4e3d29
STACK_TEXT:
ffffd000`b87b1310 fffff800`cc4ea600 : fffff800`cc5be930 ffffe001`00000380 fffff800`cc5bf4f0 00000000`00003c00 : NDProt630+0xa5d29
ffffd000`b87b1350 fffff800`cc50a0f3 : ffffe001`8ebafaa8 00000000`00000001 00000000`00000000 00000000`00006a12 : NDProt630+0xac600
ffffd000`b87b1430 fffff802`dd91fc70 : ffffe001`8ebafad8 fffff960`000dfeed fffff901`42289e80 fffff960`000eabb1 : NDProt630+0xcc0f3
ffffd000`b87b1480 fffff802`dd974fc6 : fffff802`ddb21180 ffffe001`8e6c8080 fffff802`ddb7aa00 fffff802`dd882cb2 : nt!PspSystemThreadStartup+0x58
ffffd000`b87b14e0 00000000`00000000 : ffffd000`b87b2000 ffffd000`b87ab000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16
FOLLOWUP_IP:
NDProt630+a5d29
fffff800`cc4e3d29 cc int 3
SYMBOL_STACK_INDEX: 0
SYMBOL_NAME: NDProt630+a5d29
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: NDProt630
IMAGE_NAME: NDProt630.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 550cea5c
STACK_COMMAND: .cxr 0xffffd000b87b08e0 ; kb
FAILURE_BUCKET_ID: AV_VRF_NDProt630+a5d29
BUCKET_ID: AV_VRF_NDProt630+a5d29
ANALYSIS_SOURCE: KM
FAILURE_ID_HASH_STRING: km:av_vrf_ndprot630+a5d29
FAILURE_ID_HASH: {a0550f62-2bda-baa3-4f2b-7854cdb7064d}
Followup: MachineOwner
---------
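A quick way to sanity-check the dump above is to decode the two numbers it hinges on. The sketch below (Python, not part of the original report) masks Arg1 down to its low 32 bits, since bugcheck 0x7E shows the NTSTATUS value sign-extended to 64 bits, and subtracts each `NDProt630+offset` from its absolute address to check that all three NDProt630 frames resolve to the same module base. The helper names and the one-entry status table are illustrative assumptions.

```python
# Illustrative helpers; only the numeric values are taken from the dump above.
KNOWN_STATUS = {0x80000003: "STATUS_BREAKPOINT"}


def decode_exception_code(arg1: int) -> str:
    """Keep the low 32 bits of the sign-extended Arg1 and name the NTSTATUS."""
    code = arg1 & 0xFFFFFFFF
    return KNOWN_STATUS.get(code, f"unknown (0x{code:08x})")


def module_base(address: int, offset: int) -> int:
    """Absolute address minus module-relative offset gives the load base."""
    return address - offset


# (absolute address, offset inside NDProt630) for the three NDProt630 frames.
frames = [
    (0xFFFFF800CC4E3D29, 0xA5D29),  # faulting IP
    (0xFFFFF800CC4EA600, 0xAC600),  # return address of frame 0
    (0xFFFFF800CC50A0F3, 0xCC0F3),  # return address of frame 1
]
bases = {module_base(addr, off) for addr, off in frames}

print(decode_exception_code(0xFFFFFFFF80000003))  # STATUS_BREAKPOINT
print(sorted(hex(b) for b in bases))              # ['0xfffff800cc43e000']
```

All three frames land on the same base, which is consistent with the exception being a breakpoint instruction (`int 3`) executed inside NDProt630.sys itself.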
From the investigation I've made so far, it seems like the device is not notifying the driver that it has finished sending packets.
Can you reproduce with vhost = off?
Can you reproduce with qemu 2.9 for example?

(In reply to Sameeh Jubran from comment #7)
> From the investigation I've made so far, it seems like the device is not
> notifying the driver that it has finished sending packets.
> Can you reproduce with vhost = off?
> Can you reproduce with qemu 2.9 for example?

More questions:
* Did you try build 144 on this same setup? (7.6) Does it pass?
* Did you try the build 156 on the previous setup?

(In reply to Sameeh Jubran from comment #8)
> (In reply to Sameeh Jubran from comment #7)
> > From the investigation I've made so far, it seems like the device is not
> > notifying the driver that it has finished sending packets.
> > Can you reproduce with vhost = off?
> > Can you reproduce with qemu 2.9 for example?

I will try it and then tell you the result. You can refer to the answer below first.

> More questions:
>
> * Did you try build 144 on this same setup? (7.6) Does it pass?
> * Did you try the build 156 on the previous setup?

As I said in comment #0, it passes with the RHEL 7.5 release build (build 144), so it is a regression.
The setup is the same (vhost=on, qemu-kvm-rhev-2.12.0-7.el7.x86_64).

Thanks
Yu Wang

(In reply to Sameeh Jubran from comment #8)
> > Can you reproduce with vhost = off?

I can pass this job with vhost=off.

Thanks
Yu Wang

Created attachment 1460861
Win8/10 builds without the event suppression feature.

I have created a build with one feature of the virtio queue disabled; this might resolve the issue. Can you please test whether the BSOD reproduces with this build and vhost=on?

Thanks!
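For readers unfamiliar with the feature that was disabled in the test build: assuming "event suppression" refers to the virtio event index mechanism (feature bit 29, named `VIRTIO_RING_F_EVENT_IDX` in the Linux headers), each side publishes the ring index at which it next wants to be notified, and the other side notifies only when its progress crosses that index. The sketch below is a Python transcription of the 16-bit wraparound check given in the virtio specification. It illustrates the mechanism only; it is not the netkvm driver code and makes no claim about where the actual bug was.

```python
VIRTIO_RING_F_EVENT_IDX = 29  # feature bit number (name as in the Linux headers)


def has_event_idx(features: int) -> bool:
    """True if the negotiated feature bitmap includes the event index feature."""
    return bool((features >> VIRTIO_RING_F_EVENT_IDX) & 1)


def vring_need_event(event_idx: int, new_idx: int, old_idx: int) -> bool:
    """Decide whether a notification is due after moving from old_idx to new_idx.

    event_idx is the index the other side asked to be notified at.
    All arithmetic is modulo 2**16, as ring indices are 16-bit counters.
    """
    mask = 0xFFFF
    return ((new_idx - event_idx - 1) & mask) < ((new_idx - old_idx) & mask)


# The other side asked for a notification at index 11; we advanced 10 -> 12.
print(vring_need_event(event_idx=11, new_idx=12, old_idx=10))  # True
# It asked for index 12, which we have not passed yet: stay silent.
print(vring_need_event(event_idx=12, new_idx=12, old_idx=10))  # False
```

If either side computes or publishes the event index incorrectly, a completion notification can be skipped, which would look like the symptom described above (the driver never hears that sending has finished). Disabling the feature falls back to notifying without this check.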
(In reply to Sameeh Jubran from comment #11)
> Created attachment 1460861
> Win8/10 builds without the event suppression feature.
>
> I have created a build with a disabled feature of the virtio queue, this
> might resolve the issue... can you please test if the BSOD reproduces with
> this build and vhost=on.
>
> Thanks!

It passes without BSOD using your temp driver build.

Thanks
Yu Wang

(In reply to Yu Wang from comment #12)
> (In reply to Sameeh Jubran from comment #11)
> > Created attachment 1460861
> > Win8/10 builds without the event suppression feature.
> >
> > I have created a build with a disabled feature of the virtio queue, this
> > might resolve the issue... can you please test if the BSOD reproduces with
> > this build and vhost=on.
>
> It can pass without BSOD using your temp driver build
>
> Thanks
> Yu Wang

Can you still pass with the temp build with vhost on and one virtqueue? If not, can build 144 pass this?

(In reply to Sameeh Jubran from comment #13)
> (In reply to Yu Wang from comment #12)
> > (In reply to Sameeh Jubran from comment #11)
> > > Created attachment 1460861
> > > Win8/10 builds without the event suppression feature.
> > >
> > > I have created a build with a disabled feature of the virtio queue, this
> > > might resolve the issue... can you please test if the BSOD reproduces with
> > > this build and vhost=on.
> >
> > It can pass without BSOD using your temp driver build
> >
> > Thanks
> > Yu Wang
>
> Can you still pass the temp build with vhost on and one virtqueue? if no,
> then can build 144 pass this?

Can you please test the temp build on all other tests? I can't test this on my setup, as it tends to always fail with a BSOD; that may be caused by the newer kernel I am using.
Hi,

> Can you please test the temp build on all other tests, since i can't test
> this on my setup as it tends to always fail with BSOD, it may be caused by
> the newer kernel I am using.

I will test this later.

I recently ran this case with build 157, and it passes without BSOD, but it shows "qemu-kvm: unable to start vhost net: 14: falling back on userspace virtio". It seems there is a bug when setting vhost=on, so I reported the bug below:

Bug 1608226 - [virtual-network] prompt warning "qemu-kvm: unable to start vhost net: 14: falling back on userspace virtio" when boot with win8+ guests

Thanks
Yu Wang

Summary:
* Booting the guest with a single queue and vhost=on: BSOD occurs (tried on build 156).
* Booting with mq and vhost=on: the error "qemu-kvm: unable to start vhost net: 14: falling back on userspace virtio" appears, which suggests a bug when setting vhost=on, but the job can PASS (tried on build 157).
* For the tmp build tests, it can pass with vhost=off.

Thanks
Yu Wang

(In reply to Sameeh Jubran from comment #14)
> Can you please test the temp build on all other tests, since i can't test
> this on my setup as it tends to always fail with BSOD, it may be caused by
> the newer kernel I am using.

Should I run all tests with multi-queue or single queue?

Thanks
Yu Wang

(In reply to Yu Wang from comment #17)
> (In reply to Sameeh Jubran from comment #14)
> > Can you please test the temp build on all other tests, since i can't test
> > this on my setup as it tends to always fail with BSOD, it may be caused by
> > the newer kernel I am using.
>
> run all tests with multi-queue or single queue?
>
> Thanks
> Yu Wang

Multiqueue please

For Win10 we have an errata
https://bugzilla.redhat.com/show_bug.cgi?id=1367251#c11

and for the test itself to pass the following should be done:

Mini6RSSSendRecv (Multi-Group Win8+) test
Right after the initial reboot on test initiation (before the test itself starts!), open a command prompt as Administrator and type:

bcdedit.exe /set groupaware off
bcdedit.exe /deletevalue groupsize
shutdown /r /t 0 /f

(In reply to Sameeh Jubran from comment #20)
> For Win10 we have an errata
> https://bugzilla.redhat.com/show_bug.cgi?id=1367251#c11
>
> and for the test itself to pass the following should be done:
>
> Mini6RSSSendRecv (Multi-Group Win8+) test
> Right after the initial reboot on test initiation (Before the test itself
> starts!), enter the command prompt as the Administrator, and type:
>
> bcdedit.exe /set groupaware off
> bcdedit.exe /deletevalue groupsize
> shutdown /r /t 0 /f

Thanks to Yu's help in reproducing the issue and testing possible fixes, I have identified the offending commit and opened a pull request:
https://github.com/virtio-win/kvm-guest-drivers-windows/pull/317

The commit should make it into the next build; I have already informed Vadim to add it.

Ran this job with build 159:

1. With 1 queue and vhost=on: passed the first time.
2. With mq and vhost=on: passed the second time; the first time it hit a BSOD (7e and IMAGE_NAME: NDProt630.sys, same as comment #2).

Can this be counted as fixed?

Thanks
Yu Wang

(In reply to Yu Wang from comment #23)
> Ran this job with build 159
>
> 1 with 1 queue, vhost=on
> Pass at the first time.
>
> 2 with mq and vhost=on:
> pass at the second time, the first time BSOD(7e and IMAGE_NAME:
> NDProt630.sys , same as comment#2)
>
> Can this be counted as fixed ?
>
> Thanks
> Yu Wang

Can you please supply me with the BSOD? And yes, let's count this as fixed for now, as we have already identified the offending commit. This might be a different issue.
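The configurations compared in this thread (single queue or multiqueue, vhost on or off) map to a handful of QEMU options. The thread does not quote the actual command lines, so the following Python sketch is an assumption: it uses the standard `-netdev tap` and `-device virtio-net-pci` options, a hypothetical netdev id `hostnet0`, a hypothetical queue count of 4, and the commonly used `vectors = 2 * queues + 2` sizing for multiqueue.

```python
from typing import List


def virtio_net_args(vhost: bool, queues: int = 1,
                    netdev_id: str = "hostnet0") -> List[str]:
    """Build the QEMU arguments for one virtio-net test configuration.

    netdev_id and the default values are placeholders, not taken from the bug.
    """
    netdev = f"tap,id={netdev_id},vhost={'on' if vhost else 'off'}"
    device = f"virtio-net-pci,netdev={netdev_id}"
    if queues > 1:
        # Multiqueue needs matching settings on the backend and the device.
        netdev += f",queues={queues}"
        device += f",mq=on,vectors={2 * queues + 2}"
    return ["-netdev", netdev, "-device", device]


# The three combinations discussed in the thread.
matrix = {
    "single queue, vhost=on": virtio_net_args(vhost=True),
    "multiqueue, vhost=on": virtio_net_args(vhost=True, queues=4),
    "single queue, vhost=off": virtio_net_args(vhost=False),
}
for name, args in matrix.items():
    print(f"{name}: {' '.join(args)}")
```

Generating the arguments from one function keeps the only differences between runs to the two variables under test, which makes results like those reported above easier to compare.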
Hi Danilo,

This bug also needs to be added to the RHEL 7.6 virtio-win errata. Could you help with that?

Thanks a lot

It's already there.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3413