Bug 1599631
Summary: | [virtio-win][netkvm][whql] Job "NDISTest 6.0 - [2 Machine] - 2c_Mini6RSSSendRecv (Multi-Group Win8+)" BSOD with build154/156 | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Yu Wang <wyu> | ||||
Component: | virtio-win | Assignee: | Sameeh Jubran <sjubran> | ||||
virtio-win sub component: | virtio-win-prewhql | QA Contact: | Virtualization Bugs <virt-bugs> | ||||
Status: | CLOSED ERRATA | Docs Contact: | |||||
Severity: | high | ||||||
Priority: | high | CC: | ailan, ddepaula, lijin, michen, phou, sjubran, vrozenfe, xiagao, yvugenfi | ||||
Version: | 7.6 | Keywords: | Regression | ||||
Target Milestone: | rc | ||||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
Doc Text: |
NO_DOCS
|
Story Points: | --- | ||||
Clone Of: | Environment: | ||||||
Last Closed: | 2018-10-30 16:21:51 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Yu Wang
2018-07-10 08:58:57 UTC
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: ffffffff80000003, The exception code that was not handled
Arg2: fffff800cc4e3d29, The address that the exception occurred at
Arg3: ffffd000b87b10d8, Exception Record Address
Arg4: ffffd000b87b08e0, Context Record Address

Debugging Details:
------------------

EXCEPTION_CODE: (HRESULT) 0x80000003 (2147483651) - One or more arguments are invalid

FAULTING_IP:
NDProt630+a5d29
fffff800`cc4e3d29 cc int 3

EXCEPTION_RECORD: ffffd000b87b10d8 -- (.exr 0xffffd000b87b10d8)
ExceptionAddress: fffff800cc4e3d29 (NDProt630+0x00000000000a5d29)
   ExceptionCode: 80000003 (Break instruction exception)
  ExceptionFlags: 00000000
NumberParameters: 1
   Parameter[0]: 0000000000000000

CONTEXT: ffffd000b87b08e0 -- (.cxr 0xffffd000b87b08e0;r)
rax=0000000000000000 rbx=ffffe0018e6c8080 rcx=a42658efbf840000
rdx=0000000000000000 rsi=ffffe0018e6c8080 rdi=ffffe0018c85d580
rip=fffff800cc4e3d29 rsp=ffffd000b87b1310 rbp=0000000000000080
 r8=0000000000000000  r9=ffffd000b87b0d00 r10=00000000fffffffd
r11=0000000000000000 r12=0000000000000000 r13=fffff802dd81e000
r14=ffffe0018ebafad8 r15=fffff800cc50a0a0
iopl=0 nv up ei ng nz na pe nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00000282
NDProt630+0xa5d29:
fffff800`cc4e3d29 cc int 3
Last set context:
rax=0000000000000000 rbx=ffffe0018e6c8080 rcx=a42658efbf840000
rdx=0000000000000000 rsi=ffffe0018e6c8080 rdi=ffffe0018c85d580
rip=fffff800cc4e3d29 rsp=ffffd000b87b1310 rbp=0000000000000080
 r8=0000000000000000  r9=ffffd000b87b0d00 r10=00000000fffffffd
r11=0000000000000000 r12=0000000000000000 r13=fffff802dd81e000
r14=ffffe0018ebafad8 r15=fffff800cc50a0a0
iopl=0 nv up ei ng nz na pe nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00000282
NDProt630+0xa5d29:
fffff800`cc4e3d29 cc int 3
Resetting default scope

DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT
BUGCHECK_STR: AV
PROCESS_NAME: System
CURRENT_IRQL: 0
ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION} Breakpoint A breakpoint has been reached.
EXCEPTION_PARAMETER1: 0000000000000000
ANALYSIS_VERSION: 6.3.9600.16520 (debuggers(dbg).140127-0329) amd64fre
LAST_CONTROL_TRANSFER: from fffff800cc4ea600 to fffff800cc4e3d29

STACK_TEXT:
ffffd000`b87b1310 fffff800`cc4ea600 : fffff800`cc5be930 ffffe001`00000380 fffff800`cc5bf4f0 00000000`00003c00 : NDProt630+0xa5d29
ffffd000`b87b1350 fffff800`cc50a0f3 : ffffe001`8ebafaa8 00000000`00000001 00000000`00000000 00000000`00006a12 : NDProt630+0xac600
ffffd000`b87b1430 fffff802`dd91fc70 : ffffe001`8ebafad8 fffff960`000dfeed fffff901`42289e80 fffff960`000eabb1 : NDProt630+0xcc0f3
ffffd000`b87b1480 fffff802`dd974fc6 : fffff802`ddb21180 ffffe001`8e6c8080 fffff802`ddb7aa00 fffff802`dd882cb2 : nt!PspSystemThreadStartup+0x58
ffffd000`b87b14e0 00000000`00000000 : ffffd000`b87b2000 ffffd000`b87ab000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16

FOLLOWUP_IP:
NDProt630+a5d29
fffff800`cc4e3d29 cc int 3

SYMBOL_STACK_INDEX: 0
SYMBOL_NAME: NDProt630+a5d29
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: NDProt630
IMAGE_NAME: NDProt630.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 550cea5c
STACK_COMMAND: .cxr 0xffffd000b87b08e0 ; kb
FAILURE_BUCKET_ID: AV_VRF_NDProt630+a5d29
BUCKET_ID: AV_VRF_NDProt630+a5d29
ANALYSIS_SOURCE: KM
FAILURE_ID_HASH_STRING: km:av_vrf_ndprot630+a5d29
FAILURE_ID_HASH: {a0550f62-2bda-baa3-4f2b-7854cdb7064d}

Followup: MachineOwner
---------

From the investigation I've made so far, it seems like the device is not notifying the driver that it has finished sending packets.
Can you reproduce with vhost = off?
Can you reproduce with qemu 2.9 for example?
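For readers less familiar with virtio: a device reports finished transmissions by advancing the `idx` field of a virtqueue's used ring, and the driver compares that index with the last one it has already processed. The following is a simplified C sketch under assumed, hypothetical names (`struct tx_queue_view`, `pending_tx_completions`, `reap_tx_completions`). It is not NetKVM code and omits the memory barriers and interrupt handling a real driver needs. If the device never advances `used_idx`, or advances it without raising the interrupt that makes the driver run this check, the sends are never reported as complete. That is the kind of stall the investigation above suspects.

```c
#include <stdint.h>

/* Hypothetical, simplified view of a TX virtqueue (illustrative names,
 * not taken from NetKVM). The device advances used_idx each time it
 * finishes with a descriptor chain; the driver remembers how far it has
 * already reaped in last_used_idx. */
struct tx_queue_view {
    uint16_t used_idx;      /* written by the device */
    uint16_t last_used_idx; /* private to the driver */
};

/* Number of send completions the device has published but the driver
 * has not yet processed. Unsigned 16-bit subtraction keeps the count
 * correct when the ring index wraps around. */
static uint16_t pending_tx_completions(const struct tx_queue_view *q)
{
    return (uint16_t)(q->used_idx - q->last_used_idx);
}

/* Consume every completion currently published and return how many
 * there were. In a real NDIS miniport, this is roughly the point where
 * the sends would be reported up the stack (for example through
 * NdisMSendNetBufferListsComplete). */
static uint16_t reap_tx_completions(struct tx_queue_view *q)
{
    uint16_t n = pending_tx_completions(q);
    q->last_used_idx = (uint16_t)(q->last_used_idx + n);
    return n;
}
```

For example, `used_idx` 3 after `last_used_idx` 65534 yields 5 pending completions, because the index wrapped past 65535.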
(In reply to Sameeh Jubran from comment #7)
> From the investigation I've made so far, it seems like the device is not
> notifying the driver that it has finished sending packets.
> Can you reproduce with vhost = off?
> Can you reproduce with qemu 2.9 for example?

More questions:
* Did you try build 144 on this same setup (7.6)? Does it pass?
* Did you try build 156 on the previous setup?

(In reply to Sameeh Jubran from comment #8)
> > Can you reproduce with vhost = off?
> > Can you reproduce with qemu 2.9 for example?

I will try it and then tell you the result. Please refer to the answers below first.

> * Did you try build 144 on this same setup (7.6)? Does it pass?
> * Did you try build 156 on the previous setup?

As I said in comment #0, the job passes with the RHEL 7.5 release build (build 144), so this is a regression. The setup is the same (vhost=on, qemu-kvm-rhev-2.12.0-7.el7.x86_64).

Thanks
Yu Wang

(In reply to Sameeh Jubran from comment #8)
> > Can you reproduce with vhost = off?

The job passes with vhost=off.

Thanks
Yu Wang

Created attachment 1460861
Win8/10 builds without the event suppression feature.
I have created a build with this virtio queue feature disabled, which might resolve the issue. Can you please test whether the BSOD reproduces with this build and vhost=on?
Thanks!
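The bug does not name the exact feature bit. Assuming the "event suppression feature" disabled in this temporary build is the split-ring event-index mechanism (`VIRTIO_RING_F_EVENT_IDX`), each side publishes the ring index at which it next wants to be notified. The other side then signals only when its update crosses that index. The check below mirrors the `vring_need_event()` helper from Linux's `virtio_ring.h`, which the virtio specification also reproduces. It only illustrates why a stale or mis-published event index can suppress an interrupt the driver is waiting for. It is not the defect identified later in this bug, whose details are not described here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Split-ring event-index check (VIRTIO_RING_F_EVENT_IDX), mirroring
 * vring_need_event() from Linux's virtio_ring.h.
 *
 * event_idx: index the other side asked to be notified at
 *            (used_event when the device decides whether to interrupt
 *             the driver, avail_event when the driver decides whether
 *             to kick the device)
 * new_idx:   ring index after the current batch of updates
 * old_idx:   ring index before the current batch of updates
 *
 * Returns true iff event_idx lies in [old_idx, new_idx), i.e. the batch
 * just published crosses the requested index. The uint16_t casts keep
 * the comparison correct across 16-bit index wraparound.
 */
static bool vring_need_event(uint16_t event_idx, uint16_t new_idx,
                             uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) <
           (uint16_t)(new_idx - old_idx);
}
```

With this scheme, a driver that re-arms its event index must check the used ring once more afterwards. Otherwise, a completion published between its last check and the re-arm can go unsignalled. Turning the feature off falls back to always notifying, which is consistent with the temporary build avoiding the hang.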
(In reply to Sameeh Jubran from comment #11)
> Created attachment 1460861
> Win8/10 builds without the event suppression feature.

It passes without a BSOD using your temp driver build.

Thanks
Yu Wang

(In reply to Yu Wang from comment #12)
> It passes without a BSOD using your temp driver build.

Can you still pass with the temp build, vhost on and one virtqueue? If not, can build 144 pass this?

(In reply to Sameeh Jubran from comment #13)
> Can you still pass with the temp build, vhost on and one virtqueue? If not,
> can build 144 pass this?

Can you please test the temp build on all other tests, since I can't test this on my setup as it tends to always fail with BSOD; it may be caused by the newer kernel I am using.
Hi,

> Can you please test the temp build on all other tests, since I can't test this on my setup as it tends to always fail with BSOD; it may be caused by the newer kernel I am using.

I will test this later.

I recently ran this case with build 157 and it passed without a BSOD, but QEMU printed "qemu-kvm: unable to start vhost net: 14: falling back on userspace virtio". It seems that vhost=on does not take effect, so I reported the following bug:

Bug 1608226 - [virtual-network] prompt warning "qemu-kvm: unable to start vhost net: 14: falling back on userspace virtio" when boot with win8+ guests

Thanks
Yu Wang

Summary:
* Guest booted with a single queue and vhost=on: BSOD (tried on build 156).
* Guest booted with multi-queue (mq) and vhost=on: QEMU reports "qemu-kvm: unable to start vhost net: 14: falling back on userspace virtio" (vhost=on seems not to take effect), but the job PASSES (tried on build 157).
* With the temp build, the job passes with vhost=off.

Thanks
Yu Wang

(In reply to Sameeh Jubran from comment #14)
> Can you please test the temp build on all other tests, since I can't test
> this on my setup as it tends to always fail with BSOD; it may be caused by
> the newer kernel I am using.

Should I run all tests with multi-queue or single queue?

Thanks
Yu Wang

(In reply to Yu Wang from comment #17)
> Should I run all tests with multi-queue or single queue?
Multiqueue, please.

For Win10 we have an erratum: https://bugzilla.redhat.com/show_bug.cgi?id=1367251#c11

For the Mini6RSSSendRecv (Multi-Group Win8+) test itself to pass, the following should be done: right after the initial reboot on test initiation (before the test itself starts!), open a command prompt as Administrator and type:

bcdedit.exe /set groupaware off
bcdedit.exe /deletevalue groupsize
shutdown /r /t 0 /f

(In reply to Sameeh Jubran from comment #20)
> For Win10 we have an erratum: https://bugzilla.redhat.com/show_bug.cgi?id=1367251#c11

Thanks to Yu's help in reproducing the issue and testing possible fixes, I have identified the offending commit and opened a pull request:
https://github.com/virtio-win/kvm-guest-drivers-windows/pull/317
The commit should make it into the next build; I have already asked Vadim to add it.

Ran this job with build 159:
1. With 1 queue and vhost=on: passed on the first run.
2. With mq and vhost=on: passed on the second run; the first run hit a BSOD (7e, IMAGE_NAME: NDProt630.sys, same as comment #2).

Can this be counted as fixed?

Thanks
Yu Wang

(In reply to Yu Wang from comment #23)
> 2. With mq and vhost=on: passed on the second run; the first run hit a BSOD
> (7e, IMAGE_NAME: NDProt630.sys, same as comment #2).
>
> Can this be counted as fixed?

Can you please supply me with the BSOD dump? And yes, let's count this as fixed for now, since we have already identified the offending commit. This might be a different issue.
Hi Danilo,

This bug also needs to be added to the RHEL 7.6 virtio-win errata. Could you help with that?

Thanks a lot

It's already there.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3413