Bug 1599631 - [virtio-win][netkvm][whql] Job "NDISTest 6.0 - [2 Machine] - 2c_Mini6RSSSendRecv (Multi-Group Win8+)" BSOD with build154/156
Summary: [virtio-win][netkvm][whql] Job "NDISTest 6.0 - [2 Machine] - 2c_Mini6RSSSendRecv (Multi-Group Win8+)" BSOD with build154/156
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: virtio-win
Version: 7.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Sameeh Jubran
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-07-10 08:58 UTC by Yu Wang
Modified: 2018-10-30 16:22 UTC (History)
9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
NO_DOCS
Clone Of:
Environment:
Last Closed: 2018-10-30 16:21:51 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Win8/10 builds without the event suppression feature. (5.51 MB, application/zip)
2018-07-19 15:54 UTC, Sameeh Jubran


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:3413 0 None None None 2018-10-30 16:22:51 UTC

Description Yu Wang 2018-07-10 08:58:57 UTC
Description of problem:


Version-Release number of selected component (if applicable):
kernel-3.10.0-919.el7.x86_64
virtio-win-prewhql-154/156
qemu-kvm-rhev-2.12.0-7.el7.x86_64
seabios-bin-1.11.0-2.el7.noarch

How reproducible:
2/2

Steps to Reproduce:
1. Boot the guest with a virtio-net device
2. Submit the job "NDISTest 6.0 - [2 Machine] - 2c_Mini6RSSSendRecv (Multi-Group Win8+)"

Actual results:
Failed with a BSOD

Expected results:
Pass

Additional info:
1. It can pass with the RHEL 7.5 release build (build144), so this is a regression.
2. It fails on both RHEL 7 and RHEL 8 hosts.

Comment 2 Yu Wang 2018-07-10 09:34:10 UTC
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common bugcheck.  Usually the exception address pinpoints
the driver/function that caused the problem.  Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: ffffffff80000003, The exception code that was not handled
Arg2: fffff800cc4e3d29, The address that the exception occurred at
Arg3: ffffd000b87b10d8, Exception Record Address
Arg4: ffffd000b87b08e0, Context Record Address

Debugging Details:
------------------


EXCEPTION_CODE: (HRESULT) 0x80000003 (2147483651) - One or more arguments are invalid

FAULTING_IP: 
NDProt630+a5d29
fffff800`cc4e3d29 cc              int     3

EXCEPTION_RECORD:  ffffd000b87b10d8 -- (.exr 0xffffd000b87b10d8)
ExceptionAddress: fffff800cc4e3d29 (NDProt630+0x00000000000a5d29)
   ExceptionCode: 80000003 (Break instruction exception)
  ExceptionFlags: 00000000
NumberParameters: 1
   Parameter[0]: 0000000000000000

CONTEXT:  ffffd000b87b08e0 -- (.cxr 0xffffd000b87b08e0;r)
rax=0000000000000000 rbx=ffffe0018e6c8080 rcx=a42658efbf840000
rdx=0000000000000000 rsi=ffffe0018e6c8080 rdi=ffffe0018c85d580
rip=fffff800cc4e3d29 rsp=ffffd000b87b1310 rbp=0000000000000080
 r8=0000000000000000  r9=ffffd000b87b0d00 r10=00000000fffffffd
r11=0000000000000000 r12=0000000000000000 r13=fffff802dd81e000
r14=ffffe0018ebafad8 r15=fffff800cc50a0a0
iopl=0         nv up ei ng nz na pe nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00000282
NDProt630+0xa5d29:
fffff800`cc4e3d29 cc              int     3
Last set context:
rax=0000000000000000 rbx=ffffe0018e6c8080 rcx=a42658efbf840000
rdx=0000000000000000 rsi=ffffe0018e6c8080 rdi=ffffe0018c85d580
rip=fffff800cc4e3d29 rsp=ffffd000b87b1310 rbp=0000000000000080
 r8=0000000000000000  r9=ffffd000b87b0d00 r10=00000000fffffffd
r11=0000000000000000 r12=0000000000000000 r13=fffff802dd81e000
r14=ffffe0018ebafad8 r15=fffff800cc50a0a0
iopl=0         nv up ei ng nz na pe nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00000282
NDProt630+0xa5d29:
fffff800`cc4e3d29 cc              int     3
Resetting default scope

DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT

BUGCHECK_STR:  AV

PROCESS_NAME:  System

CURRENT_IRQL:  0

ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION}  Breakpoint  A breakpoint has been reached.

EXCEPTION_PARAMETER1:  0000000000000000

ANALYSIS_VERSION: 6.3.9600.16520 (debuggers(dbg).140127-0329) amd64fre

LAST_CONTROL_TRANSFER:  from fffff800cc4ea600 to fffff800cc4e3d29

STACK_TEXT:  
ffffd000`b87b1310 fffff800`cc4ea600 : fffff800`cc5be930 ffffe001`00000380 fffff800`cc5bf4f0 00000000`00003c00 : NDProt630+0xa5d29
ffffd000`b87b1350 fffff800`cc50a0f3 : ffffe001`8ebafaa8 00000000`00000001 00000000`00000000 00000000`00006a12 : NDProt630+0xac600
ffffd000`b87b1430 fffff802`dd91fc70 : ffffe001`8ebafad8 fffff960`000dfeed fffff901`42289e80 fffff960`000eabb1 : NDProt630+0xcc0f3
ffffd000`b87b1480 fffff802`dd974fc6 : fffff802`ddb21180 ffffe001`8e6c8080 fffff802`ddb7aa00 fffff802`dd882cb2 : nt!PspSystemThreadStartup+0x58
ffffd000`b87b14e0 00000000`00000000 : ffffd000`b87b2000 ffffd000`b87ab000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16


FOLLOWUP_IP: 
NDProt630+a5d29
fffff800`cc4e3d29 cc              int     3

SYMBOL_STACK_INDEX:  0

SYMBOL_NAME:  NDProt630+a5d29

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: NDProt630

IMAGE_NAME:  NDProt630.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  550cea5c

STACK_COMMAND:  .cxr 0xffffd000b87b08e0 ; kb

FAILURE_BUCKET_ID:  AV_VRF_NDProt630+a5d29

BUCKET_ID:  AV_VRF_NDProt630+a5d29

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:av_vrf_ndprot630+a5d29

FAILURE_ID_HASH:  {a0550f62-2bda-baa3-4f2b-7854cdb7064d}

Followup: MachineOwner
---------

Comment 7 Sameeh Jubran 2018-07-17 12:40:46 UTC
From the investigation I've made so far, it seems like the device is not notifying the driver that it has finished sending packets.
Can you reproduce with vhost = off?
Can you reproduce with qemu 2.9 for example?
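For reference, the vhost=on/off toggle lives on the -netdev option of the qemu-kvm command line. A minimal, hypothetical launch sketch — the image path, MAC address, memory/CPU sizing, and ids below are placeholders, not taken from this report:

```shell
# Hypothetical reproduction sketch; toggle vhost=on|off on the -netdev line.
# Image path, MAC address, and ids are placeholders, not from this report.
qemu-kvm \
  -m 4G -smp 4 \
  -drive file=win8-guest.qcow2,if=virtio \
  -netdev tap,id=hostnet0,vhost=on \
  -device virtio-net-pci,netdev=hostnet0,mac=52:54:00:12:34:56
```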

Comment 8 Sameeh Jubran 2018-07-17 13:08:25 UTC
(In reply to Sameeh Jubran from comment #7)
> From the investigation I've made so far, it seems like the device is not
> notifying the driver that it has finished sending packets.
> Can you reproduce with vhost = off?
> Can you reproduce with qemu 2.9 for example?

More questions:

* Did you try build 144 on this same setup? (7.6) Does it pass?
* Did you try the build 156 on the previous setup?

Comment 9 Yu Wang 2018-07-18 06:01:53 UTC
(In reply to Sameeh Jubran from comment #8)
> (In reply to Sameeh Jubran from comment #7)
> > From the investigation I've made so far, it seems like the device is not
> > notifying the driver that it has finished sending packets.
> > Can you reproduce with vhost = off?
> > Can you reproduce with qemu 2.9 for example?

I will try it, then will tell you the result. You can refer to the answer below first.

> 
> More questions:
> 
> * Did you try build 144 on this same setup? (7.6) Does it pass?
> * Did you try the build 156 on the previous setup?

As I said in comment#0,

It can pass with RHEL7.5 release build (build144), it is a regression
The setup is the same(vhost=on,qemu-kvm-rhev-2.12.0-7.el7.x86_64).

Thanks
Yu Wang

Comment 10 Yu Wang 2018-07-18 09:59:44 UTC
(In reply to Sameeh Jubran from comment #8)

> > Can you reproduce with vhost = off?

I can pass this job with vhost=off

Thanks
Yu Wang

Comment 11 Sameeh Jubran 2018-07-19 15:54:31 UTC
Created attachment 1460861 [details]
Win8/10 builds without the event suppression feature.

I have created a build with one of the virtio queue features disabled; this might resolve the issue. Can you please test whether the BSOD reproduces with this build and vhost=on?

Thanks!
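The attachment title says the disabled feature is virtio event suppression (the EVENT_IDX mechanism). As background only — this is not code from the netkvm driver — the notification-suppression decision in the virtio spec boils down to the vring_need_event check, sketched here in shell with 16-bit wraparound arithmetic:

```shell
# Illustrative re-implementation of the virtio spec's vring_need_event():
# with EVENT_IDX negotiated, one side is notified only when the new ring
# index crosses the event index the other side published. All index
# arithmetic is modulo 2^16, since ring indices are 16-bit.
vring_need_event() {
  local event_idx=$1 new_idx=$2 old_idx=$3
  if (( ((new_idx - event_idx - 1) & 0xFFFF) < ((new_idx - old_idx) & 0xFFFF) )); then
    echo 1   # the mark was crossed: send the notification
  else
    echo 0   # suppress the notification
  fi
}

# Example: used index advances from 5 to 6, driver asked for an event at 5,
# so the mark is crossed and the device should notify (prints 1).
vring_need_event 5 6 5
```

Disabling this feature forces unconditional notifications, which is why it is a plausible workaround when one side appears to miss completion events.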

Comment 12 Yu Wang 2018-07-20 05:34:05 UTC
(In reply to Sameeh Jubran from comment #11)
> Created attachment 1460861 [details]
> Win8/10 builds without the event suppression feature.
> 
> I have created a build with a disabled feature of the virtio queue, this
> might resolve the issue... can you please test if the BSOD reproduces with
> this build and vhost=on.

It can pass without BSOD using your temp driver build

Thanks
Yu Wang


> 
> Thanks!

Comment 13 Sameeh Jubran 2018-07-24 00:22:17 UTC
(In reply to Yu Wang from comment #12)
> (In reply to Sameeh Jubran from comment #11)
> > Created attachment 1460861 [details]
> > Win8/10 builds without the event suppression feature.
> > 
> > I have created a build with a disabled feature of the virtio queue, this
> > might resolve the issue... can you please test if the BSOD reproduces with
> > this build and vhost=on.
> 
> It can pass without BSOD using your temp driver build
> 
> Thanks
> Yu Wang
> 
> 
> > 
> > Thanks!

Can you still pass the temp build with vhost on and one virtqueue? if no, then can build 144 pass this?

Comment 14 Sameeh Jubran 2018-07-24 16:26:35 UTC
(In reply to Sameeh Jubran from comment #13)
> (In reply to Yu Wang from comment #12)
> > (In reply to Sameeh Jubran from comment #11)
> > > Created attachment 1460861 [details]
> > > Win8/10 builds without the event suppression feature.
> > > 
> > > I have created a build with a disabled feature of the virtio queue, this
> > > might resolve the issue... can you please test if the BSOD reproduces with
> > > this build and vhost=on.
> > 
> > It can pass without BSOD using your temp driver build
> > 
> > Thanks
> > Yu Wang
> > 
> > 
> > > 
> > > Thanks!
> 
> Can you still pass the temp build with vhost on and one virtqueue? if no,
> then can build 144 pass this?

Can you please test the temp build on all other tests, since i can't test this on my setup as it tends to always fail with BSOD, it may be caused by the newer kernel I am using.

Comment 15 Yu Wang 2018-07-25 07:32:51 UTC
Hi, 

>Can you please test the temp build on all other tests, since i can't test this on my setup as it tends to always fail with BSOD, it may be caused by the newer kernel I am using.

I will test this later.

I recently ran this case with build157, and it passed without a BSOD, but it showed "qemu-kvm: unable to start vhost net: 14: falling back on userspace virtio". It seems there is a bug when setting vhost=on, so I reported the bug below:

Bug 1608226 - [virtual-network] prompt warning "qemu-kvm: unable to start vhost net: 14: falling back on userspace virtio" when boot with win8+ guests 


Thanks
Yu Wang

Comment 16 Yu Wang 2018-07-25 09:56:47 UTC
Summary:

When booting the guest with a single queue and vhost=on: a BSOD occurred (tried on build156).

When booting with mq and vhost=on: the error "qemu-kvm: unable to start vhost net: 14: falling back on userspace virtio" occurred (it seems there is a bug when setting vhost=on), but the job can PASS (tried on build157).

For the tmp build tests, it can pass with vhost=off.

Thanks
Yu Wang
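The two vhost=on configurations being compared above can be written as qemu device options — a hedged fragment only, where queues=4 and vectors=10 are illustrative values (vectors is conventionally 2*queues+2), not taken from this report:

```shell
# Single queue, vhost=on (BSOD on build156 per the summary above):
-netdev tap,id=hostnet0,vhost=on \
-device virtio-net-pci,netdev=hostnet0

# Multiqueue, vhost=on ("falling back on userspace virtio" on build157):
# queues=4 and vectors=10 are illustrative values.
-netdev tap,id=hostnet1,vhost=on,queues=4 \
-device virtio-net-pci,netdev=hostnet1,mq=on,vectors=10
```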

Comment 17 Yu Wang 2018-07-25 10:07:52 UTC
(In reply to Sameeh Jubran from comment #14)

> 
> Can you please test the temp build on all other tests, since i can't test
> this on my setup as it tends to always fail with BSOD, it may be caused by
> the newer kernel I am using.

run all tests with multi-queue or single queue?


Thanks
Yu Wang

Comment 18 Sameeh Jubran 2018-07-25 10:59:07 UTC
(In reply to Yu Wang from comment #17)
> (In reply to Sameeh Jubran from comment #14)
> 
> > 
> > Can you please test the temp build on all other tests, since i can't test
> > this on my setup as it tends to always fail with BSOD, it may be caused by
> > the newer kernel I am using.
> 
> run all tests with multi-queue or single queue?
Multiqueue please
> 
> 
> Thanks
> Yu Wang

Comment 20 Sameeh Jubran 2018-07-25 22:54:28 UTC
For Win10 we have an errata
https://bugzilla.redhat.com/show_bug.cgi?id=1367251#c11

and for the test itself to pass the following should be done:

Mini6RSSSendRecv (Multi-Group Win8+) test
Right after the initial reboot on test initiation (Before the test itself starts!), enter the command prompt as the Administrator, and type:

bcdedit.exe /set groupaware off
bcdedit.exe /deletevalue groupsize
shutdown /r /t 0 /f

Comment 22 Sameeh Jubran 2018-07-27 13:23:07 UTC
(In reply to Sameeh Jubran from comment #20)
> For Win10 we have an errata
> https://bugzilla.redhat.com/show_bug.cgi?id=1367251#c11
> 
> and for the test itself to pass the following should be done:
> 
> Mini6RSSSendRecv (Multi-Group Win8+) test
> Right after the initial reboot on test initiation (Before the test itself
> starts!), enter the command prompt as the Administrator, and type:
> 
> bcdedit.exe /set groupaware off
> bcdedit.exe /deletevalue groupsize
> shutdown /r /t 0 /f

Thanks to Yu's help in reproducing the issue and testing possible fixes, I have identified the offending commit and opened a pull request:
https://github.com/virtio-win/kvm-guest-drivers-windows/pull/317

The fix should make it into the next build; I have already informed Vadim to add it.

Comment 23 Yu Wang 2018-07-31 09:44:08 UTC
Ran this job with build 159:

1. With 1 queue, vhost=on:
Passed on the first run.

2. With mq and vhost=on:
Passed on the second run; the first run hit a BSOD (7e and IMAGE_NAME: NDProt630.sys, same as comment#2).

Can this be counted as fixed?

Thanks
Yu Wang

Comment 24 Sameeh Jubran 2018-08-05 09:25:16 UTC
(In reply to Yu Wang from comment #23)
> Ran this job with build 159
> 
> 1 with 1 queue, vhost=on
> Pass at the first time.
> 
> 2 with mq and vhost=on:
> pass at the second time, the first time BSOD(7e and IMAGE_NAME: 
> NDProt630.sys , same as comment#2)
> 
> Can this be counted as fixed ?
> 
> Thanks
> Yu Wang

Can you please supply me with the BSOD dump?

And yes, let's count this as fixed for now, since we have already identified the offending commit. The new failure might be a different issue.

Comment 28 lijin 2018-09-17 02:21:26 UTC
Hi Danilo,

This bug also needs to be added to the RHEL 7.6 virtio-win errata; could you help to do it?

Thanks a lot

Comment 29 Danilo de Paula 2018-09-19 12:04:37 UTC
It's already there.

Comment 31 errata-xmlrpc 2018-10-30 16:21:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3413

