Bug 695053

Summary: [Whql] Job of "Ethernet-NDISTest6.5(MPE)" always failed with BSOD happened on win7 and win2k8
Product: Red Hat Enterprise Linux 6
Reporter: dawu
Component: virtio-win
Assignee: Yvugenfi <yvugenfi>
Status: CLOSED CANTFIX
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Priority: medium
Version: 6.2
CC: afrenkel, bcao, juzhang, llim, mdeng, michen, mshao, qzhang, rhod, tburke, vrozenfe, yvugenfi
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2012-01-08 18:52:11 UTC
Bug Blocks: 580951
Attachments:
  Description                                         Flags
  win7-64-nic-MPE-BSOD-7E                             none
  Tool's log                                          none
  Standby                                             none
  Tools log without running job (NDIS Test6.5 MPE)    none
  Hibernate                                           none

Comment 1 dawu 2011-04-10 08:20:20 UTC
We ran this job three times each on win7-32 and win7-64 and uploaded a dump
file from every run, because we are not sure the dump contents are identical
across runs even though the error code is the same.

Best Regards,
Dawn

Comment 7 dawu 2011-04-11 08:16:20 UTC
Retested the MPE job on fresh win2k8-64 and win7-32 images. However, when the job was almost finished, the sub-job "Start NDISTest Server" was cancelled unexpectedly without any prompt or error, as if the job had been stopped automatically.

We will retest on freshly installed images and update this bug with the result.

Best Regards,
Dawn

Comment 8 Qunfang Zhang 2011-04-12 06:38:37 UTC
(In reply to comment #7)
> Retest on fresh win2k8-64 and win7-32 images for MPE job, however, when job
> almost finished, sub-job of "Start NDISTest Server" was cancelled unexpectedly
> without any prompt and error,just like job was stopped automatically.
> 
> We will retest on fresh installed image and update this result.

Still failed with the same symptom.
We are now retesting per Yan and Yuri's suggestion, with dbgview running
in the background.
> 
> Best Regards,
> Dawn

Comment 9 Vadim Rozenfeld 2011-04-12 06:54:34 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > Retest on fresh win2k8-64 and win7-32 images for MPE job, however, when job
> > almost finished, sub-job of "Start NDISTest Server" was cancelled unexpectedly
> > without any prompt and error,just like job was stopped automatically.
> > 
> > We will retest on fresh installed image and update this result.
> 
> Still failed with the same phenomenon.
> Now we are re-testing with Yan and Yuri's suggestion, re-run with dbgview
> running 
> on the back ground.

What about the VM state in DTM Studio? Did it switch into "Debug" mode, or does it still remain "Ready"?
Best regards,
Vadim.
> > 
> > Best Regards,
> > Dawn

Comment 10 Qunfang Zhang 2011-04-12 07:31:10 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #7)
> > > Retest on fresh win2k8-64 and win7-32 images for MPE job, however, when job
> > > almost finished, sub-job of "Start NDISTest Server" was cancelled unexpectedly
> > > without any prompt and error,just like job was stopped automatically.
> > > 
> > > We will retest on fresh installed image and update this result.
> > 
> > Still failed with the same phenomenon.
> > Now we are re-testing with Yan and Yuri's suggestion, re-run with dbgview
> > running 
> > on the back ground.
> 
> What about the VM state in DTM Studio. Was it switched into "Debug" mode or it
> still remains "Ready"?

The client VM changed to the "Debug" state; the server VM is still "Ready".
> Best regards,
> Vadim.
> > > 
> > > Best Regards,
> > > Dawn

Comment 12 Dor Laor 2011-04-21 11:12:31 UTC
Can we close this bug, since the WHQL submission is underway and all the BSODs are assumed to fall under the errata?

Comment 13 Qunfang Zhang 2011-04-22 08:48:38 UTC
(In reply to comment #12)
> Can we close this bug since whql submission is underway and all the bsod are
> assumed to fall under the errata

Hi, Dor
I cleared the Regression keyword and set the flag to rhel6.2.0.
After several retests, we got one BSOD that falls under the MS errata, but many of the other BSODs are not covered by it.
How about tracking the other dumps (not MS errata) in this BZ?
After the RHEL6.1 driver is pushed out, we will re-install WLK1.6 and re-run the test to check the following points, as Yan suggested:
(1) Whether MS has fixed their own issues.
(2) Whether we can behave better under extreme stress on the system, in which case MS may not hit its own race conditions.

Yan, what's your opinion?

Thanks.

Comment 14 Yvugenfi@redhat.com 2011-04-23 20:35:15 UTC
I agree. Let's keep this BZ (or open another) to track WLK1.6 and the efforts on driver behavior under stress.

Comment 15 Ronen Hod 2011-04-25 07:17:51 UTC
Although it is always desirable to behave better during extreme stress, I believe that we do not need to work on it. Extreme stress happens, and MS needs to solve the problem in their test program.

Comment 28 Mike Cao 2011-08-01 02:25:09 UTC
Tried once with a Windows 2008 64-bit guest.
No BSOD happened, but the child job "Start NDISTEST Server" took 3 days to execute and then failed.

I will wait for it to complete and resubmit the job one more time.

additional info:
error log:
Root Cause
The Execute Task with Commandline

cmd /c ndistest.exe /logo /auto /server /support:{CCC14B76-EDF3-4AB7-B086-7DD19A40AF77} /msg:{7B29E046-4C24-4321-80C4-52C1DA643E5F} /jobs:server.cpp 

Failed with ExitCode c000013a (STATUS_CONTROL_C_EXIT The application terminated as a result of a CTRL+C.)

 
Resolution 
The task exited with an ExitCode other than the ExpectedTaskExitCode. This may cause the Task to Fail if it is set to Fail On Exit Code
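
For readers unfamiliar with the exit code in the log above, the following is a minimal sketch of how an NTSTATUS value such as c000013a breaks down into its fields. The function name `decode_ntstatus` and the one-entry lookup table are illustrative assumptions, not part of DTM or NDISTest; only the bit layout and the STATUS_CONTROL_C_EXIT value come from the NTSTATUS definition and the log itself.

```python
# Decode an NTSTATUS value such as the exit code reported in the log.
# Bit layout: severity in bits 31-30, customer flag in bit 29,
# facility in bits 27-16, status code in bits 15-0.

SEVERITY_NAMES = {0: "SUCCESS", 1: "INFORMATIONAL", 2: "WARNING", 3: "ERROR"}

# Illustrative lookup table (hypothetical helper); it holds only the
# value that appears in this bug report.
KNOWN_STATUS = {0xC000013A: "STATUS_CONTROL_C_EXIT"}


def decode_ntstatus(value):
    """Split a 32-bit NTSTATUS value into its named fields."""
    value &= 0xFFFFFFFF
    return {
        "severity": SEVERITY_NAMES[(value >> 30) & 0x3],
        "customer": bool((value >> 29) & 0x1),
        "facility": (value >> 16) & 0xFFF,
        "code": value & 0xFFFF,
        "name": KNOWN_STATUS.get(value, "unknown"),
    }


# The exit code is printed in hex without a prefix in the task log.
result = decode_ntstatus(int("c000013a", 16))
print(result)
```

The error severity with facility 0 indicates a generic system status rather than a driver-specific one, which is consistent with the task having been terminated from outside rather than having crashed.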

thanks,
Mike

Comment 29 Mike Cao 2011-08-01 02:36:52 UTC
(In reply to comment #28)
> Tried 1 time with windows 2008 64 bit guest ,
> no BSOD happened ,but it takes 3 days to execute childjob "Start NDISTEST
> Server" .While it is failed. 
> 
> I will wait for it completed and resubmit job one more time .
> 
> additional info:
> error log:
> Root Cause
> The Execute Task with Commandline
> 
> cmd /c ndistest.exe /logo /auto /server
> /support:{CCC14B76-EDF3-4AB7-B086-7DD19A40AF77}
> /msg:{7B29E046-4C24-4321-80C4-52C1DA643E5F} /jobs:server.cpp 
> 
> Failed with ExitCode c000013a (STATUS_CONTROL_C_EXIT The application terminated
> as a result of a CTRL+C.)
> 
> 
> Resolution 
> The task exited with an ExitCode other than the ExpectedTaskExitCode. This may
> cause the Task to Fail if it is set to Fail On Exit Code
> 
> thanks,
> Mike
The job completed; it shows that the "Ethernet-NDIStest6.5(MPE)" job passed.

Comment 49 Min Deng 2011-09-20 02:28:23 UTC
Created attachment 523940 [details]
Tool's log

Comment 50 Min Deng 2011-09-20 02:45:41 UTC
Created attachment 523942 [details]
Standby

It's for the first command.

Comment 51 Min Deng 2011-09-20 05:47:45 UTC
Created attachment 523953 [details]
Tools log without running job (NDIS Test6.5 MPE)

Hi Yan,

   I collected two logs for you on the Windows 2008 R2 guests; no MPE job was running on them. If there are any issues, please let me know.

Best regards,
Min

Comment 52 Min Deng 2011-09-20 07:00:01 UTC
(In reply to comment #50)
> Created attachment 523942 [details]
> Standby
> 
> It's for the first command.

  Attaching the second log, for the second command.

Comment 53 Min Deng 2011-09-20 07:02:01 UTC
Created attachment 523960 [details]
Hibernate

Attaching the second (Hibernate) log, for the second command.

Comment 54 Yvugenfi@redhat.com 2011-09-20 08:42:49 UTC
Thank you!

Comment 55 Yvugenfi@redhat.com 2011-09-20 09:07:26 UTC
Please run "xbootmgr.exe -remove"; otherwise the tool will stay active on the guests and might affect performance.

Comment 56 Dor Laor 2011-09-27 09:40:36 UTC
Moving to 6.3. From recent reports we understand that this does not always happen, and sometimes we do pass. Thus this is not a blocker and can be moved to 6.3.

Comment 62 Mike Cao 2011-10-11 04:58:58 UTC
(In reply to comment #60)
> Can you verify that the time on the test VMs is synchronized with the controller
> and host machine?

I am sure that before testing, the clocks on all hosts and guests were the same. For the DHCP+AD server in the private environment, I set the clock manually, and
all of them are in UTC+8.

Comment 65 Mike Cao 2011-12-20 06:54:42 UTC
Moving back to ASSIGNED since we still got BSODs during runs.

Comment 66 Yvugenfi@redhat.com 2011-12-20 13:33:25 UTC
Continuing the discussion with MS regarding the errata / MS providing a new, fixed test driver.

Comment 68 Min Deng 2011-12-27 08:22:26 UTC
Thanks for the developers' effort. QE can still reproduce the issue on Windows 2008 32-bit, and it has never passed so far.

Comment 69 Min Deng 2011-12-27 08:23:19 UTC
(In reply to comment #68)
> Thanks for developer's effort,QE still can reproduce the issue on windows 2008
> 32 bits and it never passed so far.
With the virtio-win-prewhql-0.1-20 driver, thanks.

Comment 70 Mike Cao 2011-12-27 09:24:56 UTC
FYI:

with -16, all the guests could pass the MPE job
with -17, all the guests could not pass the MPE job
with -20, the win2k8-32 guest never passes the MPE job

Were there any changes between virtio-win-prewhql -16, -17 and -20?

Comment 71 Yvugenfi@redhat.com 2011-12-27 09:31:01 UTC
(In reply to comment #70)
> FYI。
> 
> with -16 ,All the guests could pass MPE job
> with -17 ,All the guests could not pass MPE job
> with  -20 ,win2k8-32 guest never pass MPE job
> 
> Any change between virtio-win-prewhql -16  -17 and -20 ?

 There are several changes, and we must check the dump files very carefully. We introduced two features (published indices and RX IP checksum offload) and also a cosmetic change (describing the device as 10G) that can still affect internal Windows mechanisms.

Comment 72 Yvugenfi@redhat.com 2011-12-27 09:32:14 UTC
(In reply to comment #68)
> Thanks for developer's effort,QE still can reproduce the issue on windows 2008
> 32 bits and it never passed so far.

Please provide the full list of all failed guests and the dump files for each failure.

Thanks.

Comment 75 Mike Cao 2011-12-31 03:28:27 UTC
(In reply to comment #74)

> Bottom line:
> Nothing special for published events for now.
> No new problems observed.
> win2k8 32bit, that “never passes” – let’s give more attempts + collect dumps.

Hi, Yan 

You are right. 1 of 5 job runs passed (it is very hard to pass).
I have resubmitted another 10 win2k8-32 MPE jobs to see what happens.

Best Regards,
Mike
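
As a rough illustration of what the 10 resubmitted jobs might show, here is a small sketch that treats each run as an independent trial with the observed 1-in-5 pass rate. Both the independence of runs and the use of 1/5 as the true pass rate are assumptions (five runs is a very small sample), and the function name is hypothetical.

```python
# Rough estimate for an intermittently passing job.
# Assumptions (not from the bug data): runs are independent, and the
# true pass rate equals the observed 1 pass out of 5 runs.


def prob_at_least_one_pass(pass_rate, runs):
    """Probability that at least one of `runs` independent runs passes."""
    return 1.0 - (1.0 - pass_rate) ** runs


observed_rate = 1 / 5   # 1 pass in 5 runs, as reported above
resubmitted_runs = 10   # number of jobs resubmitted

expected_passes = observed_rate * resubmitted_runs
p_any = prob_at_least_one_pass(observed_rate, resubmitted_runs)

print(f"expected passes: {expected_passes:.1f}")
print(f"P(at least one pass): {p_any:.3f}")
```

Under these assumptions, seeing zero passes in the 10 new runs would be unusual but not impossible, so a single batch is weak evidence either way.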

Comment 76 Ronen Hod 2012-01-08 18:52:11 UTC
We have an errata (2101).
The bug does not reproduce with ndprot62.sys that was built by MSFT from their current code (not officially released).
Enough is enough, closing.