Bug 769163

Summary: [WHQL]Lots of jobs failed due to "unexpected reboot"
Product: Red Hat Enterprise Linux 6
Reporter: Mike Cao <bcao>
Component: virtio-win
Assignee: Vadim Rozenfeld <vrozenfe>
Status: CLOSED WORKSFORME
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: medium
Version: 6.3
CC: acathrow, bcao, bsarathy, dawu, gleb, jasowang, lijin, mdeng, michen, rhod
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-07-10 13:58:53 UTC
Type: ---

Description Mike Cao 2011-12-20 06:21:03 UTC
Description of problem:


Version-Release number of selected component (if applicable):
virtio-win-prewhql-0.1.20


How reproducible:
sometimes 

Steps to Reproduce:
1. win2k8R2 - CHAOS - Concurrent Hardware And OS Test (278)
2. win2k8_64 - Common Scenario Stress With IO (676)
3. win2k8_32 - Sleep_Stress_With_IO (672)
4. Win7_64 - CHAOS - Concurrent Hardware And OS Test (278)
  
Actual results:
The jobs failed, and their logs are similar:

Root Cause 
2 12/19/2011 6:26:39 PM Execution Agent 20-7-BLK Task Cancelled Because of an Unexpected Reboot

Machine Rebooted Unexpectedly when Task "RunJob - Pnpdtest with concurrent IO in parallel with DevPathExer - Library Job" was running

 
Resolution 
This computer is found to have had an unexpected reboot. None of the tasks that were running before the reboot requested a reboot. 
The job will be cancelled as a result.  



Additional info:
1. This is not a regression or a test blocker.
2. This issue is hit in every test cycle.
3. It cannot be reproduced 100% of the time.

Comment 2 Mike Cao 2011-12-20 08:41:41 UTC
Job Common Scenario Stress With DiskIO (627) for win2k3_32 also failed with the same log.

Comment 3 Vadim Rozenfeld 2011-12-20 13:16:15 UTC
(In reply to comment #2)
>  job Common Scenario Stress With DiskIO(627) for win2k3_32 also failed with the
> same log

Hi Mike,

Could you please upload log files?
Btw, does it happen with viostor driver only?

Best regards,
Vadim.

Comment 4 Mike Cao 2011-12-21 05:57:37 UTC
(In reply to comment #3)
> (In reply to comment #2)
> >  job Common Scenario Stress With DiskIO(627) for win2k3_32 also failed with the
> > same log
> 
> Hi Mike,
> 
> Could you please upload log files?
> Btw, does it happen with viostor driver only?
> 
> Best regards,
> Vadim.

Hi, Vadim

I checked all the failed jobs; there are no task logs for any of them.
I also checked the historical test results and found that win2k8-32 NDIS Test 6.5 (MPE) hit this issue as well, with virtio-win-prewhql-0-1-17.

Best Regards,
Mike

Comment 5 Mike Cao 2011-12-28 06:12:42 UTC
Hit this issue when a win2k8 32-bit guest was running Sleep Stress With IO during the vioserial WHQL test.
However, the failed job was deleted by mistake. I will provide the CPK file the next time I hit this issue.

Comment 6 Mike Cao 2011-12-29 01:45:52 UTC
Created attachment 549888
win2k8_32_unexpected_reboot error

The attachment is the CPK file from a win2k8 32-bit guest running the balloon driver tests; the Sleep Stress With IO job hit this issue.
A win2k8 64-bit guest hit this issue too.

Comment 7 Vadim Rozenfeld 2012-01-05 09:18:56 UTC
Looks like this problem is not related to any particular virtio driver,
but a generic problem mostly related to QEMU/DTM settings or whatever else.

But we need to figure out what's wrong here before we start
intensive WHQL testing.

Vadim.

Comment 8 Mike Cao 2012-02-08 05:35:10 UTC
(In reply to comment #7)
> Looks like this problem is not related to any particular virtio driver,
> but a generic problem mostly related to QEMU/DTM settings or whatever else.
> 
> But we need to figure out what's wrong here before we start
> intensive WHQL testing.
> 
> Vadim.

FYI, this bug might be related to bug https://bugzilla.redhat.com/show_bug.cgi?id=607510

Comment 9 Vadim Rozenfeld 2012-02-08 08:00:19 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > Looks like this problem is not related to any particular virtio driver,
> > but a generic problem mostly related to QEMU/DTM settings or whatever else.
> > 
> > But we need to figure out what's wrong here before we start
> > intensive WHQL testing.
> > 
> > Vadim.
> 
> FYI, this bug might related to bug
> https://bugzilla.redhat.com/show_bug.cgi?id=607510

Thank you.

Will keep my eyes open on this bug.

Vadim.

Comment 15 Ronen Hod 2012-07-10 09:13:56 UTC
Mike,

Is this still an issue?
Does it also happen with HCK?

Thanks.

Comment 16 Mike Cao 2012-07-10 13:50:48 UTC
(In reply to comment #15)
> Mike,
> 
> Is this still an issue?
> Does it also happen with HCK?
> 
> Thanks.

No. Shall we close this as WORKSFORME?

Thanks,
Mike

Comment 17 Mike Cao 2012-08-20 03:09:34 UTC
FYI, I found that this is a known Microsoft issue. For details, refer to http://msdn.microsoft.com/en-us/library/windows/hardware/hh852378.aspx#X-201205241641326
Power management tests may fail with "Task Cancelled because of an Unexpected Reboot" Error. This error applies to the following tests:

Device Tests:
  - Sleep and PNP (disable and enable) with IO Before and After (Certification)
  - Concurrent Hardware And Operating System (CHAOS) Test (Certification)
System Tests:
  - System - Sleep and PNP (disable and enable) with IO Before and After (Certification)
  - System - Sleep with IO Before and After (Certification)
Software Device (Filter) Tests:
  - Sleep and PNP (disable and enable) with IO Before and After (Certification - Software Device)
These tests may not produce any logs, and right-clicking the "Run Test" task and selecting the "Error" option may show the following:

Cause: Machine Rebooted Unexpected when Task "Run Test" was running

Failure: Task Cancelled Because of an Unexpected Reboot

If you see this error, please review event logs under "Windows Logs->System" from Event Viewer. If event logs contain the following log entry, then you may qualify for a manual errata:

"Windows failed to resume from hibernate with error status <status code>"

Please ensure that the log entry is from the precise time when the test was running.

After confirming the exact error string in event logs mentioned above, and confirming the timing of the log entry matches with when the test was running, please submit the HCKX file and event logs so we can manually confirm the same, and accept your submission for processing. Please refer to this note in the README file with the submission.
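The check described above (find the exact "failed to resume from hibernate" string, then confirm the entry falls inside the test run) can be sketched as a small filter. This is a minimal sketch, assuming the System log entries have already been exported from Event Viewer and parsed into (timestamp, message) pairs; that export/parsing step, the function name `matching_resume_failures`, and the sample status codes are hypothetical and not part of any Microsoft tool.

```python
from datetime import datetime
from typing import Iterable, List, Tuple

# Marker text quoted in the known-issue note; the trailing <status code> varies.
HIBERNATE_RESUME_FAILURE = "Windows failed to resume from hibernate with error status"

# Hypothetical shape of one exported System log entry: (time logged, message).
EventEntry = Tuple[datetime, str]


def matching_resume_failures(
    entries: Iterable[EventEntry], test_start: datetime, test_end: datetime
) -> List[EventEntry]:
    """Return entries that contain the resume-failure string AND were logged
    inside the test window [test_start, test_end] (inclusive)."""
    return [
        (ts, msg)
        for ts, msg in entries
        if HIBERNATE_RESUME_FAILURE in msg and test_start <= ts <= test_end
    ]


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    start = datetime(2012, 1, 1, 10, 0)
    end = datetime(2012, 1, 1, 11, 0)
    sample = [
        (datetime(2012, 1, 1, 10, 30), HIBERNATE_RESUME_FAILURE + " 0xC0000001"),
        (datetime(2012, 1, 1, 12, 0), HIBERNATE_RESUME_FAILURE + " 0xC0000001"),
        (datetime(2012, 1, 1, 10, 45), "Some unrelated event"),
    ]
    hits = matching_resume_failures(sample, start, end)
    if hits:
        print("Known issue may apply; submit HCKX file and event logs:", hits)
    else:
        print("No matching entry; rerun with a kernel debugger attached.")
```

An empty result corresponds to the case in the IMPORTANT note: the known issue does not apply, and the run should be repeated with a kernel debugger attached to catch a possible bugcheck.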

IMPORTANT: "Task Cancelled Because Of An Unexpected Reboot" error is also seen when the system bugchecks during the test run. This note does NOT apply to the case when the system bugchecks during the test run. If you do not see "Windows failed to resume from hibernate" message in event logs, please rerun the tests with the test system connected to a Kernel Debugger. This will cause the system to break into the debugger if your test system bugchecks during the test run.