Bug 988351

Summary: [virtio-win]win2012 failed to resume after doing s4 on rhel7 host
Product: Red Hat Enterprise Linux Advanced Virtualization Reporter: guo jiang <jguo>
Component: qemu-kvm Assignee: ybendito
qemu-kvm sub component: General QA Contact: FuXiangChun <xfu>
Status: CLOSED WONTFIX Docs Contact:
Severity: medium    
Priority: medium CC: ailan, areis, chayang, jsnow, juzhang, knoel, kraxel, kwolf, kzhang, lijin, marcandre.lureau, virt-bugs, virt-maint, ybendito, yvugenfi
Version: 8.0   
Target Milestone: rc   
Target Release: 8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Known Issue
Doc Text:
Cause: Some releases of Windows newer than Win7/2008R2, when running with IDE and 4G or more of memory, do not set the IDE bus master bit on resume from S4. Some releases (for example Win10 1903) include a fix, some others have hotfixes, and some (such as 2012 at the time of writing) do not. Further hotfixes may contain a solution for this problem. Consequence: Resume from S4 fails (immediate shutdown or stuck forever). Workaround (if any): Use a SeaBios build with ATA_DMA=y. Result:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-03-05 14:47:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
failed to resume after doing s4 none

Description guo jiang 2013-07-25 11:37:39 UTC
Description of problem:
A win2012-64 guest with a qcow2v3 format image failed to resume after doing s4; the guest is stuck at the boot screen without a BSOD (screenshot will be uploaded).

Version-Release number of selected component (if applicable):
   Red Hat Enterprise Linux Server release 7.0 Beta(Maipo)
   kernel-3.10.0-2.el7.x86_64 
   qemu-kvm-tools-1.5.1-2.el7.x86_64
   virtio-win-prewhql-0.1-65
   spice-server-0.12.3-1.el7.x86_64
   seabios-1.7.2-2.el7.x86_64
   vgabios-0.6c-9.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1.Boot guest with CLI:
/usr/libexec/qemu-kvm -M pc -m 4G -smp 4,cores=4 -cpu SandyBridge -usb -device usb-tablet -netdev tap,sndbuf=0,id=hostnet2,vhost=on,script=/etc/qemu-ifup,downscript=no -device virtio-net-pci,netdev=hostnet2,mac=00:32:15:12:56:a2,bus=pci.0,addr=0x6,id=virtio-net-pci0 -uuid a2c49844-967e-4e28-be8a-96e0153ff080 -chardev socket,id=aaaa,path=/tmp/monitor-win2012-netkvm,server,nowait -mon chardev=aaaa,mode=readline -name win2012-netkvm -vnc :2 -vga cirrus -enable-kvm -rtc base=localtime,clock=host,driftfix=slew -drive file=win2012.qcow2v3,if=none,id=drive-ide0-0-0,format=qcow2,rerror=stop,werror=stop,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -global kvm-pit.lost_tick_policy=discard -monitor stdio

2.Do s4

3.After qemu quit, reboot guest with the same CLI.

Actual results:
Guest failed to resume.

Expected results:
Guest could resume successfully.

Additional info:
 1.win2012 on rhel6 host is ok.
 2.win2k8 on rhel7 host is ok.

Comment 1 guo jiang 2013-07-25 11:38:37 UTC
Created attachment 778234 [details]
failed to resume after doing s4

Comment 6 lijin 2013-11-27 08:05:45 UTC
Even without any virtio-win devices,win2012 still hit this issue with -m 4G;
If I change -m to 2G,win2012 can s4/s3 and resume correctly.

package info:
    kernel-3.10.0-53.el7.x86_64
    qemu-kvm-rhev-1.5.3-19.el7.x86_64
    seabios-1.7.2.2-4.el7.x86_64

following is the qemu-kvm command:
/usr/libexec/qemu-kvm -M pc -m 4G -smp 2,cores=2 -cpu Penryn -usb -device usb-tablet -drive file=win2012-balloon.qcow3,format=qcow2,if=none,id=drive0,boot=on,cache=none,werror=stop,rerror=stop -device ide-drive,drive=drive0,id=ide-blk-pci0,bootindex=1 -boot c -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -chardev socket,id=chardev1,path=/tmp/w2012-nic,server,nowait -mon chardev=chardev1,mode=readline -name win2012-balloon -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -spice disable-ticketing,port=5903 -vga qxl -global qxl-vga.revision=3 -monitor stdio -cdrom /usr/share/virtio-win/virtio-win.iso -netdev tap,id=hostnet1,script=/etc/qemu-ifup,downscript=no -device e1000,netdev=hostnet1,id=net1,mac=00:52:81:10:22:11
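
The memory-size comparison above can be repeated systematically. The following is only a sketch: it prints one command line per memory size and launches nothing. The flags are taken from the command in this comment; the IMAGE default and the list of sizes are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: print a qemu-kvm command line per guest memory size so that
# the S4 test can be repeated by hand for each size. Nothing is launched.
# IMAGE and the size list below are illustrative assumptions.
IMAGE="${IMAGE:-win2012-balloon.qcow3}"

s4_cmdline() {
    # $1 = guest memory size, e.g. 2G or 4G
    printf '%s\n' "/usr/libexec/qemu-kvm -M pc -m $1 -smp 2,cores=2 -cpu Penryn -drive file=$IMAGE,format=qcow2,if=none,id=drive0,cache=none,werror=stop,rerror=stop -device ide-drive,drive=drive0,id=ide-blk-pci0,bootindex=1 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -monitor stdio"
}

for mem in 2G 3G 4G 5G; do
    s4_cmdline "$mem"
done
```

Keeping every other option fixed and changing only -m makes it easier to attribute a failed resume to the memory size alone.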

Comment 7 Ronen Hod 2013-11-27 11:38:04 UTC
lijin,

(In reply to lijin from comment #6)
> Even without any virtio-win devices,win2012 still hit this issue with -m 4G;
> If I change -m to 2G,win2012 can s4/s3 and resume correctly.

Thanks for the analysis.
Can you also verify that it is not related to QCOW2v3?

Comment 8 lijin 2013-11-27 23:31:31 UTC
(In reply to Ronen Hod from comment #7)
> lijin,
> 
> (In reply to lijin from comment #6)
> > Even without any virtio-win devices,win2012 still hit this issue with -m 4G;
> > If I change -m to 2G,win2012 can s4/s3 and resume correctly.
> 
> Thanks for the analysis.
> Can you also verify that it is not related to QCOW2v3

Retried with qcow2v3, qcow2 and raw images; all hit this issue.

Comment 12 ybendito 2016-11-21 11:59:46 UTC
Although this seems similar to https://bugzilla.redhat.com/show_bug.cgi?id=835872, it
also happens with https://support.microsoft.com/en-us/kb/2822241 applied.

It also happens with cache=none. Usually the guest just stops responding upon resume from hibernation, but sometimes a BSOD happens (only a minidump is created, although a kernel dump is configured) with an access to an invalid address during a memory copy operation.

With 3G memory it works OK.

Comment 13 ybendito 2018-07-01 09:06:57 UTC
Reproducible with Win10 and Win8.1 with a memory size of 4G (does not happen with 3G). What is worse, on Win8.1+ it also happens on shutdown when 'fast boot' is enabled. The reason is that in this case shutdown involves hibernation (if S4 is enabled), so the next boot after shutdown is unsuccessful.
The problem seems similar to https://bugzilla.redhat.com/show_bug.cgi?id=1411105 (that bug is for AHCI) but happens with the IDE(!!) controller.

Comment 14 ybendito 2018-07-09 07:05:19 UTC
Reproducible 100% even when the ATA channel of the HDD is configured to work with PIO and does not execute any DMA operations (hibernation takes a long time, but resume works only with 3G, not with 4G or 5G). The hibernation file is created and has a reasonable size (approximately as big as the memory size).
Tracing of the attempt to resume from hibernation does not show any suspicious events.
Win10 consistently skips the resume from hibernation after reading from the hibernation file approx. 50K sectors (25M of the data).
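
As a quick check of the sector figure above, assuming 512-byte sectors (the sector size is not stated in this comment), a minimal sketch:

```shell
#!/bin/sh
# Assumption: 512-byte sectors.
sectors_to_bytes() {
    # $1 = number of sectors
    echo $(( $1 * 512 ))
}

sectors_to_bytes 50000   # 25600000 bytes, about 25M
sectors_to_bytes 51200   # 26214400 bytes, exactly 25 MiB (50 * 1024 sectors)
```

Either reading of "50K" is consistent with the roughly 25M of data mentioned.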

Surprisingly, Win7 does resume from hibernation with 4G and 5G; in the logs there is no major difference from Win10 (Win7 does not use mult reads while Win10 does, but we see the same behavior if mult operations are suppressed in qemu).

It would be very helpful to have some advice from the IDE maintainers: where to dig?

Comment 16 John Snow 2018-07-10 22:42:56 UTC
ybendito: Can you post some updated information about your case and what you're seeing?

- Is this i440fx or Q35?
- On AHCI or IDE? Both? Does it crash with virtio-blk? virtio-scsi?
- What does the crashing behavior look like? Is it a hang or a BSOD?
- What is your command line?
- What version(s) of QEMU are you testing with? If you can reproduce it using the upstream version, you can file a launchpad bug against the QEMU project to track it there.
- What version(s) of KVM?

Comment 19 ybendito 2018-07-11 07:10:17 UTC
(In reply to John Snow from comment #16)
> ybendito: Can you post some updated information about your case and what
> you're seeing?
> 
> - Is this i440fx or Q35?
i440fx (default)
> - On AHCI or IDE? Both? Does it crash with virtio-blk? virtio-scsi?
IDE, i.e. simplest setup which does not involve additional drivers for HDD
> - What does the crashing behavior look like? Is it a hang or a BSOD?
Refuses to start from hibernation and just shuts down after the attempt to resume; the event viewer contains a record of the failure to resume
> - What is your command line?
> - What version(s) of QEMU are you testing with? If you can reproduce it
> using the upstream version, you can file a launchpad bug against the QEMU
> project to track it there.
Upstream. Also downstream.
> - What version(s) of KVM?
What is the version of KVM? Last tried on kernel 4.11.3

Comment 24 ybendito 2019-06-15 10:44:15 UTC
I've investigated how upstream qemu behaves with several different releases of Windows:
Windows builds: Win10 1903, Win10 1803, Server 2012 with full updates (i.e. kb2822241 applied)
QEMU builds: selected builds from current upstream back to 2.12

I've found that on Win10 1903 with >= 4G RAM the resume from s4 works with all the QEMU builds.
I've found that on Windows 10 1803 and on 2012 under the same conditions the resume from s4 works only with SeaBios release 0.12.0.
This (probably) COULD be related to the fact that SeaBios release 0.12.0 for qemu was (probably) built with ATA_DMA=y, whereas the default is ATA_DMA=n.

Now the question: where is the bug? Did 1903 fix a bug in Windows that was present for a long time, or does 1903 just include a workaround for a problem that exists in SeaBios?

Comment 27 Gerd Hoffmann 2019-06-18 10:50:36 UTC
(In reply to ybendito from comment #24)
> I've investigated how upstream qemu behaves with several different releases
> on Windows:
> Windows builds: Win10 1903, Win10 1803, Server 2012 with full updates (i.e.
> kb2822241 applied)
> QEMU builds: selected builds from current upstream back to 2.12
> 
> I've found that on Win10 1903 hibernation with >= 4G RAM the resume from s4
> works with all the QEMU builds
> I've found that on Windows 10 1803 and on 2012 in the conditions resume from
> s4 work only with SeaBios release 0.12.0
> This (probably) COULD be related to the fact that SeaBios release 0.12.0 for
> qemu was (probably) built ATA_DMA=y when the default is ATA_DMA=n
> 
> Now the question: where the bug is? Is it 1903 that fixed the bug in Windows
> that was present for long time or just 1903 includes a workaround for
> problem that exist in the SeaBios?

My guess would be windows versions older than 1903 didn't enable the busmaster
bit in pci config space before doing dma.  seabios does that with ATA_DMA=y,
which probably serves as workaround for the windows bug.
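
For reference, the bus master enable bit is bit 2 (mask 0x4) of the 16-bit PCI command register at config-space offset 0x04. The sketch below only decodes a command register value that has already been read by some other means; the function name and the sample values are illustrative assumptions, not output captured from this guest.

```shell
#!/bin/sh
# Bit 2 (mask 0x4) of the PCI command register is Bus Master Enable.
bus_master_enabled() {
    # $1 = command register value, e.g. 0x0007
    [ $(( $1 & 0x4 )) -ne 0 ]
}

# Hypothetical sample values: 0x0003 = I/O + memory decoding only,
# 0x0007 = I/O + memory decoding + bus master.
for v in 0x0003 0x0007; do
    if bus_master_enabled "$v"; then
        echo "$v: bus master enabled"
    else
        echo "$v: bus master disabled"
    fi
done
```

If the guess above is right, the IDE controller would show the bit cleared at the point where Windows starts DMA during resume, unless the firmware had set it beforehand.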

Comment 29 ybendito 2019-06-27 05:24:54 UTC
I suggest closing this BZ as can't fix.
If a solution is needed, it is probably possible to issue a SeaBios binary built with ATA_DMA=y.
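
A minimal sketch for checking whether a given SeaBIOS build configuration has the option turned on. It assumes the option appears in the build's .config as CONFIG_ATA_DMA=y, following the usual Kconfig naming; the helper name and the throwaway file are for illustration only.

```shell
#!/bin/sh
# Assumption: the SeaBIOS .config spells the option CONFIG_ATA_DMA=y.
seabios_has_ata_dma() {
    # $1 = path to a SeaBIOS .config file
    grep -q '^CONFIG_ATA_DMA=y$' "$1"
}

# Demonstration on a throwaway file standing in for a real .config.
cfg=$(mktemp)
printf 'CONFIG_ATA=y\nCONFIG_ATA_DMA=y\n' > "$cfg"
if seabios_has_ata_dma "$cfg"; then
    echo "ATA DMA enabled in this config"
else
    echo "ATA DMA not enabled in this config"
fi
rm -f "$cfg"
```

A binary built from such a configuration could then be tried with QEMU's -bios option in place of the packaged SeaBIOS.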

Comment 30 Ademar Reis 2020-02-05 22:40:11 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 31 Yvugenfi@redhat.com 2020-03-05 14:47:29 UTC
Based on comment #29