Bug 988351
Summary: | [virtio-win]win2012 failed to resume after doing s4 on rhel7 host | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | guo jiang <jguo> | ||||
Component: | qemu-kvm | Assignee: | ybendito | ||||
qemu-kvm sub component: | General | QA Contact: | FuXiangChun <xfu> | ||||
Status: | CLOSED WONTFIX | Docs Contact: | |||||
Severity: | medium | ||||||
Priority: | medium | CC: | ailan, areis, chayang, jsnow, juzhang, knoel, kraxel, kwolf, kzhang, lijin, marcandre.lureau, virt-bugs, virt-maint, ybendito, yvugenfi | ||||
Version: | 8.0 | ||||||
Target Milestone: | rc | ||||||
Target Release: | 8.0 | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Known Issue | |||||
Doc Text: |
Cause: Some releases of Windows above Win7/2008R2 with IDE and 4G+ memory do not set IDE bus master on resume from S4. Some releases (for example Win10 1903) have a fix, some others have hotfixes, and some (such as 2012 at the time of writing) do not. Further hotfixes may contain a solution for this problem.
Consequence: This causes resume from S4 to fail (immediate shutdown or stuck forever)
Workaround (if any): Use SeaBios build with ATA_DMA=y
Result:
|
Story Points: | --- | ||||
Clone Of: | Environment: | ||||||
Last Closed: | 2020-03-05 14:47:29 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
guo jiang
2013-07-25 11:37:39 UTC
Created attachment 778234 [details]
failed to resume after doing s4
Even without any virtio-win devices,win2012 still hit this issue with -m 4G; If I change -m to 2G,win2012 can s4/s3 and resume correctly.

package info:
kernel-3.10.0-53.el7.x86_64
qemu-kvm-rhev-1.5.3-19.el7.x86_64
seabios-1.7.2.2-4.el7.x86_64

following is the qemu-kvm command:
/usr/libexec/qemu-kvm -M pc -m 4G -smp 2,cores=2 -cpu Penryn -usb -device usb-tablet -drive file=win2012-balloon.qcow3,format=qcow2,if=none,id=drive0,boot=on,cache=none,werror=stop,rerror=stop -device ide-drive,drive=drive0,id=ide-blk-pci0,bootindex=1 -boot c -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -chardev socket,id=chardev1,path=/tmp/w2012-nic,server,nowait -mon chardev=chardev1,mode=readline -name win2012-balloon -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -spice disable-ticketing,port=5903 -vga qxl -global qxl-vga.revision=3 -monitor stdio -cdrom /usr/share/virtio-win/virtio-win.iso -netdev tap,id=hostnet1,script=/etc/qemu-ifup,downscript=no -device e1000,netdev=hostnet1,id=net1,mac=00:52:81:10:22:11

lijin,

(In reply to lijin from comment #6)
> Even without any virtio-win devices,win2012 still hit this issue with -m 4G;
> If I change -m to 2G,win2012 can s4/s3 and resume correctly.

Thanks for the analysis.
Can you also verify that it is not related to QCOW2v3

(In reply to Ronen Hod from comment #7)
> lijin,
>
> (In reply to lijin from comment #6)
> > Even without any virtio-win devices,win2012 still hit this issue with -m 4G;
> > If I change -m to 2G,win2012 can s4/s3 and resume correctly.
>
> Thanks for the analysis.
> Can you also verify that it is not related to QCOW2v3

retry with qcow2v3,qcow2 and raw images,all hit this issue.
Although this seems similar to https://bugzilla.redhat.com/show_bug.cgi?id=835872, it also happens with https://support.microsoft.com/en-us/kb/2822241 applied. It also happens with cache=none. The guest usually just stops responding upon resume from hibernation, but sometimes a BSOD happens (only a minidump is created, although a kernel dump is configured) with access to an invalid address during a memory copy operation. With 3G of memory it works OK.

Reproducible with Win10 and Win8.1 with a memory size of 4G (does not happen with 3G). What is worse, on Win8.1+ it also happens on shutdown when 'fast boot' is enabled. The reason is that in this case shutdown involves hibernation (if S4 is enabled), so the next boot after shutdown is unsuccessful.

The problem seems similar to https://bugzilla.redhat.com/show_bug.cgi?id=1411105 (that bug is for AHCI) but happens with the IDE(!!) controller. It is 100% reproducible even when the ATA channel of the HDD is configured to work in PIO mode and does not execute any DMA operations (hibernation takes a long time, but resume works only with 3G, not with 4G or 5G). The hibernation file is created and has a reasonable size (approximately as big as the memory size). A trace of the attempt to resume from hibernation does not contain any suspicious events. Win10 consistently skips the resume from hibernation after reading approx. 50K sectors (25M of data) from the hibernation file. Surprisingly, Win7 does resume from hibernation with 4G and 5G; in the logs there is no major difference from Win10 (Win7 does not use mult reads while Win10 does, but we see the same behavior if mult operations are suppressed in qemu).

It would be very helpful to have advice from the IDE maintainers: where to dig?

ybendito: Can you post some updated information about your case and what you're seeing?

- Is this i440fx or Q35?
- On AHCI or IDE? Both? Does it crash with virtio-blk? virtio-scsi?
- What does the crashing behavior look like? Is it a hang or a BSOD?
- What is your command line?
- What version(s) of QEMU are you testing with? If you can reproduce it using the upstream version, you can file a launchpad bug against the QEMU project to track it there.
- What version(s) of KVM?

(In reply to John Snow from comment #16)
> ybendito: Can you post some updated information about your case and what
> you're seeing?
>
> - Is this i440fx or Q35?

i440fx (default)

> - On AHCI or IDE? Both? Does it crash with virtio-blk? virtio-scsi?

IDE, i.e. the simplest setup, which does not involve additional drivers for the HDD

> - What does the crashing behavior look like? Is it a hang or a BSOD?

It refuses to start from hibernation and just shuts down after the attempt to resume; the event viewer contains a record of the failure to resume

> - What is your command line?
> - What version(s) of QEMU are you testing with? If you can reproduce it
> using the upstream version, you can file a launchpad bug against the QEMU
> project to track it there.

Upstream. Also downstream.

> - What version(s) of KVM?

What is the version of KVM? Last tried on kernel 4.11.3

I've investigated how upstream qemu behaves with several different releases of Windows:
Windows builds: Win10 1903, Win10 1803, Server 2012 with full updates (i.e. kb2822241 applied)
QEMU builds: selected builds from current upstream back to 2.12

I've found that on Win10 1903 with >= 4G RAM, resume from s4 works with all the QEMU builds.
I've found that on Windows 10 1803 and on 2012, under the same conditions, resume from s4 works only with SeaBios release 0.12.0.
This (probably) COULD be related to the fact that SeaBios release 0.12.0 for qemu was (probably) built with ATA_DMA=y, whereas the default is ATA_DMA=n.

Now the question: where is the bug? Is it that 1903 fixed a bug in Windows that was present for a long time, or does 1903 just include a workaround for a problem that exists in SeaBios?
(In reply to ybendito from comment #24)
> I've investigated how upstream qemu behaves with several different releases
> on Windows:
> Windows builds: Win10 1903, Win10 1803, Server 2012 with full updates (i.e.
> kb2822241 applied)
> QEMU builds: selected builds from current upstream back to 2.12
>
> I've found that on Win10 1903 hibernation with >= 4G RAM the resume from s4
> works with all the QEMU builds
> I've found that on Windows 10 1803 and on 2012 in the conditions resume from
> s4 work only with SeaBios release 0.12.0
> This (probably) COULD be related to the fact that SeaBios release 0.12.0 for
> qemu was (probably) built ATA_DMA=y when the default is ATA_DMA=n
>
> Now the question: where the bug is? Is it 1903 that fixed the bug in Windows
> that was present for long time or just 1903 includes a workaround for
> problem that exist in the SeaBios?

My guess would be that Windows versions older than 1903 didn't enable the busmaster bit in PCI config space before doing DMA. seabios does that with ATA_DMA=y, which probably serves as a workaround for the Windows bug.

I suggest closing this BZ as can't fix. If a solution is needed, it is probably possible to issue a SeaBios binary built with ATA_DMA=y.

QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Based on comment #29
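
For anyone who wants to check the busmaster guess from the reply to comment #24, here is a minimal sketch (not taken from this BZ) of how the bit can be decoded from a raw dump of PCI config space. It relies only on the standard PCI header layout: the 16-bit Command register sits at offset 0x04 and bit 2 is Bus Master Enable. The function names are made up for illustration, and how the config bytes are obtained (for example from a device's sysfs `config` file on a Linux system) is left as an assumption.

```python
import struct

# Standard PCI configuration header layout.
PCI_COMMAND_OFFSET = 0x04    # 16-bit Command register
PCI_COMMAND_IO = 0x0001      # bit 0: I/O Space Enable
PCI_COMMAND_MEMORY = 0x0002  # bit 1: Memory Space Enable
PCI_COMMAND_MASTER = 0x0004  # bit 2: Bus Master Enable


def read_command(config: bytes) -> int:
    """Return the little-endian 16-bit Command register from raw config space."""
    if len(config) < PCI_COMMAND_OFFSET + 2:
        raise ValueError("config space buffer is too short")
    (command,) = struct.unpack_from("<H", config, PCI_COMMAND_OFFSET)
    return command


def bus_master_enabled(config: bytes) -> bool:
    """True if the device is allowed to initiate DMA (Bus Master Enable set)."""
    return bool(read_command(config) & PCI_COMMAND_MASTER)


# Illustrative only: a synthetic 64-byte header, not data captured from this bug.
header = bytearray(64)
struct.pack_into("<H", header, PCI_COMMAND_OFFSET, PCI_COMMAND_IO | PCI_COMMAND_MEMORY)
print("bus master enabled:", bus_master_enabled(bytes(header)))
```

If the guess is right, the IDE controller's Command register would have this bit clear at the moment the Windows resume loader starts DMA, unless the firmware (SeaBios built with ATA_DMA=y) had already set it.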