Bug 808463
Summary: | document that transient hotplugged devices don't persist through s4 state | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Dave Allan <dallan> |
Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 6.4 | CC: | ajia, amit.shah, cwei, dyuan, gsun, imammedo, jdenemar, mkletzan, mprivozn, mzhan, rbalakri, shyu, weizhan, xuzhang, zhwang |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | libvirt-0.10.2-36.el6 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2014-10-14 04:13:57 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 766958, 808680, 827499, 839661 | ||
Bug Blocks: | 912287 |
Description
Dave Allan
2012-03-30 13:27:04 UTC
(In reply to comment #0) > If a user hotplugs a device into a VM without specifying persistent, the device > will disappear if the guest enters s4. From libvirt's perspective, that makes > sense--the hypervisor has exited, but from the user's perspective, it's likely > to be surprising behavior. It's also wrong from the guest (at least Linux) perspective: resuming from S4 is not supported over changed hardware. I understand why this is problem, and even know what is causing it: guest enters s4 -> qemu emits STOP event (no difference to regular shutdown) -> libvirt releases all transient devices and return them tho the host. On the next boot libvirt won't attach the same transient devices as transient devices aren't supposed to survive shutdown. So I guess we have two options here: 1) document as known limitation (if qemu decides to not publicize S4 state) 2) if qemu decides to emit an s4 event -not trivial and IMO impossible for all cases, but still may be possible for some - we can fix this for those cases. Therefore I am suggesting to wait for a while and see how qemu decides/deal with this so we can choose our best possibility. See bug 808680 for more details. (In reply to comment #6) > So I guess we have two options here: > 1) document as known limitation (if qemu decides to not publicize S4 state) > > 2) if qemu decides to emit an s4 event -not trivial and IMO impossible for > all cases, but still may be possible for some - we can fix this for those > cases. > > Therefore I am suggesting to wait for a while and see how qemu decides/deal > with this so we can choose our best possibility. See bug 808680 for more > details. 3rd option would be to use options added by: qemu upstream fix that will allow to enable/disable S3/S4 on startup commit 459ae5ea5ad682c2b3220beb244d4102c1a4e332 And if guest started with s4 support then libvirt should make persistent changes to cpu count on hotlug. If guest started without s4 then allow non-persistent cpu count. But IMHO persistent cpu count for all cases would be better. (In reply to comment #8) > (In reply to comment #6) > > So I guess we have two options here: > > 1) document as known limitation (if qemu decides to not publicize S4 state) > > > > 2) if qemu decides to emit an s4 event -not trivial and IMO impossible for > > all cases, but still may be possible for some - we can fix this for those > > cases. > > > > Therefore I am suggesting to wait for a while and see how qemu decides/deal > > with this so we can choose our best possibility. See bug 808680 for more > > details. > > 3rd option would be to use options added by: > > qemu upstream fix that will allow to enable/disable S3/S4 on startup > commit 459ae5ea5ad682c2b3220beb244d4102c1a4e332 > > And if guest started with s4 support then libvirt should make persistent > changes to cpu count on hotlug. If guest started without s4 then allow > non-persistent cpu count. > But IMHO persistent cpu count for all cases would be better. Correct me if I am wrong, but I think that commit just let users set if bios advertises s3/s4 states or not. If this is the case, it won't help as starting guest with s4 advertised doesn't actually mean guest will ever enter it. The support for controlling QEMU BIOS advertisement of S3 and S4 states is already in upstream libvirt, but as was mentioned by Michal in comment #9 that'll not help with this particular issue. Another problem might be that even when the BIOS says it doesn't support S4, the guest can work around this by powering off, but that's different issue. For helping with this bug QEMU already supports SUSPEND_DISK event as discussed here: http://lists.nongnu.org/archive/html/qemu-devel/2012-07/msg02323.html with the patch for QMP being sent here http://lists.nongnu.org/archive/html/qemu-devel/2012-08/msg01630.html I have the bug #839661 which I plan to implement ASAP. With the event being known in libvirt I believe we can make this one disappear as well, but only when the the guest really uses S4 and not poweroff (which, IMHO, is the default for many non-Linux OSes). I've made a test with F17; If S4 is advertised by BIOS, then the kernel correctly use it and subsequently qemu emits SUSPEND_DISK event. However, if I turn it off (by adding <pm> <suspend-to-disk enabled='no'/> </pm> into the domain xml) then the kernel uses just poweroff. Hence, no suspend event from qemu. The same applies for Win7, surprisingly. Having said that - I think it's reasonable to fix this for those OSes that play nicely. And those which don't - well, we can't do anything. However, there is a problem related to this. Libvirt keeps state XML under /var/run; The live XML contains all these hotplugged devices and basically all runtime infos. It is okay for now: if libvirtd gets restarted all state XMLs are there, and if the host get rebooted, content of /var/run is gone, but there is no machine running so we are in consistent state. But if a domain gets suspended, the host is rebooted we will lost the state XML - so we need to keep state XMLs under different path than /var/run. This area seems to be a problem without a clear solution, so I think the best option is to document that transiently hotplugged devices don't play well with S4. Patch proposed upstream: https://www.redhat.com/archives/libvir-list/2014-April/msg01156.html As for comment 18, there are 4 updates, all of them are add the following sentence in 4 parts: + * Be aware that hotplug changes might not persist across a domain going + * into S4 state (also known as hibernation) unless you also modify the + * persistent domain definition. As you can see in the following 4 parts, the first three part are OK. Only the 4 part seems like be updated in the wrong place, it's better keep same format with the first three part, should not update the sentence in "returen". Part 1: http://libvirt.org/html/libvirt-libvirt.html#virDomainAttachDevice ...... For compatibility, this method can also be used to change the media in an existing CDROM/Floppy device, however, applications are recommended to use the virDomainUpdateDeviceFlag method instead. Be aware that hotplug changes might not persist across a domain going into S4 state (also known as hibernation) unless you also modify the persistent domain definition. domain pointer to domain object xml pointer to XML description of one device Returns 0 in case of success, -1 in case of failure. ...... Part 2: http://libvirt.org/html/libvirt-libvirt.html#virDomainAttachDeviceFlags ...... For compatibility, this method can also be used to change the media in an existing CDROM/Floppy device, however, applications are recommended to use the virDomainUpdateDeviceFlag method instead. Be aware that hotplug changes might not persist across a domain going into S4 state (also known as hibernation) unless you also modify the persistent domain definition. domain pointer to domain object xml pointer to XML description of one device flags bitwise-OR of virDomainDeviceModifyFlags Returns 0 in case of success, -1 in case of failure ...... Part 3: http://libvirt.org/html/libvirt-libvirt.html#virDomainDetachDevice ...... Destroy a virtual device attachment to backend. This function, having hot-unplug semantics, is only allowed on an active domain. Be aware that hotplug changes might not persist across a domain going into S4 state (also known as hibernation) unless you also modify the persistent domain definition. domain pointer to domain object xml pointer to XML description of one device Returns 0 in case of success, -1 in case of failure. ...... part 4: http://libvirt.org/html/libvirt-libvirt.html#virDomainDetachDeviceFlags ...... Beware that depending on the hypervisor and device type, detaching a device from a running domain may be asynchronous. That is, calling virDomainDetachDeviceFlags may just request device removal while the device is actually removed later (in cooperation with a guest OS). Previously, this fact was ignored and the device could have been removed from domain configuration before it was actually removed by the hypervisor causing various failures on subsequent operations. To check whether the device was successfully removed, either recheck domain configuration using virDomainGetXMLDesc() or add handler for VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED event. In case the device is already gone when virDomainDetachDeviceFlags domain pointer to domain object xml pointer to XML description of one device flags bitwise-OR of virDomainDeviceModifyFlags Returns the event is delivered before this API call ends. To help existing clients work better in most cases, this API will try to transform an asynchronous device removal that finishes shortly after the request into a synchronous removal. In other words, this API may wait a bit for the removal to complete in case it was not synchronous. Be aware that hotplug changes might not persist across a domain going into S4 state (also known as hibernation) unless you also modify the persistent domain definition. Returns 0 in case of success, -1 in case of failure. ...... (In reply to Zhang Xuesong from comment #20) > ... > part 4: > http://libvirt.org/html/libvirt-libvirt.html#virDomainDetachDeviceFlags > ...... Heh, what a funny bug. Anyway, I've proposed a patch on the upstream list: https://www.redhat.com/archives/libvir-list/2014-May/msg00341.html Moving to POST again: http://post-office.corp.redhat.com/archives/rhvirt-patches/2014-May/msg00322.html Verify steps is same with comment 20, all the 4 update part are ok now. As for the part 4, it display correctly now: Part 4: http://libvirt.org/html/libvirt-libvirt.html#virDomainDetachDeviceFlags ........ Beware that depending on the hypervisor and device type, detaching a device from a running domain may be asynchronous. That is, calling virDomainDetachDeviceFlags may just request device removal while the device is actually removed later (in cooperation with a guest OS). Previously, this fact was ignored and the device could have been removed from domain configuration before it was actually removed by the hypervisor causing various failures on subsequent operations. To check whether the device was successfully removed, either recheck domain configuration using virDomainGetXMLDesc() or add handler for VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED event. In case the device is already gone when virDomainDetachDeviceFlags returns, the event is delivered before this API call ends. To help existing clients work better in most cases, this API will try to transform an asynchronous device removal that finishes shortly after the request into a synchronous removal. In other words, this API may wait a bit for the removal to complete in case it was not synchronous. Be aware that hotplug changes might not persist across a domain going into S4 state (also known as hibernation) unless you also modify the persistent domain definition. domain pointer to domain object xml pointer to XML description of one device flags bitwise-OR of virDomainDeviceModifyFlags Returns 0 in case of success, -1 in case of failure. ........ Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-1374.html |