Bug 808463

Summary: document that transient hotplugged devices don't persist through s4 state
Product: Red Hat Enterprise Linux 6 Reporter: Dave Allan <dallan>
Component: libvirtAssignee: Michal Privoznik <mprivozn>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.4CC: ajia, amit.shah, cwei, dyuan, gsun, imammedo, jdenemar, mkletzan, mprivozn, mzhan, rbalakri, shyu, weizhan, xuzhang, zhwang
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-0.10.2-36.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-10-14 04:13:57 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 766958, 808680, 827499, 839661    
Bug Blocks: 912287    

Description Dave Allan 2012-03-30 13:27:04 UTC
If a user hotplugs a device into a VM without specifying persistent, the device will disappear if the guest enters s4.  From libvirt's perspective, that makes sense--the hypervisor has exited, but from the user's perspective, it's likely to be surprising behavior.

Comment 2 Amit Shah 2012-03-30 16:31:23 UTC
(In reply to comment #0)
> If a user hotplugs a device into a VM without specifying persistent, the device
> will disappear if the guest enters s4.  From libvirt's perspective, that makes
> sense--the hypervisor has exited, but from the user's perspective, it's likely
> to be surprising behavior.

It's also wrong from the guest (at least Linux) perspective: resuming from S4 is not supported over changed hardware.

Comment 3 Michal Privoznik 2012-03-31 05:38:18 UTC
I understand why this is problem, and even know what is causing it:
guest enters s4 -> qemu emits STOP event (no difference to regular shutdown) -> libvirt releases all transient devices and return them tho the host. On the next boot libvirt won't attach the same transient devices as transient devices aren't supposed to survive shutdown.

Comment 6 Michal Privoznik 2012-05-14 17:16:12 UTC
So I guess we have two options here:
1) document as known limitation (if qemu decides to not publicize S4 state)

2) if qemu decides to emit an s4 event -not trivial and IMO impossible for all cases, but still may be possible for some - we can fix this for those cases.

Therefore I am suggesting to wait for a while and see how qemu decides/deal with this so we can choose our best possibility. See bug 808680 for more details.

Comment 8 Igor Mammedov 2012-09-04 09:50:34 UTC
(In reply to comment #6)
> So I guess we have two options here:
> 1) document as known limitation (if qemu decides to not publicize S4 state)
> 
> 2) if qemu decides to emit an s4 event -not trivial and IMO impossible for
> all cases, but still may be possible for some - we can fix this for those
> cases.
> 
> Therefore I am suggesting to wait for a while and see how qemu decides/deal
> with this so we can choose our best possibility. See bug 808680 for more
> details.

3rd option would be to use options added by:

qemu upstream fix that will allow to enable/disable S3/S4 on startup
commit 459ae5ea5ad682c2b3220beb244d4102c1a4e332

And if guest started with s4 support then libvirt should make persistent changes to cpu count on hotlug. If guest started without s4 then allow non-persistent cpu count.
But IMHO persistent cpu count for all cases would be better.

Comment 9 Michal Privoznik 2012-09-04 11:52:45 UTC
(In reply to comment #8)
> (In reply to comment #6)
> > So I guess we have two options here:
> > 1) document as known limitation (if qemu decides to not publicize S4 state)
> > 
> > 2) if qemu decides to emit an s4 event -not trivial and IMO impossible for
> > all cases, but still may be possible for some - we can fix this for those
> > cases.
> > 
> > Therefore I am suggesting to wait for a while and see how qemu decides/deal
> > with this so we can choose our best possibility. See bug 808680 for more
> > details.
> 
> 3rd option would be to use options added by:
> 
> qemu upstream fix that will allow to enable/disable S3/S4 on startup
> commit 459ae5ea5ad682c2b3220beb244d4102c1a4e332
> 
> And if guest started with s4 support then libvirt should make persistent
> changes to cpu count on hotlug. If guest started without s4 then allow
> non-persistent cpu count.
> But IMHO persistent cpu count for all cases would be better.

Correct me if I am wrong, but I think that commit just let users set if bios advertises s3/s4 states or not. If this is the case, it won't help as starting guest with s4 advertised doesn't actually mean guest will ever enter it.

Comment 11 Martin Kletzander 2012-09-05 06:07:34 UTC
The support for controlling QEMU BIOS advertisement of S3 and S4 states is already in upstream libvirt, but as was mentioned by Michal in comment #9 that'll not help with this particular issue.  Another problem might be that even when the BIOS says it doesn't support S4, the guest can work around this by powering off, but that's different issue.  For helping with this bug QEMU already supports SUSPEND_DISK event as discussed here: http://lists.nongnu.org/archive/html/qemu-devel/2012-07/msg02323.html with the patch for QMP being sent here http://lists.nongnu.org/archive/html/qemu-devel/2012-08/msg01630.html

I have the bug #839661 which I plan to implement ASAP.  With the event being known in libvirt I believe we can make this one disappear as well, but only when the the guest really uses S4 and not poweroff (which, IMHO, is the default for many non-Linux OSes).

Comment 12 Michal Privoznik 2012-09-05 08:24:01 UTC
I've made a test with F17; If S4 is advertised by BIOS, then the kernel correctly use it and subsequently qemu emits SUSPEND_DISK event. However, if I turn it off (by adding   <pm> <suspend-to-disk enabled='no'/> </pm> into the domain xml) then the kernel uses just poweroff. Hence, no suspend event from qemu. The same applies for Win7, surprisingly.

Having said that - I think it's reasonable to fix this for those OSes that play nicely. And those which don't - well, we can't do anything.

However, there is a problem related to this. Libvirt keeps state XML under /var/run; The live XML contains all these hotplugged devices and basically all runtime infos. It is okay for now: if libvirtd gets restarted all state XMLs are there, and if the host get rebooted, content of /var/run is gone, but there is no machine running so we are in consistent state. But if a domain gets suspended, the host is rebooted we will lost the state XML - so we need to keep state XMLs under different path than /var/run.

Comment 14 Dave Allan 2013-08-05 15:09:40 UTC
This area seems to be a problem without a clear solution, so I think the best option is to document that transiently hotplugged devices don't play well with S4.

Comment 17 Michal Privoznik 2014-04-30 13:55:35 UTC
Patch proposed upstream:

https://www.redhat.com/archives/libvir-list/2014-April/msg01156.html

Comment 20 Xuesong Zhang 2014-05-12 07:14:38 UTC
As for comment 18, there are 4 updates, all of them are add the following sentence in 4 parts:
+ * Be aware that hotplug changes might not persist across a domain going
+ * into S4 state (also known as hibernation) unless you also modify the
+ * persistent domain definition.


As you can see in the following 4 parts, the first three part are OK. Only the 4 part seems like be updated in the wrong place, it's better keep same format with the first three part, should not update the sentence in "returen".




Part 1:
http://libvirt.org/html/libvirt-libvirt.html#virDomainAttachDevice
......
For compatibility, this method can also be used to change the media in an existing CDROM/Floppy device, however, applications are recommended to use the virDomainUpdateDeviceFlag method instead.

Be aware that hotplug changes might not persist across a domain going into S4 state (also known as hibernation) unless you also modify the persistent domain definition.

domain
    pointer to domain object
xml
    pointer to XML description of one device
Returns
    0 in case of success, -1 in case of failure.
......



Part 2:
http://libvirt.org/html/libvirt-libvirt.html#virDomainAttachDeviceFlags
......
For compatibility, this method can also be used to change the media in an existing CDROM/Floppy device, however, applications are recommended to use the virDomainUpdateDeviceFlag method instead.

Be aware that hotplug changes might not persist across a domain going into S4 state (also known as hibernation) unless you also modify the persistent domain definition.

domain
    pointer to domain object
xml
    pointer to XML description of one device
flags
    bitwise-OR of virDomainDeviceModifyFlags
Returns
    0 in case of success, -1 in case of failure
......



Part 3:
http://libvirt.org/html/libvirt-libvirt.html#virDomainDetachDevice
......
Destroy a virtual device attachment to backend. This function, having hot-unplug semantics, is only allowed on an active domain.

Be aware that hotplug changes might not persist across a domain going into S4 state (also known as hibernation) unless you also modify the persistent domain definition.

domain
    pointer to domain object
xml
    pointer to XML description of one device
Returns
    0 in case of success, -1 in case of failure.
......



part 4:
http://libvirt.org/html/libvirt-libvirt.html#virDomainDetachDeviceFlags
......
Beware that depending on the hypervisor and device type, detaching a device from a running domain may be asynchronous. That is, calling virDomainDetachDeviceFlags may just request device removal while the device is actually removed later (in cooperation with a guest OS). Previously, this fact was ignored and the device could have been removed from domain configuration before it was actually removed by the hypervisor causing various failures on subsequent operations. To check whether the device was successfully removed, either recheck domain configuration using virDomainGetXMLDesc() or add handler for VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED event. In case the device is already gone when virDomainDetachDeviceFlags

domain
    pointer to domain object
xml
    pointer to XML description of one device
flags
    bitwise-OR of virDomainDeviceModifyFlags
Returns
    the event is delivered before this API call ends. To help existing clients work better in most cases, this API will try to transform an asynchronous device removal that finishes shortly after the request into a synchronous removal. In other words, this API may wait a bit for the removal to complete in case it was not synchronous. Be aware that hotplug changes might not persist across a domain going into S4 state (also known as hibernation) unless you also modify the persistent domain definition. Returns 0 in case of success, -1 in case of failure.
......

Comment 21 Michal Privoznik 2014-05-12 12:31:56 UTC
(In reply to Zhang Xuesong from comment #20)
> ...
> part 4:
> http://libvirt.org/html/libvirt-libvirt.html#virDomainDetachDeviceFlags
> ......

Heh, what a funny bug. Anyway, I've proposed a patch on the upstream list:

https://www.redhat.com/archives/libvir-list/2014-May/msg00341.html

Comment 22 Michal Privoznik 2014-05-13 08:09:16 UTC
Moving to POST again:

http://post-office.corp.redhat.com/archives/rhvirt-patches/2014-May/msg00322.html

Comment 24 Xuesong Zhang 2014-05-20 03:00:29 UTC
Verify steps is same with comment 20, all the 4 update part are ok now.
As for the part 4, it display correctly now:

Part 4:
http://libvirt.org/html/libvirt-libvirt.html#virDomainDetachDeviceFlags
........
Beware that depending on the hypervisor and device type, detaching a device from a running domain may be asynchronous. That is, calling virDomainDetachDeviceFlags may just request device removal while the device is actually removed later (in cooperation with a guest OS). Previously, this fact was ignored and the device could have been removed from domain configuration before it was actually removed by the hypervisor causing various failures on subsequent operations. To check whether the device was successfully removed, either recheck domain configuration using virDomainGetXMLDesc() or add handler for VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED event. In case the device is already gone when virDomainDetachDeviceFlags returns, the event is delivered before this API call ends. To help existing clients work better in most cases, this API will try to transform an asynchronous device removal that finishes shortly after the request into a synchronous removal. In other words, this API may wait a bit for the removal to complete in case it was not synchronous.

Be aware that hotplug changes might not persist across a domain going into S4 state (also known as hibernation) unless you also modify the persistent domain definition.

domain
pointer to domain object
xml
pointer to XML description of one device
flags
bitwise-OR of virDomainDeviceModifyFlags
Returns
0 in case of success, -1 in case of failure.
........

Comment 27 errata-xmlrpc 2014-10-14 04:13:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1374.html