Bug 1032695 - libvirt: machines get killed when scopes are destroyed
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Assignee: Michal Privoznik
QA Contact: Virtualization Bugs
Duplicates: 961200
Depends On: 1064976
Reported: 2013-11-20 15:25 UTC by Daniel Berrangé
Modified: 2014-06-18 00:59 UTC (History)

Fixed In Version: libvirt-1.1.1-25.el7
Doc Type: Bug Fix
Clone Of: 1031696
Clones: 1064976
Last Closed: 2014-06-13 10:09:56 UTC


Attachments
The log about libvirt-guests (691.72 KB, text/plain), 2014-03-03 12:07 UTC, zhenfeng wang
The dmesg info of the libvirt-guests (424.27 KB, text/plain), 2014-03-05 03:50 UTC, zhenfeng wang
The syslog about the libvirt-guests (179.96 KB, text/plain), 2014-03-05 03:51 UTC, zhenfeng wang
The libvirtd and libvirt-guests' log which output by journalctl command (8.69 KB, text/plain), 2014-03-06 02:56 UTC, zhenfeng wang
The rc.local file in my host (4.50 KB, text/plain), 2014-03-06 03:00 UTC, zhenfeng wang

Description Daniel Berrangé 2013-11-20 15:25:36 UTC
+++ This bug was initially created as a clone of Bug #1031696 +++

Description of problem:
http://www.mail-archive.com/systemd-devel@lists.freedesktop.org/msg14252.html

v0lZy reported on IRC that his qemu machines get killed when shutting
down the host. libvirt-guests.service is designed to suspend them
during shutdown, but when it was run, the guests were all already dead.

And indeed, each qemu is running inside a scope, which is not
connected by any dependencies to either systemd-machined.service or
libvirt-guests.service. libvirt-guests.service does not depend on
systemd-machined.service either. This means that when shutdown is
ordered, the scopes will be stopped in parallel with
libvirt-guests.service, and depending on timing, the qemus will simply
be killed with SIGTERM.

For this whole thing to work correctly, we need to ensure that
scopes are not terminated prematurely. If we introduced a target
like libvirt-ready.target, and made libvirt-guests.service be
After=libvirt-ready.target, and made all the scopes be
Before=libvirt-ready.target, I think the vms would have a chance
to shutdown properly. But that's pretty complicated.
And I'm not even sure how to do that properly. Any better
ideas?
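
A minimal sketch of why the missing ordering matters, assuming a toy model in which units with no ordering edge between them may be stopped in the same batch. The scope name machine-qemu-demo.scope and the helper stop_batches are hypothetical, libvirt-ready.target is the target suggested above, and systemd's real job scheduling is more involved than this:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def stop_batches(start_after):
    """Group units into batches that this toy model would stop in parallel.

    start_after maps each unit to the set of units it is started After=.
    Stopping runs in the opposite direction, so every edge is flipped first.
    """
    stop_after = {unit: set() for unit in start_after}
    for unit, deps in start_after.items():
        for dep in deps:
            # "unit" started after "dep", so "dep" may only stop after "unit".
            stop_after.setdefault(dep, set()).add(unit)

    sorter = TopologicalSorter(stop_after)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical scope name; real ones are generated per machine.
SCOPE = "machine-qemu-demo.scope"

# Situation in the report: no ordering between the scope and libvirt-guests.
unordered = stop_batches({SCOPE: set(), "libvirt-guests.service": set()})

# Proposal: scope Before=libvirt-ready.target, libvirt-guests After= it.
ordered = stop_batches({
    SCOPE: set(),
    "libvirt-ready.target": {SCOPE},
    "libvirt-guests.service": {"libvirt-ready.target"},
})

print(unordered)
print(ordered)
```

In the unordered case both units land in one batch, which is the race described above. With the extra target, libvirt-guests.service gets a batch of its own before the scope is touched.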

Version-Release number of selected component (if applicable):
systemd-208-4.fc20.x86_64
libvirt-1.1.4-2.fc20.x86_64
libvirt-client-1.1.4-2.fc20.x86_64

--- Additional comment from Eric Blake on 2013-11-18 15:46:21 GMT ---

Might be related to bug 906009

--- Additional comment from Zbigniew Jędrzejewski-Szmek on 2013-11-18 15:48:08 GMT ---

(In reply to Eric Blake from comment #1)
> Might be related to bug 906009
They are related in the sense that both are about missing dependencies... But #906009 should be very easy to fix, unlike this one.

--- Additional comment from Juan Orti Alcaine on 2013-11-20 07:17:51 GMT ---

I have upgraded to F20 and now my virtual machines don't shut down when I
shut down the host.

In /etc/sysconfig/libvirt-guests I have:
ON_SHUTDOWN=shutdown
SHUTDOWN_TIMEOUT=100

This is what I see in the log (in Spanish):

# journalctl -a _SYSTEMD_UNIT=libvirt-guests.service _SYSTEMD_UNIT=libvirtd.service

nov 17 23:38:15 xenon.miceliux.com libvirtd[750]: libvirt version: 1.1.3.1,
package: 1.fc20 (Fedora Project, 2013-11-06-18:12:08,
buildvm-04.phx2.fedoraproject.org)
nov 17 23:38:15 xenon.miceliux.com libvirtd[750]: Received unexpected event 1
nov 17 23:38:16 xenon.miceliux.com libvirtd[750]: Error interno: Fin del
archivo desde monitor
nov 17 23:38:16 xenon.miceliux.com libvirtd[750]: Error interno: Falta objeto
de respuesta de monitor
nov 17 23:38:16 xenon.miceliux.com libvirt-guests.sh[21049]: Ejecutando
huéspedes en URI default:lithium
nov 17 23:38:16 xenon.miceliux.com libvirt-guests.sh[21049]: Cerrando
huéspedes en URI default...
nov 17 23:38:16 xenon.miceliux.com libvirt-guests.sh[21049]: Starting shutdown
on guest: lithium
nov 17 23:38:17 xenon.miceliux.com libvirtd[750]: Object (nil) ((unknown)) is
not a virObjectLockable instance
nov 17 23:38:17 xenon.miceliux.com libvirtd[750]: Argumento inválido: el
monitor no debe poseer un valor NULL
nov 17 23:38:17 xenon.miceliux.com libvirt-guests.sh[21049]: Error:Falló al
apagar el dominio 3ee0acb2-3da9-7045-868c-3dec16e03ad1
nov 17 23:38:17 xenon.miceliux.com libvirt-guests.sh[21049]: Error:Argumento
inválido: el monitor no debe poseer un valor NULL


Translated, this reads roughly as:

nov 17 23:38:15 xenon.miceliux.com libvirtd[750]: libvirt version: 1.1.3.1,
package: 1.fc20 (Fedora Project, 2013-11-06-18:12:08,
buildvm-04.phx2.fedoraproject.org)
nov 17 23:38:15 xenon.miceliux.com libvirtd[750]: Received unexpected event 1
nov 17 23:38:16 xenon.miceliux.com libvirtd[750]: Internal error: End of file
from monitor
nov 17 23:38:16 xenon.miceliux.com libvirtd[750]: Internal error: Response
object from monitor is missing
nov 17 23:38:16 xenon.miceliux.com libvirt-guests.sh[21049]: Running guests in
URI default:lithium
nov 17 23:38:16 xenon.miceliux.com libvirt-guests.sh[21049]: Closing guests in
default URI...
nov 17 23:38:16 xenon.miceliux.com libvirt-guests.sh[21049]: Starting shutdown
on guest: lithium
nov 17 23:38:17 xenon.miceliux.com libvirtd[750]: Object (nil) ((unknown)) is
not a virObjectLockable instance
nov 17 23:38:17 xenon.miceliux.com libvirtd[750]: Invalid argument: monitor
must not have a NULL value
nov 17 23:38:17 xenon.miceliux.com libvirt-guests.sh[21049]: Error:Failed to
shutdown domain 3ee0acb2-3da9-7045-868c-3dec16e03ad1
nov 17 23:38:17 xenon.miceliux.com libvirt-guests.sh[21049]: Error:Invalid
argument: monitor must not have a NULL value

Comment 2 Jiri Denemark 2013-12-09 16:22:25 UTC
*** Bug 961200 has been marked as a duplicate of this bug. ***

Comment 3 Michal Privoznik 2014-01-15 14:29:59 UTC
I'm getting the feeling that this might be a blocker. Lennart, do you have any bright ideas, please?

Comment 4 Harald Hoyer 2014-01-28 16:56:48 UTC
(In reply to Daniel Berrange from comment #0)
> For this whole thing to work correctly, we need to ensure that
> scopes are not terminated prematurely. If we introduced a target
> like libvirt-ready.target, and made libvirt-guests.service be
> After=libvirt-ready.target, and made all the scopes be
> Before=libvirt-ready.target, I think the vms would have a chance
> to shutdown properly. But that's pretty complicated.
> And I'm not even sure how to do that properly. Any better
> ideas?
> 


Lennart commented that Daniel analyzed it correctly.

Comment 5 Lennart Poettering 2014-01-30 17:19:58 UTC
It took us some time to discuss this, sorry for the delay.

Sooo, here's what we'd propose:

In systemd we'll define a new generic target unit called "machines.target". Then, we'll change systemd-machined.service to implicitly add Before= dependencies to the machine scopes for this target. 

In libvirt we'd change the libvirtd.service unit file to do After=machines.target. This would then result in the following ordering chain:

    machine-*.scope → machines.target → libvirtd.service

Now, during start-up this ordering would be mostly pointless, as the scopes would not exist in the dependency network before libvirtd.service actually creates them. However, at shutdown this logic would have an effect: in systemd the shutdown order is always the inverse of the start-up order. This hence results in this shutdown ordering:

    libvirtd.service → machines.target → machine-*.scope

Which means: first libvirt would shut down, taking down the machine scopes with it, simply by telling qemu to suspend/terminate them. Then, machines.target would be shut down, and finally the remaining scopes (if there are any) would be removed.
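
The reversal rule can be sketched with a topological sort. This is only a model of the ordering, not systemd code, and machine-qemu-demo.scope is a made-up name standing in for machine-*.scope:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each entry reads "unit: the units it is ordered After=".
after = {
    # The scope carries Before=machines.target, i.e. the target is After= it.
    "machines.target": {"machine-qemu-demo.scope"},
    # libvirtd.service would carry After=machines.target.
    "libvirtd.service": {"machines.target"},
}

# Start-up order: every unit comes after the units it is ordered After=.
startup = list(TopologicalSorter(after).static_order())

# Shutdown order is the start-up order reversed.
shutdown = list(reversed(startup))

print(" -> ".join(startup))
print(" -> ".join(shutdown))
```

The second line printed starts with libvirtd.service and ends with the scope, matching the shutdown chain above.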

Now, machines.target would be generically useful for the non-libvirt case too. For example, if people encapsulate docker or nspawn services in individual units, they could pull those in from machines.target and things would somewhat make sense in that case for the start-up case as well.

I hope this makes some sense. I will make the necessary changes to systemd upstream soon, we can then backport this to RHEL7.

Comment 6 Cole Robinson 2014-01-30 17:47:49 UTC
Thanks Lennart.

Just to clarify we would be adding the After=machines.target to libvirt-guests.service, which is the optional service which does VM save/restore. libvirtd likely shouldn't have any additional dependency since it isn't supposed to touch running qemu processes on service start/shutdown.

Comment 7 Daniel Berrangé 2014-01-30 17:57:25 UTC
libvirt-guests.service has 'After=libvirtd.service', so I believe the ordering would still work as Lennart describes. On startup:

    machine-*.scope → machines.target → libvirt-guests.service → libvirtd.service

and on shutdown the reverse

    libvirt-guests.service → libvirtd.service → machines.target → machine-*.scope

Comment 8 Cole Robinson 2014-01-30 18:02:27 UTC
(In reply to Daniel Berrange from comment #7)
> libvirt-guests.service has  'After libvirtd.service'  so I believe the
> ordering would still work as Lennart describes on startup
> 
>     machine-*.scope → machines.target → libvirt-guests.service →
> libvirtd.service
> 
> and on shutdown the reverse
> 
>     libvirt-guests.service → libvirtd.service → machines.target →
> machine-*.scope

But if a user turns off libvirt-guests, should systemd wait for libvirt to stop before it kills all scopes at host shutdown time? There's no reason it should, as far as I can tell.

Comment 9 Daniel Berrangé 2014-01-30 18:19:22 UTC
(In reply to Cole Robinson from comment #8)
> (In reply to Daniel Berrange from comment #7)
> > libvirt-guests.service has  'After libvirtd.service'  so I believe the
> > ordering would still work as Lennart describes on startup
> > 
> >     machine-*.scope → machines.target → libvirt-guests.service →
> > libvirtd.service
> > 
> > and on shutdown the reverse
> > 
> >     libvirt-guests.service → libvirtd.service → machines.target →
> > machine-*.scope
> 
> But if a user turns off libvirt-guests, should systemd wait for libvirt to
> stop before it kills all scopes at host shutdown time? there's no reason it
> should as far as I can tell

Oops, I got the second example the wrong way around. On shutdown it would be:

 libvirtd.service → libvirt-guests.service → machines.target → machine-*.scope

So if libvirt-guests.service were not activated, I don't believe libvirtd would block on machines.target.

Comment 10 Michal Privoznik 2014-01-31 10:22:31 UTC
(In reply to Lennart Poettering from comment #5)

So should I clone this bug for systemd too?

Comment 11 Lennart Poettering 2014-01-31 10:54:18 UTC
Sooo, after thinking about this for a couple more hours here at the hackfest, we came to the conclusion that "machines.target" is probably not a good idea after all, since it cannot properly distinguish clean termination of machines by libvirt from the "emergency" clean-up done by systemd should libvirt die abnormally. The machines.target above would only cover the "emergency" case, which is certainly much less interesting than the clean termination case. Moreover, it actually enforces the "emergency" ordering even when a clean termination is done: a job that libvirt/machined queues to terminate a machine scope would be delayed until after libvirt itself has completed, which is mostly a chance for deadlock and certainly not useful...

Instead, we want to propose a different approach here that covers this case much more nicely: when creating a scope we'd add an additional, optional property parameter, maybe called "ScopeManager" or so, which takes a string. If specified, it should contain the bus name (unique name, possibly well-known name) of a peer to which systemd will send a bus signal, instead of sending SIGTERM to the scope's processes, when it would like to shut down the scope unit.

libvirt would simply set this property parameter to its own unique name, and then when systemd wants the machine scopes to go away, it would be libvirt that gets asked this way to terminate the scopes, and systemd would not terminate them directly (well, subject to a timeout after which systemd would SIGKILL them...). When the system is shut down this would have the effect that systemd would send both SIGTERM to libvirt (since it wants to terminate libvirt itself), and a bus signal for each machine scope that is running, also to libvirt. (The SIGTERM and the bus signals would not be ordered though, but that should not be a problem, or would it?)

A nice effect of this is that libvirt would also get hooked into the shutdown logic of its machines if people use "systemctl stop machine-xyz.scope", thus streamlining the termination logic of its machines both during runtime and at shut down time...

Does this make sense to you? I personally find this a much more convincing solution, since it kinda puts libvirt into the right position of being the manager of its own scopes, which it should have been in the first place... systemd would never take things into its own hands anymore, except when libvirt for some reason did not react to the bus signal, and it would step in as last resort (though even that would be up to libvirt to decide if it really wants to, just by using the already existing SendSIGKILL property...)

Sorry for the back and forth on this!

Comment 12 Lennart Poettering 2014-01-31 17:03:48 UTC
This is the systemd side of things:

http://cgit.freedesktop.org/systemd/systemd/commit/?id=2d4a39e759c4ab846ad8a546abeddd40bc8d736e

libvirt should now simply add a property "Controller" containing a string with the unique name to the array it passes to the CreateMachine() call. Then, it should subscribe to the RequestStop() signal coming from PID 1's scope unit. When it receives that signal, it should shut down that specific machine in whatever way it likes.
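
A toy model of that hand-off, to make the control flow concrete. It involves no D-Bus at all: the Scope class, its stop() method and the callable used as a controller are all hypothetical stand-ins, and only the names Controller, RequestStop, SIGTERM and SIGKILL come from the discussion in this bug:

```python
class Scope:
    """Toy stand-in for a scope unit; not systemd's implementation."""

    def __init__(self, name, controller=None):
        self.name = name
        # A callable standing in for the bus peer named by "Controller".
        self.controller = controller
        self.log = []

    def stop(self, stopped_within_timeout):
        if self.controller is not None:
            # Ask the controller to stop the machine instead of signalling
            # the scope's processes directly.
            self.log.append("RequestStop")
            self.controller(self.name)
        else:
            # No controller registered: the processes are signalled directly.
            self.log.append("SIGTERM")
        if not stopped_within_timeout:
            # Last resort once the timeout has expired.
            self.log.append("SIGKILL")
        return self.log


# The controller here just records which scope it was asked to stop.
asked = []
managed = Scope("machine-qemu-demo.scope", controller=asked.append)
unmanaged = Scope("machine-qemu-other.scope")

managed_log = managed.stop(stopped_within_timeout=True)
unmanaged_log = unmanaged.stop(stopped_within_timeout=False)
print(managed_log, asked)
print(unmanaged_log)
```

In this model a scope with a controller never sees SIGTERM, and SIGKILL only appears when the stop does not finish within the timeout.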

I'll add documentation about this to the Wiki shortly.

Comment 13 Lennart Poettering 2014-02-05 14:29:02 UTC
There's a new paragraph explaining this now at the end of this wiki text:

http://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/

I also added the new signal to the dbus API description:

http://www.freedesktop.org/wiki/Software/systemd/dbus/

It's admittedly terse. If you need more information, please ping me.

Comment 14 Michal Privoznik 2014-02-10 09:45:51 UTC
Now that I'm trying to implement this in libvirt, it seems to me that libvirt must take over the libvirt-guests script, because currently the functionality is split out into the script. In the RHEL-6 world, whenever the host went down, the script was invoked and machines were (optionally) saved or shut down. Then, when the host came up again and the script was invoked at bootup, the machines were restored. No DBus involved.

So now we have two options:
1) since libvirtd is already listening on dbus, move the piece of functionality to libvirtd

2) make the libvirt-guests script listen on dbus and set "Controller: libvirt-guests.service". This could mean that the script hangs around forever.

BTW: From the documentation you're linking: "As before in either case this will be followed by SIGKILL to the scope unit processes after a timeout."

This is highly critical to libvirt. What is the timeout? In the libvirt-guests script configuration we allow users to set an arbitrary timeout for saving/shutting down guests. We even allow them to turn the timeout off, meaning the host shutdown process is postponed until all guests are saved/shut down. Hence, we must make the timeout configurable in systemd too.

Comment 15 Daniel Berrangé 2014-02-10 10:23:10 UTC
FWIW, I've always thought that this functionality should live inside libvirtd. For the QEMU session driver we have indeed added such a capability, albeit more limited in flexibility. It is triggered upon desktop session logout or host shutdown. Doing this in general would clearly be a much larger job, though, so it is probably not reasonable for 7.0.

Comment 16 Michal Privoznik 2014-02-10 10:42:57 UTC
Well, the current script works for all drivers (not just qemu); in fact you can define any URI that will shut down guests (and it can be a remote host, not the one that is shutting down). For example, if you have two hosts, hostA running VMs and hostB serving VM disks over NFS, libvirt-guests can be configured on hostB to shut down all the VMs on hostA. And there doesn't have to be any libvirtd running on hostB. If we merge libvirt-guests into libvirtd, we will mangle not only this use case but also all the other systems that don't use systemd at all.

Anyway, if we are able to send a signal on dying scope, can we just run an arbitrary script instead? And when I say 'we' I mean systemd.

Comment 17 Daniel Berrangé 2014-02-10 10:49:16 UTC
(In reply to Michal Privoznik from comment #16)
> Well, the current script works for all drivers (not just qemu); in fact you
> can define any URI that will shut down guests (and it can be a remote host,
> not the one that is shutting down). For example if you have two hosts: hostA
> running VMs, hostB serving VM disks over NFS, libvirt-guests can be
> configured on hostB to shut down all the VMs on hostA. And there doesn't have
> to be any libvirtd running on hostB. If we merge libvirt-guests into
> libvirtd we will mangle not only this use case but all the other systems
> that don't use systemd at all.

Obviously we need this work to be done for all the stateful drivers in libvirtd. I can imagine this would involve some generic code in libvirtd and perhaps a handful of new driver APIs.

The ability to shut down VMs on host B when host A shuts down has always struck me as a particularly pointless feature. It is a solution in search of a problem IMHO. The core important feature is shutting down VMs running in stateful libvirtd drivers. If we have to lose anything else in order to do the core feature well, then so be it. We shouldn't have to lose support for non-systemd hosts either. All that I see happening is that the libvirt-guests sysvinit script would become very small - it'd just have to make an API call to libvirtd to trigger its job, or send a signal to it.

Comment 18 Michal Privoznik 2014-02-10 12:16:55 UTC
I still don't understand why we need to take the more complicated approach here. I think the best solution would be for scopes to understand Before= and After= even for services. That is, in the scope we could have:

   Before=libvirt-guests.service

So when systemd computes the shutdown order, libvirt-guests.service will always be stopped before any scope is terminated. In that case, libvirt-guests.service would shut down some scopes, and we don't care about any scopes that remain after the script has run.

This approach is even better than the Script= idea I suggested in comment 16, because scopes can be killed in parallel, in which case we would need to adapt libvirt-guests.

Comment 19 Daniel Berrangé 2014-02-13 16:18:26 UTC
(In reply to Lennart Poettering from comment #11)
> libvirt would simply set this property parameter to its own unique name, and
> then when systemd wants the machine scopes to go away, it would be libvirt
> that gets asked this way to terminate the scopes, and systemd would not
> terminate them directly (well, subject to a timeout after which systemd
> would SIGKILL them...). When the system is shut down this would have the
> effect that systemd would send both SIGTERM to libvirt (since it wants to
> terminate libvirt itself), and a bus signal for each machine scope that is
> running, also to libvirt. (The SIGTERM and the bus signals would not be
> ordered though, but that should not be a problem, or would it?)

I think what you describe with Controller= could work with 2 caveats

  - We would need to use a well-known bus name, not a unique bus name, since libvirtd needs to be able to restart without affecting guests, and such restarts would result in libvirt getting a new unique name. I don't think this is a problem, since there's nothing in the code that prevents use of a well-known bus name. It is just a matter of libvirt registering org.libvirt.system on the bus at startup.

  - We need to solve the SIGTERM vs scope bus signals race. If SIGTERM can arrive at libvirt before the scope bus signals, then chances are libvirt will already have shut itself down before it is notified that scopes need to be shut down. AFAICT, this requires the ability for us to also set Before=libvirtd.service on the VM scopes, to ensure that the scopes will be scheduled for shutdown before libvirt ever gets a SIGTERM.

I think using  Controller= and Before= on the scopes is the right long term approach to fix this problem.

For RHEL-7, however, I don't think we can do the work to support Controller=, and thus we need the ability to set After= on scopes, so we can control their shutdown ordering wrt libvirt-guests.service and libvirtd.service.

So regardless of what approach we take for making this work, AFAICT, we really do need support for After= and Before= on scopes.

Comment 21 Daniel Berrangé 2014-02-20 12:46:02 UTC
After discussion with Lennart via email I found out why we were having problems with Before=/After= support. The key is that those properties must be provided as an array of strings, not a single comma-separated string.

e.g. something like this:

@@ -243,8 +243,9 @@ int virSystemdCreateMachine(const char *name,
                           iscontainer ? "container" : "vm",
                           (unsigned int)pidleader,
                           rootdir ? rootdir : "",
-                          1, "Slice", "s",
-                          slicename) < 0)
+                          2,
+                          "Slice", "s", slicename,
+                          "Before", "as", 1, "libvirtd.service") < 0)
         goto cleanup;
 
     ret = 0;
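
The distinction between the two encodings can be sketched as follows. This is a plain-Python illustration, not libvirt or D-Bus binding code: the helper before_property and the slice name machine.slice are assumptions made for the example, while "s" and "as" are the D-Bus type signatures for a string and an array of strings as used in the diff above:

```python
def before_property(units):
    """Build a (name, signature, value) triple for a Before= property.

    "as" is the D-Bus signature for an array of strings, so every unit
    name has to be its own element.
    """
    if isinstance(units, str):
        # A single string such as "a.service,b.service" is the mistake
        # described above, so reject it early.
        raise TypeError("pass a list of unit names, not one string")
    return ("Before", "as", list(units))


# Property list in the spirit of the patched call (slice name assumed).
props = [
    ("Slice", "s", "machine.slice"),
    before_property(["libvirtd.service", "libvirt-guests.service"]),
]

try:
    before_property("libvirtd.service,libvirt-guests.service")
    rejected = False
except TypeError:
    rejected = True

print(props)
print(rejected)
```

Listing two units here is only to show the array form; the diff above passes a single-element array.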

Comment 22 Michal Privoznik 2014-02-20 16:04:14 UTC
(In reply to Daniel Berrange from comment #21)
> After discussion with Lennart via email I found out why we were having
> problems with Before=/After= support. The key is that those properties must
> be provided as an array of strings, not a single comma separated string.
> 
> eg something like this
> 
> @@ -243,8 +243,9 @@ int virSystemdCreateMachine(const char *name,
>                            iscontainer ? "container" : "vm",
>                            (unsigned int)pidleader,
>                            rootdir ? rootdir : "",
> -                          1, "Slice", "s",
> -                          slicename) < 0)
> +                          2,
> +                          "Slice", "s", slicename,
> +                          "Before", "as", 1, "libvirtd.service") < 0)
>          goto cleanup;
>  
>      ret = 0;

I arrived at this code [*] about a week ago after reading the systemd source code (the documentation is lacking). I even tried it out, but without any success. There's a difference between upstream and downstream systemd and I think this falls into a grey area. Moreover, I've even tried setting "DefaultDependencies" to false too. No success either. Therefore I'm restoring the dependency. And sorry for not updating the BZ.

* - in fact I was trying something like this:

@@ -243,8 +243,9 @@ int virSystemdCreateMachine(const char *name,
                           iscontainer ? "container" : "vm",
                           (unsigned int)pidleader,
                           rootdir ? rootdir : "",
-                          1, "Slice", "s",
-                          slicename) < 0)
+                          2,
+                          "Slice", "s", slicename,
+                          "Before", "as", 1, "libvirt-guests.service") < 0)
         goto cleanup;
 
     ret = 0;



and this:

@@ -243,8 +243,10 @@ int virSystemdCreateMachine(const char *name,
                           iscontainer ? "container" : "vm",
                           (unsigned int)pidleader,
                           rootdir ? rootdir : "",
-                          1, "Slice", "s",
-                          slicename) < 0)
+                          3,
+                          "Slice", "s", slicename,
+                          "DefaultDependencies", "b", 0,
+                          "Before", "s", "libvirt-guests.service") < 0)


Interestingly, in this last case I get an I/O error when communicating over dbus, and nothing but a reboot makes it work again: not even reverting my patch, rebuilding, and restarting the libvirtd daemon.

Comment 23 Michal Privoznik 2014-02-21 12:33:47 UTC
Patches proposed upstream:

https://www.redhat.com/archives/libvir-list/2014-February/msg01357.html

Comment 25 Eric Blake 2014-02-25 15:20:02 UTC
Moving back to ASSIGNED to make sure we also modify the .spec file for a reproducible build.

Comment 26 Jiri Denemark 2014-02-25 15:29:19 UTC
See https://www.redhat.com/archives/libvir-list/2014-February/msg01539.html for details

Comment 27 Eric Blake 2014-02-25 20:25:41 UTC
Additional patches pending upstream:
https://www.redhat.com/archives/libvir-list/2014-February/msg01559.html

It turns out that the RHEL 7 build already depends on systemd_daemon through an indirect dependency on udev, so backporting the additional patches won't change the binary, but will make the resulting rpm deterministic for anyone rebuilding it outside of the RHEL 7 build farm.

Comment 28 Michal Privoznik 2014-02-26 13:35:53 UTC
Okay, moving to POST once again:

http://post-office.corp.redhat.com/archives/rhvirt-patches/2014-February/msg00853.html

Comment 30 zhenfeng wang 2014-03-03 12:02:57 UTC
Hi Michal
I hit an issue: after rebooting the host, some guests cannot be resumed and the libvirt-guests service is in a failed state. Please have a look, thanks.

pkginfo
qemu-kvm-rhev-1.5.3-50.el7.x86_64
kernel-3.10.0-97.el7.x86_64
libvirt-1.1.1-25.el7.x86_64

steps
1. Prepare 3 running guests and enable the libvirt-guests service
# virsh list
 Id    Name                           State
----------------------------------------------------
 2     rhel7                          running
 3     rhel72                         running
 6     rhel75                         running

# service libvirt-guests status
Redirecting to /bin/systemctl status  libvirt-guests.service
libvirt-guests.service - Suspend Active Libvirt Guests
   Loaded: loaded (/usr/lib/systemd/system/libvirt-guests.service; enabled)
   Active: active (exited) since Mon 2014-03-03 19:59:11 CST; 3s ago
  Process: 3275 ExecStart=/usr/libexec/libvirt-guests.sh start (code=exited, status=0/SUCCESS)
 Main PID: 3275 (code=exited, status=0/SUCCESS)


2. Keep the default configuration in /etc/sysconfig/libvirt-guests
ON_SHUTDOWN=suspend
ON_BOOT=start

3. Reboot the *host*:
   # reboot

4. Check the guest status after the host reboots; one guest was not resumed
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 2     rhel7                          running
 3     rhel72                         running
 -     rhel75                         shut off

# ll /var/lib/libvirt/qemu/save/
total 994060
-rw-------. 1 root root 1017913963 Mar  3 19:50 rhel75.save



5. Check the libvirt-guests status
# systemctl status libvirt-guests -l
libvirt-guests.service - Suspend Active Libvirt Guests
   Loaded: loaded (/usr/lib/systemd/system/libvirt-guests.service; enabled)
   Active: failed (Result: exit-code) since Mon 2014-03-03 19:55:33 CST; 1min 46s ago
  Process: 1800 ExecStart=/usr/libexec/libvirt-guests.sh start (code=exited, status=1/FAILURE)
 Main PID: 1800 (code=exited, status=1/FAILURE)
   CGroup: /system.slice/libvirt-guests.service

Mar 03 19:53:51 ibm-x3650m3-07.qe.lab.eng.nay.redhat.com libvirt-guests.sh[1800]: Resuming guests on default URI...
Mar 03 19:53:55 ibm-x3650m3-07.qe.lab.eng.nay.redhat.com libvirt-guests.sh[1800]: Resuming guest rhel7: done
Mar 03 19:53:58 ibm-x3650m3-07.qe.lab.eng.nay.redhat.com libvirt-guests.sh[1800]: Resuming guest rhel72: done
Mar 03 19:55:33 ibm-x3650m3-07.qe.lab.eng.nay.redhat.com libvirt-guests.sh[1800]: Resuming guest rhel75: error: Failed to start domain rhel75
Mar 03 19:55:33 ibm-x3650m3-07.qe.lab.eng.nay.redhat.com libvirt-guests.sh[1800]: error: End of file while reading data: Input/output error
Mar 03 19:55:33 ibm-x3650m3-07.qe.lab.eng.nay.redhat.com libvirt-guests.sh[1800]: error: One or more references were leaked after disconnect from the hypervisor
Mar 03 19:55:33 ibm-x3650m3-07.qe.lab.eng.nay.redhat.com libvirt-guests.sh[1800]: error: Failed to reconnect to the hypervisor
Mar 03 19:55:33 ibm-x3650m3-07.qe.lab.eng.nay.redhat.com systemd[1]: libvirt-guests.service: main process exited, code=exited, status=1/FAILURE
Mar 03 19:55:33 ibm-x3650m3-07.qe.lab.eng.nay.redhat.com systemd[1]: Failed to start Suspend Active Libvirt Guests.
Mar 03 19:55:33 ibm-x3650m3-07.qe.lab.eng.nay.redhat.com systemd[1]: Unit libvirt-guests.service entered failed state.

6. I'll attach the logs to this bug

Comment 31 zhenfeng wang 2014-03-03 12:07:02 UTC
Created attachment 869905
The log about libvirt-guests

Comment 32 Michal Privoznik 2014-03-03 14:23:38 UTC
(In reply to zhenfeng wang from comment #30)
> 
> 4.Check the guest status after the host reboot, find one guest wasn't resumed
> # virsh list --all
>  Id    Name                           State
> ----------------------------------------------------
>  2     rhel7                          running
>  3     rhel72                         running
>  -     rhel75                         shut off
> 
> # ll /var/lib/libvirt/qemu/save/
> total 994060
> -rw-------. 1 root root 1017913963 Mar  3 19:50 rhel75.save
> 
> 
> 
> 5.Check the libvirt-guests status
> # systemctl status libvirt-guests -l
> libvirt-guests.service - Suspend Active Libvirt Guests
>    Loaded: loaded (/usr/lib/systemd/system/libvirt-guests.service; enabled)
>    Active: failed (Result: exit-code) since Mon 2014-03-03 19:55:33 CST;
> 1min 46s ago
>   Process: 1800 ExecStart=/usr/libexec/libvirt-guests.sh start (code=exited,
> status=1/FAILURE)
>  Main PID: 1800 (code=exited, status=1/FAILURE)
>    CGroup: /system.slice/libvirt-guests.service
> 
> Mar 03 19:53:51 ibm-x3650m3-07.qe.lab.eng.nay.redhat.com
> libvirt-guests.sh[1800]: Resuming guests on default URI...
> Mar 03 19:53:55 ibm-x3650m3-07.qe.lab.eng.nay.redhat.com
> libvirt-guests.sh[1800]: Resuming guest rhel7: done
> Mar 03 19:53:58 ibm-x3650m3-07.qe.lab.eng.nay.redhat.com
> libvirt-guests.sh[1800]: Resuming guest rhel72: done
> Mar 03 19:55:33 ibm-x3650m3-07.qe.lab.eng.nay.redhat.com
> libvirt-guests.sh[1800]: Resuming guest rhel75: error: Failed to start
> domain rhel75
> Mar 03 19:55:33 ibm-x3650m3-07.qe.lab.eng.nay.redhat.com
> libvirt-guests.sh[1800]: error: End of file while reading data: Input/output
> error

This looks like libvirtd hit a segmentation fault. Do you happen to have a coredump, or are you able to reproduce it?
What happens when you 'virsh restore rhel75'?

BTW: wasn't rhel75 freshly booted up or shutting down prior to reboot? Maybe we are not handling that correctly.

Michal

Comment 33 zhenfeng wang 2014-03-04 06:38:13 UTC
Hi Michal,
Sorry, I couldn't find a coredump on my host. libvirtd was running once the host had finished booting, so maybe libvirtd crashed at some point during host startup. I can often reproduce this issue when starting many guests (three or more), but I can't reproduce it with only one guest. BTW, the rhel75 guest can be restored successfully with the virsh restore command, and the guest resumes from where it left off.


Checking the system log, I found the following records. I hope they help:
Mar  4 14:32:06 ibm-x3650m3-07 systemd: Stopping Virtualization daemon...
Mar  4 14:33:36 ibm-x3650m3-07 systemd: libvirtd.service stopping timed out. Killing.
Mar  4 14:33:36 ibm-x3650m3-07 libvirt-guests.sh: Resuming guest rhel75: error: Failed to start domain rhel75
Mar  4 14:33:36 ibm-x3650m3-07 libvirt-guests.sh: error: End of file while reading data: Input/output error
Mar  4 14:33:36 ibm-x3650m3-07 libvirt-guests.sh: error: One or more references were leaked after disconnect from the hypervisor
Mar  4 14:33:36 ibm-x3650m3-07 libvirt-guests.sh: error: Failed to reconnect to the hypervisor
Mar  4 14:33:36 ibm-x3650m3-07 systemd: libvirtd.service: main process exited, code=killed, status=9/KILL
Mar  4 14:33:36 ibm-x3650m3-07 systemd: Unit libvirtd.service entered failed state.
Mar  4 14:33:36 ibm-x3650m3-07 systemd: Starting Virtualization daemon...
Mar  4 14:33:36 ibm-x3650m3-07 systemd: libvirt-guests.service: main process exited, code=exited, status=1/FAILURE
Mar  4 14:33:36 ibm-x3650m3-07 systemd: Failed to start Suspend Active Libvirt Guests.
Mar  4 14:33:36 ibm-x3650m3-07 systemd: Unit libvirt-guests.service entered failed state.
Mar  4 14:33:37 ibm-x3650m3-07 libvirtd: 2014-03-04 06:33:37.005+0000: 2717: info : libvirt version: 1.1.1, package: 25.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2014-02-26-10:34:02, x86-017.build.eng.bos.redhat.com)
Mar  4 14:33:37 ibm-x3650m3-07 libvirtd: 2014-03-04 06:33:37.005+0000: 2717: debug : virLogParseOutputs:1346 : outputs=1:file:/var/log/libvirt/libvirtd.log
Mar  4 14:33:37 ibm-x3650m3-07 systemd: Started Virtualization daemon.
Mar  4 14:33:37 ibm-x3650m3-07 libvirtd: 2014-03-04 06:33:37.023+0000: 2732: debug : virFileClose:90 : Closed fd 3
Mar  4 14:33:37 ibm-x3650m3-07 libvirtd: 2014-03-04 06:33:37.023+0000: 2732: debug : virFileClose:90 : Closed fd 5
Mar  4 14:33:37 ibm-x3650m3-07 libvirtd: 2014-03-04 06:33:37.023+0000: 2732: debug : virFileClose:90 : Closed fd 6
Mar  4 14:33:37 ibm-x3650m3-07 libvirtd: 2014-03-04 06:33:37.023+0000: 2732: debug : virFileClose:90 : Closed fd 7
Mar  4 14:33:37 ibm-x3650m3-07 libvirtd: 2014-03-04 06:33:37.023+0000: 2732: debug : virFileClose:90 : Closed fd 8
Mar  4 14:33:37 ibm-x3650m3-07 libvirtd: 2014-03-04 06:33:37.023+0000: 2732: debug : virFileClose:90 : Closed fd 9
Mar  4 14:33:37 ibm-x3650m3-07 libvirtd: 2014-03-04 06:33:37.023+0000: 2732: debug : virFileClose:90 : Closed fd 10

Comment 34 Michal Privoznik 2014-03-04 13:37:23 UTC
(In reply to zhenfeng wang from comment #33)
> Hi Michal
> Sorry to tell you that i didn't find the coredump in my host, and i find the
> libvirtd was in running status while the host start completely, maybe the
> libvirtd ever crash during the host start process. I can ofen reproduce this
> issue while start many guests (guest numbers >=3), can't reproduce it while
> only start one guest. BTW, The rhel75 guest can be restored successfully
> with the virsh restore command and the guest will recover to the place where
> it left 
> 
> 
> Check the systemlog , i find the following record, hope it help you

It does indeed.

> Mar  4 14:32:06 ibm-x3650m3-07 systemd: Stopping Virtualization daemon...

Okay, so this is interesting. Why the heck is systemd *stopping* libvirtd at host boot up?

> Mar  4 14:33:36 ibm-x3650m3-07 systemd: libvirtd.service stopping timed out.

Yes, this times out because libvirtd is stuck resuming guests (that's why you hit this when the guest count is >= 3). I assume you have a slow disk or something (nfs?), so that resuming a guest takes 30 seconds or more.
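
For reference, the gap between the "Stopping Virtualization daemon..." message and the "stopping timed out" message can be worked out from the two quoted timestamps. A minimal sketch in shell; `to_seconds` and `elapsed` are hypothetical helper names, and both timestamps are assumed to fall on the same day:

```shell
#!/bin/sh
# Convert an HH:MM:SS timestamp to seconds since midnight.
# awk reads "06" as decimal 6, which sidesteps octal parsing in shell arithmetic.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Print the number of seconds between two same-day timestamps.
elapsed() {
    start=$(to_seconds "$1")
    end=$(to_seconds "$2")
    echo $((end - start))
}

# Stop requested at 14:32:06, "stopping timed out. Killing." at 14:33:36.
elapsed 14:32:06 14:33:36   # prints 90
```

The 90-second gap would be consistent with systemd's default stop timeout, assuming libvirtd.service does not override TimeoutStopSec.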

> Killing.
> Mar  4 14:33:36 ibm-x3650m3-07 libvirt-guests.sh: Resuming guest rhel75:
> error: Failed to start domain rhel75
> Mar  4 14:33:36 ibm-x3650m3-07 libvirt-guests.sh: error: End of file while
> reading data: Input/output error

So after systemd times out waiting for libvirtd, it kills it. That sets off a chain reaction: libvirt-guests.sh loses its connection and thus does not resume the remaining guests.

> Mar  4 14:33:36 ibm-x3650m3-07 libvirt-guests.sh: error: One or more
> references were leaked after disconnect from the hypervisor
> Mar  4 14:33:36 ibm-x3650m3-07 libvirt-guests.sh: error: Failed to reconnect
> to the hypervisor
> Mar  4 14:33:36 ibm-x3650m3-07 systemd: libvirtd.service: main process
> exited, code=killed, status=9/KILL
> Mar  4 14:33:36 ibm-x3650m3-07 systemd: Unit libvirtd.service entered failed
> state.
> Mar  4 14:33:36 ibm-x3650m3-07 systemd: Starting Virtualization daemon...

And later, once libvirtd has been successfully killed, systemd decides to start it again, maybe just for the sheer joy of it.

> Mar  4 14:33:36 ibm-x3650m3-07 systemd: libvirt-guests.service: main process
> exited, code=exited, status=1/FAILURE
> Mar  4 14:33:36 ibm-x3650m3-07 systemd: Failed to start Suspend Active
> Libvirt Guests.
> Mar  4 14:33:36 ibm-x3650m3-07 systemd: Unit libvirt-guests.service entered
> failed state.
> Mar  4 14:33:37 ibm-x3650m3-07 libvirtd: 2014-03-04 06:33:37.005+0000: 2717:
> info : libvirt version: 1.1.1, package: 25.el7 (Red Hat, Inc.
> <http://bugzilla.redhat.com/bugzilla>, 2014-02-26-10:34:02,
> x86-017.build.eng.bos.redhat.com)
> Mar  4 14:33:37 ibm-x3650m3-07 libvirtd: 2014-03-04 06:33:37.005+0000: 2717:
> debug : virLogParseOutputs:1346 :
> outputs=1:file:/var/log/libvirt/libvirtd.log
> Mar  4 14:33:37 ibm-x3650m3-07 systemd: Started Virtualization daemon.

Can you please try catching systemd debug logs and attach them here? I believe this can be achieved by appending:

systemd.log_level=debug systemd.log_target=kmsg log_buf_len=5M enforcing=0

to kernel command line before booting. Then /var/log/messages and dmesg should contain the interesting bits.
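
After rebooting, it may be worth confirming that the parameters actually reached the kernel. A minimal sketch in shell; `has_param` and `check_debug_params` are hypothetical helper names, and the parameter list is simply the one quoted above:

```shell
#!/bin/sh
# has_param CMDLINE PARAM: succeed if PARAM appears as a whole word in CMDLINE.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_debug_params CMDLINE: report which of the debug parameters are present.
# Returns non-zero if any of them is missing.
check_debug_params() {
    cmdline=$1
    missing=0
    for p in systemd.log_level=debug systemd.log_target=kmsg log_buf_len=5M enforcing=0; do
        if has_param "$cmdline" "$p"; then
            echo "present: $p"
        else
            echo "missing: $p"
            missing=1
        fi
    done
    return $missing
}

# On a live host (not run here): check_debug_params "$(cat /proc/cmdline)"
```

The check takes the command line as an argument rather than reading /proc/cmdline itself, so it can be tried against a pasted string first.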

Comment 35 zhenfeng wang 2014-03-05 03:48:35 UTC
(In reply to Michal Privoznik from comment #34)
> (In reply to zhenfeng wang from comment #33)
> > Hi Michal
> > Sorry to tell you that i didn't find the coredump in my host, and i find the
> > libvirtd was in running status while the host start completely, maybe the
> > libvirtd ever crash during the host start process. I can ofen reproduce this
> > issue while start many guests (guest numbers >=3), can't reproduce it while
> > only start one guest. BTW, The rhel75 guest can be restored successfully
> > with the virsh restore command and the guest will recover to the place where
> > it left 
> > 
> > 
> > Check the systemlog , i find the following record, hope it help you
> 
> It does indeed.
> 
> > Mar  4 14:32:06 ibm-x3650m3-07 systemd: Stopping Virtualization daemon...
> 
> Okay, so this is interesting. Why the heck is systemd *stopping* libvirtd at
> host boot up?
I'm not sure what causes this strange behaviour. I guess it may be related to a timeout during libvirtd initialization.
> 
> > Mar  4 14:33:36 ibm-x3650m3-07 systemd: libvirtd.service stopping timed out.
> 
> Yes, this timeouts as libvirtd is stuck resuming guests (that's why you hit
> this when guest count >= 3. I assume you have a slow disk or something
> (nfs?), so that resume of guest takes 30 seconds or more).
> 
In fact, I'm not using NFS: all the guests live on the local machine. The host has a SATA disk whose read rate exceeds 100 MB/s, so the disk shouldn't be slow:
# hdparm -t  /dev/sda1

/dev/sda1:
 Timing buffered disk reads: 356 MB in  3.01 seconds = 118.29 MB/sec


> > Killing.
> > Mar  4 14:33:36 ibm-x3650m3-07 libvirt-guests.sh: Resuming guest rhel75:
> > error: Failed to start domain rhel75
> > Mar  4 14:33:36 ibm-x3650m3-07 libvirt-guests.sh: error: End of file while
> > reading data: Input/output error
> 
> So after systemd timeouts on waiting for libvirtd, it kills it. Which
> release a chain-reaction, like libvirt-guests.sh losing the connection and
> thus not resuming other guests.
> 
> > Mar  4 14:33:36 ibm-x3650m3-07 libvirt-guests.sh: error: One or more
> > references were leaked after disconnect from the hypervisor
> > Mar  4 14:33:36 ibm-x3650m3-07 libvirt-guests.sh: error: Failed to reconnect
> > to the hypervisor
> > Mar  4 14:33:36 ibm-x3650m3-07 systemd: libvirtd.service: main process
> > exited, code=killed, status=9/KILL
> > Mar  4 14:33:36 ibm-x3650m3-07 systemd: Unit libvirtd.service entered failed
> > state.
> > Mar  4 14:33:36 ibm-x3650m3-07 systemd: Starting Virtualization daemon...
> 
> And later, when libvirtd is successfully killed, maybe just for a sheer joy
> of it, systemd decides to start it again.
> 
> > Mar  4 14:33:36 ibm-x3650m3-07 systemd: libvirt-guests.service: main process
> > exited, code=exited, status=1/FAILURE
> > Mar  4 14:33:36 ibm-x3650m3-07 systemd: Failed to start Suspend Active
> > Libvirt Guests.
> > Mar  4 14:33:36 ibm-x3650m3-07 systemd: Unit libvirt-guests.service entered
> > failed state.
> > Mar  4 14:33:37 ibm-x3650m3-07 libvirtd: 2014-03-04 06:33:37.005+0000: 2717:
> > info : libvirt version: 1.1.1, package: 25.el7 (Red Hat, Inc.
> > <http://bugzilla.redhat.com/bugzilla>, 2014-02-26-10:34:02,
> > x86-017.build.eng.bos.redhat.com)
> > Mar  4 14:33:37 ibm-x3650m3-07 libvirtd: 2014-03-04 06:33:37.005+0000: 2717:
> > debug : virLogParseOutputs:1346 :
> > outputs=1:file:/var/log/libvirt/libvirtd.log
> > Mar  4 14:33:37 ibm-x3650m3-07 systemd: Started Virtualization daemon.
> 
> Can you please try catching systemd debug logs and attach them here? I
> believe this can be achieved by appending:
> 
> systemd.log_level=debug systemd.log_target=kmsg log_buf_len=5M enforcing=0
> 
> to kernel command line before booting. Then /var/log/messages and dmesg
> should contain the interesting bits.

Comment 36 zhenfeng wang 2014-03-05 03:50:50 UTC
Created attachment 870744 [details]
The dmesg info of the libvirt-guests

Comment 37 zhenfeng wang 2014-03-05 03:51:41 UTC
Created attachment 870745 [details]
The syslog about the libvirt-guests

Comment 38 Michal Privoznik 2014-03-05 16:35:02 UTC
zhenfeng,

This libvirtd.service restart looks spurious. Does your /etc/rc.local (or something similar) contain 'service libvirtd restart' (or an equivalent)?

Comment 39 zhenfeng wang 2014-03-06 02:52:34 UTC
Hi Michal,
I didn't find an /etc/rc.local file on my host. However, I did find /etc/rc.d/rc.local, and it contains 'service libvirtd restart'. I'll attach this file, along with the 'journalctl -u libvirtd.service' and 'journalctl -u libvirt-guests.service' logs.

# grep 'service libvirtd restart' /etc/rc.d/rc.local 
service libvirtd restart
service libvirtd restart

Comment 40 zhenfeng wang 2014-03-06 02:56:18 UTC
Created attachment 871180 [details]
The libvirtd and libvirt-guests' log which output by journalctl command

Comment 41 zhenfeng wang 2014-03-06 03:00:01 UTC
Created attachment 871181 [details]
The rc.local file in my host

Comment 42 Michal Privoznik 2014-03-06 10:09:19 UTC
(In reply to zhenfeng wang from comment #41)
> Created attachment 871181 [details]
> The rc.local file in my host

From the file:

#!/bin/sh
ntpdate clock.redhat.com

modprobe kvm
modprobe kvm-intel
modprobe kvm-amd

service rpcbind  start
chkconfig rpcbind on

chmod 666 /dev/kvm

service libvirtd restart
setsebool -P virt_use_nfs 1 
setenforce 1 
/usr/libexec/iptables/iptables.init stop
service firewalld stop
service libvirtd restart

That explains why libvirtd is restarted (twice) and why the libvirt-guests script loses its connection and fails to resume some domains. In fact, it also explains why libvirtd fails to quit and systemd decides to kill it: when libvirtd is asked to stop in a certain state (while talking on the monitor), our event loop gets stuck.

So I think you should remove 'service libvirtd restart' from rc.local and re-verify.
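
One way to do that removal is sketched below in shell. The function names `count_libvirtd_restart` and `strip_libvirtd_restart` are hypothetical, and the sketch is meant to be tried on a scratch copy of the file first:

```shell
#!/bin/sh
# count_libvirtd_restart FILE: print how many lines restart libvirtd.
count_libvirtd_restart() {
    grep -c 'service[[:space:]]\{1,\}libvirtd[[:space:]]\{1,\}restart' "$1"
}

# strip_libvirtd_restart FILE: delete lines consisting only of
# 'service libvirtd restart', leaving the original in FILE.bak.
strip_libvirtd_restart() {
    sed -i.bak '/^[[:space:]]*service[[:space:]]\{1,\}libvirtd[[:space:]]\{1,\}restart[[:space:]]*$/d' "$1"
}

# Example on a scratch copy (not run here):
#   cp /etc/rc.d/rc.local /tmp/rc.local.test
#   count_libvirtd_restart /tmp/rc.local.test
#   strip_libvirtd_restart /tmp/rc.local.test
```

Only whole-line matches are deleted, so a restart embedded in a longer command would be counted but left in place for manual review.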

Comment 43 zhenfeng wang 2014-03-07 13:17:34 UTC
Hi Michal,
Thanks for your suggestion. Everything works well after I removed 'service libvirtd restart' from rc.local. My verification steps follow.

pkginfo
libvirt-1.1.1-26.el7.x86_64
qemu-kvm-rhev-1.5.3-52.el7.x86_64
kernel-3.10.0-105.el7.x86_64

steps

SETUP1
1. Ensure the libvirtd and libvirt-guests services are running
2. Make sure the libvirt-guests service is enabled for the next reboot:
# systemctl enable libvirt-guests.service
3. Start 4 guests and make sure autostart is disabled for all domains
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 2     rhel71                         running
 3     rhel72                         running
 4     rhel73                         running
 5     rhel75                         running

Scenario 1
1. Edit the configuration in /etc/sysconfig/libvirt-guests
ON_SHUTDOWN=suspend
ON_BOOT=start

2. Restart the libvirt-guests service. Once libvirt-guests had started again, all guests resumed from where they left off
# systemctl restart libvirt-guests
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 2     rhel71                         running
 3     rhel72                         running
 4     rhel73                         running
 5     rhel75                         running
3. Reboot the host. The result was the same as in step 2
# reboot
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 2     rhel71                         running
 3     rhel72                         running
 4     rhel73                         running
 5     rhel75                         running

Scenario 2
1. Edit the configuration in /etc/sysconfig/libvirt-guests
ON_SHUTDOWN=suspend
ON_BOOT=ignore

2. Restart the libvirt-guests service. After libvirt-guests started, all domains were shut off, and there was a managedsave file in /var/lib/libvirt/qemu/save/ for each domain
# systemctl restart libvirt-guests
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     rhel71                         shut off
 -     rhel72                         shut off
 -     rhel73                         shut off
 -     rhel75                         shut off

# ll /var/lib/libvirt/qemu/save/
total 2496084
-rw-------. 1 root root  263823173 Mar  7 19:15 rhel71.save
-rw-------. 1 root root  258106558 Mar  7 19:15 rhel72.save
-rw-------. 1 root root 1028830894 Mar  7 19:15 rhel73.save
-rw-------. 1 root root 1005223558 Mar  7 19:15 rhel75.save

When the guests are started, they all resume from where they left off

3. Reboot the host. The result was the same as in step 2
# reboot
# virsh list --all

Scenario 3
1. Edit the configuration in /etc/sysconfig/libvirt-guests
ON_SHUTDOWN=shutdown
ON_BOOT=ignore

2. Restart the libvirt-guests service. After libvirt-guests started, all domains were shut off, and there were no managedsave files in /var/lib/libvirt/qemu/save/ for the domains
# systemctl restart libvirt-guests
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     rhel71                         shut off
 -     rhel72                         shut off
 -     rhel73                         shut off
 -     rhel75                         shut off
# ll /var/lib/libvirt/qemu/save/
total 0

When the guests are started, they all boot fresh

3. Reboot the host. The result was the same as in step 2
# reboot
# virsh list --all

Scenario 4

1. Edit the configuration in /etc/sysconfig/libvirt-guests
ON_SHUTDOWN=shutdown
ON_BOOT=start

2. Restart the libvirt-guests service. After libvirt-guests started, all domains were running. The guests did not resume from where they left off; they booted fresh
# systemctl restart libvirt-guests
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     rhel71                         running
 -     rhel72                         running
 -     rhel73                         running
 -     rhel75                         running

3. Reboot the host. The result was the same as in step 2
# reboot
# virsh list --all

SETUP2
All steps were the same as in SETUP1 except step 3: autostart was enabled for rhel71 and rhel72, and left disabled for rhel73 and rhel75
# virsh autostart rhel71
# virsh autostart rhel72

Scenario 1
1. Edit the configuration in /etc/sysconfig/libvirt-guests
ON_SHUTDOWN=suspend
ON_BOOT=ignore

2. Restart the libvirt-guests service. The result was the same as in step 2 of Scenario 2 in SETUP1

3. Reboot the host. Once the host had started completely, the rhel71 and rhel72 guests were running and had resumed from where they left off. The rhel73 and rhel75 guests were shut off, and I could see their save files under the /var/lib/libvirt/qemu/save folder

Scenario 2
1. Edit the configuration in /etc/sysconfig/libvirt-guests
ON_SHUTDOWN=shutdown
ON_BOOT=ignore

2. Restart the libvirt-guests service. The result was the same as in step 2 of Scenario 3 in SETUP1

3. Reboot the host. Once the host had started completely, the rhel71 and rhel72 guests were running, but they had not resumed from where they left off; they booted fresh. The rhel73 and rhel75 guests were shut off, and there were no save files for them under the /var/lib/libvirt/qemu/save folder

Based on the results above, I am marking this bug as verified.
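
The configuration edits in the scenarios above can be scripted when cycling through the ON_BOOT/ON_SHUTDOWN combinations. A minimal sketch in shell, assuming the file uses plain KEY=value lines; `set_guest_policy` is a hypothetical helper name and is best tried on a copy of /etc/sysconfig/libvirt-guests first:

```shell
#!/bin/sh
# set_guest_policy FILE ON_BOOT ON_SHUTDOWN
# Replace any active ON_BOOT/ON_SHUTDOWN assignments in FILE with the given
# values. Commented-out lines and all other settings are left untouched.
set_guest_policy() {
    file=$1
    on_boot=$2
    on_shutdown=$3
    tmp=$(mktemp) || return 1
    # Drop the existing assignments, then append the new ones.
    sed -e '/^[[:space:]]*ON_BOOT=/d' -e '/^[[:space:]]*ON_SHUTDOWN=/d' "$file" > "$tmp"
    printf 'ON_BOOT=%s\nON_SHUTDOWN=%s\n' "$on_boot" "$on_shutdown" >> "$tmp"
    # Write back in place so the file keeps its ownership and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example for Scenario 3 (not run here):
#   set_guest_policy /etc/sysconfig/libvirt-guests ignore shutdown
#   systemctl restart libvirt-guests
```

The new assignments are appended at the end of the file, so they take effect even if the file previously had no active ON_BOOT or ON_SHUTDOWN line.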

Comment 44 Ludek Smid 2014-06-13 10:09:56 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.
