Bug 1031696

Summary: libvirt: machines get killed when scopes are destroyed
Product: [Fedora] Fedora Reporter: Zbigniew Jędrzejewski-Szmek <zbyszek>
Component: libvirt Assignee: Libvirt Maintainers <libvirt-maint>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 20 CC: amessina, awilliam, berrange, bugzilla, clalancette, crobinso, dopey, dyuan, eblake, edgar.hoch, error, frank, itamar, jfehlig, jforbes, johannbg, jorti, jyang, laine, libvirt-maint, lnykryn, mavit, msekleta, mzhan, plautrba, s.adam, shyu, systemd-maint, tomek, veillard, vg.aetera, virt-maint, vpavlin, zbyszek, zhwang, zpeng
Target Milestone: --- Keywords: CommonBugs
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: RejectedFreezeException https://fedoraproject.org/wiki/Common_F20_bugs#virt-killed-shutdown
Fixed In Version: libvirt-1.1.3.4-3.fc20 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1032695 (view as bug list) Environment:
Last Closed: 2014-03-19 08:41:26 UTC Type: Bug

Description Zbigniew Jędrzejewski-Szmek 2013-11-18 14:57:09 UTC
Description of problem:
http://www.mail-archive.com/systemd-devel@lists.freedesktop.org/msg14252.html

v0lZy reported on IRC that his qemu machines get killed when shutting
down the host. libvirt-guests.service is designed to suspend them
during shutdown, but when it was run, the guests were all already dead.

And indeed, each qemu is running inside a scope, which is not
connected by any dependencies to either systemd-machined.service or
libvirt-guests.service. libvirt-guests.service does not depend on
systemd-machined.service either. This means that when shutdown is
ordered, the scopes will be stopped in parallel with
libvirt-guests.service, and depending on timing, the qemus will simply
be killed with SIGTERM.

For this whole thing to work correctly, we need to ensure that
scopes are not terminated prematurely. If we introduced a target
like libvirt-ready.target, and made libvirt-guests.service be
After=libvirt-ready.target, and made all the scopes be
Before=libvirt-ready.target, I think the vms would have a chance
to shut down properly. But that's pretty complicated.
And I'm not even sure how to do that properly. Any better
ideas?
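
The ordering argument above can be sketched as a toy model. It assumes only that units are stopped in the reverse of their start order, which is how systemd treats Before=/After= at shutdown. This is an illustration in Python, not systemd's scheduler; `machine-qemu-f18.scope` is just an example scope name, and `libvirt-ready.target` is the hypothetical target proposed above.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def stop_order(after):
    """Return one valid stop order for the given After= relations.

    `after` maps a unit to the set of units it is ordered After=.
    Toy model: the start order is a topological sort of After=, and
    shutdown is assumed to be the exact reverse of startup.
    """
    start = list(TopologicalSorter(after).static_order())
    return list(reversed(start))


# Proposed layout: scopes are Before=libvirt-ready.target (written here as
# the target being After= the scope), and the guests service is After= the
# target. All names are illustrative.
proposed = {
    "libvirt-ready.target": {"machine-qemu-f18.scope"},
    "libvirt-guests.service": {"libvirt-ready.target"},
}

order = stop_order(proposed)
print(order)
```

In this model libvirt-guests.service is stopped before the scope, so its stop action would still find the guest running. Without any relation between the two units nothing constrains their relative order, which is the race described above.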

Version-Release number of selected component (if applicable):
systemd-208-4.fc20.x86_64
libvirt-1.1.4-2.fc20.x86_64
libvirt-client-1.1.4-2.fc20.x86_64

Comment 1 Eric Blake 2013-11-18 15:46:21 UTC
Might be related to bug 906009

Comment 2 Zbigniew Jędrzejewski-Szmek 2013-11-18 15:48:08 UTC
(In reply to Eric Blake from comment #1)
> Might be related to bug 906009
They are related in the sense that both are about missing dependencies... But #906009 should be very easy to fix contrary to this one.

Comment 3 Juan Orti 2013-11-20 07:17:51 UTC
I have upgraded to F20 and now my virtual machines don't shut down when I
shut down the host.

In /etc/sysconfig/libvirt-guests I have:
ON_SHUTDOWN=shutdown
SHUTDOWN_TIMEOUT=100

This is what I see in the log (in Spanish):

# journalctl -a _SYSTEMD_UNIT=libvirt-guests.service _SYSTEMD_UNIT=libvirtd.service

nov 17 23:38:15 xenon.miceliux.com libvirtd[750]: libvirt version: 1.1.3.1,
package: 1.fc20 (Fedora Project, 2013-11-06-18:12:08,
buildvm-04.phx2.fedoraproject.org)
nov 17 23:38:15 xenon.miceliux.com libvirtd[750]: Received unexpected event 1
nov 17 23:38:16 xenon.miceliux.com libvirtd[750]: Error interno: Fin del
archivo desde monitor
nov 17 23:38:16 xenon.miceliux.com libvirtd[750]: Error interno: Falta objeto
de respuesta de monitor
nov 17 23:38:16 xenon.miceliux.com libvirt-guests.sh[21049]: Ejecutando
huéspedes en URI default:lithium
nov 17 23:38:16 xenon.miceliux.com libvirt-guests.sh[21049]: Cerrando
huéspedes en URI default...
nov 17 23:38:16 xenon.miceliux.com libvirt-guests.sh[21049]: Starting shutdown
on guest: lithium
nov 17 23:38:17 xenon.miceliux.com libvirtd[750]: Object (nil) ((unknown)) is
not a virObjectLockable instance
nov 17 23:38:17 xenon.miceliux.com libvirtd[750]: Argumento inválido: el
monitor no debe poseer un valor NULL
nov 17 23:38:17 xenon.miceliux.com libvirt-guests.sh[21049]: Error:Falló al
apagar el dominio 3ee0acb2-3da9-7045-868c-3dec16e03ad1
nov 17 23:38:17 xenon.miceliux.com libvirt-guests.sh[21049]: Error:Argumento
inválido: el monitor no debe poseer un valor NULL


Which translated is something like:

nov 17 23:38:15 xenon.miceliux.com libvirtd[750]: libvirt version: 1.1.3.1,
package: 1.fc20 (Fedora Project, 2013-11-06-18:12:08,
buildvm-04.phx2.fedoraproject.org)
nov 17 23:38:15 xenon.miceliux.com libvirtd[750]: Received unexpected event 1
nov 17 23:38:16 xenon.miceliux.com libvirtd[750]: Internal error: End of file
from monitor
nov 17 23:38:16 xenon.miceliux.com libvirtd[750]: Internal error: Response
object from monitor is missing
nov 17 23:38:16 xenon.miceliux.com libvirt-guests.sh[21049]: Running guests in
URI default:lithium
nov 17 23:38:16 xenon.miceliux.com libvirt-guests.sh[21049]: Closing guests in
default URI...
nov 17 23:38:16 xenon.miceliux.com libvirt-guests.sh[21049]: Starting shutdown
on guest: lithium
nov 17 23:38:17 xenon.miceliux.com libvirtd[750]: Object (nil) ((unknown)) is
not a virObjectLockable instance
nov 17 23:38:17 xenon.miceliux.com libvirtd[750]: Invalid argument: monitor
must not have a NULL value
nov 17 23:38:17 xenon.miceliux.com libvirt-guests.sh[21049]: Error:Failed to
shutdown domain 3ee0acb2-3da9-7045-868c-3dec16e03ad1
nov 17 23:38:17 xenon.miceliux.com libvirt-guests.sh[21049]: Error:Invalid
argument: monitor must not have a NULL value
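
A rough way to check a log like this for the symptom is to see whether libvirtd reported the monitor closing before libvirt-guests.sh began its shutdown. The sketch below is a Python illustration under assumptions: the sample lines are copied (unwrapped) from the translated output above, the regular expression is a guess at this journal layout, and reading "End of file from monitor" as "the qemu process was already gone" is an interpretation, not something the log states.

```python
import re

# Sample lines copied (unwrapped) from the translated journal output above.
LOG = """\
nov 17 23:38:16 xenon.miceliux.com libvirtd[750]: Internal error: End of file from monitor
nov 17 23:38:16 xenon.miceliux.com libvirt-guests.sh[21049]: Starting shutdown on guest: lithium
nov 17 23:38:17 xenon.miceliux.com libvirt-guests.sh[21049]: Error:Failed to shutdown domain 3ee0acb2-3da9-7045-868c-3dec16e03ad1
"""

# month, day, time, host, then "process[pid]: message"
LINE = re.compile(r"^\w+ \d+ [\d:]+ \S+ (?P<proc>[^\[]+)\[\d+\]: (?P<msg>.*)$")


def first_index(entries, proc, needle):
    """Index of the first entry from `proc` whose message contains `needle`."""
    for i, (p, msg) in enumerate(entries):
        if p == proc and needle in msg:
            return i
    return None


entries = [(m["proc"], m["msg"]) for m in map(LINE.match, LOG.splitlines()) if m]
monitor_eof = first_index(entries, "libvirtd", "End of file from monitor")
shutdown_start = first_index(entries, "libvirt-guests.sh", "Starting shutdown")

# True when the monitor had already closed before the guest shutdown began.
# Line order is used because both events carry the same one-second timestamp.
guest_died_first = (
    monitor_eof is not None
    and shutdown_start is not None
    and monitor_eof < shutdown_start
)
print(guest_died_first)
```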

Comment 4 Cole Robinson 2013-11-21 19:55:51 UTC
Libvirt is using the CreateMachine dbus call to create the scopes for qemu guests. There doesn't seem to be any way here to inject a Before= or After= value for a scope: that API has a 'properties' array, but systemd.resource-control(5) doesn't list Before= or After= so I don't think that's an option. It seems part of this solution needs to be handled on the systemd side. Reassigning.

Can someone from systemd dev side weigh in here? I'd really like to get this fixed before F20 is out the door, it's a potential data loss issue if people are depending on libvirt-guests.service to actually work.

If someone tells me what to do I can handle the libvirt patch

Comment 5 Cole Robinson 2013-11-21 19:58:08 UTC
Requesting a Final Freeze Exception here: this is a potential data loss issue if people are depending on libvirt-guests.service to gracefully stop their VMs on host shutdown, without this issue fixed they are effectively hard powered off.

Comment 6 Daniel Berrangé 2013-11-22 10:05:22 UTC
My understanding was that the 'properties' array can contain any property that is valid for the .scope unit. NB systemd.resource-control only lists the properties that are specific to .scope units. They implicitly inherit anything listed in systemd.unit man page too.

Comment 7 Cole Robinson 2013-11-22 12:40:35 UTC
The documentation confused me here; I thought properties mapped to SetUnitProperties, which sounds like it only supports cgroup-related tweaking. I didn't actually try patching libvirt, but I'll do that shortly and report back.

Comment 8 Cole Robinson 2013-11-22 15:20:09 UTC
I tried:

diff --git a/src/util/virsystemd.c b/src/util/virsystemd.c
index 503fff7..3243d35 100644
--- a/src/util/virsystemd.c
+++ b/src/util/virsystemd.c
@@ -243,8 +243,9 @@ int virSystemdCreateMachine(const char *name,
                           iscontainer ? "container" : "vm",
                           (unsigned int)pidleader,
                           rootdir ? rootdir : "",
-                          1, "Slice", "s",
-                          slicename) < 0)
+                          2,
+                          "Slice", "s", slicename,
+                          "After", "s", "libvirt-guests.service") < 0)
         goto cleanup;
 
     ret = 0;


and:

diff --git a/src/util/virsystemd.c b/src/util/virsystemd.c
index 503fff7..c459bf4 100644
--- a/src/util/virsystemd.c
+++ b/src/util/virsystemd.c
@@ -243,8 +243,8 @@ int virSystemdCreateMachine(const char *name,
                           iscontainer ? "container" : "vm",
                           (unsigned int)pidleader,
                           rootdir ? rootdir : "",
-                          1, "Slice", "s",
-                          slicename) < 0)
+                          1,
+                          "After", "s", "libvirt-guests.service") < 0)
         goto cleanup;
 
     ret = 0;



Both caused the dbus call to fail. I also tried doing:

> sudo systemctl set-property machine-qemu\x2df18.scope "After=libvirt-guests.service"
Unknown assignment After=libvirt-guests.service.

systemctl code seems to have a whitelist though, so I'm not sure whether SetUnitProperties would actually work with 'After' or not. I tried triggering it with d-feet but couldn't figure out how that worked.
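
One detail that may matter for the attempts above: on systemd's D-Bus unit interface, `After` and `Before` are exposed as string arrays (type `as`), whereas `Slice` is a single string (`s`). Whether that is why these particular calls failed is not established in this report. The sketch below is plain Python with no D-Bus library; it only models the `a(sv)` properties array and checks each value against an assumed signature table. The helper names and the `machine.slice` value are hypothetical, and `Before=libvirt-guests.service` is chosen to match the stop ordering described in the bug description, not taken from any libvirt patch.

```python
# Assumed D-Bus signatures for a few unit properties (a hypothetical table,
# limited to what this sketch needs).
SIGNATURES = {"Slice": "s", "After": "as", "Before": "as"}


def make_property(name, value):
    """Return a (name, (signature, value)) pair, i.e. one (sv) struct."""
    sig = SIGNATURES[name]
    if sig == "s" and not isinstance(value, str):
        raise TypeError(f"{name} expects a single string")
    if sig == "as" and not (
        isinstance(value, list) and all(isinstance(v, str) for v in value)
    ):
        raise TypeError(f"{name} expects a list of strings")
    return (name, (sig, value))


# Properties array for a scope that should start before, and therefore be
# stopped after, libvirt-guests.service.
props = [
    make_property("Slice", "machine.slice"),
    make_property("Before", ["libvirt-guests.service"]),
]
print(props)

try:
    # A plain string where a string list is expected is rejected.
    make_property("After", "libvirt-guests.service")
except TypeError as exc:
    print("rejected:", exc)
```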

Comment 9 Adam Williamson 2013-11-27 19:51:05 UTC
Discussed at 2013-11-27 freeze exception review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2013-11-27/f20-blocker-review-3.2013-11-27-17.01.log.txt . We're broadly of the opinion that the impact of this bug is such that we'd consider granting a freeze exception, but the efforts to fix it so far don't inspire much confidence in us. It doesn't look like anyone is completely confident about fixing this, people seem to be poking around with a stick until something looks like it works, which isn't the kind of change we like to pull through freezes.

So we delayed the determination. If someone can come up with a fix soon that looks fairly targeted, is tested, and has a convincing explanation for why it will work and won't break anything else, we may grant the exception. Otherwise, expect it to be rejected; there's more leeway for testing the fix carefully via the normal update process.

Comment 10 Adam Williamson 2013-12-02 19:00:57 UTC
Discussed at 2013-12-02 freeze exception review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2013-12-02/f20-blocker-review-%234.2013-12-02-17.02.log.txt . At this point there's three days till go/no-go and it seems too late to be poking this if we're not very sure what we're doing, so this is now rejected as a freeze exception issue. We could re-consider this decision if a fix shows up soon, tests well, and the release turns out to be delayed; do re-propose if all that comes to pass.

Comment 11 Cole Robinson 2013-12-14 18:15:56 UTC
I sent a follow up to the systemd-devel thread, CCing lennart and asking for further input.

Comment 12 Mohammed Arafa 2013-12-16 20:54:48 UTC
this is also true of fedora19
marafa@notebook:~$ cat /etc/sysconfig/libvirt-guests |grep -vE "^$|^#"
ON_SHUTDOWN=suspend
marafa@notebook:~$

Comment 13 Cole Robinson 2013-12-16 21:10:06 UTC
(In reply to Mohammed Arafa from comment #12)
> this is also true of fedora19
> marafa@notebook:~$ cat /etc/sysconfig/libvirt-guests |grep -vE "^$|^#"
> ON_SHUTDOWN=suspend
> marafa@notebook:~$

The libvirt that uses scopes shouldn't be in F19, unless you are using the fedora-virt-preview repo. What libvirt version are you using?

Comment 14 Mohammed Arafa 2013-12-18 12:37:44 UTC
hi cole

i have upgraded to f20 already. 

yum.log says
Nov 21 08:02:59 Updated: libvirt-1.0.5.7-2.fc19.x86_64

and rpm -qi libvirt says
Version     : 1.1.3.1
Release     : 2.fc20

Comment 15 Lennart Poettering 2014-02-23 16:26:49 UTC
As discussed with the libvirt guys, we now have a somewhat more comprehensive API for this in the scope logic of 209 that should be able to handle this.

Comment 16 Daniel Berrangé 2014-02-24 10:06:36 UTC
For reference we have also confirmed that we can fix this on the libvirt side, even without the systemd API enhancements in the short term. https://www.redhat.com/archives/libvir-list/2014-February/msg01358.html

Comment 17 Fedora Update System 2014-03-05 19:03:55 UTC
libvirt-1.1.3.4-2.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/libvirt-1.1.3.4-2.fc20

Comment 18 Fedora Update System 2014-03-07 06:33:19 UTC
Package libvirt-1.1.3.4-2.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing libvirt-1.1.3.4-2.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-3543/libvirt-1.1.3.4-2.fc20
then log in and leave karma (feedback).

Comment 19 Fedora Update System 2014-03-10 14:46:26 UTC
libvirt-1.1.3.4-3.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/libvirt-1.1.3.4-3.fc20

Comment 20 Fedora Update System 2014-03-19 08:41:26 UTC
libvirt-1.1.3.4-3.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.