Description of problem:
When rebooting a Fedora 21 host, the libvirt-guests.sh script can reach the default 90 second timeout and be terminated by systemd before all guests have finished suspending. When the host restarts, any remaining guests have effectively been powered off without a clean shutdown.

Version-Release number of selected component (if applicable):
libvirt-client 1.2.9.2-1
systemd 216-20

How reproducible:
100% with sufficiently long suspend times across guests

Steps to Reproduce:
1. Reboot a host with guests requiring >90 seconds to suspend
2. Observe guests boot instead of resume

Additional info:
The service timeout appears to be the DefaultTimeoutStopSec value of 90s, as it's not in the unit file, but present for the unit itself:

╶➤ systemctl show libvirt-guests |grep Timeout
TimeoutStartUSec=0
TimeoutStopUSec=1min 30s

This at least suggests systemd is behaving as expected given the configuration, but this timeout is way too short (and at odds with the 300s per guest default timeout for the shutdown case in libvirt-guests.sh either way).

Slightly anonymized logs of this occurring on a host with several large-ish VMs and spinning disks:

Feb 23 21:06:31 kvmhost libvirt-guests.sh[17385]: Running guests on default URI: vm1, vm2, vm3, vm4
Feb 23 21:06:31 kvmhost libvirt-guests.sh[17385]: Suspending guests on default URI...
Feb 23 21:06:31 kvmhost libvirt-guests.sh[17385]: Suspending vm1: ...
Feb 23 21:06:36 kvmhost libvirt-guests.sh[17385]: Suspending vm1: 1.918 GiB
Feb 23 21:06:41 kvmhost libvirt-guests.sh[17385]: Suspending vm1: 3.248 GiB
Feb 23 21:06:46 kvmhost libvirt-guests.sh[17385]: Suspending vm1: 3.454 GiB
Feb 23 21:06:51 kvmhost libvirt-guests.sh[17385]: Suspending vm1: 3.523 GiB
Feb 23 21:06:56 kvmhost libvirt-guests.sh[17385]: Suspending vm1: 3.631 GiB
Feb 23 21:07:01 kvmhost libvirt-guests.sh[17385]: Suspending vm1: 3.743 GiB
Feb 23 21:07:52 kvmhost libvirt-guests.sh[17385]: Suspending vm1: ...
Feb 23 21:07:53 kvmhost libvirt-guests.sh[17385]: Suspending vm1: done
Feb 23 21:07:53 kvmhost libvirt-guests.sh[17385]: Suspending vm2: ...
Feb 23 21:07:58 kvmhost libvirt-guests.sh[17385]: Suspending vm2: 1.998 GiB
Feb 23 21:08:00 kvmhost systemd[1]: libvirt-guests.service stopping timed out. Terminating.
Feb 23 21:08:00 kvmhost systemd[1]: Unit libvirt-guests.service entered failed state.
Feb 23 21:08:00 kvmhost systemd[1]: libvirt-guests.service failed.

In this instance, only vm1 was properly suspended.
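To put numbers on the mismatch described above, here is a rough sketch of the arithmetic. It assumes guests are handled one at a time and that each may use the full 300s per-guest timeout cited in the report. The helper names worst_case_secs and timeout_covers are hypothetical and are not part of libvirt-guests.sh.

```shell
#!/bin/sh
# Hypothetical helper: worst-case serial stop time in seconds, assuming
# guests are processed one at a time with a fixed per-guest timeout
# (300s is the per-guest default mentioned in the report).
worst_case_secs() {
    guests=$1
    per_guest=${2:-300}
    echo $(( guests * per_guest ))
}

# Hypothetical helper: report whether a service stop timeout (in seconds)
# is long enough to cover that worst case.
timeout_covers() {
    stop_timeout=$1
    needed=$(worst_case_secs "$2" "${3:-300}")
    if [ "$stop_timeout" -ge "$needed" ]; then
        echo "ok"
    else
        echo "too short: need ${needed}s, have ${stop_timeout}s"
    fi
}

# Four guests (vm1..vm4 in the log) against the 90s stop timeout.
timeout_covers 90 4
```

With four guests this prints "too short: need 1200s, have 90s". The suspend case in the log has no fixed per-guest bound at all, since it depends on guest memory size and disk speed, so any fixed stop timeout is a guess.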
This message is a reminder that Fedora 21 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 21. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '21'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 21 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Still likely an issue on f23. I think we can just add TimeoutStopSec=0 to the unit file to disable any timeout.
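Until a fixed package is available, the same setting can be applied locally with a drop-in instead of editing the packaged unit file. The following is a minimal sketch. The function name write_stop_timeout_dropin and the file name 10-stop-timeout.conf are hypothetical, and it assumes the systemd version in use treats TimeoutStopSec=0 as "no timeout" (newer systemd documents "infinity" for this).

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that disables the stop timeout for a unit.
# The function name and the drop-in file name are hypothetical choices.
# $1 = unit name, $2 = base directory (defaults to /etc/systemd/system).
write_stop_timeout_dropin() {
    unit=$1
    base=${2:-/etc/systemd/system}
    dir="$base/$unit.d"
    mkdir -p "$dir" || return 1
    # Assumption: 0 means "no timeout" on this systemd version.
    printf '[Service]\nTimeoutStopSec=0\n' > "$dir/10-stop-timeout.conf"
}

# On a real host, as root:
#   write_stop_timeout_dropin libvirt-guests.service
#   systemctl daemon-reload
#   systemctl show -p TimeoutStopUSec libvirt-guests.service
```

A drop-in survives package updates, whereas edits to the unit file under /usr/lib/systemd/system are overwritten. The final systemctl show call is there to confirm that the effective value is no longer 1min 30s.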
Fixed upstream in:

commit ba08d16d6cec81656b333435650aef36a012034c
Author:     Guido Günther <agx>
AuthorDate: Tue Nov 17 08:39:46 2015 +0100
Commit:     Guido Günther <agx>
CommitDate: Wed Nov 18 08:15:12 2015 +0100

    libvirt-guests: Disable shutdown timeout

    Since we can't know at service start how many VMs will be running we
    can't calculate an appropriate shutdown timeout. So instead of killing
    off the service just let it use its own internal timeout mechanism.

    References:
      http://bugs.debian.org/803714
      https://bugzilla.redhat.com/show_bug.cgi?id=1195544

v1.2.21-68-gba08d16
libvirt-1.2.18.2-1.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-30b347dff1
libvirt-1.2.18.2-1.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-30b347dff1
libvirt-1.2.18.2-1.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.