Bug 1195544 - libvirt-guests.service is timed out by systemd before guests finish suspending
Summary: libvirt-guests.service is timed out by systemd before guests finish suspending
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: libvirt
Version: 23
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-02-24 03:37 UTC by Rob Foehl
Modified: 2015-12-28 22:53 UTC
11 users

Fixed In Version: libvirt-1.2.18.2-1.fc23
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-12-28 22:53:23 UTC
Type: Bug
Embargoed:



Description Rob Foehl 2015-02-24 03:37:33 UTC
Description of problem:

When rebooting a Fedora 21 host, the libvirt-guests.sh script can hit the default 90-second stop timeout and be terminated by systemd before all guests have finished suspending. When the host comes back up, any guests that were not yet suspended have effectively been powered off without a clean shutdown.


Version-Release number of selected component (if applicable):

libvirt-client 1.2.9.2-1
systemd 216-20


How reproducible:

100%, given a sufficiently long combined suspend time across all guests


Steps to Reproduce:
1. Reboot a host whose guests need more than 90 seconds in total to suspend
2. Observe that the guests boot fresh instead of resuming


Additional info:

The service timeout appears to be the DefaultTimeoutStopSec value of 90s: it is not set in the unit file, but it is present in the unit's effective properties:

╶➤ systemctl show libvirt-guests |grep Timeout
TimeoutStartUSec=0
TimeoutStopUSec=1min 30s

This at least suggests that systemd is behaving as expected given the configuration, but the timeout is far too short, and in any case it is at odds with the 300s per-guest default timeout that libvirt-guests.sh uses for the shutdown case.
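The mismatch is easy to quantify. Assuming guests are handled one at a time (as the suspend log below shows) and that each may use the full 300s default from libvirt-guests.sh, the worst case for the four guests on this host is:

```latex
T_{\mathrm{worst}} = N \cdot T_{\mathrm{guest}} = 4 \times 300\,\mathrm{s} = 1200\,\mathrm{s} \gg 90\,\mathrm{s}
```

That is more than thirteen times the systemd default stop timeout.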

Slightly anonymized logs of this occurring on a host with several large-ish VMs and spinning disks:

Feb 23 21:06:31 kvmhost libvirt-guests.sh[17385]: Running guests on default URI: vm1, vm2, vm3, vm4
Feb 23 21:06:31 kvmhost libvirt-guests.sh[17385]: Suspending guests on default URI...
Feb 23 21:06:31 kvmhost libvirt-guests.sh[17385]: Suspending vm1: ...
Feb 23 21:06:36 kvmhost libvirt-guests.sh[17385]: Suspending vm1: 1.918 GiB
Feb 23 21:06:41 kvmhost libvirt-guests.sh[17385]: Suspending vm1: 3.248 GiB
Feb 23 21:06:46 kvmhost libvirt-guests.sh[17385]: Suspending vm1: 3.454 GiB
Feb 23 21:06:51 kvmhost libvirt-guests.sh[17385]: Suspending vm1: 3.523 GiB
Feb 23 21:06:56 kvmhost libvirt-guests.sh[17385]: Suspending vm1: 3.631 GiB
Feb 23 21:07:01 kvmhost libvirt-guests.sh[17385]: Suspending vm1: 3.743 GiB
Feb 23 21:07:52 kvmhost libvirt-guests.sh[17385]: Suspending vm1: ...
Feb 23 21:07:53 kvmhost libvirt-guests.sh[17385]: Suspending vm1: done
Feb 23 21:07:53 kvmhost libvirt-guests.sh[17385]: Suspending vm2: ...
Feb 23 21:07:58 kvmhost libvirt-guests.sh[17385]: Suspending vm2: 1.998 GiB
Feb 23 21:08:00 kvmhost systemd[1]: libvirt-guests.service stopping timed out. Terminating.
Feb 23 21:08:00 kvmhost systemd[1]: Unit libvirt-guests.service entered failed state.
Feb 23 21:08:00 kvmhost systemd[1]: libvirt-guests.service failed.

In this instance, only vm1 was properly suspended.
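To see how the 90-second budget was spent, per-guest suspend times can be extracted from a log like the one above. This is a minimal sketch in POSIX shell and awk. It assumes the syslog-style layout shown, with HH:MM:SS in field 3, the guest name in field 7, and a final "done" line for each guest. It does not handle a log that crosses midnight. The helper name suspend_durations is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read libvirt-guests.sh log lines on stdin and print,
# for each guest whose suspend started, its duration in seconds, or
# "incomplete" if no "done" line was seen. Assumes the syslog-style layout
# "Mon DD HH:MM:SS host ident[pid]: Suspending NAME: ..." and no midnight wrap.
suspend_durations() {
    awk '
    # Convert HH:MM:SS to seconds since midnight (a is a local array).
    function secs(t,    a) {
        split(t, a, ":")
        return a[1] * 3600 + a[2] * 60 + a[3]
    }
    # Per-guest progress lines look like "Suspending vm1: <status>".
    /Suspending [^ ]+: / {
        t = secs($3)
        name = $7
        sub(/:$/, "", name)
        # Only the first line seen for a guest marks its start time.
        if (!(name in start)) {
            start[name] = t
            order[++n] = name
        }
        if ($8 == "done")
            finish[name] = t
    }
    END {
        for (i = 1; i <= n; i++) {
            g = order[i]
            if (g in finish)
                printf "%s %d\n", g, finish[g] - start[g]
            else
                printf "%s incomplete\n", g
        }
    }'
}
```

Applied to the excerpt above, it reports vm1 at 82 seconds and vm2 as incomplete. vm3 and vm4 never started, so they are not listed. On a live host the input could come from something like `journalctl -b -1 -u libvirt-guests.service | suspend_durations`.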

Comment 1 Fedora End Of Life 2015-11-04 13:28:46 UTC
This message is a reminder that Fedora 21 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 21. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '21'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 21 reaches end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 2 Cole Robinson 2015-11-04 23:06:35 UTC
Still likely an issue on F23. I think we can just add

TimeoutStopSec=0

to the unit file to disable the stop timeout entirely.
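For hosts still running a libvirt build without the fix, the same setting can be applied locally as a systemd drop-in instead of editing the packaged unit file. systemd spells the directive TimeoutStopSec=, and a value of 0 disables the stop timeout. This is a minimal sketch that assumes the standard drop-in directory /etc/systemd/system/libvirt-guests.service.d/. The function name write_libvirt_guests_dropin and the file name 90-no-stop-timeout.conf are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in for libvirt-guests.service
# that disables systemd's stop timeout, leaving libvirt-guests.sh to apply
# its own internal timeouts.
# $1: drop-in directory, normally /etc/systemd/system/libvirt-guests.service.d
write_libvirt_guests_dropin() {
    dropin_dir=$1
    mkdir -p "$dropin_dir" || return 1
    # TimeoutStopSec=0 turns off the stop timeout for this unit only;
    # the global DefaultTimeoutStopSec (90s) is left unchanged.
    cat > "$dropin_dir/90-no-stop-timeout.conf" <<'EOF'
[Service]
TimeoutStopSec=0
EOF
}
```

As root, `write_libvirt_guests_dropin /etc/systemd/system/libvirt-guests.service.d && systemctl daemon-reload` would apply it. The trade-off is that a guest that genuinely hangs can no longer be cut off by systemd. Stopping the service then depends on the script's own timeout handling, which is the approach the upstream commit below takes.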

Comment 3 Michal Privoznik 2015-11-20 15:39:37 UTC
Fixed upstream in:

commit ba08d16d6cec81656b333435650aef36a012034c
Author:     Guido Günther <agx>
AuthorDate: Tue Nov 17 08:39:46 2015 +0100
Commit:     Guido Günther <agx>
CommitDate: Wed Nov 18 08:15:12 2015 +0100

    libvirt-guests: Disable shutdown timeout
    
    Since we can't know at service start how many VMs will be running we
    can't calculate an appropriate shutdown timeout. So instead of killing
    off the service just let it use its own internal timeout mechanism.
    
    References:
        http://bugs.debian.org/803714
        https://bugzilla.redhat.com/show_bug.cgi?id=1195544


v1.2.21-68-gba08d16

Comment 4 Fedora Update System 2015-12-24 14:56:18 UTC
libvirt-1.2.18.2-1.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-30b347dff1

Comment 5 Fedora Update System 2015-12-25 01:57:16 UTC
libvirt-1.2.18.2-1.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-30b347dff1

Comment 6 Fedora Update System 2015-12-28 22:52:44 UTC
libvirt-1.2.18.2-1.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.
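After updating, one way to confirm that the stop timeout is gone is to check the unit's effective TimeoutStopUSec property, using the same approach as the systemctl show check in the original report. This is a minimal sketch. It assumes a disabled timeout is printed as 0 (as on the TimeoutStartUSec=0 line in the report) or as infinity on newer systemd releases. The helper name stop_timeout_disabled is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a "TimeoutStopUSec=..." line, as printed by
# "systemctl show -p TimeoutStopUSec <unit>", indicates no stop timeout.
# Assumption: older systemd prints a disabled timeout as 0, newer as infinity.
stop_timeout_disabled() {
    case $1 in
        TimeoutStopUSec=0 | TimeoutStopUSec=infinity) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage: `stop_timeout_disabled "$(systemctl show -p TimeoutStopUSec libvirt-guests)" && echo "stop timeout disabled"`. On an unfixed host the property still reads 1min 30s, so the check fails.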

