Bug 625362

Summary: libvirt-guests should start and shut down guests in parallel
Product: Red Hat Enterprise Linux 6 Reporter: Dan Kenigsberg <danken>
Component: libvirt Assignee: Peter Krempa <pkrempa>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: low Docs Contact:
Priority: high    
Version: 6.0 CC: atodorov, cpelland, dallan, dyuan, eblake, mzhan, rwu, xen-maint
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: libvirt-0.9.10-4.el6 Doc Type: Bug Fix
Doc Text:
Cause: The libvirt-guests script executed operations on guests serially. Consequence: On machines with many guests, shutdown took a long time because each guest waited for the others to shut down first. The procedure was also inefficient because the guests did not use all available host resources. Fix: The libvirt-guests init script was changed to allow parallel operations on domains, which shortens the host's shutdown time. Result: Guests start and shut down in parallel and use the host system's resources more efficiently. The host's shutdown time decreases in most cases.
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-06-20 06:24:48 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Dan Kenigsberg 2010-08-19 07:50:40 UTC
Description of problem:
On shutdown, libvirt-guests suspends (or shuts down) domains sequentially. This might make sense for suspend, but certainly not for shutdown. With multiple domains, each taking a few seconds to shut down cleanly, the time accumulates needlessly.

The script should instead start the shutdown of all domains, and only then sleep until they exit (or the timeout expires).

Comment 1 Dan Kenigsberg 2010-11-24 20:22:30 UTC
Sorry, no need to stress over this one (from the RHEV perspective). We are disabling libvirt-guests on our nodes anyway.

Comment 6 Jiri Denemark 2011-11-23 11:04:19 UTC
*** Bug 729114 has been marked as a duplicate of this bug. ***

Comment 11 dyuan 2012-02-23 08:59:22 UTC
I see. BOOT_TIMEOUT will have no effect when the script runs in parallel mode, yes?

Comment 12 Peter Krempa 2012-02-23 09:25:34 UTC
BOOT_TIMEOUT (in the script it's actually called START_DELAY) actually configures whether the machines start in parallel (START_DELAY=0) or whether the script waits the specified amount of time for the guest. Unfortunately, there's no way to reliably wait for and detect a full guest boot-up, so we have to use a timeout.

I'll add documentation to the START_DELAY variable about the serial/parallel behavior of the script depending on the configuration.

With this, the user can configure the startup and shutdown behavior of the libvirt-guests script independently (e.g. parallel startup and serial shutdown).
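
As an illustration of the independent settings described above, a sysconfig fragment for parallel startup combined with serial shutdown might look like the following. The variable names are the ones used in this report; the values are examples only, and the meaning of PARALLEL_SHUTDOWN=0 is inferred from the serial output in scenarios 3 and 4 below.

```shell
# Illustrative values only, not recommended defaults.
START_DELAY=0          # 0 = start guests in parallel (per comment 12)
ON_SHUTDOWN=shutdown   # shut guests down instead of suspending them
PARALLEL_SHUTDOWN=0    # 0 = shut guests down one at a time
SHUTDOWN_TIMEOUT=300   # seconds to wait for a guest to shut down
```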

Comment 17 dyuan 2012-03-08 03:48:42 UTC
Please help check scenario 2, thanks.


####Scenario 1#### okay.

ON_SHUTDOWN=shutdown
PARALLEL_SHUTDOWN=3
SHUTDOWN_TIMEOUT=300

# service libvirt-guests stop

Running guests on default URI: rhel58, rhel62, rhel62-1, vr-guest_managedsave
Shutting down guests on default URI...
Starting shutdown on guest: rhel58
Starting shutdown on guest: rhel62
Starting shutdown on guest: rhel62-1
Shutdown of guest rhel62 complete.
Starting shutdown on guest: vr-guest_managedsave
Shutdown of guest rhel62-1 complete.
Shutdown of guest rhel58 complete.
Shutdown of guest vr-guest_managedsave complete.

####Scenario 2#### misses the 4th guest.

ON_SHUTDOWN=shutdown
PARALLEL_SHUTDOWN=3
SHUTDOWN_TIMEOUT=1

# service libvirt-guests stop

Running guests on default URI: rhel58, rhel62, rhel62-1, vr-guest_managedsave
Shutting down guests on default URI...
Starting shutdown on guest: rhel58
Starting shutdown on guest: rhel62
Starting shutdown on guest: rhel62-1
Timeout expired while shutting down domains

but the UUID is recorded in libvirt-guests.
# cat /var/lib/libvirt/libvirt-guests 
default 83e69755-f692-6413-0c90-2213eddbbbde 6a6839c3-8b51-125e-262f-f2d384367c49 05d9a9f8-3def-491c-e649-87718ea2d98a 3862afa0-3ff8-80d1-51f2-cff6ec3880a6

####Scenario 3####

ON_SHUTDOWN=shutdown
PARALLEL_SHUTDOWN=0
SHUTDOWN_TIMEOUT=300

# service libvirt-guests stop

Running guests on default URI: rhel58, rhel62, rhel62-1, vr-guest_managedsave
Shutting down guests on default URI...
Shutting down rhel58: done         
Shutting down rhel62: done         
Shutting down rhel62-1: done         
Shutting down vr-guest_managedsave: done

####Scenario 4####
ON_SHUTDOWN=shutdown
PARALLEL_SHUTDOWN=0
SHUTDOWN_TIMEOUT=1

# service libvirt-guests stop

Running guests on default URI: rhel58, rhel62, rhel62-1, vr-guest_managedsave
Shutting down guests on default URI...
Shutting down rhel58: failed to shutdown in time
Shutting down rhel62: failed to shutdown in time
Shutting down rhel62-1: failed to shutdown in time
Shutting down vr-guest_managedsave: failed to shutdown in time

Comment 18 Peter Krempa 2012-03-08 13:32:56 UTC
In scenario 2 the timeout expires while the first three machines are still shutting down. Because you requested that only 3 machines be shut down at a time, the fourth was never attempted before the timeout expired. With parallel shutdown, the shutdown timeout applies to the attempt to shut down all machines on a single URI.

This is documented in the sysconfig file above the SHUTDOWN_TIMEOUT variable:
# Number of seconds we're willing to wait for a guest to shut down. If parallel
# shutdown is enabled, this timeout applies as a timeout for shutting down all
# guests on a single URI defined in the variable URIS.
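
The batch-wide timeout described above can be sketched as a small simulation. This is not the libvirt-guests implementation: the function name `simulate_parallel_shutdown`, the `name:seconds` argument format, and the one-second simulated clock are all assumptions made for illustration, and no libvirt calls are made.

```shell
#!/bin/bash
# Hypothetical simulation: each guest is passed as name:seconds, where
# seconds is a made-up time that guest needs to finish shutting down.
# Usage: simulate_parallel_shutdown PARALLEL TIMEOUT guest:secs...
simulate_parallel_shutdown() {
    local parallel=$1 timeout=$2
    shift 2
    local pending=("$@")     # guests not yet asked to shut down
    local names=() ends=()   # guests in flight and their finish times
    local now=0 i entry keep_names keep_ends

    while [ ${#pending[@]} -gt 0 ] || [ ${#names[@]} -gt 0 ]; do
        # Start more shutdowns while there are free slots.
        while [ ${#pending[@]} -gt 0 ] && [ ${#names[@]} -lt "$parallel" ]; do
            entry=${pending[0]}
            pending=("${pending[@]:1}")
            names+=("${entry%%:*}")
            ends+=($(( now + ${entry##*:} )))
            echo "Starting shutdown on guest: ${entry%%:*}"
        done

        # A single deadline covers the whole batch, not each guest.
        if [ "$now" -ge "$timeout" ]; then
            echo "Timeout expired while shutting down domains"
            return 1
        fi

        now=$(( now + 1 ))   # advance the simulated clock by one second

        # Report and drop guests that have finished by now.
        keep_names=()
        keep_ends=()
        for i in "${!names[@]}"; do
            if [ "${ends[$i]}" -le "$now" ]; then
                echo "Shutdown of guest ${names[$i]} complete."
            else
                keep_names+=("${names[$i]}")
                keep_ends+=("${ends[$i]}")
            fi
        done
        names=("${keep_names[@]}")
        ends=("${keep_ends[@]}")
    done
}
```

Calling it with a limit of 3, a timeout of 1, and four guests that each need longer than one second starts only the first three guests before the timeout message appears, which is the pattern reported in scenario 2. With a timeout of 300, the fourth guest starts as soon as one of the first three finishes.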

Comment 19 dyuan 2012-03-09 02:12:44 UTC
Thanks Peter, moving to VERIFIED according to comment 17 and comment 18.

Comment 20 Peter Krempa 2012-05-02 12:31:15 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: 
The libvirt-guests script executed operations on guests serially.

Consequence: 
On machines with many guests, shutdown took a long time because each guest waited for the others to shut down first. The procedure was also inefficient because the guests did not use all available host resources.

Fix:
The libvirt-guests init script was changed to allow parallel operations on domains, which shortens the host's shutdown time.

Result:
Guests start and shut down in parallel and use the host system's resources more efficiently. The host's shutdown time decreases in most cases.

Comment 22 errata-xmlrpc 2012-06-20 06:24:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0748.html