Bug 1270594 - VM migration supportability based on enabled qemu-kvm features
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Amit Shah
QA Contact: Virtualization Bugs
Depends On:
Blocks:
Reported: 2015-10-11 11:01 EDT by Moran Goldboim
Modified: 2015-12-08 07:05 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-08 07:05:40 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Moran Goldboim 2015-10-11 11:01:23 EDT
Description of problem:
From the RHEVM (management) perspective, and in order to better define our SLA policies, we would like to know the support matrix and the conditions under which live migration works between hosts when certain features are enabled, such as SR-IOV, NUMA, CPU pinning, and TSC.

Thanks. 

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Comment 2 Amit Shah 2015-10-13 01:15:25 EDT
(In reply to Moran Goldboim from comment #0)
> Description of problem:
> from RHEVM (management) perspective, in order to define better our SLA
> policies

Can you please be more specific as to what you require?

There are several cases here:
* Does migration start at all? (e.g. inhibited due to migration blockers, as you mentioned)

* How much chance does a migration, once started, have at converging? (This info can be obtained via the live stats available upon querying the MigrationInfo that shows the rate of change, and the amount of RAM still left to be migrated).
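
As a rough illustration of the second case, the sketch below estimates whether a running migration is likely to converge, using the RAM statistics from a QMP `query-migrate` reply. It assumes the `ram` section carries `remaining` (bytes), `dirty-pages-rate` (pages per second) and `mbps` (megabits per second), and that guest pages are 4 KiB. The function name and the sample numbers are made up for illustration.

```python
from typing import Optional

PAGE_SIZE = 4096  # assumption: 4 KiB guest pages


def estimate_convergence(ram: dict, page_size: int = PAGE_SIZE) -> Optional[float]:
    """Estimate seconds until the remaining RAM is transferred.

    Returns None when the guest dirties memory at least as fast as it
    is being sent, i.e. the migration is not converging at this rate.
    """
    remaining = ram["remaining"]                            # bytes left
    dirty_bps = ram.get("dirty-pages-rate", 0) * page_size  # bytes/s dirtied
    send_bps = ram.get("mbps", 0.0) * 1_000_000 / 8         # bytes/s sent
    net_bps = send_bps - dirty_bps
    if net_bps <= 0:
        return None
    return remaining / net_bps


# Hypothetical sample: 1 GiB left, 1000 pages/s dirtied, 800 Mbit/s throughput.
sample = {"remaining": 1 << 30, "dirty-pages-rate": 1000, "mbps": 800.0}
eta = estimate_convergence(sample)
print("not converging" if eta is None else f"about {eta:.0f}s left")
```

This is only a linear extrapolation from a single sample; the dirty rate of a real guest varies over time, so a management layer would probably want to average several samples.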

> from RHEVM (management) perspective, in order to define better our SLA
> policies we would like to know the support matrix and conditions to have a
> live migration working between hosts when certain features are enabled, like
> SR-IOV, NUMA, CPU Pinning, TSC etc.

A matrix that we provide now may get outdated when new features are added, so in my opinion it's better to query the current information from QEMU to get accurate information.

I quickly hacked up something, is this acceptable:

$ ./x86_64-softmmu/qemu-system-x86_64 -monitor stdio /var/tmp/t.qcow 
QEMU 2.4.50 monitor - type 'help' for more information
(qemu) info migrate
# active blockers: 1
(qemu) migrate tcp:localhost:8080
migrate: The qcow format used by node 'ide0-hd0' does not support live migration
(qemu) info migrate
# active blockers: 1
(qemu) 

That's for using the non-migratable qcow (not qcow2) format.

Another example:

$ ./x86_64-softmmu/qemu-system-x86_64 -monitor stdio -cpu host,migratable=no,+invtsc -enable-kvm
QEMU 2.4.50 monitor - type 'help' for more information
(qemu) info migrate
# active blockers: 1
(qemu) migrate tcp:localhost:8080
migrate: State blocked by non-migratable device 'cpu'
(qemu) quit

Using both invtsc and qcow bumps the count of active blockers up to 2:

$ ./x86_64-softmmu/qemu-system-x86_64 -monitor stdio -cpu host,migratable=no,+invtsc -enable-kvm /var/tmp/t.qcow 
QEMU 2.4.50 monitor - type 'help' for more information
(qemu) info migrate
# active blockers: 2
(qemu) 


This just shows that migration from the current host will be blocked, because there's one or more blockers.  Is this information sufficient?
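
For what it's worth, a management layer could consume the prototype output above with something like the following sketch. Note that the `# active blockers: N` line exists only in this quick hack and is not part of any released QEMU; the function names are hypothetical.

```python
import re
from typing import Optional

# Matches the prototype-only line, e.g. "# active blockers: 2".
_BLOCKERS_RE = re.compile(r"^#\s*active blockers:\s*(\d+)\s*$", re.MULTILINE)


def active_blockers(monitor_output: str) -> Optional[int]:
    """Return the blocker count found in 'info migrate' output, or None."""
    match = _BLOCKERS_RE.search(monitor_output)
    return int(match.group(1)) if match else None


def is_migratable(monitor_output: str) -> Optional[bool]:
    """True if no blockers are reported, None if the line is absent."""
    count = active_blockers(monitor_output)
    return None if count is None else count == 0


transcript = "(qemu) info migrate\n# active blockers: 2\n(qemu) "
print(active_blockers(transcript), is_migratable(transcript))
```

Returning None when the line is missing keeps "unknown" distinct from "migratable", which matters if the check runs against a QEMU build without the prototype patch.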

Is there anything else apart from migration blockers that you would like to be exposed?
Comment 3 Moran Goldboim 2015-10-19 02:28:19 EDT
(In reply to Amit Shah from comment #2)
> (In reply to Moran Goldboim from comment #0)
> > Description of problem:
> > from RHEVM (management) perspective, in order to define better our SLA
> > policies
> 
> Can you please be more specific as to what you require?
> 
> There are several cases here:
> * Does migration start at all? (e.g. inhibited due to migration blockers, as
> you mentioned)

this is what we're looking for.

> 
> * How much chance does a migration, once started, have at converging? (This
> info can be obtained via the live stats available upon querying the
> MigrationInfo that shows the rate of change, and the amount of RAM still
> left to be migrated).
> 
> > from RHEVM (management) perspective, in order to define better our SLA
> > policies we would like to know the support matrix and conditions to have a
> > live migration working between hosts when certain features are enabled, like
> > SR-IOV, NUMA, CPU Pinning, TSC etc.
> 
> A matrix that we provide now may get outdated when new features are added,
> so in my opinion it's better to query the current information from QEMU to
> get accurate information.
> 
> I quickly hacked up something, is this acceptable:
> 
> $ ./x86_64-softmmu/qemu-system-x86_64 -monitor stdio /var/tmp/t.qcow 
> QEMU 2.4.50 monitor - type 'help' for more information
> (qemu) info migrate
> # active blockers: 1
> (qemu) migrate tcp:localhost:8080
> migrate: The qcow format used by node 'ide0-hd0' does not support live
> migration
> (qemu) info migrate
> # active blockers: 1
> (qemu) 
> 
> That's for using the non-migratable qcow (not qcow2) format.
> 
> Another example:
> 
> $ ./x86_64-softmmu/qemu-system-x86_64 -monitor stdio -cpu
> host,migratable=no,+invtsc -enable-kvm
> QEMU 2.4.50 monitor - type 'help' for more information
> (qemu) info migrate
> # active blockers: 1
> (qemu) migrate tcp:localhost:8080
> migrate: State blocked by non-migratable device 'cpu'
> (qemu) quit
> 
> And using both, invtsc and qcow bump up the count of active_blockers:
> 
> $ ./x86_64-softmmu/qemu-system-x86_64 -monitor stdio -cpu
> host,migratable=no,+invtsc -enable-kvm /var/tmp/t.qcow 
> QEMU 2.4.50 monitor - type 'help' for more information
> (qemu) info migrate
> # active blockers: 2
> (qemu) 
> 
> 
> This just shows that migration from the current host will be blocked,
> because there's one or more blockers.  Is this information sufficient?
> 
> Is there anything else apart from migration blockers that you would like to
> be exposed?

My question here is whether we always block the migration, or whether we can allow it under certain conditions, for example when the destination host has SR-IOV hardware configured. So I would consider 3 options here:
- allow: no migration limitations
- block: block live migration
- conditional: allow live migration if specific conditions are met on the destination host.
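
The three options above could be modelled on the management side roughly as follows. This is only a sketch: the policy table is a placeholder and not an actual support matrix, the feature names are illustrative strings, and treating unknown features as "allow" is an assumption.

```python
from enum import Enum
from typing import Dict, Iterable, Set


class MigrationPolicy(Enum):
    ALLOW = "allow"              # no migration limitations
    BLOCK = "block"              # live migration is never allowed
    CONDITIONAL = "conditional"  # allowed only if the destination qualifies


# Placeholder table, NOT a real support matrix.
FEATURE_POLICY: Dict[str, MigrationPolicy] = {
    "sr-iov": MigrationPolicy.CONDITIONAL,
    "invtsc": MigrationPolicy.BLOCK,
}


def can_migrate(vm_features: Iterable[str], dest_capabilities: Set[str],
                policy: Dict[str, MigrationPolicy] = FEATURE_POLICY) -> bool:
    """Decide whether a VM may be live-migrated to a destination host."""
    for feature in vm_features:
        # Assumption: features missing from the table are unrestricted.
        rule = policy.get(feature, MigrationPolicy.ALLOW)
        if rule is MigrationPolicy.BLOCK:
            return False
        if rule is MigrationPolicy.CONDITIONAL and feature not in dest_capabilities:
            return False
    return True


print(can_migrate({"sr-iov"}, dest_capabilities={"sr-iov"}))
print(can_migrate({"sr-iov"}, dest_capabilities=set()))
```

Here "conditional" is reduced to "the destination advertises the same capability"; real conditions would likely be more detailed than a single set membership test.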

I think your way of reporting it from the qemu-kvm level is great. Adding a few Eng folks to review. Michal, Doron, can you please review?
Comment 4 Doron Fediuck 2015-10-19 03:44:34 EDT
(In reply to Moran Goldboim from comment #3)
> (In reply to Amit Shah from comment #2)
> > (In reply to Moran Goldboim from comment #0)
> > > Description of problem:
> > > from RHEVM (management) perspective, in order to define better our SLA
> > > policies
> > 


> 
> i think your way of reporting it from qemu-kvm level is great. Adding few
> Eng folks to review. Michal, Doron, can you please review.

At the management level we should avoid getting into a flow which will
block migration at the qemu level, i.e. a user-friendly system should disallow
the action or raise an error before hitting this error at the lower level.

What we probably need is for the migration-blocking information to be imported
or queried ahead of time, thereby preventing the relevant flow. Is
there a way to get a qemu report on the features that block migration?
Comment 5 Amit Shah 2015-10-28 10:38:00 EDT
(In reply to Doron Fediuck from comment #4)
> At the management level we should avoid getting into a flow which will
> block migration at the qemu level, i.e. a user-friendly system should disallow
> the action or raise an error before hitting this error at the lower level.

There are two ways to go about it:

1) When the user creates a VM with a given config, create such a VM and query the 'info migrate' status via QMP, as shown in comment 2.  If the 'active blockers' field is > 0, the VM will not be migratable.

2) Query QEMU via the command line (or QMP) on a host for things that block migration, e.g.:

   qemu-kvm-rhev --list-migration-blockers

and this will provide a list of devices that block migration.  Conceptually, this is a better approach, because it's per-host, and you have to do it only once (and store the state somewhere).  However, it's not as easy to implement in QEMU, as not all blockers are devices (e.g. the qcow file format, as shown in comment 2, is also a migration blocker, and that's not a device that can be queried).

Item (1) has the advantage that you don't have to store the blocker info anywhere, and it can be queried dynamically.  Item (2) has the advantage that all the info is available in one place.
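
To make approach (2) more concrete, here is a minimal sketch of how management could store a per-host blocker list once and check a VM config against it before starting the VM. Everything here is hypothetical: QEMU has no `--list-migration-blockers` option today, so the blocker names are just illustrative strings, and the class and method names are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, List


@dataclass
class BlockerCache:
    """Per-host store of migration blockers, filled once per host."""
    _by_host: Dict[str, FrozenSet[str]] = field(default_factory=dict)

    def store(self, host: str, blockers: Iterable[str]) -> None:
        # In approach (2) this list would come from a one-time host query.
        self._by_host[host] = frozenset(blockers)

    def offending(self, host: str, vm_items: Iterable[str]) -> List[str]:
        """Return the VM config items that would block migration."""
        known = self._by_host[host]  # KeyError if the host was never queried
        return sorted(set(vm_items) & known)


cache = BlockerCache()
cache.store("host1", ["qcow", "invtsc"])
print(cache.offending("host1", ["qcow2", "invtsc", "virtio-net"]))
```

The stored list can go stale when QEMU is upgraded on the host, which is the same concern as with documentation, so the cache would need refreshing on package updates.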

If we do it outside of code (e.g. via documentation), there's a fear that the running qemu version and the doc don't match, and some information might be stale.

> What we probably need is for the migration-blocking information to be imported
> or queried ahead of time, thereby preventing the relevant flow. Is
> there a way to get a qemu report on the features that block migration?

In this spirit, we can have a --always-migratable cmdline switch that will not allow a new non-migratable configuration to be introduced in case a migration blocker is found in a new device (on cmdline or hotplug).  This will ensure a VM is always migratable.

We can implement one or a combination of these ideas, or even explore more.  Let me know if any of these are workable for you.
