Bug 1301986

Summary: Block based migration fails with unable to execute QEMU command 'migrate': this feature or command is not currently supported
Product: Red Hat OpenStack
Reporter: Marius Cornea <mcornea>
Component: rhosp-director
Assignee: Hugh Brock <hbrock>
Status: CLOSED CURRENTRELEASE
QA Contact: Shai Revivo <srevivo>
Severity: high
Docs Contact:
Priority: unspecified
Version: 7.0 (Kilo)
CC: athomas, berrange, emacchi, fj-lsoft-ofuku, jcoufal, kchamart, markmc, mburns, mcornea, rhel-osp-director-maint, sasha, sgordon, yafu
Target Milestone: ---
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-10-10 04:16:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: 
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1274548, 1301831
Attachments:
Description Flags
libvirtd.log none

Description Marius Cornea 2016-01-26 13:37:27 UTC
Description of problem:
Block based migration fails with unable to execute QEMU command 'migrate': this feature or command is not currently supported.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-0.8.6-112.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy overcloud with 3 ctrls and 2 computes
2. Launch instance
3. Block migrate instance
nova live-migration --block-migrate vm02 overcloud-compute-1.localdomain

Actual results:
Fails with:
2016-01-26 08:08:55.543 16887 ERROR nova.virt.libvirt.driver [req-7e9871a3-46df-4559-a3c9-18b7bdd49223 d3075e2e623e4be6805efb115ff74c0d 033fd3cc8a7b4804becfefac24eccbfa - - -] [instance: 3b3211a8-af89-4615-9ccd-76e4eba79c11] Live Migration failure: internal error: unable to execute QEMU command 'migrate': this feature or command is not currently supported

Expected results:
Migration succeeds.

Additional info:
In nova.conf there is:
#block_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER, VIR_MIGRATE_LIVE, VIR_MIGRATE_TUNNELLED, VIR_MIGRATE_NON_SHARED_INC

I had to remove VIR_MIGRATE_TUNNELLED and set it as following to make it work:
block_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER, VIR_MIGRATE_LIVE, VIR_MIGRATE_NON_SHARED_INC
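To make the workaround concrete, here is a minimal sketch (not Nova's actual code) of how a comma-separated `block_migration_flag` string is combined into a libvirt flags bitmask, and how dropping VIR_MIGRATE_TUNNELLED changes the result. The numeric values mirror libvirt's virDomainMigrateFlags constants, but double-check them against your libvirt-python build.

```python
# Sketch: parse a nova.conf-style migration flag string into a bitmask.
# Flag values follow libvirt's virDomainMigrateFlags enum (assumed here).
LIBVIRT_FLAGS = {
    "VIR_MIGRATE_LIVE": 1,
    "VIR_MIGRATE_PEER2PEER": 2,
    "VIR_MIGRATE_TUNNELLED": 4,
    "VIR_MIGRATE_UNDEFINE_SOURCE": 16,
    "VIR_MIGRATE_NON_SHARED_INC": 128,
}

def parse_migration_flags(value):
    """OR together the named flags from a comma-separated string."""
    flags = 0
    for name in (part.strip() for part in value.split(",")):
        flags |= LIBVIRT_FLAGS[name]
    return flags

default = ("VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER, "
           "VIR_MIGRATE_LIVE, VIR_MIGRATE_TUNNELLED, "
           "VIR_MIGRATE_NON_SHARED_INC")
workaround = ("VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER, "
              "VIR_MIGRATE_LIVE, VIR_MIGRATE_NON_SHARED_INC")

# The workaround mask differs from the default only by the
# VIR_MIGRATE_TUNNELLED bit.
print(parse_migration_flags(default))      # 151
print(parse_migration_flags(workaround))   # 147
```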

Comment 1 Mark McLoughlin 2016-01-26 16:20:59 UTC
Not clear what's going on here - I think you'll need a debug-level libvirtd.log

I'd be interested in seeing e.g.

  '{"execute":"migrate","arguments":{"detach":true,"blk":true,"inc":false

this is the qemu migrate command ... need to see what migrate command is being issued and why it's failing

There could well be a limitation specific to qemu-kvm-rhev that you must use tunnelled mode for block migration. It does work with stock RHEL7 qemu-kvm AIUI, however.

See also https://bugs.launchpad.net/nova/+bug/1441054

The question here for OSP director is whether VIR_MIGRATE_TUNNELLED is the right default, and whether we have the necessary documentation to configure block migration.

Comment 3 Marius Cornea 2016-01-26 17:57:47 UTC
This is what I got from the libvirtd log:

QEMU_MONITOR_SEND_MSG: mon=0x7efbc0001020 msg={"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":true,"uri":"fd:migrate"},"id":"libvirt-30"}
QEMU_MONITOR_IO_WRITE: mon=0x7efbc0001020 buf={"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":true,"uri":"fd:migrate"},"id":"libvirt-30"}
internal error: unable to execute QEMU command 'migrate': this feature or command is not currently supported
QEMU_MONITOR_SEND_MSG: mon=0x7efbc0001020 msg={"execute":"closefd","arguments":{"fdname":"migrate"},"id":"libvirt-31"}
QEMU_MONITOR_IO_WRITE: mon=0x7efbc0001020 buf={"execute":"closefd","arguments":{"fdname":"migrate"},"id":"libvirt-31"}
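For reference, the failing QMP message above can be reproduced with a small illustrative helper (not libvirt code; the function name and defaults are assumptions). The key detail is "blk": false together with "inc": true, which requests the legacy incremental disk copy that this qemu-kvm-rhev build rejects.

```python
import json

def qmp_migrate_cmd(uri, blk=False, inc=True, seq=30):
    """Build the JSON text of a QMP 'migrate' command, as libvirt
    writes it to the QEMU monitor socket (illustrative sketch)."""
    return json.dumps({
        "execute": "migrate",
        "arguments": {"detach": True, "blk": blk, "inc": inc, "uri": uri},
        "id": "libvirt-%d" % seq,
    }, separators=(",", ":"))

# Matches the QEMU_MONITOR_SEND_MSG line in the log excerpt.
msg = qmp_migrate_cmd("fd:migrate")
print(msg)
```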

Installed packages:
qemu-kvm-rhev-2.3.0-31.el7_2.4.x86_64
qemu-kvm-common-rhev-2.3.0-31.el7_2.4.x86_64

Please let me know if I may provide other information to get this sorted out. Thanks.

Comment 4 Kashyap Chamarthy 2016-01-26 20:03:59 UTC
Marius,

Please also note the exact libvirt version you've used when you
experienced the bug.

---

Some context (also probably why Mark says "Not clear what is going on
here" above in Comment #1):

In the above Launchpad bug, from comments 3 (Mark), 4 (Kashyap), and 6
(from the reporter himself):

    https://bugs.launchpad.net/nova/+bug/1441054/comments/3
    https://bugs.launchpad.net/nova/+bug/1441054/comments/4

We've observed that, in the time between when the above upstream bug was
filed (2015-04-07; not an issue anymore, see below) and now, some version
of libvirt (which one _exactly_, we're unsure) gained the ability to
gracefully fall back to an older approach of handling live block
migration even when the VIR_MIGRATE_TUNNELLED flag is set (the QMP
command "inc":true way that Mark also mentioned; the second, newer
approach is this[*]).

From Mark's testing, he was using the libvirt version:

    libvirt-1.2.17-13.el7_2.2.x86_64

From my testing, the versions are:

    libvirt-1.2.13.1-3.fc22.x86_64
    qemu-system-x86-2.3.1-7.fc22.x86_64

We now know that the above versions of libvirt will gracefully handle
live block migration when the flag VIR_MIGRATE_TUNNELLED is enabled.

So, if you're using libvirt-1.2.17-13.el7_2.2 (or above), you should see
the live block migration succeed.  I.e. if you're on the said libvirt
version, you should see something like this (message from my testing
on Fedora 22):

-----------
2016-01-07 12:02:26.886+0000: 13202: warning : qemuMigrationBeginPhase:2654 : NBD in tunnelled migration is currently not supported
-----------

Then, it tells you that it's falling back to the older implementation:

-----------
2016-01-07 12:02:27.212+0000: 13202: debug : qemuMigrationDriveMirror:1727 : Destination doesn't support NBD server Falling back to previous implementation.
[...]
    2016-01-07 12:02:27.226+0000: 13202: debug : qemuMonitorJSONCommandWithFd:290 : Send command '{"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":true,"uri":"fd:migrate"},"id":"libvirt-18"}' for write with FD -1
-----------

So, please re-test with libvirt-1.2.17-13.el7_2.2 or higher.  Live block
migration with TUNNELLED should then succeed.
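One way to confirm the fallback from the log is to look for the pair of messages quoted above. The helper below is an illustrative sketch (not part of libvirt or any tool); the marker strings are taken verbatim from the log excerpts in this comment.

```python
# Scan libvirtd debug-log lines for evidence that libvirt tried the
# NBD path under TUNNELLED and then fell back to the legacy approach.
FALLBACK_MARKERS = (
    "NBD in tunnelled migration is currently not supported",
    "Falling back to previous implementation",
)

def saw_nbd_fallback(log_lines):
    """Return True only if both fallback messages appear in the log."""
    hits = {marker: False for marker in FALLBACK_MARKERS}
    for line in log_lines:
        for marker in FALLBACK_MARKERS:
            if marker in line:
                hits[marker] = True
    return all(hits.values())
```

A libvirtd.log that shows both markers indicates the graceful fallback worked; a log that shows neither (as in the attached one) points at the hard failure described in this bug.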


[*] http://wiki.libvirt.org/page/NBD_storage_migration

PS: Good news is Mark has recently done some upstream work in Nova to
make this all a little clearer.

Comment 5 Marius Cornea 2016-01-27 18:26:30 UTC
Created attachment 1118851 [details]
libvirtd.log

Thank you for the clarifications, but the libvirt version on the computes is libvirt-1.2.17-13.el7_2.2.x86_64. I'm attaching the libvirtd log. One thing I should mention about my environment is that it's running IPv6 addressing. If there's anything else I can do to debug, please let me know.

Comment 6 Kashyap Chamarthy 2016-01-28 16:04:25 UTC
[After a conversation with libvirt/QEMU/Nova devs on IRC today]

Tl;dr: The error message you see ("QEMU command 'migrate': this feature
       or command is not currently supported") is expected when you
       attempt to perform live block migration (i.e. without any need
       for shared storage) with either qemu-kvm or qemu-kvm-rhev.

       As you've found, you have to not set the VIR_MIGRATE_TUNNELLED
       flag in 'block_migration_flags' in Nova, until a better solution
       (native encryption support in QEMU) is in place.

Delving into some level of detail that (hopefully) clarifies _why_ this
is happening:

  - The old approach (QEMU 'migrate -b'/"inc:" true) to perform live
    block migration without shared storage is considered legacy, and has
    known limitations -- such as: all data, storage & memory, are sent
    over a single TCP connection.  This old way _does_ allow you to use
    libvirt's TUNNELLED (which takes advantage of encryption
    capabilities inbuilt in libvirt's RPC protocol) migration transport.
    (But note: the QEMU code involved in this scenario is legacy &
    unreliable, and not supported in RHEL-7; only in RHEL-6.)

    Thus, it (the old approach) is disabled in QEMU code (since
    Nov 2013); consequently it's supported neither in the base RHEL
    version (qemu-kvm) nor in RHEV (qemu-kvm-rhev).

    This is the cause of the said error ("this feature or command is not
    currently supported").

  - The new approach (referred to as 'NBD (Network Block Device) based
    migration') is much more reliable and more efficient, because it
    uses separate TCP connections for storage & memory.  However,
    there's no support for NBD in libvirt's TUNNELLED data transport,
    also for good reasons, because of limitations like extra data copies
    involved, etc.

    To alleviate this, as I write this, there's work in-progress in
    upstream QEMU, to add native encryption support for NBD and
    migration in QEMU itself, thus securing live block migration, and
    avoid the need to use TUNNELLED transport altogether.  This is
    likely to land in QEMU 2.6 (the next version of QEMU).
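The contrast between the two approaches can be summarized as data, purely for illustration. The QMP command names (migrate, nbd-server-start, drive-mirror) are real, but the argument details are deliberately omitted and the exact sequence libvirt issues is simplified; this is not a faithful trace.

```python
# Sketch of the two block-migration strategies discussed above.
# Legacy: one command, one TCP connection for both RAM and disk data.
LEGACY = [
    {"execute": "migrate",
     "arguments": {"blk": False, "inc": True, "uri": "fd:migrate"}},
]

# NBD-based: the destination exports its disks over a separate NBD
# channel, the source mirrors each disk to it, and only RAM goes
# through the migrate command itself. (Arguments omitted here.)
NBD_BASED = [
    {"execute": "nbd-server-start", "arguments": {}},
    {"execute": "drive-mirror", "arguments": {}},
    {"execute": "migrate",
     "arguments": {"blk": False, "inc": False, "uri": "fd:migrate"}},
]

for step in NBD_BASED:
    print(step["execute"])
```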

-----

[Additional information.]

(*) There's a QEMU bug to improve the error message:

    https://bugzilla.redhat.com/show_bug.cgi?id=1203214 --
    Improve not supported messages for block migration

(*) A clarification of my comment #4 above: In the Launchpad bug,
    Mark didn't mention the QEMU version he was using there;
    however, on IRC today, he clarified he was using the 'qemu-kvm' RPM
    (not 'qemu-kvm-rhev').  Which, briefly, made us suspect that maybe
    the legacy approach is only disabled in 'qemu-kvm-rhev' but not in
    'qemu-kvm' -- but that turned out to be false: the legacy approach
    is disabled in all cases on RHEL/RHEV.

Comment 7 Marius Cornea 2016-01-28 20:50:05 UTC
Thanks for the details. What would be the right approach to get it working downstream at this point? Is there any chance we can get an updated version of qemu-kvm provided in downstream or should we get the installer remove the VIR_MIGRATE_TUNNELLED flag which comes as a default?

Comment 8 Kashyap Chamarthy 2016-01-28 23:18:40 UTC
(In reply to Marius Cornea from comment #7)

[...]

> What would be the right approach to get it working downstream at this
> point? Is there any chance we can get an updated version of qemu-kvm
> provided in downstream or should we get the installer remove  the
> VIR_MIGRATE_TUNNELLED flag which comes as a default?

NB: Just a new qemu-kvm-rhev would not suffice (and it won't be 
available immediately either) -- support for the new native QEMU
encryption needs to be wired into libvirt, and Nova.

For now, installer not using the flag VIR_MIGRATE_TUNNELLED seems like a 
reasonable compromise.

Comment 9 Stephen Gordon 2016-01-30 12:41:26 UTC
*** Bug 1301831 has been marked as a duplicate of this bug. ***

Comment 10 Stephen Gordon 2016-01-30 12:42:58 UTC
(In reply to Kashyap Chamarthy from comment #8)
> For now, installer not using the flag VIR_MIGRATE_TUNNELLED seems like a 
> reasonable compromise.

My only question is do we want to do that in *all* deployments or only those w/o shared storage?

Comment 12 Daniel Berrangé 2016-02-02 16:19:08 UTC
(In reply to Stephen Gordon from comment #10)
> (In reply to Kashyap Chamarthy from comment #8)
> > For now, installer not using the flag VIR_MIGRATE_TUNNELLED seems like a 
> > reasonable compromise.
> 
> My only question is do we want to do that in *all* deployments or only those
> w/o shared storage?

TUNNELLED mode provides a useful security enhancement for people who do not ever need block migration. That would be a reason to keep it when doing shared storage deployments.  The flip side is that if we have some deployments with TUNNELLED and some without we have 2 separate communication architectures to test & maintain. That could push us towards never using TUNNELLED at all, and just wait for future QEMU enhancements in 7.4 to provide strong security.

Comment 13 Kashyap Chamarthy 2016-02-23 08:43:22 UTC
Steve: I hope Dan's response in comment#12 answers your question.

Comment 14 Emilien Macchi 2016-03-01 13:29:01 UTC
I sent a patch in TripleO: https://review.openstack.org/286584

Please review it so we can fix it in OSP 8 and eventually backport it to OSP 7.

Comment 15 Emilien Macchi 2016-03-01 13:32:44 UTC
Also, should we configure live_migration_flag, which also includes VIR_MIGRATE_TUNNELLED by default?

Comment 16 Kashyap Chamarthy 2016-03-01 14:10:14 UTC
Responded to comment #14, and comment #15 here: https://review.openstack.org/#/c/286584/2

Comment 17 Emilien Macchi 2016-03-01 15:16:08 UTC
thanks a lot for your help!

Comment 19 Mike Burns 2016-04-07 21:07:13 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 21 Jaromir Coufal 2016-10-10 04:16:03 UTC
Seems to be merged in previous releases.