Bug 1301986
| Field | Value |
| --- | --- |
| Summary | Block based migration fails with "unable to execute QEMU command 'migrate': this feature or command is not currently supported" |
| Product | Red Hat OpenStack |
| Component | rhosp-director |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | unspecified |
| Version | 7.0 (Kilo) |
| Target Release | 10.0 (Newton) |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | Marius Cornea <mcornea> |
| Assignee | Hugh Brock <hbrock> |
| QA Contact | Shai Revivo <srevivo> |
| CC | athomas, berrange, emacchi, fj-lsoft-ofuku, jcoufal, kchamart, markmc, mburns, mcornea, rhel-osp-director-maint, sasha, sgordon, yafu |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2016-10-10 04:16:03 UTC |
| Bug Blocks | 1274548, 1301831 |
| Attachments | libvirtd.log |
Description

Marius Cornea
2016-01-26 13:37:27 UTC

---

Not clear what's going on here; I think you'll need a debug-level libvirtd.log. I'd be interested in seeing e.g. '{"execute":"migrate","arguments":{"detach":true,"blk":true,"inc":false ... this is the QEMU migrate command. We need to see what migrate command is being issued and why it's failing.

---

There could well be a limitation specific to qemu-kvm-rhev that you must use tunnelled mode for block migration. It does work with stock RHEL 7 qemu-kvm AIUI, however. See also https://bugs.launchpad.net/nova/+bug/1441054

The question here for OSP director is whether VIR_MIGRATE_TUNNELLED is the right default, and whether we have the necessary documentation to configure block migration.

---

This is what I got from the libvirtd log:

    QEMU_MONITOR_SEND_MSG: mon=0x7efbc0001020 msg={"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":true,"uri":"fd:migrate"},"id":"libvirt-30"}
    QEMU_MONITOR_IO_WRITE: mon=0x7efbc0001020 buf={"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":true,"uri":"fd:migrate"},"id":"libvirt-30"}
    internal error: unable to execute QEMU command 'migrate': this feature or command is not currently supported
    QEMU_MONITOR_SEND_MSG: mon=0x7efbc0001020 msg={"execute":"closefd","arguments":{"fdname":"migrate"},"id":"libvirt-31"}
    QEMU_MONITOR_IO_WRITE: mon=0x7efbc0001020 buf={"execute":"closefd","arguments":{"fdname":"migrate"},"id":"libvirt-31"}

Installed packages:

    qemu-kvm-rhev-2.3.0-31.el7_2.4.x86_64
    qemu-kvm-common-rhev-2.3.0-31.el7_2.4.x86_64

Please let me know if I may provide other information to get this sorted out. Thanks.

---

Marius, please also note the exact libvirt version you were using when you experienced the bug.
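The debug-level libvirtd.log requested above can be captured with settings along these lines in /etc/libvirt/libvirtd.conf, followed by a libvirtd restart. This is a sketch; the exact filter list is an assumption, see libvirt's logging documentation for the full syntax:

```
# /etc/libvirt/libvirtd.conf -- illustrative debug-logging settings.
# "1:" means debug level for the named source; adjust filters as needed.
log_filters="1:qemu 1:libvirt 1:migration"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
```

With these in place, the QEMU_MONITOR_SEND_MSG / QEMU_MONITOR_IO_WRITE lines quoted in this bug show up in the log file.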
---

Some context (also probably why Mark says "Not clear what is going on here" above in Comment #1). In the above Launchpad bug, see comments 3 (Mark), 4 (Kashyap), and 6 (from the reporter himself):

https://bugs.launchpad.net/nova/+bug/1441054/comments/3
https://bugs.launchpad.net/nova/+bug/1441054/comments/4

We've observed that in the time between when the above upstream bug was filed (2015-04-07) and now, some version of libvirt (which one _exactly_, we're unsure) gained the ability to gracefully fall back to an older approach (the QMP command "inc":true way that Mark also mentioned; the second, newer approach is this[*]) of handling live block migration even when the VIR_MIGRATE_TUNNELLED flag is set.

From Mark's testing, he was using this libvirt version:

    libvirt-1.2.17-13.el7_2.2.x86_64

From my testing, the versions are:

    libvirt-1.2.13.1-3.fc22.x86_64
    qemu-system-x86-2.3.1-7.fc22.x86_64

We now know that the above versions of libvirt will gracefully handle live block migration when the VIR_MIGRATE_TUNNELLED flag is enabled. So, if you're using libvirt-1.2.17-13.el7_2.2 (or above), you should see the live block migration succeed. I.e., with the said libvirt version you should see something like this (the message is from my testing on Fedora 22):

    2016-01-07 12:02:26.886+0000: 13202: warning : qemuMigrationBeginPhase:2654 : NBD in tunnelled migration is currently not supported

Then it tells you that it's falling back to the older implementation:

    2016-01-07 12:02:27.212+0000: 13202: debug : qemuMigrationDriveMirror:1727 : Destination doesn't support NBD server Falling back to previous implementation.
    [...]
    2016-01-07 12:02:27.226+0000: 13202: debug : qemuMonitorJSONCommandWithFd:290 : Send command '{"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":true,"uri":"fd:migrate"},"id":"libvirt-18"}' for write with FD -1

So please re-test with at least libvirt-1.2.17-13.el7_2.2 (or higher). Live block migration with TUNNELLED should then succeed.

[*] http://wiki.libvirt.org/page/NBD_storage_migration

PS: The good news is that Mark has recently done some upstream work in Nova to make all this a little clearer.

---

Created attachment 1118851 [details]
libvirtd.log
Thank you for the clarifications, but the libvirt version on the computes is libvirt-1.2.17-13.el7_2.2.x86_64. I'm attaching the libvirt log. One thing I should mention about my environment is that it's using IPv6 addressing. If there's anything else I can do to help debug, please let me know.
---

[After a conversation with libvirt/QEMU/Nova devs on IRC today]

Tl;dr: The error message you see ("QEMU command 'migrate': this feature or command is not currently supported") is expected when you attempt live block migration, without any need for shared storage, with either qemu-kvm or qemu-kvm-rhev. As you've found, you have to leave the VIR_MIGRATE_TUNNELLED flag out of 'block_migration_flags' in Nova until a better solution (native encryption support in QEMU) is in place.

Delving into some detail that (hopefully) clarifies _why_ this is happening:

- The old approach (QEMU 'migrate -b' / "inc": true) to performing live block migration without shared storage is considered legacy and has known limitations, such as all data, storage and memory, being sent over a single TCP connection. This old way _does_ allow you to use libvirt's TUNNELLED migration transport (which takes advantage of the encryption capabilities built into libvirt's RPC protocol). But note: the QEMU code involved in this scenario is legacy and unreliable, not supported in RHEL 7, only in RHEL 6. Thus the old approach is disabled in QEMU code (since Nov 2013); consequently it is supported neither in the base RHEL version (qemu-kvm) nor in RHEV (qemu-kvm-rhev). This is the cause of the said error ("this feature or command is not currently supported").

- The new approach (referred to as 'NBD (Network Block Device) based migration') is much more reliable and more efficient, because it uses separate TCP connections for storage and memory. However, there's no support for NBD in libvirt's TUNNELLED data transport, also for good reasons, because of limitations like the extra data copies involved, etc. To alleviate this, as I write this there is work in progress in upstream QEMU to add native encryption support for NBD and migration in QEMU itself, thus securing live block migration and avoiding the need for the TUNNELLED transport altogether. This is likely to land in QEMU 2.6 (the next version of QEMU).
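Concretely, the workaround described above is a change to the [libvirt] section of nova.conf on the compute nodes. This is a sketch, not the exact TripleO-generated config: the option is spelled block_migration_flag (singular) in Kilo/Liberty and block_migration_flags in later releases, and the flag list shown here is illustrative of the historical default minus TUNNELLED:

```ini
# /etc/nova/nova.conf (compute node) -- illustrative sketch.
# VIR_MIGRATE_TUNNELLED is deliberately omitted so that block migration
# takes the NBD-based path instead of failing with the error in this bug.
[libvirt]
block_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_NON_SHARED_INC
```

The trade-off, discussed below, is that without TUNNELLED the migration stream is not carried over libvirt's encrypted RPC channel.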
---

[Additional information.]

(*) There's a QEMU bug to improve the error message: https://bugzilla.redhat.com/show_bug.cgi?id=1203214 -- Improve not supported messages for block migration

(*) A clarification of my comment #4 above: in the Launchpad bug, Mark didn't mention the QEMU version he was using; however, on IRC today he clarified that he was using the 'qemu-kvm' RPM (not 'qemu-kvm-rhev'). This briefly made us suspect that maybe the legacy approach is only disabled in 'qemu-kvm-rhev' but not in 'qemu-kvm', but that turned out to be false: the legacy approach is disabled in all cases on RHEL/RHEV.

---

Thanks for the details. What would be the right approach to get it working downstream at this point? Is there any chance we can get an updated version of qemu-kvm provided downstream, or should we have the installer remove the VIR_MIGRATE_TUNNELLED flag, which comes as a default?

---

(In reply to Marius Cornea from comment #7)
[...]
> What would be the right approach to get it working downstream at this
> point? Is there any chance we can get an updated version of qemu-kvm
> provided in downstream or should we get the installer remove the
> VIR_MIGRATE_TUNNELLED flag which comes as a default?

NB: Just a new qemu-kvm-rhev would not suffice (and it won't be available immediately either); support for the new native QEMU encryption needs to be wired into libvirt and Nova.

For now, having the installer not use the VIR_MIGRATE_TUNNELLED flag seems like a reasonable compromise.

---

*** Bug 1301831 has been marked as a duplicate of this bug. ***

---

(In reply to Kashyap Chamarthy from comment #8)
> For now, installer not using the flag VIR_MIGRATE_TUNNELLED seems like a
> reasonable compromise.

My only question is: do we want to do that in *all* deployments, or only those without shared storage?
---

(In reply to Stephen Gordon from comment #10)
> (In reply to Kashyap Chamarthy from comment #8)
> > For now, installer not using the flag VIR_MIGRATE_TUNNELLED seems like a
> > reasonable compromise.
>
> My only question is do we want to do that in *all* deployments or only those
> w/o shared storage?

TUNNELLED mode provides a useful security enhancement for people who do not ever need block migration. That would be a reason to keep it when doing shared-storage deployments. The flip side is that if we have some deployments with TUNNELLED and some without, we have two separate communication architectures to test and maintain. That could push us towards never using TUNNELLED at all, and just waiting for the future QEMU enhancements in 7.4 to provide strong security.

---

Steve: I hope Dan's response in comment #12 answers your question.

---

I sent a patch in TripleO: https://review.openstack.org/286584

Please review it so we're fixing it in OSP 8 and eventually backporting it to OSP 7. Also, should we configure live_migration_flag, which also includes VIR_MIGRATE_TUNNELLED by default?

---

Responded to comment #14 and comment #15 here: https://review.openstack.org/#/c/286584/2

---

Thanks a lot for your help!

---

This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.

---

Seems to be merged in previous releases.
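The installer-side fix discussed above amounts to removing one flag name from Nova's comma-separated migration-flags string. A minimal sketch of that transformation (a hypothetical helper for illustration, not the actual TripleO patch code, and the default flag list shown is an assumption based on historical Nova defaults):

```python
# Hypothetical helper: drop VIR_MIGRATE_TUNNELLED from a Nova
# comma-separated migration-flags string, which is the essence of the
# installer-side workaround for non-shared-storage deployments.

def strip_tunnelled(flags):
    """Return `flags` with the VIR_MIGRATE_TUNNELLED entry removed."""
    kept = [f.strip() for f in flags.split(",")
            if f.strip() != "VIR_MIGRATE_TUNNELLED"]
    return ", ".join(kept)

# Assumed default value (illustrative), including TUNNELLED:
default = ("VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER, "
           "VIR_MIGRATE_LIVE, VIR_MIGRATE_TUNNELLED, "
           "VIR_MIGRATE_NON_SHARED_INC")

print(strip_tunnelled(default))
```

The remaining flags still request a live, peer-to-peer, non-shared-storage migration; only the tunnelling of the stream through libvirt's encrypted RPC channel is given up.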