Bug 1261430 - Migration doesn't work with memory hot-plug enabled
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Peter Krempa
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-09-09 10:15 UTC by Arik
Modified: 2015-11-05 12:16 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-05 12:16:28 UTC
Embargoed:


Attachments
vdsm log (15.93 MB, text/plain)
2015-09-09 10:18 UTC, Arik
journalctl (1.03 MB, text/x-vhdl)
2015-09-09 10:19 UTC, Arik

Description Arik 2015-09-09 10:15:53 UTC
Description of problem:
VM live migration doesn't work when the VM is created with memory hotplug support. The error is:
libvirtError: internal error: Unknown migration cookie feature memory-hotplug

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Run a VM with memory hotplug supported
2. Migrate the VM to another host
3.

Actual results:
Migration fails

Expected results:
Migration should work

Additional info:
When memory hotplug is supported in oVirt, the following tag is added to the domain-xml:
<maxMemory slots="16">4294967296</maxMemory>

Comment 1 Arik 2015-09-09 10:18:45 UTC
Created attachment 1071667
vdsm log

Comment 2 Arik 2015-09-09 10:19:23 UTC
Created attachment 1071668
journalctl

Comment 3 Ján Tomko 2015-09-09 10:52:01 UTC
(In reply to Arik from comment #0)
> Description of problem:
> VM live migration doesn't work when the VM is created with memory hotplug
> support. The error is:
> libvirtError: internal error: Unknown migration cookie feature memory-hotplug
> 

This seems like expected behavior if the destination libvirt does not support memory hotplug.

> Version-Release number of selected component (if applicable):
> 

What are the exact libvirt versions used on source and destination?

> Additional info:
> When memory hotplug is supported in oVirt, the following tag is added to the
> domain-xml:
> <maxMemory slots="16">4294967296</maxMemory>

This adds more devices to the QEMU command line; migrating such a domain to systems where this is not supported will not be possible.

Comment 4 Arik 2015-09-09 11:02:54 UTC
(In reply to Ján Tomko from comment #3)
I ran the VM on both hosts (the source and the destination) and it runs fine with the memory hotplug support.

Both hosts are installed with the same OS, QEMU, libvirt:
OS - Fedora 22
QEMU - 2.4.0
libvirt - 1.2.18
So I don't see a reason for the memory hotplug not being supported on each of these hosts.

When I disable the memory hotplug support in oVirt, the migration succeeds.

Comment 5 Moran Goldboim 2015-09-17 10:37:33 UTC
Jan, how can we progress here? This one is urgent from a management perspective, specifically the combination of those two important features.

Comment 7 Peter Krempa 2015-10-05 06:12:20 UTC
The error you are seeing comes from:

commit 136f3de4112c75af0b38fc1946f44e3658ed1890
Author: Peter Krempa <pkrempa>
Date:   Thu Jul 30 15:27:07 2015 +0200

    qemu: Reject migration with memory-hotplug if destination doesn't support it
    
    If destination libvirt doesn't support memory hotplug since all the
    support was introduced by adding new elements the destination would
    attempt to start qemu with an invalid configuration. The worse part is
    that qemu might hang in such situation.
    
    Fix this by sending a required migration feature called 'memory-hotplug'
    to the destination. If the destination doesn't recognize it it will fail
    the migration.


So apparently one of your hosts is running libvirt without that commit. Could you please post libvirtd debug logs from both sides of the migration that will contain the error?

Comment 8 Yaniv Kaul 2015-11-03 11:55:11 UTC
In which version do we have this commit?
I'm seeing something similar in bug 1277255.
Both seem to have libvirt version: 1.2.17, package: 13.el7
See https://bugzilla.redhat.com/attachment.cgi?id=1088936 and https://bugzilla.redhat.com/attachment.cgi?id=1088933 for both sides.

Comment 9 Peter Krempa 2015-11-05 06:07:28 UTC
(In reply to Yaniv Kaul from comment #8)
> In which version do we have this commit?
> I'm seeing something similar in bug 1277255.
> Both seem to have libvirt version: 1.2.17, package: 13.el7
> See https://bugzilla.redhat.com/attachment.cgi?id=1088936 and
> https://bugzilla.redhat.com/attachment.cgi?id=1088933 for both sides.

Upstream added this in v1.2.18-rc2-1-g136f3de so the 1.2.18 release contains it.
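
For anyone comparing hosts, here is a minimal sketch of a first-pass check that an installed libvirt is at least 1.2.18. It assumes the version is available in the numeric form that libvirt's API reports (major * 1,000,000 + minor * 1,000 + release, for example from virConnectGetLibVersion). The helper names are hypothetical, and a distribution package may carry the commit as a backport on an older version number, so a "false" result here is a hint rather than proof.

```c
#include <stdbool.h>

/* libvirt encodes versions as major * 1000000 + minor * 1000 + release. */
#define LIBVIRT_VERSION(maj, min, rel) \
    ((maj) * 1000000UL + (min) * 1000UL + (rel))

/* Hypothetical container for a decoded version number. */
struct lib_version {
    unsigned long major;
    unsigned long minor;
    unsigned long release;
};

/* Hypothetical helper: split an encoded version into its three parts. */
static struct lib_version
decode_version(unsigned long lib_ver)
{
    struct lib_version v;

    v.major = lib_ver / 1000000UL;
    v.minor = (lib_ver / 1000UL) % 1000UL;
    v.release = lib_ver % 1000UL;
    return v;
}

/* Hypothetical helper: true if the encoded version is at least 1.2.18,
 * the first upstream release stated in this thread to contain
 * commit 136f3de. Backports are not detected by this check. */
static bool
has_memory_hotplug_cookie(unsigned long lib_ver)
{
    return lib_ver >= LIBVIRT_VERSION(1, 2, 18);
}
```

With these assumptions, 1.2.17 encodes to 1002017 and fails the check, while 1.2.18 encodes to 1002018 and passes.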

Comment 12 Peter Krempa 2015-11-05 10:30:05 UTC
(In reply to Yaniv Kaul from comment #8)
> In which version do we have this commit?
> I'm seeing something similar in bug 1277255.
> Both seem to have libvirt version: 1.2.17, package: 13.el7
> See https://bugzilla.redhat.com/attachment.cgi?id=1088936 and
> https://bugzilla.redhat.com/attachment.cgi?id=1088933 for both sides.


Thanks to Jan Tomko for noticing the following slight difference in error messages that I overlooked:

In the migration cookie parser code, there are two checks:

    for (i = 0; i < n; i++) {
        int val;
        char *str = virXMLPropString(nodes[i], "name");
        if (!str) {
            virReportError(VIR_ERR_INTERNAL_ERROR,
                           "%s", _("missing feature name"));
            goto error;
        }

        if ((val = qemuMigrationCookieFlagTypeFromString(str)) < 0) {
            virReportError(VIR_ERR_INTERNAL_ERROR,
                           _("Unknown migration cookie feature %s"),
                           str);
            VIR_FREE(str);

This one produces the error message described in the original bug report. This is legitimate though.

            goto error;

This check actually disallows migration, which is what I intended to add in the commit mentioned in comment #7.

        }


What I missed while writing and testing is the following check:
        if ((flags & (1 << val)) == 0) {
            virReportError(VIR_ERR_INTERNAL_ERROR,
                           _("Unsupported migration cookie feature %s"),

Since the commit mentioned above didn't add the correct flag to 'flags' at the point where it's called, this error message is always printed once the memory hotplug flag is parsed ...

                           str);
            VIR_FREE(str);

... but due to a missing 'goto error;' it's not actually rejected.

        }
        VIR_FREE(str);
    }

So the error message in the bugzillas linked in comment 8 is basically an invalid error message, but it does not break the migration. If the migration is failing, it's failing due to a different problem.

The original report is complaining about the first case, though, so those are different problems.

I'll file a separate bug for fixing the error reporting.
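
The difference between the two checks described in this comment can be modelled with a small standalone sketch. This is not the libvirt code: the enum, the name table, the known_count parameter (which simulates an older build that recognizes fewer feature names) and the strict switch (which stands in for the missing 'goto error;') are all hypothetical, and only the feature name "memory-hotplug" is taken from the report.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the migration cookie flags. */
enum cookie_flag {
    COOKIE_FLAG_GRAPHICS,
    COOKIE_FLAG_MEMORY_HOTPLUG,
    COOKIE_FLAG_LAST
};

static const char *const cookie_flag_names[COOKIE_FLAG_LAST] = {
    "graphics",
    "memory-hotplug",
};

/* Map a feature name to its flag index, or -1 if it is not among the
 * first known_count names (simulating an older build). */
static int
cookie_flag_from_string(const char *name, size_t known_count)
{
    for (size_t i = 0; i < known_count && i < (size_t)COOKIE_FLAG_LAST; i++) {
        if (strcmp(cookie_flag_names[i], name) == 0)
            return (int)i;
    }
    return -1;
}

/* Return 0 if the mandatory feature list is accepted, -1 if rejected.
 * strict == false models the behaviour described in this comment: the
 * second check only reports an error. strict == true models the check
 * with the rejection added. */
static int
check_mandatory_features(const char *const *features, size_t n,
                         size_t known_count, unsigned int flags,
                         bool strict)
{
    for (size_t i = 0; i < n; i++) {
        int val = cookie_flag_from_string(features[i], known_count);

        /* Check 1: the name is unknown to this build, always reject
         * ("Unknown migration cookie feature"). */
        if (val < 0)
            return -1;

        /* Check 2: the name is known but the caller did not enable it
         * in flags ("Unsupported migration cookie feature"). */
        if ((flags & (1u << val)) == 0 && strict)
            return -1;
    }
    return 0;
}
```

In this model, a destination that knows only the first name rejects "memory-hotplug" through check 1, which corresponds to the original report. A destination that knows the name but is called with the flag unset accepts the cookie when strict is false, which corresponds to the spurious message that does not stop the migration.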

Comment 13 Arik 2015-11-05 11:21:32 UTC
I cannot reproduce it anymore in my environment. Note that the hosts have been updated since then though.
Unfortunately, I have no way to check whether libvirt on my hosts included Peter's patch or not.
Anyway, seems like the mentioned patch or other fix that got in since then solved it u/s.

Comment 14 Peter Krempa 2015-11-05 12:15:21 UTC
(In reply to Peter Krempa from comment #12)
> (In reply to Yaniv Kaul from comment #8)
> 
> I'll file a separate bug for fixing the error reporting.

https://bugzilla.redhat.com/show_bug.cgi?id=1278404

Comment 15 Peter Krempa 2015-11-05 12:16:28 UTC
(In reply to Arik from comment #13)
> I cannot reproduce it anymore in my environment. Note that the hosts have
> been updated since then though.
> Unfortunately, I have no way to check whether libvirt on my hosts included
> Peter's patch or not.
> Anyway, seems like the mentioned patch or other fix that got in since then
> solved it u/s.

Thanks for reporting back. I'll close this bug now. If you happen to reproduce this again in the future please reopen this or file a new bug.

