Bug 1252685

Summary: Migration from RHEL6 to RHEL7 failed when set memory size != total_of_numa_memory.
Product: Red Hat Enterprise Linux 7 Reporter: Fangge Jin <fjin>
Component: libvirt Assignee: Peter Krempa <pkrempa>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 7.2 CC: dyuan, huding, jsuchane, juzhang, mzhan, ovasik, pkrempa, rbalakri, zpeng
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: libvirt-1.2.17-10.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-11-19 06:51:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
qemu log none

Description Fangge Jin 2015-08-12 05:45:11 UTC
Created attachment 1061795
qemu log

Description of problem:
Set memory size != total_of_numa_memory
a) memory size < total_of_numa_memory
  <memory unit='KiB'>1000448</memory>
  <currentMemory unit='KiB'>1000000</currentMemory>
  <cpu>
    <numa>
      <cell cpus='0-1' memory='1048576'/>
    </numa>

or b) memory size > total_of_numa_memory
  <memory unit='KiB'>1100800</memory>
  <currentMemory unit='KiB'>1000000</currentMemory>
  <cpu>
    <numa>
      <cell cpus='0-1' memory='1048576'/>
    </numa>
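A quick way to spot such a configuration before migrating is to compare the <memory> value with the sum of the NUMA cell sizes. This is only a sketch: the enclosing <domain> element and the closing </cpu> tag are added here so the fragment parses, the function name is made up, and all sizes are assumed to be in KiB as in the fragments above.

```python
import xml.etree.ElementTree as ET

# Fragment a) from this report, wrapped in a hypothetical <domain> element
# (plus the closing </cpu> tag) so that it is well-formed XML.
DOMAIN_XML = """
<domain type='kvm'>
  <memory unit='KiB'>1000448</memory>
  <currentMemory unit='KiB'>1000000</currentMemory>
  <cpu>
    <numa>
      <cell cpus='0-1' memory='1048576'/>
    </numa>
  </cpu>
</domain>
"""


def memory_vs_numa_kib(xml_text):
    """Return (<memory> value, sum of NUMA cell sizes); both assumed KiB."""
    root = ET.fromstring(xml_text)
    memory = int(root.findtext("memory"))
    numa_total = sum(int(cell.get("memory"))
                     for cell in root.findall("./cpu/numa/cell"))
    return memory, numa_total


memory, numa_total = memory_vs_numa_kib(DOMAIN_XML)
print(memory, numa_total, "match" if memory == numa_total else "MISMATCH")
```

In practice the XML text would come from `virsh dumpxml` for the guest in question.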

Start the guest on RHEL6.7, then migrate it to RHEL7.2; the migration fails:
# virsh migrate --live rhel6d7 qemu+ssh://10.66.6.6/system --verbose
root.6.6's password: 
error: operation failed: migration job: unexpectedly failed


Target version:
libvirt-1.2.17-4.el7.x86_64
qemu-kvm-rhev-2.3.0-13.el7.x86_64

Source version:
libvirt-0.10.2-54.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.479.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1.Prepare a running guest on RHEL6.7 host, set memory size != total_of_numa_memory:
a) memory size < total_of_numa_memory
  <memory unit='KiB'>1000448</memory>
  <currentMemory unit='KiB'>1000000</currentMemory>
  <cpu>
    <numa>
      <cell cpus='0-1' memory='1048576'/>
    </numa>

or b) memory size > total_of_numa_memory
  <memory unit='KiB'>1100800</memory>
  <currentMemory unit='KiB'>1000000</currentMemory>
  <cpu>
    <numa>
      <cell cpus='0-1' memory='1048576'/>
    </numa>

2.Check the qemu command line:
a)
# ps aux|grep rhel6d7
-m **977** -realtime mlock=off -smp 5,sockets=2,cores=4,threads=2 -numa node,nodeid=0,cpus=0-1,mem=1024
b)
# ps aux|grep rhel6d7
-m **1075** -realtime mlock=off -smp 5,sockets=2,cores=4,threads=2 -numa node,nodeid=0,cpus=0-1,mem=1024

3.Migrate the guest to RHEL7.2 host
# virsh migrate --live rhel6d7 qemu+ssh://10.66.6.6/system --verbose
root.6.6's password: 
error: operation failed: migration job: unexpectedly failed
 
4.Check the qemu log:
-m **1024** -realtime mlock=off -smp 5,sockets=2,cores=4,threads=2 -numa node,nodeid=0,cpus=0-1,mem=1024
...
2015-08-12T05:26:45.375579Z qemu-kvm: Length mismatch: pc.ram: 0x43300000 in != 0x40000000: Invalid argument
2015-08-12T05:26:45.375599Z qemu-kvm: error while loading state for instance 0x0 of device 'ram'
2015-08-12T05:26:45.375623Z qemu-kvm: load of migration failed: Invalid argument
2015-08-12 05:26:45.416+0000: shutting down
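
The numbers in steps 2 and 4 line up once everything is converted to MiB. A small sketch of that arithmetic follows. The helper names are made up for illustration, and any rounding or alignment that libvirt or qemu apply is not modelled. The `in` value of the "Length mismatch" line is read here as the length announced by the incoming stream.

```python
KIB_PER_MIB = 1024
BYTES_PER_MIB = 1024 * 1024


def kib_to_mib(kib):
    """Whole MiB for a size given in KiB."""
    return kib // KIB_PER_MIB


def bytes_to_mib(nbytes):
    """Whole MiB for a size given in bytes."""
    return nbytes // BYTES_PER_MIB


# <memory> values of cases a) and b): these are the source-side -m values.
print(kib_to_mib(1000448))       # 977
print(kib_to_mib(1100800))       # 1075
# NUMA cell size: this is mem=1024 and also the destination-side -m value.
print(kib_to_mib(1048576))       # 1024
# pc.ram lengths from the "Length mismatch" line in the qemu log.
print(bytes_to_mib(0x43300000))  # 1075
print(bytes_to_mib(0x40000000))  # 1024
```

So the attached log corresponds to case b): the source sends a 1075 MiB RAM block, while the destination qemu was started with 1024 MiB.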


Actual results:
Migration failed.

Expected results:
Migration succeeds, or, if the failure is by design, a clearer error message is reported when the migration fails.

Comment 2 Peter Krempa 2015-09-19 07:36:37 UTC
The libvirt code that recalculates the total memory as a sum of NUMA node sizes was introduced with regard to this qemu commit:

commit 2b631ec2557eddfe92f1ef80d7fcaedd5db64e28
Author: Wanlong Gao <gaowanlong.com>
Date:   Wed May 14 17:43:06 2014 +0800

    NUMA: check if the total numa memory size is equal to ram_size
    
    If the total number of the assigned numa nodes memory is not
    equal to the assigned ram size, it will write the wrong data
    to ACPI table, then the guest will ignore the wrong ACPI table
    and recognize all memory to one node. It's buggy, we should
    check it to ensure that we write the right data to ACPI table.

v2.0.0-1548-g2b631ec

This commit makes qemu produce an error message in cases where memory specified in -m is not equal to the sum of NUMA node memory sizes:

 $ qemu-kvm -m 214 -smp 16 -numa node,nodeid=0,cpus=0-7,mem=1214
qemu-kvm: total memory for NUMA nodes (0x4be00000) should equal RAM size (0xd600000)

Libvirt currently disregards the data specified in the <memory> element and recalculates the -m size from the NUMA nodes. In the migration case this passes the qemu check, but the migration stream can't be loaded successfully, since the machine ABI (here, the size of the guest RAM) has changed.
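
For illustration, the check can be modelled roughly as below. This is not the qemu source, only a sketch that reproduces the message format quoted above. The function name is made up, and both sizes are assumed to be given in MiB, as on the command line.

```python
MIB = 1024 * 1024


def check_numa_total(ram_mib, numa_node_mib):
    """Return None if the sizes agree, else a message shaped like qemu's."""
    ram_size = ram_mib * MIB
    numa_total = sum(numa_node_mib) * MIB
    if numa_total == ram_size:
        return None
    return ("total memory for NUMA nodes (0x%x) should equal RAM size (0x%x)"
            % (numa_total, ram_size))


# The example invocation above: -m 214 with a single node of mem=1214.
print(check_numa_total(214, [1214]))
# What libvirt currently generates on the destination: -m equals the NUMA sum.
print(check_numa_total(1024, [1024]))
```

The second call returns None, which is why the recalculated command line starts fine and the failure only shows up later, when the RAM block from the stream is loaded.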

Unfortunately, due to the qemu commit above, libvirt can't do anything to allow migration with the configuration described above, so the only thing we can do at this point is improve the error message.

Comment 5 Peter Krempa 2015-09-22 14:43:12 UTC
Upstream fixes this issue with:

commit 3b2db51430f647def979960f273637083a2f0c91
Author: Peter Krempa <pkrempa>
Date:   Mon Sep 21 19:10:12 2015 +0200

    test: Add test to validate that memory sizes don't get updated on migration

commit c7d7ba85a6242d789ba3f4dae313e950fbb638c5
Author: Peter Krempa <pkrempa>
Date:   Thu Sep 17 08:14:05 2015 +0200

    qemu: command: Align memory sizes only on fresh starts
    
    When we are starting a qemu process for an incomming migration or
    snapshot reloading we should not modify the memory sizes in the domain
    since we could potentially change the guest ABI that was tediously
    checked before. Additionally the function now updates the initial memory
    size according to the NUMA node size, which should not happen if we are
    restoring state.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1252685

commit 0fed5a7bc79865fe00fd5a328a2e520934c52ff7
Author: Peter Krempa <pkrempa>
Date:   Fri Sep 18 17:24:32 2015 +0200

    conf: Don't always recalculate initial memory size from NUMA size totals
    
    When implementing memory hotplug I've opted to recalculate the initial
    memory size (contents of the <memory> element) as a sum of the sizes of
    NUMA nodes when NUMA was enabled. This was based on an assumption that
    qemu did not allow starting when the NUMA node size total didn't equal
    to the initial memory size. Unfortunately the check was introduced to
    qemu just lately.
    
    This patch uses the new XML parser flag to decide whether it's safe to
    update the memory size total from the NUMA cell sizes or not.
    
    As an additional improvement we now report an error in case when the
    size of hotplug memory would exceed the total memory size.
    
    The rest of the changes assures that the function is called with correct
    flags.

commit 403e86067d5cb3a6fd8583cb5b08121151bd4d9f
Author: Peter Krempa <pkrempa>
Date:   Thu Aug 13 16:39:28 2015 +0200

    conf: Pre-calculate initial memory size instead of always calculating it
    
    Add 'initial_memory' member to struct virDomainMemtune so that the
    memory size can be pre-calculated once instead of inferring it always
    again and again.
    
    Separating of the fields will also allow finer granularity of decisions
    in later patches where it will allow to keep the old initial memory
    value in cases where we are handling incomming migration from older
    versions that did not always update the size from NUMA as the code did
    previously.
    
    The change also requires modification of the qemu memory alignment
    function since at the point where we are modifying the size of NUMA
    nodes the total size needs to be recalculated too.
    
    The refactoring done in this patch also fixes a crash in the hyperv
    driver that did not properly initialize def->numa and thus
    virDomainNumaGetMemorySize(def->numa) crashed.
    
    In summary this patch should have no functional impact at this point.

commit 8059a99025d15b12e62a294b7b6797e4c618eff8
Author: Peter Krempa <pkrempa>
Date:   Wed Sep 16 14:25:42 2015 +0200

    conf: Rename max_balloon to total_memory
    
    The name of the variable was misleading. Rename it and it's setting
    accessor before other fixes.

commit 849b5fc4f609885b9976b633c6efaba0beee2fe3
Author: Peter Krempa <pkrempa>
Date:   Tue Sep 15 16:59:23 2015 +0200

    conf: Split memory related post parse stuff into separate function
    
    The post parse func is growing rather large. Since later patches will
    introduce more logic in the memory post parse code, split it into a
    separate handler.

commit 59173c3dd94fc090d2776be3986a1014ddbf2396
Author: Peter Krempa <pkrempa>
Date:   Tue Sep 15 17:04:55 2015 +0200

    conf: Add XML parser flag that will allow us to do incompatible updates
    
    Add a new parser flag that will mark code paths that parse XML files
    wich will not be used with existing VM state so that post parse
    callbacks can possibly do ABI incompatible changes if needed.

commit 24e3b0eda1373608c1c4f1176530d324418ae82b
Author: Peter Krempa <pkrempa>
Date:   Tue Sep 15 15:23:53 2015 +0200

    conf: Document all VIR_DOMAIN_DEF_PARSE_* flags

commit ed94ad9e40a029d552c005a5d5f214af5fc43558
Author: Peter Krempa <pkrempa>
Date:   Tue Sep 15 14:08:52 2015 +0200

    conf: Drop VIR_DOMAIN_DEF_PARSE_CLOCK_ADJUST flag
    
    The flag was used only for formatting the XML and once the parser and
    formatter flags were split in 0ecd6851093945dd5ddc78266c61b577c65394ae
    it doesn't make sense any more to have it.

commit 3fb0819830cef3b269fbcdea217d7f1de4b62e87
Author: Peter Krempa <pkrempa>
Date:   Fri Jul 31 16:00:20 2015 +0200

    qemu: Make memory alignment helper more universal
    
    Extract the size determination into a separate function and reuse it
    across the memory device alignment functions. Since later we will need
    to decide the alignment size according to architecture let's pass def to
    the functions.

commit 1891cad5420a3a1727177d1c762b23104c9ccc6d
Author: Peter Krempa <pkrempa>
Date:   Mon Sep 14 16:42:46 2015 +0200

    conf: Add helper to determine whether memory hotplug is enabled for a vm
    
    Add a simple helper so that the code doesn't have to rewrite the same
    condition multiple times.

commit 5dd61a168f615bce9e39d17276841cf9ba171ab9
Author: Peter Krempa <pkrempa>
Date:   Wed Sep 16 14:00:02 2015 +0200

    libxl: vz: Use accessor instead of direct access for max_balloon
    
    Commits 45697fe5 and f863ac80 used direct access to the variable instead
    of the preferred accessor method.
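
The behavioural change described by the commits above can be summarised with a toy model. This is not libvirt code: the function name and the `fresh_start` parameter are invented for illustration, standing in for the new parser flag and the "fresh start" condition mentioned in the commit messages.

```python
def initial_memory_kib(memory_kib, numa_cells_kib, fresh_start):
    """Pick the initial memory size for a domain definition (toy model).

    On a fresh start the size may be recalculated from the NUMA cells.
    When existing state is restored (incoming migration, snapshot reload),
    the configured <memory> value is kept, so the guest ABI is not changed.
    """
    if fresh_start and numa_cells_kib:
        return sum(numa_cells_kib)
    return memory_kib


# Case a) from the description: <memory> is 1000448 KiB, one 1048576 KiB cell.
print(initial_memory_kib(1000448, [1048576], fresh_start=True))   # 1048576
print(initial_memory_kib(1000448, [1048576], fresh_start=False))  # 1000448
```

With the value kept on incoming migration, the destination is started with the same -m as the source, and a qemu that enforces the NUMA total check rejects the configuration up front instead of failing while loading the stream.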

Comment 9 Fangge Jin 2015-09-25 09:54:44 UTC
Verified on build libvirt-1.2.17-10.el7.x86_64

1.Prepare a running guest with the following settings on rhel6.7 host:
a)memory size < total_of_numa_memory
  <memory unit='KiB'>1000448</memory>
  <currentMemory unit='KiB'>1000000</currentMemory>
  <cpu>
    <numa>
      <cell cpus='0-1' memory='1048576'/>
    </numa>
  </cpu>

b)memory size > total_of_numa_memory
  <memory unit='KiB'>1201152</memory>
  <currentMemory unit='KiB'>1000000</currentMemory>
  <cpu>
    <numa>
      <cell cpus='0-1' memory='1048576'/>
    </numa>
  </cpu>

c)memory size == total_of_numa_memory
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1000000</currentMemory>
  <cpu>
    <numa>
      <cell cpus='0-1' memory='1048576'/>
    </numa>
  </cpu>

2.Migrate the guest to rhel7.2 host:
a)
# virsh migrate rhel6.6-GUI qemu+ssh://10.66.106.26/system --live --verbose
error: internal error: process exited while connecting to monitor: 2015-09-25T06:00:12.904340Z qemu-kvm: total memory for NUMA nodes (0x40000000) should equal RAM size (0x3d100000)

b)
# virsh migrate rhel6.6-GUI qemu+ssh://10.66.106.26/system --live --verbose
error: internal error: process exited while connecting to monitor: 2015-09-25T09:32:23.744652Z qemu-kvm: total memory for NUMA nodes (0x40000000) should equal RAM size (0x49500000)

c)
# virsh migrate rhel6.6-GUI qemu+ssh://10.66.106.26/system --live --verbose
Migration: [100 %]
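
The hex values in the error messages for a) and b) can be decoded to confirm that the destination now uses the configured <memory> size. A small sketch, with a made-up helper name and a regular expression that only covers the message format shown above:

```python
import re

MIB = 1024 * 1024
PATTERN = re.compile(
    r"total memory for NUMA nodes \((0x[0-9a-f]+)\) "
    r"should equal RAM size \((0x[0-9a-f]+)\)"
)


def decode_numa_error(line):
    """Return (NUMA total in MiB, RAM size in MiB), or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    numa_bytes, ram_bytes = (int(group, 16) for group in match.groups())
    return numa_bytes // MIB, ram_bytes // MIB


case_a = ("qemu-kvm: total memory for NUMA nodes (0x40000000) "
          "should equal RAM size (0x3d100000)")
case_b = ("qemu-kvm: total memory for NUMA nodes (0x40000000) "
          "should equal RAM size (0x49500000)")
print(decode_numa_error(case_a))  # (1024, 977)
print(decode_numa_error(case_b))  # (1024, 1173)
```

977 MiB is 1000448 KiB and 1173 MiB is 1201152 KiB, which are the <memory> values of cases a) and b).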


As this BZ includes many patches, do I need to do more testing?

Comment 10 Peter Krempa 2015-10-05 07:22:28 UTC
(In reply to JinFangge from comment #9)
> Verify on build libvirt-1.2.17-10.el7.x86_64

...

> c)
> # virsh migrate rhel6.6-GUI qemu+ssh://10.66.106.26/system --live --verbose
> Migration: [100 %]
> 
> 
> As this BZ includes many patches, do I need to more testing?

I think it would also be relevant to test migrations from RHEL-7.0 (with qemu-kvm-rhev) to RHEL-7.2, since the 7.0 qemu-kvm-rhev didn't enforce the memory size totalling.

Additionally, a 7.2 to 7.2 migration should obviously be tested to verify that everything still works as it should.

Comment 11 Fangge Jin 2015-10-08 05:56:11 UTC
Comment 9 was tested with qemu-kvm-rhev for migration from 6.7 to 7.2.

1) Retested with qemu-kvm for migration from 6.7 to 7.2; migration succeeded:
# virsh migrate test4 qemu+ssh://10.66.4.227/system --live --verbose
Migration: [100 %]

2) Tested with qemu-kvm-rhev for migration from 7.0 to 7.2; migration failed:
# virsh migrate test4 qemu+ssh://10.66.4.227/system --live --verbose
root.4.227's password: 
error: internal error: process exited while connecting to monitor: 2015-10-08T05:52:28.063385Z qemu-kvm: total memory for NUMA nodes (0x40000000) should equal RAM size (0x3d100000)

3) Regression test with qemu-kvm-rhev for migration from 7.2 to 7.2 (memory == total_of_numa); migration succeeded:
# virsh migrate test4 qemu+ssh://10.66.4.227/system --live --verbose
Migration: [100 %]

Comment 12 Fangge Jin 2015-10-08 08:40:39 UTC
In comment 9 and comment 11, each scenario produced the expected result:

qemu-kvm-rhev:  6.7-->7.2  memory!=total_of_numa  "migration failed with clear error message"
qemu-kvm-rhev:  6.7-->7.2  memory==total_of_numa  "migration succeeded"
qemu-kvm:       6.7-->7.2  memory!=total_of_numa  "migration succeeded"
qemu-kvm-rhev:  7.0-->7.2  memory!=total_of_numa  "migration failed with clear error message"
qemu-kvm-rhev:  7.2-->7.2  memory==total_of_numa  "migration succeeded"

So moving to VERIFIED.

Comment 13 Fangge Jin 2015-10-08 09:33:28 UTC
Updated summary of comment 12:

qemu-kvm-rhev:  6.7-->7.2  memory!=total_of_numa  "migration failed with clear error message"                                  PASS
qemu-kvm-rhev:  6.7-->7.2  memory==total_of_numa  "migration succeeded"                                                        PASS
qemu-kvm:       6.7-->7.2  memory!=total_of_numa  "migration succeeded and **memory size remains unchanged after migration**"  PASS
qemu-kvm-rhev:  7.0-->7.2  memory!=total_of_numa  "migration failed with clear error message"                                  PASS
qemu-kvm-rhev:  7.2-->7.2  memory==total_of_numa  "migration succeeded"                                                        PASS


Additional information for the following scenario, verifying that the memory size is not changed after migration:
qemu-kvm:       6.7-->7.2  memory!=total_of_numa

Before migration:
# virsh dumpxml test4|grep -i memory
  <memory unit='KiB'>1000448</memory>
  <currentMemory unit='KiB'>1000000</currentMemory>
      <cell cpus='0-1' memory='1048576'/>

After migration, the memory size is not changed:
# virsh dumpxml test4|grep -i memory
  <memory unit='KiB'>1000448</memory>
  <currentMemory unit='KiB'>1000000</currentMemory>
      <cell id='0' cpus='0-1' memory='1048576' unit='KiB'/>
And check the qemu command line:
/usr/libexec/qemu-kvm -name test4 -S -machine rhel6.6.0,accel=kvm,usb=off -m **977** -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -numa node,nodeid=0,cpus=0-1,mem=1024
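
The same check can be scripted against the qemu command line. This is only a sketch: the function name is made up, the `**` emphasis markers around 977 are dropped from the sample string, and it assumes -m carries a plain MiB integer and each -numa node a `mem=` value in MiB, as in the command line above.

```python
import shlex


def memory_args(cmdline):
    """Return (-m value, [mem= values of the -numa nodes]) as MiB integers."""
    args = shlex.split(cmdline)
    ram = None
    numa = []
    # Walk over (option, value) pairs of adjacent arguments.
    for flag, value in zip(args, args[1:]):
        if flag == "-m":
            ram = int(value)
        elif flag == "-numa":
            opts = dict(opt.split("=", 1)
                        for opt in value.split(",") if "=" in opt)
            if "mem" in opts:
                numa.append(int(opts["mem"]))
    return ram, numa


CMDLINE = ("/usr/libexec/qemu-kvm -name test4 -S "
           "-machine rhel6.6.0,accel=kvm,usb=off -m 977 -realtime mlock=off "
           "-smp 4,sockets=4,cores=1,threads=1 "
           "-numa node,nodeid=0,cpus=0-1,mem=1024")

ram, numa = memory_args(CMDLINE)
print(ram, numa, sum(numa))
```

Here -m stays at 977 while the NUMA node keeps mem=1024, i.e. the destination did not recalculate the memory size from the NUMA total.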

Comment 15 errata-xmlrpc 2015-11-19 06:51:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2202.html