Bug 1249006

Summary: Guest fails to start when the guest's memory isn't aligned to 256 MB
Product: Red Hat Enterprise Linux 7
Reporter: zhenfeng wang <zhwang>
Component: libvirt
Assignee: Peter Krempa <pkrempa>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Priority: unspecified
Version: 7.2
CC: abologna, dgibson, dyuan, dzheng, gsun, hannsj_uhl, huding, juzhang, knoel, mzhan, pkrempa, rbalakri, tlavigne, virt-maint, xfu
Target Milestone: rc
Keywords: Regression
Hardware: ppc64le
OS: Linux
Fixed In Version: libvirt-1.2.17-10.el7
Doc Type: Bug Fix
Last Closed: 2015-11-19 06:49:24 UTC
Type: Bug
Bug Blocks: 1201513, 1277183, 1277184

Description zhenfeng wang 2015-07-31 10:10:44 UTC
Description of problem:
The guest fails to start when its memory size isn't aligned to 256 MB.

Pkg info
libvirt-1.2.17-3.el7.ppc64le
qemu-kvm-rhev-2.3.0-13.el7.ppc64le
kernel-3.10.0-300.el7.ppc64le


How reproducible:
100%

Steps to Reproduce:
1. Prepare a guest with the following configuration:
# virsh dumpxml virt-tests-vm1
--
  <memory unit='KiB'>2048000</memory>
  <currentMemory unit='KiB'>2048000</currentMemory>
--

2. Start the guest; it fails to start with the following error:

# virsh start virt-tests-vm1
error: Failed to start domain virt-tests-vm1
error: internal error: process exited while connecting to monitor: 2015-07-31T09:17:47.860870Z qemu-kvm: Can't support memory configuration where RAM size 0x7d000000 or maxmem size 0x7d000000 isn't aligned to 256 MB

# cat /var/log/libvirt/qemu/virt-tests-vm1.log
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name virt-tests-vm1 -S -machine pseries-rhel7.2.0,accel=kvm,usb=off -m 2000
--
2015-07-31T09:17:47.860870Z qemu-kvm: Can't support memory configuration where RAM size 0x7d000000 or maxmem size 0x7d000000 isn't aligned to 256 MB
2015-07-31 09:17:48.253+0000: shutting down


3. The guest could start successfully with qemu-kvm-rhev-2.3.0-12.el7.ppc64le.rpm (the previous build).
 
Actual results:
The guest fails to start when its memory size isn't aligned to 256 MB.

Expected results:
The guest should start successfully.
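
For reference, the numbers in the error message check out: 0x7d000000 bytes is exactly 2000 MiB, matching the -m 2000 on the qemu command line, and 2000 MiB is not a multiple of 256 MiB. A quick shell check of the alignment arithmetic (an illustration added for clarity, not part of the original report):

# 256 MiB = 262144 KiB; 2048000 KiB is not a multiple of it
$ echo $(( 2048000 % 262144 ))
212992
# rounding up to the next 256 MiB boundary gives
$ echo $(( (2048000 + 262143) / 262144 * 262144 ))
2097152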

Comment 3 Peter Krempa 2015-08-11 13:51:41 UTC
I validated that a migration stream fails to restore on the destination once the memory size changes, so with the patch you are no longer able to migrate from previous versions (this also breaks save/restore and similar operations). The memory size alignment should not be enforced unless memory hotplug is requested.

Comment 5 David Gibson 2015-08-12 01:51:32 UTC
I think the better option is to fix this restriction inside qemu, rather than working around it in libvirt.

I have some ideas on how to do this.

Comment 6 David Gibson 2015-08-12 03:43:00 UTC
On further investigation, I take comment 5 back. Removing this restriction in qemu is moderately tricky.

I think aligning up in libvirt is the correct approach - although discoverability of that alignment requirement is kind of nasty.

Migration isn't an issue, because RHEL7.2 will be the first supported KVM on Power.

Comment 7 David Gibson 2015-08-12 05:18:27 UTC
On yet further investigation we might be able to at least partially fix this in qemu without too much trouble.

Specifically, we can remove the alignment restriction for base memory; however, any hotplugged memory will still need to come in 256 MB chunks - effectively, (max_ram_size - ram_size) must be a multiple of 256 MB (see the example below).
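
To illustrate the proposed constraint (the numbers here are illustrative, not from the comment):

$ echo $(( (2512 - 2000) % 256 ))   # maxmem 2512 MiB, RAM 2000 MiB: accepted
0
$ echo $(( (2300 - 2000) % 256 ))   # maxmem 2300 MiB, RAM 2000 MiB: would be rejected
44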

Will that be enough to avoid the problems in libvirt?

Comment 8 Peter Krempa 2015-08-12 05:57:41 UTC
(In reply to David Gibson from comment #6)

...
 
> Migration isn't an issue, because RHEL7.2 will be the first supported KVM on
> Power.

While this is true downstream, I wouldn't be able to justify it upstream.

(In reply to David Gibson from comment #7)
> On yet further investigation we might be able to at least partially fix this
> in qemu without too much trouble.
> 
> Specifically we can remove the alignment restriction for base memory,
> however any hotplugged memory will still need to be in 256MB chunks -
> effectively (max_ram_size - ram_size) must be a multiple of 256MB.

Aligning the size of the added memory chunks or even the difference between max_ram and base memory size (if that's a hard requirement, not just the implication of aligning the added memory) is fine in libvirt.

> 
> Will that be enough to avoid the problems in libvirt?

Definitely yes. It will allow us to keep the legacy configurations working and will also be justifiable upstream since memory hotplug is a new feature.

Thanks!

Comment 9 David Gibson 2015-08-12 06:45:25 UTC
Sorry, I realised moments after posting that migration would still be an issue for upstream. Note, though, that the alignment is only enforced when memory hotplug is enabled, which is only true for the most recent machine type unless explicitly overridden. ("Enabled" in this sense can be true even if maxram == ram.)

Nonetheless, it looks like we have a better fix here - Bharata is working on an upstream patch to remove the alignment restriction for non-hotplugged memory. I hope to merge that into my spapr-next feeder tree ASAP, and we should be able to pull it downstream from there. It will probably take a bit longer to get into the upstream maintainer's tree, because agraf isn't usually very prompt about pulling things.

Should I make a new BZ for the qemu fix, or just move this BZ over to qemu?

Comment 10 Karen Noel 2015-08-29 02:18:13 UTC
David, what is the upstream status of this fix? Thanks.

Comment 11 David Gibson 2015-08-31 04:25:56 UTC
Hi Karen,

I discussed this further with Peter Krempa and Andrea during KVM Forum.

We decided that in fact fixing this in libvirt is the correct approach after all.

The issue with migration can be addressed because the alignment is only enforced when memory hotplug (or "LMB Dynamic Reconfiguration" in IBM terminology) is enabled (either via the machine type or explicitly with machine options).

So libvirt needs to make the rounding up of memory sizes conditional on dr_lmb being enabled.

Comment 12 Peter Krempa 2015-09-19 08:32:50 UTC
From additional discussions it looks like "dr_lmb" is enabled solely on the basis of the selected machine type and can't be controlled in any other way. That means libvirt can't base the decision on how to align the memory on whether memory hotplug was enabled; instead we'll need to hack around this issue with machine type checks, which is rather unfortunate, since we'll need to hardcode the support for this into libvirt.

That said, it isn't impossible to do, but machine-type-based decisions tend to end up rather fragile when something beyond libvirt's control changes in the future.

Comment 13 David Gibson 2015-09-21 07:45:41 UTC
Peter,

Ok, so the question at this point is which is less bad: 1) do the icky machine type check in libvirt, or 2) rush in a last-minute qemu fix to add a machine option which explicitly enables/disables DR memory.

From an upstream POV, (2) should be fine - memory hotplug has only just been pushed to mainline, and wasn't in qemu-2.4.  Downstream it's trickier, but should be possible since we already have blocker+ on this bug.

The qemu side fix (well, fix to assist the libvirt fix) should be pretty straightforward.  If you think that's the way to go, I'm happy to implement that tomorrow.

Comment 14 Peter Krempa 2015-09-22 14:36:45 UTC
Upstream fixes this with:

commit bd874b6c422283ff9c07ee28b042b424e85a2398
Author: Peter Krempa <pkrempa>
Date:   Mon Sep 21 18:10:55 2015 +0200

    qemu: ppc64: Align memory sizes to 256MiB blocks
    
    For some machine types ppc64 machines now require that memory sizes are
    aligned to 256MiB increments (due to the dynamically reconfigurable
    memory). As now we treat existing configs reasonably in regards to
    migration, we can round all the sizes unconditionally. The only drawback
    will be that the memory size of a VM can potentially increase by
    (256MiB - 1byte) * number_of_NUMA_nodes.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1249006

But the commit requires the migration size fixes that were done for

https://bugzilla.redhat.com/show_bug.cgi?id=1252685
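
For clarity, a minimal sketch of the rounding that the commit performs, expressed as shell arithmetic (a hypothetical helper added for illustration, not the actual libvirt code):

align_256mib() {
    # round a size in KiB up to the next 256 MiB (262144 KiB) boundary
    echo $(( ($1 + 262143) / 262144 * 262144 ))
}
align_256mib 2048000    # prints 2097152
align_256mib 1000000    # prints 1048576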

Comment 17 Peter Krempa 2015-09-24 12:07:46 UTC
As Bugzilla was down when I tried to reply, I'm attaching the e-mail reply I sent to David:

(In reply to David Gibson from comment #13)
>Peter,
>
>Ok, so the question at this point is which is less bad: 1) Do the icky machine
>type check in libvirt or 2) Rush in a last minute qemu fix to add a machine
>option which explicitly enables/disable DR memory.
>
 
Actually I probably also have a third option now:

I'm preparing a patchset for another issue related to memory alignment. That patchset fixes various aspects of memory alignment, especially one where we'd realign the memory even if the guest was live (migration), which we actually shouldn't do.

This patchset will allow us to force the 256 MiB alignment without breaking migration or existing guests from libvirt's view (it will break only if qemu does not allow such a configuration).

The only downside is that switching to the new libvirt may increase the memory sizes of VMs that are started fresh from the point we add the alignment.

>From an upstream POV, (2) should be fine - memory hotplug has only just been
>pushed to mainline, and wasn't in qemu-2.4.  Downstream it's trickier, but
>should be possible since we already have blocker+ on this bug.
>
>The qemu side fix (well, fix to assist the libvirt fix) should be pretty
>straightforward.  If you think that's the way to go, I'm happy to implement
>that tomorrow.
 
Doing option 2 would have a few tricks to it:
1) We'd need a way to introspect whether the new flag to disable DR memory can be passed.
2) The new flag to disable DR memory would need to work with older
machine types too (otherwise we wouldn't avoid the
3) (I forgot this one, but I'm certain I thought of 3 points ...)

Comment 18 David Gibson 2015-09-25 01:48:01 UTC
Peter,

Does that mean we can consider the problem basically solved now?

Should I still work on adding a machine option upstream?

Regarding your detailed points:

(1) I believe machine options are introspectable - they appear with "qemu-kvm -machine pseries,?" at least.

(2) Yes, my intention would be that the option would work with all machine type revisions

(3) um...

Comment 19 Dan Zheng 2015-09-29 10:00:22 UTC
Test with below packages:
libvirt-1.2.17-11.el7.ppc64le
qemu-kvm-rhev-2.3.0-26.el7.ppc64le
kernel-3.10.0-319.el7.ppc64le



The following tests were done:
Test 1: maxMemory and memory not aligned to 256 MB

Configure the guest with the following settings:
  <maxMemory slots='16' unit='KiB'>10000000</maxMemory>
  <memory unit='KiB'>1000000</memory>
  <cpu mode='host-model'>
    <model fallback='forbid'/>
    <numa>
      <cell id='0' cpus='0-1' memory='500000' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='500000' unit='KiB'/>
    </numa>
  </cpu>
Guest can start successfully.
# virsh dumpxml guest
...
  <maxMemory slots='16' unit='KiB'>10223616</maxMemory>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>

  <cpu mode='host-model'>
    <model fallback='forbid'/>
    <numa>
      <cell id='0' cpus='0-1' memory='524288' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='524288' unit='KiB'/>
    </numa>
  </cpu>

Check qemu command line: 
qemu     142155 1   ... -cpu host -m size=1048576k,slots=16,maxmem=10223616k 

Check within the guest.
# cat /proc/meminfo
MemTotal:         979520 kB
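
The rounded values in the dumpxml match rounding each configured size up to the next 256 MiB (262144 KiB) boundary (a verification added for illustration):

$ echo $(( (1000000 + 262143) / 262144 * 262144 ))    # <memory>
1048576
$ echo $(( (10000000 + 262143) / 262144 * 262144 ))   # <maxMemory>
10223616
$ echo $(( (500000 + 262143) / 262144 * 262144 ))     # each NUMA cell
524288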

Test 2: Migration 

On the source, the dumpxml of the running guest is the same as in Test 1.

# virsh migrate guest --live --copy-storage-all --unsafe qemu+ssh://10.19.112.39/system

On the target, the guest is running. The dumpxml is as below:
...
  <maxMemory slots='16' unit='KiB'>10223616</maxMemory>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1000000</currentMemory>
  <cpu mode='host-model'>
    <model fallback='forbid'/>
    <numa>
      <cell id='0' cpus='0-1' memory='524288' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='524288' unit='KiB'/>
    </numa>
  </cpu>

Check within the guest:
# cat /proc/meminfo
MemTotal:         979520 kB


Test 3: Memory hotplug with a size not aligned to 256 MB
memoryplug.xml:
<memory model="dimm">
  <target>
    <size unit="KiB">500000</size>
    <node>0</node>
  </target>
  <source>
    <pagesize unit="KiB">64</pagesize>
    <nodemask>0</nodemask>
  </source>
</memory>

# virsh attach-device guest memoryplug.xml
Device attached successfully
# virsh dumpxml guest

  <maxMemory slots='16' unit='KiB'>10223616</maxMemory>
  <memory unit='KiB'>1572864</memory>
  <currentMemory unit='KiB'>1000000</currentMemory>  ====> 1048576 ???
 <devices>
    <memory model='dimm'>
      <source>
        <nodemask>0</nodemask>
        <pagesize unit='KiB'>64</pagesize>
      </source>
      <target>
        <size unit='KiB'>524288</size>
        <node>0</node>
      </target>
      <alias name='dimm0'/>
      <address type='dimm' slot='0' base='0x40000000'/>
    </memory>
 </devices>

# cat /proc/meminfo
MemTotal:        1503808 kB (1503808-979520=524288)
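
The new <memory> total is consistent with the aligned base memory plus the aligned DIMM (a quick arithmetic check added for illustration):

$ echo $(( 1048576 + 524288 ))   # base + hotplugged DIMM, in KiB
1572864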

Comment 20 Dan Zheng 2015-09-30 07:27:58 UTC
Your comments on more test scenarios would be appreciated.

Comment 21 Peter Krempa 2015-10-05 07:13:37 UTC
(In reply to David Gibson from comment #18)
> Peter,
> 
> Does that mean we can consider the problem basically solved now?

Erm, I hope so :)

> 
> Should I still work on adding a machine option upstream?

I don't think it will be necessary - well, unless somebody decides that rounding the memory doesn't suit them for some reason. Otherwise we'll just always do it from now on.

Comment 22 Peter Krempa 2015-10-05 07:18:21 UTC
(In reply to Dan Zheng from comment #19)
> 

...

> # virsh attach-device guest memoryplug.xml
> Device attached successfully
> # virsh dumpxml guest
> 
>   <maxMemory slots='16' unit='KiB'>10223616</maxMemory>
>   <memory unit='KiB'>1572864</memory>
>   <currentMemory unit='KiB'>1000000</currentMemory>  ====> 1048576 ???

currentMemory is handled via the balloon driver, which does not have the alignment issue.

>  <devices>
>     <memory model='dimm'>
>       <source>
>         <nodemask>0</nodemask>
>         <pagesize unit='KiB'>64</pagesize>
>       </source>
>       <target>
>         <size unit='KiB'>524288</size>
>         <node>0</node>
>       </target>
>       <alias name='dimm0'/>
>       <address type='dimm' slot='0' base='0x40000000'/>
>     </memory>
>  </devices>
> 
> # cat /proc/meminfo
> MemTotal:        1503808 kB (1503808-979520=524288)

(In reply to Dan Zheng from comment #20)
> It is appreciated of your comments on more test scenarios.

I currently can't think of anything else to test in this regard.

Comment 23 Dan Zheng 2015-10-08 06:44:44 UTC
Based on comment 22 and comment 19, I mark it as verified.

Comment 25 errata-xmlrpc 2015-11-19 06:49:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2202.html