Bug 1261812 - [ppc64le] VM startup takes too long when hot-plug memory feature is enabled
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.6.0
Hardware: ppc64le
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Assigned To: Martin Betak
QA Contact: Israel Pinto
Depends On: 1262143 1263039
Blocks: 515840 RHEV3.6PPC 1224886 1277183 1277184
Reported: 2015-09-10 04:37 EDT by Michal Skrivanek
Modified: 2016-04-19 21:26 EDT

See Also:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
Due to a known QEMU issue (see BZ#1262143), ppc64le virtual machines with a large maximum memory setting take longer to start. To work around this issue, the default virtual machine maximum memory setting for ppc64le systems is 1TB, instead of the 4TB used on x86_64 systems. This default can be increased, but a ppc64le virtual machine then takes a few minutes to start.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-04-19 21:26:59 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Test_with_512GB (926.09 KB, application/zip)
2015-09-13 04:55 EDT, Israel Pinto


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
oVirt gerrit | 46099 | master | MERGED | core: Decrease max memory size for ppc64 to 1TB | Never
oVirt gerrit | 46213 | ovirt-engine-3.6 | MERGED | core: Decrease max memory size for ppc64 to 1TB | Never

Description Michal Skrivanek 2015-09-10 04:37:05 EDT
Due to a problem in qemu, a VM with a maximum memory size above roughly 256GB takes a noticeable time to start. The startup time grows worse than O(n^2) with the maximum memory size; at 1TB it already takes minutes.

RHEV uses a default maximum of 4TB when memory hot-plug is enabled, so VM startup takes extremely long. In the RHEV GUI this shows up as a VM stuck in the "WaitForLaunch" state indefinitely.
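
To put the scaling in perspective, here is a rough sketch. It assumes the startup time grows at least quadratically with the maximum memory size, as described above, and computes a lower bound on the slowdown relative to a 256GB baseline. The function name, the baseline and the exponent are illustrative assumptions, not measurements.

```python
def min_slowdown(maxmem_gb, baseline_gb=256, exponent=2):
    """Lower bound on the startup-time ratio versus the baseline,
    assuming time grows at least as maxmem ** exponent (an assumption)."""
    return (maxmem_gb / baseline_gb) ** exponent


if __name__ == "__main__":
    for size_gb in (256, 512, 1024, 4096):
        ratio = min_slowdown(size_gb)
        print(f"{size_gb:>5} GB maxmem: at least {ratio:.0f}x the 256 GB startup time")
```

Under this assumption the 4TB default costs at least 256 times the startup time of a 256GB maximum, which is consistent with a VM that appears to never leave "WaitForLaunch".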
Comment 1 Michal Skrivanek 2015-09-10 04:39:13 EDT
David, please feel free to open your own qemu-kvm bug with more details.
I'd like to track it as part of RHEV in case we have to create a different code path/config for x86 vs ppc (in case it remains a limitation on the qemu side during 3.6).
Comment 2 David Gibson 2015-09-10 20:08:35 EDT
I've created bug 1262143 to track the qemu side of this.

The Regression flag doesn't seem quite right for this bug, since memory hotplug is a new feature.
Comment 3 Michal Skrivanek 2015-09-11 10:53:35 EDT
This is an AutomationBlocker at most; for manual tests the workaround is simply to decrease the maximum allowed memory size.

Either way, due to https://bugzilla.redhat.com/show_bug.cgi?id=1262143#c1 I propose limiting the max size on POWER to 1TB so that not all VMs are affected. Only if someone wants or needs a >1TB VM should a configuration option be used to increase the limit (and then suffer the startup delay on all VMs).
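
The proposal amounts to a per-architecture default with an optional override. A minimal sketch of that idea follows; the dictionary, the function name and the `override_mb` parameter are hypothetical and do not reflect the actual ovirt-engine code.

```python
MB_PER_TB = 1024 * 1024

# Hypothetical defaults mirroring the proposal: 1TB on POWER, 4TB on x86_64.
DEFAULT_MAX_MEMORY_MB = {
    "ppc64le": 1 * MB_PER_TB,
    "x86_64": 4 * MB_PER_TB,
}


def max_memory_mb(arch, override_mb=None):
    """Return the maximum memory limit (in MB) for an architecture.

    override_mb stands in for an admin-set configuration option that
    raises the limit, at the cost of slower startup for all VMs.
    """
    if override_mb is not None:
        return override_mb
    return DEFAULT_MAX_MEMORY_MB[arch]


if __name__ == "__main__":
    print(max_memory_mb("ppc64le"))
    print(max_memory_mb("x86_64"))
    print(max_memory_mb("ppc64le", override_mb=2 * MB_PER_TB))
```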
Comment 4 Yaniv Kaul 2015-09-13 02:50:25 EDT
I think the definition for AutomationBlocker is an issue that prevents automation from running, not any automation failure. Thus, removing this flag.
Comment 5 Israel Pinto 2015-09-13 04:55:18 EDT
Created attachment 1072834
Test_with_512GB
Comment 6 Israel Pinto 2015-09-13 04:57:42 EDT
Update testing:
I tested memory hot plug on PPC with VM64BitMaxMemorySizeInMB set to 512GB, 510GB and 256GB.
Setup:
RHEVM 3.6.0.12 :
Red Hat Enterprise Virtualization Manager Version: 3.6.0-0.15.master.el6
VDSM:  vdsm-4.17.6-1.el7ev
Libvirt: libvirt-1.2.17-8.el7
Results:
1. With 512GB and 510GB: The VM failed to run:
"VM golden_env_mixed_virtio_0 is down with error. Exit message: Lost connection with qemu process."
2. With 256GB: test PASSED


Attached engine, vdsm and qemu logs.
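
Since VM64BitMaxMemorySizeInMB is expressed in MB, the sizes tested above have to be converted before they are set. The helper below is an illustrative sketch (the function name and the accepted units are assumptions) that turns strings such as "512GB" into the MB value.

```python
import re

UNITS_IN_MB = {"MB": 1, "GB": 1024, "TB": 1024 * 1024}


def to_mb(size):
    """Convert a size string such as '512GB' or '1TB' to whole megabytes."""
    match = re.fullmatch(r"\s*(\d+)\s*(MB|GB|TB)\s*", size, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised size: {size!r}")
    number, unit = match.groups()
    return int(number) * UNITS_IN_MB[unit.upper()]


if __name__ == "__main__":
    for size in ("512GB", "510GB", "256GB"):
        print(f"VM64BitMaxMemorySizeInMB={to_mb(size)}  # {size}")
```

On the engine host the resulting value would typically be applied with `engine-config -s VM64BitMaxMemorySizeInMB=<value>` followed by an engine restart; the exact options may differ between versions, so check `engine-config -h` first.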
Comment 7 Michal Skrivanek 2015-09-14 08:46:02 EDT
David, is there any other limitation regarding RAM size? It seems that in comment #5 the VM fails to start with 512GB.
Comment 8 David Gibson 2015-09-14 21:16:36 EDT
Well, there aren't supposed to be other limitations but there's always the possibility of further bugs.

It looks like the problem you're hitting is the same one reported in bug 1262143 comment 2.  I'm not immediately sure why you and Qunfang both hit this, but I didn't - I'm investigating.

As a temporary workaround for testing you may be able to configure a larger maxmem if you minimise the number of other devices (of any sort) in the guest - the problem appears to be that we're running out of space in the limited buffer for the guest device tree.
Comment 9 David Gibson 2015-09-16 19:48:41 EDT
Michal, regarding the doc text.  We have a fix in the queue that should fix the startup times - not completely, but now minutes of startup time should only start happening around 2T of maxmem.  However there's another problem that means a 1T limit is a good idea: bug 1263039 covers a crash during guest boot with certain guests and maxmem above around 256G (exactly where depends on how many cpus and other devices are in the system).  We have a fix for that, but it just increases a small limited buffer by a certain factor.  1T of maxmem and plenty of devices should be safe with the fix, but 2T of maxmem isn't.

We plan to fix this better, but that will require more upstream work and won't be ready for RHEL 7.2.
Comment 10 Julie 2015-09-20 23:58:16 EDT
Hi Michal,
   I have updated the doc text. Please let me know if anything needs to be changed.

Kind regards,
Julie
Comment 11 Michal Skrivanek 2015-09-21 04:34:38 EDT
how about this?
Comment 12 Julie 2015-09-21 19:52:47 EDT
(In reply to Michal Skrivanek from comment #11)
> how about this?

Thanks! Looks good.
Comment 13 Israel Pinto 2015-10-12 10:30:36 EDT
Verify with rhevm on ppc env:
RHEVM Version: 3.6.0-0.18.el6
vdsm version: vdsm-4.17.8-1.el7ev
libvirt version: libvirt-1.2.17-12.el7
Scenario:
1. Create VM with 1G
2. hot plug memory 1G/2G/256M
3. Check in VM memory status with free
4. Migrate VM

All cases pass.
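
Step 3 of this scenario can be automated roughly as follows. This is a sketch only: it parses the text printed by `free -m` inside the guest, and the function names and the tolerance are assumptions made for illustration.

```python
def total_mb_from_free(free_output):
    """Extract the total memory (in MB) from the text printed by `free -m`."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            return int(fields[1])
    raise ValueError("no 'Mem:' line found in free output")


def hotplug_visible(before_mb, after_mb, plugged_mb, tolerance_mb=64):
    """True if the guest's total memory grew by about plugged_mb.

    tolerance_mb is an arbitrary allowance for kernel bookkeeping overhead.
    """
    return abs((after_mb - before_mb) - plugged_mb) <= tolerance_mb
```

A test would capture `free -m` before and after each hot plug (1G, 2G, 256M) and again after migration, then compare the totals with `hotplug_visible`.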
Comment 14 Sven Kieske 2015-10-14 05:36:51 EDT
(In reply to Israel Pinto from comment #13)
> Verify with rhevm on ppc env:
> RHEVM Version: 3.6.0-0.18.el6
> vdsm version: vdsm-4.17.8-1.el7ev
> libvirt version: libvirt-1.2.17-12.el7
> Scenario:
> 1. Create VM with 1G
> 2. hot plug memory 1G/2G/256M
> 3. Check in VM memory status with free
> 4. Migrate VM
> 
> All cases pass.

Either your wording is unclear or this test scenario does not cover this bugfix at all.

You should measure startup time for VMs with huge amounts of (hot-pluggable) RAM, not check the memory status inside the VM (this also has nothing to do with VM migration).

But maybe I misread your test case?
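
A startup-time measurement of the kind suggested here could be sketched as below. The `start_vm` and `is_up` callables are hypothetical placeholders for whatever the test framework uses to launch a VM and poll its state; nothing here is an actual RHEV or oVirt API.

```python
import time


def time_until_up(start_vm, is_up, timeout_s=600.0, poll_s=1.0):
    """Start a VM and return the seconds elapsed until is_up() is true.

    start_vm and is_up are caller-supplied callables (hypothetical hooks).
    Raises TimeoutError if the VM is not up within timeout_s seconds.
    """
    started = time.monotonic()
    start_vm()
    while not is_up():
        if time.monotonic() - started > timeout_s:
            raise TimeoutError(f"VM not up after {timeout_s} seconds")
        time.sleep(poll_s)
    return time.monotonic() - started
```

Running this once per VM64BitMaxMemorySizeInMB value (for example 256GB, 512GB, 1TB) would show directly how startup time changes with the configured maximum.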
Comment 15 Israel Pinto 2015-10-14 09:23:43 EDT
The problem is not with the VM's memory size but with the maximum size configured on the engine: VM64BitMaxMemorySizeInMB.
With the size set to 4T it was impossible to start the VM; we found that with a smaller value in VM64BitMaxMemorySizeInMB the VM comes up with no problem.
I also added migration to check that the memory stays the same on a different host.
