Bug 1261812
Summary: [ppc64le] VM startup takes too long when hot-plug memory feature is enabled

Product: Red Hat Enterprise Virtualization Manager
Reporter: Michal Skrivanek <michal.skrivanek>
Component: ovirt-engine
Assignee: Martin Betak <mbetak>
Status: CLOSED CURRENTRELEASE
QA Contact: Israel Pinto <ipinto>
Severity: urgent
Docs Contact:
Priority: high
Version: 3.6.0
CC: dgibson, gklein, hannsj_uhl, ipinto, juwu, lbopf, lsurette, mavital, michal.skrivanek, rbalakri, Rhev-m-bugs, s.kieske, srevivo, ykaul
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Hardware: ppc64le
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
Due to a known QEMU issue (see BZ#1262143), ppc64le virtual machines take longer to start. To work around this issue, the default maximum virtual machine memory for ppc64le systems is set to 1TB, instead of the 4TB used on x86_64 systems. This default can be increased, but ppc64le virtual machines will then take a few minutes to start.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-04-20 01:26:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1262143, 1263039
Bug Blocks: 515840, 1201513, 1224886, 1277183, 1277184
Attachments:
Description
Michal Skrivanek
2015-09-10 08:37:05 UTC
David, please feel free to open your own qemu-kvm bug for more details. I'd like to track it as part of RHEV just in case we have to create a different code path/config for x86 vs ppc (in case it's going to be a limitation during 3.6 on the qemu side).

I've created bug 1262143 to track the qemu side of this. The Regression flag doesn't seem quite right for this bug, since memory hotplug is a new feature.

This is an AutomationBlocker at most, as for manual tests the workaround is a simple decrease of the maximum allowed memory size.

Either way, due to https://bugzilla.redhat.com/show_bug.cgi?id=1262143#c1 I propose to limit the maximum size on POWER to 1TB so as not to affect all VMs; only if someone wants/needs a >1TB VM should a configuration option be used to increase the limit (and then suffer the startup delay on all VMs).

I think the definition of AutomationBlocker is an issue that prevents automation from running, not any automation failure. Thus, removing this flag.
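For reference, the configuration option discussed above is the engine-level VM64BitMaxMemorySizeInMB setting that the test reports below exercise. A minimal sketch of how such a limit might be raised with the engine-config tool on the RHEV-M host follows; the 2TB value, the --cver scoping, and the restart command are illustrative assumptions, not values taken from this bug.

```python
# Hedged sketch: raise the per-VM maximum memory limit on the engine host.
# The key name comes from this bug; the value and surrounding commands are examples.
import subprocess

new_limit_mib = 2 * 1024 * 1024  # 2TB expressed in MiB; raising this brings back the slow startup

# engine-config stores the value; --cver scopes it to a cluster compatibility
# version (assumed to be needed for this key on a 3.6 setup).
subprocess.run(
    ["engine-config", "-s", "VM64BitMaxMemorySizeInMB=%d" % new_limit_mib, "--cver=3.6"],
    check=True,
)

# The change only takes effect after the engine service is restarted.
subprocess.run(["service", "ovirt-engine", "restart"], check=True)
```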
Created attachment 1072834 [details]
Test_with_512GB
Update on testing: I tested memory hot plug on PPC with VM64BitMaxMemorySizeInMB set to 512GB, 510GB and 256GB.

Setup:
RHEVM 3.6.0.12: Red Hat Enterprise Virtualization Manager Version 3.6.0-0.15.master.el6
VDSM: vdsm-4.17.6-1.el7ev
Libvirt: libvirt-1.2.17-8.el7

Results:
1. With 512GB and 510GB the VM failed to run: "VM golden_env_mixed_virtio_0 is down with error. Exit message: Lost connection with qemu process."
2. With 256GB: test PASSED.

Attached engine, vdsm and qemu logs.

David, is there any other limitation regarding RAM size? It seems in comment #5 it's failing to start with 512GB.

Well, there aren't supposed to be other limitations, but there's always the possibility of further bugs. It looks like the problem you're hitting is the same one reported in bug 1262143 comment 2. I'm not immediately sure why you and Qunfang both hit this, but I didn't - I'm investigating. As a temporary workaround for testing you may be able to configure a larger maxmem if you minimise the number of other devices (of any sort) in the guest - the problem appears to be that we're running out of space in the limited buffer for the guest device tree.

Michal, regarding the doc text: we have a fix in the queue that should improve the startup times - not completely, but minutes of startup time should now only start happening around 2T of maxmem. However, there's another problem that means a 1T limit is a good idea: bug 1263039 covers a crash during guest boot with certain guests and maxmem above around 256G (exactly where depends on how many CPUs and other devices are in the system). We have a fix for that, but it just increases a small limited buffer by a certain factor. 1T of maxmem and plenty of devices should be safe with the fix, but 2T of maxmem isn't. We plan to fix this better, but that will require more upstream work and won't be ready for RHEL 7.2.

Hi Michal, I have updated the doc text. Please let me know if anything needs to be changed. Kind regards, Julie

How about this?

(In reply to Michal Skrivanek from comment #11)
> how about this?

Thanks! Looks good.

Verified with RHEV-M on a ppc env:
RHEVM Version: 3.6.0-0.18.el6
vdsm version: vdsm-4.17.8-1.el7ev
libvirt version: libvirt-1.2.17-12.el7

Scenario:
1. Create VM with 1G
2. Hot plug memory 1G/2G/256M
3. Check the memory status in the VM with free
4. Migrate the VM

All cases pass.

(In reply to Israel Pinto from comment #13)
> Verified with RHEV-M on a ppc env:
> [...]
> All cases pass.

Either your wording is unclear or this test scenario does not exercise this bugfix at all. You should measure startup time for VMs with huge amounts of (hot-pluggable) RAM, not check the memory status inside the VM (which also has nothing to do with VM migration). But maybe I misread your test case?

The problem is not with the memory size, but with the setting on the engine: VM64BitMaxMemorySizeInMB. If it is set to 4T it was impossible to start the VM; we found that if you use a lower value in VM64BitMaxMemorySizeInMB the VM comes up with no problem. I added the migration step to check that the memory stays the same on a different host.
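As a footnote on the verification discussion above, the startup delay itself can be measured directly at the libvirt level. A minimal sketch, assuming libvirt-python, a qemu:///system connection, and a defined-but-inactive ppc64le guest named test_vm with a large maxMemory; the connection URI and domain name are assumptions, not taken from this bug.

```python
import time
import libvirt

# Assumed connection URI and domain name; the guest is expected to be defined
# but not running, with a large <maxMemory> to reproduce the delay.
conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("test_vm")

start = time.monotonic()
dom.create()  # equivalent to 'virsh start'; returns once libvirt has launched QEMU
elapsed = time.monotonic() - start

print("domain started, virDomainCreate took %.1f seconds" % elapsed)
conn.close()
```

With the QEMU issue tracked in bug 1262143, this elapsed time grows with the configured maxmem, which is what the 1TB default for VM64BitMaxMemorySizeInMB is meant to keep in check.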