Bug 1261812
Summary: [ppc64le] VM startup takes too long when hot-plug memory feature is enabled

Product: Red Hat Enterprise Virtualization Manager
Reporter: Michal Skrivanek <michal.skrivanek>
Component: ovirt-engine
Assignee: Martin Betak <mbetak>
Status: CLOSED CURRENTRELEASE
QA Contact: Israel Pinto <ipinto>
Severity: urgent
Docs Contact:
Priority: high
Version: 3.6.0
CC: dgibson, gklein, hannsj_uhl, ipinto, juwu, lbopf, lsurette, mavital, michal.skrivanek, rbalakri, Rhev-m-bugs, s.kieske, srevivo, ykaul
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Hardware: ppc64le
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
Due to a known QEMU issue (see BZ#1262143), ppc64le virtual machines take longer to start. To work around this issue, the default maximum virtual machine memory for ppc64le systems is set to 1TB, instead of the 4TB used on x86_64 systems. This default can be increased, but ppc64le virtual machines will then take a few minutes to start.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-04-20 01:26:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1262143, 1263039
Bug Blocks: 515840, 1201513, 1224886, 1277183, 1277184
Attachments:
Description
Michal Skrivanek
2015-09-10 08:37:05 UTC
David, please feel free to open your own qemu-kvm bug for more details. I'd like to track it as part of RHEV just in case we have to create a different code path/config for x86 vs ppc (in case it's going to be a limitation during 3.6 on the qemu side).

I've created bug 1262143 to track the qemu side of this. The Regression flag doesn't seem quite right for this bug, since memory hotplug is a new feature.

This is an AutomationBlocker at most, as for manual tests the workaround is a simple decrease of the maximum allowed memory size.

Either way, due to https://bugzilla.redhat.com/show_bug.cgi?id=1262143#c1 I propose to limit the maximum size on POWER to 1TB so as not to affect all VMs; only if someone wants/needs a >1TB VM should a configuration option be used to increase the limit (and then suffer the startup delay on all VMs).

I think the definition of AutomationBlocker is an issue that prevents automation from running, not any automation failure. Thus, removing this flag.
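For reference, the configuration option discussed above is the engine-level VM64BitMaxMemorySizeInMB setting that the test reports below exercise. A minimal sketch of how such a limit might be raised with the engine-config tool on the RHEV-M host follows; the 2TB value, the --cver scoping, and the restart command are illustrative assumptions, not values taken from this bug.

```python
# Hedged sketch: raise the per-VM maximum memory limit on the engine host.
# The key name comes from this bug; the value and surrounding commands are examples.
import subprocess

new_limit_mib = 2 * 1024 * 1024  # 2TB expressed in MiB; raising this brings back the slow startup

# engine-config stores the value; --cver scopes it to a cluster compatibility
# version (assumed to be needed for this key on a 3.6 setup).
subprocess.run(
    ["engine-config", "-s", "VM64BitMaxMemorySizeInMB=%d" % new_limit_mib, "--cver=3.6"],
    check=True,
)

# The change only takes effect after the engine service is restarted.
subprocess.run(["service", "ovirt-engine", "restart"], check=True)
```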
Created attachment 1072834 [details]
Test_with_512GB
Update on testing: I tested memory hot plug on PPC with VM64BitMaxMemorySizeInMB set to 512GB, 510GB and 256GB.

Setup:
RHEVM 3.6.0.12: Red Hat Enterprise Virtualization Manager Version 3.6.0-0.15.master.el6
VDSM: vdsm-4.17.6-1.el7ev
Libvirt: libvirt-1.2.17-8.el7

Results:
1. With 512GB and 510GB the VM failed to run: "VM golden_env_mixed_virtio_0 is down with error. Exit message: Lost connection with qemu process."
2. With 256GB: test PASSED.

Attached engine, vdsm and qemu logs.

David, is there any other limitation regarding RAM size? It seems in comment #5 it's failing to start with 512GB.

Well, there aren't supposed to be other limitations, but there's always the possibility of further bugs. It looks like the problem you're hitting is the same one reported in bug 1262143 comment 2. I'm not immediately sure why you and Qunfang both hit this, but I didn't - I'm investigating. As a temporary workaround for testing you may be able to configure a larger maxmem if you minimise the number of other devices (of any sort) in the guest - the problem appears to be that we're running out of space in the limited buffer for the guest device tree.

Michal, regarding the doc text: we have a fix in the queue that should improve the startup times - not completely, but minutes of startup time should now only start happening around 2T of maxmem. However, there's another problem that means a 1T limit is a good idea: bug 1263039 covers a crash during guest boot with certain guests and maxmem above around 256G (exactly where depends on how many CPUs and other devices are in the system). We have a fix for that, but it just increases a small limited buffer by a certain factor. 1T of maxmem and plenty of devices should be safe with the fix, but 2T of maxmem isn't. We plan to fix this better, but that will require more upstream work and won't be ready for RHEL 7.2.

Hi Michal, I have updated the doc text. Please let me know if anything needs to be changed. Kind regards, Julie

How about this?

(In reply to Michal Skrivanek from comment #11)
> how about this?

Thanks! Looks good.

Verified with RHEV-M on a ppc env:
RHEVM Version: 3.6.0-0.18.el6
vdsm version: vdsm-4.17.8-1.el7ev
libvirt version: libvirt-1.2.17-12.el7

Scenario:
1. Create VM with 1G
2. Hot plug memory 1G/2G/256M
3. Check the memory status in the VM with free
4. Migrate the VM

All cases pass.

(In reply to Israel Pinto from comment #13)
> Verified with RHEV-M on a ppc env:
> [...]
> All cases pass.

Either your wording is unclear or this test scenario does not exercise this bugfix at all. You should measure startup time for VMs with huge amounts of (hot-pluggable) RAM, not check the memory status inside the VM (which also has nothing to do with VM migration). But maybe I misread your test case?

The problem is not with the memory size, but with the setting on the engine: VM64BitMaxMemorySizeInMB. If it is set to 4T it was impossible to start the VM; we found that if you use a lower value in VM64BitMaxMemorySizeInMB the VM comes up with no problem. I added the migration step to check that the memory stays the same on a different host.
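As a footnote on the verification discussion above, the startup delay itself can be measured directly at the libvirt level. A minimal sketch, assuming libvirt-python, a qemu:///system connection, and a defined-but-inactive ppc64le guest named test_vm with a large maxMemory; the connection URI and domain name are assumptions, not taken from this bug.

```python
import time
import libvirt

# Assumed connection URI and domain name; the guest is expected to be defined
# but not running, with a large <maxMemory> to reproduce the delay.
conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("test_vm")

start = time.monotonic()
dom.create()  # equivalent to 'virsh start'; returns once libvirt has launched QEMU
elapsed = time.monotonic() - start

print("domain started, virDomainCreate took %.1f seconds" % elapsed)
conn.close()
```

With the QEMU issue tracked in bug 1262143, this elapsed time grows with the configured maxmem, which is what the 1TB default for VM64BitMaxMemorySizeInMB is meant to keep in check.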