Summary: | VFIO: VM with attached GPU is powered off when trying to hotplug increase memory of VM.
---|---
Product: | [oVirt] ovirt-engine
Component: | BLL.Virt
Status: | CLOSED CURRENTRELEASE
Severity: | urgent
Priority: | high
Version: | 3.6.0.1
Target Milestone: | ovirt-3.6.2
Target Release: | 3.6.2
Hardware: | Unspecified
OS: | Unspecified
Reporter: | Nisim Simsolo <nsimsolo>
Assignee: | Martin Polednik <mpoledni>
QA Contact: | Nisim Simsolo <nsimsolo>
Docs Contact: |
CC: | alex.williamson, bugs, gklein, hannsj_uhl, jherrman, laine, mavital, mgoldboi, michal.skrivanek, mpoledni, nsimsolo
Flags: | rule-engine: ovirt-3.6.z+, rule-engine: blocker+, mgoldboi: planning_ack+, michal.skrivanek: devel_ack+, mavital: testing_ack+
Whiteboard: |
Fixed In Version: |
Doc Type: | Bug Fix
Doc Text: | When using Virtual Function I/O (VFIO) passthrough devices, the memory lock limit failed to be modified during a memory hot-plug operation. As a consequence, the guest virtual machine terminated unexpectedly. Now, the memory lock limit modification is performed before the memory hot-plug, and the described crash no longer occurs.
Story Points: | ---
Clone Of: |
: | 1273491
Environment: |
Last Closed: | 2016-02-18 11:06:54 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
CRM: |
Verified Versions: |
Category: | ---
oVirt Team: | Virt
RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | ---
Target Upstream Version: |
Bug Depends On: | 1273491, 1284775, 1305498
Bug Blocks: |
Attachments: |
Description
Nisim Simsolo
2015-10-18 11:44:34 UTC
Created attachment 1084120: engine.log
Created attachment 1084132: vdsm.log.1.xz
Created attachment 1084135: vdsm.log

Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

There don't seem to be any pointers in the vdsm log to what might have happened (apart from the hotplug being successful). Can you also provide us with the qemu log? Can you verify that the VM is dead by executing 'virsh -r list' and looking up the VM's name and status on the hypervisor?

- 'virsh -r list' shows that the VM is not running after the issue is reproduced.
- The qemu log shows a memory allocation failure:
2015-10-19T14:48:43.304948Z vfio_dma_map(0x7f7c0c22b7a0, 0x140000000, 0x40000000, 0x7f7a87800000) = -12 (Cannot allocate memory)
qemu: hardware error: vfio: DMA mapping failed, unable to continue
- qemu log and libvirt XML attached.

Created attachment 1084442: qemu log
Created attachment 1084443: libvirt.xml of win2012_intel VM
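
For readers who want to decode the failing qemu log line quoted above, here is a minimal Python sketch. It assumes the four arguments printed by vfio_dma_map are, in order, the container pointer, the guest I/O virtual address (iova), the mapping size, and the host virtual address; that ordering is an assumption based on the usual QEMU error format and is not stated in this report. The function name parse_vfio_dma_map is hypothetical.

```python
import errno
import re

GIB = 1024 ** 3

# The log line exactly as quoted in this bug report.
QEMU_LINE = (
    "2015-10-19T14:48:43.304948Z vfio_dma_map(0x7f7c0c22b7a0, 0x140000000, "
    "0x40000000, 0x7f7a87800000) = -12 (Cannot allocate memory)"
)

VFIO_RE = re.compile(
    r"vfio_dma_map\((0x[0-9a-f]+), (0x[0-9a-f]+), (0x[0-9a-f]+), (0x[0-9a-f]+)\)"
    r" = (-?\d+)"
)


def parse_vfio_dma_map(line):
    """Split a vfio_dma_map error line into numeric fields (hypothetical helper)."""
    match = VFIO_RE.search(line)
    if match is None:
        return None
    # ASSUMPTION: argument order is container, iova, size, vaddr.
    container, iova, size, vaddr = (int(g, 16) for g in match.groups()[:4])
    ret = int(match.group(5))
    return {
        "container": container,
        "iova": iova,
        "size": size,
        "vaddr": vaddr,
        "ret": ret,
        "errno_name": errno.errorcode.get(-ret, "unknown"),
    }


info = parse_vfio_dma_map(QEMU_LINE)
print(f"iova: {info['iova'] / GIB:g} GiB")
print(f"size: {info['size'] / GIB:g} GiB")
print(f"return code: {info['ret']} ({info['errno_name']})")
```

Under that assumption, the failed request is a 1 GiB mapping, which is consistent with a 1 GiB memory hot-add being the operation that triggered the error.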

Alex, any idea what could be the cause? This happened both with maxMemory 4 TiB (our default) and 256 GiB. The VM works fine (with drivers blacklisted correctly as far as we know) until the hotplug is triggered.

Please report dmesg for the host after this occurs.

Attaching host dmesg from before the issue and from after the issue was reproduced.

Created attachment 1084695: dmesg before memory hotplug
Created attachment 1084696: dmesg after memory hotplug

The process locked memory rlimit is set to 5G, which I believe is what libvirt uses for a 4G VM, the initial memory size of the VM. Therefore, the qemu-kvm process is not going to be able to lock more pages unless someone bumps the limit further. The evidence is in dmesg:

[ 599.043115] vfio_pin_pages: RLIMIT_MEMLOCK (5368709120) exceeded
[ 599.043119] vfio_pin_pages: RLIMIT_MEMLOCK (5368709120) exceeded

This results in the -ENOMEM failure in vfio_dma_map. On the vfio side, there is no reason this would be GPU specific; it should happen for any *assigned* device (I emphasize assigned because RHEV treats things like USB passthrough the same as PCI device assignment). Has anyone ever tested whether libvirt increases the process locked memory limits when there's an assigned device and memory is hot-added?

(In reply to Alex Williamson from comment #14)
> Has anyone ever tested
> whether libvirt increases the process locked memory limits when there's an
> assigned device and memory is hot-added?

I haven't tested it, but I can see from examining the code that the only place libvirt sets the max locked memory limit is when an assigned PCI device is hotplugged. I think this bug should be re-assigned to libvirt, but I'm not sure which product's libvirt component to assign it to.

In oVirt, testing is done on a single release by default. Therefore I'm removing the 4.0 flag. If you think this bug must be tested in 4.0 as well, please re-add the flag. Please note we might not have testing resources to handle the 4.0 clone.

Pending resolution of bug 1273491. We expect that to land after 7.2 GA, so we want to bump the required libvirt version in the vdsm spec.

The workaround for the time being is to disable memory hotplug in engine-config.

Moving to ON_QA based on https://bugzilla.redhat.com/show_bug.cgi?id=1280420#c9

Bug tickets that are moved to testing must have target release set to make sure the tester knows what to test. Please set the correct target release before moving to ON_QA.

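
The locked-memory analysis earlier in this thread can be checked with a few lines of Python. This is only a sketch: the "guest memory plus 1 GiB" rule is an assumption inferred from the comment that a 4G VM gets a 5G limit, the 1 GiB hot-add size is taken from the size field of the failing vfio_dma_map call in the qemu log, and both helper names are hypothetical.

```python
import re

GIB = 1024 ** 3

# One of the dmesg lines quoted in this bug report.
DMESG_LINE = "[ 599.043115] vfio_pin_pages: RLIMIT_MEMLOCK (5368709120) exceeded"


def parse_memlock_limit(line):
    """Return the RLIMIT_MEMLOCK value in bytes from a vfio_pin_pages message."""
    match = re.search(r"RLIMIT_MEMLOCK \((\d+)\) exceeded", line)
    return int(match.group(1)) if match else None


def assumed_required_limit(guest_bytes):
    """ASSUMPTION: the limit is sized as guest memory plus 1 GiB of headroom."""
    return guest_bytes + GIB


limit = parse_memlock_limit(DMESG_LINE)
initial_mem = 4 * GIB               # initial VM memory, per the thread
after_hotplug = initial_mem + GIB   # 1 GiB hot-add, per the qemu log

print(f"limit in effect: {limit / GIB:g} GiB")
print("matches assumed sizing for initial memory:",
      limit == assumed_required_limit(initial_mem))
print("too small after hot-add:",
      assumed_required_limit(after_hotplug) > limit)
```

If the assumption holds, the limit that was correct at boot is 1 GiB short once memory is hot-added, which is why the limit has to be raised before the hot-plug rather than after it. The limit of a running qemu-kvm process can be inspected in /proc/<pid>/limits on the host.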
Verified:
rhevm-3.6.1.2-0.1
libvirt-client-1.2.17-13.el7_2.2.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.4.x86_64
vdsm-4.17.13-1.el7ev.noarch

Scenario (verified using RHEL and Windows 8 VMs on AMD and Intel based hosts):
1. Run VM
2. Hotplug increase memory
3. Verify VM continues to run and memory increased properly.