Bug 1273491 - VM with attached VFIO device is powered off when trying to hotplug increase memory of VM.
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.2
Hardware: All Linux
Priority: high Severity: urgent
Target Milestone: rc
Target Release: 7.3
Assigned To: Peter Krempa
Nisim Simsolo
virt
: ZStream
Depends On:
Blocks: 1272742 1277186 RHEV3.6PPC_PCI_Passthrough 1280420 RHEV4.0PPC
Reported: 2015-10-20 10:38 EDT by Andrea Bolognani
Modified: 2016-11-03 14:27 EDT

See Also:
Fixed In Version: libvirt-1.3.1-1.el7
Doc Type: Bug Fix
Doc Text:
When using Virtual Function I/O (VFIO) passthrough devices, the memory lock limit failed to be modified during a memory hot-plug operation. As a consequence, the guest virtual machine terminated unexpectedly. Now, the memory lock limit modification is performed before the memory hot-plug, and the described crash no longer occurs.
Story Points: ---
Clone Of: 1272742
: 1280420
Environment:
Last Closed: 2016-11-03 14:27:33 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Andrea Bolognani 2015-10-20 10:38:04 EDT
+++ This bug was initially created as a clone of Bug #1272742 +++

Description of problem:
Notes:
- This bug is relevant only for GPU passthrough (it does not occur with other host devices attached to the VM).
- The bug is relevant only for memory hotplug (it does not occur when hot-plugging additional VM CPUs).
- The bug occurs on both Linux and Windows VMs.

When trying to increase the memory of a running VM, the VM is powered off instead of having its memory increased through the hotplug mechanism.

Version-Release number of selected component (if applicable):
rhevm-3.6.0.1-0.1.el6
sanlock-3.2.4-1.el7.x86_64
vdsm-4.17.9-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7.x86_64
libvirt-client-1.2.17-5.el7.x86_64

How reproducible:
Consistently

Steps to Reproduce:
1. Run VM with GPU attached.
2. From virtual machines tab > Edit VM > system > increase memory size.
3. click OK (without checking "apply later" checkbox).

Actual results:
VM powered off (Connection reset by peer)

Expected results:
VM memory should be increased via hotplug without powering off the VM.

Additional info:
- vmId (win7_intel): cfcdab4-415e-40d7-a19c-bdbc80a9172d'
- Issue occurred at: 2015-10-18 13:54:06,596 ERROR [org.ovirt.engine.core.vdsbroker.SetAmountOfMemoryVDSCommand] (ajp-/127.0.0.1:8702-11) [515aecb4] Failed in 'SetAmountOfMemoryVDS' method
2015-10-18 13:54:06,605 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp-/127.0.0.1:8702-11) [515aecb4] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message:
VDSM intel-vfio.tlv.redhat.com command failed: Unable to read from monitor: 
Connection reset by peer

engine.log and vdsm.log attached.

--- Additional comment from Martin Polednik on 2015-10-19 08:19:00 EDT ---

There don't seem to be any pointers to what might have happened in the vdsm log (apart from the hotplug being successful). Can you also provide us with the qemu log? Can you verify that the VM is dead by executing 'virsh -r list' and looking up the VM's name and status on the hypervisor?

--- Additional comment from Nisim Simsolo on 2015-10-19 10:52:48 EDT ---

- 'virsh -r list' shows that the VM is not running after the issue is reproduced.
- The qemu log shows that there is an issue with memory allocation:
2015-10-19T14:48:43.304948Z vfio_dma_map(0x7f7c0c22b7a0, 0x140000000, 0x40000000, 0x7f7a87800000) = -12 (Cannot allocate memory)
qemu: hardware error: vfio: DMA mapping failed, unable to continue
- qemu log and libvirt XML attached.
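
For readers less familiar with the qemu message, here is a small Python sketch that decodes the vfio_dma_map line quoted above. The argument order (container pointer, IOVA, size, host virtual address) is an assumption read off the values in the log, and `decode_vfio_dma_map` is a hypothetical helper, not part of qemu or libvirt.

```python
import errno
import re

# The failing line from the qemu log quoted above.
LOG_LINE = ("2015-10-19T14:48:43.304948Z vfio_dma_map(0x7f7c0c22b7a0, "
            "0x140000000, 0x40000000, 0x7f7a87800000) = -12 (Cannot allocate memory)")

PATTERN = re.compile(
    r"vfio_dma_map\((0x[0-9a-f]+), (0x[0-9a-f]+), (0x[0-9a-f]+), (0x[0-9a-f]+)\) = (-?\d+)"
)

GIB = 1024 ** 3


def decode_vfio_dma_map(line):
    """Hypothetical helper: turn the log line into readable numbers."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a vfio_dma_map line")
    # Assumed order: container, IOVA, size, host virtual address.
    _container, iova, size, _vaddr = (int(g, 16) for g in match.groups()[:4])
    ret = int(match.group(5))
    return {
        "iova_gib": iova / GIB,
        "size_gib": size / GIB,
        "errno_name": errno.errorcode[-ret],
    }


decoded = decode_vfio_dma_map(LOG_LINE)
print(decoded)
```

Under these assumptions the failing request is a 1 GiB mapping, and the return value -12 corresponds to ENOMEM, which matches the "Cannot allocate memory" text in the log.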

--- Additional comment from Martin Polednik on 2015-10-19 11:01:07 EDT ---

Alex, any idea what could be the cause? This happened both with maxMemory 4 TiB (our default) and 256 GiB. The VM works fine (with drivers blacklisted correctly as far as we know) until the hotplug is triggered.

--- Additional comment from Alex Williamson on 2015-10-19 19:38:37 EDT ---

Please report dmesg for the host after this occurs

--- Additional comment from Nisim Simsolo on 2015-10-20 07:38:24 EDT ---

Attaching host dmesg from before the issue and after the issue was reproduced.

--- Additional comment from Alex Williamson on 2015-10-20 08:59:16 EDT ---

The process locked memory rlimit is set to 5G, which I believe is what libvirt uses for a 4G VM, the initial memory size of the VM.  Therefore, the qemu-kvm process is not going to be able to lock more pages unless someone bumps the limit further.  The evidence is in dmesg:

[  599.043115] vfio_pin_pages: RLIMIT_MEMLOCK (5368709120) exceeded
[  599.043119] vfio_pin_pages: RLIMIT_MEMLOCK (5368709120) exceeded

This results in the -ENOMEM failure in vfio_dma_map.  On the vfio side, there is no reason this would be GPU specific; it should happen for any *assigned* device (I emphasize assigned because RHEV treats things like USB passthrough the same as PCI device assignment).  Has anyone ever tested whether libvirt increases the process locked memory limits when there's an assigned device and memory is hot-added?
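
The failure mode described above can be modelled in a few lines of Python. This is only a simplified sketch of the accounting, not the kernel's code: `would_exceed_memlock` is a hypothetical helper, and the amount assumed to be pinned before the hot-add (the 4G of guest RAM plus a made-up 16 MiB of other mappings) is purely illustrative.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def would_exceed_memlock(already_pinned, request, limit):
    """Return True if pinning `request` more bytes would go past `limit`."""
    return already_pinned + request > limit


RLIMIT_MEMLOCK = 5368709120          # value reported in dmesg (5 GiB)
pinned_before = 4 * GIB + 16 * MIB   # assumption: guest RAM plus hypothetical extras
hot_added = 0x40000000               # size of the failing vfio_dma_map request (1 GiB)

# With the limit left at its boot-time value the new DIMM cannot be pinned.
fails_with_old_limit = would_exceed_memlock(pinned_before, hot_added, RLIMIT_MEMLOCK)

# If the limit had been raised by the DIMM size first, the request would fit.
fails_with_raised_limit = would_exceed_memlock(
    pinned_before, hot_added, RLIMIT_MEMLOCK + hot_added)

print(fails_with_old_limit, fails_with_raised_limit)
```

The second call illustrates why raising the limit by the size of the added memory before the hot-add would avoid the error in this model.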
Comment 1 Laine Stump 2015-10-20 11:53:50 EDT
> Has anyone ever tested
> whether libvirt increases the process locked memory limits when there's an
> assigned device and memory is hot-added?

I haven't tested it, but I can see from examining the code that the only place libvirt sets the max locked memory limit is when an assigned PCI device is hotplugged.
Comment 2 Luyao Huang 2015-11-10 00:04:35 EST
I can reproduce this issue in x86_64 via virsh cmd:

1. set CONFIG_VFIO_PCI_VGA=y in /boot/config*:

# vim /boot/config-3.10.0-327.el7.x86_64

...
CONFIG_VFIO_PCI_VGA=y
...

2. load vfio and detach a VGA card:

#  modprobe vfio_pci

# echo 1 > /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts

# lspci
0f:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV620 GL [FirePro V3700]

# virsh nodedev-detach pci_0000_0f_00_0
Device pci_0000_0f_00_0 detached

3. prepare a guest with maxmemory and passthrough VGA card:

# virsh dumpxml rhel7.0-rhel

  <maxMemory slots='16' unit='KiB'>25600000</maxMemory>
  <memory unit='KiB'>2048896</memory>
  <currentMemory unit='KiB'>2048896</currentMemory>

    <hostdev mode='subsystem' type='pci' managed='no'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x0f' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0c' function='0x0'/>
    </hostdev>

4. start guest:

# virsh start rhel7.0-rhel
Domain rhel7.0-rhel started


5. hot-plug 1G memory device:
# virsh attach-device rhel7.0-rhel memdevice.xml 
error: Failed to attach device from memdevice.xml
error: Unable to read from monitor: Connection reset by peer

6. check the log:
# vim /var/log/libvirt/qemu/rhel7.0-rhel.log

2015-11-10T05:02:55.633874Z VFIO_MAP_DMA: -12
2015-11-10T05:02:55.633938Z vfio_dma_map(0x7fd51f42f7a0, 0x100000000, 0x40000000, 0x7fd428a00000) = -12 (Cannot allocate memory)
qemu: hardware error: vfio: DMA mapping failed, unable to continue
CPU #0:
RAX=00000000ffffffed RBX=ffffffff8193c000 RCX=0100000000000000 RDX=0000000000000000
Comment 3 Peter Krempa 2015-11-10 03:49:44 EST
commit ec90b34acf7cf7d06a63908c39e21b63382a1967
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Fri Nov 6 16:39:31 2015 +0100

    qemu: hotplug: Fix mlock limit handling on memory hotplug
    
    If mlock is required either due to use of VFIO hostdevs or due to the
    fact that it's enabled it needs to be tweaked prior to adding new memory
    or after removing a module. Add a helper to determine when it's
    necessary and reuse it both on hotplug and hotunplug.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1273491

commit fbc58cfcaeffdd4a350cf6abd67da6006f01b148
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Fri Nov 6 15:51:33 2015 +0100

    qemu: Extract logic to determine the mlock limit size for VFIO
    
    New function qemuDomainGetMlockLimitBytes will now handle the
    calculation so that it unifies the logic to one place and allows later
    reuse.

v1.2.21-28-gbaf55e1
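
The ordering described in the first commit message can be sketched in Python as follows. This is not libvirt code: `FakeDomain`, `requires_mlock`, `hotplug_memory` and `hotunplug_memory` are hypothetical names, and the "memory + 1 GiB" limit is an assumption taken from the value reported later in this bug.

```python
GIB = 1024 ** 3


class FakeDomain:
    """Stand-in for a running guest; not a libvirt object."""

    def __init__(self, memory_bytes, has_vfio_hostdev, locked=False):
        self.memory_bytes = memory_bytes
        self.has_vfio_hostdev = has_vfio_hostdev
        self.locked = locked
        self.events = []   # records the order of operations


def requires_mlock(dom):
    # mlock matters for VFIO hostdevs or when memory locking is enabled.
    return dom.has_vfio_hostdev or dom.locked


def set_limit(dom, total_bytes):
    # Assumed limit: guest memory plus 1 GiB.
    dom.events.append(("set_limit", total_bytes + GIB))


def hotplug_memory(dom, size):
    new_total = dom.memory_bytes + size
    if requires_mlock(dom):
        set_limit(dom, new_total)          # raise the limit BEFORE adding memory
    dom.events.append(("attach", size))
    dom.memory_bytes = new_total


def hotunplug_memory(dom, size):
    dom.events.append(("detach", size))
    dom.memory_bytes -= size
    if requires_mlock(dom):
        set_limit(dom, dom.memory_bytes)   # lower the limit AFTER removing the module


dom = FakeDomain(2 * GIB, has_vfio_hostdev=True)
hotplug_memory(dom, GIB)
hotunplug_memory(dom, GIB)
print(dom.events)
```

The point of the sketch is the ordering only: the limit never sits below what the guest currently needs to pin.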
Comment 4 Alex Williamson 2015-11-10 08:52:55 EST
(In reply to Luyao Huang from comment #2)
> I can reproduce this issue in x86_64 via virsh cmd:
> 
> 1. set CONFIG_VFIO_PCI_VGA=y in /boot/config*:
> 
> # vim /boot/config-3.10.0-327.el7.x86_64
> 
> ...
> CONFIG_VFIO_PCI_VGA=y
> ...
> 
> 2. load vfio and detach a VGA card:
> 
> #  modprobe vfio_pci
> 
> # echo 1 > /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts
> 
> # lspci
> 0f:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> RV620 GL [FirePro V3700]

Why are we using a completely unsupported environment to reproduce this, a recompiled kernel and assignment of an unsupported GPU?  This will fail with any assigned device, just use a NIC instead of a GPU, or at least find a supported GPU.
Comment 5 Luyao Huang 2015-11-10 20:54:36 EST
(In reply to Alex Williamson from comment #4)
> (In reply to Luyao Huang from comment #2)
> > I can reproduce this issue in x86_64 via virsh cmd:
> > 
> > 1. set CONFIG_VFIO_PCI_VGA=y in /boot/config*:
> > 
> > # vim /boot/config-3.10.0-327.el7.x86_64
> > 
> > ...
> > CONFIG_VFIO_PCI_VGA=y
> > ...
> > 
> > 2. load vfio and detach a VGA card:
> > 
> > #  modprobe vfio_pci
> > 
> > # echo 1 > /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts
> > 
> > # lspci
> > 0f:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> > RV620 GL [FirePro V3700]
> 
> Why are we using a completely unsupported environment to reproduce this, a
> recompiled kernel and assignment of an unsupported GPU?  This will fail with
> any assigned device, just use a NIC instead of a GPU, or at least find a
> supported GPU.

Hi Alex Williamson,

Thanks a lot for your comment, and sorry for my mistake. Here is the test result with a NIC:

1. 
# lsmod|grep vfio
vfio_iommu_type1       17632  0 
vfio_pci               36735  0 
vfio                   25291  2 vfio_iommu_type1,vfio_pci

2.
# lspci
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5764M Gigabit Ethernet PCIe (rev 10)
03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)

3.
# virsh nodedev-detach pci_0000_03_10_1
Device pci_0000_03_10_1 detached

4.
# virsh nodedev-dumpxml pci_0000_03_10_1
<device>
  <name>pci_0000_03_10_1</name>
  <path>/sys/devices/pci0000:00/0000:00:01.0/0000:03:10.1</path>
  <parent>pci_0000_00_01_0</parent>
  <driver>
    <name>vfio-pci</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>3</bus>
    <slot>16</slot>
    <function>1</function>
    <product id='0x10ca'>82576 Virtual Function</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
    <capability type='phys_function'>
      <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
    </capability>
    <iommuGroup number='30'>
      <address domain='0x0000' bus='0x03' slot='0x10' function='0x1'/>
    </iommuGroup>
    <pci-express>
      <link validity='cap' port='0' speed='2.5' width='4'/>
      <link validity='sta' width='0'/>
    </pci-express>
  </capability>
</device>

5.
# virsh dumpxml rhel7.0-rhel
...
  <maxMemory slots='32' unit='KiB'>25600000</maxMemory>
  <memory unit='KiB'>2048896</memory>
  <currentMemory unit='KiB'>2048896</currentMemory>

...
    <hostdev mode='subsystem' type='pci' managed='no'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x10' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0c' function='0x0'/>
    </hostdev>
...

6. hot-plug 1G memory device:

# cat memdevice.xml
    <memory model='dimm'>
      <target>
        <size unit='G'>1</size>
        <node>0</node>
      </target>
    </memory>

#  virsh attach-device rhel7.0-rhel memdevice.xml 
error: Failed to attach device from memdevice.xml
error: Unable to read from monitor: Connection reset by peer

7. check guest log:
# vim /var/log/libvirt/qemu/rhel7.0-rhel.log

2015-11-11T01:21:37.948375Z VFIO_MAP_DMA: -12
2015-11-11T01:21:37.948439Z vfio_dma_map(0x7f514b53d7a0, 0x100000000, 0x40000000, 0x7f5069800000) = -12 (Cannot allocate memory)
qemu: hardware error: vfio: DMA mapping failed, unable to continue
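
For anyone scripting the reproducer, a memory device definition like the memdevice.xml in step 6 could be generated as in the Python sketch below. `build_dimm_xml` is a hypothetical helper; the element and attribute names are copied from the XML shown in step 6.

```python
import xml.etree.ElementTree as ET


def build_dimm_xml(size, unit, node):
    """Hypothetical helper: build a <memory model='dimm'> device definition."""
    memory = ET.Element("memory", {"model": "dimm"})
    target = ET.SubElement(memory, "target")
    size_el = ET.SubElement(target, "size", {"unit": unit})
    size_el.text = str(size)
    node_el = ET.SubElement(target, "node")
    node_el.text = str(node)
    return ET.tostring(memory, encoding="unicode")


# Same values as the 1G device used in step 6.
xml_text = build_dimm_xml(1, "G", 0)
print(xml_text)
```

The resulting string can be written to a file and passed to virsh attach-device in the same way as in step 6.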



About the supported GPU, I found some information in http://vfio.blogspot.com

" But what about the graphics cards?  If you're looking for a solution supported by the graphics card vendor, you're limited to Nvidia Quadro K-series, model 2000 or better (or GRID or Tesla, but those are not terribly relevant in this context)."

Is this the current supported GPU list? If not, could you please tell me the supported GPU list or where to find it?

Thanks in advance for your reply.

Luyao
Comment 6 Alex Williamson 2015-11-10 22:15:42 EST
(In reply to Luyao Huang from comment #5) 
> About the supported GPU, i found some information in http://vfio.blogspot.com
> 
> " But what about the graphics cards?  If you're looking for a solution
> supported by the graphics card vendor, you're limited to Nvidia Quadro
> K-series, model 2000 or better (or GRID or Tesla, but those are not terribly
> relevant in this context)."
> 
> Is this the current supported GPU list ? if not, could you please tell me
> the supported GPU list or where to find it ?

It's correct for RHEL7, the only GPUs we support for assignment are NVIDIA K-series Quadro (model 2000 or higher), GRID, and Tesla, but in general, external web pages describing upstream capabilities should not be taken as the state of support for RHEL.  VFIO VGA support is specifically disabled in the RHEL kernel because it is unsupported.  We never support customers recompiling packages to change build config options.
Comment 7 Alex Williamson 2015-11-10 22:16:19 EST
(In reply to Luyao Huang from comment #5) 
> About the supported GPU, i found some information in http://vfio.blogspot.com
> 
> " But what about the graphics cards?  If you're looking for a solution
> supported by the graphics card vendor, you're limited to Nvidia Quadro
> K-series, model 2000 or better (or GRID or Tesla, but those are not terribly
> relevant in this context)."
> 
> Is this the current supported GPU list ? if not, could you please tell me
> the supported GPU list or where to find it ?

It's correct for RHEL7, the only GPUs we support for assignment are NVIDIA K-series Quadro (model 2000 or higher), GRID, and Tesla, but in general, external web pages describing upstream capabilities should not be taken as the state of support for RHEL.  VFIO VGA support is specifically disabled in the RHEL kernel because it is unsupported.  We never support customers recompiling packages to change build config options.
Comment 8 Alex Williamson 2015-11-10 22:17:16 EST
oops, sorry for the double comment.
Comment 9 Luyao Huang 2015-11-11 00:36:28 EST
(In reply to Alex Williamson from comment #7)
> (In reply to Luyao Huang from comment #5) 
> > About the supported GPU, i found some information in http://vfio.blogspot.com
> > 
> > " But what about the graphics cards?  If you're looking for a solution
> > supported by the graphics card vendor, you're limited to Nvidia Quadro
> > K-series, model 2000 or better (or GRID or Tesla, but those are not terribly
> > relevant in this context)."
> > 
> > Is this the current supported GPU list ? if not, could you please tell me
> > the supported GPU list or where to find it ?
> 
> It's correct for RHEL7, the only GPUs we support for assignment are NVIDIA
> K-series Quadro (model 2000 or higher), GRID, and Tesla, but in general,
> external web pages describing upstream capabilities should not be taken as
> the state of support for RHEL.  VFIO VGA support is specifically disabled in

Got it.

> the RHEL kernel because it is unsupported.  We never support customers
> recompiling packages to change build config options.

Oh, I see, that is why you said I used a recompiled kernel. Thanks a lot for your reply.
Comment 12 Peter Krempa 2015-11-12 02:17:26 EST
Two follow up patches:
commit 63ed05d2410bfb9179b26bc3422ea5f9b546d7b3
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Wed Nov 11 06:44:56 2015 +0100

    qemu: Explain mlock limit size more in detail
    
    Based on Alex's explanation [1] in the recent discussion let's update
    the comment explaining the memory lock limit calculation.
    
    [1]
    http://www.redhat.com/archives/libvir-list/2015-November/msg00329.html

commit e7b91c510e9831b2741469809465bb68a87c8362
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Wed Nov 11 06:49:06 2015 +0100

    qemu: domain: Restructurate control flow in qemuDomainGetMlockLimitBytes
    
    Break early when hard limit is set so that it's not intermixed by other
    logic for determining the limit.
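
The control flow described in the second commit message might look roughly like the Python sketch below. This is not the code of qemuDomainGetMlockLimitBytes: `mlock_limit_bytes` is a hypothetical function, and it assumes both that a configured hard limit is used as-is and that the default is guest memory plus 1 GiB (the figure reported later in this bug).

```python
GIB = 1024 ** 3


def mlock_limit_bytes(memory_kib, hard_limit_kib=None):
    """Hypothetical sketch of the limit calculation, result in bytes."""
    # Break early: an explicit hard limit is not mixed with any other logic.
    if hard_limit_kib is not None:
        return hard_limit_kib * 1024
    # Assumed default: guest memory plus 1 GiB of headroom.
    return memory_kib * 1024 + GIB


default_limit = mlock_limit_bytes(2097152)
capped_limit = mlock_limit_bytes(2097152, hard_limit_kib=8388608)
print(default_limit, capped_limit)
```

For a guest with 2097152 KiB of memory and no hard limit, this model gives 3221225472 bytes.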
Comment 18 Luyao Huang 2016-08-17 03:34:38 EDT
Test with qemu-kvm-rhev-2.6.0-20.el7.x86_64 and libvirt-2.0.0-5.el7.x86_64:

a) memory hotplug + hostdev vf  : PASS

1. prepare guest numa + vf:

# virsh dumpxml r7-lhuang
...
  <maxMemory slots='16' unit='KiB'>25600000</maxMemory>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <vcpu placement='static'>4</vcpu>
...
  <cpu mode='host-model'>
    <model fallback='forbid'/>
    <numa>
      <cell id='0' cpus='0-1' memory='1048576' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='1048576' unit='KiB'/>
    </numa>
  </cpu>
...
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x86' slot='0x10' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0c' function='0x0'/>
    </hostdev>
...

2. 
# virsh start r7-lhuang
Domain r7-lhuang started

3. check qemu process limit:

# prlimit -p 10014
RESOURCE   DESCRIPTION                              SOFT       HARD UNITS
AS         address space limit                 unlimited  unlimited bytes
CORE       max core file size                          0  unlimited blocks
CPU        CPU time                            unlimited  unlimited seconds
DATA       max data size                       unlimited  unlimited bytes
FSIZE      max file size                       unlimited  unlimited blocks
LOCKS      max number of file locks held       unlimited  unlimited 
MEMLOCK    max locked-in-memory address space 3221225472 3221225472 bytes
MSGQUEUE   max bytes in POSIX mqueues             819200     819200 bytes
NICE       max nice prio allowed to raise              0          0 
NOFILE     max number of open files                 1024       4096 
NPROC      max number of processes                498491     498491 
RSS        max resident set size               unlimited  unlimited pages
RTPRIO     max real-time priority                      0          0 
RTTIME     timeout for real-time tasks         unlimited  unlimited microsecs
SIGPENDING max number of pending signals          498491     498491 
STACK      max stack size                        8388608  unlimited bytes

4. hotplug a memory device:

# virsh attach-device r7-lhuang memdevice.xml
Device attached successfully

5. check process limit:

# prlimit -p 10014
RESOURCE   DESCRIPTION                              SOFT       HARD UNITS
AS         address space limit                 unlimited  unlimited bytes
CORE       max core file size                          0  unlimited blocks
CPU        CPU time                            unlimited  unlimited seconds
DATA       max data size                       unlimited  unlimited bytes
FSIZE      max file size                       unlimited  unlimited blocks
LOCKS      max number of file locks held       unlimited  unlimited 
MEMLOCK    max locked-in-memory address space 4294967296 4294967296 bytes
MSGQUEUE   max bytes in POSIX mqueues             819200     819200 bytes
NICE       max nice prio allowed to raise              0          0 
NOFILE     max number of open files                 1024       4096 
NPROC      max number of processes                498491     498491 
RSS        max resident set size               unlimited  unlimited pages
RTPRIO     max real-time priority                      0          0 
RTTIME     timeout for real-time tasks         unlimited  unlimited microsecs
SIGPENDING max number of pending signals          498491     498491 
STACK      max stack size                        8388608  unlimited bytes

6. recheck xml:

# virsh dumpxml r7-lhuang
...
  <maxMemory slots='16' unit='KiB'>25600000</maxMemory>
  <memory unit='KiB'>3145728</memory>
  <currentMemory unit='KiB'>3145728</currentMemory>
...
    <memory model='dimm'>
      <target>
        <size unit='KiB'>1048576</size>
        <node>1</node>
      </target>
      <alias name='dimm0'/>
      <address type='dimm' slot='0' base='0x100000000'/>
    </memory>
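
The MEMLOCK check done with prlimit in steps 3 and 5 can also be scripted. The Python sketch below reads the limit through the standard `resource` module (Linux only); `memlock_limits` is a hypothetical wrapper, and it queries the current process only so that the example runs without a guest. In practice one would pass the qemu PID, which requires sufficient privileges.

```python
import os
import resource


def memlock_limits(pid):
    """Return (soft, hard) RLIMIT_MEMLOCK of `pid` in bytes."""
    # With only two arguments, resource.prlimit() reads the limit
    # without changing it.
    return resource.prlimit(pid, resource.RLIMIT_MEMLOCK)


soft, hard = memlock_limits(os.getpid())
# An unlimited value is reported as resource.RLIM_INFINITY.
print(soft, hard)
```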





b) test memory hotplug + vf hot-plug  :   PASS

1. prepare guest with maxmemory + numa:

# virsh dumpxml r7-lhuang
...
  <maxMemory slots='16' unit='KiB'>25600000</maxMemory>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <vcpu placement='static'>4</vcpu>
...
  <cpu mode='host-model'>
    <model fallback='forbid'/>
    <numa>
      <cell id='0' cpus='0-1' memory='1048576' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='1048576' unit='KiB'/>
    </numa>
  </cpu>
...

2. 
# virsh start r7-lhuang
Domain r7-lhuang started


3. attach vf to guest:
# cat vf.xml 
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x86' slot='0x10' function='0x1'/>
      </source>
    </hostdev>

# virsh attach-device r7-lhuang vf.xml 
Device attached successfully

4. check process limit:

# prlimit -p 11381
RESOURCE   DESCRIPTION                              SOFT       HARD UNITS
AS         address space limit                 unlimited  unlimited bytes
CORE       max core file size                          0  unlimited blocks
CPU        CPU time                            unlimited  unlimited seconds
DATA       max data size                       unlimited  unlimited bytes
FSIZE      max file size                       unlimited  unlimited blocks
LOCKS      max number of file locks held       unlimited  unlimited 
MEMLOCK    max locked-in-memory address space 3221225472 3221225472 bytes
MSGQUEUE   max bytes in POSIX mqueues             819200     819200 bytes
NICE       max nice prio allowed to raise              0          0 
NOFILE     max number of open files                 1024       4096 
NPROC      max number of processes                498491     498491 
RSS        max resident set size               unlimited  unlimited pages
RTPRIO     max real-time priority                      0          0 
RTTIME     timeout for real-time tasks         unlimited  unlimited microsecs
SIGPENDING max number of pending signals          498491     498491 
STACK      max stack size                        8388608  unlimited bytes

5. attach memory device:

# cat memdevice10G.xml
    <memory model='dimm'>
      <target>
        <size unit='G'>10</size>
        <node>1</node>
      </target>
    </memory>


# virsh attach-device r7-lhuang memdevice10G.xml 
Device attached successfully

6. recheck process limit:

# prlimit -p 11381
RESOURCE   DESCRIPTION                               SOFT        HARD UNITS
AS         address space limit                  unlimited   unlimited bytes
CORE       max core file size                           0   unlimited blocks
CPU        CPU time                             unlimited   unlimited seconds
DATA       max data size                        unlimited   unlimited bytes
FSIZE      max file size                        unlimited   unlimited blocks
LOCKS      max number of file locks held        unlimited   unlimited 
MEMLOCK    max locked-in-memory address space 13958643712 13958643712 bytes
MSGQUEUE   max bytes in POSIX mqueues              819200      819200 bytes
NICE       max nice prio allowed to raise               0           0 
NOFILE     max number of open files                  1024        4096 
NPROC      max number of processes                 498491      498491 
RSS        max resident set size                unlimited   unlimited pages
RTPRIO     max real-time priority                       0           0 
RTTIME     timeout for real-time tasks          unlimited   unlimited microsecs
SIGPENDING max number of pending signals           498491      498491 
STACK      max stack size                         8388608   unlimited bytes
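
As a quick cross-check of tests a) and b), the Python sketch below confirms that the MEMLOCK limit grew by exactly the size of the hot-plugged DIMM in both cases. The numbers are copied from the prlimit output above; `limit_tracks_hotplug` is a hypothetical helper.

```python
GIB = 1024 ** 3

observations = [
    # (test, MEMLOCK before, MEMLOCK after, hot-plugged DIMM size)
    ("a", 3221225472, 4294967296, 1 * GIB),
    ("b", 3221225472, 13958643712, 10 * GIB),
]


def limit_tracks_hotplug(before, after, dimm_bytes):
    """True if the limit increased by exactly the DIMM size."""
    return after - before == dimm_bytes


results = {name: limit_tracks_hotplug(before, after, dimm)
           for name, before, after, dimm in observations}
print(results)
```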
Comment 19 Luyao Huang 2016-08-17 03:45:06 EDT
c) test locked memory + memory hot-plug  :  FAIL

1. 
# virsh dumpxml r7-lhuang
<domain type='kvm'>
  <name>r7-lhuang</name>
  <uuid>09143880-044e-4a38-82cc-234c274c2c19</uuid>
  <maxMemory slots='16' unit='KiB'>25600000</maxMemory>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <memoryBacking>
    <locked/>
  </memoryBacking>
...
  <cpu mode='host-model'>
    <model fallback='forbid'/>
    <numa>
      <cell id='0' cpus='0-1' memory='1048576' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='1048576' unit='KiB'/>
    </numa>
  </cpu>

2. start guest:

# virsh start r7-lhuang
Domain r7-lhuang started

3. check guest prlimit

# prlimit -p 16599
RESOURCE   DESCRIPTION                              SOFT       HARD UNITS
AS         address space limit                 unlimited  unlimited bytes
CORE       max core file size                          0  unlimited blocks
CPU        CPU time                            unlimited  unlimited seconds
DATA       max data size                       unlimited  unlimited bytes
FSIZE      max file size                       unlimited  unlimited blocks
LOCKS      max number of file locks held       unlimited  unlimited 
MEMLOCK    max locked-in-memory address space 3221225472 3221225472 bytes
MSGQUEUE   max bytes in POSIX mqueues             819200     819200 bytes
NICE       max nice prio allowed to raise              0          0 
NOFILE     max number of open files                 1024       4096 
NPROC      max number of processes                498491     498491 
RSS        max resident set size               unlimited  unlimited pages
RTPRIO     max real-time priority                      0          0 
RTTIME     timeout for real-time tasks         unlimited  unlimited microsecs
SIGPENDING max number of pending signals          498491     498491 
STACK      max stack size                        8388608  unlimited bytes

4. guest will be shut off after a moment:

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     r7-lhuang                      shut off

5. guest log:

qemu: qemu_thread_create: Resource temporarily unavailable
2016-08-17 07:39:54.455+0000: shutting down

Libvirt sets memlock to guest memory size + 1G (3221225472), but qemu still needs more memory.
Comment 20 Luyao Huang 2016-08-17 04:33:11 EDT
d) Test with memory unplug + vf     :     PASS

1. prepare a guest xml with maxmemory + numa + vf:
# virsh dumpxml r7-lhuang
...
  <maxMemory slots='16' unit='KiB'>25600000</maxMemory>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <vcpu placement='static'>4</vcpu>
...
  <cpu mode='host-model'>
    <model fallback='forbid'/>
    <numa>
      <cell id='0' cpus='0-1' memory='1048576' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='1048576' unit='KiB'/>
    </numa>
  </cpu>
...
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x86' slot='0x10' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </hostdev>
...

2. start guest:

# virsh start r7-lhuang
Domain r7-lhuang started

3. check qemu prlimit:

# prlimit -p 17697
RESOURCE   DESCRIPTION                              SOFT       HARD UNITS
AS         address space limit                 unlimited  unlimited bytes
CORE       max core file size                          0  unlimited blocks
CPU        CPU time                            unlimited  unlimited seconds
DATA       max data size                       unlimited  unlimited bytes
FSIZE      max file size                       unlimited  unlimited blocks
LOCKS      max number of file locks held       unlimited  unlimited 
MEMLOCK    max locked-in-memory address space 3221225472 3221225472 bytes
MSGQUEUE   max bytes in POSIX mqueues             819200     819200 bytes
NICE       max nice prio allowed to raise              0          0 
NOFILE     max number of open files                 1024       4096 
NPROC      max number of processes                498491     498491 
RSS        max resident set size               unlimited  unlimited pages
RTPRIO     max real-time priority                      0          0 
RTTIME     timeout for real-time tasks         unlimited  unlimited microsecs
SIGPENDING max number of pending signals          498491     498491 
STACK      max stack size                        8388608  unlimited bytes

4. 

# cat memdevice128.xml 

    <memory model='dimm'>
      <target>
        <size unit='M'>128</size>
        <node>0</node>
      </target>
    </memory>

# virsh attach-device r7-lhuang memdevice128.xml 
Device attached successfully

# prlimit -p 17697
RESOURCE   DESCRIPTION                              SOFT       HARD UNITS
AS         address space limit                 unlimited  unlimited bytes
CORE       max core file size                          0  unlimited blocks
CPU        CPU time                            unlimited  unlimited seconds
DATA       max data size                       unlimited  unlimited bytes
FSIZE      max file size                       unlimited  unlimited blocks
LOCKS      max number of file locks held       unlimited  unlimited 
MEMLOCK    max locked-in-memory address space 3355443200 3355443200 bytes
MSGQUEUE   max bytes in POSIX mqueues             819200     819200 bytes
NICE       max nice prio allowed to raise              0          0 
NOFILE     max number of open files                 1024       4096 
NPROC      max number of processes                498491     498491 
RSS        max resident set size               unlimited  unlimited pages
RTPRIO     max real-time priority                      0          0 
RTTIME     timeout for real-time tasks         unlimited  unlimited microsecs
SIGPENDING max number of pending signals          498491     498491 
STACK      max stack size                        8388608  unlimited bytes

5. hot unplug memory device:

# virsh detach-device r7-lhuang memdevice128.xml 
Device detached successfully

6. check qemu prlimit:

# prlimit -p 17697
RESOURCE   DESCRIPTION                              SOFT       HARD UNITS
AS         address space limit                 unlimited  unlimited bytes
CORE       max core file size                          0  unlimited blocks
CPU        CPU time                            unlimited  unlimited seconds
DATA       max data size                       unlimited  unlimited bytes
FSIZE      max file size                       unlimited  unlimited blocks
LOCKS      max number of file locks held       unlimited  unlimited 
MEMLOCK    max locked-in-memory address space 3221225472 3221225472 bytes
MSGQUEUE   max bytes in POSIX mqueues             819200     819200 bytes
NICE       max nice prio allowed to raise              0          0 
NOFILE     max number of open files                 1024       4096 
NPROC      max number of processes                498491     498491 
RSS        max resident set size               unlimited  unlimited pages
RTPRIO     max real-time priority                      0          0 
RTTIME     timeout for real-time tasks         unlimited  unlimited microsecs
SIGPENDING max number of pending signals          498491     498491 
STACK      max stack size                        8388608  unlimited bytes
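
The drop in MEMLOCK between the two listings above can be checked with simple arithmetic. This is only a sketch: it assumes that memdevice128.xml describes a 128 MiB DIMM, which is suggested by the file name but not shown in this part of the report.

```python
MIB = 1024 ** 2

# MEMLOCK soft limits copied from the two prlimit listings above.
memlock_with_dimm = 3355443200
memlock_after_unplug = 3221225472

# Assumption: memdevice128.xml defines a 128 MiB DIMM (contents not shown here).
assumed_dimm_bytes = 128 * MIB

delta = memlock_with_dimm - memlock_after_unplug
print(delta, delta == assumed_dimm_bytes)
```

If the assumption holds, the limit shrank by exactly the size of the unplugged device.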



e) test hugepage + vf + memory hot-plug   :   PASS

1. prepare a guest with hugepage + vf + maxmemory:

# virsh dumpxml r7-lhuang
<domain type='kvm'>
  <name>r7-lhuang</name>
  <uuid>09143880-044e-4a38-82cc-234c274c2c19</uuid>
  <maxMemory slots='16' unit='KiB'>25600000</maxMemory>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='1048576' unit='KiB' nodeset='0-1'/>
    </hugepages>
  </memoryBacking>
...
  <cpu mode='host-model'>
    <model fallback='forbid'/>
    <numa>
      <cell id='0' cpus='0-1' memory='1048576' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='1048576' unit='KiB'/>
    </numa>
  </cpu>
...
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x86' slot='0x10' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </hostdev>

2. start guest:

# virsh start r7-lhuang
Domain r7-lhuang started

3. check qemu memlock :

# prlimit -p 18271
RESOURCE   DESCRIPTION                              SOFT       HARD UNITS
AS         address space limit                 unlimited  unlimited bytes
CORE       max core file size                          0  unlimited blocks
CPU        CPU time                            unlimited  unlimited seconds
DATA       max data size                       unlimited  unlimited bytes
FSIZE      max file size                       unlimited  unlimited blocks
LOCKS      max number of file locks held       unlimited  unlimited 
MEMLOCK    max locked-in-memory address space 3221225472 3221225472 bytes
MSGQUEUE   max bytes in POSIX mqueues             819200     819200 bytes
NICE       max nice prio allowed to raise              0          0 
NOFILE     max number of open files                 1024       4096 
NPROC      max number of processes                498491     498491 
RSS        max resident set size               unlimited  unlimited pages
RTPRIO     max real-time priority                      0          0 
RTTIME     timeout for real-time tasks         unlimited  unlimited microsecs
SIGPENDING max number of pending signals          498491     498491 
STACK      max stack size                        8388608  unlimited bytes

4. attach memory device:

# cat memdevice.xml 
    <memory model='dimm'>
      <target>
        <size unit='M'>1024</size>
        <node>1</node>
      </target>
    </memory>

# virsh attach-device r7-lhuang memdevice.xml 
Device attached successfully

# prlimit -p 18271
RESOURCE   DESCRIPTION                              SOFT       HARD UNITS
AS         address space limit                 unlimited  unlimited bytes
CORE       max core file size                          0  unlimited blocks
CPU        CPU time                            unlimited  unlimited seconds
DATA       max data size                       unlimited  unlimited bytes
FSIZE      max file size                       unlimited  unlimited blocks
LOCKS      max number of file locks held       unlimited  unlimited 
MEMLOCK    max locked-in-memory address space 4294967296 4294967296 bytes
MSGQUEUE   max bytes in POSIX mqueues             819200     819200 bytes
NICE       max nice prio allowed to raise              0          0 
NOFILE     max number of open files                 1024       4096 
NPROC      max number of processes                498491     498491 
RSS        max resident set size               unlimited  unlimited pages
RTPRIO     max real-time priority                      0          0 
RTTIME     timeout for real-time tasks         unlimited  unlimited microsecs
SIGPENDING max number of pending signals          498491     498491 
STACK      max stack size                        8388608  unlimited bytes
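
The two MEMLOCK values in this test are consistent with the "guest memory + 1G" rule mentioned in comment 21. The following is a rough sketch of that arithmetic, not libvirt's actual code; the function name expected_memlock_bytes is made up for illustration.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3

def expected_memlock_bytes(guest_kib, hotplugged_mib=0):
    """Guest memory plus hot-plugged DIMMs plus a fixed 1 GiB allowance.

    Hypothetical helper: it only mirrors the rule quoted in this report.
    """
    return guest_kib * KIB + hotplugged_mib * MIB + GIB

# <memory unit='KiB'>2097152</memory> from the domain XML in step 1.
before = expected_memlock_bytes(2097152)
# After attaching the 1024 MiB DIMM from memdevice.xml in step 4.
after = expected_memlock_bytes(2097152, hotplugged_mib=1024)

print(before, after)
```

Both results match the MEMLOCK rows reported by prlimit in steps 3 and 4.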




f) test hugepage + hot plug vf + memory hot-plug and hot-unplug   :    PASS

1. prepare a guest with maxmemory+numa+hugepage:

# virsh dumpxml r7-lhuang
<domain type='kvm'>
  <name>r7-lhuang</name>
  <uuid>09143880-044e-4a38-82cc-234c274c2c19</uuid>
  <maxMemory slots='16' unit='KiB'>25600000</maxMemory>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='1048576' unit='KiB' nodeset='0-1'/>
    </hugepages>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
...
  <cpu mode='host-model'>
    <model fallback='forbid'/>
    <numa>
      <cell id='0' cpus='0-1' memory='1048576' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='1048576' unit='KiB'/>
    </numa>
  </cpu>

2. start guest:

# virsh start r7-lhuang
Domain r7-lhuang started

3. check qemu memlock:

# prlimit -p 18454
RESOURCE   DESCRIPTION                             SOFT      HARD UNITS
AS         address space limit                unlimited unlimited bytes
CORE       max core file size                         0 unlimited blocks
CPU        CPU time                           unlimited unlimited seconds
DATA       max data size                      unlimited unlimited bytes
FSIZE      max file size                      unlimited unlimited blocks
LOCKS      max number of file locks held      unlimited unlimited 
MEMLOCK    max locked-in-memory address space     65536     65536 bytes
MSGQUEUE   max bytes in POSIX mqueues            819200    819200 bytes
NICE       max nice prio allowed to raise             0         0 
NOFILE     max number of open files                1024      4096 
NPROC      max number of processes               498491    498491 
RSS        max resident set size              unlimited unlimited pages
RTPRIO     max real-time priority                     0         0 
RTTIME     timeout for real-time tasks        unlimited unlimited microsecs
SIGPENDING max number of pending signals         498491    498491 
STACK      max stack size                       8388608 unlimited bytes

4. attach vf:

# virsh attach-device r7-lhuang vf.xml 
Device attached successfully

# prlimit -p 18454
RESOURCE   DESCRIPTION                              SOFT       HARD UNITS
AS         address space limit                 unlimited  unlimited bytes
CORE       max core file size                          0  unlimited blocks
CPU        CPU time                            unlimited  unlimited seconds
DATA       max data size                       unlimited  unlimited bytes
FSIZE      max file size                       unlimited  unlimited blocks
LOCKS      max number of file locks held       unlimited  unlimited 
MEMLOCK    max locked-in-memory address space 3221225472 3221225472 bytes
MSGQUEUE   max bytes in POSIX mqueues             819200     819200 bytes
NICE       max nice prio allowed to raise              0          0 
NOFILE     max number of open files                 1024       4096 
NPROC      max number of processes                498491     498491 
RSS        max resident set size               unlimited  unlimited pages
RTPRIO     max real-time priority                      0          0 
RTTIME     timeout for real-time tasks         unlimited  unlimited microsecs
SIGPENDING max number of pending signals          498491     498491 
STACK      max stack size                        8388608  unlimited bytes

5. hot-plug memory device:

# virsh attach-device r7-lhuang memdevice.xml 
Device attached successfully

# prlimit -p 18454
RESOURCE   DESCRIPTION                              SOFT       HARD UNITS
AS         address space limit                 unlimited  unlimited bytes
CORE       max core file size                          0  unlimited blocks
CPU        CPU time                            unlimited  unlimited seconds
DATA       max data size                       unlimited  unlimited bytes
FSIZE      max file size                       unlimited  unlimited blocks
LOCKS      max number of file locks held       unlimited  unlimited 
MEMLOCK    max locked-in-memory address space 4294967296 4294967296 bytes
MSGQUEUE   max bytes in POSIX mqueues             819200     819200 bytes
NICE       max nice prio allowed to raise              0          0 
NOFILE     max number of open files                 1024       4096 
NPROC      max number of processes                498491     498491 
RSS        max resident set size               unlimited  unlimited pages
RTPRIO     max real-time priority                      0          0 
RTTIME     timeout for real-time tasks         unlimited  unlimited microsecs
SIGPENDING max number of pending signals          498491     498491 
STACK      max stack size                        8388608  unlimited bytes

6. hot-unplug memory device:

# virsh detach-device r7-lhuang memdevice.xml 
Device detached successfully

# cat memdevice.xml 
    <memory model='dimm'>
      <target>
        <size unit='M'>1024</size>
        <node>1</node>
      </target>
    </memory>

# prlimit -p 18454
RESOURCE   DESCRIPTION                              SOFT       HARD UNITS
AS         address space limit                 unlimited  unlimited bytes
CORE       max core file size                          0  unlimited blocks
CPU        CPU time                            unlimited  unlimited seconds
DATA       max data size                       unlimited  unlimited bytes
FSIZE      max file size                       unlimited  unlimited blocks
LOCKS      max number of file locks held       unlimited  unlimited 
MEMLOCK    max locked-in-memory address space 3221225472 3221225472 bytes
MSGQUEUE   max bytes in POSIX mqueues             819200     819200 bytes
NICE       max nice prio allowed to raise              0          0 
NOFILE     max number of open files                 1024       4096 
NPROC      max number of processes                498491     498491 
RSS        max resident set size               unlimited  unlimited pages
RTPRIO     max real-time priority                      0          0 
RTTIME     timeout for real-time tasks         unlimited  unlimited microsecs
SIGPENDING max number of pending signals          498491     498491 
STACK      max stack size                        8388608  unlimited bytes
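
The MEMLOCK values in steps 3 to 6 were read from the prlimit listings by eye. A small helper could extract them instead; the sketch below assumes the column layout shown above, and the name memlock_limits is hypothetical.

```python
def _to_bytes(value):
    # prlimit prints "unlimited" when no limit is set.
    return None if value == "unlimited" else int(value)

def memlock_limits(prlimit_output):
    """Return (soft, hard) MEMLOCK limits in bytes from `prlimit -p PID` text."""
    for line in prlimit_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "MEMLOCK":
            # Row layout: RESOURCE, description words, SOFT, HARD, UNITS.
            return _to_bytes(fields[-3]), _to_bytes(fields[-2])
    raise ValueError("no MEMLOCK row found")

# Shortened sample taken from the listing in step 5.
sample = """\
RESOURCE   DESCRIPTION                              SOFT       HARD UNITS
CORE       max core file size                          0  unlimited blocks
MEMLOCK    max locked-in-memory address space 4294967296 4294967296 bytes
"""

print(memlock_limits(sample))
```

Run against each listing in this test, it would give 65536, 3221225472, 4294967296 and 3221225472 for the soft limit, in that order.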
Comment 21 Luyao Huang 2016-08-17 22:18:22 EDT
Hi Peter,

I found that a guest with locked memory cannot start with qemu-kvm-rhev-2.6.0-20.el7.x86_64 and libvirt-2.0.0-5.el7.x86_64 (see comment 19). I know that libvirt sets memlock to guest memory + 1G even when there is no memtune hard limit in the guest XML.
Also, this test passed on rhel7.2.z (see bug 1280420 comment 6).
Could you please check whether this is expected, or whether qemu now uses more memory? Thanks in advance for your reply.
Comment 22 Nisim Simsolo 2016-08-18 10:48:40 EDT
Verified on both Intel and AMD architectures, with Quadro K4200 and Quadro K2200 respectively, on Windows and Linux VMs.
In addition to the attached GPUs, I also attached a PCIe NIC, USB devices and a SCSI controller to the VM before the memory hotplug action.

Verification builds:
Red Hat Enterprise Linux Server release 7.3 Beta (Maipo)
kernel-3.10.0-492.el7.x86_64
vdsm-4.18.11-1.el7ev.x86_64
qemu-kvm-rhev-2.6.0-21.el7.x86_64
libvirt-client-2.0.0-5.el7.x86_64
sanlock-3.4.0-1.el7.x86_64

ovirt-engine-4.0.2.6-0.1.el7ev.noarch
Comment 23 Peter Krempa 2016-09-08 09:20:57 EDT
(In reply to Luyao Huang from comment #21)
> Hi Peter,
> 
> I found that guest with locked memory cannot start with
> qemu-kvm-rhev-2.6.0-20.el7.x86_64 and libvirt-2.0.0-5.el7.x86_64 ( check
> comment 19 ). And i know libvirt will set memlock to guest memory + 1G even
> there is not memtune hard limit in guest xml.

The memlock case is different from the problem with assigned devices. For memory locking it's really necessary to provide the memory limit via the memtune since locking memory includes all the buffers and other overhead of qemu that is locked into memory along with the guest memory.

For device assignment only the guest memory plus some additional space needs to be locked in, not everything including disk buffers and stuff.

> Also this test was passed on rhel7.2.z (you can check bug 1280420 comment 6).
> Could you please help to check if this is expected ? or qemu will eat more
> memory right now ? Thanks in advance for your reply.

Yes, that may happen, since the overhead of qemu is unknown and can't really be calculated. Also, that test case is not relevant in any way to the original BZ. For memory locking you really should provide the memory hard limit.
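
As an illustration of the advice above, the sketch below builds the two domain XML fragments involved: a memtune hard limit and locked memory backing. The element names are libvirt's; the helper name and the 4194304 KiB value are made up for the example, and a real limit has to cover guest memory plus qemu overhead, which cannot be calculated exactly.

```python
import xml.etree.ElementTree as ET

def locked_memory_fragment(hard_limit_kib):
    """Build <memtune> and <memoryBacking> fragments for a locked-memory guest.

    Hypothetical helper; the caller has to choose a suitable limit.
    """
    memtune = ET.Element("memtune")
    hard_limit = ET.SubElement(memtune, "hard_limit", unit="KiB")
    hard_limit.text = str(hard_limit_kib)

    backing = ET.Element("memoryBacking")
    ET.SubElement(backing, "locked")

    return "\n".join(
        ET.tostring(element, encoding="unicode") for element in (memtune, backing)
    )

# Illustrative value only: 4 GiB for a guest with 2 GiB of memory.
print(locked_memory_fragment(4194304))
```

The fragments would go inside the domain element of the guest XML, for example via virsh edit.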
Comment 24 Luyao Huang 2016-09-08 21:32:51 EDT
(In reply to Peter Krempa from comment #23)
> (In reply to Luyao Huang from comment #21)
> > Hi Peter,
> > 
> > I found that guest with locked memory cannot start with
> > qemu-kvm-rhev-2.6.0-20.el7.x86_64 and libvirt-2.0.0-5.el7.x86_64 ( check
> > comment 19 ). And i know libvirt will set memlock to guest memory + 1G even
> > there is not memtune hard limit in guest xml.
> 
> The memlock case is different from the problem with assigned devices. For
> memory locking it's really necessary to provide the memory limit via the
> memtune since locking memory includes all the buffers and other overhead of
> qemu that is locked into memory along with the guest memory.
> 
> For device assignment only the guest memory plus some additional space needs
> to be locked in, not everything including disk buffers and stuff.
> 
> > Also this test was passed on rhel7.2.z (you can check bug 1280420 comment 6).
> > Could you please help to check if this is expected ? or qemu will eat more
> > memory right now ? Thanks in advance for your reply.
> 
> Yes that may happen since the overhead of qemu is unknown and can't be
> really calculated. Also that test case is not relevant in any way to the
> original BZ. For memory locking you really should provide the memory hard
> limit.

Got it, thanks for your clear explanation.
Comment 26 errata-xmlrpc 2016-11-03 14:27:33 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html
