Bug 1153590 - Improve error message on huge page preallocation
Summary: Improve error message on huge page preallocation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Luiz Capitulino
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-10-16 09:25 UTC by Michal Privoznik
Modified: 2015-03-05 09:56 UTC
CC: 5 users

Fixed In Version: qemu-kvm-rhev-2.1.2-7.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-03-05 09:56:45 UTC




Links:
System ID: Red Hat Product Errata RHSA-2015:0624
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: qemu-kvm-rhev security, bug fix, and enhancement update
Last Updated: 2015-03-05 14:37:36 UTC

Description Michal Privoznik 2014-10-16 09:25:13 UTC
Description of problem:
When there are not enough huge pages in the system and prealloc was requested on the command line, qemu fails with the not-so-helpful message shown under Actual results below.


Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.1.2-3.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Allocate huge pages pool
2. Run qemu with huge page memory backing that requires more pages than are available in the pool
3. Observe error

Actual results:
> # virsh start migt10
> error: Failed to start domain migt10
> error: internal error: process exited while connecting to monitor: os_mem_prealloc: failed to preallocate pages


Expected results:
An error message along the lines of "not enough pages in the pool".

Additional info:
This can be reproduced via libvirt too. Just define a domain like this:

<domain type='kvm' id='2'>
  <name>migt10</name>
  <uuid>9dd81882-178b-4df0-8dae-ab864a079a4f</uuid>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB' nodeset='0-3'/>
    </hugepages>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <numatune>
    <memory mode='strict' nodeset='0-3'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
    <memnode cellid='1' mode='strict' nodeset='1'/>
    <memnode cellid='2' mode='strict' nodeset='2'/>
    <memnode cellid='3' mode='strict' nodeset='3'/>
  </numatune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.1.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
  </features>
  <cpu>
    <numa>
      <cell id='0' cpus='0' memory='262144'/>
      <cell id='1' cpus='1' memory='262144'/>
      <cell id='2' cpus='2' memory='262144'/>
      <cell id='3' cpus='3' memory='262144'/>
    </numa>
  </cpu>
  ...
</domain>

Then allocate, say, 128 2M huge pages:

# echo 128 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

and try to run the domain:

# virsh start migt10
error: Failed to start domain migt10
error: internal error: process exited while connecting to monitor: os_mem_prealloc: failed to preallocate pages


Interestingly, if I set the pool size to 127 I see a completely different error (which might serve as an example of a good error message):

# virsh start migt10
error: Failed to start domain migt10
error: internal error: process exited while connecting to monitor: 2014-10-16T09:23:53.652848Z qemu-kvm: -object memory-backend-file,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=256M,id=ram-node0,host-nodes=0,policy=bind: unable to map backing store for hugepages: Cannot allocate memory
2014-10-16T09:23:53.653500Z qemu-kvm: -object memory-backend-file,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=256M,id=ram-node1,host-nodes=1,policy=bind: unable to map backing store for hugepages: Cannot allocate memory
2014-10-16T09:23:53.653664Z qemu-kvm: -object memory-backend-file,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=256M,id=ram-node2,host-nodes=2,policy=bind: unable to map backing store for hugepages: Cannot allocate memory
2014-10-16T09:23:53.653823Z qemu-kvm: -object memory-backend-file,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=256M,id=ram-node3,host-nodes=3,policy=bind: unable to map backing store for hugepages: Cannot allocate memory
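
For context on why the two failures produce such different messages (a hedged reading based on the error strings above, not on the exact RHEL sources): the "unable to map backing store for hugepages" error is reported when the mmap() of the hugetlbfs backing file fails up front, whereas the terse os_mem_prealloc message comes from a later stage where, because prealloc=yes, qemu touches every page of an already mapped region and aborts on SIGBUS once the pool runs dry. A minimal C sketch of that second stage (simplified, with assumed structure and a hypothetical helper name, not the literal qemu code):

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static sigjmp_buf sigjump;

static void sigbus_handler(int signum)
{
    (void)signum;
    siglongjmp(sigjump, 1);   /* a page touch raised SIGBUS: the huge page pool is exhausted */
}

/* Hypothetical helper mirroring the shape of qemu's os_mem_prealloc(). */
static void prealloc_sketch(char *area, size_t memory, size_t hpagesize)
{
    struct sigaction act = { .sa_handler = sigbus_handler };
    sigaction(SIGBUS, &act, NULL);

    if (sigsetjmp(sigjump, 1)) {
        /* The message this bug complains about: it never says that the
           huge page pool is simply too small. */
        fprintf(stderr, "os_mem_prealloc: failed to preallocate pages\n");
        exit(1);
    }

    /* Touch one byte in every page so the kernel must back it right now. */
    for (size_t i = 0; i < (memory + hpagesize - 1) / hpagesize; i++) {
        memset(area + i * hpagesize, 0, 1);
    }
}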

Comment 2 Michal Privoznik 2014-10-16 13:06:30 UTC
An even better error message could be "Insufficient free host memory pages available to allocate guest RAM".

Comment 3 Michal Privoznik 2014-10-16 13:35:43 UTC
Patch proposed upstream:

https://lists.gnu.org/archive/html/qemu-devel/2014-10/msg01778.html
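
Judging purely from the before and after strings captured in this bug (the description vs. the verified output in comment 8), and hedging because the actual diff lives behind the link above, the change amounts to replacing the string printed on that SIGBUS/abort path of os_mem_prealloc() (file location util/oslib-posix.c assumed):

/* old message, as reported in the description */
fprintf(stderr, "os_mem_prealloc: failed to preallocate pages\n");

/* new message, as observed in comment 8 */
fprintf(stderr, "os_mem_prealloc: Insufficient free host memory "
                "pages available to allocate guest RAM\n");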

Comment 4 Luiz Capitulino 2014-10-21 16:01:56 UTC
Michal, I'll review your patch upstream shortly. If you would like to backport it to RHEL yourself, just re-assign this BZ to yourself.

Comment 5 Miroslav Rezanina 2014-11-06 18:33:11 UTC
Fix included in qemu-kvm-rhev-2.1.2-7.el7

Comment 7 huiqingding 2014-11-17 06:45:59 UTC
Reproduced this issue using the following versions:
kernel-3.10.0-203.el7.x86_64
qemu-kvm-rhev-2.1.2-7.el7.x86_64

Steps to Reproduce:
1. The host supports 1G huge pages:
# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-203.el7.x86_64 root=/dev/mapper/rhel_intel--brickland--0100-root ro console=ttyS0,115200n81 crashkernel=auto rd.lvm.lv=rhel_intel-brickland-0100/root rd.lvm.lv=rhel_intel-brickland-0100/swap systemd.debug LANG=en_US.UTF-8 intel_iommu=on hugepagesz=1G default_hugepagesz=1G

2. Assign one huge page to each NUMA node:
# echo 1 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages 
# echo 1 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages 
# echo 1 > /sys/devices/system/node/node2/hugepages/hugepages-1048576kB/nr_hugepages 
# echo 1 > /sys/devices/system/node/node3/hugepages/hugepages-1048576kB/nr_hugepages 

3. Start a VM with 4 NUMA nodes and 2G of memory for each NUMA node:
# virsh start vm1

PS: the XML file is as follows:
<domain type='kvm'>
  <name>vm1</name>
  <uuid>3e89cb31-73e0-4e78-8616-513127d3a14f</uuid>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='1048576' unit='KiB' nodeset='0-3'/>
    </hugepages>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <numatune>
    <memory mode='strict' nodeset='0-3'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
    <memnode cellid='1' mode='strict' nodeset='1'/>
    <memnode cellid='2' mode='strict' nodeset='2'/>
    <memnode cellid='3' mode='strict' nodeset='3'/>
  </numatune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.0.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <cpu>
    <numa>
      <cell id='0' cpus='0' memory='2097152'/>
      <cell id='1' cpus='1' memory='2097152'/>
      <cell id='2' cpus='2' memory='2097152'/>
      <cell id='3' cpus='3' memory='2097152'/>
    </numa>
  </cpu>
... ...
</domain>

Results:
After step 3, the following error is output:
# virsh start vm1
error: Failed to start domain vm1
error: internal error: early end of file from monitor: possible problem:
os_mem_prealloc: failed to preallocate pages

Comment 8 huiqingding 2014-11-17 06:49:13 UTC
Tested this issue using the following versions:
kernel-3.10.0-203.el7.x86_64
qemu-kvm-rhev-2.1.2-7.el7.x86_64

Using the same steps as in comment 7, after step 3 the following error is output:
# virsh start vm1
error: Failed to start domain vm1
error: internal error: early end of file from monitor: possible problem:
os_mem_prealloc: Insufficient free host memory pages available to allocate guest RAM


Based on the above result, I think this issue has been fixed.

Comment 11 errata-xmlrpc 2015-03-05 09:56:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0624.html

