Bug 1371211 - Qemu 2.6 won't boot guest with 2 meg hugepages
Summary: Qemu 2.6 won't boot guest with 2 meg hugepages
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: alpha
Target Release: 7.3
Assignee: Eduardo Habkost
QA Contact: Yumei Huang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-08-29 15:00 UTC by Joe Mario
Modified: 2016-11-07 21:32 UTC
CC List: 15 users

Fixed In Version: qemu-kvm-rhev-2.6.0-24.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-07 21:32:27 UTC
Target Upstream Version:
Embargoed:


Attachments
Here's the xml file which showed the problem. (12.81 KB, text/plain, 2016-08-30 14:13 UTC, Joe Mario)
Here's the /var/log/libvirt/qemu/saphana-r73b.log file (3.80 KB, text/plain, 2016-08-30 14:17 UTC, Joe Mario)


Links
Red Hat Product Errata RHBA-2016:2673 (normal, SHIPPED_LIVE): qemu-kvm-rhev bug fix and enhancement update. Last updated: 2016-11-08 01:06:13 UTC

Description Joe Mario 2016-08-29 15:00:04 UTC
Description of problem:
Version-Release number of selected component:

When we upgraded a working RHEL 7.2 system, using qemu-kvm 2.3, to RHEL 7.3 with qemu-kvm 2.6, our guests would not boot.

We narrowed the problem down to the existence of 2-meg hugepages.
The guest boots fine when the hugepages element is removed from the xml.
It also boots fine if we roll qemu-kvm back to version 2.3.


How reproducible:
We've reproduced this on two systems with two different guest images.

Note, there is no problem with 1-gig hugepages.  The problem only occurs with 2-meg hugepages.

Steps to Reproduce:
1.  Upgrade to RHEL 7.3 and qemu-kvm 2.6.
2. Allocate 2-meg hugepages - the amount needed by the guest.
3. Try to bring up the guest.  It should fail.
4. Remove the hugepage entry from the .xml file and retry.  The guest should boot fine.

Comment 2 Bob Sibley 2016-08-29 19:25:37 UTC
This problem occurs on both 1G and 2M hugepages.

3.10.0-496.el7.x86_64
qemu-kvm-rhev-2.6.0-22.el7.x86_64
libvirt-2.0.0-4.el7.x86_64

Hugepagesize:    1048576 kB
HugePages_Total:     230

When trying to allocate hugepages:

  <memory unit='KiB'>230686720</memory>
  <currentMemory unit='KiB'>230686720</currentMemory>

   <memoryBacking>
    <hugepages/>
  </memoryBacking>

  <numatune>
    <memory mode='strict' nodeset='0-1'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
    <memnode cellid='1' mode='strict' nodeset='1'/>
  </numatune>

    <numa>
      <cell id='0' cpus='0-6' memory='115343360' unit='KiB'/>
      <cell id='1' cpus='7-13' memory='115343360' unit='KiB'/>
    </numa>

Guest startup fails:

error: Failed to start domain rhel72z
error: monitor socket did not show up: No such file or directory


When the "memnode" tags are removed from numatune, the guest starts with 220 GB of memory.


When I leave the memnode entries in numatune and reduce the "memory" size in the numa tag, I can start the guest with 48 GB of memory:

all the same as above except:

    <numa>
      <cell id='0' cpus='0-6' memory='25165824' unit='KiB'/>
      <cell id='1' cpus='7-13' memory='25165824' unit='KiB'/>
    </numa>

Comment 3 Eduardo Habkost 2016-08-30 13:11:53 UTC
To investigate the issue, we need the log files from /var/log/libvirt/qemu. The full libvirt XML for the VM would be very useful, too.

Comment 4 Joe Mario 2016-08-30 14:13:55 UTC
Created attachment 1195927 [details]
Here's the xml file which showed the problem.

This is the xml file we used, though the problem does reproduce with much smaller memory sizes (instead of the 480 GiB in this file).

Comment 5 Joe Mario 2016-08-30 14:17:53 UTC
Created attachment 1195931 [details]
Here's the /var/log/libvirt/qemu/saphana-r73b.log file

Comment 6 Joe Mario 2016-08-30 14:23:07 UTC
Just to clarify what Bob Sibley noted earlier, if we remove the following four lines from the xml file:

>     <memnode cellid='0' mode='strict' nodeset='0'/>
>     <memnode cellid='1' mode='strict' nodeset='1'/>
>     <memnode cellid='2' mode='strict' nodeset='2'/>
>     <memnode cellid='3' mode='strict' nodeset='3'/>

then the guest will boot with the 2.6 qemu.

I mention this as a triage tip, not as a way of saying it fixes the problem.  
Those four lines have always worked for us in previous qemu versions.

Joe

Comment 7 Eduardo Habkost 2016-08-30 14:55:08 UTC
Are there enough hugepages available on each NUMA node?

This can be checked using:
# grep '' /sys/devices/system/node/node*/hugepages/hugepages-*/nr_hugepages 

This is just to check if the buggy code is in the memory allocation error path. QEMU+libvirt are supposed to show a more meaningful error message in this case, anyway.

Comment 8 Joe Mario 2016-08-30 15:06:28 UTC
re: Are there enough hugepages available on each NUMA node?

Yes, for three reasons:

First, we did verify the allocation on each node:
[root@brickland3-10ge virt]# grep '' /sys/devices/system/node/node*/hugepages/hugepages-*/nr_hugepages
/sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages:0
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:61440
/sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages:0
/sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages:61440
/sys/devices/system/node/node2/hugepages/hugepages-1048576kB/nr_hugepages:0
/sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages:61440
/sys/devices/system/node/node3/hugepages/hugepages-1048576kB/nr_hugepages:0
/sys/devices/system/node/node3/hugepages/hugepages-2048kB/nr_hugepages:61440

Second:
 Without even changing the hugepage allocation, we can invoke the v2.3 qemu and the guest boots fine; invoking the newer v2.6 qemu right afterward fails.

Third:
 As we were triaging this, we tried modified xml files using far fewer hugepages than the system had allocated, and the problem still existed.

Joe

Comment 9 Eduardo Habkost 2016-08-30 15:07:54 UTC
(In reply to Joe Mario from comment #4)
> Created attachment 1195927 [details]
> Here's the xml file which showed the problem.
> 
> This is the xml file we used.  Though it does reproduce with much smaller
> memory sizes (instead of the 480gig in this file).

I only have an 8GB, 2-node, 24-cpu host available for testing, so I removed the vcpupin elements and changed the memory sizes to:

  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  [...]
  <numatune>
    <memory mode='strict' nodeset='0-1'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
    <memnode cellid='1' mode='strict' nodeset='1'/>
    <memnode cellid='2' mode='strict' nodeset='0'/>
    <memnode cellid='3' mode='strict' nodeset='1'/>
  </numatune>
  [...]
    <numa>
      <cell id='0' cpus='0-47' memory='1048576' unit='KiB'/>
      <cell id='1' cpus='48-95' memory='1048576' unit='KiB'/>
      <cell id='2' cpus='96-143' memory='1048576' unit='KiB'/>
      <cell id='3' cpus='144-191' memory='1048576' unit='KiB'/>
    </numa>

After ensuring 1024 2MB hugepages were allocated on each node with:

# echo 1024 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages 
# echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages 

the VM started, so I couldn't reproduce the problem.

Comment 10 Eduardo Habkost 2016-08-30 15:17:06 UTC
Thanks, I will try to reproduce the bug on a larger system. Can you confirm that the bug is reproducible even if you remove all the disk and network devices from the libvirt XML? This way I will be sure I am using exactly the same configuration you use to reproduce it.

Comment 11 Joe Mario 2016-08-30 17:37:00 UTC
Eduardo:
 Here's something that should be a little more helpful. I've pared the xml file way down to something easier for you to use.

 It appears to be dependent on the memory size.

 The xml file below boots fine using the 80 GiB total memory size, with 25% going to each node.

 However, when I bump the memory size to 128 GiB, it fails to boot.  I've left those sizes in the two commented-out fields for your reference.

 The system still has the original 480 GiB of hugepages reserved, so there's plenty available.

Joe

<domain type='kvm'>
  <name>saphana-r73b</name>
  <uuid>05a142dc-9ba8-4848-b8de-3b40a6ed4a73</uuid>
<!--
  Fails to boot with this amount, and 30720000 KiB/cell id below.
  <memory unit='KiB'>122880000</memory>
  <currentMemory unit='KiB'>122880000</currentMemory>
-->
  <memory unit='KiB'>81920000</memory>
  <currentMemory unit='KiB'>81920000</currentMemory>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.0.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  <numatune>
    <memory mode='strict' nodeset='0-3'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
    <memnode cellid='1' mode='strict' nodeset='1'/>
    <memnode cellid='2' mode='strict' nodeset='2'/>
    <memnode cellid='3' mode='strict' nodeset='3'/>
  </numatune>
  <vcpu placement='static'>32</vcpu>
  <cpu mode='host-passthrough'>
    <topology sockets='4' cores='8' threads='1'/>
    <numa>
<!--
  Fails to boot with this amount, along with 122880000 memory unit above.
      <cell id='0' cpus='0-7' memory='30720000' unit='KiB'/>
      <cell id='1' cpus='8-15' memory='30720000' unit='KiB'/>
      <cell id='2' cpus='16-23' memory='30720000' unit='KiB'/>
      <cell id='3' cpus='24-31' memory='30720000' unit='KiB'/>
-->
      <cell id='0' cpus='0-7' memory='20480000' unit='KiB'/>
      <cell id='1' cpus='8-15' memory='20480000' unit='KiB'/>
      <cell id='2' cpus='16-23' memory='20480000' unit='KiB'/>
      <cell id='3' cpus='24-31' memory='20480000' unit='KiB'/>
    </numa>
  </cpu>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source file='/djd/virt/saphana-r73b.img'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
  </devices>
</domain>

Comment 12 Eduardo Habkost 2016-08-30 20:28:57 UTC
I could reproduce it on a larger machine. It looks like qemu-kvm is really getting stuck on initialization, and I could reproduce it without using libvirt.

Comment 13 Eduardo Habkost 2016-08-30 20:33:27 UTC
The problem seems to be caused by the prealloc process taking too long, and triggering a timeout on libvirt.

Attaching gdb to qemu-kvm while libvirt was waiting for the monitor to show up:

(gdb) bt
#0  memset (__len=1, __ch=0, __dest=<optimized out>) at /usr/include/bits/string3.h:84
#1  os_mem_prealloc (fd=10, area=area@entry=0x7f7d1ca00000 "", memory=memory@entry=20971520000, errp=errp@entry=0x7ffd8986c490) at util/oslib-posix.c:365
#2  0x00007f82132bcf8e in host_memory_backend_memory_complete (uc=<optimized out>, errp=0x7ffd8986c4f0) at backends/hostmem.c:334
#3  0x00007f82133beff3 in user_creatable_add_type (type=<optimized out>, id=0x7f82162082d0 "ram-node0", qdict=qdict@entry=0x7f8216301200, v=v@entry=0x7f8216256e40,
    errp=errp@entry=0x7ffd8986c568) at qom/object_interfaces.c:137
#4  0x00007f82133bf324 in user_creatable_add (qdict=qdict@entry=0x7f8216300000, v=0x7f8216256e40, errp=errp@entry=0x7ffd8986c5d0) at qom/object_interfaces.c:67
#5  0x00007f82133bf375 in user_creatable_add_opts (opts=opts@entry=0x7f82162321e0, errp=errp@entry=0x7ffd8986c5d0) at qom/object_interfaces.c:162
#6  0x00007f82133bf3f8 in user_creatable_add_opts_foreach (opaque=0x7f82132a0790 <object_create_initial>, opts=0x7f82162321e0, errp=<optimized out>)
    at qom/object_interfaces.c:182
#7  0x00007f8213466eda in qemu_opts_foreach (list=<optimized out>, func=0x7f82133bf3b0 <user_creatable_add_opts_foreach>,
    opaque=opaque@entry=0x7f82132a0790 <object_create_initial>, errp=errp@entry=0x0) at util/qemu-option.c:1116
#8  0x00007f8213192c96 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4318

Comment 14 Joe Mario 2016-08-30 21:41:57 UTC
Interesting find.

It's unfortunate that the "__dest" value in memset got optimized away.

Here is some "thinking out loud".

The timeout occurs while you're spinning in memset, as you noted.

359         int i;
360         size_t hpagesize = qemu_fd_getpagesize(fd);
361         size_t numpages = DIV_ROUND_UP(memory, hpagesize);
362 
363         /* MAP_POPULATE silently ignores failures */
364         for (i = 0; i < numpages; i++) {
365             memset(area + (hpagesize * i), 0, 1);
366         }

With the memory size you used (20971520000), that gives you 10,000 for "numpages".

The 32-bit integer "i" is going to grow to 10000 in the for loop.
So far so good.

The multiplication in "(hpagesize * i)" should promote "i" from a signed int to an unsigned long, because hpagesize is an unsigned long.

However, if something went wrong in that promotion, then "hpagesize * i" can grow to just over the size of a signed int.  And the memory size that triggers this problem is around the size where (hpagesize * i) can overflow a signed int.

If "i" were not properly promoted to an unsigned long, it would overflow and wreak havoc on that computation.

I don't know if there's a codegen bug or not.  I'm not saying there is.
But it should be trivial for you to change the datatype of "i" from an int to a long and retry.
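
For what it's worth, the promotion is easy to check in isolation. A minimal, self-contained sketch (the 1536-page figure is hypothetical, chosen only because 1536 * 2 MiB crosses INT_MAX; it is not taken from this bug):

#include <stdio.h>
#include <stddef.h>

int main(void)
{
    size_t hpagesize = 2048UL * 1024;   /* 2 MiB hugepage */
    int i = 1536;                       /* 1536 * 2 MiB = 3 GiB > INT_MAX */

    /* Usual arithmetic conversions: hpagesize is size_t (64-bit unsigned
     * on x86_64), so "i" is converted to size_t before the multiply and
     * the byte offset comes out right. */
    size_t offset = hpagesize * i;

    /* Narrowing the correct result back to a 32-bit int shows what a
     * 32-bit computation would have produced: the offset wraps negative
     * (implementation-defined conversion on gcc/clang). */
    int wrong = (int)offset;

    printf("promoted: %zu\n", offset);  /* 3221225472 */
    printf("as int:   %d\n", wrong);    /* -1073741824 */
    return 0;
}

Note that for the 20971520000-byte region in the backtrace, numpages is exactly 10000, so "i" itself stays small; it is only the byte offset hpagesize * i that first crosses INT_MAX, at i = 1024.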

Just a thought.

Joe

Comment 15 Joe Mario 2016-08-30 21:55:33 UTC
If the above quick test in #14 doesn't work, then build an unoptimized version that doesn't have the "__dest" parameter in memset optimized away.

It's important to confirm you're memset'ing a valid address.

Comment 16 Eduardo Habkost 2016-08-30 22:12:01 UTC
I believe the code is working as expected, but it is just taking too long because the kernel is zeroing the pages before making them available to the process. I changed the command-line to create a smaller guest (8GB), and it takes ~5 seconds to prealloc all memory.

Probably older QEMU versions did the prealloc after initializing the monitor, and now this is being done earlier during initialization (and we need to fix that). I will continue looking into it tomorrow.
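
The first-touch cost is easy to measure outside QEMU. A minimal standalone sketch of the same touch-one-byte-per-page loop (assuming 2 MiB hugepages are reserved in the default pool and the kernel supports MAP_HUGETLB; the page count is an arbitrary command-line argument):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <sys/mman.h>

#define HPAGE (2UL * 1024 * 1024)   /* 2 MiB */

int main(int argc, char **argv)
{
    size_t pages = argc > 1 ? strtoul(argv[1], NULL, 0) : 1024;
    size_t len = pages * HPAGE;

    /* Anonymous hugetlb mapping; fails unless enough 2 MiB hugepages
     * are reserved (vm.nr_hugepages). */
    char *area = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (area == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    /* Same idea as os_mem_prealloc(): touch one byte per page so the
     * kernel faults in (and zeroes) every hugepage up front. */
    for (size_t i = 0; i < pages; i++) {
        memset(area + HPAGE * i, 0, 1);
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("touched %zu pages (%zu MiB) in %.3f s\n",
           pages, len >> 20,
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    return 0;
}

Running it with a page count matching one guest node (10000 pages for the 20 GB region in the backtrace) approximates the per-backend startup cost measured below.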

Comment 17 Eduardo Habkost 2016-08-31 00:26:44 UTC
Evidence that it's just prealloc taking too long:

+ /usr/libexec/qemu-kvm -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=10000000000 -object adslkjdsaf
qemu-kvm: -object adslkjdsaf: Parameter 'id' is missing

real    0m2.993s
user    0m0.060s
sys     0m2.934s
+ for n in 1 2 3 5 7 10 15 20 30 50
+ /usr/libexec/qemu-kvm -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=20000000000 -object adslkjdsaf
qemu-kvm: -object adslkjdsaf: Parameter 'id' is missing

real    0m11.999s
user    0m0.070s
sys     0m11.929s
+ for n in 1 2 3 5 7 10 15 20 30 50
+ /usr/libexec/qemu-kvm -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=30000000000 -object adslkjdsaf
qemu-kvm: -object adslkjdsaf: Parameter 'id' is missing

real    0m11.542s
user    0m0.085s
sys     0m11.458s
+ for n in 1 2 3 5 7 10 15 20 30 50
+ /usr/libexec/qemu-kvm -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=50000000000 -object adslkjdsaf
qemu-kvm: -object adslkjdsaf: Parameter 'id' is missing

real    0m53.485s
user    0m0.085s
sys     0m53.398s
+ for n in 1 2 3 5 7 10 15 20 30 50
+ /usr/libexec/qemu-kvm -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=70000000000 -object adslkjdsaf
qemu-kvm: -object adslkjdsaf: Parameter 'id' is missing

real    0m48.375s
user    0m0.071s
sys     0m48.299s

Comment 18 Joe Mario 2016-08-31 00:34:30 UTC
Agreed.  Thanks for that explanation.
Joe

Comment 19 Michal Privoznik 2016-08-31 07:12:52 UTC
(In reply to Eduardo Habkost from comment #13)
> The problem seems to be caused by the prealloc process taking too long, and
> triggering a timeout on libvirt.
> 

Interesting, the timeout in libvirt has always been 30 seconds for the monitor to show up. In your testing from comment 17 the longest run was 5 seconds (if I understand it correctly and those 53.485 seconds are summed over all 10 runs of qemu-kvm, not per run).

I wonder if there's something libvirt can do.

Comment 21 Eduardo Habkost 2016-08-31 14:23:30 UTC
(In reply to Michal Privoznik from comment #19)
> (In reply to Eduardo Habkost from comment #13)
> > The problem seems to be caused by the prealloc process taking too long, and
> > triggering a timeout on libvirt.
> > 
> 
> Interesting, the timeout in libvirt has always been 30 seconds for monitor
> to show up. In your testing from comment 17 the longest run was 5 seconds
> (if I understand it correctly and those 53.485 are summed up for all 10 runs
> of qemu-kvm and not per one run).

Actually it was the time of a single qemu-kvm run. In other words, this single command:

/usr/libexec/qemu-kvm -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=70000000000 -object adslkjdsaf

Takes 48 seconds to run.

> 
> I wonder if there's something libvirt can do.

I will investigate further and see if I can think of any simple workarounds.

Comment 23 Eduardo Habkost 2016-08-31 19:27:36 UTC
Bug bisected to:

commit f08f9271bfe3f19a5eb3d7a2f48532065304d5c8
Author: Daniel P. Berrange <berrange>
Date:   Wed May 13 17:14:04 2015 +0100

    vl: Create (most) objects before creating chardev backends
    
    Some types of object must be created before chardevs, other types of
    object must be created after chardevs. As such there is no option but
    to create objects in two phases.
    
    This takes the decision to create as many object types as possible
    right away before anyother backends are created, and only delay
    creation of those few which have an explicit dependency on the
    chardevs. Hopefully the set which need delaying will remain small
    over time.
    
    Signed-off-by: Daniel P. Berrange <berrange>
    Reviewed-by: Paolo Bonzini <pbonzini>
    Reviewed-by: Eric Blake <eblake>
    Signed-off-by: Andreas Färber <afaerber>


This makes the memory-backend objects be created before chardevs, triggering the libvirt timeout when memory prealloc takes too much time.
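
For context, the two-phase scheme the commit message describes looks roughly like this. A schematic sketch, not QEMU's actual code; the delayed-type check and type names are simplified:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Object types with an explicit dependency on chardevs are created
 * late; everything else is created as early as possible. */
static bool object_create_delayed(const char *type)
{
    return strcmp(type, "rng-egd") == 0;   /* example of a delayed type */
}

static bool object_create_initial(const char *type)
{
    return !object_create_delayed(type);
}

static void create_objects(const char **types, int n,
                           bool (*want)(const char *), const char *phase)
{
    for (int i = 0; i < n; i++) {
        if (want(types[i])) {
            /* The real code instantiates the object here.  For a
             * memory-backend-file with prealloc=yes, this is where
             * every hugepage gets touched. */
            printf("%s phase: create %s\n", phase, types[i]);
        }
    }
}

int main(void)
{
    const char *types[] = { "memory-backend-file", "rng-egd" };

    create_objects(types, 2, object_create_initial, "initial");
    /* ... chardevs, including the monitor socket, are set up here ... */
    create_objects(types, 2, object_create_delayed, "delayed");
    return 0;
}

After f08f9271, memory-backend objects land in the initial set, so a long prealloc runs before the monitor socket exists; the fix discussed from comment 26 onward effectively moves them into the delayed set.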

Comment 26 Eduardo Habkost 2016-08-31 20:18:57 UTC
Fix submitted to qemu-devel:

Subject: [PATCH] vl: Delay initialization of memory backends
Date: Wed, 31 Aug 2016 17:17:10 -0300
Message-Id: <1472674630-18886-1-git-send-email-ehabkost>

Comment 29 Eduardo Habkost 2016-09-01 15:06:04 UTC
(In reply to Eduardo Habkost from comment #26)
> Fix submitted to qemu-devel:
> 
> Subject: [PATCH] vl: Delay initialization of memory backends
> Date: Wed, 31 Aug 2016 17:17:10 -0300
> Message-Id: <1472674630-18886-1-git-send-email-ehabkost>

The fix breaks vhost-user because it depends on memory backends being already initialized. But some delayed -object classes must be initialized after netdevs. The initialization ordering requirements between multiple command-line options are messier than I thought.  :(

Comment 31 Eduardo Habkost 2016-09-02 14:22:08 UTC
(In reply to Eduardo Habkost from comment #29)
> (In reply to Eduardo Habkost from comment #26)
> > Fix submitted to qemu-devel:
> > 
> > Subject: [PATCH] vl: Delay initialization of memory backends
> > Date: Wed, 31 Aug 2016 17:17:10 -0300
> > Message-Id: <1472674630-18886-1-git-send-email-ehabkost>
> 
> The fix breaks vhost-user because it depends on memory backends being
> already initialized. But some delayed -object classes must be initialized
> after netdevs. The initialization ordering requirements between multiple
> command-line options are messier than I thought.  :(

Good news: the patch doesn't break vhost-user, exactly. It only broke vhost-user-test.

Bad news: the patch breaks vhost-user-test as a side-effect of fixing a memory region initialization bug under TCG. vhost-user-test worked by accident: it uses TCG, but vhost is not even supposed to work with TCG. See <https://www.mail-archive.com/qemu-devel@nongnu.org/msg394258.html>. I am looking for a way to fix vhost-user-test.

Comment 32 Eduardo Habkost 2016-09-02 19:04:55 UTC
New upstream patches submitted:

From: Eduardo Habkost <ehabkost>
Subject: [PATCH v2 0/2] Delay initialization of memory backends
Date: Fri,  2 Sep 2016 15:59:42 -0300
Message-Id: <1472842784-15399-1-git-send-email-ehabkost>

Comment 33 Eduardo Habkost 2016-09-02 19:19:55 UTC
(In reply to Eduardo Habkost from comment #32)
> New upstream patches submitted:
> 
> From: Eduardo Habkost <ehabkost>
> Subject: [PATCH v2 0/2] Delay initialization of memory backends
> Date: Fri,  2 Sep 2016 15:59:42 -0300
> Message-Id: <1472842784-15399-1-git-send-email-ehabkost>

Scratch build of qemu-kvm-rhev including the series: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11697667

Comment 37 Miroslav Rezanina 2016-09-09 11:25:33 UTC
Fix included in qemu-kvm-rhev-2.6.0-24.el7

Comment 39 Yumei Huang 2016-09-13 05:21:14 UTC
Reproduce:
qemu-kvm-rhev-2.6.0-22.el7
kernel-3.10.0-505.el7.x86_64
libvirt-2.0.0-8.el7.x86_64

Guest failed to boot with 200 GB of memory backed by 2M hugepages bound to host nodes.

Details:
# virsh dumpxml rhel-1
...
  <memory unit='KiB'>209715200</memory>
  <currentMemory unit='KiB'>209715200</currentMemory>
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  <vcpu placement='static'>192</vcpu>
  <numatune>
    <memory mode='strict' nodeset='0-3'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
    <memnode cellid='1' mode='strict' nodeset='5'/>
    <memnode cellid='2' mode='strict' nodeset='2'/>
    <memnode cellid='3' mode='strict' nodeset='4'/>
  </numatune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.2.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <cpu mode='host-passthrough'>
    <numa>
      <cell id='0' cpus='0-47' memory='52428800' unit='KiB'/>
      <cell id='1' cpus='48-95' memory='52428800' unit='KiB'/>
      <cell id='2' cpus='96-143' memory='52428800' unit='KiB'/>
      <cell id='3' cpus='144-191' memory='52428800' unit='KiB'/>
    </numa>
  </cpu>
...

# virsh start rhel-1
error: Failed to start domain rhel-1

# cat /var/log/libvirt/qemu/rhel-1.log
...
2016-09-13 03:30:51.586+0000: starting up libvirt version: 2.0.0, package: 8.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2016-09-07-11:59:06, x86-020.build.eng.bos.redhat.com), qemu version: 2.6.0 (qemu-kvm-rhev-2.6.0-22.el7), hostname: amd-6172-512-2.lab.eng.pek2.redhat.com
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name guest=rhel-1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-4-rhel-1/master-key.aes -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -cpu host -m 204800 -realtime mlock=off -smp 192,sockets=192,cores=1,threads=1 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=53687091200,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0-47,memdev=ram-node0 -object memory-backend-file,id=ram-node1,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=53687091200,host-nodes=5,policy=bind -numa node,nodeid=1,cpus=48-95,memdev=ram-node1 -object memory-backend-file,id=ram-node2,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=53687091200,host-nodes=2,policy=bind -numa node,nodeid=2,cpus=96-143,memdev=ram-node2 -object memory-backend-file,id=ram-node3,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=53687091200,host-nodes=4,policy=bind -numa node,nodeid=3,cpus=144-191,memdev=ram-node3 -uuid 05a142dc-9ba8-4848-b8de-3b40a6ed4a73 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-4-rhel-1/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/home/guest/rhel73.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -device usb-tablet,id=input0,bus=usb.0,port=1 -spice port=5900,addr=0.0.0.0,disable-ticketing,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,bus=pci.0,addr=0x2 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on
Domain id=4 is tainted: host-cpu
2016-09-13 03:31:21.609+0000: shutting down



Verify:
qemu-kvm-rhev-2.6.0-24.el7
kernel-3.10.0-505.el7.x86_64
libvirt-2.0.0-8.el7.x86_64

Guest could boot successfully, so the bug is fixed.

Comment 41 errata-xmlrpc 2016-11-07 21:32:27 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html

