Bug 1237220 - Fail to create NUMA guest with <nosharepages/>
Summary: Fail to create NUMA guest with <nosharepages/>
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Eduardo Habkost
QA Contact: Virtualization Bugs
Reported: 2015-06-30 14:44 UTC by Alex Williamson
Modified: 2015-12-04 16:48 UTC (History)

Fixed In Version: qemu-kvm-rhev-2.3.0-13.el7
Doc Type: Bug Fix
Last Closed: 2015-12-04 16:48:14 UTC

Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2015:2546 normal SHIPPED_LIVE qemu-kvm-rhev bug fix and enhancement update 2015-12-04 21:11:56 UTC

Description Alex Williamson 2015-06-30 14:44:32 UTC
Description of problem:

Start with the following NUMA VM:

<domain type='kvm'>
  <name>2node0</name>
  <uuid>4a95fbdf-9fcf-4668-97da-5b7cdb0cc6c8</uuid>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <memoryBacking>
    <nosharepages/>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.1.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <cpu mode='host-passthrough'>
    <numa>
      <cell id='0' cpus='0-1' memory='2097152' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='2097152' unit='KiB'/>
    </numa>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <memballoon model='none'/>
  </devices>
</domain>

This allocates both guest nodes with page sharing disabled and memory locked, bound to host node 0. This VM works. However, to split the guest nodes across host nodes we need memnode elements, so we make the following change:

@@ -9,7 +9,8 @@
   </memoryBacking>
   <vcpu placement='static'>4</vcpu>
   <numatune>
-    <memory mode='strict' nodeset='0'/>
+    <memnode cellid='0' mode='strict' nodeset='0'/>
+    <memnode cellid='1' mode='strict' nodeset='1'/>
   </numatune>
   <os>
     <type arch='x86_64' machine='pc-i440fx-rhel7.1.0'>hvm</type>

(Both memnode entries can use the same nodeset to reproduce this on a single-node host.)

This results in the following full XML:

<domain type='kvm'>
  <name>2node0-1</name>
  <uuid>abc65d75-e745-4ed0-8cf9-1b849c1f031f</uuid>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <memoryBacking>
    <nosharepages/>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <numatune>
    <memnode cellid='0' mode='strict' nodeset='0'/>
    <memnode cellid='1' mode='strict' nodeset='1'/>
  </numatune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.1.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <cpu mode='host-passthrough'>
    <numa>
      <cell id='0' cpus='0-1' memory='2097152' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='2097152' unit='KiB'/>
    </numa>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <memballoon model='none'/>
  </devices>
</domain>

(name and uuid also changed)
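
The change from the first definition to the second can also be sketched programmatically. The following is a minimal illustration in Python, assuming a trimmed copy of the first domain XML; `split_across_host_nodes` is a hypothetical helper, not a libvirt API, and it does not touch the name or uuid.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the first definition: only the elements relevant here.
DOMAIN_XML = """
<domain type='kvm'>
  <name>2node0</name>
  <memoryBacking>
    <nosharepages/>
    <locked/>
  </memoryBacking>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
  <cpu mode='host-passthrough'>
    <numa>
      <cell id='0' cpus='0-1' memory='2097152' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='2097152' unit='KiB'/>
    </numa>
  </cpu>
</domain>
"""


def split_across_host_nodes(domain, cell_to_nodeset):
    """Replace the domain-wide <memory> policy with one <memnode> per guest cell.

    cell_to_nodeset maps a guest cell id to a host nodeset (both strings).
    Hypothetical helper for illustration only.
    """
    numatune = domain.find("numatune")
    # Drop the single domain-wide policy.
    for memory in numatune.findall("memory"):
        numatune.remove(memory)
    # Add a strict per-cell policy for every guest NUMA cell.
    for cell in domain.findall("cpu/numa/cell"):
        cell_id = cell.get("id")
        ET.SubElement(
            numatune,
            "memnode",
            cellid=cell_id,
            mode="strict",
            nodeset=cell_to_nodeset[cell_id],
        )
    return domain


domain = split_across_host_nodes(ET.fromstring(DOMAIN_XML), {"0": "0", "1": "1"})
memnodes = [
    (m.get("cellid"), m.get("mode"), m.get("nodeset"))
    for m in domain.findall("numatune/memnode")
]
print(memnodes)
```

Passing `{"0": "0", "1": "0"}` instead gives the single-node-host variant mentioned above. Note that `<nosharepages/>` is left in place, which is the combination that triggers the failure.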

This VM definition fails to start with the following error:

2015-06-30 14:39:57.013+0000: starting up libvirt version: 1.2.16, package: 1.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2015-06-04-04:03:16, x86-034.build.eng.bos.redhat.com), qemu version: 2.3.0 (qemu-kvm-rhev-2.3.0-6.el7)
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name 2node0-1 -S -machine pc-i440fx-rhel7.1.0,accel=kvm,usb=off,mem-merge=off -cpu host -m 4096 -realtime mlock=on -smp 4,sockets=4,cores=1,threads=1 -object memory-backend-ram,id=ram-node0,size=2147483648,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=2147483648,host-nodes=1,policy=bind -numa node,nodeid=1,cpus=2-3,memdev=ram-node1 -uuid abc65d75-e745-4ed0-8cf9-1b849c1f031f -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/2node0-1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -msg timestamp=on
Domain id=37 is tainted: host-cpu
qemu-kvm: util/qemu-option.c:387: qemu_opt_get_bool_helper: Assertion `opt->desc && opt->desc->type == QEMU_OPT_BOOL' failed.
2015-06-30 14:40:03.938+0000: shutting down
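
As a sanity check on the generated command line, the sizes and machine properties can be related back to the domain XML. This is a small Python sketch, assuming that -m is interpreted in MiB by default and that memory-backend-ram size= is in bytes; the values are copied from the XML and the log above.

```python
KIB = 1024

# Values copied from the domain XML.
total_kib = 4194304             # <memory unit='KiB'>
cell_kib = [2097152, 2097152]   # memory= on each <cell>

# -m is given in MiB; each memory-backend-ram size= is in bytes.
m_mib = total_kib // KIB
backend_bytes = [kib * KIB for kib in cell_kib]

# Split the -machine argument into the machine type and its properties.
machine_arg = "pc-i440fx-rhel7.1.0,accel=kvm,usb=off,mem-merge=off"
machine_type, *props = machine_arg.split(",")
machine_props = dict(p.split("=", 1) for p in props)

print(m_mib, backend_bytes, machine_props["mem-merge"])
```

The two cells add up to the total guest memory, and mem-merge=off is the only machine property that differs from a guest without `<nosharepages/>`.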

gdb gives the following backtrace:

#0  0x00007ffff09905d7 in raise () from /lib64/libc.so.6
#1  0x00007ffff0991cc8 in abort () from /lib64/libc.so.6
#2  0x00007ffff0989546 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007ffff09895f2 in __assert_fail () from /lib64/libc.so.6
#4  0x0000555555857e3f in qemu_opt_get_bool_helper (opts=0x5555561677e0, 
    name=name@entry=0x55555588502e "mem-merge", defval=defval@entry=true, 
    del=del@entry=false) at util/qemu-option.c:387
#5  0x000055555585818a in qemu_opt_get_bool (opts=<optimized out>, 
    name=name@entry=0x55555588502e "mem-merge", defval=defval@entry=true)
    at util/qemu-option.c:397
#6  0x00005555556f2b04 in host_memory_backend_init (obj=0x555556179a90)
    at backends/hostmem.c:234
#7  0x00005555557a2d39 in object_init_with_type (obj=0x555556179a90, 
    ti=0x55555613ac80) at qom/object.c:309
#8  0x00005555557a31ef in object_initialize_with_type (
    data=data@entry=0x555556179a90, size=<optimized out>, 
    type=type@entry=0x55555613ac80) at qom/object.c:343
#9  0x00005555557a3341 in object_new_with_type (type=0x55555613ac80)
    at qom/object.c:429
#10 0x00005555557a33b5 in object_new (
    typename=typename@entry=0x555556179730 "memory-backend-ram")
    at qom/object.c:439
#11 0x00005555556e1db5 in object_add (
    type=0x555556179730 "memory-backend-ram", id=0x5555561796c0 "ram-node0", 
    qdict=qdict@entry=0x5555561783b0, v=0x555556179850, 
    errp=errp@entry=0x7fffffffdbc8) at qmp.c:643
#12 0x00005555556cf144 in object_create (opts=<optimized out>, 
    opaque=<optimized out>) at vl.c:2632
#13 0x0000555555858edb in qemu_opts_foreach (list=<optimized out>, 
    func=func@entry=0x5555556cefa0 <object_create>, opaque=opaque@entry=0x0, 
    abort_on_failure=abort_on_failure@entry=0) at util/qemu-option.c:1059
#14 0x00005555555e0857 in main (argc=<optimized out>, argv=<optimized out>, 
    envp=<optimized out>) at vl.c:4040

This implicates the mem-merge option, which is controlled via <nosharepages/>.

Removing <nosharepages/> from the XML allows the VM to start, but that should not be necessary.
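
To illustrate why only an explicitly set mem-merge trips the assertion, here is a simplified Python model of a typed option lookup. It is an assumption inferred from the assertion text and the backtrace, not QEMU's actual code: the names `Opt` and `get_bool` are hypothetical, and `desc_type=None` stands for an option that was stored without a type descriptor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Opt:
    name: str
    value: object
    desc_type: Optional[str]  # None models an option stored without a descriptor


def get_bool(opts, name, default):
    """Model of a typed lookup: absent -> default, present -> must be a typed bool."""
    opt = next((o for o in opts if o.name == name), None)
    if opt is None:
        # Option not given on the command line: fall back to the default.
        return default
    if opt.desc_type != "bool":
        # Mirrors: Assertion `opt->desc && opt->desc->type == QEMU_OPT_BOOL' failed.
        raise AssertionError("option has no boolean descriptor")
    return opt.value


# mem-merge not specified: the default (on) is returned.
print(get_bool([], "mem-merge", True))

# mem-merge=off parsed with a proper descriptor: returns False.
print(get_bool([Opt("mem-merge", False, "bool")], "mem-merge", True))

# mem-merge=off stored as an untyped string: the lookup aborts.
try:
    get_bool([Opt("mem-merge", "off", None)], "mem-merge", True)
except AssertionError as err:
    print("assertion:", err)
```

Under this model, a guest without `<nosharepages/>` never reaches the assertion because the option is simply absent, which matches the observation that removing it lets the VM start.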

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.3.0-6.el7.x86_64
libvirt-1.2.16-1.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Define the above VMs
2. Start the second VM (2node0-1)

Actual results:
Cannot use <nosharepages/> in combination with //numatune/memnode

Expected results:
//numatune/memnode should be compatible with <nosharepages/>

Additional info:

Comment 1 Alex Williamson 2015-06-30 15:02:02 UTC
This works with qemu-kvm-rhev-2.1.2-23.el7_1.4.x86_64, so I'm adding the Regression keyword.

Comment 4 Alex Williamson 2015-06-30 17:26:02 UTC
Here's the history, AFAICT. 2.1.0 was broken wrt NUMA pinning and was fixed by:

commit 288d3322022d6ad646407f3ca6f1a6a746565b9a
Author: Michael S. Tsirkin <mst@redhat.com>
Date:   Wed Aug 13 13:50:24 2014 +0200

    hostmem: set MPOL_MF_MOVE
    
    When memory is allocated on a wrong node, MPOL_MF_STRICT
    doesn't move it - it just fails the allocation.
    A simple way to reproduce the failure is with mlock=on
    realtime feature.
    
    The code comment actually says: "ensure policy won't be ignored"
    so setting MPOL_MF_MOVE seems like a better way to do this.
    
    Cc: qemu-stable@nongnu.org
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

This was part of v2.1.1 and is therefore included in our v2.1.2.

The VM described in comment 0 continued to work until:

commit 49d2e648e8087d154d8bf8b91f27c8e05e79d5a6
Author: Marcel Apfelbaum <marcel.a@redhat.com>
Date:   Tue Dec 16 16:58:05 2014 +0000

    machine: remove qemu_machine_opts global list
    
    QEMU has support for options per machine, keeping
    a global list of options is no longer necessary.
    
    Signed-off-by: Marcel Apfelbaum <marcel.a@redhat.com>
    Reviewed-by: Alexander Graf <agraf@suse.de>
    Reviewed-by: Greg Bellows <greg.bellows@linaro.org>
    Message-id: 1418217570-15517-2-git-send-email-marcel.a@redhat.com
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

At that point libvirt fails with:

error: Failed to start domain 2node0-1
error: unsupported configuration: disable shared memory is not available with this QEMU binary

An attempt was made to fix this in:

commit 0a7cf217d81161e36af2344e911d56d4f9fef9c5
Author: Marcel Apfelbaum <marcel@redhat.com>
Date:   Wed Apr 1 19:47:21 2015 +0300

    util/qemu-config: fix regression of qmp_query_command_line_options
    
    Commit 49d2e64 (machine: remove qemu_machine_opts global list)
    made machine options specific to machine sub-type, leaving
    the qemu_machine_opts desc array empty. Sadly this is the place
    qmp_query_command_line_options is looking for supported options.
    
    As a fix for for 2.3 the machine_qemu_opts (the generic ones)
    are restored only for qemu-config scope.
    We need to find a better fix for 2.4.
    
    Reported-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
    Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
    Message-Id: <1427906841-1576-1-git-send-email-marcel@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

But this only gets us to the current situation, where both QEMU v2.3 and current upstream fail with the following error:

error: Failed to start domain 2node0-1
error: internal error: process exited while connecting to monitor: qemu-system-x86_64: util/qemu-option.c:387: qemu_opt_get_bool_helper: Assertion `opt->desc && opt->desc->type == QEMU_OPT_BOOL' failed.

Comment 5 FuXiangChun 2015-07-01 05:37:15 UTC
Hi Alex.

QE found that mem-merge=off triggers this bug. With mem-merge=on (the default value), the qemu-kvm process works. Previously, QE tested "-object memory-backend-ram" only with the default mem-merge value. Does QE need to cover both scenarios (mem-merge=on|off)? If so, I will add this scenario to the test plan and test cases. Thanks.

From the qemu-kvm manual:

mem-merge=on|off
        Enables or disables memory merge support. This feature, when supported  by the host, de-duplicates identical memory
        pages among VMs instances (enabled by default)

Comment 6 Alex Williamson 2015-07-01 15:21:53 UTC
AFAICT it's a supported option and something a user might reasonably turn on, especially if they're using hugepages or device assignment.

Comment 7 FuXiangChun 2015-07-06 08:31:50 UTC
(In reply to Alex Williamson from comment #6)
> AFAICT it's a supported option and something a user might reasonably turn
> on, especially if they're using hugepages or device assignment.

Got it. According to "man qemu-kvm" the default should be on, but I would like to confirm with you: is that right? (QE cannot find this option via 'info qtree' in HMP.)

Comment 8 Alex Williamson 2015-07-06 12:29:13 UTC
(In reply to FuXiangChun from comment #7)
> (In reply to Alex Williamson from comment #6)
> > AFAICT it's a supported option and something a user might reasonably turn
> > on, especially if they're using hugepages or device assignment.
> 
> Got it, the default should be on according to "man qemu-kvm" but I would
> like to double confirm with you, it's right? (QE can not find this option
> via 'info qtree' on HMP)

Yes, the <nosharepages/> libvirt XML option translates to mem-merge=off on the QEMU command line. The default is 'on'.
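
The mapping described here can be sketched as a small check on a domain definition. This is a minimal Python illustration; `mem_merge_setting` is a hypothetical helper, not part of libvirt, and it only reports the effective value rather than reproducing how libvirt builds the command line.

```python
import xml.etree.ElementTree as ET


def mem_merge_setting(domain_xml):
    """Return the effective mem-merge value ('on' or 'off') for a domain definition."""
    domain = ET.fromstring(domain_xml)
    # <nosharepages/> under <memoryBacking> disables page merging for the guest.
    if domain.find("memoryBacking/nosharepages") is not None:
        return "off"
    # Without it, the QEMU default applies.
    return "on"


with_option = "<domain><memoryBacking><nosharepages/><locked/></memoryBacking></domain>"
without_option = "<domain><memoryBacking><locked/></memoryBacking></domain>"

print(mem_merge_setting(with_option))
print(mem_merge_setting(without_option))
```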

Comment 10 Miroslav Rezanina 2015-07-24 11:08:53 UTC
Fix included in qemu-kvm-rhev-2.3.0-13.el7

Comment 12 Pei Zhang 2015-07-28 10:44:18 UTC
Summary:
I re-tested this bug with qemu-kvm-rhev-2.3.0-13.el7.x86_64, and the results show that the bug has been fixed.

Details (both mem-merge=off and mem-merge=on work):
#/usr/libexec/qemu-kvm -name 2node0-1 -machine pc-i440fx-rhel7.1.0,accel=kvm,usb=off,mem-merge=off -cpu host -m 4096 -realtime mlock=on -smp 4,sockets=4,cores=1,threads=1 -object memory-backend-ram,id=ram-node0,size=2147483648,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=2147483648,host-nodes=1,policy=bind -numa node,nodeid=1,cpus=2-3,memdev=ram-node1 -uuid abc65d75-e745-4ed0-8cf9-1b849c1f031f -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/2node0-1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -msg timestamp=on -monitor stdio
QEMU 2.3.0 monitor - type 'help' for more information
(qemu) info status
VM status: running

# /usr/libexec/qemu-kvm -name 2node0-1 -machine pc-i440fx-rhel7.1.0,accel=kvm,usb=off,mem-merge=on -cpu host -m 4096 -realtime mlock=on -smp 4,sockets=4,cores=1,threads=1 -object memory-backend-ram,id=ram-node0,size=2147483648,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=2147483648,host-nodes=1,policy=bind -numa node,nodeid=1,cpus=2-3,memdev=ram-node1 -uuid abc65d75-e745-4ed0-8cf9-1b849c1f031f -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/2node0-1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -msg timestamp=on -monitor stdio
QEMU 2.3.0 monitor - type 'help' for more information
(qemu) info status
VM status: running

Comment 13 juzhang 2015-08-03 04:15:11 UTC
According to comment 12, setting this issue as verified.

Comment 15 errata-xmlrpc 2015-12-04 16:48:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2546.html

