Bug 1263263

Summary:	qemu process can't start if memory nodeset excludes Numa Node 0
Product:	Red Hat Enterprise Linux 6	Reporter:	Martin Tessun <mtessun>
Component:	libvirt	Assignee:	Ján Tomko <jtomko>
Status:	CLOSED ERRATA	QA Contact:	Virtualization Bugs <virt-bugs>
Severity:	urgent	Docs Contact:	Jiri Herrmann <jherrman>
Priority:	urgent
Version:	6.7	CC:	dyuan, jkurik, jsuchane, jtomko, lhuang, obockows, rbalakri, rhodain, tdosek, vijaykumar.bisalahalli
Target Milestone:	rc	Keywords:	Regression, ZStream
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	libvirt-0.10.2-55.el6	Doc Type:	Release Note
Doc Text:	Guests with strict NUMA pinning boot more reliably. When starting a virtual machine configured with strict Non-Uniform Memory Access (NUMA) pinning, the KVM module could not allocate memory from the Direct Memory Access (DMA) zones if the NUMA nodes holding those zones were not included in the cgroup limits configured by the libvirt daemon. This led to a Quick Emulator (QEMU) process failure, which in turn prevented the guest from booting. With this update, the cgroup limits are applied after KVM allocates the memory, and the QEMU process, as well as the guest, now starts as expected.	Story Points:	---
Clone Of:
Clones:	1265970		Environment:
Last Closed:	2016-05-10 19:25:21 UTC	Type:	Bug
Bug Blocks:	1265970, 1275757

Description Martin Tessun 2015-09-15 13:09:20 UTC

Description of problem:
A VM which is configured to run in a specific numa node isn't able to start any more, e.g.:
  <numatune>
    <memory mode='strict' nodeset='3'/>
  </numatune>

The startup results in the following message:
kvm_create_vcpu: Cannot allocate memory
Failed to create vCPU. Check the -smp parameter.

Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.479.el6.x86_64
libvirt-0.10.2-54.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Have a system with at least 2 NUMA nodes
2. Configure a VM with strict NUMA pinning that excludes NUMA node 0, e.g.:
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>
3. Start the VM

Actual results:
The VM fails to start; creating the vCPU threads hits an out-of-memory condition:
kvm_create_vcpu: Cannot allocate memory
Failed to create vCPU. Check the -smp parameter.


Expected results:
VM should start, as it did with RHEL 6.6


Additional info:
As soon as NUMA node 0 is included in the nodeset, the VM starts.
I am currently trying to build an in-house reproducer. I suspect the recent changes to make the numatune nodeset work correctly caused the issue, in particular the fix for BZ #1198645.
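
To check many guests for this configuration, the `<numatune>` element can be inspected programmatically. The following is a minimal Python sketch, not part of libvirt: the helper names `expand_nodeset` and `strict_nodes` are hypothetical, and the parser assumes nodeset strings made of comma-separated numbers, ranges such as `0-1`, and `^N` exclusions.

```python
import xml.etree.ElementTree as ET


def expand_nodeset(nodeset):
    """Expand a nodeset string such as '0-1,3' or '0-3,^2' into a sorted list."""
    included, excluded = set(), set()
    for part in nodeset.split(","):
        part = part.strip()
        if not part:
            continue
        target = included
        if part.startswith("^"):
            # '^N' removes a node from the set built by the other parts.
            target = excluded
            part = part[1:]
        if "-" in part:
            low, high = part.split("-", 1)
            target.update(range(int(low), int(high) + 1))
        else:
            target.add(int(part))
    return sorted(included - excluded)


def strict_nodes(domain_xml):
    """Return the node list if the domain uses strict memory pinning, else None."""
    root = ET.fromstring(domain_xml)
    memory = root.find("./numatune/memory")
    if memory is None or memory.get("mode") != "strict":
        return None
    return expand_nodeset(memory.get("nodeset", ""))


# Trimmed-down domain XML containing only the element of interest.
EXAMPLE_XML = """
<domain type='kvm'>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>
</domain>
"""

nodes = strict_nodes(EXAMPLE_XML)
if nodes is not None and 0 not in nodes:
    print("strict pinning excludes node 0:", nodes)
```

A guest flagged by this check matches the reproducer only on hosts where the excluded node holds the DMA zones, so it is a hint rather than a diagnosis.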

Comment 1 Martin Tessun 2015-09-15 13:24:40 UTC

Some tests from my side:

[root@cisco-b200m1-01 ~]# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14
node 0 size: 6080 MB
node 0 free: 5445 MB
node 1 cpus: 1 3 5 7 9 11 13 15
node 1 size: 6144 MB
node 1 free: 5595 MB
node distances:
node   0   1 
  0:  10  21 
  1:  21  10 
[root@cisco-b200m1-01 ~]# 


[root@cisco-b200m1-01 ~]# virsh dumpxml numa1
<domain type='kvm'>
  <name>numa1</name>
  <uuid>5b91caaf-7c02-0108-b5d5-12d1e3063403</uuid>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <vcpu placement='static' cpuset='1,3,5,7,9,11,13,15' current='2'>8</vcpu>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>
  <os>
    <type arch='x86_64' machine='rhel6.6.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/var/lib/libvirt/images/bsul0471.img'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x2'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:3a:c4:b7'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='tablet' bus='usb'/>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes'/>
    <sound model='ich6'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </sound>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </memballoon>
  </devices>
</domain>

[root@cisco-b200m1-01 ~]# 

So this guest is pinned to NUMA node 1 only (both its vCPUs and its memory).
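
The pinning can be cross-checked against the `numactl --hardware` output above. The sketch below is illustrative only: `parse_numactl_hardware` and `nodes_for_cpuset` are hypothetical helpers, the parser handles just the `node N cpus/size/free` lines seen in this report, and the cpuset is assumed to be a plain comma-separated list without ranges.

```python
import re

# Relevant lines copied from the numactl output in this comment.
NUMACTL_OUTPUT = """\
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14
node 0 size: 6080 MB
node 0 free: 5445 MB
node 1 cpus: 1 3 5 7 9 11 13 15
node 1 size: 6144 MB
node 1 free: 5595 MB
"""


def parse_numactl_hardware(text):
    """Map node id -> {'cpus': [...], 'size_mb': int, 'free_mb': int}."""
    nodes = {}
    for line in text.splitlines():
        m = re.match(r"node (\d+) cpus:\s*(.*)$", line)
        if m:
            cpus = [int(c) for c in m.group(2).split()]
            nodes.setdefault(int(m.group(1)), {})["cpus"] = cpus
            continue
        m = re.match(r"node (\d+) (size|free): (\d+) MB", line)
        if m:
            key = m.group(2) + "_mb"
            nodes.setdefault(int(m.group(1)), {})[key] = int(m.group(3))
    return nodes


def nodes_for_cpuset(cpuset, nodes):
    """Return the NUMA nodes owning at least one CPU of a comma-separated cpuset."""
    cpus = {int(c) for c in cpuset.split(",")}
    return sorted(n for n, info in nodes.items() if cpus & set(info.get("cpus", [])))


nodes = parse_numactl_hardware(NUMACTL_OUTPUT)
# cpuset taken from the <vcpu> element of the domain XML above.
print(nodes_for_cpuset("1,3,5,7,9,11,13,15", nodes))
```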

Now start the VM:
[root@cisco-b200m1-01 ~]# virsh start numa1
error: Failed to start domain numa1
error: Unable to read from monitor: Connection reset by peer

[root@cisco-b200m1-01 ~]# 

The log shows:

2015-09-15 13:21:08.914+0000: starting up
LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name numa1 -S -M rhel6.6.0 -enable-kvm -m 4096 -realtime mlock=off -smp 2,maxcpus=8,sockets=8,cores=1,threads=1 -uuid 5b91caaf-7c02-0108-b5d5-12d1e3063403 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/numa1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -drive file=/var/lib/libvirt/images/bsul0471.img,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=23 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:3a:c4:b7,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on
char device redirected to /dev/pts/2
kvm_create_vcpu: Cannot allocate memory
Failed to create vCPU. Check the -smp parameter.
2015-09-15 13:21:09.325+0000: shutting down


Changing the nodeset to 0 makes the guest start:
[root@cisco-b200m1-01 ~]# diff numa1.xml numa1_new.xml 
8c8
<     <memory mode='strict' nodeset='1'/>
---
>     <memory mode='strict' nodeset='0'/>

[root@cisco-b200m1-01 ~]# virsh start numa1
Domain numa1 started

[root@cisco-b200m1-01 ~]#

Comment 2 Martin Tessun 2015-09-15 14:21:33 UTC

I just tried downgrading qemu and libvirt to the RHEL 6.6 versions.
Interestingly, this did not change the behaviour.

So I downgraded the whole system to RHEL 6.6 and reran the tests.
This time everything works as expected:

  <vcpu placement='static' cpuset='1,3,5,7,9,11,13,15' current='2'>8</vcpu>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>

[root@cisco-b200m1-01 ~]# virsh start numa1
Domain numa1 started
[root@cisco-b200m1-01 ~]# rpm -q -a qemu\* libvirt\*
libvirt-java-devel-0.4.9-1.el6.noarch
libvirt-0.10.2-46.el6.x86_64
libvirt-devel-0.10.2-46.el6.x86_64
libvirt-python-0.10.2-46.el6.x86_64
qemu-kvm-0.12.1.2-2.445.el6.x86_64
libvirt-java-0.4.9-1.el6.noarch
qemu-img-0.12.1.2-2.445.el6.x86_64
libvirt-client-0.10.2-46.el6.x86_64
[root@cisco-b200m1-01 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.6 (Santiago)
[root@cisco-b200m1-01 ~]#

Comment 3 Martin Tessun 2015-09-15 15:05:20 UTC

Updating only libvirt and qemu from the RHEL 6.6 versions to the RHEL 6.7 versions still left the system working:

================================================================================================================================================================
 Package                                Arch                           Version                                      Repository                             Size
================================================================================================================================================================
Updating:
 libvirt                                x86_64                         0.10.2-54.el6                                beaker-Server                         2.4 M
 qemu-kvm                               x86_64                         2:0.12.1.2-2.479.el6                         beaker-Server                         1.6 M
Updating for dependencies:
 libvirt-client                         x86_64                         0.10.2-54.el6                                beaker-Server                         4.1 M
 libvirt-devel                          x86_64                         0.10.2-54.el6                                beaker-Server                         910 k
 libvirt-python                         x86_64                         0.10.2-54.el6                                beaker-Server                         500 k
 qemu-img                               x86_64                         2:0.12.1.2-2.479.el6                         beaker-Server                         830 k

Transaction Summary
================================================================================================================================================================


[root@cisco-b200m1-01 yum.repos.d]# service libvirtd restart
Stopping libvirtd daemon: [  OK  ]
Starting libvirtd daemon: [  OK  ]
[root@cisco-b200m1-01 yum.repos.d]# virsh start numa1
Domain numa1 started

[root@cisco-b200m1-01 yum.repos.d]# 

So next I also updated the kernel to a RHEL 6.7 kernel:

================================================================================================================================================================
 Package                                  Arch                            Version                                  Repository                              Size
================================================================================================================================================================
Installing:
 kernel                                   x86_64                          2.6.32-573.el6                           beaker-Server                           30 M
Updating for dependencies:
 kernel-firmware                          noarch                          2.6.32-573.el6                           beaker-Server                           18 M

Transaction Summary
================================================================================================================================================================
Install       1 Package(s)
Upgrade       1 Package(s)

After rebooting the system, the error showed up:

[root@cisco-b200m1-01 ~]# virsh start numa1
error: Failed to start domain numa1
error: internal error process exited while connecting to monitor: char device redirected to /dev/pts/1
kvm_create_vcpu: Cannot allocate memory
Failed to create vCPU. Check the -smp parameter.


[root@cisco-b200m1-01 ~]# 

So this looks like a kernel issue rather than a libvirt/qemu issue. I am still running some additional tests.

Comment 5 Luyao Huang 2015-09-16 06:21:42 UTC

Hi Jan,

This issue looks like the following bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1206424

Hope this helps.

Thanks,
Luyao

Comment 6 Ján Tomko 2015-09-16 11:24:36 UTC

Yes, with the patch from bug 1206424 (or rather its RHEL 7 clone: 1010885) I am able to start the machine with 6.7 kernel:

commit 7e72ac787848b7434c9359a57c1e2789d92350f8
Author:     Martin Kletzander <mkletzan>
CommitDate: 2014-07-16 20:15:46 +0200

    qemu: leave restricting cpuset.mems after initialization
git describe: v1.2.6-176-g7e72ac7 contains: v1.2.7-rc1~91

As Martin Tessun discovered in comment 4, it seems the kernel change that prompted the libvirt patch was backported to the RHEL 6 kernel between versions 2.6.32-504.el6.x86_64 and 2.6.32-573.el6.x86_64.
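
Why the ordering matters can be shown with a toy model. This is not libvirt or kernel code: `ToyHost` and `start_guest` are hypothetical names, and the model assumes only what this report states, namely that vCPU creation needs memory from a node holding a DMA zone and that `cpuset.mems` limits which nodes the process may allocate from.

```python
class ToyHost:
    """Toy model of a host: some nodes hold DMA zones, a cpuset limits allocation."""

    def __init__(self, dma_nodes, all_nodes):
        self.dma_nodes = set(dma_nodes)
        self.allowed = set(all_nodes)  # cpuset.mems starts unrestricted

    def restrict_mems(self, nodeset):
        # Stand-in for writing the numatune nodeset to cpuset.mems.
        self.allowed = set(nodeset)

    def create_vcpu(self):
        # Stand-in for the KVM allocation that needs a DMA-capable node.
        if not (self.allowed & self.dma_nodes):
            raise MemoryError("kvm_create_vcpu: Cannot allocate memory")


def start_guest(host, nodeset, restrict_first):
    """Start a guest, restricting cpuset.mems either before or after vCPU creation."""
    if restrict_first:
        host.restrict_mems(nodeset)  # ordering that fails on the 6.7 kernel
        host.create_vcpu()
    else:
        host.create_vcpu()           # ordering introduced by the patch
        host.restrict_mems(nodeset)
    return sorted(host.allowed)


# Two-node host with the DMA zones on node 0, guest pinned to node 1.
try:
    start_guest(ToyHost([0], [0, 1]), [1], restrict_first=True)
except MemoryError as err:
    print("restrict first:", err)

print("restrict after:", start_guest(ToyHost([0], [0, 1]), [1], restrict_first=False))
```

In both orderings the guest ends up confined to its nodeset; only the point at which the restriction takes effect differs.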

Comment 18 Luyao Huang 2015-12-17 01:39:42 UTC

I can reproduce this issue with libvirt-0.10.2-54.el6.x86_64, kernel-2.6.32-573.8.1.el6.x86_64:

1. check DMA location:

# cat /proc/zoneinfo | grep DMA
Node 0, zone      DMA
Node 0, zone    DMA32
Node 1, zone    DMA32

2. prepare a guest whose memory is bound to node 3:

# virsh dumpxml test4
...
  <numatune>
    <memory mode='strict' nodeset='3'/>
  </numatune>
...

3. start guest:

# virsh start test4
error: Failed to start domain test4
error: internal error process exited while connecting to monitor: kvm_create_vcpu: Cannot allocate memory
Failed to create vCPU. Check the -smp parameter.
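
The check in step 1 can be combined with the guest's nodeset to predict whether a host is affected. The following Python sketch is an illustration only: `dma_zones_by_node` and `unreachable_zones` are hypothetical helpers, the parser expects lines in the `Node N, zone NAME` form shown above, and treating both DMA and DMA32 as relevant is an assumption, since this report does not say which zone the failing allocation needs.

```python
import re

# Output of `cat /proc/zoneinfo | grep DMA` as shown in step 1.
ZONEINFO_DMA = """\
Node 0, zone      DMA
Node 0, zone    DMA32
Node 1, zone    DMA32
"""


def dma_zones_by_node(zoneinfo_text):
    """Map each DMA-type zone name to the set of nodes that contain it."""
    zones = {}
    for line in zoneinfo_text.splitlines():
        m = re.match(r"Node (\d+), zone\s+(DMA\w*)\s*$", line)
        if m:
            zones.setdefault(m.group(2), set()).add(int(m.group(1)))
    return zones


def unreachable_zones(zones, nodeset):
    """Return the zone names that have no node inside the given nodeset."""
    return sorted(name for name, nodes in zones.items() if not nodes & set(nodeset))


zones = dma_zones_by_node(ZONEINFO_DMA)
# nodeset='3' from the guest XML in step 2.
print(unreachable_zones(zones, {3}))
```

An empty result means every DMA-type zone has at least one node inside the nodeset.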


Verified the fix with libvirt-0.10.2-55.el6.x86_64, kernel-2.6.32-573.8.1.el6.x86_64:

1. check DMA location:

# cat /proc/zoneinfo | grep DMA
Node 0, zone      DMA
Node 0, zone    DMA32
Node 1, zone    DMA32

2. prepare a guest whose memory is bound to node 3:

# virsh dumpxml test4
...
  <numatune>
    <memory mode='strict' nodeset='3'/>
  </numatune>
...

3. start guest:

# virsh start test4
Domain test4 started


Also tested CPU hot-plug:

1.
# virsh vcpucount test4
maximum      config         5
maximum      live           5
current      config         4
current      live           4

# cgget -g cpuset /libvirt/qemu/test4
/libvirt/qemu/test4:
cpuset.memory_spread_slab: 0
cpuset.memory_spread_page: 0
cpuset.memory_pressure: 0
cpuset.memory_migrate: 1
cpuset.sched_relax_domain_level: -1
cpuset.sched_load_balance: 1
cpuset.mem_hardwall: 0
cpuset.mem_exclusive: 0
cpuset.cpu_exclusive: 0
cpuset.mems: 3
cpuset.cpus: 0-23


2.

# virsh setvcpus test4 5

3. check cgroup:

# cgget -g cpuset /libvirt/qemu/test4/vcpu4
/libvirt/qemu/test4/vcpu4:
cpuset.memory_spread_slab: 0
cpuset.memory_spread_page: 0
cpuset.memory_pressure: 0
cpuset.memory_migrate: 1
cpuset.sched_relax_domain_level: -1
cpuset.sched_load_balance: 1
cpuset.mem_hardwall: 0
cpuset.mem_exclusive: 0
cpuset.cpu_exclusive: 0
cpuset.mems: 3
cpuset.cpus: 2
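
The cgroup check in step 3 can be automated by parsing the `cgget` output. This is a minimal sketch: `parse_cgget` is a hypothetical helper, and it assumes the layout shown above, with a group path ending in a colon followed by `key: value` lines.

```python
# Excerpt of the `cgget -g cpuset /libvirt/qemu/test4/vcpu4` output from step 3.
CGGET_OUTPUT = """\
/libvirt/qemu/test4/vcpu4:
cpuset.memory_migrate: 1
cpuset.mem_hardwall: 0
cpuset.mems: 3
cpuset.cpus: 2
"""


def parse_cgget(text):
    """Map each cgroup path to a dict of its parameter names and string values."""
    groups = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("/") and line.endswith(":"):
            # A header line such as '/libvirt/qemu/test4/vcpu4:' opens a new group.
            current = groups.setdefault(line[:-1], {})
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value
    return groups


groups = parse_cgget(CGGET_OUTPUT)
vcpu4 = groups["/libvirt/qemu/test4/vcpu4"]
# The hot-plugged vCPU should inherit the guest's strict nodeset ('3').
print("cpuset.mems matches nodeset:", vcpu4["cpuset.mems"] == "3")
```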

Comment 22 errata-xmlrpc 2016-05-10 19:25:21 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0738.html

Comment 23 Vijay 2017-01-04 03:52:28 UTC

After upgrading libvirt from libvirt-0.10.2-54.el6.x86_64.rpm to libvirt-0.10.2-60.el6.x86_64.rpm on RHEL 6.7, the issue has been resolved.