Bug 1265970

Summary: qemu process can't start if memory nodeset excludes NUMA node 0
Product: Red Hat Enterprise Linux 6
Component: libvirt
Version: 6.7
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Reporter: Jan Kurik <jkurik>
Assignee: Ján Tomko <jtomko>
QA Contact: Virtualization Bugs <virt-bugs>
CC: dyuan, jdenemar, jsuchane, jtomko, lhuang, mtessun, rbalakri, rhodain, tdosek
Target Milestone: rc
Keywords: Regression, ZStream
Fixed In Version: libvirt-0.10.2-54.el6_7.2
Doc Type: Bug Fix
Doc Text:
Previously, the cgroup memory node limits were applied to the KVM kernel module's allocations as well. As a consequence, KVM could not allocate memory from the Direct Memory Access (DMA) zones unless the nodes containing them were included in the limits set by the libvirt library, which led to a Quick Emulator (QEMU) process failure. With this update, the cgroup limits are applied only after KVM has allocated its memory, and the QEMU process now starts as expected.
Clone Of: 1263263
Bug Depends On: 1263263
Last Closed: 2015-11-10 09:14:48 UTC

Description Jan Kurik 2015-09-24 08:44:00 UTC
This bug has been copied from bug #1263263 and has been proposed
to be backported to 6.7 z-stream (EUS).

Comment 9 Luyao Huang 2015-10-12 07:22:19 UTC
I can reproduce this issue with libvirt-0.10.2-54.el6.x86_64:

1. Prepare a RHEL 6 host with the DMA zones on node 0:

# cat /proc/zoneinfo |grep DMA
Node 0, zone      DMA
Node 0, zone    DMA32
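
For the full per-node zone layout, the same file can be grepped for all zone header lines; on a two-node x86_64 host like this one, node 1 typically has only a Normal zone, which is why a strict binding to node 1 cuts KVM's kernel allocations off from DMA-capable memory (the output below is illustrative):

# grep zone /proc/zoneinfo
Node 0, zone      DMA
Node 0, zone    DMA32
Node 0, zone   Normal
Node 1, zone   Normal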

# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 65514 MB
node 0 free: 63279 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 65536 MB
node 1 free: 63567 MB
node distances:
node   0   1 
  0:  10  11 
  1:  11  10 

2. Prepare a guest whose memory is bound to node 1:

# virsh dumpxml r6
<domain type='kvm'>
  <name>r6</name>
  <uuid>63b566d4-40e9-4152-b784-f46cc953abb0</uuid>
  <memory unit='KiB'>40240000</memory>
  <currentMemory unit='KiB'>30240000</currentMemory>
  <vcpu placement='static' current='2'>4</vcpu>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>
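
The same strict binding can also be set without hand-editing the XML, e.g. with virsh numatune against the persistent config:

# virsh numatune r6 --mode strict --nodeset 1 --config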


3. Start it:

# virsh start r6
error: Failed to start domain r6
error: internal error process exited while connecting to monitor: char device redirected to /dev/pts/2
char device redirected to /dev/pts/3
kvm_create_vcpu: Cannot allocate memory
Failed to create vCPU. Check the -smp parameter.


Then try to verify this issue with libvirt-0.10.2-54.el6_7.1.x86_64:


1. Prepare a RHEL 6 host with the DMA zones on node 0:

# cat /proc/zoneinfo |grep DMA
Node 0, zone      DMA
Node 0, zone    DMA32

# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 65514 MB
node 0 free: 63279 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 65536 MB
node 1 free: 63567 MB
node distances:
node   0   1 
  0:  10  11 
  1:  11  10 

2. Prepare a guest whose memory is bound to node 1:

# virsh dumpxml r6
<domain type='kvm'>
  <name>r6</name>
  <uuid>63b566d4-40e9-4152-b784-f46cc953abb0</uuid>
  <memory unit='KiB'>40240000</memory>
  <currentMemory unit='KiB'>30240000</currentMemory>
  <vcpu placement='static' current='2'>4</vcpu>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>


3. Start it:

# virsh start r6
Domain r6 started

# virsh numatune r6
numa_mode      : strict
numa_nodeset   : 1
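
As an extra host-side check (assuming the host's numastat supports per-process reporting, i.e. numactl 2.0.6 or newer), the qemu-kvm process's pages can be confirmed to live on node 1:

# numastat -p $(pidof qemu-kvm)

The per-node columns should show essentially all of the guest's memory on node 1.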

4. But there is a problem with vCPU hot-plug:

# virsh setvcpus r6 3
error: Unable to read from monitor: Connection reset by peer

The guest log shows:

kvm_create_vcpu: Cannot allocate memory
Failed to create vCPU. Check the -smp parameter.
2015-10-12 07:17:36.701+0000: shutting down

There was a similar bug in RHEL 7 earlier:

https://bugzilla.redhat.com/show_bug.cgi?id=1161540

Comment 10 Luyao Huang 2015-10-12 07:25:25 UTC
Hi Jan,

Could you please check the remaining issue in comment 9? Is it worth fixing in RHEL 6, and should we open a new bug for it? Thanks a lot for your reply.

Luyao

Comment 11 Ján Tomko 2015-10-12 11:24:45 UTC
We should fix CPU hotplug as well (especially if the attempt kills the domain).
Let's try to backport the additional patches.

Comment 17 Luyao Huang 2015-10-16 02:20:27 UTC
Verify this bug with libvirt-0.10.2-54.el6_7.2:

1. Prepare a guest whose memory is bound to host node 1 (and the DMA zones are not on node 1):

# cat /proc/zoneinfo |grep DMA
Node 0, zone      DMA
Node 0, zone    DMA32

# virsh dumpxml r6
<domain type='kvm'>
  <name>r6</name>
  <uuid>63b566d4-40e9-4152-b784-f46cc953abb0</uuid>
  <memory unit='KiB'>4024000</memory>
  <currentMemory unit='KiB'>3024000</currentMemory>
  <vcpu placement='static' current='2'>4</vcpu>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>


2. Start the guest:

# virsh start r6
Domain r6 started

3. Try to hot-plug a vCPU:

# virsh setvcpus r6 3
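
setvcpus prints nothing on success; the new live count can also be confirmed from the host with virsh vcpucount (output illustrative):

# virsh vcpucount r6
maximum      config         4
maximum      live           4
current      config         2
current      live           3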


4. Check the vCPU count in the guest:

IN GUEST:

# lscpu 
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                3
On-line CPU(s) list:   0-2
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             3
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 44
Stepping:              1
CPU MHz:               1994.984
BogoMIPS:              3989.96
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              4096K
NUMA node0 CPU(s):     0-2

5. Check the cgroups:

# lscgroup 
cpuset:/
cpuset:/libvirt
cpuset:/libvirt/lxc
cpuset:/libvirt/qemu
cpuset:/libvirt/qemu/r6
cpuset:/libvirt/qemu/r6/vcpu2
cpuset:/libvirt/qemu/r6/emulator
cpuset:/libvirt/qemu/r6/vcpu1
cpuset:/libvirt/qemu/r6/vcpu0
...

# cgget -g cpuset /libvirt/qemu/r6/vcpu2
/libvirt/qemu/r6/vcpu2:
cpuset.memory_spread_slab: 0
cpuset.memory_spread_page: 0
cpuset.memory_pressure: 0
cpuset.memory_migrate: 1
cpuset.sched_relax_domain_level: -1
cpuset.sched_load_balance: 1
cpuset.mem_hardwall: 0
cpuset.mem_exclusive: 0
cpuset.cpu_exclusive: 0
cpuset.mems: 1
cpuset.cpus: 0-15

# cgget -g cpuset /libvirt/qemu/r6
/libvirt/qemu/r6:
cpuset.memory_spread_slab: 0
cpuset.memory_spread_page: 0
cpuset.memory_pressure: 0
cpuset.memory_migrate: 1
cpuset.sched_relax_domain_level: -1
cpuset.sched_load_balance: 1
cpuset.mem_hardwall: 0
cpuset.mem_exclusive: 0
cpuset.cpu_exclusive: 0
cpuset.mems: 1
cpuset.cpus: 0-15

# cgget -g cpuset /libvirt/qemu/r6/emulator
/libvirt/qemu/r6/emulator:
cpuset.memory_spread_slab: 0
cpuset.memory_spread_page: 0
cpuset.memory_pressure: 0
cpuset.memory_migrate: 1
cpuset.sched_relax_domain_level: -1
cpuset.sched_load_balance: 1
cpuset.mem_hardwall: 0
cpuset.mem_exclusive: 0
cpuset.cpu_exclusive: 0
cpuset.mems: 1
cpuset.cpus: 0-15
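
The same limits can also be read directly from the cgroup filesystem (assuming the RHEL 6 default mount point under /cgroup):

# cat /cgroup/cpuset/libvirt/qemu/r6/cpuset.mems
1
# cat /cgroup/cpuset/libvirt/qemu/r6/vcpu2/cpuset.mems
1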


Also test with the cpuset controller disabled in qemu.conf:
1. Remove "cpuset" from cgroup_controllers and restart libvirtd:
# vim /etc/libvirt/qemu.conf
cgroup_controllers = [ "cpu", "devices", "memory", "blkio", "cpuacct" ]

# service libvirtd restart
Stopping libvirtd daemon:                                  [  OK  ]
Starting libvirtd daemon:                                  [  OK  ]

2. The guest XML is unchanged:
# virsh dumpxml r6
<domain type='kvm'>
  <name>r6</name>
  <uuid>63b566d4-40e9-4152-b784-f46cc953abb0</uuid>
  <memory unit='KiB'>4024000</memory>
  <currentMemory unit='KiB'>3024000</currentMemory>
  <vcpu placement='static' current='2'>4</vcpu>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>

3. Start the guest:

# virsh start r6
Domain r6 started
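
With "cpuset" removed from cgroup_controllers, libvirt should create no cpuset groups for the domain; a quick check with lscgroup (no output expected):

# lscgroup | grep cpuset:/libvirt/qemu/r6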

4. Hot-plug a vCPU:
# virsh setvcpus r6 3

5. Check the vCPU count in the guest:
IN GUEST:

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                3
On-line CPU(s) list:   0-2
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             3
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 44
Stepping:              1
CPU MHz:               1994.984
BogoMIPS:              3989.96
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              4096K
NUMA node0 CPU(s):     0-2

Comment 21 errata-xmlrpc 2015-11-10 09:14:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2000.html