Bug 1162947

Summary: emulatorpin is not consistent with cgroup in NUMA host
Product: Red Hat Enterprise Linux 7
Component: libvirt
Version: 7.1
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Jincheng Miao <jmiao>
Assignee: Martin Kletzander <mkletzan>
QA Contact: Virtualization Bugs <virt-bugs>
CC: dyuan, honzhang, lhuang, mzhan, rbalakri
Keywords: Upstream
Target Milestone: rc
Fixed In Version: libvirt-1.2.17-5.el7
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-11-19 05:55:38 UTC

Description Jincheng Miao 2014-11-12 03:46:07 UTC
Description of problem:
On a NUMA host, if vcpu placement is set to 'auto', libvirtd follows numad's advice when pinning vcpus, and the guest's emulator thread is likewise pinned to the node set returned by numad. However, the virsh command 'emulatorpin' reports a different value (the full host CPU set) rather than the affinity actually applied through the cgroup.

Versions:
libvirt-1.2.8-6.el7.x86_64
qemu-kvm-1.5.3-77.el7.x86_64
kernel-3.10.0-195.el7.x86_64

How reproducible:
100%

Steps to reproduce:
1. Start a guest with auto placement on a NUMA host:
# virsh dumpxml a
...
  <vcpu placement='auto' current='2'>4</vcpu>
  <numatune>
    <memory mode='strict' placement='auto'/>
  </numatune>
...

# virsh start a
Domain a started
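With placement='auto', libvirtd queries numad for an advisory node set when the domain starts. A similar query can be issued by hand to see the kind of answer numad gives (a sketch; numad must be installed, and the vCPU count and memory size here are illustrative):

# numad -w 2:2048

numad prints the recommended node list, which libvirtd then uses when pinning the vcpus and the emulator thread.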

2. Check emulatorpin:
# virsh emulatorpin a
emulator: CPU Affinity
----------------------------------
       *: 0-31

# cat /sys/fs/cgroup/cpuset/machine.slice/machine-qemu\\x2da.scope/emulator/cpuset.cpus
0-7,16-23


Expected result:
The value from the cgroup and the value reported by the virsh command 'emulatorpin' should be the same.
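The expectation can also be checked mechanically; a minimal sketch, assuming the cgroup v1 cpuset mount and the machine.slice scope name from step 2 (domain name 'a'):

virsh_cpus=$(virsh emulatorpin a | awk '/\*:/ {print $2}')
cgroup_cpus=$(cat /sys/fs/cgroup/cpuset/machine.slice/machine-qemu\\x2da.scope/emulator/cpuset.cpus)
# Before the fix these differ: "0-31" vs "0-7,16-23".
[ "$virsh_cpus" = "$cgroup_cpus" ] && echo consistent || echo MISMATCH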

Comment 1 Jincheng Miao 2014-11-12 09:52:26 UTC
Furthermore, the vcpu pinning is also reported incorrectly, for example:

# virsh vcpupin r70
VCPU: CPU Affinity
----------------------------------
   0: 9
   1: 0-31
   2: 0-31
   3: 0-31

# cat /sys/fs/cgroup/cpuset/machine.slice/machine-qemu\\x2dr70.scope/vcpu1/cpuset.cpus 
8-15,24-31

Comment 3 Martin Kletzander 2015-07-26 18:41:39 UTC
Patch proposed upstream:

https://www.redhat.com/archives/libvir-list/2015-July/msg01022.html

Comment 4 Martin Kletzander 2015-08-13 13:13:20 UTC
Fixed upstream by v1.2.18-113-g92ddffdbd3c9 -- v1.2.18-116-g776924e37649:

commit 92ddffdbd3c91d99f8f7ed9b661388a2c5d36cc2
Author: Martin Kletzander <mkletzan>
Date:   Thu Aug 13 10:03:12 2015 +0200

    qemu: Fix segfault when parsing private domain data

commit 7c8028cda95c3af388f7485e682ed07629bb9e7a
Author: Martin Kletzander <mkletzan>
Date:   Fri Jul 24 19:35:00 2015 +0200

    conf: Pass private data to Parse function of XML options

commit 8ce86722d78d8b2a1e7d9cb29571beb791c9f3d7
Author: Martin Kletzander <mkletzan>
Date:   Fri Jul 24 16:06:33 2015 +0200

    qemu: Keep numad hint after daemon restart

commit 776924e37649f2d47acd805746d5fd9325212ea5
Author: Martin Kletzander <mkletzan>
Date:   Sun Jul 26 18:49:02 2015 +0200

    qemu: Use numad information when getting pin information
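Taken together, these commits make the qemu driver store the nodeset returned by numad in the per-domain status XML and consult it when answering pin queries, so the reported affinity matches the cgroup even after a daemon restart. The persisted hint can be inspected directly (a sketch; the path follows the default qemu driver layout):

# grep numad /run/libvirt/qemu/$guestname.xml
  <numad nodeset='1'/>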

Comment 7 Luyao Huang 2015-08-18 08:25:34 UTC
Verified this bug with libvirt-1.2.17-5.el7.x86_64:

1. Prepare a guest on a NUMA host:

# virsh dumpxml rhel7.0-rhel
...
  <vcpu placement='auto'>4</vcpu>
  <iothreads>2</iothreads>
  <numatune>
    <memory mode='strict' placement='auto'/>
  </numatune>
...

# numactl --hard
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 65514 MB
node 0 free: 59248 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 65536 MB
node 1 free: 56962 MB
node distances:
node   0   1 
  0:  10  11 
  1:  11  10 
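numad selected node 1 for this guest (see the status XML in step 3 below), so the expected affinity everywhere is node 1's CPU list. That list can also be read from sysfs (a sketch, assuming the standard sysfs NUMA layout):

# cat /sys/devices/system/node/node1/cpulist
8-15,24-31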

2. Start the guest and recheck emulatorpin/vcpupin/iothreadpin:

Test with emulatorpin:

# virsh emulatorpin rhel7.0-rhel
emulator: CPU Affinity
----------------------------------
       *: 8-15,24-31

# cgget -g cpuset /machine.slice/machine-qemu\\x2drhel7.0\\x2drhel.scope/emulator
/machine.slice/machine-qemu\x2drhel7.0\x2drhel.scope/emulator:
cpuset.memory_spread_slab: 0
cpuset.memory_spread_page: 0
cpuset.memory_pressure: 0
cpuset.memory_migrate: 1
cpuset.sched_relax_domain_level: -1
cpuset.sched_load_balance: 1
cpuset.mem_hardwall: 0
cpuset.mem_exclusive: 0
cpuset.cpu_exclusive: 0
cpuset.mems: 1
cpuset.cpus: 8-15,24-31

# taskset --cpu-list -p 28198
pid 28198's current affinity list: 8-15,24-31
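PID 28198 above is the main QEMU process of the guest; it can be read from libvirt's pid file (a sketch; the path follows the default qemu driver layout):

# cat /run/libvirt/qemu/rhel7.0-rhel.pid
28198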


Test with vcpupin:

# virsh vcpupin rhel7.0-rhel
VCPU: CPU Affinity
----------------------------------
   0: 8-15,24-31
   1: 8-15,24-31
   2: 8-15,24-31
   3: 8-15,24-31

# cgget -g cpuset /machine.slice/machine-qemu\\x2drhel7.0\\x2drhel.scope/vcpu0
/machine.slice/machine-qemu\x2drhel7.0\x2drhel.scope/vcpu0:
cpuset.memory_spread_slab: 0
cpuset.memory_spread_page: 0
cpuset.memory_pressure: 0
cpuset.memory_migrate: 1
cpuset.sched_relax_domain_level: -1
cpuset.sched_load_balance: 1
cpuset.mem_hardwall: 0
cpuset.mem_exclusive: 0
cpuset.cpu_exclusive: 0
cpuset.mems: 1
cpuset.cpus: 8-15,24-31

# virsh qemu-monitor-command rhel7.0-rhel --hmp info cpus
* CPU #0: pc=0xffffffff81055e06 (halted) thread_id=28204
  CPU #1: pc=0xffffffff81055e06 (halted) thread_id=28205
  CPU #2: pc=0xffffffff81055e06 (halted) thread_id=28206
  CPU #3: pc=0xffffffff81055e06 (halted) thread_id=28207


# taskset --cpu-list -p 28204
pid 28204's current affinity list: 8-15,24-31
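All four vCPU threads can be checked in one pass using the thread IDs from 'info cpus' above (a sketch):

# for tid in 28204 28205 28206 28207; do taskset --cpu-list -p "$tid"; done

Each line should report '8-15,24-31'.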


Test with iothreadinfo:

# virsh iothreadinfo rhel7.0-rhel
 IOThread ID     CPU Affinity   
---------------------------------------------------
 1               8-15,24-31
 2               8-15,24-31

# cgget -g cpuset /machine.slice/machine-qemu\\x2drhel7.0\\x2drhel.scope/iothread1
/machine.slice/machine-qemu\x2drhel7.0\x2drhel.scope/iothread1:
cpuset.memory_spread_slab: 0
cpuset.memory_spread_page: 0
cpuset.memory_pressure: 0
cpuset.memory_migrate: 1
cpuset.sched_relax_domain_level: -1
cpuset.sched_load_balance: 1
cpuset.mem_hardwall: 0
cpuset.mem_exclusive: 0
cpuset.cpu_exclusive: 0
cpuset.mems: 1
cpuset.cpus: 8-15,24-31

# virsh qemu-monitor-command rhel7.0-rhel --pretty '{"execute":"query-iothreads"}'
{
    "return": [
        {
            "thread-id": 28200,
            "id": "iothread1"
        },
        {
            "thread-id": 28201,
            "id": "iothread2"
        }
    ],
    "id": "libvirt-101"
}

# taskset --cpu-list -p 28200
pid 28200's current affinity list: 8-15,24-31
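The iothread thread IDs can also be extracted from the QMP reply and checked in one go (a sketch; assumes jq is available):

# virsh qemu-monitor-command rhel7.0-rhel --pretty '{"execute":"query-iothreads"}' | jq -r '.return[]."thread-id"' | xargs -n1 taskset --cpu-list -p

Each line should report '8-15,24-31'.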

3. Check the status file in /run/libvirt/qemu/$guestname.xml; the fix persists numad's nodeset hint there so it survives a libvirtd restart:

# cat /run/libvirt/qemu/rhel7.0-rhel.xml |grep -3 nodeset
    <device alias='virtio-serial0'/>
    <device alias='usb'/>
  </devices>
  <numad nodeset='1'/>
  <domain type='kvm' id='46'>
    <name>rhel7.0-rhel</name>
    <uuid>67c7a123-5415-4136-af62-a2ee098ba6cd</uuid>

4. Restart libvirtd and recheck emulatorpin/iothreadpin/vcpupin:

# service libvirtd restart
Redirecting to /bin/systemctl restart  libvirtd.service


# virsh iothreadinfo rhel7.0-rhel
 IOThread ID     CPU Affinity   
---------------------------------------------------
 1               8-15,24-31
 2               8-15,24-31

# virsh vcpupin rhel7.0-rhel
VCPU: CPU Affinity
----------------------------------
   0: 8-15,24-31
   1: 8-15,24-31
   2: 8-15,24-31
   3: 8-15,24-31

# virsh emulatorpin rhel7.0-rhel
emulator: CPU Affinity
----------------------------------
       *: 8-15,24-31

Comment 9 errata-xmlrpc 2015-11-19 05:55:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2202.html