Bug 869096

Summary: vcpuinfo doesn't return NUMA CPU affinity properly on machines with multiple NUMA nodes
Product: Red Hat Enterprise Linux 6
Reporter: Luwen Su <lsu>
Component: libvirt
Assignee: Osier Yang <jyang>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Priority: high
Version: 6.4
CC: acathrow, dallan, dyasny, dyuan, gsun, mzhan, tlavigne
Target Milestone: rc
Keywords: Regression
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-0.10.2-6.el6
Doc Type: Bug Fix
Doc Text:
Cause: libvirt supports "emulatorpin" to set the CPU affinity of the qemu domain process. However, when creating the cgroup for the domain process it overrode the CPU affinity set by vcpu "auto" placement, which uses the advisory nodeset from numad.
Consequence: "Auto" placement no longer worked for the qemu domain process, i.e. it broke numad support.
Fix: If the vcpu placement is "auto", inherit the nodeset from numad when setting the affinity while creating the cgroup for the domain process. To avoid conflicting settings, changing the domain process's CPU affinity via the virDomainPinEmulator API is no longer allowed while the vcpu placement is "auto".
Result: vcpu "auto" placement with numad support works again.
Last Closed: 2013-02-21 07:10:49 UTC
Type: Bug
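
The fix makes the domain process inherit the numad-advised nodeset when its cgroup is created. A possible spot check on RHEL 6, assuming the cpuset controller is mounted at the default location and used by libvirt (otherwise /proc/<pid>/status, as in the comments below, is the place to look):

# cat /cgroup/cpuset/libvirt/qemu/<domain>/cpuset.cpus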

Description Luwen Su 2012-10-23 02:49:50 UTC
The virsh vcpuinfo command doesn't show CPU affinity properly.
I'm also not sure whether some SELinux configuration needs to change, given these lines from the guest log:
virSecuritySELinuxSetSecurityProcessLabel:1569 : label=unconfined_u:system_r:svirt_t:s0:c223,c792
virGetUserIDByName:2561 : User record for user '107' does not exist
virGetGroupIDByName:2643 : Group record for group '107' does not exist

The function works correctly in libvirt 0.9.10-21.
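
For quick triage, the numad-advised nodeset in the daemon log can be compared with the live affinity of the qemu process; a minimal check, assuming a single qemu-kvm process so pidof is unambiguous (the same information appears in the steps below):

# grep 'Nodeset returned from numad' /var/log/libvirt/libvirtd.log | tail -1
# grep Cpus_allowed_list /proc/$(pidof qemu-kvm)/status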

Version-Release number of selected component (if applicable):
libvirt-0.10.2-4.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.317.el6.x86_64
kernel-2.6.32-323.el6.x86_64
numad-0.5-3.20120316git.el6.x86_64


How reproducible:
100%


Steps to Reproduce:
1. Check the host NUMA topology:
# numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 2 4 6 8 10
node 0 size: 10205 MB
node 0 free: 9499 MB
node 1 cpus: 12 14 16 18 20 22
node 1 size: 8192 MB
node 1 free: 7826 MB
node 2 cpus: 1 3 5 7 9 11
node 2 size: 6144 MB
node 2 free: 5912 MB
node 3 cpus: 13 15 17 19 21 23
node 3 size: 8175 MB
node 3 free: 7538 MB
node distances:
node   0   1   2   3
  0:  10  20  20  20
  1:  20  10  20  20
  2:  20  20  10  20
  3:  20  20  20  10



2. Start a guest whose XML contains auto placement for vcpus and memory:

  <vcpu placement='auto'>10</vcpu>
  <numatune>
    <memory mode='strict' placement='auto'/>
  </numatune>

# virsh start test

3. Check the logs:
# cat /var/log/libvirt/libvirtd.log | grep Nodeset
2012-10-22 10:28:04.603+0000: 7870: debug : qemuProcessStart:3569 : Nodeset returned from numad: 0,3

# cat /var/log/libvirt/qemu/test.log
2012-10-22 10:28:04.718+0000: 7997: debug : qemuProcessInitCpuAffinity:1895 : Setting CPU affinity
2012-10-22 10:28:04.725+0000: 7997: debug : qemuProcessInitCpuAffinity:1915 : Set CPU affinity with advisory nodeset from numad
2012-10-22 10:28:04.726+0000: 7997: debug : qemuProcessInitNumaMemoryPolicy:1765 : Set NUMA memory policy with advisory nodeset from numad
2012-10-22 10:28:04.726+0000: 7997: debug : qemuProcessHook:2675 : Setting up security labelling
2012-10-22 10:28:04.726+0000: 7997: debug : virSecuritySELinuxSetSecurityProcessLabel:1569 : label=unconfined_u:system_r:svirt_t:s0:c223,c792
2012-10-22 10:28:04.727+0000: 7997: debug : virGetUserIDByName:2561 : User record for user '107' does not exist
2012-10-22 10:28:04.727+0000: 7997: debug : virGetGroupIDByName:2643 : Group record for group '107' does not exist
2012-10-22 10:28:04.727+0000: 7997: debug : virSecurityDACSetProcessLabel:869 : Dropping privileges of DEF to 107:107
2012-10-22 10:28:04.728+0000: 7997: debug : qemuProcessHook:2682 : Hook complete ret=0

# cat /proc/`pidof qemu-kvm`/status
Cpus_allowed:        ffffff
Cpus_allowed_list:        0-23
Mems_allowed:        00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000009
Mems_allowed_list:        0,3
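
Note: the trailing 9 in Mems_allowed is a node bitmask. 0x9 is binary 1001, i.e. bits 0 and 3 set, which matches Mems_allowed_list: 0,3. One way to decode it:

# echo 'obase=2; ibase=16; 9' | bc
1001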

# virsh vcpuinfo test
VCPU:           9
CPU:            2
State:          running
CPU time:       3.3s
CPU Affinity:   yyyyyyyyyyyyyyyyyyyyyyyy------------------------

4. Run a comparison script (compare.sh) that prints the expected affinity map: "y" for every CPU in the numad-advised nodes 0 and 3, "-" otherwise:
#!/bin/bash
# Collect the CPU lists of NUMA nodes 0 and 3 (the numad-advised nodeset).
> cpus
for i in 0 3; do numactl --hardware | grep "node $i cpus:" >> cpus; done
awk -F':' '{print $2}' cpus > cpus2
# Print "y" for every host CPU that belongs to those nodes, "-" otherwise.
for i in {0..23}; do
    if grep -q "\b$i\b" cpus2; then
        echo -n "y"
    else
        echo -n "-"
    fi
done
echo
rm -f cpus cpus2
  
# bash compare.sh
y-y-y-y-y-y--y-y-y-y-y-y

Actual results:
vcpuinfo reports affinity to every host CPU instead of only the CPUs in the numad-advised nodes 0 and 3.

Expected results:
CPU affinity should cover only the CPUs of nodes 0 and 3, i.e. y-y-y-y-y-y--y-y-y-y-y-y (node 0 holds the even CPUs 0-10, node 3 the odd CPUs 13-23), as the comparison script prints.

Additional info:
In 0.9.10-21, all results are as expected.
Guest log:
2012-10-22 10:42:11.054+0000: 9158: debug : qemuProcessInitCpuAffinity:1731 : Setting CPU affinity
2012-10-22 10:42:11.059+0000: 9158: debug : qemuProcessInitCpuAffinity:1749 : Set CPU affinity with advisory nodeset from numad
2012-10-22 10:42:11.059+0000: 9158: debug : qemuProcessInitNumaMemoryPolicy:1599 : Set NUMA memory policy with advisory nodeset from numad
2012-10-22 10:42:11.059+0000: 9158: debug : qemuProcessHook:2512 : Setting up security labelling
2012-10-22 10:42:11.059+0000: 9158: debug : virSecurityDACSetProcessLabel:637 : Dropping privileges of DEF to 107:107
2012-10-22 10:42:11.060+0000: 9158: debug : qemuProcessHook:2519 : Hook complete ret=0

# virsh vcpuinfo test
...
VCPU:           9
CPU:            2
State:          running
CPU time:       3.9s
CPU Affinity:   y-y-y-y-y-y--y-y-y-y-y-y

Comment 5 Wayne Sun 2012-10-30 09:02:17 UTC
pkgs:
libvirt-0.10.2-6.el6.x86_64
kernel-2.6.32-330.el6.x86_64
qemu-kvm-0.12.1.2-2.316.el6.x86_64

steps:
1. check host numa node
# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 65514 MB
node 0 free: 62387 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 65536 MB
node 1 free: 61425 MB
node distances:
node   0   1 
  0:  10  11 
  1:  11  10 

2. Start a domain with placement set to auto.
# virsh start libvirt_test_api
Domain libvirt_test_api started

# virsh dumpxml libvirt_test_api
...
  <vcpu placement='auto'>2</vcpu>
  <numatune>
    <memory mode='strict' placement='auto'/>
  </numatune>
...

3. check log
# grep Nodeset /var/log/libvirt/libvirtd.log
2012-10-30 07:53:56.438+0000: 28878: debug : qemuProcessStart:3605 : Nodeset returned from numad: 0

# grep numad /var/log/libvirt/qemu/libvirt_test_api.log
2012-10-30 07:53:56.564+0000: 6159: debug : qemuProcessInitCpuAffinity:1960 : Set CPU affinity with advisory nodeset from numad
2012-10-30 07:53:56.564+0000: 6159: debug : qemuProcessInitNumaMemoryPolicy:1779 : Set NUMA memory policy with advisory nodeset from numad

4. check vcpuinfo
# virsh vcpuinfo libvirt_test_api
VCPU:           0
CPU:            21
State:          running
CPU time:       10.9s
CPU Affinity:   yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy

VCPU:           1
CPU:            0
State:          running
CPU time:       5.4s
CPU Affinity:   yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy

5. check process
# ps aux|grep qemu
qemu      8441 36.1  0.0 1396368 26200 ?       Sl   16:26   0:02 /usr/libexec/qemu-kvm -name libvirt_test_api -S -M rhel6.4.0 -enable-kvm -m 1024 -smp 2,sockets=2,cores=1,threads=1 -uuid 05867c1a-afeb-300e-e55e-2673391ae080 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/libvirt_test_api.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/libvirt-test-api,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=54:52:00:45:c3:8a,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:1 -vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6

# pstree -apnh 8441
qemu-kvm,8441 -name libvirt_test_api -S -M rhel6.4.0 -enable-kvm -m 1024 -smp 2,sockets=2,cores=1,threads=1 -uuid 05867c1a-afeb-300e-e55e-2673391ae080 -nodefconfig -nodefaults -chardevsocket,id=charmonitor,path
  ├─{qemu-kvm},8461
  └─{qemu-kvm},8462

# grep Cpus_allowed_list /proc/8441/status 
Cpus_allowed_list:	0-7,16-23

The domain main process's Cpus_allowed_list is as expected (the CPUs of node 0).
# sh test.sh 
yyyyyyyy--------yyyyyyyy--------
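
test.sh here is presumably the comparison script from the description, adapted to this host; a minimal sketch, assuming node 0 is the numad-advised node and 32 host CPUs:

#!/bin/bash
# Print "y" for each CPU belonging to NUMA node 0, "-" otherwise.
numactl --hardware | grep "node 0 cpus:" | awk -F':' '{print $2}' > cpus2
for i in {0..31}; do
    if grep -q "\b$i\b" cpus2; then echo -n "y"; else echo -n "-"; fi
done
echo
rm -f cpus2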


# grep Cpus_allowed_list /proc/8461/status 
Cpus_allowed_list:	0-31
# grep Cpus_allowed_list /proc/8462/status 
Cpus_allowed_list:	0-31
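
Equivalently, the same affinity masks can be read with taskset from util-linux:

# taskset -pc 8441
# taskset -pc 8461
# taskset -pc 8462

The main process should report 0-7,16-23 and the two vcpu threads 0-31, matching the /proc status output above.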

The guest vcpu threads are untouched and pinned to all available host CPUs, so vcpuinfo shows the right info.

6. do vcpupin
# virsh vcpupin libvirt_test_api 1 2

# virsh vcpupin libvirt_test_api 0 22

# virsh vcpuinfo libvirt_test_api
VCPU:           0
CPU:            22
State:          running
CPU time:       11.0s
CPU Affinity:   ----------------------y---------

VCPU:           1
CPU:            2
State:          running
CPU time:       5.5s
CPU Affinity:   --y-----------------------------


This is as expected. The vcpupin setting will not show up in the XML, since it cannot appear together with numatune at the same time; but now that the emulator main process is handled separately from the guest vcpu processes, that restriction should be split up as well. That is another bug, which Osier will help to file.
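
Per the fix described in the Doc Text, changing the emulator process affinity while vcpu placement is 'auto' should now be rejected; a quick negative check (the exact error text may vary):

# virsh emulatorpin libvirt_test_api 0-3

This is expected to fail instead of overriding the numad-advised affinity.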

As the steps above show, this is working now.

Also tested with libvirt-0.10.2-4.el6.x86_64; the problem still exists there.

Comment 6 errata-xmlrpc 2013-02-21 07:10:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0276.html