Red Hat Bugzilla – Bug 810157
numad: Pre-set memory policy and convert nodeset from numad to CPUs list before affinity setting
Last modified: 2012-06-20 02:52:01 EDT
Description of problem:
numad's documentation was a bit confusing: libvirt expects a CPU list, not a node list, so there are two ways to fix the problem: 1) numad returns a CPU list instead, or 2) libvirt converts the node list into a CPU list, and numad updates its documentation. We agreed to go forward with 2).

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
For a box with n nodes and m CPUs (let's suppose m = 16 * n), the domain process will be pinned to only part of CPU0...CPUn, which will definitely cause significantly degraded performance.

Expected results:

Additional info:
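For reference, option 2) amounts to expanding a nodeset string such as "1-2" into the CPU list that affinity setting needs. The following is only a minimal standalone libnuma sketch of that conversion; the nodeset string, the program, and its output format are illustrative assumptions, not libvirt's actual code. Compile with -lnuma:

/* Hypothetical illustration of fix option 2): expand a NUMA nodeset
 * (as numad might advise, e.g. "1-2") into the corresponding CPU list
 * before setting affinity.  Not libvirt code; just a libnuma sketch. */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available\n");
        return 1;
    }

    /* Nodeset string as numad might return it; "1-2" is only an example. */
    struct bitmask *nodes = numa_parse_nodestring("1-2");
    if (!nodes)
        return 1;

    int ncpus = numa_num_possible_cpus();
    struct bitmask *cpus = numa_allocate_cpumask();

    for (int node = 0; node <= numa_max_node(); node++) {
        if (!numa_bitmask_isbitset(nodes, node))
            continue;
        /* numa_node_to_cpus() fills 'tmp' with the CPUs of this node. */
        struct bitmask *tmp = numa_allocate_cpumask();
        if (numa_node_to_cpus(node, tmp) == 0) {
            for (int c = 0; c < ncpus; c++)
                if (numa_bitmask_isbitset(tmp, c))
                    numa_bitmask_setbit(cpus, c);
        }
        numa_free_cpumask(tmp);
    }

    /* Print the combined CPU list for the whole nodeset. */
    for (int c = 0; c < ncpus; c++)
        if (numa_bitmask_isbitset(cpus, c))
            printf("%d ", c);
    printf("\n");

    numa_free_cpumask(cpus);
    numa_bitmask_free(nodes);
    return 0;
}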
pkgs:
libvirt-0.9.10-12.el6.x86_64
numad-0.5-3.20120316git.el6.x86_64
kernel-2.6.32-250.el6.x86_64
qemu-kvm-0.12.1.2-2.275.el6.x86_64

steps:
1. Prepare a domain with the vcpu element set to auto placement:
# virsh dumpxml rhel6u3-501|grep vcpu
  <vcpu placement='auto'>24</vcpu>

2. Check the domain vcpuinfo:
# virsh start rhel6u3-501
# virsh vcpuinfo rhel6u3-501
VCPU:  0   CPU:  2   State: running   CPU time: 1.5s   CPU Affinity: yyyyyyyyyyyyyyyy----------------yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
VCPU:  1   CPU: 39   State: running   CPU time: 0.0s   CPU Affinity: yyyyyyyyyyyyyyyy----------------yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
VCPU:  2   CPU: 36   State: running   CPU time: 0.0s   CPU Affinity: yyyyyyyyyyyyyyyy----------------yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
VCPU:  3   CPU: 12   State: running   CPU time: 0.0s   CPU Affinity: yyyyyyyyyyyyyyyy----------------yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
VCPU:  4   CPU: 32   State: running   CPU time: 0.0s   CPU Affinity: yyyyyyyyyyyyyyyy----------------yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
VCPU:  5   CPU:  0   State: running   CPU time: 0.0s   CPU Affinity: yyyyyyyyyyyyyyyy----------------yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
VCPU:  6   CPU: 52   State: running   CPU time: 0.0s   CPU Affinity: yyyyyyyyyyyyyyyy----------------yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
VCPU:  7   CPU: 33   State: running   CPU time: 0.0s   CPU Affinity: yyyyyyyyyyyyyyyy----------------yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
VCPU:  8   CPU:  1   State: running   CPU time: 0.0s   CPU Affinity: yyyyyyyyyyyyyyyy----------------yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
VCPU:  9   CPU:  4   State: running   CPU time: 0.0s   CPU Affinity: yyyyyyyyyyyyyyyy----------------yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
VCPU: 10   CPU: 12   State: running   CPU time: 0.0s   CPU Affinity: yyyyyyyyyyyyyyyy----------------yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
VCPU: 11   CPU: 48   State: running   CPU time: 0.0s   CPU Affinity: yyyyyyyyyyyyyyyy----------------yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
VCPU: 12   CPU:  2   State: running   CPU time: 0.0s   CPU Affinity: yyyyyyyyyyyyyyyy----------------yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
VCPU: 13   CPU: 37   State: running   CPU time: 0.0s   CPU Affinity: yyyyyyyyyyyyyyyy----------------yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
VCPU: 14   CPU:  9   State: running   CPU time: 0.0s   CPU Affinity: yyyyyyyyyyyyyyyy----------------yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
VCPU: 15   CPU: 41   State: running   CPU time: 0.0s   CPU Affinity: yyyyyyyyyyyyyyyy----------------yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
VCPU: 16   CPU: 56   State: running   CPU time: 0.0s   CPU Affinity: yyyyyyyyyyyyyyyy----------------yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
VCPU: 17   CPU:  8   State: running   CPU time: 0.0s   CPU Affinity: yyyyyyyyyyyyyyyy----------------yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
VCPU: 18   CPU: 40   State: running   CPU time: 0.0s   CPU Affinity: yyyyyyyyyyyyyyyy----------------yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
VCPU: 19   CPU: 13   State: running   CPU time: 0.0s   CPU Affinity: yyyyyyyyyyyyyyyy----------------yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
VCPU: 20   CPU: 44   State: running   CPU time: 0.0s   CPU Affinity: yyyyyyyyyyyyyyyy----------------yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
VCPU: 21   CPU: 43   State: running   CPU time: 0.0s   CPU Affinity: yyyyyyyyyyyyyyyy----------------yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
VCPU: 22   CPU:  3   State: running   CPU time: 0.0s   CPU Affinity: yyyyyyyyyyyyyyyy----------------yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
VCPU: 23   CPU:  6   State: running   CPU time: 0.0s   CPU Affinity: yyyyyyyyyyyyyyyy----------------yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy

After the numad default refresh interval, recheck:
# virsh vcpuinfo rhel6u3-501
VCPU:  0   CPU: 35   State: running   CPU time: 16.4s   CPU Affinity: ---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y
VCPU:  1   CPU: 47   State: running   CPU time: 1.0s    CPU Affinity: ---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y
VCPU:  2   CPU: 55   State: running   CPU time: 2.1s    CPU Affinity: ---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y
VCPU:  3   CPU: 39   State: running   CPU time: 1.5s    CPU Affinity: ---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y
VCPU:  4   CPU: 39   State: running   CPU time: 1.7s    CPU Affinity: ---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y
VCPU:  5   CPU:  3   State: running   CPU time: 0.9s    CPU Affinity: ---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y
VCPU:  6   CPU: 23   State: running   CPU time: 1.4s    CPU Affinity: ---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y
VCPU:  7   CPU: 11   State: running   CPU time: 1.2s    CPU Affinity: ---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y
VCPU:  8   CPU: 43   State: running   CPU time: 2.1s    CPU Affinity: ---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y
VCPU:  9   CPU:  3   State: running   CPU time: 1.0s    CPU Affinity: ---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y
VCPU: 10   CPU: 47   State: running   CPU time: 1.0s    CPU Affinity: ---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y
VCPU: 11   CPU:  3   State: running   CPU time: 1.4s    CPU Affinity: ---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y
VCPU: 12   CPU: 15   State: running   CPU time: 0.9s    CPU Affinity: ---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y
VCPU: 13   CPU: 15   State: running   CPU time: 0.7s    CPU Affinity: ---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y
VCPU: 14   CPU:  3   State: running   CPU time: 0.6s    CPU Affinity: ---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y
VCPU: 15   CPU: 43   State: running   CPU time: 0.6s    CPU Affinity: ---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y
VCPU: 16   CPU: 39   State: running   CPU time: 0.8s    CPU Affinity: ---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y
VCPU: 17   CPU:  3   State: running   CPU time: 0.6s    CPU Affinity: ---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y
VCPU: 18   CPU: 11   State: running   CPU time: 0.6s    CPU Affinity: ---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y
VCPU: 19   CPU:  7   State: running   CPU time: 0.5s    CPU Affinity: ---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y
VCPU: 20   CPU:  3   State: running   CPU time: 1.4s    CPU Affinity: ---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y
VCPU: 21   CPU: 51   State: running   CPU time: 0.5s    CPU Affinity: ---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y
VCPU: 22   CPU:  7   State: running   CPU time: 0.4s    CPU Affinity: ---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y
VCPU: 23   CPU: 23   State: running   CPU time: 0.4s    CPU Affinity: ---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y---y

The VCPU-to-CPU mapping changed after the rescan, but the CPU Affinity always shows the wrong pinning; it follows the patterns shown in the first and second checks, and in the end it stays as in the second check. Something might be wrong here.
Hi Bill,

I suspect this is caused by numad rebalancing the affinity dynamically, so "virsh vcpuinfo" (which uses sched_getaffinity underneath) displays different results over time. I'd like to confirm with you: is that true?

Osier
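For context, the affinity column that "virsh vcpuinfo" prints ultimately reflects the mask returned by sched_getaffinity(), so if numad rebalances a process the reported mask can change between calls. A minimal standalone sketch (not libvirt code) that reads the calling process's mask and prints it in a similar y/- form:

/* Sketch of the sched_getaffinity() call underlying the vcpuinfo output.
 * PID 0 means the calling process; pass a qemu thread ID instead to
 * inspect a running guest. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/sysinfo.h>

int main(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);

    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_getaffinity");
        return 1;
    }

    /* Print one 'y' or '-' per online CPU, like the vcpuinfo output above. */
    int ncpus = get_nprocs();
    for (int cpu = 0; cpu < ncpus; cpu++)
        putchar(CPU_ISSET(cpu, &mask) ? 'y' : '-');
    putchar('\n');
    return 0;
}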
(In reply to comment #0)
> Description of problem:
> numad's documentation was a bit confusing: libvirt expects a CPU list, not a
> node list, so there are two ways to fix the problem: 1) numad returns a CPU
> list instead, or 2) libvirt converts the node list into a CPU list, and
> numad updates its documentation. We agreed to go forward with 2).

Fixes that go together with this BZ:
1) Pre-set the memory policy of the domain process with the advisory nodeset from numad, using libnuma's API.
2) Divide the currentMemory value by 1024 before passing it to numad's command line, since libvirt stores the value in KB in memory.

Patches posted internally:
http://post-office.corp.redhat.com/archives/rhvirt-patches/2012-May/msg00201.html

Check the documentation that comes with the patches to see how to fully drive numad now.
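To make fixes 1) and 2) concrete, here is a minimal standalone sketch, not the posted patch: it binds memory allocation to an advisory nodeset via libnuma and shows the KB-to-MB conversion applied before invoking numad. The nodeset string "1-2", the 4 GiB example value, and the printed numad command line are assumptions for illustration only. Compile with -lnuma:

/* Sketch of fix 1): pre-set the process memory policy to the nodeset numad
 * advised, using libnuma, before the guest allocates its memory.
 * This is NOT the actual libvirt patch; "1-2" is only an example nodeset. */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this host\n");
        return 1;
    }

    struct bitmask *nodes = numa_parse_nodestring("1-2");
    if (!nodes) {
        fprintf(stderr, "failed to parse nodeset\n");
        return 1;
    }

    /* Strict binding, analogous to <memory mode='strict' placement='auto'/>:
     * allocations may only be satisfied from the advised nodes. */
    numa_set_bind_policy(1);
    numa_set_membind(nodes);
    numa_bitmask_free(nodes);

    /* Fix 2) is a unit conversion: libvirt keeps currentMemory in KB, while
     * the amount handed to numad is expected in MB, so divide by 1024 first.
     * The value and command line below are purely illustrative. */
    unsigned long long currentMemoryKB = 4194304;   /* e.g. a 4 GiB guest */
    unsigned long long memoryMB = currentMemoryKB / 1024;
    printf("would run: numad -w 24:%llu\n", memoryMB);

    return 0;
}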
Okay, the default placement based on both memory and CPU affinity should be improved in build libvirt-0.9.10-18.el6. I suggest using that version for further testing.

Daniel
pkgs:
libvirt-0.9.10-18.el6NumadBuild.x86_64
numad-0.5-3.20120316git.el6.x86_64
kernel-2.6.32-269.el6.x86_64
qemu-kvm-0.12.1.2-2.290.el6.x86_64

steps:
1. Prepare a domain with the vcpu element set to auto placement:
# virsh dumpxml rhel6u2
...
  <vcpu placement='auto'>24</vcpu>
...

2. Start the domain and check the XML:
# virsh start rhel6u2
Domain rhel6u2 started
# virsh dumpxml rhel6u2
...
  <vcpu placement='auto'>24</vcpu>
  <numatune>
    <memory mode='strict' placement='auto'/>
  </numatune>
...

3. Check vcpuinfo:
# virsh vcpuinfo rhel6u2
VCPU:  0   CPU:  1   State: running   CPU time: 10.6s   CPU Affinity: yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy--------------------
VCPU:  1   CPU: 40   State: running   CPU time: 1.1s    CPU Affinity: yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy--------------------
VCPU:  2   CPU: 14   State: running   CPU time: 1.1s    CPU Affinity: yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy--------------------
VCPU:  3   CPU: 46   State: running   CPU time: 1.1s    CPU Affinity: yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy--------------------
VCPU:  4   CPU:  6   State: running   CPU time: 1.4s    CPU Affinity: yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy--------------------
VCPU:  5   CPU: 59   State: running   CPU time: 1.6s    CPU Affinity: yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy--------------------
VCPU:  6   CPU: 19   State: running   CPU time: 1.8s    CPU Affinity: yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy--------------------
VCPU:  7   CPU: 13   State: running   CPU time: 1.1s    CPU Affinity: yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy--------------------
VCPU:  8   CPU: 53   State: running   CPU time: 1.0s    CPU Affinity: yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy--------------------
VCPU:  9   CPU:  1   State: running   CPU time: 1.2s    CPU Affinity: yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy--------------------
VCPU: 10   CPU: 11   State: running   CPU time: 1.3s    CPU Affinity: yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy--------------------
VCPU: 11   CPU: 40   State: running   CPU time: 1.4s    CPU Affinity: yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy--------------------
VCPU: 12   CPU:  4   State: running   CPU time: 1.3s    CPU Affinity: yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy--------------------
VCPU: 13   CPU: 40   State: running   CPU time: 2.3s    CPU Affinity: yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy--------------------
VCPU: 14   CPU: 42   State: running   CPU time: 1.6s    CPU Affinity: yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy--------------------
VCPU: 15   CPU: 41   State: running   CPU time: 1.5s    CPU Affinity: yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy--------------------
VCPU: 16   CPU: 42   State: running   CPU time: 1.2s    CPU Affinity: yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy--------------------
VCPU: 17   CPU: 12   State: running   CPU time: 1.1s    CPU Affinity: yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy--------------------
VCPU: 18   CPU: 40   State: running   CPU time: 1.2s    CPU Affinity: yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy--------------------
VCPU: 19   CPU: 10   State: running   CPU time: 1.3s    CPU Affinity: yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy--------------------
VCPU: 20   CPU: 50   State: running   CPU time: 1.2s    CPU Affinity: yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy--------------------
VCPU: 21   CPU:  2   State: running   CPU time: 1.2s    CPU Affinity: yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy--------------------
VCPU: 22   CPU: 48   State: running   CPU time: 1.3s    CPU Affinity: yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy--------------------
VCPU: 23   CPU:  2   State: running   CPU time: 1.5s    CPU Affinity: yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy--------------------

The CPU pinning is within the CPU Affinity range, so this is working correctly now.

4. Destroy the domain and edit the domain XML as:
# virsh destroy rhel6u2
# virsh edit rhel6u2
...
  <vcpu placement='auto'>24</vcpu>
  <numatune>
    <memory mode='interleave'/>
  </numatune>
...

5. Start the domain and check the XML:
# virsh start rhel6u2
Domain rhel6u2 started
# virsh dumpxml rhel6u2
...
  <vcpu placement='auto'>24</vcpu>
  <numatune>
    <memory mode='interleave' placement='auto'/>
  </numatune>
...

6. Check vcpuinfo:
# virsh vcpuinfo rhel6u2
The result is similar to step 5.

7. Destroy and edit the domain:
# virsh destroy rhel6u2
# virsh edit rhel6u2
...
  <vcpu placement='auto'>24</vcpu>
  <numatune>
    <memory mode='strict' nodeset='0-1,3'/>
  </numatune>
...

8. Start the domain and check vcpuinfo:
# virsh start rhel6u2
Domain rhel6u2 started
# virsh vcpuinfo rhel6u2
The result is similar to step 5.

9. Destroy and edit the domain:
# virsh destroy rhel6u2
# virsh edit rhel6u2
...
  <vcpu placement='static' cpuset='0-11,13-22,66-79'>24</vcpu>
  <numatune>
    <memory mode='interleave' placement='auto'/>
  </numatune>
...

10. Start the domain and check vcpuinfo:
# virsh start rhel6u2
Domain rhel6u2 started
# virsh vcpuinfo rhel6u2
The result is similar to step 5, and the affinity is right:
CPU Affinity: yyyyyyyyyyyy-yyyyyyyyyy-------------------------------------------yyyyyyyyyyyyyy
Test with libvirt-0.9.10-20.el6nodeinfo.x86_64. Results as follows:

0.
# numactl --hardware
available: 8 nodes (0-7)
node 0 cpus: 0 4 8 12 16 20
node 0 size: 65526 MB
node 0 free: 62852 MB
node 1 cpus: 24 28 32 36 40 44
node 1 size: 65536 MB
node 1 free: 60043 MB
node 2 cpus: 3 7 11 15 19 23
node 2 size: 65536 MB
node 2 free: 58456 MB
node 3 cpus: 27 31 35 39 43 47
node 3 size: 65536 MB
node 3 free: 63275 MB
node 4 cpus: 2 6 10 14 18 22
node 4 size: 65536 MB
node 4 free: 63405 MB
node 5 cpus: 26 30 34 38 42 46
node 5 size: 65536 MB
node 5 free: 63714 MB
node 6 cpus: 1 5 9 13 17 21
node 6 size: 65536 MB
node 6 free: 63833 MB
node 7 cpus: 25 29 33 37 41 45
node 7 size: 65536 MB
node 7 free: 63806 MB
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  16  16  22  16  22  16  22
  1:  16  10  16  22  22  16  22  16
  2:  16  16  10  16  16  16  16  22
  3:  22  22  16  10  16  16  22  16
  4:  16  22  16  16  10  16  16  16
  5:  22  16  16  16  16  10  22  22
  6:  16  22  16  22  16  22  10  16
  7:  22  16  22  16  16  22  16  10

1. Prepare a domain with no 'cpuset', no 'placement' for <vcpu>, and no <numatune>, then start the domain.
# virsh dumpxml rhel62 |grep vcpu
  <vcpu placement='static'>24</vcpu>
# virsh vcpuinfo rhel62
<snip>
VCPU:           0
CPU:            29
State:          running
CPU time:       10.7s
CPU Affinity:   yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
</snip>
The domain process is pinned to all available CPUs, as expected.

2. Edit the domain with no 'cpuset', no 'placement' for <vcpu>, and 'placement' for <numatune> set to 'auto'. Then start the domain.
# virsh dumpxml rhel62|grep vcpu -A 3
  <vcpu placement='auto'>24</vcpu>
  <numatune>
    <memory mode='strict' placement='auto'/>
  </numatune>
# cat /tmp/libvirtd.debug | grep Nodeset
2012-05-17 11:52:07.434+0000: 53030: debug : qemuProcessStart:3356 : Nodeset returned from numad: 2-5
# virsh vcpuinfo rhel62
<snip>
VCPU:           0
CPU:            2
State:          running
CPU time:       11.1s
CPU Affinity:   --yy--yy--yy--yy--yy--yy--yy--yy--yy--yy--yy--yy
</snip>
# tail -n50 /var/log/libvirt/qemu/rhel62.log
<snip>
2012-05-17 11:52:07.521+0000: 53150: debug : qemuProcessInitCpuAffinity:1731 : Setting CPU affinity
2012-05-17 11:52:07.526+0000: 53150: debug : qemuProcessInitCpuAffinity:1749 : Set CPU affinity with advisory nodeset from numad
2012-05-17 11:52:07.526+0000: 53150: debug : qemuProcessInitNumaMemoryPolicy:1599 : Set NUMA memory policy with advisory nodeset from numad
</snip>
# cat /proc/53150/status
<snip>
Cpus_allowed:       0000cccc,cccccccc
Cpus_allowed_list:  2-3,6-7,10-11,14-15,18-19,22-23,26-27,30-31,34-35,38-39,42-43,46-47
Mems_allowed:       00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0000003c
Mems_allowed_list:  2-5
</snip>

3. Edit the domain with no 'cpuset', 'placement=auto' for <vcpu>, and no <numatune>. Then start the domain.
# cat /tmp/libvirtd.debug | grep Nodeset
2012-05-17 11:52:07.434+0000: 53030: debug : qemuProcessStart:3356 : Nodeset returned from numad: 2-5
2012-05-17 12:01:11.469+0000: 53030: debug : qemuProcessStart:3356 : Nodeset returned from numad: 3-4,6-7
# virsh vcpuinfo rhel62
<snip>
VCPU:           23
CPU:            47
State:          running
CPU time:       0.7s
CPU Affinity:   -yy--yy--yy--yy--yy--yy--y-y-y-y-y-y-y-y-y-y-y-y
</snip>
# cat /proc/53406/status
<snip>
Cpus_allowed:       0000aaaa,aa666666
Cpus_allowed_list:  1-2,5-6,9-10,13-14,17-18,21-22,25,27,29,31,33,35,37,39,41,43,45,47
Mems_allowed:       00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,000000d8
Mems_allowed_list:  3-4,6-7
</snip>

4. Edit the domain with placement='static' cpuset='0-11,13-22,66-79' for <vcpu> and no <numatune>, then start the domain.
# virsh dumpxml rhel62|grep vcpu -A 1
  <vcpu placement='static' cpuset='0-11,13-22,66-79'>24</vcpu>
  <os>

No new Nodeset recorded in the libvirtd log.
# virsh vcpuinfo rhel62
<snip>
VCPU:           0
CPU:            2
State:          running
CPU time:       10.9s
CPU Affinity:   yyyyyyyyyyyy-yyyyyyyyyy-------------------------
</snip>
# cat /proc/53769/status
<snip>
Cpus_allowed:       00000000,007fefff
Cpus_allowed_list:  0-11,13-22
Mems_allowed:       00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,000000ff
Mems_allowed_list:  0-7
</snip>
(In reply to comment #15)
> 1. Prepare a domain with no 'cpuset', no 'placement' for <vcpu>, and no
> <numatune>, then start the domain.
> The domain process is pinned to all available CPUs, as expected.

Works as expected.

> 2. Edit the domain with no 'cpuset', no 'placement' for <vcpu>, and
> 'placement' for <numatune> set to 'auto'. Then start the domain.
> Nodeset returned from numad: 2-5
> CPU Affinity:   --yy--yy--yy--yy--yy--yy--yy--yy--yy--yy--yy--yy
> Cpus_allowed_list:  2-3,6-7,10-11,14-15,18-19,22-23,26-27,30-31,34-35,38-39,42-43,46-47
> Mems_allowed_list:  2-5

Could you confirm that the Cpus_allowed_list and the vcpuinfo affinity contain the right CPUs by comparing them with what you get from "numactl --hardware"? A simple bash script can do it.

> 3. Edit the domain with no 'cpuset', 'placement=auto' for <vcpu>, and no
> <numatune>. Then start the domain.
> Nodeset returned from numad: 3-4,6-7
> CPU Affinity:   -yy--yy--yy--yy--yy--yy--y-y-y-y-y-y-y-y-y-y-y-y
> Cpus_allowed_list:  1-2,5-6,9-10,13-14,17-18,21-22,25,27,29,31,33,35,37,39,41,43,45,47
> Mems_allowed_list:  3-4,6-7

Except for the Cpus_allowed_list and vcpuinfo, which I'm not sure about, everything works as expected.

> 4. Edit the domain with placement='static' cpuset='0-11,13-22,66-79' for
> <vcpu> and no <numatune>, then start the domain.
> Cpus_allowed_list:  0-11,13-22
> Mems_allowed_list:  0-7

Works fine. "66-79" is ignored.
Hi Osier,

Comparing the Cpus_allowed_list and vcpuinfo from comment 15 step 2:

# cat compare-cpu.sh
#! /bin/sh
# Collect the CPU lists of the numad-advised nodes (nodes 2-5 in comment 15 step 2).
for i in {2..5}; do
    numactl --hardware | grep "node $i cpus:" >> cpus
done
cat cpus | awk -F':' '{print $2}' > cpus2
# Print a y/- mask over CPUs 0-47, in the same format as the vcpuinfo affinity.
for i in {0..47}; do
    if grep "\b$i\b" cpus2 > /dev/null; then
        echo -n "y"
    else
        echo -n "-"
    fi
done

# sh compare-cpu.sh
--yy--yy--yy--yy--yy--yy--yy--yy--yy--yy--yy--yy

The result is the same as the vcpuinfo output.
According to comment 16 & comment 17, moving this bug to VERIFIED.
This is for updated numad compatibility testing.

packages:
libvirt-0.9.10-21.el6.x86_64
numad-0.5-4.20120522git.el6.x86_64
kernel-2.6.32-269.el6.x86_64
qemu-kvm-0.12.1.2-2.294.el6.x86_64

# numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 40 41 42 43 44 45 46 47 48 49
node 0 size: 131050 MB
node 0 free: 127328 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19 50 51 52 53 54 55 56 57 58 59
node 1 size: 131072 MB
node 1 free: 127338 MB
node 2 cpus: 20 21 22 23 24 25 26 27 28 29 60 61 62 63 64 65 66 67 68 69
node 2 size: 131072 MB
node 2 free: 127540 MB
node 3 cpus: 30 31 32 33 34 35 36 37 38 39 70 71 72 73 74 75 76 77 78 79
node 3 size: 131072 MB
node 3 free: 127442 MB
node distances:
node   0   1   2   3
  0:  10  11  11  11
  1:  11  10  11  11
  2:  11  11  10  11
  3:  11  11  11  10

steps:
1. Prepare a domain with the vcpu element set to auto placement:
# virsh dumpxml rhel63
...
  <vcpu placement='auto'>24</vcpu>
...
# virsh start rhel63
# virsh dumpxml rhel63
...
  <vcpu placement='auto'>24</vcpu>
  <numatune>
    <memory mode='strict' placement='auto'/>
  </numatune>
...

Check the logs:
# cat /var/log/libvirtd.log| grep Nodeset
2012-05-24 09:33:04.300+0000: 20342: debug : qemuProcessStart:3356 : Nodeset returned from numad: 1-2
# cat /var/log/libvirt/qemu/rhel63.log
...
2012-05-24 09:33:04.447+0000: 23518: debug : qemuProcessInitCpuAffinity:1731 : Setting CPU affinity
2012-05-24 09:33:04.456+0000: 23518: debug : qemuProcessInitCpuAffinity:1749 : Set CPU affinity with advisory nodeset from numad
2012-05-24 09:33:04.456+0000: 23518: debug : qemuProcessInitNumaMemoryPolicy:1599 : Set NUMA memory policy with advisory nodeset from numad
...

Check vcpuinfo:
# virsh vcpuinfo rhel63
VCPU:           0
CPU:            55
State:          running
CPU time:       7.5s
CPU Affinity:   ----------yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy----------
...

Check the Cpus_allowed list:
# cat /proc/23518/status
...
Cpus_allowed:       003f,fffc0000,3ffffc00
Cpus_allowed_list:  10-29,50-69
Mems_allowed:       00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000006
Mems_allowed_list:  1-2
...

Check the actual CPU affinity:
#! /bin/sh
for i in {1..2}; do
    numactl --hardware | grep "node $i cpus:" >> cpus
done
cat cpus | awk -F':' '{print $2}' > cpus2
for i in {0..79}; do
    if grep "\b$i\b" cpus2 > /dev/null; then
        echo -n "y"
    else
        echo -n "-"
    fi
done

# sh compare-cpu.sh
----------yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy----------

The output is the same as the vcpuinfo affinity, so this is working as expected.

2. Destroy the domain and edit the domain XML as:
# virsh destroy rhel63
# virsh edit rhel63
...
  <vcpu placement='auto'>24</vcpu>
  <numatune>
    <memory mode='interleave'/>
  </numatune>
...
# virsh start rhel63
Domain rhel63 started
# virsh dumpxml rhel63
...
  <vcpu placement='auto'>24</vcpu>
  <numatune>
    <memory mode='interleave' placement='auto'/>
  </numatune>
...

Check libvirtd.log:
# cat /var/log/libvirtd.log| grep Nodeset
2012-05-24 09:50:42.481+0000: 20343: debug : qemuProcessStart:3356 : Nodeset returned from numad: 1,3
# virsh vcpuinfo rhel63
VCPU:           0
CPU:            72
State:          running
CPU time:       7.4s
CPU Affinity:   ----------yyyyyyyyyy----------yyyyyyyyyy----------yyyyyyyyyy----------yyyyyyyyyy
...
# cat /proc/24429/status
...
Cpus_allowed:       ffc0,0ffc00ff,c00ffc00
Cpus_allowed_list:  10-19,30-39,50-59,70-79
Mems_allowed:       00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0000000f
Mems_allowed_list:  0-3
...
Modify compare-cpu.sh accordingly and run:
# sh compare-cpu.sh
----------yyyyyyyyyy----------yyyyyyyyyy----------yyyyyyyyyy----------yyyyyyyyyy

Also ran the other steps from comment 15 and comment 16; both libvirt and numad are working as expected.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0748.html