Description of problem:
After offlining a CPU in the node that numad returns, starting a domain fails unless libvirtd is restarted.

Version-Release number of selected component (if applicable):
# rpm -q libvirt kernel qemu-kvm-rhev
libvirt-1.2.17-3.el7.x86_64
kernel-3.10.0-302.el7.x86_64
qemu-kvm-rhev-2.3.0-15.el7.x86_64

# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 65514 MB
node 0 free: 60859 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 65536 MB
node 1 free: 61649 MB
node distances:
node   0   1
  0:  10  11
  1:  11  10

How reproducible:
always

Steps to Reproduce:
1. Replace numad with a wrapper that always returns a fixed node, for testing
# mv /usr/bin/numad /usr/bin/numad_
# vim /usr/bin/numad
#!/usr/bin/env python
import os
import sys

if len(sys.argv) == 3 and sys.argv[1] == '-w':
    print(1)
    sys.exit(0)

os.execv('/usr/bin/numad_', sys.argv)
sys.exit(1)
# chmod +x /usr/bin/numad

2. Offline a CPU in node 1
# cat /sys/devices/system/cpu/cpu8/online
1
# echo 0 > /sys/devices/system/cpu/cpu8/online

3. Start a VM with placement set to auto
# virsh dumpxml virt-tests-vm1
...
<vcpu placement='auto'>2</vcpu>
...
# virsh start virt-tests-vm1
error: Failed to start domain virt-tests-vm1
error: Invalid value '8-15' for 'cpuset.cpus': Invalid argument

The VM fails to start because '8-15' should have been updated to '9-15' after cpu8 was taken offline.

Actual results:
The VM fails to start.

Expected results:
The VM starts with the updated CPU list.

Additional info:
If libvirtd is restarted after the CPU is taken offline, starting the domain succeeds.
Thank you for filing this BZ. Could you also check whether running "virsh capabilities" fixes the issue (without restarting libvirtd)? If it does, the problem is that virCapabilitiesGetCpusForNodemask() does not take the actual state of the host into account and uses cached data instead. It's hard to say at this point whether libvirt should use inotify to track the changes or stop relying on the capabilities.
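To illustrate the cached-versus-actual mismatch described above, here is a minimal Python sketch, not libvirt code. It assumes the node's CPU list and the currently online CPUs are available as kernel-style list strings (for example, the contents of /sys/devices/system/cpu/online). The helper names parse_cpulist, format_cpulist and usable_node_cpus are hypothetical and exist only for this example.

```python
def parse_cpulist(text):
    """Parse a kernel CPU list such as '0-7,9-15' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def format_cpulist(cpus):
    """Format a set of ints back into the compact 'a-b,c' form."""
    ordered = sorted(cpus)
    ranges = []
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the current range while the next CPU is consecutive.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        ranges.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(ranges)


def usable_node_cpus(node_cpus, online_cpus):
    """Drop offline CPUs from a (possibly stale) node CPU list."""
    return format_cpulist(parse_cpulist(node_cpus) & parse_cpulist(online_cpus))


# Values from this report: node 1 is cached as 8-15, cpu8 is now offline.
print(usable_node_cpus("8-15", "0-7,9-15"))
```

With the values from this report, the intersection yields the '9-15' that the reporter expected to be written to cpuset.cpus.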
A similar issue should also occur if you just remove the CPU from cpuset.cpus of machine.slice (but leave it online). The only difference is that instead of 'Invalid argument' you'll get 'Permission denied'. This is worth noting because libvirt will need two different fixes for this to work, even though the errors look very similar. Thanks for understanding.
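The second case can be checked before starting the domain with a small Python sketch, again not libvirt code. It assumes the requested CPU list and the parent cgroup's cpuset.cpus are given as kernel-style list strings. The helper names parse_cpulist and missing_from_parent are hypothetical, and the input values are illustrative.

```python
def parse_cpulist(text):
    """Parse a kernel CPU list such as '0-7,9-15' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def missing_from_parent(requested, parent):
    """Return the requested CPUs that the parent cpuset does not contain."""
    return sorted(parse_cpulist(requested) - parse_cpulist(parent))


# Illustrative values: the domain wants 8-15, but cpu8 was removed
# from the parent's cpuset.cpus while remaining online.
print(missing_from_parent("8-15", "0-7,9-15"))
```

A non-empty result means the child cpuset would request CPUs the parent does not have, which is the situation in which the write is expected to be rejected.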
(In reply to Martin Kletzander from comment #1)

Running "virsh capabilities" did not help:

# echo 0 > /sys/devices/system/cpu/cpu8/online
# virsh capabilities
...
# echo $?
0
# virsh start virt-tests-vm1
error: Failed to start domain virt-tests-vm1
error: Invalid value '8-15' for 'cpuset.cpus': Invalid argument

# echo 1 > /sys/devices/system/cpu/cpu8/online
# systemctl restart libvirtd
# head /sys/fs/cgroup/cpuset/{,machine.slice/}cpuset.{mems,cpus}
==> /sys/fs/cgroup/cpuset/cpuset.mems <==
0-1

==> /sys/fs/cgroup/cpuset/cpuset.cpus <==
0-15

==> /sys/fs/cgroup/cpuset/machine.slice/cpuset.mems <==
0-1

==> /sys/fs/cgroup/cpuset/machine.slice/cpuset.cpus <==
0-7,9-15
^ here cpuset.cpus is not updated even after restarting libvirtd

As comment #2 said, the error is now the following:

# virsh start virt-tests-vm1
error: Failed to start domain virt-tests-vm1
error: Unable to write to '/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2dvirt\x2dtests\x2dvm1.scope/emulator/cpuset.cpus': Permission denied
It looks like this will be closely related to a few fixes I'm planning to do, so I'll assign this to myself.
This bug is going to be addressed in next major release.
This bug was closed deferred as a result of bug triage. Please reopen if you disagree and provide justification why this bug should get enough priority. Most important would be information about impact on customer or layered product. Please indicate requested target release.