Bug 1118517
Summary: | Can't simultaneously boot multiple machines using NUMA automatic placement (regression from R6.5) | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | Dax Kelson <dkelson>
Component: | libvirt | Assignee: | Martin Kletzander <mkletzan>
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Virtualization Bugs <virt-bugs>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | 7.0 | CC: | dkelson, dyuan, honzhang, jmiao, mkletzan, mzhan, rbalakri
Target Milestone: | rc | Keywords: | Reopened, Upstream
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | libvirt-1.2.7-1.el7 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2014-12-22 10:08:18 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Dax Kelson
2014-07-10 23:01:28 UTC
This might already be fixed. Could you try a few things to confirm that?

1) Check libvirtd.log for the placements offered by numad and paste it here.
2) Paste the output of 'grep DMA /proc/zoneinfo' here.
3) Edit qemu.conf and set cgroup_controllers to the default list (the commented one), but remove "cpuset", and try whether it fixes your issue.

Thank you,
Martin

I suspect this is caused by numad advising nodesets that do not include node 0, which probably has the DMA and DMA32 zones that KVM needs. Fixed upstream with v1.2.6-176-g7e72ac7:

commit 7e72ac787848b7434c9359a57c1e2789d92350f8
Author: Martin Kletzander <mkletzan>
Date:   Tue Jul 8 09:59:49 2014 +0200

    qemu: leave restricting cpuset.mems after initialization

(In reply to Martin Kletzander from comment #2)
> This might already be fixed. Could you try a few things to confirm that?
>
> 1) Check libvirtd.log for the placements offered by numad and paste it here.

2014-09-17 16:38:16.060+0000: 4685: debug : virCommandRunAsync:2282 : About to run /bin/numad -w 2:8192
2014-09-17 16:38:18.066+0000: 4685: debug : qemuProcessStart:3771 : Nodeset returned from numad: 1
2014-09-17 16:38:18.087+0000: 4685: debug : qemuProcessInitCpuAffinity:2021 : Set CPU affinity with advisory nodeset from numad
2014-09-17 16:38:18.444+0000: 4684: debug : virCommandRunAsync:2282 : About to run /bin/numad -w 2:768
2014-09-17 16:38:20.449+0000: 4684: debug : qemuProcessStart:3771 : Nodeset returned from numad: 0
2014-09-17 16:38:20.468+0000: 4684: debug : qemuProcessInitCpuAffinity:2021 : Set CPU affinity with advisory nodeset from numad

> 2) Paste the output of 'grep DMA /proc/zoneinfo' here.

Node 0, zone DMA
Node 0, zone DMA32

> 3) Edit qemu.conf and set cgroup_controllers to the default list (the
> commented one), but remove "cpuset", and try whether it fixes your issue.

I made the requested edit, but it is still broken.

for i in keystonedev.gurulabs.com storagedev.gurulabs.com unifi.gurulabs.com; do virsh start $i; done
error: Failed to start domain keystonedev.gurulabs.com
error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory
Domain storagedev.gurulabs.com started
error: Failed to start domain unifi.gurulabs.com
error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory

I guess the thread that was returned nodeset 0 (4684) started its domain successfully and the other one (4685) failed to start its domain. As I expected, this should already be fixed. I'm not sure why the workaround doesn't work, though.

Using libvirt-1.1.1-29.el7_0.3.x86_64 it is still broken, but with a new error message.
Using <vcpu placement='auto'>, I get the following when trying to rapidly start a bunch of VMs:
# for i in computedev1 computedev2 edxdev keystonedev oldspidey shaka storagedev tlv unifi
> do
> virsh start $i
> done
error: Failed to start domain computedev1
error: Unable to write to '/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2dcomputedev1.scope/cpuset.cpus': Device or resource busy
error: Failed to start domain computedev2
error: Unable to write to '/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2dcomputedev2.scope/cpuset.cpus': Device or resource busy
error: Failed to start domain edxdev
error: Unable to write to '/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2dedxdev.scope/cpuset.cpus': Device or resource busy
Domain keystonedev started
error: Failed to start domain oldspidey
error: Unable to write to '/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2doldspidey.scope/cpuset.cpus': Device or resource busy
error: Failed to start domain shaka
error: Unable to write to '/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2dshaka.scope/cpuset.cpus': Device or resource busy
Domain storagedev started
error: Failed to start domain tlv
error: Unable to write to '/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2dtlv.scope/cpuset.cpus': Device or resource busy
error: Failed to start domain unifi
error: Unable to write to '/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2dunifi.scope/cpuset.cpus': Device or resource busy
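For reference, a quick way to see what the cpuset controller actually ended up with for these guests is a sketch along the following lines; the paths simply mirror the machine.slice naming in the errors above (cgroup v1 layout) and may differ on other setups:

# Show the cpuset values on the parent slice and on any per-guest scope that
# still exists; adjust the paths if your cgroup mount or slice names differ.
cgroot=/sys/fs/cgroup/cpuset/machine.slice
cat "$cgroot/cpuset.cpus" "$cgroot/cpuset.mems"
for scope in "$cgroot"/machine-qemu*.scope; do
    [ -d "$scope" ] || continue
    echo "== $scope"
    cat "$scope/cpuset.cpus" "$scope/cpuset.mems" 2>/dev/null
done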
Can you attach a debug log, please?

I can't reproduce it with libvirt-1.1.1-29.el7_0.3.x86_64. In my environment, the machine has two NUMA nodes:

# numactl --hard
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 65514 MB
node 0 free: 58918 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 65536 MB
node 1 free: 55141 MB
node distances:
node   0   1
  0:  10  11
  1:  11  10

And the DMA zones reside in node 0:

# grep "DMA" /proc/zoneinfo
Node 0, zone DMA
Node 0, zone DMA32

So both guests are configured with their memory bound to node 1:

# virsh dumpxml a
...
<vcpu placement='static'>4</vcpu>
<numatune>
  <memory mode='strict' nodeset='1'/>
</numatune>
...

Then start them:

# for i in {a..b}; do virsh start $i; done
Domain a started
Domain b started
# for i in {a..b}; do virsh destroy $i; done
Domain a destroyed
Domain b destroyed
# for i in {a..b}; do virsh start $i; done
Domain a started
Domain b started
# for i in {a..b}; do virsh destroy $i; done
Domain a destroyed
Domain b destroyed

No error occurs. IMO, an error like "Device or resource busy" points to a problem in the kernel:

error: Unable to write to '/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2dunifi.scope/cpuset.cpus': Device or resource busy

My software stack is:
libvirt-1.1.1-29.el7_0.3.x86_64
qemu-kvm-1.5.3-60.el7.x86_64
kernel-3.10.0-123.17.1.el7.x86_64

I also used libvirt-1.1.1-29.el7_0.3.x86_64 and kernel-3.10.0-123.4.2.el7.x86_64, but still can't reproduce it.

# rpm -q libvirt
libvirt-1.1.1-29.el7_0.3.x86_64
# uname -r
3.10.0-123.4.2.el7.x86_64

Using the following configuration:

<vcpu placement='auto'>4</vcpu>
<numatune>
  <memory mode='strict' nodeset='1'/>
</numatune>

[root@localhost ~]# for i in {a..e}; do virsh start $i; done
Domain a started
Domain b started
Domain c started
Domain d started
Domain e started
[root@localhost ~]# for i in {a..e}; do virsh destroy $i; done
Domain a destroyed
Domain b destroyed
Domain c destroyed
Domain d destroyed
Domain e destroyed
[root@localhost ~]# for i in {a..e}; do virsh start $i; done
Domain a started
Domain b started
Domain c started
Domain d started
Domain e started
[root@localhost ~]# for i in {a..e}; do virsh destroy $i; done
Domain a destroyed
Domain b destroyed
Domain c destroyed
Domain d destroyed
Domain e destroyed

Hi Martin,
I think this bug is triggered by the strict cpuset.mems setting during cgroup initialization. As for "Unable to write to cpuset.cpus: Device or resource busy", I couldn't reproduce it in my environment. Is it a real problem, or could I verify this bug by testing cpuset.mems against the DMA zone?

For the "Device or resource busy" error, this looks like it has the same root cause as Bug 1168866, so this should "just work" with libvirt-1.2.8-10.el7. Just to be sure that my previous comment is not a complete guess, could you attach an XML of a domain that cannot be started, please? Thank you.
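For reference, an XML dump like the one requested can be produced along these lines; "computedev1" is just one of the failing guest names from the earlier output, so substitute any domain that cannot be started:

# Dump the persistent (inactive) definition of a failing domain so its
# <vcpu placement='auto'> and <numatune> settings can be reviewed.
virsh dumpxml --inactive computedev1 > computedev1.xml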
For the problem "kvm_init_vcpu failed: Cannot allocate memory", I can reproduce it with libvirt-1.1.1-29.el7_0.1.x86_64.

# rpm -q libvirt; uname -r
libvirt-1.1.1-29.el7_0.1.x86_64

Using the following configuration:

<vcpu placement='auto'>4</vcpu>
<numatune>
  <memory mode='strict' nodeset='1'/>
</numatune>

# for i in {a..e}; do echo starting $i; virsh start $i; done
starting a
error: Failed to start domain a
error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory
starting b
error: Failed to start domain b
error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory
starting c
error: Failed to start domain c
error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory
starting d
error: Failed to start domain d
error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory
starting e
error: Failed to start domain e
error: internal error: process exited while connecting to monitor: kvm_init_vcpu failed: Cannot allocate memory

According to the bug description, the "Device or resource busy" problem is similar to bug 1168866, so I will mark this bug as a duplicate of bug 1168866. If you still have questions about it, feel free to reopen it.

*** This bug has been marked as a duplicate of bug 1168866 ***

We are not sure whether this is a duplicate or not. Please read my previous messages saying that we cannot do anything unless we have more information from the reporter. I'm moving this back to ASSIGNED until we have that info and until we are sure what fixes it. If there is no new info and we're past some deadline, we'll have to close this bug as INSUFFICIENT_DATA.

There has been no response from the reporter for some time, and we are not sure whether this is already fixed or not due to reproducibility issues and insufficient data. I'm closing it as such; feel free to reopen this BZ if there is any new information.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.