Bug 949408
| Field | Value | Field | Value |
|---|---|---|---|
| Summary: | domain fail to start with vcpu placement as auto | | |
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Wayne Sun <gsun> |
| Component: | libvirt | Assignee: | Peter Krempa <pkrempa> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.0 | CC: | acathrow, cwei, dallan, dyuan, honzhang, jmiao, mzhan, pkrempa |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-1.1.1-1.el7 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-06-13 11:26:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Created attachment 732554 [details]
libvirtd log
>
> Additional info:
> check in log:
> 2013-04-08 05:28:33.765+0000: 26491: debug : qemuProcessStart:3728 : Nodeset
> returned from numad: 1
>
> so it's not a problem with numad
Can you provide the CPU topology?
Created attachment 737121 [details]
cpuinfo
# virsh nodeinfo
CPU model: x86_64
CPU(s): 32
CPU frequency: 1064 MHz
CPU socket(s): 1
Core(s) per socket: 8
Thread(s) per core: 2
NUMA cell(s): 2
Memory size: 131875768 KiB
# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 65514 MB
node 0 free: 62316 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 65536 MB
node 1 free: 62744 MB
node distances:
node 0 1
0: 10 11
1: 11 10
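For cross-checking reports like this one, the node-to-CPU mapping in the `numactl --hardware` output above can be extracted mechanically. A minimal sketch in Python (plain text parsing, not a libvirt API; the sample string is transcribed from the output above):

```python
import re

# Transcribed from the `numactl --hardware` output above.
NUMACTL_OUTPUT = """\
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 65514 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 65536 MB
"""

def node_cpu_map(text):
    """Build {node: [cpu, ...]} from 'node N cpus: ...' lines."""
    cpus = {}
    for m in re.finditer(r"node (\d+) cpus:((?: \d+)+)", text):
        cpus[int(m.group(1))] = [int(c) for c in m.group(2).split()]
    return cpus

cpus = node_cpu_map(NUMACTL_OUTPUT)
# Each node exposes 16 CPUs (8 cores x 2 threads per `virsh nodeinfo`),
# 32 in total; node 0 and node 1 interleave their CPU numbering.
assert sorted(cpus) == [0, 1]
assert len(cpus[0]) == len(cpus[1]) == 16
```

This makes it easy to see that the CPU numbering is interleaved across nodes, so a cpuset like `0-31` spans both nodes while `8-15,24-31` covers only node 1.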
Created attachment 737124 [details]
sysfs dump info
# ll /sys/devices/system/
Wayne, I need more info on this. One item is the debug log: besides the lines containing "Nodeset", the logs about setting the "cpuset" cgroup are also needed. And can you make a tarball of the domain's cpuset cgroup files? It looks like a similar problem to this one: https://www.redhat.com/archives/libvirt-users/2013-January/msg00085.html

Created attachment 738435 [details]
libvirtd log with Nodeset
After updating the kernel to the latest build:
3.9.0-0.rc7.52.el7.x86_64
the domain can start successfully.
Also tried on another machine with
3.9.0-0.rc6.51.el7.x86_64
and it also works, so it might be a kernel cgroup problem.
Anyway, the full log of the failure before the kernel update is attached.
Created attachment 738450 [details]
cpuset cgroup files of domain

(In reply to comment #7)
> Created attachment 738435 [details]
> libvirtd log with Nodeset
>
> After updating the kernel to the latest build:
> 3.9.0-0.rc7.52.el7.x86_64
> the domain can start successfully.
>
> Also tried on another machine with
> 3.9.0-0.rc6.51.el7.x86_64
> and it also works, so it might be a kernel cgroup problem.
>
> Anyway, the full log of the failure before the kernel update is attached.

After repeating the run several times it occurred again, so the problem still exists. The cgroup files are removed after the domain fails to start, so I will attach the cpuset cgroup files without placement set to auto.

Okay, it is indeed the same problem as https://www.redhat.com/archives/libvirt-users/2013-January/msg00085.html: cpuset.cpus is set to 0-31 (all CPUs), but cpuset.mems is 1 (only node 1).

numatune with mode interleave does not take effect:
# virsh dumpxml rhel6_local|grep interleave -2
<vcpu placement='static'>2</vcpu>
<numatune>
<memory mode='interleave' nodeset='1-2'/>
</numatune>
<os>
start the domain and check process:
# cat /proc/3713/status |grep Mems_allowed_list
Mems_allowed_list: 0-3
# virsh numatune rhel6_local
numa_mode : interleave
numa_nodeset : 0-3
Check in log:
2013-04-23 11:19:39.156+0000: 3154: debug : virCgroupSetValueStr:331 : Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/rhel6_local/emulator/cpuset.mems' to '0-3'
Checking with the numastat tool shows that the memory usage is not evenly distributed across nodes 1-2. Is this the same problem?
> Checking with the numastat tool shows that the memory usage is not evenly
> distributed across nodes 1-2. Is this the same problem?
Yes, similar.
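The mismatch above (requested nodeset `1-2` in the XML vs `0-3` actually applied in `Mems_allowed_list` and `cpuset.mems`) can be detected by parsing the range strings that libvirt and the kernel use. A minimal sketch (the helper name is illustrative, not part of libvirt):

```python
def parse_ranges(spec):
    """Expand a cpuset/nodeset string like '1-2' or '0,2-3' into a set of ints."""
    out = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            out.update(range(int(lo), int(hi) + 1))
        else:
            out.add(int(part))
    return out

requested = parse_ranges("1-2")  # <memory mode='interleave' nodeset='1-2'/>
applied = parse_ranges("0-3")    # Mems_allowed_list / cpuset.mems as observed

# The applied mask is strictly wider than the requested one, which is
# exactly the symptom reported here: the numatune setting had no effect.
assert requested != applied
assert requested < applied
```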
Patches posted upstream: https://www.redhat.com/archives/libvir-list/2013-May/msg00637.html

Created attachment 758015 [details]
libvirtd log
Reproduced with:
libvirt-1.0.6-1.el7.x86_64
qemu-kvm-1.5.0-2.el7.x86_64
kernel-3.9.0-0.55.el7.x86_64
latest libvirtd log attached
Created attachment 758020 [details]
domain qemu log
domain qemu log also updated
Updated version posted for review: http://www.redhat.com/archives/libvir-list/2013-July/msg01159.html

Fixed upstream:
commit a39f69d2bb5494d661be917956baa437d01a4d13
Author: Osier Yang <jyang>
Date: Fri May 24 17:08:28 2013 +0800
qemu: Set cpuset.cpus for domain process
When either "cpuset" of <vcpu> is specified, or the "placement" of
<vcpu> is "auto", only setting the cpuset.mems might cause the guest
starting to fail. E.g. ("placement" of both <vcpu> and <numatune> is
"auto"):
1) Related XMLs
<vcpu placement='auto'>4</vcpu>
<numatune>
<memory mode='strict' placement='auto'/>
</numatune>
2) Host NUMA topology
% numactl --hardware
available: 8 nodes (0-7)
node 0 cpus: 0 4 8 12 16 20 24 28
node 0 size: 16374 MB
node 0 free: 11899 MB
node 1 cpus: 32 36 40 44 48 52 56 60
node 1 size: 16384 MB
node 1 free: 15318 MB
node 2 cpus: 2 6 10 14 18 22 26 30
node 2 size: 16384 MB
node 2 free: 15766 MB
node 3 cpus: 34 38 42 46 50 54 58 62
node 3 size: 16384 MB
node 3 free: 15347 MB
node 4 cpus: 3 7 11 15 19 23 27 31
node 4 size: 16384 MB
node 4 free: 15041 MB
node 5 cpus: 35 39 43 47 51 55 59 63
node 5 size: 16384 MB
node 5 free: 15202 MB
node 6 cpus: 1 5 9 13 17 21 25 29
node 6 size: 16384 MB
node 6 free: 15197 MB
node 7 cpus: 33 37 41 45 49 53 57 61
node 7 size: 16368 MB
node 7 free: 15669 MB
4) cpuset.cpus will be set as: (from debug log)
2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 :
Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.cpus'
to '0-63'
5) The advisory nodeset got from querying numad (from debug log)
2013-05-09 16:50:17.295+0000: 417: debug : qemuProcessStart:3614 :
Nodeset returned from numad: 1
6) cpuset.mems will be set as: (from debug log)
2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 :
Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.mems'
to '0-7'
I.e., the domain process's memory is restricted to the first NUMA node;
however, it can use all of the CPUs, which will likely cause the domain
process to fail to start because the kernel fails to allocate
memory with the memory policy set to "strict".
% tail -n 20 /var/log/libvirt/qemu/toy.log
...
2013-05-09 05:53:32.972+0000: 7318: debug : virCommandHandshakeChild:377 :
Handshake with parent is done
char device redirected to /dev/pts/2 (label charserial0)
kvm_init_vcpu failed: Cannot allocate memory
...
Signed-off-by: Peter Krempa <pkrempa>
commit b8b38321e724b5b1b7858c415566ab5e6e96ec8c
Author: Peter Krempa <pkrempa>
Date: Thu Jul 18 11:21:48 2013 +0200
caps: Add helpers to convert NUMA nodes to corresponding CPUs
These helpers use the remembered host capabilities to retrieve the cpu
map rather than query the host again. The intended usage for these
helpers is to fix automatic NUMA placement with strict memory alloc. The
code doing the prepare needs to pin the emulator process only to cpus
belonging to a subset of NUMA nodes of the host.
v1.1.0-254-ga39f69d
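The helper added by the second commit maps a NUMA node set to the union of its CPUs using the cached host capabilities, so the emulator is pinned only to CPUs of the nodes numad advised instead of all host CPUs. The idea can be sketched in Python (the dict mirrors the 2-node topology from comment 4; names are illustrative, not libvirt's):

```python
# Host capability map as in comment 4: 2 nodes, 16 CPUs each,
# with CPU numbers interleaved across the nodes.
NODE_CPUS = {
    0: list(range(0, 8)) + list(range(16, 24)),
    1: list(range(8, 16)) + list(range(24, 32)),
}

def cpus_for_nodeset(nodeset, node_cpus=NODE_CPUS):
    """Union of CPUs belonging to the given NUMA nodes.

    Mirrors the intent of the fix: derive cpuset.cpus from the advisory
    nodeset rather than leaving it at all host CPUs.
    """
    cpus = set()
    for node in nodeset:
        cpus.update(node_cpus[node])
    return sorted(cpus)

# If numad returns node 1 (as in the original report), cpuset.cpus should
# cover only node 1's CPUs, not 0-31 as before the fix.
assert cpus_for_nodeset({1}) == list(range(8, 16)) + list(range(24, 32))
```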
Created attachment 781506 [details]
numa cpuset log
Hi Peter,
I also hit this problem with the latest libvirt:
# rpm -q libvirt qemu-kvm kernel numad
libvirt-1.1.1-1.el7.x86_64
qemu-kvm-1.5.2-1.el7.x86_64
kernel-3.10.0-3.el7.x86_64
numad-0.5-10.20121130git.el7.x86_64
# virsh dumpxml r7q
...
<vcpu placement='auto'>4</vcpu>
<numatune>
<memory mode='strict' placement='auto'/>
</numatune>
...
# virsh start r7q
error: Failed to start domain r7q
error: internal error: process exited while connecting to monitor: char device redirected to /dev/pts/3 (label charserial0)
kvm_init_vcpu failed: Cannot allocate memory
The CPU topology is the same as in comment 4:
# virsh nodeinfo
CPU model: x86_64
CPU(s): 32
CPU frequency: 1064 MHz
CPU socket(s): 1
Core(s) per socket: 8
Thread(s) per core: 2
NUMA cell(s): 2
Memory size: 131752920 KiB
# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 65514 MB
node 0 free: 61865 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 65536 MB
node 1 free: 63435 MB
node distances:
node 0 1
0: 10 11
1: 11 10
And in libvirtd.log (the attachment I uploaded) I can see cpuset.cpus = 0-31 and cpuset.mems = 0-1. According to comment 9 from Osier, this should be right.
Does that mean libvirt is OK and the error is in qemu-kvm?
Well, libvirt takes the information returned from "numad" and uses it to create the topology of the guest. The data returned from numad depends on multiple factors, and it is not guaranteed that the guest will start successfully even after querying numad.

According to the log, NUMA nodes 0-1 should be used. This corresponds to all CPUs (0-31) in the host.

To verify that the fix is okay, you have to re-run the guest (with less memory, maybe) so that "numad" provides a different node range. Then you need to verify that the CPU range provided to the guest corresponds to the NUMA node range. When they match, the fix is okay, but it is still not guaranteed that qemu will successfully be able to allocate its memory.

According to libvirtd.log, the CPU range matches the NUMA node range. But the lack of a guaranteed successful qemu start confuses me. Should this bug move to the qemu-kvm component?

(In reply to Jincheng Miao from comment #21)
> According to libvirtd.log, the cpu range matches NUMA node range.

That corresponds to the original problem described by this bug.

> But no guaranteed success of qemu starting makes me confused.

The problem is that invoking numad to find out which nodes contain enough memory to accommodate a guest doesn't guarantee that the memory will still be available at the time the guest allocates it. This creates a race condition that may sometimes result in the guest failing to start when there is "just enough" free memory.

> Should this bug move to qemu-kvm component ?

No, this is a problem in the approach libvirt uses to determine the node range. The problem for now is that there is no way to do it without the race condition, as other processes may take the memory that was available at the time we determined the node range before the starting domain is able to allocate it.

It may be worth opening a separate bug to track that issue, as this bug is about the invalid CPU range generated from the node list, which was fixed by the patches mentioned above.

According to Peter's reply in comment 22, this bug is about the invalid CPU range, and that is fixed by this patch, so I am changing the status to VERIFIED. For the failure to start a domain, there is a race condition when allocating memory, so I opened a new bug ( https://bugzilla.redhat.com/show_bug.cgi?id=1010885 ) to track that issue. Thanks for Peter's advice.

This request was resolved in Red Hat Enterprise Linux 7.0. Contact your manager or support representative in case you have further questions about the request.
Created attachment 732553 [details]
domain qemu log

Description of problem:
domain fail to start with vcpu placement as auto

Version-Release number of selected component (if applicable):
libvirt-1.0.3-1.el7.x86_64
qemu-kvm-1.4.0-1.el7.x86_64
kernel-3.7.0-0.32.el7.x86_64
numad-0.5-8.20121130git.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. set vcpu placement as auto
# virsh dumpxml aa
...
<currentMemory unit='KiB'>1048576</currentMemory>
<vcpu placement='auto'>4</vcpu>
<numatune>
<memory mode='strict' placement='auto'/>
</numatune>
...
2. start domain
# virsh start aa
error: Failed to start domain aa
error: internal error process exited while connecting to monitor: 2013-04-08 05:28:33.826+0000: 5494: debug : virFileClose:72 : Closed fd 25
2013-04-08 05:28:33.826+0000: 5494: debug : virFileClose:72 : Closed fd 31
2013-04-08 05:28:33.828+0000: 5494: debug : virFileClose:72 : Closed fd 3
2013-04-08 05:28:33.828+0000: 5495: debug : virExec:602 : Run hook 0x7fae6521fef0 0x7fae6a0f53c0
2013-04-08 05:28:33.828+0000: 5495: debug : qemuProcessHook:2728 : Obtaining domain lock
2013-04-08 05:28:33.828+0000: 5495: debug : virSecuritySELinuxSetSecuritySocketLabel:1963 : Setting VM aa socket context system_u:system_r:svirt_t:s0:c160,c656
2013-04-08 05:28:33.829+0000: 5495: debug : virDomainLockProcessStart:170 : plugin=0x7fae5c005600 dom=0x7fae5c29edb0 paused=1 fd=0x7fae6a0f4f4c
2013-04-08 05:28:33.829+0000: 5495: debug : virDomainLockManagerNew:128 : plugin=0x7fae5c005600 dom=0x7fae5c29edb0 withResources=1
2013-04-08 05:28:33.829+0000: 5495: debug : virLockManagerPluginGetDriver:297 : plugin=0x7fae5c005600
2013-04-08 05:28:33.829+0000: 5
3. Check domain log:
# vim /var/log/libvirt/qemu/aa.log
...
2013-04-08 05:53:32.972+0000: 7318: debug : virCommandHandshakeChild:377 : Handshake with parent is done
char device redirected to /dev/pts/2 (label charserial0)
kvm_init_vcpu failed: Cannot allocate memory
2013-04-08 05:53:33.175+0000: shutting down

domain fail with 'kvm_init_vcpu failed: Cannot allocate memory'

Actual results:
domain fail to start

Expected results:
domain start succeed

Additional info:
check in log:
2013-04-08 05:28:33.765+0000: 26491: debug : qemuProcessStart:3728 : Nodeset returned from numad: 1

so it's not a problem with numad
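When diagnosing cases like this, the `virCgroupSetValueStr` lines in the libvirtd debug log carry the actual cgroup writes (path and value). They can be extracted mechanically; a minimal sketch (the regex is an assumption based on the log format quoted in this bug, not a libvirt tool):

```python
import re

# Sample line quoted earlier in this bug.
LINE = ("2013-04-23 11:19:39.156+0000: 3154: debug : virCgroupSetValueStr:331 : "
        "Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/rhel6_local/emulator/"
        "cpuset.mems' to '0-3'")

def parse_cgroup_write(line):
    """Return (path, value) from a virCgroupSetValueStr debug line, or None."""
    m = re.search(r"Set value '([^']+)' to '([^']*)'", line)
    return (m.group(1), m.group(2)) if m else None

path, value = parse_cgroup_write(LINE)
# For this bug, comparing the cpuset.cpus and cpuset.mems writes against
# the "Nodeset returned from numad" line is what reveals the mismatch.
assert path.endswith("cpuset.mems")
assert value == "0-3"
```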