Bug 1703661
| Field | Value |
|---|---|
| Summary | 'cannot set CPU affinity' error when starting guest |
| Product | Red Hat Enterprise Linux 7 |
| Component | libvirt |
| Version | 7.7 |
| Hardware | ppc64le |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Reporter | Junxiang Li <junli> |
| Assignee | Andrea Bolognani <abologna> |
| QA Contact | Virtualization Bugs <virt-bugs> |
| CC | abologna, dzheng, jdenemar, jiyan, jomurphy, jsuchane, mdeng, mtessun, mzamazal, ngu, qzhang |
| Target Milestone | rc |
| Keywords | Automation, Regression |
| Fixed In Version | libvirt-4.5.0-20.el7 |
| Related | 1716387 (view as bug list) |
| Type | Bug |
| Last Closed | 2019-08-06 13:14:55 UTC |
Description (Junxiang Li, 2019-04-27 09:54:49 UTC)
Is it reproducible only on the ppc64le platform?

(In reply to Jaroslav Suchanek from comment #3)
> Is it reproducible only on ppc64le platform?

Yes, I could NOT reproduce it on x86_64.

I tried reproducing this on one of the POWER 8 machines I have access to, without success. Given that, and the fact that Comment 5 also points out it can only be reproduced on a single machine, I'm betting my money on that machine having a peculiar NUMA topology that confuses libvirt.

Can you please post the output of 'numactl -H'? Ideally I'd get shell access to the machine, but that output is a starting point.

Unfortunately the machine you'd given me access to seems to have gotten stuck while I was working on it, and now I'm locked out; it won't even respond to pings or offer serial console access, and since I'm not the one loaning it on Beaker I can't access the power controls :(

Before that happened, though, I managed to look around a bit, and my theory that the issue was caused by a peculiar NUMA topology seems to have been incorrect after all, as it looked pretty much the same as any other POWER 8 machine I've worked on.

I also verified that the issue does not reproduce with libvirt-4.5.0-11.el7.ppc64le but does show up after upgrading to libvirt-4.5.0-14.el7.ppc64le; looking at the differences between those two versions, I believe the problematic commits are

  commit b733703cfcc4b4e8966051ba20bed301645331d0
  Author: Michal Privoznik <mprivozn>
  Date:   Thu Apr 18 18:58:58 2019 +0200

      qemu: Set up EMULATOR thread and cpuset.mems before exec()-ing qemu

      https://bugzilla.redhat.com/show_bug.cgi?id=1695434

      It's funny how this went unnoticed for such a long time. Long story
      short, if a domain is configured with VIR_DOMAIN_NUMATUNE_MEM_STRICT,
      libvirt doesn't really honour that. This is because of 7e72ac787848,
      after which libvirt allowed qemu to allocate memory just anywhere and
      only after that used some magic involving cpuset.memory_migrate and
      cpuset.mems to move the memory to the desired NUMA nodes. This was
      done in order to work around a KVM bug where KVM would fail if there
      wasn't a DMA zone available on the NUMA node.

      Well, while the workaround might have stopped libvirt tickling the
      KVM bug, it also caused a bug on the libvirt side: if there is not
      enough memory on the configured NUMA node(s), then any attempt to
      start a domain must fail. But because of the way we play with guest
      memory, domains start just happily.

      The solution is to move the child we've just forked into the
      emulator cgroup, set up cpuset.mems, and exec() qemu only after
      that.

      This basically reverts 7e72ac787848b7434c9, which was a workaround
      for a kernel bug. That bug was apparently fixed, because I've tested
      this successfully with a recent kernel.
      Signed-off-by: Michal Privoznik <mprivozn>
      Reviewed-by: Martin Kletzander <mkletzan>
      (cherry picked from commit 0eaa4716e1b8f6eb59d77049aed3735c3b5fbdd6)
      Signed-off-by: Michal Privoznik <mprivozn>
      Message-Id: <efd9d64c94a027281c244c05f69cc9f4c31ed83b.1555606711.git.mprivozn>
      Reviewed-by: Jiri Denemark <jdenemar>

  commit eb7ef8053311d82d43912a5cc1e82d0266bb29de
  Author: Michal Privoznik <mprivozn>
  Date:   Thu Apr 18 18:58:57 2019 +0200

      qemu: Rework setting process affinity

      RHEL-7.7: https://bugzilla.redhat.com/show_bug.cgi?id=1695434
      RHEL-8.0.1: https://bugzilla.redhat.com/show_bug.cgi?id=1503284

      The way we currently start qemu, from a CPU affinity POV, is as
      follows:

        1) the child process has its affinity set to all online CPUs
           (unless some vcpu pinning was given in the domain XML)

        2) once qemu is running, the cpuset cgroup is configured taking
           memory pinning into account

      The problem is that we let qemu allocate its memory just anywhere
      in 1) and then rely on 2) being able to move the memory to the
      configured NUMA nodes. This might not always be possible (e.g. qemu
      might lock some parts of its memory) and is very suboptimal (copying
      large memory between NUMA nodes takes a significant amount of time).

      The solution is to set affinity to one of (in priority order):
        - the CPUs associated with the NUMA memory affinity mask
        - the CPUs associated with emulator pinning
        - all online host CPUs

      Later (once QEMU has allocated its memory) we then change this
      again to (again in priority order):
        - the CPUs associated with emulator pinning
        - the CPUs returned by numad
        - the CPUs associated with vCPU pinning
        - all online host CPUs

      Signed-off-by: Michal Privoznik <mprivozn>
      Reviewed-by: Daniel P. Berrangé <berrange>
      (cherry picked from commit f136b83139c63f20de0df3285d9e82df2fb97bfc)

      I had to explicitly free bitmaps, because there is no VIR_AUTOPTR
      just yet.

      Signed-off-by: Michal Privoznik <mprivozn>
      Message-Id: <a6edd347c999f999a49d1a878c74c690eb2ab619.1555606711.git.mprivozn>
      Reviewed-by: Jiri Denemark <jdenemar>

which were backported to RHEL 7.7 to fix Bug 1695434.

I'm currently trying to get access to a different POWER 8 machine on which to continue the investigation. I'll keep you posted.

Alright, I managed to get access to a different POWER 8 machine and reproduce the issue there. The original machine became accessible again in the meantime, but I don't really need it any longer, so it can safely be returned.

By comparing the avocado-vt-vm1 guest that was on the original machine with a Fedora 30 guest that I created with the same command line I would normally use for testing, I figured out why I could not initially reproduce the issue: I usually assign 8 vCPUs to guests, and in that scenario the guest will start just fine, but as soon as I change its configuration to

  <vcpu placement='auto'>2</vcpu>

then I get the error message. The fact that 8 is exactly the number of threads per core on a POWER 8 machine is almost certainly key to understanding why that value works and all others don't. I'll investigate further next week.

Milan, can you please estimate how this issue impacts RHV? Is <numatune><memory mode="strict" placement="auto" /></numatune> used in RHV anyhow? Thanks.

Looking around, I can see that:

- <numatune><memory mode="strict"/>...</numatune> can be used.
- We don't use an explicit `placement' attribute in `memory'.
- We use <vcpu> also without a `placement' attribute.

I'm not sure whether this, with any actual combination of elements and attribute values generated in RHV, can induce placement="auto".
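To make the affinity rework described in commit eb7ef8053311 above concrete: both phases amount to walking a priority list of CPU masks and applying the first one that was actually configured. The sketch below is not libvirt's code; the `bitmap` type and `pick_first()` helper are hypothetical stand-ins for libvirt's virBitmap machinery, shown only to illustrate the ordering.

```c
/* Minimal sketch of the two-phase affinity selection from commit
 * eb7ef8053311 (not libvirt's actual code; 'bitmap' and pick_first()
 * are hypothetical stand-ins for the virBitmap helpers). */
#include <stddef.h>

typedef struct bitmap bitmap; /* opaque CPU mask */

/* Return the first non-NULL mask, i.e. the highest-priority setting
 * that was actually configured. */
static bitmap *pick_first(bitmap *candidates[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (candidates[i] != NULL)
            return candidates[i];
    return NULL;
}

/* Phase 1: affinity set on the forked child before exec()-ing qemu,
 * so memory is allocated on the right NUMA nodes from the start. */
static bitmap *preexec_affinity(bitmap *numa_memory_cpus, /* CPUs of the <numatune> nodeset */
                                bitmap *emulator_pin,
                                bitmap *all_online_cpus)
{
    bitmap *prio[] = { numa_memory_cpus, emulator_pin, all_online_cpus };
    return pick_first(prio, 3);
}

/* Phase 2: affinity applied again once QEMU has allocated its memory. */
static bitmap *postalloc_affinity(bitmap *emulator_pin,
                                  bitmap *numad_cpus,
                                  bitmap *vcpu_pin,
                                  bitmap *all_online_cpus)
{
    bitmap *prio[] = { emulator_pin, numad_cpus, vcpu_pin, all_online_cpus };
    return pick_first(prio, 4);
}
```

The bug tracked here lives in phase 1: as the fix commit quoted later explains, the mask built from the <numatune> nodeset was a map of NUMA node numbers rather than CPU numbers.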
(In reply to Milan Zamazal from comment #12)
> Looking around, I can see that:
>
> - <numatune><memory mode="strict"/>...</numatune> can be used.
> - We don't use explicit `placement' attribute in `memory'.
> - We use <vcpu> also without `placement' attribute.
>
> I'm not sure whether this, with any actual combination of elements and
> attribute values generated in RHV, can induce placement="auto".

Thanks for looking into this! :)

If I'm reading the documentation ([1] and [2]) correctly, then <vcpu> without placement is equivalent to placement='static', and <numatune><memory> will inherit the placement from <vcpu> in this scenario, which in turn makes providing a nodeset mandatory.

The above matches the results I got while testing:

  # cat test.xml
  ...
    <vcpu>2</vcpu>
    <numatune>
      <memory mode='strict' />
    </numatune>
  ...
  # virsh define test.xml
  error: Failed to define domain from test.xml
  error: unsupported configuration: nodeset for NUMA memory tuning must be set if 'placement' is 'static'
  #

So RHV must be providing the nodeset argument too, right? And either way, this bug only shows up when using placement='auto', so if RHV doesn't use that feature then it's not going to be affected.

[1] https://libvirt.org/formatdomain.html#elementsCPUAllocation
[2] https://libvirt.org/formatdomain.html#elementsNUMATuning

(In reply to Andrea Bolognani from comment #13)
> So RHV must be providing the nodeset argument too, right?

Well, looking at https://github.com/oVirt/ovirt-engine/blob/4dbfe06a726ff39c8660d177dc58fd56152830d9/backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/builder/vminfo/LibvirtVmXmlBuilder.java#L567, it should. But on a (literally!) closer look, there is `modeset' instead of `nodeset' there, so I wonder whether anything uses that piece of code at all. Not your worry of course; your assumption should be correct.

(In reply to Milan Zamazal from comment #14)
> Well, looking at LibvirtVmXmlBuilder.java#L567, it should.

Good!

> But on a (literally!) closer look, there is `modeset' instead of
> `nodeset' there, so I wonder whether anything uses that piece of code at
> all. Not your worry of course, your assumption should be correct.

Oh boy :)

I dug more in the meantime and realized that you can also hit the bug with something like

  <vcpu>8</vcpu>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>

but thanks to the typo in ovirt-engine you spotted and mentioned above, RHV should still be in the clear.

I got confirmation that the <memory> element is currently not used in RHV (the `if' statement referred to in Comment 14 is a dead piece of code).

Patches posted upstream.

https://www.redhat.com/archives/libvir-list/2019-May/msg00919.html

(In reply to Milan Zamazal from comment #16)
> I got confirmation that the <memory> element is currently not used in RHV
> (the `if' statement referred in Comment 14 is a dead piece of code).

That's very good to know, thanks! :)

(In reply to Junxiang Li from comment #0)
> Steps to Reproduce:
> 1. To define a guest with:
>    <numatune><memory mode="strict" placement="auto" /></numatune>
> 2. Try to start it
>
> Actual results:
> # virsh start test1
> error: Failed to start domain test1
> error: invalid argument: Failed to parse bitmap ''
One thing that I apparently forgot to point out is that I never managed to reproduce those exact symptoms: what I got instead was along the lines of

  # virsh start guest
  error: Failed to start domain guest
  error: cannot set CPU affinity on process 40055: Invalid argument

I can hit the specific error message reported above if I edit a guest so that its configuration contains something like

  <vcpu>2</vcpu>
  <numatune>
    <memory nodeset=''/>
  </numatune>

In that case, after saving and closing the editor I get

  error: invalid argument: Failed to parse bitmap ''
  Failed. Try again? [y,n,i,f,?]: i
  error: invalid argument: Failed to parse bitmap ''
  Failed. Try again? [y,n,f,?]:

with no way to proceed, which is the expected behavior.

Can you please try reproducing the issue again and confirm that you're indeed seeing the specific error message reported above, and not the same one I'm seeing? Because if that's the case, we might need more digging :)

I finally managed to reproduce the issue reported initially (failed to parse bitmap), but since this bug has mostly been used to track work on the second issue (cannot set CPU affinity) I'm changing the title to reflect that. I've created a separate bug (Bug 1716387) for the first issue and will track it there from now on. Sorry for any confusion this might cause.

I'm also making the bug public, since all the non-public information is already relegated to private comments.

Fix merged upstream.

  commit 5f2212c062c720716b7701fa0a5511311dc6e906
  Author: Andrea Bolognani <abologna>
  Date:   Thu May 30 19:20:34 2019 +0200

      qemu: Fix qemuProcessInitCpuAffinity()

      Ever since the feature was introduced with commit 0f8e7ae33ace, it
      has contained a logic error in that it attempted to use a NUMA node
      map where a CPU map was expected. Because of that, guests using
      <numatune> might fail to start:

        # virsh start guest
        error: Failed to start domain guest
        error: cannot set CPU affinity on process 40055: Invalid argument

      This was particularly easy to trigger on POWER 8 machines, where
      secondary threads always show up as offline in the host: having

        <numatune>
          <memory mode='strict' placement='static' nodeset='1'/>
        </numatune>

      in the guest configuration, for example, would result in libvirt
      trying to set the process affinity so that it would prefer running
      on CPU 1, but since that's a secondary thread and thus shows up as
      offline, the operation would fail, and so would starting the guest.

      Use the newly introduced virNumaNodesetToCPUset() to convert the
      NUMA node map to a CPU map, which in the example above would be
      48,56,64,72,80,88 - a valid input for virProcessSetAffinity().

      https://bugzilla.redhat.com/show_bug.cgi?id=1703661

      Signed-off-by: Andrea Bolognani <abologna>
      Reviewed-by: Ján Tomko <jtomko>

  v5.4.0-45-g5f2212c062
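To illustrate what the fix's node-map-to-CPU-map conversion achieves, here is a stand-alone approximation. It is not libvirt's virNumaNodesetToCPUset() implementation; it leans on libnuma (an assumption of this example) to expand one NUMA node into that node's CPU map, which is valid input for an affinity call.

```c
/* Approximation of the conversion performed by the fix above, using
 * libnuma instead of libvirt internals. Build: gcc -o n2c n2c.c -lnuma */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this host\n");
        return 1;
    }

    int node = 1; /* the node from <memory ... nodeset='1'/> */
    struct bitmask *cpus = numa_allocate_cpumask();

    /* Fill 'cpus' with the CPUs belonging to 'node'. */
    if (numa_node_to_cpus(node, cpus) < 0) {
        perror("numa_node_to_cpus");
        numa_free_cpumask(cpus);
        return 1;
    }

    /* On the POWER 8 host discussed above this prints the primary
     * threads of node 1, e.g. 48 56 64 72 80 88: secondary threads
     * are offline and therefore not part of the node's CPU map. */
    for (unsigned long i = 0; i < cpus->size; i++)
        if (numa_bitmask_isbitset(cpus, i))
            printf("%lu ", i);
    printf("\n");

    numa_free_cpumask(cpus);
    return 0;
}
```

The buggy code effectively skipped this expansion and handed the node numbers themselves to the affinity call, which is why "CPU 1" (an offline secondary thread on POWER 8) was targeted instead of node 1's CPUs.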
Reproduce:

env:
  # rpm -q libvirt
  libvirt-4.5.0-19.el7.ppc64le

step:
  1. Edit the guest XML with "<numatune><memory mode="strict" placement="auto" /></numatune>"
  2. Try to start the guest

result: the following error message is reported:
  error: Failed to start domain avocado-vt-vm1
  error: cannot set CPU affinity on process 156584: Invalid argument

Verify:

env:
  # rpm -q libvirt
  libvirt-4.5.0-20.el7.ppc64le

step:
  1. Edit the guest XML with "<numatune><memory mode="strict" placement="auto" /></numatune>"
  2. Try to start the guest

result: the guest started with the message:
  Domain avocado-vt-vm1 started

In summary, this problem has been fixed.

Also reproduced this bug on libvirt-4.5.0-19.el7.x86_64.

Version:
  libvirt-4.5.0-19.el7.x86_64
  kernel-3.10.0-1057.el7.x86_64
  qemu-kvm-rhev-2.12.0-33.el7.x86_64

Steps:
  # virsh domstate avocado-vt-vm1
  shut off

  # virsh dumpxml avocado-vt-vm1 --inactive |grep "<vcpu" -A3
  <vcpu placement='static'>1</vcpu>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>

  # cat /sys/devices/system/cpu/cpu1/online
  0

  # virsh start avocado-vt-vm1
  error: Failed to start domain avocado-vt-vm1
  error: cannot set CPU affinity on process 3470: Invalid argument

Hi, I am trying to verify this bug on x86_64, and I encountered the following error. Could you please help to have a look at it? thx :)

Version:
  kernel-3.10.0-1057.el7.x86_64
  qemu-kvm-rhev-2.12.0-33.el7.x86_64
  libvirt-4.5.0-23.el7.x86_64
  kernel-3.10.0-1058.el7.x86_64

Steps:
  # virsh domstate avocado-vt-vm1
  shut off

  # virsh dumpxml avocado-vt-vm1 --inactive |grep "<vcpu" -A3
  <vcpu placement='static'>1</vcpu>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>

  # echo 0 > /sys/devices/system/cpu/cpu1/online
  # cat /sys/devices/system/cpu/cpu1/online
  0

  # virsh start avocado-vt-vm1
  error: Failed to start domain avocado-vt-vm1
  error: An error occurred, but the cause is unknown

  (the same failure repeats on every subsequent start attempt)

Created attachment 1584557 [details]
vm.log
Created attachment 1584558 [details]
libvirtd.log
(In reply to jiyan from comment #29)
> Hi I am trying to verify this bug in x86_64, and I encountered the
> following err. Could you please help to have a look at it? thx :)
>
> [...]
>
> # virsh start avocado-vt-vm1
> error: Failed to start domain avocado-vt-vm1
> error: An error occurred, but the cause is unknown

Alright, this happens regardless of whether you have offlined CPU 1 before trying to start the guest, and the underlying reason is that you're asking libvirt to pin the guest to NUMA node 1 but the host only has a single NUMA node (0).

Can you please open a separate bug to track this? libvirt is doing the right thing by failing, and the only problem is that we're not reporting a good enough error message when that happens.

Version:
  libvirt-4.5.0-23.el7.x86_64
  qemu-kvm-rhev-2.12.0-33.el7.x86_64
  kernel-3.10.0-1058.el7.x86_64

Steps:

1. Check numactl info and set host CPU 1 offline:

  # numactl --hard
  available: 2 nodes (0-1)
  node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
  node 0 size: 16362 MB
  node 0 free: 14107 MB
  node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
  node 1 size: 16384 MB
  node 1 free: 14888 MB
  node distances:
  node   0   1
    0:  10  11
    1:  11  10

  # echo 0 > /sys/devices/system/cpu/cpu1/online
  # cat /sys/devices/system/cpu/cpu1/online
  0

2. Prepare a shut-off VM with the following configuration and start it:

  # virsh domstate vm1
  shut off

  # virsh dumpxml vm1 --inactive |grep "<vcpu" -A4
  <vcpu placement='static'>1</vcpu>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>

  # virsh start vm1
  Domain vm1 started

3. Downgrade libvirt and restart the VM:

  # yum downgrade libvirt* -y
  # rpm -qa libvirt qemu-kvm-rhev kernel
  kernel-3.10.0-1058.el7.x86_64
  qemu-kvm-rhev-2.12.0-33.el7.x86_64
  libvirt-4.5.0-19.el7.x86_64

  # virsh destroy vm1;virsh start vm1
  Domain vm1 destroyed
  error: Failed to start domain vm1
  error: cannot set CPU affinity on process 13782: Invalid argument

The test result is as expected.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2294
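As a footnote to the single-NUMA-node diagnosis a few comments above: the condition jiyan hit (nodeset='1' on a host with only node 0) can be checked for up front by comparing the configured nodeset against the highest node present on the host. A minimal sketch, again using libnuma as an assumed stand-in for whatever validation libvirt performs internally:

```c
/* Check that a configured NUMA node exists on this host. Mirrors the
 * diagnosis above: nodeset='1' can never be satisfied on a host whose
 * only node is 0. Build: gcc -o nodechk nodechk.c -lnuma */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0)
        return 1;

    int requested = 1;              /* node from <memory ... nodeset='1'/> */
    int max_node = numa_max_node(); /* 0 on a single-node host */

    if (requested > max_node) {
        fprintf(stderr, "nodeset '%d' cannot be satisfied: host only has nodes 0-%d\n",
                requested, max_node);
        return 1;
    }
    printf("node %d exists on this host\n", requested);
    return 0;
}
```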