Description of problem:
Starting a VM whose <numatune> references a nonexistent NUMA node fails with "An error occurred, but the cause is unknown".

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.12.0-33.el7.x86_64
libvirt-4.5.0-23.el7.x86_64
kernel-3.10.0-1058.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
https://bugzilla.redhat.com/show_bug.cgi?id=1703661#c29
https://bugzilla.redhat.com/show_bug.cgi?id=1703661#c33

Actual results:
An abnormal, uninformative error is raised.

Expected results:
libvirtd should raise a meaningful error.

Additional info:
(In reply to jiyan from comment #0)
> Steps to Reproduce:
> https://bugzilla.redhat.com/show_bug.cgi?id=1703661#c29
> https://bugzilla.redhat.com/show_bug.cgi?id=1703661#c33

I'm unable to reproduce this using these steps. Can you please share the full domain XML and the 'numactl -H' output? Also, which CPUs are online/offline, in case I need to switch them to reproduce? Thanks.
Version:
libvirt-4.5.0-20.el7.x86_64
kernel-3.10.0-1058.el7.x86_64
qemu-kvm-rhev-2.12.0-18.el7_6.7.x86_64

Steps:
# numactl -H
available: 1 nodes (0)
node 0 cpus: 0 1 2 3
node 0 size: 3965 MB
node 0 free: 1116 MB
node distances:
node   0
  0:  10

# virsh domstate vm1
shut off

# virsh dumpxml vm1 --inactive |grep "<vcpu" -A4
  <vcpu placement='static'>1</vcpu>
  <numatune>
    <memory mode='strict' nodeset='1'/>    [There is only node 0 on the host.]
  </numatune>

# virsh start vm1
error: Failed to start domain vm1
error: An error occurred, but the cause is unknown
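The mismatch in the steps above (nodeset='1' on a host that only has node 0) can be spotted before starting the domain. The following is a minimal sketch, not libvirt code: the function name available_nodes and the SAMPLE constant are hypothetical, and it assumes the 'available: N nodes (...)' line has the form shown in the 'numactl -H' output above.

```python
import re

# Hypothetical helper, not part of libvirt or numactl.
def available_nodes(numactl_output):
    """Return the set of NUMA node IDs listed on the 'available:' line."""
    match = re.search(r"^available:\s*\d+\s+nodes\s+\(([^)]*)\)",
                      numactl_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'available:' line found in numactl output")
    nodes = set()
    for part in match.group(1).split(","):
        part = part.strip()
        if "-" in part:
            # A range such as '0-1' covers both endpoints.
            low, high = part.split("-", 1)
            nodes.update(range(int(low), int(high) + 1))
        elif part:
            nodes.add(int(part))
    return nodes

# Output copied from the reproduction steps in this comment.
SAMPLE = """available: 1 nodes (0)
node 0 cpus: 0 1 2 3
node 0 size: 3965 MB
node 0 free: 1116 MB
"""

requested_node = 1  # from <memory mode='strict' nodeset='1'/>
print(requested_node in available_nodes(SAMPLE))  # prints False
```

Node 1 is absent from the host's node list, which is the condition that led to the generic error in this report.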
Sorry for the wrong qemu-kvm version in comment 3. The numactl info is shown in https://bugzilla.redhat.com/show_bug.cgi?id=1724866#c3

According to https://bugzilla.redhat.com/show_bug.cgi?id=1703661#c29, the versions and steps are as follows:

Hi, I am trying to verify this bug on x86_64, and I encountered the following error. Could you please have a look at it? Thanks :)

Version:
kernel-3.10.0-1057.el7.x86_64
qemu-kvm-rhev-2.12.0-33.el7.x86_64
libvirt-4.5.0-23.el7.x86_64
kernel-3.10.0-1058.el7.x86_64

Steps:
# virsh domstate avocado-vt-vm1
shut off

# virsh dumpxml avocado-vt-vm1 --inactive |grep "<vcpu" -A3
  <vcpu placement='static'>1</vcpu>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>

# echo 0 > /sys/devices/system/cpu/cpu1/online
# cat /sys/devices/system/cpu/cpu1/online
0

# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: An error occurred, but the cause is unknown

The same error is returned on each of seven consecutive start attempts.
Patch proposed upstream: https://www.redhat.com/archives/libvir-list/2020-September/msg00372.html
Merged upstream as:

9e0d4b9240 virnuma: Report error when NUMA -> CPUs translation fails

v6.7.0-86-g9e0d4b9240
To POST:

http://post-office.corp.redhat.com/archives/rhvirt-patches/2020-September/msg00167.html

Scratch build available here:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=31257178
http://brew-task-repos.usersys.redhat.com/repos/scratch/mprivozn/libvirt/6.6.0/5.el8_rc.e400c6f61d/
With the scratch build, I tested on a machine with 2 nodes. Using virsh edit, I set the domain's nodeset to include node "2" in several ways:

  <numatune>
    <memory mode='strict' nodeset='2'/>    <----
  </numatune>
or
  <numatune>
    <memory mode='strict' nodeset='0,2'/>    <----
  </numatune>
or
  <numatune>
    <memory mode='strict' nodeset='1-2'/>    <--
  </numatune>
---
  <cpu mode='host-model' check='partial'>
    <feature policy='disable' name='vmx'/>
  </cpu>

# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: operation failed: NUMA node 2 is not available
Verified on a machine with 2 nodes with these versions:
libvirt-daemon-6.6.0-6.module+el8.3.0+8125+aefcf088.x86_64
qemu-kvm-5.1.0-8.module+el8.3.0+8141+3cd9cd43.x86_64

Using virsh edit, I set the domain's nodeset to include node "2" in several ways:

  <numatune>
    <memory mode='strict' nodeset='2'/>    <----
  </numatune>
or
  <numatune>
    <memory mode='strict' nodeset='0,2'/>    <----
  </numatune>
or
  <numatune>
    <memory mode='strict' nodeset='1-2'/>    <--
  </numatune>
---
  <cpu mode='host-model' check='partial'>
    <feature policy='disable' name='vmx'/>
  </cpu>

# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: operation failed: NUMA node 2 is not available
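The three nodeset forms tested above ('2', '0,2', '1-2') all include node 2. A minimal sketch of how such strings expand and why each one trips the same check follows. It is an illustration, not libvirt's implementation: expand_nodeset and check_nodeset are hypothetical names, the host node IDs {0, 1} are an assumption for a 2-node machine, and only single IDs, comma lists, and ranges are handled (the real syntax may accept more forms).

```python
# Hypothetical helpers for illustration; not libvirt's code.
def expand_nodeset(nodeset):
    """Expand a string such as '0,2' or '1-2' into a sorted list of node IDs."""
    nodes = set()
    for part in nodeset.split(","):
        part = part.strip()
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            if low > high:
                raise ValueError("invalid range: " + part)
            nodes.update(range(low, high + 1))
        else:
            nodes.add(int(part))
    return sorted(nodes)

def check_nodeset(nodeset, host_nodes):
    """Return an error message for the first missing node, or None if all exist."""
    for node in expand_nodeset(nodeset):
        if node not in host_nodes:
            return f"NUMA node {node} is not available"
    return None

HOST_NODES = {0, 1}  # assumed node IDs of the 2-node test machine

for tested in ("2", "0,2", "1-2"):
    print(tested, "->", check_nodeset(tested, HOST_NODES))
```

Under that assumption, each of the three forms yields the message wording seen in the verification output, while a nodeset such as '0-1' passes the check.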
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:5137