Bug 1724866
| Summary: | "An error occurred, but the cause is unknown" raised when starting VM with non-existed numa node in <numatune> | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | jiyan <jiyan> |
| Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> |
| Status: | CLOSED ERRATA | QA Contact: | Jing Qi <jinqi> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | ||
| Version: | 8.0 | CC: | dyuan, jdenemar, jsuchane, lmen, xuzhang, yalzhang |
| Target Milestone: | rc | Keywords: | Upstream |
| Target Release: | 8.1 | Flags: | pm-rhel: mirror+ |
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | libvirt-6.6.0-5.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-11-17 17:44:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1806857, 2001390 | ||
Description
jiyan
2019-06-28 01:29:38 UTC
(In reply to jiyan from comment #0)
> Description of problem:
> "An error occurred, but the cause is unknown" raised when starting VM with
> non-existed numa node in <numatune>
>
> Version-Release number of selected component (if applicable):
> qemu-kvm-rhev-2.12.0-33.el7.x86_64
> libvirt-4.5.0-23.el7.x86_64
> kernel-3.10.0-1058.el7.x86_64
>
> How reproducible:
> 100%
>
> Steps to Reproduce:
> https://bugzilla.redhat.com/show_bug.cgi?id=1703661#c29
> https://bugzilla.redhat.com/show_bug.cgi?id=1703661#c33

I'm unable to reproduce using these steps. Can you please share the full domain XML and 'numactl -H' output? Also, which CPUs are online/offline, in case I need to switch them to reproduce? Thanks.

Version:
libvirt-4.5.0-20.el7.x86_64
kernel-3.10.0-1058.el7.x86_64
qemu-kvm-rhev-2.12.0-18.el7_6.7.x86_64
Steps:
# numactl -H
available: 1 nodes (0)
node 0 cpus: 0 1 2 3
node 0 size: 3965 MB
node 0 free: 1116 MB
node distances:
node 0
0: 10
# virsh domstate vm1
shut off
# virsh dumpxml vm1 --inactive |grep "<vcpu" -A4
<vcpu placement='static'>1</vcpu>
<numatune>
<memory mode='strict' nodeset='1'/> [There is only node-0 in host.]
</numatune>
# virsh start vm1
error: Failed to start domain vm1
error: An error occurred, but the cause is unknown
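The configuration above asks for node 1 on a host where 'numactl -H' reports only node 0. As a rough illustration of a pre-start check that would name the offending node, here is a minimal Python sketch. It is not libvirt's code: the function names are hypothetical, only the comma and range forms of nodeset seen in this report are handled, and the message wording is borrowed from the fixed build's output quoted later in this bug. Reading /sys/devices/system/node/online assumes a Linux host that exposes that sysfs file.

```python
from typing import Optional, Set


def parse_nodeset(nodeset: str) -> Set[int]:
    """Expand a nodeset string such as '1', '0,2' or '1-2' into node IDs.

    Hypothetical helper: only plain IDs and 'a-b' ranges are supported.
    """
    nodes: Set[int] = set()
    for part in nodeset.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {part}")
            nodes.update(range(start, end + 1))
        else:
            nodes.add(int(part))
    return nodes


def read_host_nodes(path: str = "/sys/devices/system/node/online") -> Set[int]:
    """Read the online NUMA nodes of the host (assumes Linux sysfs)."""
    with open(path) as f:
        return parse_nodeset(f.read().strip())


def first_unavailable_node(nodeset: str, host_nodes: Set[int]) -> Optional[int]:
    """Return the lowest requested node missing on the host, or None."""
    missing = sorted(parse_nodeset(nodeset) - host_nodes)
    return missing[0] if missing else None


# The single-node host from the steps above, hard-coded for illustration.
host_nodes = {0}
missing = first_unavailable_node("1", host_nodes)
if missing is not None:
    print(f"NUMA node {missing} is not available")
```

On a real host, read_host_nodes() could replace the hard-coded set.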
Sorry for the wrong qemu-kvm version in comment 3.
The numactl info is shown in https://bugzilla.redhat.com/show_bug.cgi?id=1724866#c3
According to https://bugzilla.redhat.com/show_bug.cgi?id=1703661#c29, the version is as follows:

Hi, I am trying to verify this bug on x86_64, and I encountered the following error. Could you please have a look at it? Thanks :)

Version:
kernel-3.10.0-1057.el7.x86_64
qemu-kvm-rhev-2.12.0-33.el7.x86_64
libvirt-4.5.0-23.el7.x86_64
kernel-3.10.0-1058.el7.x86_64

Steps:
# virsh domstate avocado-vt-vm1
shut off
# virsh dumpxml avocado-vt-vm1 --inactive |grep "<vcpu" -A3
<vcpu placement='static'>1</vcpu>
<numatune>
<memory mode='strict' nodeset='1'/>
</numatune>
# echo 0 > /sys/devices/system/cpu/cpu1/online
# cat /sys/devices/system/cpu/cpu1/online
0
# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: An error occurred, but the cause is unknown
# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: An error occurred, but the cause is unknown
# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: An error occurred, but the cause is unknown
# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: An error occurred, but the cause is unknown
# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: An error occurred, but the cause is unknown
# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: An error occurred, but the cause is unknown
# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: An error occurred, but the cause is unknown

Patch proposed upstream:
https://www.redhat.com/archives/libvir-list/2020-September/msg00372.html

Merged upstream as:
9e0d4b9240 virnuma: Report error when NUMA -> CPUs translation fails
v6.7.0-86-g9e0d4b9240

To POST:
http://post-office.corp.redhat.com/archives/rhvirt-patches/2020-September/msg00167.html

Scratch build available here:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=31257178
http://brew-task-repos.usersys.redhat.com/repos/scratch/mprivozn/libvirt/6.6.0/5.el8_rc.e400c6f61d/

With the scratch build, I tested on a machine with 2 nodes.
Use 'virsh edit' to set the domain's nodeset so that it includes node "2", in any of the following ways:
<numatune>
<memory mode='strict' nodeset='2'/> <----
</numatune>
or
<numatune>
<memory mode='strict' nodeset='0,2'/> <----
</numatune>
or
<numatune>
<memory mode='strict' nodeset='1-2'/> <----
</numatune>
---
<cpu mode='host-model' check='partial'>
<feature policy='disable' name='vmx'/>
</cpu>
# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: operation failed: NUMA node 2 is not available
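To check a domain before starting it, the mode and nodeset can be pulled out of 'virsh dumpxml' output. The following is a minimal Python sketch using only the standard library. The function name and the embedded XML fragment are illustrative assumptions (the fragment is trimmed to the elements shown in this report), and the script does not call libvirt.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Trimmed, illustrative fragment; real 'virsh dumpxml' output has many
# more elements under <domain>.
DOMAIN_XML_FRAGMENT = """
<domain>
  <vcpu placement='static'>1</vcpu>
  <numatune>
    <memory mode='strict' nodeset='1-2'/>
  </numatune>
</domain>
"""


def numatune_memory(xml_text: str) -> Optional[Tuple[Optional[str], Optional[str]]]:
    """Return (mode, nodeset) from <numatune><memory .../>, or None if absent."""
    root = ET.fromstring(xml_text)
    mem = root.find("./numatune/memory")
    if mem is None:
        return None
    return mem.get("mode"), mem.get("nodeset")


result = numatune_memory(DOMAIN_XML_FRAGMENT)
if result is not None:
    mode, nodeset = result
    print(f"mode={mode} nodeset={nodeset}")
```

The returned nodeset string can then be compared against the nodes listed by 'numactl -H' on the host.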
Verified on a machine with 2 nodes, with versions:
libvirt-daemon-6.6.0-6.module+el8.3.0+8125+aefcf088.x86_64
qemu-kvm-5.1.0-8.module+el8.3.0+8141+3cd9cd43.x86_64
Use 'virsh edit' to set the domain's nodeset so that it includes node "2", in any of the following ways:
<numatune>
<memory mode='strict' nodeset='2'/> <----
</numatune>
or
<numatune>
<memory mode='strict' nodeset='0,2'/> <----
</numatune>
or
<numatune>
<memory mode='strict' nodeset='1-2'/> <----
</numatune>
---
<cpu mode='host-model' check='partial'>
<feature policy='disable' name='vmx'/>
</cpu>
# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: operation failed: NUMA node 2 is not available
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5137