Bug 1724866 - "An error occurred, but the cause is unknown" raised when starting VM with non-existed numa node in <numatune>
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.0
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: 8.1
Assignee: Michal Privoznik
QA Contact: Jing Qi
URL:
Whiteboard:
Depends On:
Blocks: 1806857
Reported: 2019-06-28 01:29 UTC by jiyan
Modified: 2020-11-17 17:45 UTC (History)

Fixed In Version: libvirt-6.6.0-5.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-17 17:44:46 UTC
Type: Bug
Target Upstream Version:


Description jiyan 2019-06-28 01:29:38 UTC
Description of problem:
"An error occurred, but the cause is unknown" is raised when starting a VM with a nonexistent NUMA node in <numatune>


Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.12.0-33.el7.x86_64
libvirt-4.5.0-23.el7.x86_64
kernel-3.10.0-1058.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
https://bugzilla.redhat.com/show_bug.cgi?id=1703661#c29
https://bugzilla.redhat.com/show_bug.cgi?id=1703661#c33

Actual results:
An uninformative error is raised.

Expected results:
libvirtd should raise a meaningful error.

Additional info:

Comment 2 Michal Privoznik 2019-06-28 12:07:14 UTC
(In reply to jiyan from comment #0)
> Description of problem:
> "An error occurred, but the cause is unknown" raised when starting VM with
> non-existed numa node in <numatune>
> 
> 
> Version-Release number of selected component (if applicable):
> qemu-kvm-rhev-2.12.0-33.el7.x86_64
> libvirt-4.5.0-23.el7.x86_64
> kernel-3.10.0-1058.el7.x86_64
> 
> How reproducible:
> 100%
> 
> Steps to Reproduce:
> https://bugzilla.redhat.com/show_bug.cgi?id=1703661#c29
> https://bugzilla.redhat.com/show_bug.cgi?id=1703661#c33
> 

I'm unable to reproduce using these steps. Can you please share the full domain XML and the 'numactl -H' output? Also, which CPUs are online/offline, in case I need to switch them to reproduce? Thanks.

Comment 3 jiyan 2019-07-01 03:23:53 UTC
Version:
libvirt-4.5.0-20.el7.x86_64
kernel-3.10.0-1058.el7.x86_64
qemu-kvm-rhev-2.12.0-18.el7_6.7.x86_64

Steps:
# numactl -H
available: 1 nodes (0)
node 0 cpus: 0 1 2 3
node 0 size: 3965 MB
node 0 free: 1116 MB
node distances:
node   0 
  0:  10 

# virsh domstate vm1
shut off

# virsh dumpxml vm1 --inactive |grep "<vcpu" -A4
  <vcpu placement='static'>1</vcpu>
  <numatune>
    <memory mode='strict' nodeset='1'/>   [There is only node-0 in host.]
  </numatune>

# virsh start vm1
error: Failed to start domain vm1
error: An error occurred, but the cause is unknown
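
The failure above is a domain asking for a node the host does not have (nodeset='1' on a host with only node 0). A minimal Python sketch of that check follows. It assumes the host's node list is available in the comma/range form that Linux exposes in /sys/devices/system/node/online. The function names are hypothetical, only plain lists and ranges are handled, and this is not how libvirt itself validates the nodeset.

```python
def parse_nodeset(spec):
    """Expand a nodeset such as '2', '0,2' or '1-2' into a set of node IDs.

    Only comma-separated IDs and inclusive ranges are handled here.
    """
    nodes = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            nodes.update(range(int(low), int(high) + 1))
        else:
            nodes.add(int(part))
    return nodes


def unavailable_nodes(requested, online):
    """Return the requested node IDs that are not in the host's online set."""
    return sorted(parse_nodeset(requested) - parse_nodeset(online))


def read_online_nodes(path="/sys/devices/system/node/online"):
    """Read the host's online node list (assumed sysfs location)."""
    with open(path) as f:
        return f.read().strip()
```

For the host shown above, `unavailable_nodes("1", "0")` returns `[1]`, which is the kind of detail the error message should carry.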

Comment 4 jiyan 2019-07-25 03:23:41 UTC
Sorry for the wrong qemu-kvm version in comment 3.

The numactl output is shown in https://bugzilla.redhat.com/show_bug.cgi?id=1724866#c3

As in https://bugzilla.redhat.com/show_bug.cgi?id=1703661#c29, the versions are as follows:

Hi, I am trying to verify this bug on x86_64, and I encountered the following error.
Could you please take a look at it? Thanks :)

Version:
kernel-3.10.0-1057.el7.x86_64
qemu-kvm-rhev-2.12.0-33.el7.x86_64
libvirt-4.5.0-23.el7.x86_64
kernel-3.10.0-1058.el7.x86_64

Steps:
# virsh domstate avocado-vt-vm1
shut off

# virsh dumpxml avocado-vt-vm1 --inactive |grep "<vcpu" -A3
  <vcpu placement='static'>1</vcpu>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>

# echo 0 > /sys/devices/system/cpu/cpu1/online 

# cat /sys/devices/system/cpu/cpu1/online 
0

# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: An error occurred, but the cause is unknown

# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: An error occurred, but the cause is unknown

# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: An error occurred, but the cause is unknown

# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: An error occurred, but the cause is unknown

# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: An error occurred, but the cause is unknown

# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: An error occurred, but the cause is unknown

# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: An error occurred, but the cause is unknown

Comment 5 Michal Privoznik 2020-09-07 15:12:31 UTC
Patch proposed upstream:

https://www.redhat.com/archives/libvir-list/2020-September/msg00372.html

Comment 6 Michal Privoznik 2020-09-08 09:02:19 UTC
Merged upstream as:

9e0d4b9240 virnuma: Report error when NUMA -> CPUs translation fails

v6.7.0-86-g9e0d4b9240
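
Going by the commit subject, the generic message appears to come from a failure path that returned an error status without recording what went wrong. A minimal Python sketch of that pattern follows. It is not libvirt's code: every name here is hypothetical, and the fallback behaviour is an assumption modelled on the messages quoted in this bug.

```python
GENERIC_MESSAGE = "An error occurred, but the cause is unknown"

# Hypothetical stand-in for a per-call "last error" slot.
_last_error = None


def report_error(message):
    """Record a specific error for the caller to display."""
    global _last_error
    _last_error = message


def node_to_cpus_silent(node, host_nodes):
    """Pre-fix behaviour (assumed): fail without recording a reason."""
    if node not in host_nodes:
        return None
    return host_nodes[node]


def node_to_cpus_reporting(node, host_nodes):
    """Post-fix behaviour (assumed): record the reason before failing."""
    if node not in host_nodes:
        report_error(f"operation failed: NUMA node {node} is not available")
        return None
    return host_nodes[node]


def start_domain(node, host_nodes, lookup):
    """Try to map the node to CPUs; fall back to the generic text if no error was set."""
    global _last_error
    _last_error = None
    if lookup(node, host_nodes) is None:
        return "error: " + (_last_error or GENERIC_MESSAGE)
    return "started"
```

With `host = {0: [0, 1, 2, 3]}`, calling `start_domain(1, host, node_to_cpus_silent)` produces the generic text, while the reporting variant names the missing node.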

Comment 8 Jing Qi 2020-09-10 08:54:28 UTC
With the scratch build, I tested on a machine with 2 NUMA nodes.

Using virsh edit, I set the domain's nodeset to include node "2" in each of the following ways:
<numatune>
  <memory mode='strict' nodeset='2'/>  <----
</numatune>
or
<numatune>
  <memory mode='strict' nodeset='0,2'/> <----
</numatune>
or
<numatune>
  <memory mode='strict' nodeset='1-2'/> <--
</numatune>
---
  <cpu mode='host-model' check='partial'>
    <feature policy='disable' name='vmx'/>
  </cpu>


# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: operation failed: NUMA node 2 is not available

Comment 12 Jing Qi 2020-09-21 06:48:45 UTC
Verified on a machine with 2 NUMA nodes with these versions:
libvirt-daemon-6.6.0-6.module+el8.3.0+8125+aefcf088.x86_64
qemu-kvm-5.1.0-8.module+el8.3.0+8141+3cd9cd43.x86_64

Using virsh edit, I set the domain's nodeset to include node "2" in each of the following ways:
<numatune>
  <memory mode='strict' nodeset='2'/>  <----
</numatune>
or
<numatune>
  <memory mode='strict' nodeset='0,2'/> <----
</numatune>
or
<numatune>
  <memory mode='strict' nodeset='1-2'/> <--
</numatune>
---
  <cpu mode='host-model' check='partial'>
    <feature policy='disable' name='vmx'/>
  </cpu>


# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: operation failed: NUMA node 2 is not available

Comment 15 errata-xmlrpc 2020-11-17 17:44:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5137

