Bug 1401974 - Failed to start VM under preferred NUMA mode
Summary: Failed to start VM under preferred NUMA mode
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.18.18
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.1.0-beta
Target Release: 4.19.2
Assignee: Martin Sivák
QA Contact: Artyom
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-12-06 13:41 UTC by Artyom
Modified: 2017-02-01 14:40 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-01 14:40:56 UTC
oVirt Team: SLA
Embargoed:
rule-engine: ovirt-4.0.z+
rule-engine: ovirt-4.1+
rule-engine: blocker+


Attachments (Terms of Use)
vdsm and engine logs (469.52 KB, application/zip)
2016-12-06 13:41 UTC, Artyom


Links
oVirt gerrit 68341 (master, MERGED): core: Unpinned numa nodes are left with default behaviour (last updated 2020-05-14 19:26:05 UTC)

Description Artyom 2016-12-06 13:41:29 UTC
Created attachment 1228477
vdsm and engine logs

Description of problem:
Failed to start VM under preferred NUMA mode

Version-Release number of selected component (if applicable):
ovirt-engine-4.1.0-0.2.master.20161206091320.git94e2a8d.el7.centos.noarch
vdsm-4.18.999-1081.git32572fc.el7.centos.x86_64
libvirt-2.0.0-10.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Add one NUMA node to the VM and pin it to a host NUMA node (the VM itself must be pinned to the host)
2. Change the VM NUMA mode to preferred
3. Start the VM

Actual results:
The VM fails to start with the following error in the vdsm log:
2016-12-06 15:31:23,243 ERROR (vm/52dba9b9) [virt.vm] (vmId='52dba9b9-0e10-4f03-b160-3635e78e70f1') The vm start process failed (vm:613)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 549, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 1980, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 128, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 936, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3777, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error: Process exited prior to exec: libvirt:  error : internal error: NUMA memory tuning in 'preferred' mode only supports single node
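
For reference, two minimal <numatune> variants illustrating the constraint libvirt enforces here (the nodeset values are illustrative):

Rejected, 'preferred' with more than one node:

  <numatune>
    <memory mode='preferred' nodeset='0-1'/>
  </numatune>

Accepted, single node:

  <numatune>
    <memory mode='preferred' nodeset='0'/>
  </numatune>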


Expected results:
The VM starts successfully.

Additional info:
VM NUMA node via REST:
<vm_numa_nodes>
<vm_numa_node href="/ovirt-engine/api/vms/52dba9b9-0e10-4f03-b160-3635e78e70f1/numanodes/c79e5f7f-e791-4ab1-818a-63e487b3a901" id="c79e5f7f-e791-4ab1-818a-63e487b3a901">
<cpu>
<cores>
<core>…</core>
</cores>
</cpu>
<index>0</index>
<memory>1024</memory>
<numa_node_pins>
<numa_node_pin>
<index>0</index>
</numa_node_pin>
</numa_node_pins>
<vm href="/ovirt-engine/api/vms/52dba9b9-0e10-4f03-b160-3635e78e70f1" id="52dba9b9-0e10-4f03-b160-3635e78e70f1"/>
</vm_numa_node>
</vm_numa_nodes>
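
For completeness, a minimal sketch for fetching this view yourself (the engine FQDN and credentials are placeholders; the URL path comes from the output above):

# Sketch: fetch the VM NUMA nodes collection from the oVirt REST API.
import requests

resp = requests.get(
    'https://ENGINE_FQDN/ovirt-engine/api/vms/'
    '52dba9b9-0e10-4f03-b160-3635e78e70f1/numanodes',
    auth=('admin@internal', 'PASSWORD'),
    headers={'Accept': 'application/xml'},
    verify=False)  # lab setup; use proper CA verification in production
print(resp.text)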

Comment 1 Red Hat Bugzilla Rules Engine 2016-12-07 11:24:15 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 2 Martin Sivák 2016-12-07 11:24:26 UTC
Artyom, can you please check whether this is working in 4.0.z using standard RHEL?

Comment 3 Artyom 2016-12-08 08:27:33 UTC
Everything works fine with the following versions:
ovirt-engine-setup-plugin-ovirt-engine-4.0.6.1-0.1.el7ev.noarch
vdsm-4.18.17-1.el7ev.x86_64
libvirt-client-1.2.17-13.el7_2.6.x86_64

VM NUMA node REST view:
<vm_numa_nodes>
<vm_numa_node href="/ovirt-engine/api/vms/7ad815d0-4aa7-4797-8587-21d72f9c6094/numanodes/38060e9d-c8f8-4ceb-943f-349ebf58c150" id="38060e9d-c8f8-4ceb-943f-349ebf58c150">
<cpu>
<cores>
<core>
<index>0</index>
</core>
</cores>
</cpu>
<index>0</index>
<memory>1024</memory>
<numa_node_pins>
<numa_node_pin>
<host_numa_node id="f51e3ee0-0d70-4595-8a6c-8878b2ba668b" />
<index>0</index>
<pinned>true</pinned>
 </numa_node_pin>
</numa_node_pins>
<vm href="/ovirt-engine/api/vms/7ad815d0-4aa7-4797-8587-21d72f9c6094" id="7ad815d0-4aa7-4797-8587-21d72f9c6094" />
 </vm_numa_node>
</vm_numa_nodes>

Comment 4 Artyom 2016-12-08 08:28:54 UTC
dumpxml:
<domain type='kvm' id='2'>
  <name>golden_env_mixed_virtio_0</name>
  <uuid>7ad815d0-4aa7-4797-8587-21d72f9c6094</uuid>
  <metadata xmlns:ovirt="http://ovirt.org/vm/tune/1.0">
    <ovirt:qos/>
  </metadata>
  <maxMemory slots='16' unit='KiB'>4294967296</maxMemory>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu placement='static' current='1'>16</vcpu>
  <cputune>
    <shares>1020</shares>
    <vcpupin vcpu='0' cpuset='0,2,4,6'/>
  </cputune>
  <numatune>
    <memory mode='preferred' nodeset='0'/>
  </numatune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <sysinfo type='smbios'>
    <system>
      <entry name='manufacturer'>Red Hat</entry>
      <entry name='product'>RHEV Hypervisor</entry>
      <entry name='version'>7.2-9.el7_2.1</entry>
      <entry name='serial'>38393636-3530-5A43-4A34-323030314347</entry>
      <entry name='uuid'>7ad815d0-4aa7-4797-8587-21d72f9c6094</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.2.0'>hvm</type>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
  </features>
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>Opteron_G1</model>
    <topology sockets='16' cores='1' threads='1'/>
    <numa>
      <cell id='0' cpus='0' memory='1048576' unit='KiB'/>
    </numa>
  </cpu>
  <clock offset='variable' adjustment='0' basis='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
....

Comment 5 Martin Sivák 2016-12-08 08:40:21 UTC
Hi Martin, is it possible there was a change in how libvirt treats the NUMA modes? It seems we did not change anything substantial here between 4.0.x and 4.1.

Comment 6 Martin Kletzander 2016-12-08 12:58:18 UTC
There doesn't seem to be any change in this regard from a quick look.  Is the generated XML on 4.1 the same as the one in 4.0.z?  The original is not attached here.

Comment 7 Martin Sivák 2016-12-08 13:13:13 UTC
Artyom, can you attach the full XML generated by the old and the new vdsm?

Comment 8 Artyom 2016-12-08 14:05:01 UTC
I do not have a dumpxml for the new VDSM (the VM failed to start), but you can find the generated XML in the attached vdsm log.

For the 4.0 VDSM:
<domain type='kvm' id='3'>
  <name>golden_env_mixed_virtio_0</name>
  <uuid>7ad815d0-4aa7-4797-8587-21d72f9c6094</uuid>
  <metadata xmlns:ovirt="http://ovirt.org/vm/tune/1.0">
    <ovirt:qos/>
  </metadata>
  <maxMemory slots='16' unit='KiB'>4294967296</maxMemory>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>                                                                                                       
  <vcpu placement='static' current='1'>16</vcpu>
  <cputune>
    <shares>1020</shares>
    <vcpupin vcpu='0' cpuset='0,2,4,6'/>
  </cputune>
  <numatune>
    <memory mode='preferred' nodeset='0'/>
  </numatune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <sysinfo type='smbios'>
    <system>
      <entry name='manufacturer'>Red Hat</entry>
      <entry name='product'>RHEV Hypervisor</entry>
      <entry name='version'>7.2-9.el7_2.1</entry>
      <entry name='serial'>38393636-3530-5A43-4A34-323030314347</entry>
      <entry name='uuid'>7ad815d0-4aa7-4797-8587-21d72f9c6094</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.2.0'>hvm</type>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
  </features>
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>Opteron_G1</model>
    <topology sockets='16' cores='1' threads='1'/>
    <numa>
      <cell id='0' cpus='0' memory='1048576' unit='KiB'/>
    </numa>
  </cpu>
  <clock offset='variable' adjustment='0' basis='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source startupPolicy='optional'/>
      <backingStore/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <disk type='file' device='disk' snapshot='no'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='threads'/>
      <source file='/rhev/data-center/9668bd64-7ef4-4d28-b41d-53082122f930/50dd83c5-574a-4f4c-9b05-3a40f3b2e2c5/images/2ba2de1a-3f21-4d55-b35b-900f0d69c71b/25585b31-e02d-4fa7-bf68-9270dc2f5c88'>
        <seclabel model='selinux' labelskip='yes'/>
      </source>
      <backingStore type='file' index='1'>
        <format type='qcow2'/>
        <source file='/rhev/data-center/9668bd64-7ef4-4d28-b41d-53082122f930/50dd83c5-574a-4f4c-9b05-3a40f3b2e2c5/images/2ba2de1a-3f21-4d55-b35b-900f0d69c71b/a18d6f05-ebad-4c61-917b-15ce97bb34de'/>
        <backingStore/>
      </backingStore>
      <target dev='vda' bus='virtio'/>
      <serial>2ba2de1a-3f21-4d55-b35b-900f0d69c71b</serial>
      <boot order='1'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <alias name='scsi0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0' ports='16'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </controller>
    <controller type='usb' index='0'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='ide' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='00:1a:4a:16:01:a0'/>
      <source bridge='ovirtmgmt'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <filterref filter='vdsm-no-mac-spoofing'/>
      <link state='up'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channels/7ad815d0-4aa7-4797-8587-21d72f9c6094.com.redhat.rhevm.vdsm'/>
      <target type='virtio' name='com.redhat.rhevm.vdsm' state='connected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channels/7ad815d0-4aa7-4797-8587-21d72f9c6094.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel1'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0' state='disconnected'/>
      <alias name='channel2'/>
      <address type='virtio-serial' controller='0' bus='0' port='3'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='spice' tlsPort='5900' autoport='yes' listen='10.35.117.28' defaultMode='secure' passwdValidTo='1970-01-01T00:00:01'>
      <listen type='network' address='10.35.117.28' network='vdsm-ovirtmgmt'/>
      <channel name='main' mode='secure'/>
      <channel name='display' mode='secure'/>
      <channel name='inputs' mode='secure'/>
      <channel name='cursor' mode='secure'/>
      <channel name='playback' mode='secure'/>
      <channel name='record' mode='secure'/>
      <channel name='smartcard' mode='secure'/>
      <channel name='usbredir' mode='secure'/>
    </graphics>
    <sound model='ich6'>
      <alias name='sound0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </sound>
    <video>
      <model type='qxl' ram='65536' vram='8192' vgamem='16384' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>system_u:system_r:svirt_t:s0:c53,c859</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c53,c859</imagelabel>
  </seclabel>
</domain>

Comment 9 Martin Kletzander 2016-12-08 14:57:58 UTC
Yeah, so the problem is that the newer vdsm (I presume) formats:
  <memory mode="preferred" nodeset="0,1" />
even though the old one emitted nodeset="0".

The former is wrong because, as the error message says, you can only have one node preferred.
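
For illustration, a minimal sketch of the fix direction described by gerrit 68341 (the helper name, input shape and element layout are assumptions, not the actual vdsm code): only pinned guest NUMA nodes contribute a <memnode> entry, unpinned nodes are left with libvirt's default behaviour, and 'preferred' mode is limited to a single host node:

# Sketch only: build a <numatune> element from the pinned guest nodes.
# build_numatune and the input shape are hypothetical.
from xml.etree import ElementTree as ET

def build_numatune(guest_numa_nodes, mode):
    # guest_numa_nodes: list of dicts such as
    #   {'index': 0, 'pinned_host_nodes': [0]}   (unpinned: empty list)
    numatune = ET.Element('numatune')
    for node in guest_numa_nodes:
        pins = node['pinned_host_nodes']
        if not pins:
            continue  # unpinned nodes keep libvirt's default behaviour
        if mode == 'preferred' and len(pins) > 1:
            raise ValueError("'preferred' mode supports only a single node")
        ET.SubElement(numatune, 'memnode',
                      cellid=str(node['index']),
                      mode=mode,
                      nodeset=','.join(str(p) for p in pins))
    return numatune

With the VM from this report (one guest node pinned to host node 0) this yields <memnode cellid='0' mode='preferred' nodeset='0'/>, which matches the verified output in comment 12.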

Comment 10 Doron Fediuck 2017-01-11 11:48:00 UTC
Artyom,
can you please verify whether the bug exists in 4.0.z?

Comment 11 Artyom 2017-01-11 13:40:08 UTC
Works fine on:
# rpm -qa | egrep 'vdsm*|libvirt*'
libvirt-daemon-driver-nodedev-1.2.17-13.el7_2.6.x86_64
libvirt-daemon-driver-storage-1.2.17-13.el7_2.6.x86_64
vdsm-xmlrpc-4.18.21-1.el7ev.noarch
libvirt-daemon-driver-nwfilter-1.2.17-13.el7_2.6.x86_64
vdsm-infra-4.18.21-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.6.x86_64
libvirt-python-1.2.17-2.el7.x86_64
libvirt-daemon-driver-secret-1.2.17-13.el7_2.6.x86_64
vdsm-api-4.18.21-1.el7ev.noarch
vdsm-yajsonrpc-4.18.21-1.el7ev.noarch
vdsm-hook-vmfex-dev-4.18.21-1.el7ev.noarch
vdsm-cli-4.18.21-1.el7ev.noarch
libvirt-daemon-1.2.17-13.el7_2.6.x86_64
libvirt-daemon-config-nwfilter-1.2.17-13.el7_2.6.x86_64
libvirt-daemon-driver-interface-1.2.17-13.el7_2.6.x86_64
libvirt-daemon-kvm-1.2.17-13.el7_2.6.x86_64
vdsm-python-4.18.21-1.el7ev.noarch
vdsm-jsonrpc-4.18.21-1.el7ev.noarch
vdsm-4.18.21-1.el7ev.x86_64
libvirt-daemon-driver-network-1.2.17-13.el7_2.6.x86_64
libvirt-lock-sanlock-1.2.17-13.el7_2.6.x86_64
libvirt-daemon-driver-qemu-1.2.17-13.el7_2.6.x86_64

rhevm-4.0.6.3-0.1.el7ev.noarch

Comment 12 Artyom 2017-01-16 09:08:03 UTC
Verified on vdsm-4.19.1-38.git3c85602.el7.centos.x86_64
<numatune>
    <memnode cellid='0' mode='preferred' nodeset='0'/>
  </numatune>
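
For anyone re-verifying on a live host, a small sketch (the domain name is taken from the dumpxml above; the connection URI and checks are assumptions) that pulls the running domain XML via libvirt-python and asserts that every 'preferred' entry names a single node:

# Sketch: verify the numatune constraint on a running domain.
import libvirt
from xml.etree import ElementTree as ET

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('golden_env_mixed_virtio_0')
root = ET.fromstring(dom.XMLDesc(0))

entries = root.findall('./numatune/memory') + root.findall('./numatune/memnode')
for el in entries:
    if el.get('mode') == 'preferred':
        nodeset = el.get('nodeset')
        # a single node contains neither ',' nor a '-' range
        assert ',' not in nodeset and '-' not in nodeset, nodeset
        print('preferred nodeset OK: %s' % nodeset)

conn.close()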

