Bug 1481165 - Starting a VM with a disk that resides on Gluster fails with 'UnsupportedType' error
Summary: Starting a VM with a disk that resides on Gluster fails with 'UnsupportedType' error
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.20.0
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-4.2.0
Target Release: ---
Assignee: Francesco Romani
QA Contact: Raz Tamir
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-08-14 09:18 UTC by Elad
Modified: 2017-12-20 11:23 UTC
CC List: 6 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-12-20 11:23:22 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.2+
rule-engine: blocker+


Attachments
logs (12.12 MB, application/x-gzip)
2017-08-14 09:18 UTC, Elad
engine and vdsm logs (93.29 KB, application/x-gzip)
2017-08-21 08:28 UTC, Raz Tamir


Links (oVirt gerrit)
80699 - master - MERGED - virt: metadata: support network storage (last updated 2017-08-17 13:32:07 UTC)
80851 - master - MERGED - metadata: extract helpers (last updated 2017-08-21 13:18:52 UTC)
80852 - master - ABANDONED - virt: metadata: volumeInfo for network drives (last updated 2017-08-22 12:29:49 UTC)
80916 - master - MERGED - virt: metadata: don't (re)store drive.volumeInfo (last updated 2017-08-22 16:09:53 UTC)

Description Elad 2017-08-14 09:18:38 UTC
Created attachment 1313004 [details]
logs

Description of problem:
Starting a VM with a disk that resides on a Gluster domain fails with the following error in vdsm.log:

2017-08-14 11:54:23,320+0300 INFO  (libvirt/events) [virt.vm] (vmId='21209778-8acf-4075-9dd7-dbd17db14a92') CPU running: onResume (vm:5566)
2017-08-14 11:54:23,393+0300 ERROR (vm/21209778) [virt.vm] (vmId='21209778-8acf-4075-9dd7-dbd17db14a92') The vm start process failed (vm:853)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 782, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2550, in _run
    self._domDependentInit()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2264, in _domDependentInit
    self._vmDependentInit()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2274, in _vmDependentInit
    self._save_legacy_disk_conf_to_metadata()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2342, in _save_legacy_disk_conf_to_metadata
    self._sync_metadata()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4694, in _sync_metadata
    self._md_desc.dump(self._dom)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 382, in dump
    md_xml = self._build_xml()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 552, in _build_xml
    dev_elem = _dump_device(metadata_obj, data)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 609, in _dump_device
    elems.append(md_obj.dump(key, **value))
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 198, in dump
    _keyvalue_to_elem(self._add_ns(key), value, elem)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 700, in _keyvalue_to_elem
    raise UnsupportedType(key, value)
UnsupportedType: Unsupported [{'port': '0', 'transport': 'tcp', 'name': 'gluster-server01.qa.lab.tlv.redhat.com'}] for hosts
2017-08-14 11:54:23,439+0300 INFO  (vm/21209778) [virt.vm] (vmId='21209778-8acf-4075-9dd7-dbd17db14a92') Changed state to Down: Unsupported [{'port': '0', 'transport': 'tcp', 'name': 'gluster-server01.qa.lab.tlv.redhat.com'}] for hosts (code=1) (vm:1539)
2017-08-14 11:54:23,441+0300 ERROR (vm/21209778) [root] FINISH thread <Thread(vm/21209778, started daemon 140205539231488)> failed (concurrent:198)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 191, in run
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 854, in _startUnderlyingVm
    self.setDownStatus(ERROR, vmexitreason.GENERIC_ERROR, str(e))
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 1541, in setDownStatus
    self._update_metadata()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4691, in _update_metadata
    self._sync_metadata()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4694, in _sync_metadata
    self._md_desc.dump(self._dom)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 382, in dump
    md_xml = self._build_xml()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 552, in _build_xml
    dev_elem = _dump_device(metadata_obj, data)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 609, in _dump_device
    elems.append(md_obj.dump(key, **value))
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 198, in dump
    _keyvalue_to_elem(self._add_ns(key), value, elem)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 700, in _keyvalue_to_elem
    raise UnsupportedType(key, value)
UnsupportedType: Unsupported [{'port': '0', 'transport': 'tcp', 'name': 'gluster-server01.qa.lab.tlv.redhat.com'}] for hosts
2017-08-14 11:54:24,809+0300 DEBUG (mailbox-spm) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0 dd if=/rhev/data-center/de8c0e96-c0f8-4803-864e-ecf78f2ceb94/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None) (commands:70)
2017-08-14 11:54:24,916+0300 DEBUG (mailbox-spm) [storage.Misc.excCmd] SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0667778 s, 15.3 MB/s\n'; <rc> = 0 (commands:94)



Version-Release number of selected component (if applicable):

Hypervisor:
vdsm-jsonrpc-4.20.2-60.git06231e5.el7.centos.noarch
vdsm-api-4.20.2-60.git06231e5.el7.centos.noarch
vdsm-client-4.20.2-60.git06231e5.el7.centos.noarch
vdsm-python-4.20.2-60.git06231e5.el7.centos.noarch
vdsm-hook-vmfex-dev-4.20.2-60.git06231e5.el7.centos.noarch
vdsm-yajsonrpc-4.20.2-60.git06231e5.el7.centos.noarch
vdsm-4.20.2-60.git06231e5.el7.centos.x86_64
vdsm-hook-ethtool-options-4.20.2-60.git06231e5.el7.centos.noarch
vdsm-http-4.20.2-60.git06231e5.el7.centos.noarch
libvirt-3.2.0-14.el7.x86_64
qemu-img-ev-2.6.0-28.el7.10.1.x86_64
qemu-kvm-ev-2.6.0-28.el7.10.1.x86_64
ipxe-roms-qemu-20170123-1.git4e85b27.el7.noarch
qemu-kvm-common-ev-2.6.0-28.el7.10.1.x86_64
sanlock-3.5.0-1.el7.x86_64
selinux-policy-3.13.1-166.el7.noarch
Red Hat Enterprise Linux Server 7.4 (Maipo)
kernel - 3.10.0-663.el7.x86_64

Gluster server:
glusterfs-libs-3.8.14-1.el7.x86_64
glusterfs-client-xlators-3.8.14-1.el7.x86_64
glusterfs-cli-3.8.14-1.el7.x86_64
glusterfs-api-3.8.14-1.el7.x86_64
glusterfs-server-3.8.14-1.el7.x86_64
glusterfs-fuse-3.8.14-1.el7.x86_64
glusterfs-3.8.14-1.el7.x86_64
kernel - 3.10.0-693.el7.x86_64
Red Hat Enterprise Linux Server 7.4 Beta (Maipo)


Engine:
ovirt-engine-4.2.0-0.0.master.20170811144920.gita423008.el7.centos.noarch

How reproducible:
Reproduced with both replica3 and distributed Gluster volumes  

Steps to Reproduce:
1. Create a Gluster storage domain (replica3 or distributed)
2. Create a VM with a disk that resides on the Gluster domain
3. Start the VM

Actual results:
Starting the VM failed

Expected results:
Starting the VM should succeed

Additional info:
Logs are attached from environments with replica3 and distributed Gluster domains where the bug was reproduced.



Distributed:

2017-08-14 11:54:23,441+0300 ERROR (vm/21209778) [root] FINISH thread <Thread(vm/21209778, started daemon 140205539231488)> failed (concurrent:198)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 191, in run
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 854, in _startUnderlyingVm
    self.setDownStatus(ERROR, vmexitreason.GENERIC_ERROR, str(e))
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 1541, in setDownStatus
    self._update_metadata()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4691, in _update_metadata
    self._sync_metadata()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4694, in _sync_metadata
    self._md_desc.dump(self._dom)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 382, in dump
    md_xml = self._build_xml()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 552, in _build_xml
    dev_elem = _dump_device(metadata_obj, data)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 609, in _dump_device
    elems.append(md_obj.dump(key, **value))
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 198, in dump
    _keyvalue_to_elem(self._add_ns(key), value, elem)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 700, in _keyvalue_to_elem
    raise UnsupportedType(key, value)
UnsupportedType: Unsupported [{'port': '0', 'transport': 'tcp', 'name': 'gluster-server01.qa.lab.tlv.redhat.com'}] for hosts
2017-08-14 11:54:24,809+0300 DEBUG (mailbox-spm) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0 dd if=/rhev/data-center/de8c0e96-c0f8-4803-864e-ecf78f2ceb94/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None) (commands:70)



Replica3:

2017-08-14 11:51:34,433+0300 INFO  (libvirt/events) [virt.vm] (vmId='5a836f8e-94d8-484e-ae1a-a0db4a5694c5') CPU running: onResume (vm:5566)
2017-08-14 11:51:34,502+0300 ERROR (vm/5a836f8e) [virt.vm] (vmId='5a836f8e-94d8-484e-ae1a-a0db4a5694c5') The vm start process failed (vm:853)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 782, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2550, in _run
    self._domDependentInit()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2264, in _domDependentInit
    self._vmDependentInit()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2274, in _vmDependentInit
    self._save_legacy_disk_conf_to_metadata()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2342, in _save_legacy_disk_conf_to_metadata
    self._sync_metadata()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4694, in _sync_metadata
    self._md_desc.dump(self._dom)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 382, in dump
    md_xml = self._build_xml()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 552, in _build_xml
    dev_elem = _dump_device(metadata_obj, data)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 609, in _dump_device
    elems.append(md_obj.dump(key, **value))
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 198, in dump
    _keyvalue_to_elem(self._add_ns(key), value, elem)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 700, in _keyvalue_to_elem
    raise UnsupportedType(key, value)
UnsupportedType: Unsupported [{'port': '0', 'transport': 'tcp', 'name': 'gluster01.scl.lab.tlv.redhat.com'}, {'port': '0', 'transport': 'tcp', 'name': '10.35.83.240'}, {'port': '0', 'transport': 'tcp', 'name': '10.35.83.241'}, {'port': '0', 'transport': 'tcp', 'name': '10.35.83.242'}] for hosts

Comment 1 Elad 2017-08-14 10:29:51 UTC
VM XML:

2017-08-14 11:51:32,652+0300 INFO  (vm/5a836f8e) [virt.vm] (vmId='5a836f8e-94d8-484e-ae1a-a0db4a5694c5') <?xml version='1.0' encoding='UTF-8'?>
<domain xmlns:ns0="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0" type="kvm">
    <name>test</name>
    <uuid>5a836f8e-94d8-484e-ae1a-a0db4a5694c5</uuid>
    <memory>1048576</memory>
    <currentMemory>1048576</currentMemory>
    <maxMemory slots="16">4194304</maxMemory>
    <vcpu current="1">16</vcpu>
    <metadata>
        <ns0:qos />
        <ovirt-vm:vm />
    </metadata>
    <sysinfo type="smbios">
        <system>
            <entry name="manufacturer">oVirt</entry>
            <entry name="product">oVirt Node</entry>
            <entry name="version">7.4-18.el7</entry>
            <entry name="serial">3406735E-57A3-4E51-87E5-7EBB275505EE</entry>
            <entry name="uuid">5a836f8e-94d8-484e-ae1a-a0db4a5694c5</entry>
        </system>
    </sysinfo>
    <clock adjustment="0" offset="variable">
        <timer name="rtc" tickpolicy="catchup" />
        <timer name="pit" tickpolicy="delay" />
        <timer name="hpet" present="no" />
    </clock>
    <features>
        <acpi />
    </features>
    <cpu match="exact">
        <model>Conroe</model>
        <topology cores="1" sockets="16" threads="1" />
        <numa>
            <cell cpus="0" memory="1048576" />
        </numa>
    </cpu>
    <devices>
        <input bus="ps2" type="mouse" />
        <channel type="unix">
            <target name="ovirt-guest-agent.0" type="virtio" />
            <source mode="bind" path="/var/lib/libvirt/qemu/channels/5a836f8e-94d8-484e-ae1a-a0db4a5694c5.ovirt-guest-agent.0" />
        </channel>
        <channel type="unix">
            <target name="org.qemu.guest_agent.0" type="virtio" />
            <source mode="bind" path="/var/lib/libvirt/qemu/channels/5a836f8e-94d8-484e-ae1a-a0db4a5694c5.org.qemu.guest_agent.0" />
        </channel>
        <rng model="virtio">
            <backend model="random">/dev/urandom</backend>
        </rng>
        <controller index="0" model="piix3-uhci" type="usb" />
        <video>
            <model heads="1" ram="65536" type="qxl" vgamem="16384" vram="8192" />
        </video>
        <memballoon model="virtio" />
        <controller index="0" ports="16" type="virtio-serial" />
        <controller index="0" model="virtio-scsi" type="scsi" />
        <channel type="spicevmc">
            <target name="com.redhat.spice.0" type="virtio" />
        </channel>
        <disk device="cdrom" snapshot="no" type="file">
            <source file="" startupPolicy="optional" />
            <target bus="ide" dev="hdc" />
            <readonly />
        </disk>
        <disk device="disk" snapshot="no" type="network">
            <address bus="0" controller="0" target="0" type="drive" unit="0" />
            <source name="storage_local_ge2_volume_0/0dea338b-199a-4bb8-a9ef-e2e953c0448d/images/76fa9207-92d7-47bb-94d6-3cb4545aea5e/e70b2478-883e-4567-8648-5cd683252692" protocol="gluster">
                <host name="gluster01.scl.lab.tlv.redhat.com" port="0" transport="tcp" />
            </source>
            <target bus="scsi" dev="sda" />
            <serial>76fa9207-92d7-47bb-94d6-3cb4545aea5e</serial>
            <boot order="1" />
            <driver cache="none" error_policy="stop" io="threads" name="qemu" type="raw" />
        </disk>
        <graphics autoport="yes" defaultMode="secure" passwd="*****" passwdValidTo="1970-01-01T00:00:01" port="-1" tlsPort="-1" type="spice">
            <listen network="vdsm-ovirtmgmt" type="network" />
        </graphics>
    </devices>
    <os>
        <type arch="x86_64" machine="pc-i440fx-rhel7.3.0">hvm</type>
        <smbios mode="sysinfo" />
    </os>
</domain>
 (vm:2530)

Comment 2 Allon Mureinik 2017-08-14 13:00:56 UTC
Denis, this seems related to the recent libgfapi changes.
Can you take a look please?

Comment 3 Nir Soffer 2017-08-14 19:46:23 UTC
Francesco, gluster support worked fine when we merged it. It seems that recent
changes in the way vm configuration is passed to vdsm are related to this issue.
Can you take a look?

Comment 4 Denis Chaplygin 2017-08-15 12:35:16 UTC
Allon, the disk elements seem to be formed correctly. I agree with Nir.

Comment 5 Francesco Romani 2017-08-17 08:31:55 UTC
(In reply to Denis Chaplygin from comment #4)
> Allon, the disk elements seem to be formed correctly. I agree with Nir.

It is indeed a virt bug. I believe that https://gerrit.ovirt.org/#/c/80699/2 fixes this regression [1]. I verified this with the relevant configuration snippets - see the tests - but I'd love a double-check with a real glusterfs environment, which I don't have handy.

The backstory for this bug is the following.

Vdsm wants to support receiving the VM configuration as prebuilt libvirt domain XML from Engine. In other words, Engine, not Vdsm, will be in charge of building the libvirt XML.
The storage setup is challenging, so Engine still sends the storage configuration in the old format, together with the libvirt XML.
Vdsm still consumes the storage configuration in the old format; no changes here.

However, another task closely related to the engine XML effort (not dependent, but each influences and drives the other) is the removal of the recovery files.
We want to have one complete description of the VM in the form of the domain XML, so we don't need the recovery files anymore and we can drop them.

Please note that we have *not* removed the recovery files yet.

So Vdsm copies the storage configuration into the domain XML metadata.
To do so, it encodes/decodes the configuration dictionary as XML. *Here* is where it fails, hence this bug.
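
For illustration, here is a minimal sketch of the failing encoding step (hypothetical and condensed from the traceback above; the names and shapes are assumptions, not the real vdsm code). The serializer walks the configuration dictionary and emits one XML element per key; scalars and nested dicts are handled, but the drive's 'hosts' key carries a list of dicts, which falls through to the unsupported-type branch:

import xml.etree.ElementTree as ET

class UnsupportedType(Exception):
    def __init__(self, key, value):
        Exception.__init__(self, 'Unsupported %s for %s' % (value, key))

def keyvalue_to_elem(key, value, parent):
    # Sketch of the dict-to-XML encoding: scalars become element text,
    # nested dicts recurse, anything else (e.g. a list of dicts) raises.
    elem = ET.SubElement(parent, key)
    if isinstance(value, dict):
        for k, v in value.items():
            keyvalue_to_elem(k, v, elem)
    elif isinstance(value, (str, int)):
        elem.text = str(value)
    else:
        raise UnsupportedType(key, value)
    return elem

root = ET.Element('device')
drive_conf = {
    'protocol': 'gluster',
    # the shape reported in the error message: a list of host dicts
    'hosts': [{'port': '0', 'transport': 'tcp', 'name': 'gluster-server01'}],
}
for key, value in drive_conf.items():
    keyvalue_to_elem(key, value, root)
# raises: UnsupportedType: Unsupported [{...}] for hosts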


+++

[1] Not sure this made it even into our pre-alpha snapshots, but it is still technically a regression.

Comment 6 Francesco Romani 2017-08-17 14:16:12 UTC
patches merged, moving to MODIFIED

Comment 7 Francesco Romani 2017-08-21 08:15:32 UTC
Internal change; should not be user-visible (once fixed, of course). Found on an alpha snapshot -> no doc text needed.

Comment 8 Raz Tamir 2017-08-21 08:27:12 UTC
This bug is still here - I used ovirt-engine-4.2.0-0.0.master.20170818152502.gita8c361f.el7.centos.noarch

engine.log:

2017-08-20 14:52:37,028+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-8) [vms_syncAction_55afd08b-a3ed-4e85] EVENT_ID: USER_STARTED_VM(153), VM copy_disk_vm_glusterfs was started by admin@internal-authz (Host: host_mixed_1).
2017-08-20 14:52:45,137+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmsStatisticsFetcher] (DefaultQuartzScheduler3) [14b1582] Fetched 1 VMs from VDS 'c274e8bf-4f16-4d7d-ad48-5a53bf2843a7'
2017-08-20 14:52:45,137+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler3) [14b1582] VM 'c2eec2c6-ed5c-463d-94cc-a6dc9e00dd7e' was reported as Down on VDS 'c274e8bf-4f16-4d7d-ad48-5a53bf2843a7'(host_mixed_1)
2017-08-20 14:52:45,138+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (DefaultQuartzScheduler3) [14b1582] START, DestroyVDSCommand(HostName = host_mixed_1, DestroyVmVDSCommandParameters:{hostId='c274e8bf-4f16-4d7d-ad48-5a53bf2843a7', vmId='c2eec2c6-ed5c-463d-94cc-a6dc9e00dd7e', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 4625cafb
2017-08-20 14:52:47,226+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (DefaultQuartzScheduler3) [14b1582] FINISH, DestroyVDSCommand, log id: 4625cafb
2017-08-20 14:52:47,227+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler3) [14b1582] VM 'c2eec2c6-ed5c-463d-94cc-a6dc9e00dd7e'(copy_disk_vm_glusterfs) moved from 'WaitForLaunch' --> 'Down'
2017-08-20 14:52:47,238+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [14b1582] EVENT_ID: VM_DOWN_ERROR(119), VM copy_disk_vm_glusterfs is down with error. Exit message: Unsupported [{'port': '0', 'transport': 'tcp', 'name': 'gluster-server02.qa.lab.tlv.redhat.com'}] for hosts.
2017-08-20 14:52:47,239+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler3) [14b1582] add VM 'c2eec2c6-ed5c-463d-94cc-a6dc9e00dd7e'(copy_disk_vm_glusterfs) to rerun treatment
2017-08-20 14:52:47,244+03 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (DefaultQuartzScheduler3) [14b1582] Rerun VM 'c2eec2c6-ed5c-463d-94cc-a6dc9e00dd7e'. Called from VDS 'host_mixed_1'
2017-08-20 14:52:47,255+03 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.EE-ManagedThreadFactory-engine-Thread-10) [14b1582] EVENT_ID: USER_INITIATED_RUN_VM_FAILED(151), Failed to run VM copy_disk_vm_glusterfs on Host host_mixed_1.
2017-08-20 14:52:47,261+03 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (org.ovirt.thread.EE-ManagedThreadFactory-engine-Thread-10) [14b1582] Lock Acquired to object 'EngineLock:{exclusiveLocks='[c2eec2c6-ed5c-463d-94cc-a6dc9e00dd7e=VM]', sharedLocks=''}'


vdsm.log (host_mixed_1):

2017-08-20 14:52:44,800+0300 INFO  (libvirt/events) [virt.vm] (vmId='c2eec2c6-ed5c-463d-94cc-a6dc9e00dd7e') CPU running: onResume (vm:5568)
2017-08-20 14:52:44,843+0300 ERROR (vm/c2eec2c6) [virt.vm] (vmId='c2eec2c6-ed5c-463d-94cc-a6dc9e00dd7e') The vm start process failed (vm:853)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 782, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2550, in _run
    self._domDependentInit()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2264, in _domDependentInit
    self._vmDependentInit()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2292, in _vmDependentInit
    self._save_legacy_disk_conf_to_metadata()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2342, in _save_legacy_disk_conf_to_metadata
    self._sync_metadata()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4696, in _sync_metadata
    self._md_desc.dump(self._dom)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 395, in dump
    md_xml = self._build_xml()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 565, in _build_xml
    dev_elem = _dump_device(metadata_obj, data)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 623, in _dump_device
    elems.append(md_obj.dump(key, **value))
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 211, in dump
    _keyvalue_to_elem(self._add_ns(key), value, elem)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 727, in _keyvalue_to_elem
    raise UnsupportedType(key, value)
UnsupportedType: Unsupported [{'port': '0', 'transport': 'tcp', 'name': 'gluster-server02.qa.lab.tlv.redhat.com'}] for hosts
2017-08-20 14:52:44,863+0300 INFO  (vm/c2eec2c6) [virt.vm] (vmId='c2eec2c6-ed5c-463d-94cc-a6dc9e00dd7e') Changed state to Down: Unsupported [{'port': '0', 'transport': 'tcp', 'name': 'gluster-server02.qa.lab.tlv.redhat.com'}] for hosts (code=1) (vm:1539)
2017-08-20 14:52:44,865+0300 ERROR (vm/c2eec2c6) [root] FINISH thread <Thread(vm/c2eec2c6, started daemon 140143843243776)> failed (concurrent:198)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 191, in run
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 854, in _startUnderlyingVm
    self.setDownStatus(ERROR, vmexitreason.GENERIC_ERROR, str(e))
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 1541, in setDownStatus
    self._update_metadata()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4693, in _update_metadata
    self._sync_metadata()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4696, in _sync_metadata
    self._md_desc.dump(self._dom)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 395, in dump
    md_xml = self._build_xml()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 565, in _build_xml
    dev_elem = _dump_device(metadata_obj, data)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 623, in _dump_device
    elems.append(md_obj.dump(key, **value))
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 211, in dump
    _keyvalue_to_elem(self._add_ns(key), value, elem)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 727, in _keyvalue_to_elem
    raise UnsupportedType(key, value)
UnsupportedType: Unsupported [{'port': '0', 'transport': 'tcp', 'name': 'gluster-server02.qa.lab.tlv.redhat.com'}] for hosts


log attached


Comment 10 Raz Tamir 2017-08-21 08:28:05 UTC
Created attachment 1316161 [details]
engine and vdsm logs

Comment 11 Francesco Romani 2017-08-21 11:41:09 UTC
(In reply to Raz Tamir from comment #8)
> This bug is still here - I used
> ovirt-engine-4.2.0-0.0.master.20170818152502.gita8c361f.el7.centos.noarch

It indeed was. I was looking in the wrong place. The issue was that the 'volumeInfo' key may in turn contain the nested 'hosts' key for network drives.
Not sure we need this data, but we will keep it around anyway.
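
For a concrete picture, this is roughly the shape involved (hypothetical, inferred from the error message and the explanation above; field names and values are illustrative):

drive_conf = {
    'device': 'disk',
    'protocol': 'gluster',
    'volumeInfo': {
        'protocol': 'gluster',
        # Nested list of host dicts: the first fix handled 'hosts' at the
        # top level, but the same list reappears under 'volumeInfo', so
        # dumping the drive metadata still hits the unsupported-type branch.
        'hosts': [
            {'name': 'gluster-server02.qa.lab.tlv.redhat.com',
             'port': '0', 'transport': 'tcp'},
        ],
    },
}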

Why it went unnoticed:
- not documented in the schema, which makes sense since it is used internally by Vdsm. This is yet another case of Vm.conf being abused as a data store; Storage is no more guilty than anyone else here.
- the debug log for metadata needs to be improved; I'm on it.


scratch build RPMs: http://jenkins.ovirt.org/job/vdsm_master_build-artifacts-on-demand-el7-x86_64/482/

I verified my last patch (80852) on the affected host_mixed_1; a VM with a couple of glusterfs disks now starts.

Comment 12 Nir Soffer 2017-08-21 18:22:48 UTC
(In reply to Francesco Romani from comment #11)
> (In reply to Raz Tamir from comment #8)
> > This bug is still here - I used
> > ovirt-engine-4.2.0-0.0.master.20170818152502.gita8c361f.el7.centos.noarch
> 
> It indeed was. I was looking in the wrong place. The issue was that the
> 'volumeInfo' key may in turn contain the nested 'hosts' key for network
> drives.
> Not sure we need this data, but we will keep it around anyway.

I think we don't need Drive.volumeInfo; please remove it instead of adding it to the VM metadata.

Comment 13 Francesco Romani 2017-08-22 08:42:51 UTC
(In reply to Nir Soffer from comment #12)
> (In reply to Francesco Romani from comment #11)
> > (In reply to Raz Tamir from comment #8)
> > > This bug is still here - I used
> > > ovirt-engine-4.2.0-0.0.master.20170818152502.gita8c361f.el7.centos.noarch
> > 
> > It indeed was. I was looking in the wrong place. The issue was that the
> > 'volumeInfo' key may in turn contain the nested 'hosts' key for network
> > drives.
> > Not sure we need this data, but we will keep it around anyway.
> 
> I think we don't need Drive.volumeInfo; please remove it instead of adding
> it to the VM metadata.

Very interesting, Nir, but I need just one more clarification:
You mean that drive.volumeInfo and its counterpart in the drive config (vm.conf) are useless nowadays? I had strong suspicions in that direction too; I can't find a valid usage for it (or any usage at all).
If you can confirm this, I will happily get rid of it very quickly.

Comment 14 Nir Soffer 2017-08-22 10:27:57 UTC
Drive.volumeInfo is a leftover from historic gluster support that was removed before my time. You can safely remove it. All the info is now included in Drive.volumeChain and other drive attributes.

Comment 15 Francesco Romani 2017-08-23 14:05:31 UTC
Should be fixed by http://gerrit.ovirt.org/80916 - waiting for CI to confirm. Will move to MODIFIED afterwards.
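
Judging by the patch subject ("virt: metadata: don't (re)store drive.volumeInfo"), the approach is to stop copying the key into the metadata rather than teach the serializer to encode it. A minimal sketch of that idea (hypothetical helper, not the actual patch):

# Exclude volumeInfo from the per-drive config before storing it in the
# domain XML metadata: the data is redundant (see comment 14) and its
# nested list of host dicts cannot be encoded.
IGNORED_KEYS = frozenset(['volumeInfo'])

def storable_conf(drive_conf):
    return dict((k, v) for k, v in drive_conf.items()
                if k not in IGNORED_KEYS)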

Comment 16 Francesco Romani 2017-08-24 10:23:01 UTC
The HC basic test suite uses glusterfs storage. I checked that the suite passes with a build that includes 80916:
http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/1007/console

We don't need more patches, hence moving again to MODIFIED.

Comment 17 Raz Tamir 2017-08-27 07:47:58 UTC
Verified on ovirt-engine-4.2.0-0.0.master.20170826130128.git27024d1.el7.centos

A VM with a disk on a gluster storage domain was started successfully.

Comment 18 Sandro Bonazzola 2017-12-20 11:23:22 UTC
This bugzilla is included in the oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in that release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

