Bug 1524941 - Sometimes failed to start VM with direct lun disk on cluster 4.1 with engine 4.2
Summary: Sometimes failed to start VM with direct lun disk on cluster 4.1 with engine 4.2
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.20.15
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ovirt-4.2.2
: ---
Assignee: Francesco Romani
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-12-12 10:33 UTC by Israel Pinto
Modified: 2019-04-28 13:18 UTC (History)
12 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-02-26 10:20:34 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.2+
ylavi: blocker-
ylavi: exception+


Attachments (Terms of Use)
direct lun event log sceenshot (225.43 KB, image/png)
2017-12-12 10:33 UTC, Israel Pinto
no flags Details
engine_log (10.29 MB, text/plain)
2017-12-12 11:51 UTC, Israel Pinto
no flags Details
host_42 (2.39 MB, application/x-xz)
2017-12-12 11:52 UTC, Israel Pinto
no flags Details
host_41 (2.91 MB, application/x-xz)
2017-12-12 11:53 UTC, Israel Pinto
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 86154 0 master MERGED clientIF: better exceptions on storage failure 2020-12-08 21:53:35 UTC

Description Israel Pinto 2017-12-12 10:33:56 UTC
Created attachment 1366546 [details]
direct lun event log sceenshot

Description of problem:
The setup is 4.2 engine with cluster 4.1 with 2 host: 4.1 and 4.2
Failed to start VM with direct lun.
On both hosts 4.1 and 4.2

The VM was running with previous engine version see attached events screenshot. 
VM name: direct_lun
VM id:fc4e9abb-73ea-4df7-bceb-1512a9bfd292

Version-Release number of selected component (if applicable):
Engine: Software Version:4.2.1-0.0.master.20171211205712.git7b1f4d1.el7.centos	
vdsm 4.1: vdsm-4.19.42-1.el7ev.x86_64	
vdsm 4.2: vdsm-4.20.9-30.gite026991.el7.centos


How reproducible:
all the time


Additional info:
vdsm log:
Failing with error:
2017-12-12 11:18:46,757+0200 ERROR (vm/fc4e9abb) [virt.vm] (vmId='fc4e9abb-73ea-4df7-bceb-1512a9bfd292') The vm start process failed (vm:631)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 562, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 2060, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1006, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3567, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error: qemu unexpectedly closed the monitor: oon0,bus=pci.0,addr=0x6 -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 -msg timestamp=on
2017-12-12T09:18:46.593930Z qemu-kvm: warning: CPU(s) not present in any NUMA nodes: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
2017-12-12T09:18:46.594063Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config
2017-12-12T09:18:46.607677Z qemu-kvm: -device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1: cannot get SG_IO version number: Input/output error.  Is this a SCSI device?

logs (Engine, vdsm logs from both hosts):
https://drive.google.com/open?id=1CbcEOg41_i8_Ni7mtuzoH_M2xaXrWBVD

Comment 2 Tal Nisan 2017-12-12 11:08:55 UTC
Engine log, VDSM log?

Comment 3 Israel Pinto 2017-12-12 11:37:46 UTC
logs (Engine, vdsm logs from both hosts):
https://drive.google.com/open?id=1CbcEOg41_i8_Ni7mtuzoH_M2xaXrWBVD

Comment 4 Tal Nisan 2017-12-12 11:45:46 UTC
(In reply to Israel Pinto from comment #3)
> logs (Engine, vdsm logs from both hosts):
> https://drive.google.com/open?id=1CbcEOg41_i8_Ni7mtuzoH_M2xaXrWBVD

Why not in Bugzilla? They are not too big and this link will disappear at some point

Comment 5 Israel Pinto 2017-12-12 11:51:11 UTC
Created attachment 1366579 [details]
engine_log

Comment 6 Israel Pinto 2017-12-12 11:52:58 UTC
Created attachment 1366580 [details]
host_42

Comment 7 Israel Pinto 2017-12-12 11:53:52 UTC
Created attachment 1366581 [details]
host_41

Comment 9 Raz Tamir 2017-12-12 12:07:24 UTC
Hi,

I tested it on my environment and seems like it doesn't reproduce very easily.
The error "libvirtError: internal error: qemu unexpectedly closed the monitor" is very similar to bug #1497511 which is not reproduced 100% as well

Comment 10 Tal Nisan 2017-12-12 12:22:10 UTC
Seems like a virt issue, Tomas can you guys please have a look?

Comment 11 Tomas Jelinek 2017-12-12 14:39:08 UTC
Are you sure? For me it looks a lot like the root cause is:
2017-12-12T09:18:46.607677Z qemu-kvm: -device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1: cannot get SG_IO version number: Input/output error.  Is this a SCSI device?

Can please provide some more info?

Comment 12 Yaniv Kaul 2017-12-12 18:35:31 UTC
Does it work with a cluster level 4.2? Can you compare the qemu command line between the two?

Comment 13 Israel Pinto 2017-12-12 19:32:36 UTC
I test it in host 42 in cluster 42 - VM is running 
recheck with same host 42 in cluster 41 - VM is running
So it not 100% reproduce, maybe the switch hosts to maintenance help the VM to run.

See start command for each host case:

Failed to start on host_41:

2017-12-12 11:18:46,241+0200 INFO  (jsonrpc/1) [vdsm.api] START __init__(cif=<clientIF.clientIF object at 0x3565cd0>, params={u'acpiEnable': u'true', u'emulatedMachine': u'pc-i440fx-rhel7.3.0', u'vmId': u'fc4e9abb-73ea-4df7-bceb-1512a9bfd292', u'memGuaranteedSize': 1024, u'transparentHugePages': u'true', u'timeOffset': u'0', u'cpuType': u'Conroe', u'smp': u'1', u'guestNumaNodes': [{u'nodeIndex': 0, u'cpus': u'0', u'memory': u'1024'}], u'custom': {}, u'vmType': u'kvm', u'memSize': 1024, u'smpCoresPerSocket': u'1', u'vmName': u'direct_lun', u'nice': u'0', u'maxMemSize': 4096, u'bootMenuEnable': u'true', u'smpThreadsPerCore': u'1', u'smartcardEnable': u'false', u'kvmEnable': u'true', u'displayNetwork': u'ovirtmgmt', u'devices': [{u'device': u'qxl', u'specParams': {u'vram': u'8192', u'vgamem': u'16384', u'heads': u'1', u'ram': u'65536'}, u'type': u'video', u'deviceId': u'473aa92c-5013-4633-a507-f096e0f2e0a5', u'address': {u'function': u'0x0', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'slot': u'0x02'}}, {u'device': u'spice', u'specParams': {u'fileTransferEnable': u'true', u'spiceSecureChannels': u'smain,sinputs,scursor,splayback,srecord,sdisplay,ssmartcard,susbredir', u'copyPasteEnable': u'true'}, u'type': u'graphics', u'deviceId': u'b63ad6a4-e84f-44c0-a85f-44ac5bf1d794'}, {u'index': u'2', u'iface': u'ide', u'specParams': {u'path': u''}, u'readonly': u'true', u'deviceId': u'a18b377a-19fe-4354-96ff-89c10ea82690', u'address': {u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, u'device': u'cdrom', u'shared': u'false', u'path': u'', u'type': u'disk'}, {u'index': 0, u'propagateErrors': u'off', u'format': u'raw', u'shared': u'false', u'type': u'disk', u'readonly': u'false', u'bootOrder': u'1', u'specParams': {}, u'sgio': u'filtered', u'iface': u'scsi', u'deviceId': u'7ebe6471-2ab3-4e5d-b6e8-591c84683593', u'address': {u'bus': u'0', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, u'device': u'lun', u'discard': False, u'GUID': u'3514f0c5a51601089', u'optional': u'false'}, {u'nicModel': u'pv', u'macAddr': u'00:1a:4a:16:01:0d', u'linkActive': u'true', u'network': u'ovirtmgmt', u'filterParameters': [], u'bootOrder': u'2', u'filter': u'vdsm-no-mac-spoofing', u'specParams': {u'inbound': {}, u'outbound': {}}, u'deviceId': u'c887a816-bd5f-4364-8307-ac9886ec25ce', u'address': {u'function': u'0x0', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'slot': u'0x03'}, u'device': u'bridge', u'type': u'interface'}, {u'index': u'0', u'specParams': {}, u'deviceId': u'3bdd9058-ba02-4965-b64f-564a1e2231af', u'address': {u'function': u'0x2', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'slot': u'0x01'}, u'device': u'usb', u'model': u'piix3-uhci', u'type': u'controller'}, {u'device': u'memballoon', u'specParams': {u'model': u'virtio'}, u'type': u'balloon', u'deviceId': u'0111c076-d73c-4cc3-a96f-70aae4709722', u'address': {u'function': u'0x0', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'slot': u'0x06'}}, {u'index': u'0', u'specParams': {}, u'deviceId': u'3879a950-328e-45b8-9532-6b61e67e2e2f', u'address': {u'function': u'0x0', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'slot': u'0x04'}, u'device': u'scsi', u'model': u'virtio-scsi', u'type': u'controller'}, {u'device': u'virtio-serial', u'specParams': {}, u'type': u'controller', u'deviceId': u'b8d0e88e-0b2d-467c-8749-86498fb134f9', u'address': {u'function': u'0x0', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'slot': u'0x05'}}, {u'device': u'virtio', u'specParams': {u'source': u'urandom'}, u'model': u'virtio', u'type': u'rng', u'deviceId': u'fcfbfa8f-73cd-451b-980a-361b52da7e53'}], u'maxVCpus': u'16', u'maxMemSlots': 16}, recover=False) from=::ffff:10.35.161.73,33170, flow_id=a51925cf-c0da-49bd-afb0-e0a5ed040136 (api:46)


Start on host_42 in cluster 41:

2017-12-12 21:09:17,160+0200 INFO  (periodic/0) [vdsm.api] START repoStats(domains=()) from=internal, task_id=0e906a50-926c-4459-b2af-67d89fa0096b (api:46)
2017-12-12 21:09:17,160+0200 INFO  (periodic/0) [vdsm.api] FINISH repoStats return={u'32a13ee7-0ae6-40a5-95d4-d2064a749232': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000500592', 'lastCheck': '0.8', 'valid': True}, u'7fe6cce9-9a4c-4464-a260-85d305c2cb5d': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000508362', 'lastCheck': '0.2', 'valid': True}, u'7f309f60-80fb-47d6-8575-c5fc513f8175': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.000411989', 'lastCheck': '2.7', 'valid': True}} from=internal, task_id=0e906a50-926c-4459-b2af-67d89fa0096b (api:52)
2017-12-12 21:24:26,695+0200 INFO  (jsonrpc/4) [vdsm.api] START connectStorageServer(domType=3, spUUID=u'3bbaa479-cba5-4c85-bc95-4c93f43aa299', conList=[{u'id': u'9cb1777b-34be-4b07-8a9d-9c13b84a823f', u'connection': u'10.35.146.225', u'iqn': u'iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05', u'user': u'', u'tpgt': u'1', u'password': '********', u'port': u'3260'}, {u'id': u'b5995565-8852-4181-8cd2-fa499b04cc08', u'connection': u'10.35.146.193', u'iqn': u'iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c04', u'user': u'', u'tpgt': u'1', u'password': '********', u'port': u'3260'}, {u'id': u'b9d7a867-2a9d-46e0-a772-664daa6a3787', u'connection': u'10.35.146.129', u'iqn': u'iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c00', u'user': u'', u'tpgt': u'1', u'password': '********', u'port': u'3260'}, {u'id': u'eedf848a-981a-4aef-8225-5a6685cdb882', u'connection': u'10.35.146.161', u'iqn': u'iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c01', u'user': u'', u'tpgt': u'1', u'password': '********', u'port': u'3260'}], options=None) from=::ffff:10.35.161.73,45296, flow_id=0a4350b5-caa3-40ce-9b6c-ccccfcdf906a, task_id=58eb4bee-590e-485f-9e24-ee41799372a0 (api:46)
2017-12-12 21:24:28,658+0200 INFO  (jsonrpc/4) [vdsm.api] FINISH connectStorageServer return={'statuslist': [{'status': 0, 'id': u'9cb1777b-34be-4b07-8a9d-9c13b84a823f'}, {'status': 0, 'id': u'b5995565-8852-4181-8cd2-fa499b04cc08'}, {'status': 0, 'id': u'b9d7a867-2a9d-46e0-a772-664daa6a3787'}, {'status': 0, 'id': u'eedf848a-981a-4aef-8225-5a6685cdb882'}]} from=::ffff:10.35.161.73,45296, flow_id=0a4350b5-caa3-40ce-9b6c-ccccfcdf906a, task_id=58eb4bee-590e-485f-9e24-ee41799372a0 (api:52)
2017-12-12 21:24:29,007+0200 INFO  (jsonrpc/1) [api.virt] START create(vmParams={u'acpiEnable': u'true', u'emulatedMachine': u'pc-i440fx-rhel7.3.0', u'vmId': u'fc4e9abb-73ea-4df7-bceb-1512a9bfd292', u'memGuaranteedSize': 1024, u'transparentHugePages': u'true', u'timeOffset': u'0', u'cpuType': u'Conroe', u'smp': u'1', u'guestNumaNodes': [{u'nodeIndex': 0, u'cpus': u'0', u'memory': u'1024'}], u'custom': {u'device_5a21c9e6-0064-4084-b820-df84ff17358ddevice_3f37f8c1-5d88-4f20-b123-c08841121f8bdevice_4510f78f-6803-4e87-8e96-94905a782cb4device_f13b8d2f-67ab-4c92-9f19-3a7ef05405f6': u"VmDevice:{id='VmDeviceId:{deviceId='f13b8d2f-67ab-4c92-9f19-3a7ef05405f6', vmId='fc4e9abb-73ea-4df7-bceb-1512a9bfd292'}', device='spicevmc', type='CHANNEL', specParams='[]', address='{bus=0, controller=0, type=virtio-serial, port=3}', managed='false', plugged='true', readOnly='false', deviceAlias='channel2', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}", u'device_5a21c9e6-0064-4084-b820-df84ff17358d': u"VmDevice:{id='VmDeviceId:{deviceId='5a21c9e6-0064-4084-b820-df84ff17358d', vmId='fc4e9abb-73ea-4df7-bceb-1512a9bfd292'}', device='ide', type='CONTROLLER', specParams='[]', address='{slot=0x01, bus=0x00, domain=0x0000, type=pci, function=0x1}', managed='false', plugged='true', readOnly='false', deviceAlias='ide', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}", u'device_5a21c9e6-0064-4084-b820-df84ff17358ddevice_3f37f8c1-5d88-4f20-b123-c08841121f8bdevice_4510f78f-6803-4e87-8e96-94905a782cb4': u"VmDevice:{id='VmDeviceId:{deviceId='4510f78f-6803-4e87-8e96-94905a782cb4', vmId='fc4e9abb-73ea-4df7-bceb-1512a9bfd292'}', device='unix', type='CHANNEL', specParams='[]', address='{bus=0, controller=0, type=virtio-serial, port=2}', managed='false', plugged='true', readOnly='false', deviceAlias='channel1', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}", u'device_5a21c9e6-0064-4084-b820-df84ff17358ddevice_3f37f8c1-5d88-4f20-b123-c08841121f8b': u"VmDevice:{id='VmDeviceId:{deviceId='3f37f8c1-5d88-4f20-b123-c08841121f8b', vmId='fc4e9abb-73ea-4df7-bceb-1512a9bfd292'}', device='unix', type='CHANNEL', specParams='[]', address='{bus=0, controller=0, type=virtio-serial, port=1}', managed='false', plugged='true', readOnly='false', deviceAlias='channel0', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}"}, u'vmType': u'kvm', u'memSize': 1024, u'smpCoresPerSocket': u'1', u'vmName': u'direct_lun', u'nice': u'0', u'maxMemSize': 4096, u'bootMenuEnable': u'false', u'smpThreadsPerCore': u'1', u'smartcardEnable': u'false', u'kvmEnable': u'true', u'displayNetwork': u'ovirtmgmt', u'devices': [{u'device': u'qxl', u'specParams': {u'vram': u'8192', u'vgamem': u'16384', u'heads': u'1', u'ram': u'65536'}, u'type': u'video', u'deviceId': u'473aa92c-5013-4633-a507-f096e0f2e0a5', u'address': {u'function': u'0x0', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'slot': u'0x02'}}, {u'device': u'spice', u'specParams': {u'fileTransferEnable': u'true', u'spiceSecureChannels': u'smain,sinputs,scursor,splayback,srecord,sdisplay,ssmartcard,susbredir', u'copyPasteEnable': u'true'}, u'type': u'graphics', u'deviceId': u'0d73f91d-3f1b-4b4e-ae51-e900cdee128a'}, {u'index': u'2', u'iface': u'ide', u'specParams': {u'path': u''}, u'readonly': u'true', u'deviceId': u'a18b377a-19fe-4354-96ff-89c10ea82690', u'address': {u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, u'device': u'cdrom', u'shared': u'false', u'path': u'', u'type': u'disk'}, {u'index': 0, u'propagateErrors': u'off', u'format': u'raw', u'shared': u'false', u'type': u'disk', u'readonly': u'false', u'bootOrder': u'1', u'specParams': {}, u'sgio': u'filtered', u'iface': u'scsi', u'deviceId': u'7ebe6471-2ab3-4e5d-b6e8-591c84683593', u'address': {u'bus': u'0', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, u'device': u'lun', u'discard': False, u'GUID': u'3514f0c5a51601089', u'optional': u'false'}, {u'nicModel': u'pv', u'macAddr': u'00:1a:4a:16:01:0d', u'linkActive': u'true', u'network': u'ovirtmgmt', u'filterParameters': [], u'filter': u'vdsm-no-mac-spoofing', u'specParams': {u'inbound': {}, u'outbound': {}}, u'deviceId': u'c887a816-bd5f-4364-8307-ac9886ec25ce', u'address': {u'function': u'0x0', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'slot': u'0x03'}, u'device': u'bridge', u'type': u'interface'}, {u'index': u'0', u'specParams': {}, u'deviceId': u'3bdd9058-ba02-4965-b64f-564a1e2231af', u'address': {u'function': u'0x2', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'slot': u'0x01'}, u'device': u'usb', u'model': u'piix3-uhci', u'type': u'controller'}, {u'device': u'memballoon', u'specParams': {u'model': u'virtio'}, u'type': u'balloon', u'deviceId': u'0111c076-d73c-4cc3-a96f-70aae4709722', u'address': {u'function': u'0x0', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'slot': u'0x06'}}, {u'index': u'0', u'specParams': {}, u'deviceId': u'81d4211e-b929-4b76-99b3-a6769ddc081a', u'address': {u'function': u'0x0', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'slot': u'0x04'}, u'device': u'scsi', u'model': u'virtio-scsi', u'type': u'controller'}, {u'device': u'virtio-serial', u'specParams': {}, u'type': u'controller', u'deviceId': u'b8d0e88e-0b2d-467c-8749-86498fb134f9', u'address': {u'function': u'0x0', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'slot': u'0x05'}}, {u'device': u'virtio', u'specParams': {u'source': u'urandom'}, u'model': u'virtio', u'type': u'rng', u'deviceId': u'fcfbfa8f-73cd-451b-980a-361b52da7e53'}], u'maxVCpus': u'16', u'maxMemSlots': 16}) from=::ffff:10.35.161.73,45296, flow_id=0a4350b5-caa3-40ce-9b6c-ccccfcdf906a (api:46)
2017-12-12 21:24:29,012+0200 INFO  (jsonrpc/1) [api.virt] FINISH create return={'status': {'message': 'Done', 'code': 0}, 'vmList': {u'acpiEnable': u'true', u'emulatedMachine': u'pc-i440fx-rhel7.3.0', u'vmId': 'fc4e9abb-73ea-4df7-bceb-1512a9bfd292', 'guestDiskMapping': {}, u'transparentHugePages': u'true', u'timeOffset': u'0', u'cpuType': u'Conroe', u'smp': u'1', u'guestNumaNodes': [{u'nodeIndex': 0, u'cpus': u'0', u'memory': u'1024'}], u'smartcardEnable': u'false', u'custom': {u'device_5a21c9e6-0064-4084-b820-df84ff17358ddevice_3f37f8c1-5d88-4f20-b123-c08841121f8bdevice_4510f78f-6803-4e87-8e96-94905a782cb4device_f13b8d2f-67ab-4c92-9f19-3a7ef05405f6': u"VmDevice:{id='VmDeviceId:{deviceId='f13b8d2f-67ab-4c92-9f19-3a7ef05405f6', vmId='fc4e9abb-73ea-4df7-bceb-1512a9bfd292'}', device='spicevmc', type='CHANNEL', specParams='[]', address='{bus=0, controller=0, type=virtio-serial, port=3}', managed='false', plugged='true', readOnly='false', deviceAlias='channel2', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}", u'device_5a21c9e6-0064-4084-b820-df84ff17358d': u"VmDevice:{id='VmDeviceId:{deviceId='5a21c9e6-0064-4084-b820-df84ff17358d', vmId='fc4e9abb-73ea-4df7-bceb-1512a9bfd292'}', device='ide', type='CONTROLLER', specParams='[]', address='{slot=0x01, bus=0x00, domain=0x0000, type=pci, function=0x1}', managed='false', plugged='true', readOnly='false', deviceAlias='ide', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}", u'device_5a21c9e6-0064-4084-b820-df84ff17358ddevice_3f37f8c1-5d88-4f20-b123-c08841121f8bdevice_4510f78f-6803-4e87-8e96-94905a782cb4': u"VmDevice:{id='VmDeviceId:{deviceId='4510f78f-6803-4e87-8e96-94905a782cb4', vmId='fc4e9abb-73ea-4df7-bceb-1512a9bfd292'}', device='unix', type='CHANNEL', specParams='[]', address='{bus=0, controller=0, type=virtio-serial, port=2}', managed='false', plugged='true', readOnly='false', deviceAlias='channel1', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}", u'device_5a21c9e6-0064-4084-b820-df84ff17358ddevice_3f37f8c1-5d88-4f20-b123-c08841121f8b': u"VmDevice:{id='VmDeviceId:{deviceId='3f37f8c1-5d88-4f20-b123-c08841121f8b', vmId='fc4e9abb-73ea-4df7-bceb-1512a9bfd292'}', device='unix', type='CHANNEL', specParams='[]', address='{bus=0, controller=0, type=virtio-serial, port=1}', managed='false', plugged='true', readOnly='false', deviceAlias='channel0', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}"}, u'vmType': u'kvm', u'memSize': 1024, u'smpCoresPerSocket': u'1', u'vmName': 'direct_lun', u'nice': u'0', 'status': 'WaitForLaunch', u'maxMemSize': 4096, u'bootMenuEnable': u'false', u'smpThreadsPerCore': u'1', u'memGuaranteedSize': 1024, 'arch': 'x86_64', u'kvmEnable': u'true', u'displayNetwork': u'ovirtmgmt', 'devices': [{u'device': u'qxl', u'specParams': {u'vram': u'8192', u'vgamem': u'16384', u'heads': u'1', u'ram': u'65536'}, u'type': u'video', u'deviceId': u'473aa92c-5013-4633-a507-f096e0f2e0a5', u'address': {u'function': u'0x0', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'slot': u'0x02'}}, {u'device': u'spice', u'specParams': {u'fileTransferEnable': u'true', u'spiceSecureChannels': u'smain,sinputs,scursor,splayback,srecord,sdisplay,ssmartcard,susbredir', u'copyPasteEnable': u'true'}, u'type': u'graphics', u'deviceId': u'0d73f91d-3f1b-4b4e-ae51-e900cdee128a'}, {u'index': u'2', u'iface': u'ide', u'specParams': {u'path': u''}, u'readonly': u'true', u'deviceId': u'a18b377a-19fe-4354-96ff-89c10ea82690', u'address': {u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, u'device': u'cdrom', u'shared': u'false', u'path': u'', u'type': u'disk'}, {u'index': 0, u'iface': u'scsi', u'format': u'raw', u'bootOrder': u'1', u'discard': False, u'sgio': u'filtered', u'GUID': u'3514f0c5a51601089', u'specParams': {}, u'readonly': u'false', u'optional': u'false', u'deviceId': u'7ebe6471-2ab3-4e5d-b6e8-591c84683593', u'address': {u'bus': u'0', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, u'device': u'lun', u'shared': u'false', u'propagateErrors': u'off', u'type': u'disk'}, {u'nicModel': u'pv', u'macAddr': u'00:1a:4a:16:01:0d', u'linkActive': u'true', u'network': u'ovirtmgmt', u'filterParameters': [], u'filter': u'vdsm-no-mac-spoofing', u'specParams': {u'inbound': {}, u'outbound': {}}, u'deviceId': u'c887a816-bd5f-4364-8307-ac9886ec25ce', u'address': {u'function': u'0x0', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'slot': u'0x03'}, u'device': u'bridge', u'type': u'interface'}, {u'index': u'0', u'specParams': {}, u'deviceId': u'3bdd9058-ba02-4965-b64f-564a1e2231af', u'address': {u'function': u'0x2', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'slot': u'0x01'}, u'device': u'usb', u'model': u'piix3-uhci', u'type': u'controller'}, {u'device': u'memballoon', u'specParams': {u'model': u'virtio'}, u'type': u'balloon', u'deviceId': u'0111c076-d73c-4cc3-a96f-70aae4709722', u'address': {u'function': u'0x0', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'slot': u'0x06'}}, {u'index': u'0', u'specParams': {}, u'deviceId': u'81d4211e-b929-4b76-99b3-a6769ddc081a', u'address': {u'function': u'0x0', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'slot': u'0x04'}, u'device': u'scsi', u'model': u'virtio-scsi', u'type': u'controller'}, {u'device': u'virtio-serial', u'specParams': {}, u'type': u'controller', u'deviceId': u'b8d0e88e-0b2d-467c-8749-86498fb134f9', u'address': {u'function': u'0x0', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'slot': u'0x05'}}, {u'device': u'virtio', u'specParams': {u'source': u'urandom'}, u'model': u'virtio', u'type': u'rng', u'deviceId': u'fcfbfa8f-73cd-451b-980a-361b52da7e53'}], u'maxVCpus': u'16', 'clientIp': '', 'statusTime': '4407859670', u'maxMemSlots': 16}} from=::ffff:10.35.161.73,45296, flow_id=0a4350b5-caa3-40ce-9b6c-ccccfcdf906a (api:52)


Start on host_42 in cluster 42:

2017-12-12 21:02:39,343+0200 INFO  (jsonrpc/1) [vdsm.api] START connectStorageServer(domType=3, spUUID=u'3bbaa479-cba5-4c85-bc95-4c93f43aa299', conList=[{u'id': u'9cb1777b-34be-4b07-8a9d-9c13b84a823f', u'connection': u'10.35.146.225', u'iqn': u'iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05', u'user': u'', u'tpgt': u'1', u'password': '********', u'port': u'3260'}, {u'id': u'b5995565-8852-4181-8cd2-fa499b04cc08', u'connection': u'10.35.146.193', u'iqn': u'iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c04', u'user': u'', u'tpgt': u'1', u'password': '********', u'port': u'3260'}, {u'id': u'b9d7a867-2a9d-46e0-a772-664daa6a3787', u'connection': u'10.35.146.129', u'iqn': u'iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c00', u'user': u'', u'tpgt': u'1', u'password': '********', u'port': u'3260'}, {u'id': u'eedf848a-981a-4aef-8225-5a6685cdb882', u'connection': u'10.35.146.161', u'iqn': u'iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c01', u'user': u'', u'tpgt': u'1', u'password': '********', u'port': u'3260'}], options=None) from=::ffff:10.35.161.73,45164, flow_id=0a986482-c804-4827-b3af-a48ba1b07b53, task_id=6c7a2bbd-5b0a-47f5-a36c-a19119aaf879 (api:46)
2017-12-12 21:02:41,235+0200 INFO  (jsonrpc/1) [vdsm.api] FINISH connectStorageServer return={'statuslist': [{'status': 0, 'id': u'9cb1777b-34be-4b07-8a9d-9c13b84a823f'}, {'status': 0, 'id': u'b5995565-8852-4181-8cd2-fa499b04cc08'}, {'status': 0, 'id': u'b9d7a867-2a9d-46e0-a772-664daa6a3787'}, {'status': 0, 'id': u'eedf848a-981a-4aef-8225-5a6685cdb882'}]} from=::ffff:10.35.161.73,45164, flow_id=0a986482-c804-4827-b3af-a48ba1b07b53, task_id=6c7a2bbd-5b0a-47f5-a36c-a19119aaf879 (api:52)
2017-12-12 21:02:41,547+0200 INFO  (jsonrpc/7) [api.virt] START create(vmParams={u'xml': u'<?xml version="1.0" encoding="UTF-8"?><domain type="kvm" xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0"><name>direct_lun</name><uuid>fc4e9abb-73ea-4df7-bceb-1512a9bfd292</uuid><memory>1048576</memory><currentMemory>1048576</currentMemory><maxMemory slots="16">4194304</maxMemory><vcpu current="1">16</vcpu><sysinfo type="smbios"><system><entry name="manufacturer">oVirt</entry><entry name="product">OS-NAME:</entry><entry name="version">OS-VERSION:</entry><entry name="serial">HOST-SERIAL:</entry><entry name="uuid">fc4e9abb-73ea-4df7-bceb-1512a9bfd292</entry></system></sysinfo><clock offset="variable" adjustment="0"><timer name="rtc" tickpolicy="catchup"></timer><timer name="pit" tickpolicy="delay"></timer><timer name="hpet" present="no"></timer></clock><features><acpi></acpi></features><cpu match="exact"><model>Conroe</model><topology cores="1" threads="1" sockets="16"></topology><numa><cell cpus="0" memory="1048576"></cell></numa></cpu><cputune></cputune><devices><input type="mouse" bus="ps2"></input><channel type="unix"><target type="virtio" name="ovirt-guest-agent.0"></target><source mode="bind" path="/var/lib/libvirt/qemu/channels/fc4e9abb-73ea-4df7-bceb-1512a9bfd292.ovirt-guest-agent.0"></source></channel><channel type="unix"><target type="virtio" name="org.qemu.guest_agent.0"></target><source mode="bind" path="/var/lib/libvirt/qemu/channels/fc4e9abb-73ea-4df7-bceb-1512a9bfd292.org.qemu.guest_agent.0"></source></channel><memballoon model="virtio"><stats period="5"></stats><address bus="0x00" domain="0x0000" function="0x0" slot="0x06" type="pci"></address></memballoon><graphics type="spice" port="-1" autoport="yes" passwd="*****" passwdValidTo="1970-01-01T00:00:01" tlsPort="-1"><channel name="main" mode="secure"></channel><channel name="inputs" mode="secure"></channel><channel name="cursor" mode="secure"></channel><channel name="playback" mode="secure"></channel><channel name="record" mode="secure"></channel><channel name="display" mode="secure"></channel><channel name="smartcard" mode="secure"></channel><channel name="usbredir" mode="secure"></channel><listen type="network" network="vdsm-ovirtmgmt"></listen></graphics><controller type="usb" model="piix3-uhci" index="0"><address bus="0x00" domain="0x0000" function="0x2" slot="0x01" type="pci"></address></controller><video><model type="qxl" vram="8192" heads="1" ram="65536" vgamem="16384"></model><address bus="0x00" domain="0x0000" function="0x0" slot="0x02" type="pci"></address></video><controller type="scsi" model="virtio-scsi" index="0"></controller><controller type="virtio-serial" index="0" ports="16"><address bus="0x00" domain="0x0000" function="0x0" slot="0x05" type="pci"></address></controller><rng model="virtio"><backend model="random">/dev/urandom</backend></rng><channel type="spicevmc"><target type="virtio" name="com.redhat.spice.0"></target></channel><interface type="bridge"><model type="virtio"></model><link state="up"></link><source bridge="ovirtmgmt"></source><address bus="0x00" domain="0x0000" function="0x0" slot="0x03" type="pci"></address><mac address="00:1a:4a:16:01:0d"></mac><filterref filter="vdsm-no-mac-spoofing"></filterref><bandwidth></bandwidth></interface><disk type="file" device="cdrom" snapshot="no"><driver name="qemu" type="raw" error_policy="report"></driver><source file="" startupPolicy="optional"></source><target dev="hdc" bus="ide"></target><readonly></readonly><address bus="1" controller="0" unit="0" type="drive" target="0"></address></disk><disk snapshot="no" type="block" device="lun" sgio="filtered"><target dev="sda" bus="scsi"></target><source dev="/dev/mapper/3514f0c5a51601089"></source><driver name="qemu" io="native" type="raw" error_policy="stop" cache="none"></driver><address bus="0" controller="0" unit="0" type="drive" target="0"></address><boot order="1"></boot></disk></devices><pm><suspend-to-disk enabled="no"></suspend-to-disk><suspend-to-mem enabled="no"></suspend-to-mem></pm><os><type arch="x86_64" machine="pc-i440fx-rhel7.3.0">hvm</type><smbios mode="sysinfo"></smbios></os><metadata><ovirt-tune:qos></ovirt-tune:qos><ovirt-vm:vm><minGuaranteedMemoryMb type="int">1024</minGuaranteedMemoryMb><clusterVersion>4.2</clusterVersion><ovirt-vm:custom></ovirt-vm:custom><ovirt-vm:device mac_address="00:1a:4a:16:01:0d"><ovirt-vm:custom></ovirt-vm:custom></ovirt-vm:device><ovirt-vm:device devtype="disk" name="sda"><ovirt-vm:GUID>3514f0c5a51601089</ovirt-vm:GUID></ovirt-vm:device><launchPaused>false</launchPaused><resumeBehavior>auto_resume</resumeBehavior></ovirt-vm:vm></metadata></domain>'}) from=::ffff:10.35.161.73,45164, flow_id=0a986482-c804-4827-b3af-a48ba1b07b53 (api:46)
2017-12-12 21:02:41,558+0200 INFO  (jsonrpc/7) [api.virt] FINISH create return={'status': {'message': 'Done', 'code': 0}, 'vmList': {u'xml': u'<?xml version="1.0" encoding="UTF-8"?><domain type="kvm" xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0"><name>direct_lun</name><uuid>fc4e9abb-73ea-4df7-bceb-1512a9bfd292</uuid><memory>1048576</memory><currentMemory>1048576</currentMemory><maxMemory slots="16">4194304</maxMemory><vcpu current="1">16</vcpu><sysinfo type="smbios"><system><entry name="manufacturer">oVirt</entry><entry name="product">OS-NAME:</entry><entry name="version">OS-VERSION:</entry><entry name="serial">HOST-SERIAL:</entry><entry name="uuid">fc4e9abb-73ea-4df7-bceb-1512a9bfd292</entry></system></sysinfo><clock offset="variable" adjustment="0"><timer name="rtc" tickpolicy="catchup"></timer><timer name="pit" tickpolicy="delay"></timer><timer name="hpet" present="no"></timer></clock><features><acpi></acpi></features><cpu match="exact"><model>Conroe</model><topology cores="1" threads="1" sockets="16"></topology><numa><cell cpus="0" memory="1048576"></cell></numa></cpu><cputune></cputune><devices><input type="mouse" bus="ps2"></input><channel type="unix"><target type="virtio" name="ovirt-guest-agent.0"></target><source mode="bind" path="/var/lib/libvirt/qemu/channels/fc4e9abb-73ea-4df7-bceb-1512a9bfd292.ovirt-guest-agent.0"></source></channel><channel type="unix"><target type="virtio" name="org.qemu.guest_agent.0"></target><source mode="bind" path="/var/lib/libvirt/qemu/channels/fc4e9abb-73ea-4df7-bceb-1512a9bfd292.org.qemu.guest_agent.0"></source></channel><memballoon model="virtio"><stats period="5"></stats><address bus="0x00" domain="0x0000" function="0x0" slot="0x06" type="pci"></address></memballoon><graphics type="spice" port="-1" autoport="yes" passwd="*****" passwdValidTo="1970-01-01T00:00:01" tlsPort="-1"><channel name="main" mode="secure"></channel><channel name="inputs" mode="secure"></channel><channel name="cursor" mode="secure"></channel><channel name="playback" mode="secure"></channel><channel name="record" mode="secure"></channel><channel name="display" mode="secure"></channel><channel name="smartcard" mode="secure"></channel><channel name="usbredir" mode="secure"></channel><listen type="network" network="vdsm-ovirtmgmt"></listen></graphics><controller type="usb" model="piix3-uhci" index="0"><address bus="0x00" domain="0x0000" function="0x2" slot="0x01" type="pci"></address></controller><video><model type="qxl" vram="8192" heads="1" ram="65536" vgamem="16384"></model><address bus="0x00" domain="0x0000" function="0x0" slot="0x02" type="pci"></address></video><controller type="scsi" model="virtio-scsi" index="0"></controller><controller type="virtio-serial" index="0" ports="16"><address bus="0x00" domain="0x0000" function="0x0" slot="0x05" type="pci"></address></controller><rng model="virtio"><backend model="random">/dev/urandom</backend></rng><channel type="spicevmc"><target type="virtio" name="com.redhat.spice.0"></target></channel><interface type="bridge"><model type="virtio"></model><link state="up"></link><source bridge="ovirtmgmt"></source><address bus="0x00" domain="0x0000" function="0x0" slot="0x03" type="pci"></address><mac address="00:1a:4a:16:01:0d"></mac><filterref filter="vdsm-no-mac-spoofing"></filterref><bandwidth></bandwidth></interface><disk type="file" device="cdrom" snapshot="no"><driver name="qemu" type="raw" error_policy="report"></driver><source file="" startupPolicy="optional"></source><target dev="hdc" bus="ide"></target><readonly></readonly><address bus="1" controller="0" unit="0" type="drive" target="0"></address></disk><disk snapshot="no" type="block" device="lun" sgio="filtered"><target dev="sda" bus="scsi"></target><source dev="/dev/mapper/3514f0c5a51601089"></source><driver name="qemu" io="native" type="raw" error_policy="stop" cache="none"></driver><address bus="0" controller="0" unit="0" type="drive" target="0"></address><boot order="1"></boot></disk></devices><pm><suspend-to-disk enabled="no"></suspend-to-disk><suspend-to-mem enabled="no"></suspend-to-mem></pm><os><type arch="x86_64" machine="pc-i440fx-rhel7.3.0">hvm</type><smbios mode="sysinfo"></smbios></os><metadata><ovirt-tune:qos></ovirt-tune:qos><ovirt-vm:vm><minGuaranteedMemoryMb type="int">1024</minGuaranteedMemoryMb><clusterVersion>4.2</clusterVersion><ovirt-vm:custom></ovirt-vm:custom><ovirt-vm:device mac_address="00:1a:4a:16:01:0d"><ovirt-vm:custom></ovirt-vm:custom></ovirt-vm:device><ovirt-vm:device devtype="disk" name="sda"><ovirt-vm:GUID>3514f0c5a51601089</ovirt-vm:GUID></ovirt-vm:device><launchPaused>false</launchPaused><resumeBehavior>auto_resume</resumeBehavior></ovirt-vm:vm></metadata></domain>', 'status': 'WaitForLaunch', 'vmName': 'direct_lun', 'vmId': 'fc4e9abb-73ea-4df7-bceb-1512a9bfd292', 'devices': [], 'guestDiskMapping': {}, 'clientIp': '', 'statusTime': '4406552220', 'arch': 'x86_64'}} from=::ffff:10.35.161.73,45164, flow_id=0a986482-c804-4827-b3af-a48ba1b07b53 (api:52)

Comment 14 Eyal Shenitzky 2017-12-13 11:07:08 UTC
Israel can you please supply the steps to reproduce this issue?

Comment 15 Israel Pinto 2017-12-13 11:48:40 UTC
I don't have exact scenario.
I happen after I checked the migration BZ 1522901 with the following steps:
1. Run vm on 4.1 host
2. Migrate to 4.2 host
3. Restart vdsmd on host 4.2
4. Migrate to host 4.2 
I did after it maintenance to host 4.1
All the steps pass
Day after the VM I update the engine check the BZ 1524330 
And rerun all the scenario again
Then we saw it, we stop the VM and it did not start on any host after it.

So you can see it don't have 1..N steps to give here but one thing for sure 
it did not start again any more.

Comment 16 Francesco Romani 2017-12-13 15:50:58 UTC
So, I'm looking at the 4.2 logs from here https://bugzilla.redhat.com/attachment.cgi?id=1366580

I see indeed this twice:

vdsm log:
Failing with error:
2017-12-12 11:18:46,757+0200 ERROR (vm/fc4e9abb) [virt.vm] (vmId='fc4e9abb-73ea-4df7-bceb-1512a9bfd292') The vm start process failed (vm:631)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 562, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 2060, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1006, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3567, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error: qemu unexpectedly closed the monitor: oon0,bus=pci.0,addr=0x6 -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 -msg timestamp=on
2017-12-12T09:18:46.593930Z qemu-kvm: warning: CPU(s) not present in any NUMA nodes: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
2017-12-12T09:18:46.594063Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config
2017-12-12T09:18:46.607677Z qemu-kvm: -device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1: cannot get SG_IO version number: Input/output error.  Is this a SCSI device?

In both cases VM is started using vm.conf (aka no engine XML), so we can rule out engine xml flow. In both cases, it seems that we can't reach the storage.
Other logs seems to confirm that

2017-12-12 10:01:05,574+0200 INFO  (jsonrpc/1) [vdsm.api] START multipath_health() from=::ffff:10.35.161.73,42012, task_id=fd3ceba3-5b51-4914-8c80-f61b482e3f5b (api:46)
2017-12-12 10:01:05,574+0200 INFO  (jsonrpc/1) [vdsm.api] FINISH multipath_health return={u'3514f0c5a51601089': {'valid_paths': 3, 'failed_paths': [u'sdd', u'sde', u'sdh', u'sdi']}}

and

2017-12-12 10:16:18,251+0200 INFO  (jsonrpc/3) [storage.Multipath] Resizing map '3514f0c5a51601089' (map_size=32212254720, slave_size=0) (multipath:119)
2017-12-12 10:16:18,260+0200 ERROR (jsonrpc/3) [storage.Multipath] Could not resize device 3514f0c5a51601089 (multipath:98)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/multipath.py", line 96, in resize_devices
    _resize_if_needed(guid)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/multipath.py", line 120, in _resize_if_needed
    supervdsm.getProxy().resizeMap(name)
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 55, in __call__
    return callMethod()
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 53, in <lambda>
    **kwargs)
  File "<string>", line 2, in resizeMap
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
Error: Resizing map 'dm-9' failed: out='fail\n' err=''

So this seems to explain why the VM could not start, and we must make sure the storage is working ok before to investigate Vdsm (no evidence of Vdsm bug yet).

Now I'll check inbound migrations to that host.

Comment 17 Francesco Romani 2017-12-13 16:23:34 UTC
I see one migration attempt around

and it seems to fail for the same reasons: 2017-12-12T09:45:06.402430Z qemu-kvm: -device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1: cannot get SG_IO version number: Input/output error.  Is this a SCSI device?
2017-12-12 09:45:06.487+0000: shutting down, reason=failed

2017-12-12 10:34:41,497+0200 INFO  (jsonrpc/3) [api.virt] START migrationCreate(params={u'username': u'Unknown', u'afterMigrationStatus': u'', u'enableGuestEvents': False, u'vmId': u'fc4e9abb-73ea-4df7-bceb-1512a9bfd292'

It is worth continuing the investigation, but only after we are sure that storage is working fine on the environment.
I don't see 4.2.0 blockers yet, or hard evidence of Vdsm bugs.

Comment 18 Israel Pinto 2017-12-13 19:34:14 UTC
(In reply to Francesco Romani from comment #17)
> I see one migration attempt around
> 
> and it seems to fail for the same reasons: 2017-12-12T09:45:06.402430Z
> qemu-kvm: -device
> scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,
> id=scsi0-0-0-0,bootindex=1: cannot get SG_IO version number: Input/output
> error.  Is this a SCSI device?
> 2017-12-12 09:45:06.487+0000: shutting down, reason=failed
> 
> 2017-12-12 10:34:41,497+0200 INFO  (jsonrpc/3) [api.virt] START
> migrationCreate(params={u'username': u'Unknown', u'afterMigrationStatus':
> u'', u'enableGuestEvents': False, u'vmId':
> u'fc4e9abb-73ea-4df7-bceb-1512a9bfd292'
> 
> It is worth continuing the investigation, but only after we are sure that
> storage is working fine on the environment.
> I don't see 4.2.0 blockers yet, or hard evidence of Vdsm bugs.

maybe the info in the BZ 1525375
Will put light on this problem, since in both first it works ok then it start failing.

Comment 19 Nir Soffer 2017-12-13 21:15:20 UTC
Israel, this multipath status looks strange - do you have 7 paths for your
multipath devices? I expect 1, 4, or 8.

Please attach the output of:
lsblk
multipath -ll

And /var/log/messages.

On the host used for testing this bug and bug 1525375.

Comment 20 Nir Soffer 2017-12-13 21:25:26 UTC
(In reply to Francesco Romani from comment #16)
> 2017-12-12 10:16:18,251+0200 INFO  (jsonrpc/3) [storage.Multipath] Resizing
> map '3514f0c5a51601089' (map_size=32212254720, slave_size=0) (multipath:119)

This looks suspicious, looks like slaves size is incorrect.

> 2017-12-12 10:16:18,260+0200 ERROR (jsonrpc/3) [storage.Multipath] Could not
> resize device 3514f0c5a51601089 (multipath:98)
> Traceback (most recent call last):
...
> Error: Resizing map 'dm-9' failed: out='fail\n' err=''

This probably explained by the zero slave size.

Ben, this is the way we get multipath slaves sizes:

101 def _resize_if_needed(guid):
102     name = devicemapper.getDmId(guid)
103     slaves = [(slave, getDeviceSize(slave))
104               for slave in devicemapper.getSlaves(name)]
105 
106     if len(slaves) == 0:
107         log.warning("Map %r has no slaves" % guid)
108         return False
109 
110     if len(set(size for slave, size in slaves)) != 1:
111         raise Error("Map %r slaves size differ %s" % (guid, slaves))
112 
113     map_size = getDeviceSize(name)
114     slave_size = slaves[0][1]                                                                                                                                                             
115     if map_size == slave_size:
116         return False
117 
118     log.info("Resizing map %r (map_size=%d, slave_size=%d)",
119              guid, map_size, slave_size)
120     supervdsm.getProxy().resizeMap(name)
121     return True

And this is the code resizing the map:

124 def _resize_map(name):
125     """
126     Invoke multipathd to resize a device
127     Must run as root
128 
129     Raises Error if multipathd failed to resize the map.
130     """
131     log.debug("Resizing map %r", name)
132     cmd = [_MULTIPATHD.cmd, "resize", "map", name]
133     with utils.stopwatch("Resized map %r" % name, log=log):
134         rc, out, err = commands.execCmd(cmd, raw=True, execCmdLogger=log)
135         # multipathd reports some errors using non-zero exit code and stderr
136         # (need to be root), but the command may return 0, and the result is
137         # reported using stdout.                                                                                                                                                          
138         if rc != 0 or out != "ok\n":
139             raise Error("Resizing map %r failed: out=%r err=%r"
140                         % (name, out, err))

Looking at this code, all slaves had size=0, and resizing the map failed.

Is this expected? how do you suggest to continue to debug this in the scsi
level?

Comment 21 Nir Soffer 2017-12-13 22:05:00 UTC
Francesco, just before the failure to prepare the path, we have this:

2017-12-13 09:24:47,150+0200 INFO  (vm/c524a3ed) [vdsm.api] FINISH getDevicesVisibility return={'visible': {u'3514f0c5a51601089': False}} from=internal, task_id=6d9c60ad-87fb-4d8a-a786-2f0ad497703c (api:52)

Which means that virt call getDeviceVisibility, and getDeviceVisiblity tell us
that that multipath device is not visible on the host.

The error does not mention anything about this, we need to check if we ignore
the value returned from this call.

Comment 22 Nir Soffer 2017-12-13 22:10:53 UTC
Francesco, this looks very clear:

333             # GUID drive format
334             elif "GUID" in drive:                                                                                                                                                         
335                 res = self.irs.getDevicesVisibility([drive["GUID"]])
336                 if not res["visible"][drive["GUID"]]:
337                     raise vm.VolumeError(drive)

We check if the device is visble, and if not, we raise VolumeError. However the
error is totally useless, and does not say anything about the reason for the
failure. This is from 4.1 but the code in 4.2 is just the same.

Including the drive in the error is great, but we must have a meaning full error
message like "Device %r is not visible".

We can use this bug to improve these errors.

Comment 23 Ben Marzinski 2017-12-14 17:43:09 UTC
(In reply to Nir Soffer from comment #20)
> Ben, this is the way we get multipath slaves sizes:

<snip>

> Looking at this code, all slaves had size=0, and resizing the map failed.
> 
> Is this expected? how do you suggest to continue to debug this in the scsi
> level?

No, the scsi devices should all have the correct size (assuming that the correct size isn't 0).  I'm not sure how getDeviceSize(slave) is checking the device size, but are you able to see if the device actually has a zero size when this returns that it does? You could check /proc/partitions, or /sys/block/<dev>/size or the output of "blockdev --getsz <devpath>". They should all return the same thing (with the caveat that /proc/partitions is in 1k blocks and the others are in 512 byte sectors).  Also, multipathd should say something in the syslog when you run the resize command. At the very least, you should see

"multipathd: <name>: resize map (operator)"

but on failure, it should log a reason.

Comment 24 Allon Mureinik 2018-02-08 10:29:17 UTC
Nir, Francesco, what's up with this BZ?
Are we waiting for something? What's the next AI here?

Comment 25 Francesco Romani 2018-02-09 13:54:45 UTC
(In reply to Allon Mureinik from comment #24)
> Nir, Francesco, what's up with this BZ?
> Are we waiting for something? What's the next AI here?

From my perspective, nothing is wrong on virt side (https://bugzilla.redhat.com/show_bug.cgi?id=1524941#c17)

We merged one minor log improvement patch related to this bug (https://gerrit.ovirt.org/#/c/86154/), but that's it.

I suggest to move under storage, or close.

Comment 26 Francesco Romani 2018-02-09 13:55:29 UTC
restoring NEEDINFO on Nir.

Comment 27 Nir Soffer 2018-02-18 12:25:59 UTC
Based on comment 21 and comment 22 the LUN was not visible on the host when we
tried to start the vm, and the system behavior is expected.

We can use this bug to verify Francesco improved log added in the attached patch.

Comment 28 Nir Soffer 2018-02-18 12:26:58 UTC
Moving to virt, starting a vm is a virt flow.

Comment 29 Michal Skrivanek 2018-02-26 10:20:34 UTC
there doesn't seem to be anything wrong anymore, closing


Note You need to log in before you can comment on or make changes to this bug.