Bug 1525375

Summary: [Engine 4.1] Failed to start VM with direct lun on both 4.1 and 4.2 host
Product: [oVirt] vdsm
Component: General
Version: 4.19.41
Status: CLOSED NOTABUG
Severity: high
Priority: high
Reporter: Israel Pinto <ipinto>
Assignee: Eyal Shenitzky <eshenitz>
QA Contact: Israel Pinto <ipinto>
Docs Contact:
CC: amureini, bugs, ipinto, mavital, mzamazal, nsoffer, ylavi
Target Milestone: ovirt-4.2.1
Target Release: ---
Flags: rule-engine: ovirt-4.2?
       ylavi: planning_ack+
       ipinto: devel_ack?
       ipinto: testing_ack?
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-01-02 09:21:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Attachments (Description / Flags):
  logs (engine, vdsm - both hosts) / none
  Events - vm is up / none
  Events - Failed to start VM / none

Description Israel Pinto 2017-12-13 08:36:21 UTC
Description of problem:
The setup is a 4.1 engine with a 4.1 cluster containing two hosts, one 4.1 and one 4.2.
A VM with a direct LUN fails to start on both the 4.1 and the 4.2 host.

The VM had been running and had migrated successfully between the hosts (see the attached events screenshots).
After migrating between the hosts it then failed to migrate, so I stopped it and tried to rerun it on each host, and it failed to start on both.
 
VM name: direct_lun
VM id: c524a3ed-d3f0-4f1b-bb7c-6ff31c1a49d4

Version-Release number of selected component (if applicable):
Engine: Red Hat Virtualization Manager Version: 4.1.8.2-0.1.el7
vdsm 4.1: vdsm-4.19.42-1.el7ev.x86_64	
vdsm 4.2: vdsm-4.20.9-30.gite026991.el7.centos

Steps:
1. Run the VM on the 4.1 host.
2. Migrate it to the 4.2 host.
3. Restart vdsm on the 4.2 host.
4. Migrate the VM back to the 4.1 host.
5. Put the host into maintenance (failed here).
6. Stop the VM.
7. Start the VM on either host (4.1/4.2) - the VM fails to start (a scripted version of this step is sketched below).
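For reference, a minimal scripted version of step 7 using the oVirt Python SDK (ovirtsdk4). The engine URL, credentials, and CA file path are placeholders for this environment, and the VM is looked up by the name given above:

# Sketch of step 7: start the VM through the oVirt Python SDK (ovirtsdk4).
# URL, credentials and CA path are placeholders for this environment.
import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',  # placeholder engine FQDN
    username='admin@internal',
    password='password',                                # placeholder
    ca_file='/etc/pki/ovirt-engine/ca.pem',
)
try:
    vms_service = connection.system_service().vms_service()
    vm = vms_service.list(search='name=direct_lun')[0]
    vms_service.vm_service(vm.id).start()  # fails with the VolumeError seen in the vdsm logs
finally:
    connection.close()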

Additional info:
Engine log:
2017-12-13 09:56:48,913+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-38) [204ca1cf] EVENT_ID: USER_FAILED_RUN_VM(54), Correlation ID: 101f892e-b5b0-45eb-9904-fda9b9a9378b, Job ID: 73cabd36-77ba-4f1e-b600-49eda1a1d2b7, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: Failed to run VM direct_lun (User: admin@internal-authz).

4.1 vdsm log:

2017-12-13 09:56:07,276+0200 ERROR (vm/c524a3ed) [virt.vm] (vmId='c524a3ed-d3f0-4f1b-bb7c-6ff31c1a49d4') The vm start process failed (vm:631)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 562, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 1978, in _run
    self._preparePathsForDrives(dev_spec_map[hwclass.DISK])
  File "/usr/share/vdsm/virt/vm.py", line 657, in _preparePathsForDrives
    drive['path'] = self.cif.prepareVolumePath(drive, self.id)
  File "/usr/share/vdsm/clientIF.py", line 337, in prepareVolumePath
    raise vm.VolumeError(drive)
VolumeError: Bad volume specification {u'index': 0, u'iface': u'scsi', u'format': u'raw', u'bootOrder': u'1', u'discard': False, u'sgio': u'filtered', u'GUID': u'3514f0c5a51601089', u'specParams': {}, u'readonly': u'false', u'optional': u'false', u'deviceId': u'eb7b2891-7f87-4be0-a233-55ffd3993398', u'address': {u'bus': u'0', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, u'device': u'lun', u'shared': u'false', u'propagateErrors': u'off', u'type': u'disk'}
2017-12-13 09:56:07,277+0200 INFO  (vm/c524a3ed) [virt.vm] (vmId='c524a3ed-d3f0-4f1b-bb7c-6ff31c1a49d4') Changed state to Down: Bad volume specification {u'index': 0, u'iface': u'scsi', u'format': u'raw', u'bootOrder': u'1', u'discard': False, u'sgio': u'filtered', u'GUID': u'3514f0c5a51601089', u'specParams': {}, u'readonly': u'false', u'optional': u'false', u'deviceId': u'eb7b2891-7f87-4be0-a233-55ffd3993398', u'address': {u'bus': u'0', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, u'device': u'lun', u'shared': u'false', u'propagateErrors': u'off', u'type': u'disk'} (code=1) (vm:1259)


4.2 host vdsm log:
2017-12-13 09:56:37,224+0200 ERROR (vm/c524a3ed) [virt.vm] (vmId='c524a3ed-d3f0-4f1b-bb7c-6ff31c1a49d4') The vm start process failed (vm:914)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 843, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2632, in _run
    self._devices = self._make_devices()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2540, in _make_devices
    self._preparePathsForDrives(dev_spec_map[hwclass.DISK])
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 988, in _preparePathsForDrives
    drive['path'] = self.cif.prepareVolumePath(drive, self.id)
  File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 407, in prepareVolumePath
    raise vm.VolumeError(drive)
VolumeError: Bad volume specification {u'index': 0, u'iface': u'scsi', 'vm_custom': {u'device_08b0c6a6-a0ea-470b-9bdb-938695a4d676device_62bae4e5-d01d-4951-b21d-115c8510860bdevice_b138dc19-a6e0-4960-975d-900cc99b624a': u"VmDevice:{id='VmDeviceId:{deviceId='b138dc19-a6e0-4960-975d-900cc99b624a', vmId='c524a3ed-d3f0-4f1b-bb7c-6ff31c1a49d4'}', device='spicevmc', type='CHANNEL', bootOrder='0', specParams='[]', address='{bus=0, controller=0, type=virtio-serial, port=3}', managed='false', plugged='true', readOnly='false', deviceAlias='channel2', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}", u'device_08b0c6a6-a0ea-470b-9bdb-938695a4d676': u"VmDevice:{id='VmDeviceId:{deviceId='08b0c6a6-a0ea-470b-9bdb-938695a4d676', vmId='c524a3ed-d3f0-4f1b-bb7c-6ff31c1a49d4'}', device='unix', type='CHANNEL', bootOrder='0', specParams='[]', address='{bus=0, controller=0, type=virtio-serial, port=1}', managed='false', plugged='true', readOnly='false', deviceAlias='channel0', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}", u'device_08b0c6a6-a0ea-470b-9bdb-938695a4d676device_62bae4e5-d01d-4951-b21d-115c8510860b': u"VmDevice:{id='VmDeviceId:{deviceId='62bae4e5-d01d-4951-b21d-115c8510860b', vmId='c524a3ed-d3f0-4f1b-bb7c-6ff31c1a49d4'}', device='unix', type='CHANNEL', bootOrder='0', specParams='[]', address='{bus=0, controller=0, type=virtio-serial, port=2}', managed='false', plugged='true', readOnly='false', deviceAlias='channel1', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}", u'device_08b0c6a6-a0ea-470b-9bdb-938695a4d676device_62bae4e5-d01d-4951-b21d-115c8510860bdevice_b138dc19-a6e0-4960-975d-900cc99b624adevice_1eadae15-2af4-4515-94b4-1f73aadd8836': u"VmDevice:{id='VmDeviceId:{deviceId='1eadae15-2af4-4515-94b4-1f73aadd8836', vmId='c524a3ed-d3f0-4f1b-bb7c-6ff31c1a49d4'}', device='ide', type='CONTROLLER', bootOrder='0', specParams='[]', address='{slot=0x01, bus=0x00, domain=0x0000, type=pci, function=0x1}', managed='false', plugged='true', readOnly='false', deviceAlias='ide', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}"}, u'format': u'raw', u'bootOrder': u'1', u'discard': False, 'vmid': 'c524a3ed-d3f0-4f1b-bb7c-6ff31c1a49d4', u'sgio': u'filtered', u'GUID': u'3514f0c5a51601089', u'specParams': {}, u'readonly': u'false', u'optional': u'false', u'deviceId': u'eb7b2891-7f87-4be0-a233-55ffd3993398', u'address': {u'bus': u'0', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, u'device': u'lun', u'shared': u'false', u'propagateErrors': u'off', u'type': u'disk'}
2017-12-13 09:56:37,224+0200 INFO  (vm/c524a3ed) [virt.vm] (vmId='c524a3ed-d3f0-4f1b-bb7c-6ff31c1a49d4') Changed state to Down: Bad volume specification {u'index': 0, u'iface': u'scsi', 'vm_custom': {u'device_08b0c6a6-a0ea-470b-9bdb-938695a4d676device_62bae4e5-d01d-4951-b21d-115c8510860bdevice_b138dc19-a6e0-4960-975d-900cc99b624a': u"VmDevice:{id='VmDeviceId:{deviceId='b138dc19-a6e0-4960-975d-900cc99b624a', vmId='c524a3ed-d3f0-4f1b-bb7c-6ff31c1a49d4'}', device='spicevmc', type='CHANNEL', bootOrder='0', specParams='[]', address='{bus=0, controller=0, type=virtio-serial, port=3}', managed='false', plugged='true', readOnly='false', deviceAlias='channel2', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}", u'device_08b0c6a6-a0ea-470b-9bdb-938695a4d676': u"VmDevice:{id='VmDeviceId:{deviceId='08b0c6a6-a0ea-470b-9bdb-938695a4d676', vmId='c524a3ed-d3f0-4f1b-bb7c-6ff31c1a49d4'}', device='unix', type='CHANNEL', bootOrder='0', specParams='[]', address='{bus=0, controller=0, type=virtio-serial, port=1}', managed='false', plugged='true', readOnly='false', deviceAlias='channel0', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}", u'device_08b0c6a6-a0ea-470b-9bdb-938695a4d676device_62bae4e5-d01d-4951-b21d-115c8510860b': u"VmDevice:{id='VmDeviceId:{deviceId='62bae4e5-d01d-4951-b21d-115c8510860b', vmId='c524a3ed-d3f0-4f1b-bb7c-6ff31c1a49d4'}', device='unix', type='CHANNEL', bootOrder='0', specParams='[]', address='{bus=0, controller=0, type=virtio-serial, port=2}', managed='false', plugged='true', readOnly='false', deviceAlias='channel1', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}", u'device_08b0c6a6-a0ea-470b-9bdb-938695a4d676device_62bae4e5-d01d-4951-b21d-115c8510860bdevice_b138dc19-a6e0-4960-975d-900cc99b624adevice_1eadae15-2af4-4515-94b4-1f73aadd8836': u"VmDevice:{id='VmDeviceId:{deviceId='1eadae15-2af4-4515-94b4-1f73aadd8836', vmId='c524a3ed-d3f0-4f1b-bb7c-6ff31c1a49d4'}', device='ide', type='CONTROLLER', bootOrder='0', specParams='[]', address='{slot=0x01, bus=0x00, domain=0x0000, type=pci, function=0x1}', managed='false', plugged='true', readOnly='false', deviceAlias='ide', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}"}, u'format': u'raw', u'bootOrder': u'1', u'discard': False, 'vmid': 'c524a3ed-d3f0-4f1b-bb7c-6ff31c1a49d4', u'sgio': u'filtered', u'GUID': u'3514f0c5a51601089', u'specParams': {}, u'readonly': u'false', u'optional': u'false', u'deviceId': u'eb7b2891-7f87-4be0-a233-55ffd3993398', u'address': {u'bus': u'0', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, u'device': u'lun', u'shared': u'false', u'propagateErrors': u'off', u'type': u'disk'} (code=1) (vm:1633)
2017-12-13 09:56:37,253+0200 INFO  (vm/c524a3ed) [virt.vm] (vmId='c524a3ed-d3f0-4f1b-bb7c-6ff31c1a49d4') Stopping connection (guestagent:435)
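For context, both tracebacks end in clientIF.prepareVolumePath raising VolumeError for the direct LUN drive (GUID 3514f0c5a51601089). A minimal sketch of that pattern, assuming the LUN is expected to resolve to a multipath node under /dev/mapper/; this is an illustration only, not vdsm's actual implementation:

# Illustration only, NOT vdsm code: a direct LUN drive is resolved to its
# multipath device node; if that device is missing on the host, the drive has
# no usable path and "Bad volume specification" (VolumeError) is raised,
# which is what both tracebacks above show.
import os

class VolumeError(Exception):
    pass

def prepare_lun_path(drive):
    if drive.get('device') != 'lun':
        raise VolumeError(drive)
    path = os.path.join('/dev/mapper', drive['GUID'])  # e.g. /dev/mapper/3514f0c5a51601089
    if not os.path.exists(path):
        # the multipath device for this LUN is not present on the host
        raise VolumeError(drive)
    return path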

Comment 1 Israel Pinto 2017-12-13 08:37:22 UTC
Created attachment 1367175 [details]
logs(engine,vdsm - both hosts)

Comment 2 Israel Pinto 2017-12-13 08:38:08 UTC
Created attachment 1367176 [details]
Events - vm is up

Comment 3 Israel Pinto 2017-12-13 08:38:40 UTC
Created attachment 1367177 [details]
Events - Failed to start VM

Comment 4 Milan Zamazal 2017-12-13 19:54:54 UTC
I couldn't reproduce the bug with the steps above on my setup. I created a VM and attached an iSCSI disk to it. In step 5, the VM automatically and successfully migrated to the 4.2 host, so I put that host into maintenance as well in order to reproduce the failure in step 5. Even then, the VM started successfully in step 7 on either host.

Considering the weird errors reported from libvirt in bug 1524941, I'd suggest double checking that there is no problem outside oVirt, e.g. with storage.
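One way to make that check outside oVirt is to verify, on each host, that the multipath device for the LUN GUID from the tracebacks is present and visible to multipath. A rough sketch (device naming can vary with the multipath configuration, and multipath -ll needs root):

# Host-side sanity check, outside oVirt: does the multipath device for the
# direct LUN exist, and what does multipath report for it?
# The GUID is taken from the vdsm tracebacks above.
import os
import subprocess

GUID = '3514f0c5a51601089'
dev = os.path.join('/dev/mapper', GUID)

print(dev, 'exists:', os.path.exists(dev))
subprocess.call(['multipath', '-ll', dev])  # requires root and device-mapper-multipath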

Comment 5 Sandro Bonazzola 2017-12-14 04:01:34 UTC
Dropping blocker flag according to https://lists.ovirt.org/pipermail/devel/2017-December/032213.html

Comment 6 Allon Mureinik 2017-12-14 10:31:09 UTC
(In reply to Sandro Bonazzola from comment #5)
> Dropping blocker flag according to
> https://lists.ovirt.org/pipermail/devel/2017-December/032213.html

Pushing out to 4.2.1 accordingly.

Comment 7 Yaniv Kaul 2017-12-27 15:03:12 UTC
(In reply to Milan Zamazal from comment #4)
> I couldn't reproduce the bug with the steps above on my setup. I created a
> VM and attached an iSCSI disk to it. In step 5, the VM automatically and
> successfully migrated to the 4.2 host, so I put that host into maintenance
> as well in order to reproduce the failure in step 5. Even then, the VM
> started successfully in step 7 on either host.
> 
> Considering the weird errors reported from libvirt in bug 1524941, I'd
> suggest double checking that there is no problem outside oVirt, e.g. with
> storage.

Can you please check?

Comment 15 Israel Pinto 2018-01-02 09:21:18 UTC
Closing this BZ after checking with storage DEV:
there was a problem with the storage settings that caused this failure.