1650981 – Live migration failed - VDSErrorException: Failed to VmReplicateDiskFinishVDS (error = Drive replication error, code = 55)

Bug 1650981 - Live migration failed - VDSErrorException: Failed to VmReplicateDiskFinishVDS (error = Drive replication error, code = 55)

Summary: Live migration failed - VDSErrorException: Failed to VmReplicateDiskFinishVDS...

Keywords:
Status:	CLOSED DUPLICATE of bug 1597019
Alias:	None
Product:	ovirt-engine
Classification:	oVirt
Component:	BLL.Storage
Sub Component:
Version:	4.2.7
Hardware:	x86_64
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Tal Nisan
QA Contact:	Elad
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2018-11-18 12:59 UTC by Evelina Shames
Modified:	2019-09-09 10:47 UTC (History)
CC List:	1 user (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2018-12-10 09:51:00 UTC
oVirt Team:	Storage
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
logs (10.69 MB, application/zip) 2018-11-18 12:59 UTC, Evelina Shames	no flags	Details
View All

Description Evelina Shames 2018-11-18 12:59:53 UTC

Created attachment 1506953 [details]
logs

Description of problem:
Live disk migration failed with error code 55 (Drive replication error) while running automation TC.

Engine:
2018-11-15 04:51:06,028+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-69) [disks_syncAction_638a64b0-2650-4ae2] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_3 command VmReplicateDiskFinishVDS failed: Failed to change disk image
2018-11-15 04:51:06,029+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-69) [disks_syncAction_638a64b0-2650-4ae2] Command 'VmReplicateDiskFinishVDSCommand(HostName = host_mixed_3, VmReplicateDiskParameters:{hostId='d2eefbe7-0478-469f-a7ab-4d398e48fd0e', vmId='755dd2d2-5bc9-45af-9b5a-e41f8894fe25', storagePoolId='c685357e-e832-11e8-ac80-001a4a168bfc', srcStorageDomainId='76583133-aafe-4c74-950a-7b9451c0905a', targetStorageDomainId='7153e2ad-2568-4ee8-8329-7a4e309742c4', imageGroupId='4e0b11c8-2f1c-4da0-a7ef-2b6a748b0f1e', imageId='624dcd6d-4138-40f4-8171-d4f75ffa7b8a'})' execution failed: VDSGenericException: VDSErrorException: Failed in vdscommand to VmReplicateDiskFinishVDS, error = Failed to change disk image
2018-11-15 04:51:06,041+02 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-69) [disks_syncAction_638a64b0-2650-4ae2] Failed VmReplicateDiskFinish (Disk '4e0b11c8-2f1c-4da0-a7ef-2b6a748b0f1e', VM '755dd2d2-5bc9-45af-9b5a-e41f8894fe25')
2018-11-15 04:51:06,042+02 ERROR [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-69) [disks_syncAction_638a64b0-2650-4ae2] Command 'LiveMigrateDisk' id: '5381ade4-1374-415f-a505-38eaa7cb3961' with children [d3e480d1-4772-483f-b5b3-622b96607a07, 6484bfab-185b-4ca6-a6d2-a2621a97e964, 1939ed2a-ac46-4ddc-81e2-f660f9b5a63b] failed when attempting to perform the next operation, marking as 'ACTIVE'
2018-11-15 04:51:06,042+02 ERROR [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-69) [disks_syncAction_638a64b0-2650-4ae2] EngineException: Failed to change disk image (Failed with error FAILED_CHANGE_CD_IS_MOUNTED and code 41): org.ovirt.engine.core.common.errors.EngineException: EngineException: Failed to change disk image (Failed with error FAILED_CHAN
GE_CD_IS_MOUNTED and code 41)
        at org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand.replicateDiskFinish(LiveMigrateDiskCommand.java:402) [bll.jar:]
        at org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand.completeLiveMigration(LiveMigrateDiskCommand.java:374) [bll.jar:]
        at org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand.performNextOperation(LiveMigrateDiskCommand.java:242) [bll.jar:]
        at org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback.childCommandsExecutionEnded(SerialChildCommandsExecutionCallback.java:32) [bll.jar:]
        at org.ovirt.engine.core.bll.ChildCommandsCallbackBase.doPolling(ChildCommandsCallbackBase.java:68) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethodsImpl(CommandCallbacksPoller.java:146) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:107) [bll.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_191]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [rt.jar:1.8.0_191]
        at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383) [javax.enterprise.concurrent.jar:1.0.0.redhat-1]
        at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534) [javax.enterprise.concurrent.jar:1.0.0.redhat-1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_191]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_191]
        at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_191]
        at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent.jar:1.0.0.redhat-1]
        at org.jboss.as.ee.concurrent.service.ElytronManagedThreadFactory$ElytronManagedThread.run(ElytronManagedThreadFactory.java:78)
2018-11-15 04:51:06,043+02 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-69) [disks_syncAction_638a64b0-2650-4ae2] Command 'LiveMigrateDisk' id: '5381ade4-1374-415f-a505-38eaa7cb3961' child commands '[d3e480d1-4772-483f-b5b3-622b96607a07, 6484bfab-185b-4ca6-a6d2-a2621a97e964, 1939ed2a-ac46-4ddc-81e2-f660f9b5a63b]' executions were completed, status 'FAILED' ocks='[b2598ab1-0039-4f4a-b94e-bcbc9702f54f=PROVIDER]', sharedLocks=''}'
2018-11-15 04:51:07,077+02 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-87) [disks_syncAction_638a64b0-2650-4ae2] Ending command 'org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand' with failure.
2018-11-15 04:51:07,077+02 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-87) [disks_syncAction_638a64b0-2650-4ae2] Failed during live storage migration of disk '4e0b11c8-2f1c-4da0-a7ef-2b6a748b0f1e' of vm '755dd2d2-5bc9-45af-9b5a-e41f8894fe25', attempting to end replication before deleting the target disk
2018-11-15 04:51:07,098+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-87) [disks_syncAction_638a64b0-2650-4ae2] Failed in 'VmReplicateDiskFinishVDS' method
2018-11-15 04:51:07,098+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-87) [disks_syncAction_638a64b0-2650-4ae2] Command 'VmReplicateDiskFinishVDSCommand(HostName = host_mixed_3, VmReplicateDiskParameters:{hostId='d2eefbe7-0478-469f-a7ab-4d398e48fd0e', vmId='755dd2d2-5bc9-45af-9b5a-e41f8894fe25', storagePoolId='c685357e-e832-11e8-ac80-001a4a168bfc', srcStorageDomainId='76583133-aafe-4c74-950a-7b9451c0905a', targetStorageDomainId='76583133-aafe-4c74-950a-7b9451c0905a', imageGroupId='4e0b11c8-2f1c-4da0-a7ef-2b6a748b0f1e', imageId='624dcd6d-4138-40f4-8171-d4f75ffa7b8a'})' execution failed: VDSGenericException: VDSErrorException: Failed to VmReplicateDiskFinishVDS, error = Drive replication error, code = 55
2018-11-15 04:51:07,098+02 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-87) [disks_syncAction_638a64b0-2650-4ae2] Replication end of disk '4e0b11c8-2f1c-4da0-a7ef-2b6a748b0f1e' in vm '755dd2d2-5bc9-45af-9b5a-e41f8894fe25' back to the source failed, skipping deletion of the target disk
2018-11-15 04:51:07,104+02 WARN  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engineScheduled-Thread-87) [disks_syncAction_638a64b0-2650-4ae2] Trying to release exclusive lock which does not exist, lock key: '755dd2d2-5bc9-45af-9b5a-e41f8894fe25LIVE_STORAGE_MIGRATION'
2018-11-15 04:51:07,119+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-87) [disks_syncAction_638a64b0-2650-4ae2] EVENT_ID: USER_MOVED_DISK_FINISHED_FAILURE(2,011), User admin@internal-authz have failed to move disk vm_TestCase6004_1504183047_Disk_0 to domain iscsi_0.

vdsm:
G?2018-11-15 04:51:05,590+0200 ERROR (jsonrpc/2) [virt.vm] (vmId='755dd2d2-5bc9-45af-9b5a-e41f8894fe25') Unable to stop the replication for the drive: vda (vm:4807)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4804, in diskReplicateFinish
    self._dom.blockJobAbort(drive.name, blockJobFlags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 784, in blockJobAbort
    if ret == -1: raise libvirtError ('virDomainBlockJobAbort() failed', dom=self)
libvirtError: internal error: unable to execute QEMU command 'block-job-complete': The active block job 'drive-ua-4e0b11c8-2f1c-4da0-a7ef-2b6a748b0f1e' cannot be completed
018-11-15 04:51:07,087+0200 ERROR (jsonrpc/1) [virt.vm] (vmId='755dd2d2-5bc9-45af-9b5a-e41f8894fe25') Replication job not found (drive: 'vda', srcDisk: {u'device': u'disk', u'poolID': u'c685357e-e832-11e8-ac80-
001a4a168bfc', u'volumeID': u'624dcd6d-4138-40f4-8171-d4f75ffa7b8a', u'domainID': u'76583133-aafe-4c74-950a-7b9451c0905a', u'imageID': u'4e0b11c8-2f1c-4da0-a7ef-2b6a748b0f1e'}, job: {}) (vm:4759)


Version-Release number of selected component (if applicable):
ovirt-engine-4.2.7.5-0.1.el7ev.noarch
vdsm-4.20.43-1.el7ev.x86_64
libvirt-4.5.0-10.el7_6.2.x86_64
qemu-img-rhev-2.12.0-18.el7_6.1.x86_64


How reproducible:
Once.

Steps to Reproduce:
1. Have 2 or more iSCSI domains in the DC
2. Create a VM with a disk attached to it on one of the iSCSI domains and start it
3. Move the disk while the VM is running and the disk is active

Actual results:
Live storage migration failed.

Expected results:
Live storage migration should succeed on block storage.

Additional info:
Logs are attached.

Comment 1 Tal Nisan 2018-12-10 09:51:00 UTC


*** This bug has been marked as a duplicate of bug 1597019 ***

Note You need to log in before you can comment on or make changes to this bug.