Created attachment 1363711 [details] engine and vdsm logs Description of problem: LSM with continuous I/O (dd) to the VM with 2 disks failed. I executed dd if=/dev/vda of=/dev/sda bs=8MB and started LSM of the second disk without the OS. /dev/vda is with OS. engine.log: 2017-12-06 17:15:25,074+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler10) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] START, VmReplicateDiskFinishVDSCommand(HostName = host_mixed_3, VmReplicateDiskParameters:{runAsync='true', hostId='aed9971c-963e-4fd7-a652-a7839563d538', vmId='f443a53d-b29e-4d15-a0bf-1560f89a40a9', storagePoolId='0ef229cd-a746-4a25-be21-39ffb68e1c5e', srcStorageDomainId='7d12be36-3538-4e59-94da-a1a 6edef01fb', targetStorageDomainId='13ecab19-539f-444b-a7a4-cd9b149791cb', imageGroupId='e682ce74-07fb-426a-9451-ead88dbd48cb', imageId='6669b25c-8256-4f4c-9b13-7d4d38fafaf5'}), log id: 58691998 2017-12-06 17:15:25,101+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler10) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Failed in 'VmReplicateDiskFinishVDS' method 2017-12-06 17:15:25,112+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler10) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: VDSM host_mixed_3 command VmReplicateDiskFinishVDS failed: Resource unavailable 2017-12-06 17:15:25,113+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler10) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateD iskFinishVDSCommand' return value 'StatusOnlyReturn [status=Status [code=40, message=Resource unavailable]]' 2017-12-06 17:15:25,113+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler10) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] HostName = host_mixed_3 2017-12-06 17:15:25,113+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler10) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Command 'VmReplicateDiskFinishVDSCommand(HostName = host_mixed_ 3, VmReplicateDiskParameters:{runAsync='true', hostId='aed9971c-963e-4fd7-a652-a7839563d538', vmId='f443a53d-b29e-4d15-a0bf-1560f89a40a9', storagePoolId='0ef229cd-a746-4a25-be21-39ffb68e1c5e', srcStorageDomainId='7d12be36-3538-4e59-94da-a 1a6edef01fb', targetStorageDomainId='13ecab19-539f-444b-a7a4-cd9b149791cb', imageGroupId='e682ce74-07fb-426a-9451-ead88dbd48cb', imageId='6669b25c-8256-4f4c-9b13-7d4d38fafaf5'})' execution failed: VDSGenericException: VDSErrorException: F ailed to VmReplicateDiskFinishVDS, error = Resource unavailable, code = 40 2017-12-06 17:15:25,113+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler10) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] FINISH, VmReplicateDiskFinishVDSCommand, log id: 58691998 2017-12-06 17:15:25,123+02 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (DefaultQuartzScheduler10) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Failed VmReplicateDiskFinish (Disk 'e682ce74-07fb-426a-9451-ead88dbd48cb', V M 'f443a53d-b29e-4d15-a0bf-1560f89a40a9') 2017-12-06 17:15:25,123+02 INFO [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler10) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Command 'LiveMigrateDisk' id: 'be8314ab-9cfd-4794-9b33-df875420383b' faile d when attempting to perform the next operation, marking as FAILED '[56366b2a-d99b-4a63-b678-0c151b6b803d, cfed5424-5305-4709-8f08-27e7a0845e8f]' 2017-12-06 17:15:25,123+02 INFO [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler10) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Command 'LiveMigrateDisk' id: 'be8314ab-9cfd-4794-9b33-df875420383b' child commands '[56366b2a-d99b-4a63-b678-0c151b6b803d, cfed5424-5305-4709-8f08-27e7a0845e8f]' executions were completed, status 'FAILED' 2017-12-06 17:15:26,157+02 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (DefaultQuartzScheduler3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Ending command 'org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand' with failure. 2017-12-06 17:15:26,158+02 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (DefaultQuartzScheduler3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Failed during live storage migration of disk 'e682ce74-07fb-426a-9451-ead88db d48cb' of vm 'f443a53d-b29e-4d15-a0bf-1560f89a40a9', attempting to end replication before deleting the target disk 2017-12-06 17:15:26,159+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] START, VmReplicateDiskFinishVDSCommand(HostName = host_mixed_3, VmReplicateDiskParameters:{runAsync='true', hostId='aed9971c-963e-4fd7-a652-a7839563d538', vmId='f443a53d-b29e-4d15-a0bf-1560f89a40a9', storagePoolId='0ef229cd-a746-4a25-be21-39ffb68e1c5e', srcStorageDomainId='7d12be36-3538-4e59-94da-a1a6 edef01fb', targetStorageDomainId='7d12be36-3538-4e59-94da-a1a6edef01fb', imageGroupId='e682ce74-07fb-426a-9451-ead88dbd48cb', imageId='6669b25c-8256-4f4c-9b13-7d4d38fafaf5'}), log id: 6f5e3e8d 2017-12-06 17:15:26,522+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Failed in 'VmReplicateDiskFinishVDS' method 2017-12-06 17:15:26,533+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, C all Stack: null, Custom ID: null, Custom Event ID: -1, Message: VDSM host_mixed_3 command VmReplicateDiskFinishVDS failed: Resource unavailable 2017-12-06 17:15:26,533+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDi skFinishVDSCommand' return value 'StatusOnlyReturn [status=Status [code=40, message=Resource unavailable]]' 2017-12-06 17:15:26,533+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] HostName = host_mixed_3 2017-12-06 17:15:26,533+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Command 'VmReplicateDiskFinishVDSCommand(HostName = host_mixed_3 , VmReplicateDiskParameters:{runAsync='true', hostId='aed9971c-963e-4fd7-a652-a7839563d538', vmId='f443a53d-b29e-4d15-a0bf-1560f89a40a9', storagePoolId='0ef229cd-a746-4a25-be21-39ffb68e1c5e', srcStorageDomainId='7d12be36-3538-4e59-94da-a1 a6edef01fb', targetStorageDomainId='7d12be36-3538-4e59-94da-a1a6edef01fb', imageGroupId='e682ce74-07fb-426a-9451-ead88dbd48cb', imageId='6669b25c-8256-4f4c-9b13-7d4d38fafaf5'})' execution failed: VDSGenericException: VDSErrorException: Fa iled to VmReplicateDiskFinishVDS, error = Resource unavailable, code = 40 2017-12-06 17:15:26,533+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] FINISH, VmReplicateDiskFinishVDSCommand, log id: 6f5e3e8d 2017-12-06 17:15:26,533+02 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (DefaultQuartzScheduler3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Replication end of disk 'e682ce74-07fb-426a-9451-ead88dbd48cb' in vm 'f443a53 d-b29e-4d15-a0bf-1560f89a40a9' back to the source failed, skipping deletion of the target disk 2017-12-06 17:15:26,559+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] EVENT_ID: USER_MOVED_DISK_FINISHED_FAILURE(2,011), Correlation ID: 1e eb30a5-e872-4eea-a5c1-d88aaa7fd2df, Job ID: 42e11d57-91b8-4e6e-94fc-5cfef9fd7038, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: User admin@internal-authz have failed to move disk test_Disk1 to domain iscsi_1. ... *** remove the auto generated stapshot *** 2017-12-06 17:15:31,691+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] START, MergeVDSCommand(HostName = host_mixed_3, MergeVDSCommandParameters:{runAsync='tru e', hostId='aed9971c-963e-4fd7-a652-a7839563d538', vmId='f443a53d-b29e-4d15-a0bf-1560f89a40a9', storagePoolId='0ef229cd-a746-4a25-be21-39ffb68e1c5e', storageDomainId='7d12be36-3538-4e59-94da-a1a6edef01fb', imageGroupId='e682ce74-07fb-426a -9451-ead88dbd48cb', imageId='6669b25c-8256-4f4c-9b13-7d4d38fafaf5', baseImageId='02b1ed15-bee0-4aec-bd0e-542427e2d54c', topImageId='6669b25c-8256-4f4c-9b13-7d4d38fafaf5', bandwidth='0'}), log id: 6e20a27e 2017-12-06 17:15:33,662+02 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (DefaultQuartzScheduler7) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Command 'RemoveSnapshotSingleDiskLive' (id: '26ddddf6-0b5d -4d38-ae52-cae4dc516235') waiting on child command id: '0d474538-fb24-4ee1-9fb9-0f3ba3d819aa' type:'Merge' to complete 2017-12-06 17:15:34,239+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Failed in 'MergeVDS' method 2017-12-06 17:15:34,254+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-5-thread-3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: VDSM host_mixed_3 command MergeVDS failed: Merge failed 2017-12-06 17:15:34,254+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand' return value 'StatusOnlyReturn [status=Status [code=52, message=Merge failed]]' 2017-12-06 17:15:34,254+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] HostName = host_mixed_3 2017-12-06 17:15:34,254+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Command 'MergeVDSCommand(HostName = host_mixed_3, MergeVDSCommandParameters:{runAsync='true', hostId='aed9971c-963e-4fd7-a652-a7839563d538', vmId='f443a53d-b29e-4d15-a0bf-1560f89a40a9', storagePoolId='0ef229cd-a746-4a25-be21-39ffb68e1c5e', storageDomainId='7d12be36-3538-4e59-94da-a1a6edef01fb', imageGroupId='e682ce74-07fb-426a-9451-ead88dbd48cb', imageId='6669b25c-8256-4f4c-9b13-7d4d38fafaf5', baseImageId='02b1ed15-bee0-4aec-bd0e-542427e2d54c', topImageId='6669b25c-8256-4f4c-9b13-7d4d38fafaf5', bandwidth='0'})' execution failed: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Merge failed, code = 52 2017-12-06 17:15:34,254+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] FINISH, MergeVDSCommand, log id: 6e20a27e 2017-12-06 17:15:34,254+02 ERROR [org.ovirt.engine.core.bll.MergeCommand] (pool-5-thread-3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Engine exception thrown while sending merge command: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Merge failed, code = 52 (Failed with error mergeErr and code 52) at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:118) [bll.jar:] at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:2166) [bll.jar:] at org.ovirt.engine.core.bll.MergeCommand.executeCommand(MergeCommand.java:45) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1251) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1391) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:2055) [bll.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:202) [utils.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInRequired(TransactionSupport.java:137) [utils.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:105) [utils.jar:] at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1451) [bll.jar:] ... 2017-12-06 17:15:35,863+02 ERROR [org.ovirt.engine.core.bll.MergeStatusCommand] (pool-5-thread-4) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Failed to live merge, still in volume chain: [02b1ed15-bee0-4aec-bd0e-542427e2d54c, 6669b25c-8256-4f4c-9b13-7d4d38fafaf5] 2017-12-06 17:15:36,821+02 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler6) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Command id: '26ddddf6-0b5d-4d38-ae52-cae4dc516235 failed child command status for step 'MERGE_STATUS' vdsm.log: 2017-12-06 17:15:34,093+0200 INFO (jsonrpc/3) [virt.vm] (vmId='f443a53d-b29e-4d15-a0bf-1560f89a40a9') Starting merge with jobUUID='eae87370-1b5c-4d4b-9a4e-5432ab171440', original chain=02b1ed15-bee0-4aec-bd0e-542427e2d54c < 6669b25c-8256 -4f4c-9b13-7d4d38fafaf5 (top), disk='sda', base='sda[1]', top=None, bandwidth=0, flags=12 (vm:4943) 2017-12-06 17:15:34,123+0200 ERROR (jsonrpc/3) [virt.vm] (vmId='f443a53d-b29e-4d15-a0bf-1560f89a40a9') Live merge failed (job: eae87370-1b5c-4d4b-9a4e-5432ab171440) (vm:4949) Traceback (most recent call last): File "/usr/share/vdsm/virt/vm.py", line 4947, in merge bandwidth, flags) File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 69, in f ret = attr(*args, **kwargs) File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper ret = f(*args, **kwargs) File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1006, in wrapper return func(inst, *args, **kwargs) File "/usr/lib64/python2.7/site-packages/libvirt.py", line 678, in blockCommit if ret == -1: raise libvirtError ('virDomainBlockCommit() failed', dom=self) libvirtError: block copy still active: disk 'sda' already in active block job 2017-12-06 17:15:34,236+0200 INFO (jsonrpc/3) [vdsm.api] FINISH merge return={'status': {'message': 'Merge failed', 'code': 52}} from=::ffff:10.35.161.131,36142, flow_id=1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df (api:52) 2017-12-06 17:15:34,238+0200 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call VM.merge failed (error 52) in 2.54 seconds (__init__:539) Version-Release number of selected component (if applicable): rhevm-4.1.8.2-0.1.el7.noarch vdsm-4.19.41-1.el7ev.x86_64 How reproducible: 100% Steps to Reproduce: 1. Create VM with OS disk + 1 more thin provision disk 2. Start the VM and execute #dd if=/dev/vda of=/dev/sda 2.1 /dev/vda disk with OS, /dev/sda 2nd thin disk 3. LSM the 2nd disk to different storage domain (I tested same type - iscsi) Actual results: LSM fails and the auto-generated snapshot remains Expected results: Additional info: Opened as a regression check for bug #1520546
How do we intend to solve this?
(In reply to Yaniv Kaul from comment #1) > How do we intend to solve this? The plan is to introduce a new "step" to LSM which polls the status of the block job instead of assuming it was finished before the sync stage, and calls the VmReplicateDiskFinishVDSCommand only when it's done
This bug has not been marked as blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.
*** This bug has been marked as a duplicate of bug 1312909 ***