Bug 1522851 - [RFE] - Add LSM policies for VMs with continuous I/O to ensure that disk migration completes.
Keywords:
Status: CLOSED DUPLICATE of bug 1312909
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.1.7.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Benny Zlotnik
QA Contact: Avihai
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-12-06 15:19 UTC by Raz Tamir
Modified: 2019-07-10 08:52 UTC
CC: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-07-10 08:52:18 UTC
oVirt Team: Storage
ylavi: ovirt-4.3?
ylavi: planning_ack+
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments
engine and vdsm logs (928.38 KB, application/x-gzip)
2017-12-06 15:19 UTC, Raz Tamir

Description Raz Tamir 2017-12-06 15:19:36 UTC
Created attachment 1363711 [details]
engine and vdsm logs

Description of problem:
LSM of a disk with continuous I/O (dd) failed on a VM with 2 disks.
I executed dd if=/dev/vda of=/dev/sda bs=8MB and started LSM of the second disk, the one without the OS; /dev/vda holds the OS.

engine.log:
2017-12-06 17:15:25,074+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler10) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] START, VmReplicateDiskFinishVDSCommand(HostName = host_mixed_3, VmReplicateDiskParameters:{runAsync='true', hostId='aed9971c-963e-4fd7-a652-a7839563d538', vmId='f443a53d-b29e-4d15-a0bf-1560f89a40a9', storagePoolId='0ef229cd-a746-4a25-be21-39ffb68e1c5e', srcStorageDomainId='7d12be36-3538-4e59-94da-a1a6edef01fb', targetStorageDomainId='13ecab19-539f-444b-a7a4-cd9b149791cb', imageGroupId='e682ce74-07fb-426a-9451-ead88dbd48cb', imageId='6669b25c-8256-4f4c-9b13-7d4d38fafaf5'}), log id: 58691998
2017-12-06 17:15:25,101+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler10) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Failed in 'VmReplicateDiskFinishVDS' method
2017-12-06 17:15:25,112+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler10) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: VDSM host_mixed_3 command VmReplicateDiskFinishVDS failed: Resource unavailable
2017-12-06 17:15:25,113+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler10) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand' return value 'StatusOnlyReturn [status=Status [code=40, message=Resource unavailable]]'
2017-12-06 17:15:25,113+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler10) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] HostName = host_mixed_3
2017-12-06 17:15:25,113+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler10) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Command 'VmReplicateDiskFinishVDSCommand(HostName = host_mixed_3, VmReplicateDiskParameters:{runAsync='true', hostId='aed9971c-963e-4fd7-a652-a7839563d538', vmId='f443a53d-b29e-4d15-a0bf-1560f89a40a9', storagePoolId='0ef229cd-a746-4a25-be21-39ffb68e1c5e', srcStorageDomainId='7d12be36-3538-4e59-94da-a1a6edef01fb', targetStorageDomainId='13ecab19-539f-444b-a7a4-cd9b149791cb', imageGroupId='e682ce74-07fb-426a-9451-ead88dbd48cb', imageId='6669b25c-8256-4f4c-9b13-7d4d38fafaf5'})' execution failed: VDSGenericException: VDSErrorException: Failed to VmReplicateDiskFinishVDS, error = Resource unavailable, code = 40
2017-12-06 17:15:25,113+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler10) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] FINISH, VmReplicateDiskFinishVDSCommand, log id: 58691998
2017-12-06 17:15:25,123+02 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (DefaultQuartzScheduler10) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Failed VmReplicateDiskFinish (Disk 'e682ce74-07fb-426a-9451-ead88dbd48cb', VM 'f443a53d-b29e-4d15-a0bf-1560f89a40a9')
2017-12-06 17:15:25,123+02 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler10) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Command 'LiveMigrateDisk' id: 'be8314ab-9cfd-4794-9b33-df875420383b' failed when attempting to perform the next operation, marking as FAILED '[56366b2a-d99b-4a63-b678-0c151b6b803d, cfed5424-5305-4709-8f08-27e7a0845e8f]'
2017-12-06 17:15:25,123+02 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler10) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Command 'LiveMigrateDisk' id: 'be8314ab-9cfd-4794-9b33-df875420383b' child commands '[56366b2a-d99b-4a63-b678-0c151b6b803d, cfed5424-5305-4709-8f08-27e7a0845e8f]' executions were completed, status 'FAILED'
2017-12-06 17:15:26,157+02 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (DefaultQuartzScheduler3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Ending command 'org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand' with failure.
2017-12-06 17:15:26,158+02 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (DefaultQuartzScheduler3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Failed during live storage migration of disk 'e682ce74-07fb-426a-9451-ead88dbd48cb' of vm 'f443a53d-b29e-4d15-a0bf-1560f89a40a9', attempting to end replication before deleting the target disk
2017-12-06 17:15:26,159+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] START, VmReplicateDiskFinishVDSCommand(HostName = host_mixed_3, VmReplicateDiskParameters:{runAsync='true', hostId='aed9971c-963e-4fd7-a652-a7839563d538', vmId='f443a53d-b29e-4d15-a0bf-1560f89a40a9', storagePoolId='0ef229cd-a746-4a25-be21-39ffb68e1c5e', srcStorageDomainId='7d12be36-3538-4e59-94da-a1a6edef01fb', targetStorageDomainId='7d12be36-3538-4e59-94da-a1a6edef01fb', imageGroupId='e682ce74-07fb-426a-9451-ead88dbd48cb', imageId='6669b25c-8256-4f4c-9b13-7d4d38fafaf5'}), log id: 6f5e3e8d
2017-12-06 17:15:26,522+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Failed in 'VmReplicateDiskFinishVDS' method
2017-12-06 17:15:26,533+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: VDSM host_mixed_3 command VmReplicateDiskFinishVDS failed: Resource unavailable
2017-12-06 17:15:26,533+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand' return value 'StatusOnlyReturn [status=Status [code=40, message=Resource unavailable]]'
2017-12-06 17:15:26,533+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] HostName = host_mixed_3
2017-12-06 17:15:26,533+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Command 'VmReplicateDiskFinishVDSCommand(HostName = host_mixed_3, VmReplicateDiskParameters:{runAsync='true', hostId='aed9971c-963e-4fd7-a652-a7839563d538', vmId='f443a53d-b29e-4d15-a0bf-1560f89a40a9', storagePoolId='0ef229cd-a746-4a25-be21-39ffb68e1c5e', srcStorageDomainId='7d12be36-3538-4e59-94da-a1a6edef01fb', targetStorageDomainId='7d12be36-3538-4e59-94da-a1a6edef01fb', imageGroupId='e682ce74-07fb-426a-9451-ead88dbd48cb', imageId='6669b25c-8256-4f4c-9b13-7d4d38fafaf5'})' execution failed: VDSGenericException: VDSErrorException: Failed to VmReplicateDiskFinishVDS, error = Resource unavailable, code = 40
2017-12-06 17:15:26,533+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] FINISH, VmReplicateDiskFinishVDSCommand, log id: 6f5e3e8d
2017-12-06 17:15:26,533+02 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (DefaultQuartzScheduler3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Replication end of disk 'e682ce74-07fb-426a-9451-ead88dbd48cb' in vm 'f443a53d-b29e-4d15-a0bf-1560f89a40a9' back to the source failed, skipping deletion of the target disk
2017-12-06 17:15:26,559+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] EVENT_ID: USER_MOVED_DISK_FINISHED_FAILURE(2,011), Correlation ID: 1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df, Job ID: 42e11d57-91b8-4e6e-94fc-5cfef9fd7038, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: User admin@internal-authz have failed to move disk test_Disk1 to domain iscsi_1.


... 

*** removing the auto-generated snapshot ***

2017-12-06 17:15:31,691+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] START, MergeVDSCommand(HostName = host_mixed_3, MergeVDSCommandParameters:{runAsync='true', hostId='aed9971c-963e-4fd7-a652-a7839563d538', vmId='f443a53d-b29e-4d15-a0bf-1560f89a40a9', storagePoolId='0ef229cd-a746-4a25-be21-39ffb68e1c5e', storageDomainId='7d12be36-3538-4e59-94da-a1a6edef01fb', imageGroupId='e682ce74-07fb-426a-9451-ead88dbd48cb', imageId='6669b25c-8256-4f4c-9b13-7d4d38fafaf5', baseImageId='02b1ed15-bee0-4aec-bd0e-542427e2d54c', topImageId='6669b25c-8256-4f4c-9b13-7d4d38fafaf5', bandwidth='0'}), log id: 6e20a27e
2017-12-06 17:15:33,662+02 INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (DefaultQuartzScheduler7) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Command 'RemoveSnapshotSingleDiskLive' (id: '26ddddf6-0b5d-4d38-ae52-cae4dc516235') waiting on child command id: '0d474538-fb24-4ee1-9fb9-0f3ba3d819aa' type:'Merge' to complete
2017-12-06 17:15:34,239+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Failed in 'MergeVDS' method
2017-12-06 17:15:34,254+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-5-thread-3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: VDSM host_mixed_3 command MergeVDS failed: Merge failed
2017-12-06 17:15:34,254+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand' return value 'StatusOnlyReturn [status=Status [code=52, message=Merge failed]]'
2017-12-06 17:15:34,254+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] HostName = host_mixed_3
2017-12-06 17:15:34,254+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Command 'MergeVDSCommand(HostName = host_mixed_3, MergeVDSCommandParameters:{runAsync='true', hostId='aed9971c-963e-4fd7-a652-a7839563d538', vmId='f443a53d-b29e-4d15-a0bf-1560f89a40a9', storagePoolId='0ef229cd-a746-4a25-be21-39ffb68e1c5e', storageDomainId='7d12be36-3538-4e59-94da-a1a6edef01fb', imageGroupId='e682ce74-07fb-426a-9451-ead88dbd48cb', imageId='6669b25c-8256-4f4c-9b13-7d4d38fafaf5', baseImageId='02b1ed15-bee0-4aec-bd0e-542427e2d54c', topImageId='6669b25c-8256-4f4c-9b13-7d4d38fafaf5', bandwidth='0'})' execution failed: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Merge failed, code = 52
2017-12-06 17:15:34,254+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] FINISH, MergeVDSCommand, log id: 6e20a27e
2017-12-06 17:15:34,254+02 ERROR [org.ovirt.engine.core.bll.MergeCommand] (pool-5-thread-3) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Engine exception thrown while sending merge command: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Merge failed, code = 52 (Failed with error mergeErr and code 52)
        at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:118) [bll.jar:]
        at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:2166) [bll.jar:]
        at org.ovirt.engine.core.bll.MergeCommand.executeCommand(MergeCommand.java:45) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1251) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1391) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:2055) [bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:202) [utils.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInRequired(TransactionSupport.java:137) [utils.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:105) [utils.jar:]
        at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1451) [bll.jar:]

...

2017-12-06 17:15:35,863+02 ERROR [org.ovirt.engine.core.bll.MergeStatusCommand] (pool-5-thread-4) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Failed to live merge, still in volume chain: [02b1ed15-bee0-4aec-bd0e-542427e2d54c, 6669b25c-8256-4f4c-9b13-7d4d38fafaf5]
2017-12-06 17:15:36,821+02 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler6) [1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df] Command id: '26ddddf6-0b5d-4d38-ae52-cae4dc516235 failed child command status for step 'MERGE_STATUS'



vdsm.log:

2017-12-06 17:15:34,093+0200 INFO  (jsonrpc/3) [virt.vm] (vmId='f443a53d-b29e-4d15-a0bf-1560f89a40a9') Starting merge with jobUUID='eae87370-1b5c-4d4b-9a4e-5432ab171440', original chain=02b1ed15-bee0-4aec-bd0e-542427e2d54c < 6669b25c-8256-4f4c-9b13-7d4d38fafaf5 (top), disk='sda', base='sda[1]', top=None, bandwidth=0, flags=12 (vm:4943)
2017-12-06 17:15:34,123+0200 ERROR (jsonrpc/3) [virt.vm] (vmId='f443a53d-b29e-4d15-a0bf-1560f89a40a9') Live merge failed (job: eae87370-1b5c-4d4b-9a4e-5432ab171440) (vm:4949)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 4947, in merge
    bandwidth, flags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 69, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1006, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 678, in blockCommit
    if ret == -1: raise libvirtError ('virDomainBlockCommit() failed', dom=self)
libvirtError: block copy still active: disk 'sda' already in active block job
2017-12-06 17:15:34,236+0200 INFO  (jsonrpc/3) [vdsm.api] FINISH merge return={'status': {'message': 'Merge failed', 'code': 52}} from=::ffff:10.35.161.131,36142, flow_id=1eeb30a5-e872-4eea-a5c1-d88aaa7fd2df (api:52)
2017-12-06 17:15:34,238+0200 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call VM.merge failed (error 52) in 2.54 seconds (__init__:539)



Version-Release number of selected component (if applicable):
rhevm-4.1.8.2-0.1.el7.noarch
vdsm-4.19.41-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with an OS disk + 1 more thin-provisioned disk
2. Start the VM and execute: # dd if=/dev/vda of=/dev/sda
  2.1 /dev/vda is the OS disk, /dev/sda is the 2nd thin disk
3. LSM the 2nd disk to a different storage domain (I tested the same type - iSCSI)
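A minimal sketch of the I/O load from step 2 (not part of the original report). Inside the guest the real command is dd if=/dev/vda of=/dev/sda bs=8MB against the two virtual disks; here temporary files stand in for the devices so the sketch is runnable anywhere:

```shell
# Stand-ins for the guest devices: on the real guest SRC would be
# /dev/vda (the OS disk) and DST /dev/sda (the 2nd thin disk).
SRC=$(mktemp)
DST=$(mktemp)
# Give the source some content to copy (8 MiB of zeros).
dd if=/dev/zero of="$SRC" bs=1M count=8 2>/dev/null
# The sequential copy that keeps the destination disk busy while LSM runs:
dd if="$SRC" of="$DST" bs=8MB 2>/dev/null
# Verify the copy completed byte-for-byte.
cmp -s "$SRC" "$DST" && COPY_OK=yes || COPY_OK=no
echo "$COPY_OK"
rm -f "$SRC" "$DST"
```

On the real devices the copy runs long enough that the LSM mirror job is still active when the engine tries to finish replication.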

Actual results:
LSM fails and the auto-generated snapshot remains

Expected results:
LSM completes successfully and the auto-generated snapshot is removed

Additional info:
Opened as a regression check for bug #1520546

Comment 1 Yaniv Kaul 2018-03-01 12:56:25 UTC
How do we intend to solve this?

Comment 2 Benny Zlotnik 2018-03-01 14:08:21 UTC
(In reply to Yaniv Kaul from comment #1)
> How do we intend to solve this?

The plan is to introduce a new "step" to LSM that polls the status of the block job, instead of assuming it has finished before the sync stage, and calls VmReplicateDiskFinishVDSCommand only once the job is done
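A hedged sketch of that polling step (the function and parameter names here are illustrative, not the actual engine/vdsm API). It relies only on the libvirt convention visible in the vdsm traceback: an active block job must drain before a commit/finish verb is accepted, and a job-info query reports falsy (empty) once the job is gone:

```python
import time

def wait_for_block_job(get_job_info, timeout=300.0, interval=1.0, sleep=time.sleep):
    """Poll get_job_info() until it reports no active job (a falsy value,
    mirroring libvirt's empty blockJobInfo() result). Returns True when the
    job has finished, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not get_job_info():
            return True
        sleep(interval)
    return False

def finish_disk_replication(get_job_info, finish_replication, timeout=300.0):
    # Only send the finish verb once the mirror job has converged; sending
    # it while the job is active is what yielded "Resource unavailable"
    # (code 40) in the engine log above.
    if not wait_for_block_job(get_job_info, timeout=timeout):
        raise RuntimeError("block job still active; not finishing replication")
    return finish_replication()
```

In practice get_job_info would wrap the host-side block job query and finish_replication would issue VmReplicateDiskFinish; with the check in place the "already in active block job" failure seen during the later live merge cannot be reached.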

Comment 3 Sandro Bonazzola 2019-01-28 09:34:15 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 4 Benny Zlotnik 2019-07-10 08:52:18 UTC

*** This bug has been marked as a duplicate of bug 1312909 ***

