Description of problem:
Start with a VM based on a template (thin):
1. Start the VM
2. Copy the template from SD A to SD B
3. While it is still copying, the engine allows migrating the VM disk from SD A to SD B. Do it.
4. The disk copy succeeds, the disk move fails
5. The live-move snapshot is left in an illegal state; retrying doesn't work, and snapshotting again doesn't work either.

Version-Release number of selected component (if applicable):
rhevm-4.1.9.2-0.1.el7.noarch
vdsm-4.19.31-1.el7ev.x86_64

How reproducible:
Did not try again yet.

Steps to Reproduce:
1. Have 2 storage domains, A and B
2. Create a template on A
3. Create a VM based on the template (thin) on A
4. Copy the template to B
5. While step 4 is still executing, live move the VM's disk to SD B.

Actual results:
Live move fails, an illegal snapshot is left in the chain, and retry doesn't work. Cold retry also fails.

Expected results:
Don't allow 4 and 5 at the same time? Or somehow succeed.

Additional info:
* germano-he1's disk lives on rhevh5-nfs; it is based on RHEL-H-7.4-template, which is also on rhevh5-nfs.

1. Start copy template:
Feb 14, 2018 2:22:46 PM User admin@internal is copying disk RHEL-H-7.4-template to domain rhevh6-nfs.

2. Start live disk move:
Feb 14, 2018 2:24:56 PM Snapshot 'Auto-generated for Live Storage Migration' creation for VM 'germano-he1' was initiated by admin@internal.
Feb 14, 2018 2:26:15 PM Snapshot 'Auto-generated for Live Storage Migration' creation for VM 'germano-he1' has been completed.
Feb 14, 2018 2:26:17 PM User admin@internal moving disk RHEL-H-7.4-template to domain rhevh6-nfs.

3. Copy finishes, live move still going...
Feb 14, 2018 2:26:34 PM User admin@internal finished copying disk RHEL-H-7.4-template to domain rhevh6-nfs.

4. Live move fails, so does snapshot remove:
Feb 14, 2018 2:35:03 PM User admin@internal have failed to move disk RHEL-H-7.4-template to domain rhevh6-nfs.
Feb 14, 2018 2:35:03 PM Snapshot 'Auto-generated for Live Storage Migration' deletion for VM 'germano-he1' was initiated by admin@internal.
Feb 14, 2018 2:35:13 PM Failed to delete snapshot 'Auto-generated for Live Storage Migration' for VM 'germano-he1'.

5. Retry remove:
Feb 14, 2018 2:36:25 PM Failed to delete snapshot 'Auto-generated for Live Storage Migration' for VM 'germano-he1'.
Feb 14, 2018 2:37:05 PM Failed to delete snapshot 'Auto-generated for Live Storage Migration' for VM 'germano-he1'.

6. Power off and try cold:
Feb 14, 2018 2:41:54 PM VM germano-he1 powered off by admin@internal (Host: rhevh7).
Feb 14, 2018 2:42:22 PM Failed to delete snapshot 'Auto-generated for Live Storage Migration' for VM 'germano-he1'.

Not really sure if this is actually supposed to be allowed, but the live move failed here:

2018-02-14 14:35:01,028+10 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler7) [217aa54b-3a2e-4d2e-b46c-f34c294b7a84] START, VmReplicateDiskFinishVDSCommand(HostName = rhevh7, VmReplicateDiskParameters:{runAsync='true', hostId='cc54ddf1-507e-443e-a688-06e37290d2f0', vmId='2fcf1180-1193-4d9e-b432-d9e48885e195', storagePoolId='8922eadb-09a6-4a42-88ca-e6298e95b605', srcStorageDomainId='f50a1e6e-5b88-4d1d-ab44-0c0b2bb804f8', targetStorageDomainId='a22db68a-00e5-43e2-afd1-b42e5689629f', imageGroupId='045b89e9-23c6-4e64-86db-aef96478a008', imageId='1a3520e4-b8c1-48d9-a52b-84c15cecbbb7'}), log id: 3b181270
2018-02-14 14:35:01,800+10 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler7) [217aa54b-3a2e-4d2e-b46c-f34c294b7a84] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: VDSM rhevh7 command VmReplicateDiskFinishVDS failed: Resource unavailable

VDSM side:

2018-02-14 14:35:01,510+1000 ERROR (jsonrpc/1) [virt.vm] (vmId='2fcf1180-1193-4d9e-b432-d9e48885e195') Replication job unfinished (drive: 'vda', srcDisk: {u'device': u'disk', u'poolID': u'8922eadb-09a6-4a42-88ca-e6298e95b605', u'volumeID': u'1a3520e4-b8c1-48d9-a52b-84c15cecbbb7', u'domainID': u'f50a1e6e-5b88-4d1d-ab44-0c0b2bb804f8', u'imageID': u'045b89e9-23c6-4e64-86db-aef96478a008'}, job: {'end': 1097203712L, 'bandwidth': 0L, 'type': 2, 'cur': 1092878336L}) (vm:3839)

Then the Live Merge failed like this:

2018-02-14 14:35:08,080+10 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-5) [217aa54b-3a2e-4d2e-b46c-f34c294b7a84] FINISH, MergeVDSCommand, log id: 2af67cab
2018-02-14 14:35:08,080+10 ERROR [org.ovirt.engine.core.bll.MergeCommand] (pool-5-thread-5) [217aa54b-3a2e-4d2e-b46c-f34c294b7a84] Engine exception thrown while sending merge command: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Merge failed, code = 52 (Failed with error mergeErr and code 52)
	at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:118) [bll.jar:]
	at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:2170) [bll.jar:]
	at org.ovirt.engine.core.bll.MergeCommand.executeCommand(MergeCommand.java:45) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1255) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1395) [bll.jar:]
2018-02-14 14:35:10,103+10 ERROR [org.ovirt.engine.core.bll.MergeStatusCommand] (pool-5-thread-6) [217aa54b-3a2e-4d2e-b46c-f34c294b7a84] Failed to live merge, still in volume chain: [1a3520e4-b8c1-48d9-a52b-84c15cecbbb7, f1790894-c977-4878-a0d6-3e8d16faf41a]

VDSM side:

2018-02-14 14:35:07,016+1000 ERROR (jsonrpc/3) [virt.vm] (vmId='2fcf1180-1193-4d9e-b432-d9e48885e195') Live merge failed (job: f7c40e44-ddcf-4691-9f8b-26ff176395bb) (vm:4926)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 4924, in merge
    bandwidth, flags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 69, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1006, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 678, in blockCommit
    if ret == -1: raise libvirtError ('virDomainBlockCommit() failed', dom=self)
libvirtError: block copy still active: disk 'vda' already in active block job
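The libvirtError in the traceback is the root of the failed merge: libvirt allows at most one block job per disk at a time, and the replication job started by the live storage migration was still running on 'vda' when VDSM called blockCommit(). A minimal sketch of that constraint (hypothetical helper, not VDSM code; the job dict mirrors the one reported in the VDSM log):

```python
def can_start_block_commit(job_info):
    """Decide whether a new block job (e.g. a live-merge commit) may be
    started on a disk, given the virDomainGetBlockJobInfo() result.

    libvirt permits only one block job per disk: an empty dict means the
    disk is idle, a non-empty dict describes the job still running
    (here, the disk replication from the live storage migration).
    """
    return not job_info

# Job dict as reported in the VDSM log: the replication had copied
# ~1.09 GB of ~1.10 GB but was not finished yet.
replication_job = {'type': 2, 'bandwidth': 0,
                   'cur': 1092878336, 'end': 1097203712}

print(can_start_block_commit({}))               # idle disk -> True
print(can_start_block_commit(replication_job))  # active job -> False
```

This is exactly the state the engine hit: it asked VDSM to commit (live merge) while the copy job on the same disk had not yet converged, so virDomainBlockCommit() was rejected with "already in active block job".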
IMHO the problem is in step 3 - the live disk move should be blocked until the copy of the template's disk is done.
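The suggested guard could be sketched as a simple engine-side validation (all names here are illustrative, not actual engine code): before starting a live disk move, reject it if the same image group is already involved in a running copy.

```python
# Hypothetical validation sketch: block a live disk move while the
# image group is the subject of an in-flight copy operation.
def validate_live_disk_move(image_group_id, images_with_running_copy):
    """Return (ok, message). ok is False while a copy of the same image
    group (e.g. the template copy from step 4) is still running."""
    if image_group_id in images_with_running_copy:
        return (False,
                "image %s is being copied; retry when the copy finishes"
                % image_group_id)
    return (True, "")

# The template disk's imageGroupId from the logs, while its copy runs:
running = {'045b89e9-23c6-4e64-86db-aef96478a008'}
ok, msg = validate_live_disk_move(
    '045b89e9-23c6-4e64-86db-aef96478a008', running)
print(ok)   # False: the move is rejected until the copy completes
```

With such a check the engine would fail fast at step 5 instead of creating the auto-generated snapshot and then leaving it in an illegal state when the merge fails.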
WARN: Bug status (ON_QA) wasn't changed but the folowing should be fixed: [Found non-acked flags: '{}', ] For more info please contact: rhv-devops
Verified at 4.2.2.4-0.1.el7