Created attachment 1287676 [details]
engine and vdsm logs

Description of problem:
When trying to live migrate multiple VM disks to a different storage domain, the operation failed due to:

2017-06-14 16:35:16,839+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler3) [disks_syncAction_d9e8f8b2-fa1b-42fa] Command 'VmReplicateDiskFinishVDSCommand(HostName = host_mixed_1, VmReplicateDiskParameters:{hostId='c4450f28-138d-4dd0-91bc-c38127e07aab', vmId='245be99a-3a63-481e-8324-5eb876e3e008', storagePoolId='6a031167-b8fd-4989-8985-ff0e0907cf7b', srcStorageDomainId='5737d729-23b9-4fb8-ad26-b443b970c908', targetStorageDomainId='5737d729-23b9-4fb8-ad26-b443b970c908', imageGroupId='b8b50f9a-c858-4e00-b48b-7d2a40fb0e2d', imageId='a0e23f01-abc3-4d94-8d22-d8ebda30df2d'})' execution failed: VDSGenericException: VDSErrorException: Failed to VmReplicateDiskFinishVDS, error = Drive replication error, code = 55

2017-06-14 16:35:16,839+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler3) [disks_syncAction_d9e8f8b2-fa1b-42fa] FINISH, VmReplicateDiskFinishVDSCommand, log id: 49119030

2017-06-14 16:35:16,839+03 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (DefaultQuartzScheduler3) [disks_syncAction_d9e8f8b2-fa1b-42fa] Replication end of disk 'b8b50f9a-c858-4e00-b48b-7d2a40fb0e2d' in vm '245be99a-3a63-481e-8324-5eb876e3e008' back to the source failed, skipping deletion of the target disk

2017-06-14 16:35:16,878+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [disks_syncAction_d9e8f8b2-fa1b-42fa] EVENT_ID: USER_MOVED_DISK_FINISHED_FAILURE(2,011), User admin@internal-authz have failed to move disk disk_type_1416224541 to domain iscsi_0.
VDSM error:

2017-06-14 16:36:52,475+0300 ERROR (jsonrpc/7) [storage.Dispatcher] FINISH getVolumeInfo error=Volume does not exist: ('a0e23f01-abc3-4d94-8d22-d8ebda30df2d',) (dispatcher:79)
2017-06-14 16:39:06,384+0300 ERROR (jsonrpc/0) [storage.TaskManager.Task] (Task='3b503955-c5b3-445d-b32a-efe35d6876b2') Unexpected error (task:870)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 877, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in getVolumeInfo
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3016, in getVolumeInfo
    volUUID=volUUID).getInfo()
  File "/usr/share/vdsm/storage/sd.py", line 757, in produceVolume
    volUUID)
  File "/usr/share/vdsm/storage/volume.py", line 794, in __init__
    self._manifest = self.manifestClass(repoPath, sdUUID, imgUUID, volUUID)
  File "/usr/share/vdsm/storage/blockVolume.py", line 73, in __init__
    volUUID)
  File "/usr/share/vdsm/storage/volume.py", line 90, in __init__
    self.validate()
  File "/usr/share/vdsm/storage/blockVolume.py", line 163, in validate
    raise se.VolumeDoesNotExist(self.volUUID)

Version-Release number of selected component (if applicable):
Engine - 4.2.0-0.0.master.20170612192318.gitf773263.el7.centos
VDSM - 4.20.0-1049.gite64d80e.el7.centos.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with an OS and multiple disks in different permutations (raw/sparse, thin/preallocated)
2. Run the VM
3. Migrate the VM disks concurrently (migrate each disk right after the auto-generated snapshot creation finishes)

Actual results:
The disks failed to move to a different storage domain

Expected results:
Disk live storage migration should complete successfully

Additional info:
Engine and VDSM logs attached
Does it work properly with a single disk?
No, is there a known issue/bug?
(In reply to Eyal Shenitzky from comment #2)
> No, is there a known issue/bug?

We're trying to narrow down the scenario to the minimal reproducer. Doing "Create a VM with OS and multiple disks from different permutation (raw/sparse - thin/preallocated)" is way too noisy.
Also, can you share libvirt/qemu versions?
*** Bug 1461329 has been marked as a duplicate of this bug. ***
Steps to reproduce:
1. Create a VM from a template, with its disk on a file-based storage domain as thin provisioned.
2. Run the VM.
3. Move the VM's disk to a different storage domain.

Result:
The operation failed with the described error.

* There is a bug that prevents checking this on a block-based storage domain - bug 1458548
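For a scripted reproducer, the disk move can also be triggered through the oVirt Python SDK v4 instead of the UI. This is only a sketch under assumptions: it presumes ovirt-engine-sdk-python 4 with the DiskService `move` action available, and the connection object, disk name, and target domain name are placeholders, not values from this bug.

```python
def disk_search_query(disk_name):
    # Engine search syntax used to locate the disk by name.
    return 'name=%s' % disk_name

def live_migrate_disk(connection, disk_name, target_sd_name):
    # Move a VM disk to another storage domain; on a running VM the
    # engine performs live storage migration (auto-generated snapshot,
    # replication, live merge).
    import ovirtsdk4.types as types
    disks_service = connection.system_service().disks_service()
    disk = disks_service.list(search=disk_search_query(disk_name))[0]
    disks_service.disk_service(disk.id).move(
        storage_domain=types.StorageDomain(name=target_sd_name))
```

Running this against several disks of the same running VM in quick succession would approximate the concurrent migration from the original description.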
libvirt-daemon-2.2.0-2.fc24.x86_64 qemu-kvm-2.7.0-8.fc24.x86_64
Created attachment 1288836 [details] engine and vdsm new logs
Eyal,

The reported exception is an expected exception. Unfortunately, it is logged although it shouldn't be.
Please note the following log in the engine:

2017-06-18 14:50:30,092+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (pool-5-thread-10) [5a5c759c-f5e4-4f00-bb6c-8ea4cec579ce] Command 'GetVolumeInfoVDSCommand(HostName = host_mixed_3, GetVolumeInfoVDSCommandParameters:{expectedEngineErrors='[VolumeDoesNotExist]', hostId='f5d2fa3d-14b6-47a1-95b3-1d1f64281b26', storagePoolId='6a031167-b8fd-4989-8985-ff0e0907cf7b', storageDomainId='50c4a7b4-2567-4710-9c92-34a58daf5d3e', imageGroupId='83388a88-e57b-4543-a8cf-a777c2bd632d', imageId='8cd01108-832d-4483-bce0-b747e389b7bd'})' execution failed: VDSGenericException: VDSErrorException: Failed to GetVolumeInfoVDS, error = Volume does not exist: ('8cd01108-832d-4483-bce0-b747e389b7bd',), code = 201

You can see that we expect to get VolumeDoesNotExist when getting the info of volume 8cd01108-832d-4483-bce0-b747e389b7bd.

At the same time, this exception is reported in the VDSM log file:

2017-06-18 14:50:30,083+0300 ERROR (jsonrpc/1) [storage.TaskManager.Task] (Task='35991b68-8ea7-444d-b1b1-4bf42bb6b7c3') Unexpected error (task:870)
Traceback (most recent call last):
...
VolumeDoesNotExist: Volume does not exist: ('8cd01108-832d-4483-bce0-b747e389b7bd',)

BTW, 8cd01108-832d-4483-bce0-b747e389b7bd is the auto-generated snapshot that is created before LSM starts and deleted once LSM completes.

Besides this exception, do you see any other error? Was the move of the disk completed successfully?
(In reply to Ala Hino from comment #9)
> Eyal,
> 
> The reported exception is an expected exception. Unfortunately, it is logged
> while shouldn't.
> Please note following log at engine:
> 
> 2017-06-18 14:50:30,092+03 ERROR
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand]
> (pool-5-thread-10) [5a5c759c-f5e4-4f00-bb6c-8ea4cec579ce] Command
> 'GetVolumeInfoVDSCommand(HostName = host_mixed_3,
> GetVolumeInfoVDSCommandParameters:
> {expectedEngineErrors='[VolumeDoesNotExist]',
> hostId='f5d2fa3d-14b6-47a1-95b3-1d1f64281b26',
> storagePoolId='6a031167-b8fd-4989-8985-ff0e0907cf7b',
> storageDomainId='50c4a7b4-2567-4710-9c92-34a58daf5d3e',
> imageGroupId='83388a88-e57b-4543-a8cf-a777c2bd632d',
> imageId='8cd01108-832d-4483-bce0-b747e389b7bd'})' execution failed:
> VDSGenericException: VDSErrorException: Failed to GetVolumeInfoVDS, error =
> Volume does not exist: ('8cd01108-832d-4483-bce0-b747e389b7bd',), code = 201
> 
> You can see that we expect to get VolumeDoesNotExist when getting info of
> volume 8cd01108-832d-4483-bce0-b747e389b7bd.
> 
> At the same time, the exception reported in vdsm log file:
> 
> 2017-06-18 14:50:30,083+0300 ERROR (jsonrpc/1) [storage.TaskManager.Task]
> (Task='35991b68-8ea7-444d-b1b1-4bf42bb6b7c3') Unexpected error (task:870)
> Traceback (most recent call last):
> ...
> VolumeDoesNotExist: Volume does not exist:
> ('8cd01108-832d-4483-bce0-b747e389b7bd',)
> 
> BTW, 8cd01108-832d-4483-bce0-b747e389b7bd is the auto-generated snapshot
> that created before LSM starts and deleted once LSM completes.
> 
> Besides this exception, do you see any other error? Was the move of the disk
> successfully completed?

Thinking about it, when creating a VM from a template, the disk cannot be moved because it is shared. Did you create a new disk and try to move it?
(In reply to Ala Hino from comment #10)
> Thinking about it, when creating a VM from template, the disk cannot be
> moved because it is shared. Did you create a new disk and try to move it?

If the VM's disk is based on a template, LSM requires that the destination storage domain have a copy of the disk.
If it has, LSM should work.
If it doesn't, you should get a validation error, not a failure from VDSM.
(In reply to Allon Mureinik from comment #11)
> (In reply to Ala Hino from comment #10)
> > Thinking about it, when creating a VM from template, the disk cannot be
> > moved because it is shared. Did you create a new disk and try to move it?
> 
> If the VM's disk is based on a template, LSM requires that the destination
> storage domain have a copy of the disk.
> If it has, LSM should work.
> If it doesn't, you should get a validation error, not a failure from VDSM.

If the SD doesn't have a copy of the disk, the following warning appears and LSM cannot be performed:

The following disks cannot be moved: vm01_Disk1, hover over a specific target domain selection for more info

When hovering over the source SD, I see the following message:

The template that the VM is based on does not exist on any active Storage Domain

I tried the following scenario and still cannot perform LSM, and I still get the above messages:

1. Created a VM, vm01, with a disk on sd1
2. Copied the disk to sd2
3. Created a template from the VM
4. Created another VM, vm02
5. Tried to move the vm02 disk, but the operation is disabled

I am probably missing something here.
Created attachment 1289652 [details] reproduced engine log
Created attachment 1291670 [details] new logs
The VolumeDoesNotExist originally reported is the unfortunate exception that we see after a successful live merge. This is documented in BZ 1442688.

This bug is a duplicate of BZ 1458548, based on the errors reported in the logs:

Engine
======
2017-06-25 12:31:36,227+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskStartVDSCommand] (DefaultQuartzScheduler4) [2b82cb43-bf61-4e52-bab3-dabf96d58181] Command 'VmReplicateDiskStartVDSCommand(HostName = host_mixed_1, VmReplicateDiskParameters:{hostId='c4450f28-138d-4dd0-91bc-c38127e07aab', vmId='71881f8a-fb5e-4205-b005-a5a2e84c5839', storagePoolId='6a031167-b8fd-4989-8985-ff0e0907cf7b', srcStorageDomainId='50c4a7b4-2567-4710-9c92-34a58daf5d3e', targetStorageDomainId='5737d729-23b9-4fb8-ad26-b443b970c908', imageGroupId='cc492478-b7a9-479f-93c2-45279747522b', imageId='b63f7c49-c016-46f2-acab-d84ce54b143d'})' execution failed: VDSGenericException: VDSErrorException: Failed to VmReplicateDiskStartVDS, error = Drive replication error, code = 55

Vdsm
====
2017-06-25 12:31:36,159+0300 ERROR (jsonrpc/0) [virt.vm] (vmId='71881f8a-fb5e-4205-b005-a5a2e84c5839') Unable to start replication for sda to {'domainID': '5737d729-23b9-4fb8-ad26-b443b970c908', 'volumeInfo': {'domainID': '5737d729-23b9-4fb8-ad26-b443b970c908', 'volType': 'path', 'leaseOffset': 119537664, 'path': '/rhev/data-center/mnt/blockSD/5737d729-23b9-4fb8-ad26-b443b970c908/images/cc492478-b7a9-479f-93c2-45279747522b/b63f7c49-c016-46f2-acab-d84ce54b143d', 'volumeID': 'b63f7c49-c016-46f2-acab-d84ce54b143d', 'leasePath': '/dev/5737d729-23b9-4fb8-ad26-b443b970c908/leases', 'imageID': 'cc492478-b7a9-479f-93c2-45279747522b'}, 'diskType': 'block', 'format': 'cow', 'cache': 'none', 'volumeID': 'b63f7c49-c016-46f2-acab-d84ce54b143d', 'imageID': 'cc492478-b7a9-479f-93c2-45279747522b', 'poolID': '6a031167-b8fd-4989-8985-ff0e0907cf7b', 'device': 'disk', 'path': '/rhev/data-center/6a031167-b8fd-4989-8985-ff0e0907cf7b/5737d729-23b9-4fb8-ad26-b443b970c908/images/cc492478-b7a9-479f-93c2-45279747522b/b63f7c49-c016-46f2-acab-d84ce54b143d', 'propagateErrors': 'off', 'volumeChain': [{'domainID': '5737d729-23b9-4fb8-ad26-b443b970c908', 'volType': 'path', 'leaseOffset': 118489088, 'path': '/rhev/data-center/mnt/blockSD/5737d729-23b9-4fb8-ad26-b443b970c908/images/cc492478-b7a9-479f-93c2-45279747522b/9b1fc472-a111-4384-b04b-9aa38766f15e', 'volumeID': '9b1fc472-a111-4384-b04b-9aa38766f15e', 'leasePath': '/dev/5737d729-23b9-4fb8-ad26-b443b970c908/leases', 'imageID': 'cc492478-b7a9-479f-93c2-45279747522b'}, {'domainID': '5737d729-23b9-4fb8-ad26-b443b970c908', 'volType': 'path', 'leaseOffset': 119537664, 'path': '/rhev/data-center/mnt/blockSD/5737d729-23b9-4fb8-ad26-b443b970c908/images/cc492478-b7a9-479f-93c2-45279747522b/b63f7c49-c016-46f2-acab-d84ce54b143d', 'volumeID': 'b63f7c49-c016-46f2-acab-d84ce54b143d', 'leasePath': '/dev/5737d729-23b9-4fb8-ad26-b443b970c908/leases', 'imageID': 'cc492478-b7a9-479f-93c2-45279747522b'}]} (vm:3985)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 3979, in diskReplicateStart
    self._startDriveReplication(drive)
  File "/usr/share/vdsm/virt/vm.py", line 4104, in _startDriveReplication
    self._dom.blockCopy(drive.name, destxml, flags=flags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 86, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 751, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 684, in blockCopy
    if ret == -1: raise libvirtError ('virDomainBlockCopy() failed', dom=self)
libvirtError: Requested operation is not valid: domain is not transient

This bug was found in 4.2.0 and is targeted (probably by mistake) to 4.1.3. For the sake of accuracy, changing the target to 4.2.0.
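For context on the "domain is not transient" error: libvirt's virDomainBlockCopy() rejects persistent domains unless the VIR_DOMAIN_BLOCK_COPY_TRANSIENT_JOB flag is passed, which newer libvirt releases support (the libvirt 2.2.0 build reported in comment #6 may predate it). The sketch below is not VDSM's actual code; the helper name, the flag combination, and the device path are illustrative assumptions.

```python
def blockcopy_dest_xml(dest_path, fmt='qcow2'):
    # Destination <disk> element expected by virDomainBlockCopy();
    # dest_path is a placeholder block-device path.
    return ("<disk type='block'>"
            "<source dev='%s'/>"
            "<driver type='%s'/>"
            "</disk>") % (dest_path, fmt)

def start_drive_replication(dom, drive_name, dest_path):
    # Without VIR_DOMAIN_BLOCK_COPY_TRANSIENT_JOB, a persistent domain
    # fails with the "domain is not transient" error seen in the
    # traceback above. Flag combination here is an assumption.
    import libvirt
    flags = (libvirt.VIR_DOMAIN_BLOCK_COPY_SHALLOW |
             libvirt.VIR_DOMAIN_BLOCK_COPY_REUSE_EXT |
             libvirt.VIR_DOMAIN_BLOCK_COPY_TRANSIENT_JOB)
    dom.blockCopy(drive_name, blockcopy_dest_xml(dest_path), flags=flags)
```

This matches the duplicate resolution: the fix belongs in BZ 1458548 (using libvirt support for transient block-copy jobs), not in the LSM flow itself.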
*** This bug has been marked as a duplicate of bug 1458548 ***