Bug 1461468 - Live storage migration failed, GetVolumeInfoVDSCommand failed on GetVolumeInfoVDS volume does not exist
Keywords:
Status: CLOSED DUPLICATE of bug 1458548
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.2.0
Target Release: ---
Assignee: Ala Hino
QA Contact: Raz Tamir
URL:
Whiteboard:
Duplicates: 1461329
Depends On:
Blocks:
 
Reported: 2017-06-14 13:48 UTC by Eyal Shenitzky
Modified: 2017-06-25 12:44 UTC
CC: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-25 12:44:18 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.2?


Attachments
engine and vdsm logs (1.49 MB, application/x-gzip)
2017-06-14 13:48 UTC, Eyal Shenitzky
engine and vdsm new logs (2.51 MB, application/x-gzip)
2017-06-18 11:56 UTC, Eyal Shenitzky
reproduced engine log (835.33 KB, application/x-gzip)
2017-06-20 13:56 UTC, Eyal Shenitzky
new logs (1.26 MB, application/x-gzip)
2017-06-25 09:43 UTC, Eyal Shenitzky

Description Eyal Shenitzky 2017-06-14 13:48:04 UTC
Created attachment 1287676 [details]
engine and vdsm logs

Description of problem:

When trying to live-migrate multiple VM disks to a different storage domain, the operation fails with:


2017-06-14 16:35:16,839+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler3) [disks_syncAction_d9e8f8b2-fa1b-42fa] Command 'VmReplicateDiskFinishVDSCommand(HostName = host_mixed_1, VmReplicateDiskParameters:{hostId='c4450f28-138d-4dd0-91bc-c38127e07aab', vmId='245be99a-3a63-481e-8324-5eb876e3e008', storagePoolId='6a031167-b8fd-4989-8985-ff0e0907cf7b', srcStorageDomainId='5737d729-23b9-4fb8-ad26-b443b970c908', targetStorageDomainId='5737d729-23b9-4fb8-ad26-b443b970c908', imageGroupId='b8b50f9a-c858-4e00-b48b-7d2a40fb0e2d', imageId='a0e23f01-abc3-4d94-8d22-d8ebda30df2d'})' execution failed: VDSGenericException: VDSErrorException: Failed to VmReplicateDiskFinishVDS, error = Drive replication error, code = 55
2017-06-14 16:35:16,839+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (DefaultQuartzScheduler3) [disks_syncAction_d9e8f8b2-fa1b-42fa] FINISH, VmReplicateDiskFinishVDSCommand, log id: 49119030
2017-06-14 16:35:16,839+03 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (DefaultQuartzScheduler3) [disks_syncAction_d9e8f8b2-fa1b-42fa] Replication end of disk 'b8b50f9a-c858-4e00-b48b-7d2a40fb0e2d' in vm '245be99a-3a63-481e-8324-5eb876e3e008' back to the source failed, skipping deletion of the target disk
2017-06-14 16:35:16,878+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler3) [disks_syncAction_d9e8f8b2-fa1b-42fa] EVENT_ID: USER_MOVED_DISK_FINISHED_FAILURE(2,011), User admin@internal-authz have failed to move disk disk_type_1416224541 to domain iscsi_0.

VDSM error:
2017-06-14 16:36:52,475+0300 ERROR (jsonrpc/7) [storage.Dispatcher] FINISH getVolumeInfo error=Volume does not exist: ('a0e23f01-abc3-4d94-8d22-d8ebda30df2d',) (dispatcher:79)

2017-06-14 16:39:06,384+0300 ERROR (jsonrpc/0) [storage.TaskManager.Task] (Task='3b503955-c5b3-445d-b32a-efe35d6876b2') Unexpected error (task:870)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 877, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in getVolumeInfo
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3016, in getVolumeInfo
    volUUID=volUUID).getInfo()
  File "/usr/share/vdsm/storage/sd.py", line 757, in produceVolume
    volUUID)
  File "/usr/share/vdsm/storage/volume.py", line 794, in __init__
    self._manifest = self.manifestClass(repoPath, sdUUID, imgUUID, volUUID)
  File "/usr/share/vdsm/storage/blockVolume.py", line 73, in __init__
    volUUID)
  File "/usr/share/vdsm/storage/volume.py", line 90, in __init__
    self.validate()
  File "/usr/share/vdsm/storage/blockVolume.py", line 163, in validate
    raise se.VolumeDoesNotExist(self.volUUID)
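The failure mode in the traceback above can be sketched as a minimal, self-contained simulation. The class and exception names mirror vdsm's, but the lookup logic here is hypothetical, not vdsm's actual code:

```python
class VolumeDoesNotExist(Exception):
    """Stand-in for vdsm's se.VolumeDoesNotExist storage error."""
    def __init__(self, vol_uuid):
        super().__init__("Volume does not exist: (%r,)" % vol_uuid)
        self.vol_uuid = vol_uuid


class BlockVolume:
    """Simplified stand-in for vdsm's blockVolume.BlockVolume."""
    def __init__(self, existing_lvs, vol_uuid):
        self.existing_lvs = existing_lvs
        self.volUUID = vol_uuid
        # As in the traceback, validation runs during construction,
        # so produceVolume() fails before getInfo() is ever reached.
        self.validate()

    def validate(self):
        # On a block domain the volume is an LV; if it was already
        # removed (e.g. the auto-generated LSM snapshot after a live
        # merge), construction raises VolumeDoesNotExist.
        if self.volUUID not in self.existing_lvs:
            raise VolumeDoesNotExist(self.volUUID)


# The UUID queried by getVolumeInfo no longer exists on the domain:
try:
    BlockVolume(existing_lvs=set(),
                vol_uuid="a0e23f01-abc3-4d94-8d22-d8ebda30df2d")
except VolumeDoesNotExist as e:
    print("getVolumeInfo would fail:", e)
```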


Version-Release number of selected component (if applicable):
Engine - 4.2.0-0.0.master.20170612192318.gitf773263.el7.centos
VDSM - 4.20.0-1049.gite64d80e.el7.centos.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with an OS and multiple disks in different permutations (raw/sparse, thin/preallocated)
2. Run the VM
3. Migrate the VM's disks concurrently (migrate each disk right after the auto-generated snapshot creation finishes)

Actual results:
The disks fail to move to a different storage domain

Expected results:
Live storage migration of the disks should complete successfully

Additional info:
Engine and VDSM logs attached

Comment 1 Allon Mureinik 2017-06-14 14:16:11 UTC
Does it work properly with a single disk?

Comment 2 Eyal Shenitzky 2017-06-14 20:05:37 UTC
No, is there a known issue/bug?

Comment 3 Allon Mureinik 2017-06-15 10:20:08 UTC
(In reply to Eyal Shenitzky from comment #2)
> No, is there a known issue/bug?
We're trying to narrow down the scenario to the minimal reproducer. Doing "Create a VM with OS and multiple disks from different permutation (raw/sparse - thin/preallocated)" is way too noisy.

Comment 4 Allon Mureinik 2017-06-15 10:20:49 UTC
Also, can you share libvirt/qemu versions?

Comment 5 Allon Mureinik 2017-06-15 12:18:01 UTC
*** Bug 1461329 has been marked as a duplicate of this bug. ***

Comment 6 Eyal Shenitzky 2017-06-18 11:52:35 UTC
Steps to reproduce:

1. Create a VM from a template, with a thin-provisioned disk on a file-based storage domain.
2. Run the VM
3. Move the VM's disk to a different storage domain

Result:
The operation fails with the described error.

*There is a bug that prevents verifying this on a block-based storage domain - 1458548

Comment 7 Eyal Shenitzky 2017-06-18 11:55:28 UTC
libvirt-daemon-2.2.0-2.fc24.x86_64
qemu-kvm-2.7.0-8.fc24.x86_64

Comment 8 Eyal Shenitzky 2017-06-18 11:56:25 UTC
Created attachment 1288836 [details]
engine and vdsm new logs

Comment 9 Ala Hino 2017-06-18 18:45:43 UTC
Eyal,

The reported exception is expected. Unfortunately, it is logged even though it shouldn't be.
Please note the following engine log entry:

2017-06-18 14:50:30,092+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (pool-5-thread-10) [5a5c759c-f5e4-4f00-bb6c-8ea4cec579ce] Command 'GetVolumeInfoVDSCommand(HostName = host_mixed_3, GetVolumeInfoVDSCommandParameters:{expectedEngineErrors='[VolumeDoesNotExist]', hostId='f5d2fa3d-14b6-47a1-95b3-1d1f64281b26', storagePoolId='6a031167-b8fd-4989-8985-ff0e0907cf7b', storageDomainId='50c4a7b4-2567-4710-9c92-34a58daf5d3e', imageGroupId='83388a88-e57b-4543-a8cf-a777c2bd632d', imageId='8cd01108-832d-4483-bce0-b747e389b7bd'})' execution failed: VDSGenericException: VDSErrorException: Failed to GetVolumeInfoVDS, error = Volume does not exist: ('8cd01108-832d-4483-bce0-b747e389b7bd',), code = 201


You can see that we expect to get VolumeDoesNotExist when getting info of volume 8cd01108-832d-4483-bce0-b747e389b7bd.

At the same time, the exception is reported in the vdsm log file:

2017-06-18 14:50:30,083+0300 ERROR (jsonrpc/1) [storage.TaskManager.Task] (Task='35991b68-8ea7-444d-b1b1-4bf42bb6b7c3') Unexpected error (task:870)
Traceback (most recent call last):
...
VolumeDoesNotExist: Volume does not exist: ('8cd01108-832d-4483-bce0-b747e389b7bd',)

BTW, 8cd01108-832d-4483-bce0-b747e389b7bd is the auto-generated snapshot that is created before LSM starts and deleted once LSM completes.

Besides this exception, do you see any other error? Was the move of the disk successfully completed?
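The engine-side behavior described here can be sketched as a toy model: a VDS command declares the errors it anticipates (visible as the expectedEngineErrors parameter in the log line above), and a logging helper downgrades those from ERROR. The helper below is purely illustrative, not the actual engine API:

```python
import logging

def log_vds_failure(logger, error_name, message, expected_errors):
    """Log a VDS command failure, downgrading anticipated errors.

    expected_errors mirrors the expectedEngineErrors parameter seen in
    the GetVolumeInfoVDSCommand log line; the helper itself is a
    hypothetical sketch of how such filtering could work.
    """
    level = logging.DEBUG if error_name in expected_errors else logging.ERROR
    logger.log(level, "%s: %s", error_name, message)
    return level

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("GetVolumeInfoVDSCommand")

# Expected during LSM cleanup: the auto-generated snapshot is already gone,
# so this would be logged at DEBUG rather than ERROR.
log_vds_failure(
    log, "VolumeDoesNotExist",
    "Volume does not exist: ('8cd01108-832d-4483-bce0-b747e389b7bd',)",
    expected_errors={"VolumeDoesNotExist"})
```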

Comment 10 Ala Hino 2017-06-19 05:59:07 UTC
(In reply to Ala Hino from comment #9)
> Besides this exception, do you see any other error? Was the move of the disk
> successfully completed?

Thinking about it, when creating a VM from a template, the disk cannot be moved because it is shared. Did you create a new disk and try to move it?

Comment 11 Allon Mureinik 2017-06-19 06:26:38 UTC
(In reply to Ala Hino from comment #10)
> Thinking about it, when creating a VM from template, the disk cannot be
> moved because it is shared. Did you create a new disk and try to move it?

If the VM's disk is based on a template, LSM requires that the destination storage domain have a copy of the disk.
If it has, LSM should work.
If it doesn't, you should get a validation error, not a failure from VDSM.

Comment 12 Ala Hino 2017-06-19 07:04:34 UTC
(In reply to Allon Mureinik from comment #11)
> (In reply to Ala Hino from comment #10)
> > Thinking about it, when creating a VM from template, the disk cannot be
> > moved because it is shared. Did you create a new disk and try to move it?
> 
> If the VM's disk is based on a template, LSM requires that the destination
> storage domain have a copy of the disk.
> If it has, LSM should work.
> If it doesn't, you should get a validation error, not a failure from VDSM.

If the destination SD doesn't have a copy of the disk, the following warning is shown and LSM cannot be performed:

The following disks cannot be moved: vm01_Disk1, hover over a specific target domain selection for more info

When hovering over the source sd, I see the following message:

The template that the VM is based on does not exist on any active Storage Domain

I tried the following scenario and still cannot perform LSM; I still get the above messages:

1. Created a VM, vm01, with a disk on sd1
2. Copied the disk to sd2
3. Created a template from the VM
4. Created another VM, vm02
5. Tried to move vm02's disk, but the operation is disabled

I am probably missing something here.

Comment 13 Eyal Shenitzky 2017-06-20 13:56:01 UTC
Created attachment 1289652 [details]
reproduced engine log

Comment 14 Eyal Shenitzky 2017-06-25 09:43:55 UTC
Created attachment 1291670 [details]
new logs

Comment 15 Ala Hino 2017-06-25 12:44:18 UTC
The VolumeDoesNotExist originally reported is the unfortunate exception that we see after a successful live merge. This is documented in BZ 1442688.

This bug is a duplicate of BZ 1458548 based on the errors reported in the logs:

Engine
======
2017-06-25 12:31:36,227+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskStartVDSCommand] (DefaultQuartzScheduler4) [2b82cb43-bf61-4e52-bab3-dabf96d58181] Command 'VmReplic
ateDiskStartVDSCommand(HostName = host_mixed_1, VmReplicateDiskParameters:{hostId='c4450f28-138d-4dd0-91bc-c38127e07aab', vmId='71881f8a-fb5e-4205-b005-a5a2e84c5839', storagePoolId='6a031167
-b8fd-4989-8985-ff0e0907cf7b', srcStorageDomainId='50c4a7b4-2567-4710-9c92-34a58daf5d3e', targetStorageDomainId='5737d729-23b9-4fb8-ad26-b443b970c908', imageGroupId='cc492478-b7a9-479f-93c2-
45279747522b', imageId='b63f7c49-c016-46f2-acab-d84ce54b143d'})' execution failed: VDSGenericException: VDSErrorException: Failed to VmReplicateDiskStartVDS, error = Drive replication error,
 code = 55

Vdsm
====
2017-06-25 12:31:36,159+0300 ERROR (jsonrpc/0) [virt.vm] (vmId='71881f8a-fb5e-4205-b005-a5a2e84c5839') Unable to start replication for sda to {'domainID': '5737d729-23b9-4fb8-ad26-b443b970c9
08', 'volumeInfo': {'domainID': '5737d729-23b9-4fb8-ad26-b443b970c908', 'volType': 'path', 'leaseOffset': 119537664, 'path': '/rhev/data-center/mnt/blockSD/5737d729-23b9-4fb8-ad26-b443b970c9
08/images/cc492478-b7a9-479f-93c2-45279747522b/b63f7c49-c016-46f2-acab-d84ce54b143d', 'volumeID': 'b63f7c49-c016-46f2-acab-d84ce54b143d', 'leasePath': '/dev/5737d729-23b9-4fb8-ad26-b443b970c
908/leases', 'imageID': 'cc492478-b7a9-479f-93c2-45279747522b'}, 'diskType': 'block', 'format': 'cow', 'cache': 'none', 'volumeID': 'b63f7c49-c016-46f2-acab-d84ce54b143d', 'imageID': 'cc4924
78-b7a9-479f-93c2-45279747522b', 'poolID': '6a031167-b8fd-4989-8985-ff0e0907cf7b', 'device': 'disk', 'path': '/rhev/data-center/6a031167-b8fd-4989-8985-ff0e0907cf7b/5737d729-23b9-4fb8-ad26-b
443b970c908/images/cc492478-b7a9-479f-93c2-45279747522b/b63f7c49-c016-46f2-acab-d84ce54b143d', 'propagateErrors': 'off', 'volumeChain': [{'domainID': '5737d729-23b9-4fb8-ad26-b443b970c908', 
'volType': 'path', 'leaseOffset': 118489088, 'path': '/rhev/data-center/mnt/blockSD/5737d729-23b9-4fb8-ad26-b443b970c908/images/cc492478-b7a9-479f-93c2-45279747522b/9b1fc472-a111-4384-b04b-9
aa38766f15e', 'volumeID': '9b1fc472-a111-4384-b04b-9aa38766f15e', 'leasePath': '/dev/5737d729-23b9-4fb8-ad26-b443b970c908/leases', 'imageID': 'cc492478-b7a9-479f-93c2-45279747522b'}, {'domai
nID': '5737d729-23b9-4fb8-ad26-b443b970c908', 'volType': 'path', 'leaseOffset': 119537664, 'path': '/rhev/data-center/mnt/blockSD/5737d729-23b9-4fb8-ad26-b443b970c908/images/cc492478-b7a9-47
9f-93c2-45279747522b/b63f7c49-c016-46f2-acab-d84ce54b143d', 'volumeID': 'b63f7c49-c016-46f2-acab-d84ce54b143d', 'leasePath': '/dev/5737d729-23b9-4fb8-ad26-b443b970c908/leases', 'imageID': 'c
c492478-b7a9-479f-93c2-45279747522b'}]} (vm:3985)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 3979, in diskReplicateStart
    self._startDriveReplication(drive)
  File "/usr/share/vdsm/virt/vm.py", line 4104, in _startDriveReplication
    self._dom.blockCopy(drive.name, destxml, flags=flags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 86, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 751, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 684, in blockCopy
    if ret == -1: raise libvirtError ('virDomainBlockCopy() failed', dom=self)
libvirtError: Requested operation is not valid: domain is not transient
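The "domain is not transient" failure boils down to a precondition in libvirt: historically virDomainBlockCopy() was only allowed on transient domains, and newer libvirt releases accept persistent domains only when the VIR_DOMAIN_BLOCK_COPY_TRANSIENT_JOB flag is passed. The check below is a simplified simulation of that precondition (the flag name is libvirt's, but the numeric value and logic here are illustrative, not libvirt source):

```python
# Mirrors libvirt's flag name; the value and the check are a simulation.
VIR_DOMAIN_BLOCK_COPY_TRANSIENT_JOB = 1 << 2


class libvirtError(Exception):
    pass


def block_copy_precheck(domain_is_persistent, flags=0):
    """Approximate the precondition behind 'domain is not transient'.

    A persistent domain is rejected unless the caller explicitly marks
    the copy job as transient, because the job cannot be recorded in the
    persistent domain configuration.
    """
    if domain_is_persistent and not (flags & VIR_DOMAIN_BLOCK_COPY_TRANSIENT_JOB):
        raise libvirtError(
            "Requested operation is not valid: domain is not transient")
    return True


# vdsm called blockCopy on a persistent domain without the flag:
try:
    block_copy_precheck(domain_is_persistent=True)
except libvirtError as e:
    print(e)
```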

This bug was found in 4.2.0 but was targeted (probably by mistake) at 4.1.3. For the sake of accuracy, changing the target to 4.2.0.

*** This bug has been marked as a duplicate of bug 1458548 ***

