Bug 1434927 - HSM - Cold Move - failed to move disk after cold merge of snapshot
Summary: HSM - Cold Move - failed to move disk after cold merge of snapshot
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.19.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.1.2
Target Release: 4.19.11
Assignee: Fred Rolland
QA Contact: Eyal Shenitzky
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2017-03-22 15:45 UTC by Kevin Alon Goldblatt
Modified: 2018-09-16 12:32 UTC
CC: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-05-23 08:19:10 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.1+


Attachments
server, engine, vdsm logs (600.19 KB, application/x-gzip)
2017-03-22 18:53 UTC, Kevin Alon Goldblatt


Links
System ID Status Summary Last Updated
oVirt gerrit 74652 MERGED blockVolume: validate initial size with max size 2020-08-18 08:15:18 UTC
oVirt gerrit 75144 MERGED block: add failing test for initial size 2020-08-18 08:15:18 UTC
oVirt gerrit 75633 MERGED blockvolume: tidy blockvolume_test 2020-08-18 08:15:18 UTC
oVirt gerrit 75804 MERGED blockvolume: tidy blockvolume_test 2020-08-18 08:15:18 UTC
oVirt gerrit 75805 MERGED block: add failing test for initial size 2020-08-18 08:15:18 UTC
oVirt gerrit 75806 MERGED blockVolume: validate initial size with max size 2020-08-18 08:15:18 UTC

Description Kevin Alon Goldblatt 2017-03-22 15:45:50 UTC
Description of problem:
Failed to cold move preallocated iSCSI disks to a block SD - 'CreateVolumeContainerCommand' ended with failure

Version-Release number of selected component (if applicable):
ovirt-engine-4.1.1.5-0.1.el7.noarch
rhevm-4.1.1.5-0.1.el7.noarch
vdsm-4.19.9-1.el7ev.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a VM from a template with one disk
2. Add more disks: iSCSI preallocated, iSCSI thin, NFS preallocated, and NFS thin
3. Write to the disks and create snapshots
4. Perform one cold merge (succeeds)
5. Perform a cold move of all disks - the iSCSI thin, NFS preallocated, and NFS thin disks all move successfully; the iSCSI preallocated disk fails to move with:
'CreateVolumeContainerCommand' ended with failure

Actual results:
The iSCSI preallocated disk fails to move; 'CreateVolumeContainerCommand' ended with failure.

Expected results:
All disks, including the iSCSI preallocated one, should move successfully.

Additional info:
Engine log
--------------------------------------------------------
2017-03-22 17:14:35,188+02 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler8) [fc2a14d] Command 'CloneImageGroupVolumesStructure' (id: '34914a98-a220-47a6-bd35-843f203978ba') waiting on child command id: '042908c5-bd9f-45bf-9c15-63642d63dd25' type:'CreateVolumeContainer' to complete
2017-03-22 17:14:35,196+02 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler8) [3dcab615] Command 'CopyImageGroupWithData' (id: '131d206d-73aa-4d29-905c-ba9e56066a8d') waiting on child command id: '34914a98-a220-47a6-bd35-843f203978ba' type:'CloneImageGroupVolumesStructure' to complete
2017-03-22 17:14:35,214+02 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler8) [7a88fb] Command 'MoveOrCopyDisk' (id: 'b472d9c3-02ca-4e8d-8083-bb40a6c7901b') waiting on child command id: '62a0e2d0-a383-4889-805f-3fba57e841f6' type:'MoveImageGroup' to complete
2017-03-22 17:14:35,311+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler1) [50277c51] Failed in 'HSMGetAllTasksStatusesVDS' method
2017-03-22 17:14:35,326+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [50277c51] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM purple-vds2.qa.lab.tlv.redhat.com command HSMGetAllTasksStatusesVDS failed: Error creating a new volume
2017-03-22 17:14:35,327+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler1) [50277c51] SPMAsyncTask::PollTask: Polling task '85939eb2-4600-4331-9892-2db529a29211' (Parent Command 'CreateVolumeContainer', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'cleanSuccess'.
2017-03-22 17:14:35,331+02 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler1) [50277c51] BaseAsyncTask::logEndTaskFailure: Task '85939eb2-4600-4331-9892-2db529a29211' (Parent Command 'CreateVolumeContainer', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Error creating a new volume, code = 205',
-- Exception: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Error creating a new volume, code = 205'
2017-03-22 17:14:35,333+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (DefaultQuartzScheduler1) [50277c51] CommandAsyncTask::endActionIfNecessary: All tasks of command '042908c5-bd9f-45bf-9c15-63642d63dd25' has ended -> executing 'endAction'
2017-03-22 17:14:35,333+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (DefaultQuartzScheduler1) [50277c51] CommandAsyncTask::endAction: Ending action for '1' tasks (command ID: '042908c5-bd9f-45bf-9c15-63642d63dd25'): calling endAction '.
2017-03-22 17:14:35,334+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-6-thread-30) [50277c51] CommandAsyncTask::endCommandAction [within thread] context: Attempting to endAction 'CreateVolumeContainer',
2017-03-22 17:14:35,350+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.CreateVolumeContainerCommand] (org.ovirt.thread.pool-6-thread-30) [fc2a14d] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.CreateVolumeContainerCommand' with failure.
2017-03-22 17:14:35,367+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-6-thread-30) [fc2a14d] CommandAsyncTask::HandleEndActionResult [within thread]: endAction for action type 'CreateVolumeContainer' completed, handling the result.
2017-03-22 17:14:35,367+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-6-thread-30) [fc2a14d] CommandAsyncTask::HandleEndActionResult [within thread]: endAction for action type 'CreateVolumeContainer' succeeded, clearing tasks.
2017-03-22 17:14:35,367+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-6-thread-30) [fc2a14d] SPMAsyncTask::ClearAsyncTask: Attempting to clear task '85939eb2-4600-4331-9892-2db529a29211'
2017-03-22 17:14:35,370+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SPMClearTaskVDSCommand] (org.ovirt.thread.pool-6-thread-30) [fc2a14d] START, SPMClearTaskVDSCommand( SPMTaskGuidBaseVDSCommandParameters:{runAsync='true', storagePoolId='f906c5b2-96d5-4592-b16c-7e7e60af37d6', ignoreFailoverLimit='false', taskId='85939eb2-4600-4331-9892-2db529a29211'}), log id: 4bfa86c9
2017-03-22 17:14:35,373+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (org.ovirt.thread.pool-6-thread-30) [fc2a14d] START, HSMClearTaskVDSCommand(HostName = purple-vds2.qa.lab.tlv.redhat.com, HSMTaskGuidBaseVDSCommandParameters:{runAsync='true', hostId='c82979b9-9c8c-4f66-92f8-2172ec54f99e', taskId='85939eb2-4600-4331-9892-2db529a29211'}), log id: 17cb4690
2017-03-22 17:14:36,220+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (org.ovirt.thread.pool-6-thread-30) [fc2a14d] FINISH, HSMClearTaskVDSCommand, log id: 17cb4690
2017-03-22 17:14:36,220+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SPMClearTaskVDSCommand] (org.ovirt.thread.pool-6-thread-30) [fc2a14d] FINISH, SPMClearTaskVDSCommand, log id: 4bfa86c9
2017-03-22 17:14:36,226+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-6-thread-30) [fc2a14d] BaseAsyncTask::removeTaskFromDB: Removed task '85939eb2-4600-4331-9892-2db529a29211' from DataBase
2017-03-22 17:14:36,227+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-6-thread-30) [fc2a14d] CommandAsyncTask::HandleEndActionResult [within thread]: Removing CommandMultiAsyncTasks object for entity '042908c5-bd9f-45bf-9c15-63642d63dd25'
2017-03-22 17:14:45,244+02 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler5) [68fc3f87] Command 'MoveImageGroup' (id: '62a0e2d0-a383-4889-805f-3fba57e841f6') waiting on child command id: '131d206d-73aa-4d29-905c-ba9e56066a8d' type:'CopyImageGroupWithData' to complete
2017-03-22 17:14:45,261+02 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler5) [fc2a14d] Command 'CloneImageGroupVolumesStructure' id: '34914a98-a220-47a6-bd35-843f203978ba' child commands '[0459e6ef-03bf-4428-a4a9-e21045b8f939, 042908c5-bd9f-45bf-9c15-63642d63dd25]' executions were completed, status 'FAILED'
2017-03-22 17:14:45,284+02 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler5) [3dcab615] Command 'CopyImageGroupWithData' (id: '131d206d-73aa-4d29-905c-ba9e56066a8d':

Comment 1 Tal Nisan 2017-03-22 16:58:17 UTC
Kevin, the subject states that it's a local domain (i.e. local storage if I understand correctly) yet in the steps to reproduce you describe the scenario with a block domain, which one is true?

Aside from that, you are adding 4 more disks; I'm quite sure this scenario can be reproduced with fewer than that. Can you please try to find the minimal reproducer?

Comment 2 Kevin Alon Goldblatt 2017-03-22 18:52:07 UTC
(In reply to Tal Nisan from comment #1)
> Kevin, the subject states that it's a local domain (i.e. local storage if I
> understand correctly) yet in the steps to reproduce you describe the
> scenario with a block domain, which one is true?
> 
> Aside from that, you are adding 4 more disks; I'm quite sure this scenario
> can be reproduced with fewer than that. Can you please try to find the
> minimal reproducer?

It's a preallocated block device on a Local Data Center.

Comment 3 Kevin Alon Goldblatt 2017-03-22 18:53:16 UTC
Created attachment 1265470 [details]
server, engine, vdsm logs

Adding logs

Comment 4 Kevin Alon Goldblatt 2017-03-22 19:37:29 UTC
(In reply to Tal Nisan from comment #1)
> Kevin, the subject states that it's a local domain (i.e. local storage if I
> understand correctly) yet in the steps to reproduce you describe the
> scenario with a block domain, which one is true?
> 
> Aside from that, you are adding 4 more disks; I'm quite sure this scenario
> can be reproduced with fewer than that. Can you please try to find the
> minimal reproducer?

I ran several scenarios again in order to narrow the reproducible scenario

The minimum is as follows:

Created a VM with 2 iSCSI disks, one virtio-scsi and one IDE, preallocated, on a Local Data Center
Created 3 snapshots: 1, 2, 3
Deleted snapshot 2 and performed a cold merge
Moved both disks to another iSCSI domain - fails as before with the volume container creation error

Comment 5 Allon Mureinik 2017-03-23 08:37:46 UTC
The relevant part of the log:

2017-03-22 17:14:24,296+0200 ERROR (tasks/5) [storage.TaskManager.Task] (Task='85939eb2-4600-4331-9892-2db529a29211') Unexpected error (task:870)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 877, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 333, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1929, in createVolume
    initialSize=initialSize)
  File "/usr/share/vdsm/storage/sd.py", line 763, in createVolume
    initialSize=initialSize)
  File "/usr/share/vdsm/storage/volume.py", line 1089, in create
    (volUUID, e))
VolumeCreationError: Error creating a new volume: (u"Volume creation 9fd45008-c1f3-47b1-a089-9c8aa64cd284 failed: Invalid parameter: 'initial size=2144815'",)
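
The wrapped error ("Invalid parameter: 'initial size=2144815'") means that the explicit initial size requested for the new volume failed a sanity check in vdsm's volume creation path. A minimal sketch of that kind of check (illustrative Python with hypothetical names, not the actual vdsm source):

# Simplified illustration only; class and function names are assumptions.
class InvalidParameterError(Exception):
    def __init__(self, name, value):
        Exception.__init__(self, "Invalid parameter: '%s=%s'" % (name, value))

def check_initial_size(initial_size, size):
    # An explicit initial size only makes sense for sparse COW volumes on
    # block storage, and it is rejected when it exceeds the size the
    # volume is expected to have.
    if initial_size > size:
        raise InvalidParameterError("initial size", initial_size)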

Fred - can you take a look?

Comment 6 Kevin Alon Goldblatt 2017-03-23 10:27:30 UTC
Just to clarify:

The Data Center is Local (not Shared)
The Storage Domains are iSCSI and NFS
The specific disks were iSCSI

The scenario is as follows:

Created a VM with 2 iSCSI disks, one virtio-scsi and one IDE, preallocated, on a Local Data Center (not shared)
Created 3 snapshots: 1, 2, 3
Deleted snapshot 2 and performed a cold merge
Moved both disks to another iSCSI domain - fails as before with the volume container creation error

NB! The same scenario without the cold merge after creating the snapshots works fine.
So the cold merge is key to the scenario.

Comment 7 Tal Nisan 2017-03-23 11:58:57 UTC
Just to make sure, Kevin: did you try to reproduce the same scenario on a shared DC?

Comment 8 Fred Rolland 2017-03-23 15:14:12 UTC
I managed to reproduce this on a local storage DC.
I did not try on a shared DC; I expect it would behave the same, but that still needs to be checked.

When we create the destination volume on VDSM before copying the data, we first get the volume info from VDSM to check the size of the volume (capacity and truesize).
This is done so that the COW volume can be created with the truesize as its initial size, which saves extending the LV chunk by chunk during the copy.

After the cold merge, the volume info of one of the volumes comes back with truesize > capacity, and the volume creation then fails on a check that the initial size must be less than the size.

Still need to understand the cause for this....

'capacity': '2147483648', 'truesize': '2415919104'

Volume.getInfo
{'info': {'status': 'OK', 'domain': '51871a7a-75cf-4e50-a223-8f95f4107274', 'voltype': 'INTERNAL', 'description': '', 'parent': 'f4c68218-2792-47f8-98a1-380454416257', 'format': 'COW', 'generation': 1, 'image': 'ac636495-24b4-460b-91b1-7cdcc222b15a', 'ctime': '1490270261', 'disktype': '2', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '2415919104', 'children': [], 'pool': '', 'capacity': '2147483648', 'uuid': u'2fa1c4ce-e510-4a58-a248-ec5466aa09ad', 'truesize': '2415919104', 'type': 'SPARSE', 'lease': {'owners': [], 'version': None}}}
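
To sketch the flow described above (illustrative Python only; the helper names and validation form are assumptions based on this comment and the gerrit patch titles, not the actual vdsm code): the copy path reads the source volume's capacity and truesize, uses the truesize as the destination COW volume's initial size so the LV does not need to be extended chunk by chunk, and after the cold merge truesize (2415919104) exceeds capacity (2147483648), which trips the initial-size check.

def plan_destination_volume(src_info):
    # src_info is the dict returned by Volume.getInfo, like the one above.
    capacity = int(src_info['capacity'])    # 2147483648 after the cold merge
    truesize = int(src_info['truesize'])    # 2415919104 after the cold merge
    # Use the source's true allocation as the destination's initial size
    # so the copy does not have to grow the LV in chunks.
    return {'size': capacity, 'initial_size': truesize}

def create_cow_volume(size, initial_size):
    # Pre-fix behaviour: the initial size is validated against the virtual
    # size, so truesize > capacity makes creation fail with
    # "Invalid parameter: 'initial size=...'".
    if initial_size > size:
        raise ValueError("Invalid parameter: 'initial size=%d'" % initial_size)
    # ... allocate the LV and write the qcow2 header ...

Per the titles of the gerrit patches linked above ("blockVolume: validate initial size with max size"), the fix validates the requested initial size against the volume's maximum size (presumably the upper bound a COW volume may occupy, which can legitimately exceed its virtual size) rather than against the virtual size alone.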

Comment 9 Fred Rolland 2017-03-23 15:15:02 UTC
Kevin, can you reproduce the same scenario on a shared DC?

Comment 10 Kevin Alon Goldblatt 2017-04-04 14:44:33 UTC
(In reply to Fred Rolland from comment #9)
> Kevin, can you reproduce the same scenario on a shared DC?

Reproduced the same scenario on a shared DC

Version-Release number of selected component (if applicable):
ovirt-engine-4.1.1.6-0.1.el7.noarch
rhevm-4.1.1.6-0.1.el7.noarch
vdsm-4.19.10-1.el7ev.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a VM from a template with one disk
2. Add more disks: iSCSI preallocated, iSCSI thin, NFS preallocated, and NFS thin
3. Write to the disks and create snapshots
4. Perform one cold merge (succeeds)
5. Perform a cold move of all disks - the iSCSI thin, NFS preallocated, and NFS thin disks all move successfully; the iSCSI preallocated disk fails to move with:
'CreateVolumeContainerCommand' ended with failure

Actual results:
The iSCSI preallocated disk fails to move; 'CreateVolumeContainerCommand' ended with failure.

Expected results:
All disks, including the iSCSI preallocated one, should move successfully.
Comment 11 Red Hat Bugzilla Rules Engine 2017-04-23 06:14:54 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 12 Eyal Shenitzky 2017-04-26 16:02:20 UTC
Verified with the following code:
---------------------------------
vdsm-4.19.11-1.el7ev.noarch
rhevm-4.1.2-0.1.el7

Verified with the following scenario:
------------------------------------------
Steps to Reproduce:
1. Create a VM from a template with one disk
2. Add more disks: iSCSI preallocated, iSCSI thin, NFS preallocated, and NFS thin
3. Write to the disks and create snapshots
4. Perform one cold merge (succeeds)
5. Perform a cold move of all disks

Moving to VERIFIED

Comment 13 Eyal Shenitzky 2017-04-26 16:02:37 UTC
Verified with the following code:
---------------------------------
vdsm-4.19.11-1.el7ev.noarch
rhevm-4.1.2-0.1.el7

Verified with the following scenario:
------------------------------------------
Steps to Reproduce:
1. Create a VM from a template with one disk
2. Add more disks: iSCSI preallocated, iSCSI thin, NFS preallocated, and NFS thin
3. Write to the disks and create snapshots
4. Perform one cold merge (succeeds)
5. Perform a cold move of all disks

Moving to VERIFIED

Comment 14 Tal Nisan 2018-09-16 12:32:02 UTC
*** Bug 1625240 has been marked as a duplicate of this bug. ***

