Description of problem: Failed to Cold Move Preallocated ISCSI disks to block SD - CreateVolumeContainerCommand' ended with failure Version-Release number of selected component (if applicable): ovirt-engine-4.1.1.5-0.1.el7.noarch rhevm-4.1.1.5-0.1.el7.noarch vdsm-4.19.9-1.el7ev.x86_64 How reproducible: Always Steps to Reproduce: 1. Create VM from Template with one disk 2. Add other disks including iscsi preallocated, iscsi thin, nfs preallocated and thin 3. Write to disks and created snapshots 4. Performed one cold merge successfully 5. Performed a cold move of all disks - iscsi - thin, nfs preallocated, nfs - thin all moved successfully. iscsi preallocated failed to move with: CreateVolumeContainerCommand' ended with failure Actual results: Should work Expected results: Additional info: Engine log -------------------------------------------------------- 2017-03-22 17:14:35,188+02 INFO [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler8) [fc2a14d] Command 'CloneImageGroupVolumesStructure' (id: '34914a98-a220-47a6-bd35-843f2 03978ba') waiting on child command id: '042908c5-bd9f-45bf-9c15-63642d63dd25' type:'CreateVolumeContainer' to complete 2017-03-22 17:14:35,196+02 INFO [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler8) [3dcab615] Command 'CopyImageGroupWithData' (id: '131d206d-73aa-4d29-905c-ba9e56066a8d' ) waiting on child command id: '34914a98-a220-47a6-bd35-843f203978ba' type:'CloneImageGroupVolumesStructure' to complete 2017-03-22 17:14:35,214+02 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler8) [7a88fb] Command 'MoveOrCopyDisk' (id: 'b472d9c3-02ca-4e8d-8083-bb40a6c7901b') wait ing on child command id: '62a0e2d0-a383-4889-805f-3fba57e841f6' type:'MoveImageGroup' to complete 2017-03-22 17:14:35,311+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler1) [50277c51] Failed in 'HSMGetAllTasksStatusesVDS' method 2017-03-22 17:14:35,326+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [50277c51] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM purple-vds2.qa.lab.tlv.redhat.com command HSMGetAllTasksStatusesVDS failed: Error creating a new volume 2017-03-22 17:14:35,327+02 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler1) [50277c51] SPMAsyncTask::PollTask: Polling task '85939eb2-4600-4331-9892-2db529a29211' (Parent Command 'CreateVolumeContainer', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'cleanSuccess'. 2017-03-22 17:14:35,331+02 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler1) [50277c51] BaseAsyncTask::logEndTaskFailure: Task '85939eb2-4600-4331-9892-2db529a29211' (Parent Command 'CreateVolumeContainer', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure: -- Result: 'cleanSuccess' -- Message: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Error creating a new volume, code = 205', -- Exception: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Error creating a new volume, code = 205' 2017-03-22 17:14:35,333+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (DefaultQuartzScheduler1) [50277c51] CommandAsyncTask::endActionIfNecessary: All tasks of command '042908c5-bd9f-45bf-9c15-63642d63dd25' has ended -> executing 'endAction' 2017-03-22 17:14:35,333+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (DefaultQuartzScheduler1) [50277c51] CommandAsyncTask::endAction: Ending action for '1' tasks (command ID: '042908c5-bd9f-45bf-9c15-63642d63dd25'): calling endAction '. 2017-03-22 17:14:35,334+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-6-thread-30) [50277c51] CommandAsyncTask::endCommandAction [within thread] context: Attempting to endAction 'CreateVolumeContainer', 2017-03-22 17:14:35,350+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.CreateVolumeContainerCommand] (org.ovirt.thread.pool-6-thread-30) [fc2a14d] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.CreateVolumeContainerCommand' with failure. 2017-03-22 17:14:35,367+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-6-thread-30) [fc2a14d] CommandAsyncTask::HandleEndActionResult [within thread]: endAction for action type 'CreateVolumeContainer' completed, handling the result. 2017-03-22 17:14:35,367+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-6-thread-30) [fc2a14d] CommandAsyncTask::HandleEndActionResult [within thread]: endAction for action type 'CreateVolumeContainer' succeeded, clearing tasks. 2017-03-22 17:14:35,367+02 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-6-thread-30) [fc2a14d] SPMAsyncTask::ClearAsyncTask: Attempting to clear task '85939eb2-4600-4331-9892-2db529a29211' 2017-03-22 17:14:35,370+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SPMClearTaskVDSCommand] (org.ovirt.thread.pool-6-thread-30) [fc2a14d] START, SPMClearTaskVDSCommand( SPMTaskGuidBaseVDSCommandParameters:{runAsync='true', storagePoolId='f906c5b2-96d5-4592-b16c-7e7e60af37d6', ignoreFailoverLimit='false', taskId='85939eb2-4600-4331-9892-2db529a29211'}), log id: 4bfa86c9 2017-03-22 17:14:35,373+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (org.ovirt.thread.pool-6-thread-30) [fc2a14d] START, HSMClearTaskVDSCommand(HostName = purple-vds2.qa.lab.tlv.redhat.com, HSMTaskGuidBaseVDSCommandParameters:{runAsync='true', hostId='c82979b9-9c8c-4f66-92f8-2172ec54f99e', taskId='85939eb2-4600-4331-9892-2db529a29211'}), log id: 17cb4690 2017-03-22 17:14:36,220+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (org.ovirt.thread.pool-6-thread-30) [fc2a14d] FINISH, HSMClearTaskVDSCommand, log id: 17cb4690 2017-03-22 17:14:36,220+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SPMClearTaskVDSCommand] (org.ovirt.thread.pool-6-thread-30) [fc2a14d] FINISH, SPMClearTaskVDSCommand, log id: 4bfa86c9 2017-03-22 17:14:36,226+02 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-6-thread-30) [fc2a14d] BaseAsyncTask::removeTaskFromDB: Removed task '85939eb2-4600-4331-9892-2db529a29211' from DataBase 2017-03-22 17:14:36,227+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-6-thread-30) [fc2a14d] CommandAsyncTask::HandleEndActionResult [within thread]: Removing CommandMultiAsyncTasks object for entity '042908c5-bd9f-45bf-9c15-63642d63dd25' 2017-03-22 17:14:45,244+02 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler5) [68fc3f87] Command 'MoveImageGroup' (id: '62a0e2d0-a383-4889-805f-3fba57e841f6') waiting on child command id: '131d206d-73aa-4d29-905c-ba9e56066a8d' type:'CopyImageGroupWithData' to complete 2017-03-22 17:14:45,261+02 INFO [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler5) [fc2a14d] Command 'CloneImageGroupVolumesStructure' id: '34914a98-a220-47a6-bd35-843f203978ba' child commands '[0459e6ef-03bf-4428-a4a9-e21045b8f939, 042908c5-bd9f-45bf-9c15-63642d63dd25]' executions were completed, status 'FAILED' 2017-03-22 17:14:45,284+02 INFO [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler5) [3dcab615] Command 'CopyImageGroupWithData' (id: '131d206d-73aa-4d29-905c-ba9e56066a8d':
Kevin, the subject states that it's a local domain (i.e. local storage if I understand correctly) yet in the steps to reproduce you describe the scenario with a block domain, which one is true? Aside for that you are adding 4 more disks, I'm quite sure this scenario can be reproduced by using less than that, can you please try and find the minimal reproducer?
(In reply to Tal Nisan from comment #1) > Kevin, the subject states that it's a local domain (i.e. local storage if I > understand correctly) yet in the steps to reproduce you describe the > scenario with a block domain, which one is true? > > Aside for that you are adding 4 more disks, I'm quite sure this scenario can > be reproduced by using less than that, can you please try and find the > minimal reproducer? Its a preallocated block device on a Local Data Centre
Created attachment 1265470 [details] server, engine, vdsm logs Adding logs
(In reply to Tal Nisan from comment #1) > Kevin, the subject states that it's a local domain (i.e. local storage if I > understand correctly) yet in the steps to reproduce you describe the > scenario with a block domain, which one is true? > > Aside for that you are adding 4 more disks, I'm quite sure this scenario can > be reproduced by using less than that, can you please try and find the > minimal reproducer? I ran several scenarios again in order to narrow the reproducible scenario The minimum is as follows: Created Vm with 2 iscisi disks, one virtio-scsi and one ide preallocated on a Local Data Centre Created 3 snapshots 1,2,3 Deleted snapshot 2 and performed a cold merge Moved both the disks to another iscsi domain - fails as before with error creating container
The relevant part of the log: 2017-03-22 17:14:24,296+0200 ERROR (tasks/5) [storage.TaskManager.Task] (Task='85939eb2-4600-4331-9892-2db529a29211') Unexpected error (task:870) Traceback (most recent call last): File "/usr/share/vdsm/storage/task.py", line 877, in _run return fn(*args, **kargs) File "/usr/share/vdsm/storage/task.py", line 333, in run return self.cmd(*self.argslist, **self.argsdict) File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper return method(self, *args, **kwargs) File "/usr/share/vdsm/storage/sp.py", line 1929, in createVolume initialSize=initialSize) File "/usr/share/vdsm/storage/sd.py", line 763, in createVolume initialSize=initialSize) File "/usr/share/vdsm/storage/volume.py", line 1089, in create (volUUID, e)) VolumeCreationError: Error creating a new volume: (u"Volume creation 9fd45008-c1f3-47b1-a089-9c8aa64cd284 failed: Invalid parameter: 'initial size=2144815'",) Fred - can you take a look?
Just to clarify: The Data Center is Local (not Shared) The Storage Domains are iscsi and nfs The specific disks were iscsi The scenario is as follows: Created Vm with 2 iscisi disks, one virtio-scsi and one ide preallocated on a Local Data Centre (Not shared) Created 3 snapshots 1,2,3 Deleted snapshot 2 and performed a cold merge Moved both the disks to another iscsi domain - fails as before with error creating container NB! The same scenario without the cold merge after creating the snapshots works fine. So the cold merge is key to the scenario.
Just to make sure Kevin, did you try to reproduce the same scenario on a shared DC?
I managed to reproduce on local storage DC. Though I did not try on shared, I guess it would be the same but need to check that. When we create the volume on the VDSM before copying the data, we are first getting the volume info from the VDSM to check for the size of the volume (capacity and truesize) This is done in order to create the volume for COW with initial size as the truesize in order to save increasing the LV size every chunk. After the cold merge, one of the volume info returns with truesize > capacity, then the volume creation fails on a check that the initial size is less than the size. Still need to understand the cause for this.... 'capacity': '2147483648','truesize': '2415919104' Volume.getInfo {'info': {'status': 'OK', 'domain': '51871a7a-75cf-4e50-a223-8f95f4107274', 'voltype': 'INTERNAL', 'description': '', 'parent': 'f4c68218-2792-47f8-98a1-380454416257', 'format': 'COW', 'generation': 1, 'image': 'ac636495-24b4-460b-91b1-7cdcc222b15a', 'ctime': '1490270261', 'disktype': '2', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '2415919104', 'children': [], 'pool': '', 'capacity': '2147483648', 'uuid': u'2fa1c4ce-e510-4a58-a248-ec5466aa09ad', 'truesize': '2415919104', 'type': 'SPARSE', 'lease': {'owners': [], 'version': None}}}
Kevin, can you reproduce the same scenario on a shared DC?
(In reply to Fred Rolland from comment #9) > Kevin, can you reproduce the same scenario on a shared DC? Reproduce the same on a shared DC Version-Release number of selected component (if applicable): ovirt-engine-4.1.1.6-0.1.el7.noarch rhevm-4.1.1.6-0.1.el7.noarch vdsm-4.19.10-1.el7ev.x86_64 How reproducible: Always Steps to Reproduce: 1. Create VM from Template with one disk 2. Add other disks including iscsi preallocated, iscsi thin, nfs preallocated and thin 3. Write to disks and created snapshots 4. Performed one cold merge successfully 5. Performed a cold move of all disks - iscsi - thin, nfs preallocated, nfs - thin all moved successfully. iscsi preallocated failed to move with: CreateVolumeContainerCommand' ended with failure Actual results: Should work Expected results:
Target release should be placed once a package build is known to fix a issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for a oVirt release.
Verified with the following code: --------------------------------- vdsm- 4.19.11-1.el7ev.noarch rhevm- 4.1.2-0.1.el7 Verified with the following scenario: ------------------------------------------ Steps to reproduce: ------------------------------------------ Steps to Reproduce: 1. Create VM from Template with one disk 2. Add other disks including iscsi preallocated, iscsi thin, nfs preallocated and thin 3. Write to disks and created snapshots 4. Performed one cold merge successfully 5. Performed a cold move of all disks Moving to VERIFIED
*** Bug 1625240 has been marked as a duplicate of this bug. ***