Bug 1416113 - Cold move disk - No cleanup after failure (rare use case)
Summary: Cold move disk - No cleanup after failure (rare use case)
Keywords:
Status: CLOSED DUPLICATE of bug 978975
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.2.0
Assignee: Fred Rolland
QA Contact: Raz Tamir
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2017-01-24 15:48 UTC by Kevin Alon Goldblatt
Modified: 2017-02-02 16:28 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-02 16:28:49 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.2+


Attachments
server, vdsm, engine.log (1.69 MB, application/x-gzip)
2017-01-24 15:50 UTC, Kevin Alon Goldblatt

Description Kevin Alon Goldblatt 2017-01-24 15:48:59 UTC
Description of problem:
After a failure during the Move operation, the source images are not cleaned up

Version-Release number of selected component (if applicable):
vdsm-4.19.2-1.gitd9c3ccb.el7.centos.x86_64
ovirt-engine-4.1.0.3-0.0.master.20170122091652.gitc6fc2c2.el7.centos.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a VM with a preallocated disk and power the VM off
2. Select to move the block disk to another domain
3. Restart VDSM on the SPM host as soon as the CopyDataCommand is reported on the engine

Actual results:
The move fails and the disk is reported as locked; the progress bar is also stuck at 16%. The images on the source volume are not removed.

Expected results:
The images should be removed as soon as a new SPM comes up
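
The move in the steps above can also be driven from a script, which makes it easier to hit the CopyDataCommand window for the VDSM restart. A minimal sketch using the ovirtsdk4 Python SDK; the engine URL, credentials and target domain name ('block2') are placeholders, and the disk name is the one from the log below:

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Hypothetical engine URL and credentials.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    insecure=True,
)

disks_service = connection.system_service().disks_service()

# Look up the preallocated disk of the powered-off VM by name (name taken from the log below).
disk = disks_service.list(search='name=vm_from_tp2_Disk1')[0]
disk_service = disks_service.disk_service(disk.id)

# Start the cold move to the target block domain ('block2' is a placeholder).
# The engine then runs CopyDataCommand on the SPM; restarting vdsmd on the SPM
# host at that point reproduces the failure described in step 3.
disk_service.move(storage_domain=types.StorageDomain(name='block2'))

connection.close()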


Additional info:
2017-01-22 21:27:35,287+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.RemoveImageCommand] (DefaultQuartzScheduler9) [271b7880] Command 'org.ovirt.engine.core.bll.storage.disk.image.RemoveImageCommand' failed: EngineException: Cannot allocate IRS server (Failed with error IRS_REPOSITORY_NOT_FOUND and code 5009)
2017-01-22 21:27:35,453+02 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (DefaultQuartzScheduler9) [271b7880] Removed task 'de9129c2-d4b9-4e56-8d58-af7bfd4a2cc7' from DataBase
2017-01-22 21:27:35,531+02 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler9) [271b7880] EVENT_ID: USER_MOVE_IMAGE_GROUP_FAILED_TO_DELETE_SRC_IMAGE(2,025), Correlation ID: 4062dba9, Job ID: 8af37221-a36e-4a4f-85f7-4892c90a3dd6, Call Stack: null, Custom Event ID: -1, Message: Possible failure while deleting vm_from_tp2_Disk1 from the source Storage Domain block1 during the move operation. The Storage Domain may be manually cleaned-up from possible leftovers (User:SYSTEM).
2017-01-22 21:27:35,612+02 INFO  [org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand] (DefaultQuartzScheduler7) [4f65c18d] Running command: HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities affected :  ID: f60aff6c-fc3a-4e48-b001-08f0a1ac1423 Type: VDS
2017-01-22 21:27:35,663+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (DefaultQuartzScheduler9) [271b7880] START, GetImageInfoVDSCommand( GetImageInfoVDSCommandParameters:{runAsync='true', storagePoolId='00000001-0001-0001-0001-000000000311', ignoreFailoverLimit='false', storageDomainId='da75f452-b997-46ed-8e62-b5b45a81a2a5', imageGroupId='5ff37a40-fe96-43ec-a4f3-b743b746c417', imageId='e27d279a-ee7d-4211-88b8-7f1dfec313c0'}), log id: 1324b79e
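
As a quick check for the leftovers mentioned in the audit-log message above, one can list what the engine still reports on the source domain after the failed move. A minimal sketch with the ovirtsdk4 Python SDK, again with placeholder connection details; anything the engine no longer tracks (e.g. orphaned LVs on the block domain) would still have to be inspected on the storage itself:

import ovirtsdk4 as sdk

# Hypothetical engine URL and credentials.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    insecure=True,
)

sds_service = connection.system_service().storage_domains_service()

# 'block1' is the source domain named in the audit-log message above.
source_sd = sds_service.list(search='name=block1')[0]
sd_disks_service = sds_service.storage_domain_service(source_sd.id).disks_service()

# Print every disk the engine still associates with the source domain; a
# leftover from the failed move would keep its original image group id.
for disk in sd_disks_service.list():
    print(disk.id, disk.alias, disk.status)

connection.close()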

Comment 1 Kevin Alon Goldblatt 2017-01-24 15:50:42 UTC
Created attachment 1243957 [details]
server, vdsm, engine.log

Adding logs

Comment 2 Liron Aravot 2017-01-25 17:34:01 UTC
Thanks Kevin,
I removed "New HSM infrastructure" from the header, as this behavior is not related to or caused by the SPDM feature.
The probability of encountering this scenario is very low (usually only when the SPM is non-responsive at the exact moment the engine attempts to remove the source image).
We will be able to solve it once we support having non-template disk images on multiple domains (I suppose this will happen after, or as part of, our planned disk/image DB schema refactor), or when a move operation assigns new IDs to the destination image/volumes.

Tal - can you please look at the target milestone for this? IMO it shouldn't be 4.1 material.

Thanks,
Liron

Comment 3 Liron Aravot 2017-02-02 16:28:49 UTC

*** This bug has been marked as a duplicate of bug 978975 ***

