Bug 949283 - RHEVM - Backend: No roll-forward in engine on failed removal of VM
Summary: RHEVM - Backend: No roll-forward in engine on failed removal of VM
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.2.0
Assignee: Liron Aravot
QA Contact: Elad
URL:
Whiteboard: storage
Depends On: 884635
Blocks:
 
Reported: 2013-04-07 13:01 UTC by Daniel Paikov
Modified: 2016-02-10 20:28 UTC
CC List: 11 users

Fixed In Version: sf16
Doc Type: Bug Fix
Doc Text:
Cause: When attempting to delete a VM, if the engine failed to create the deletion task for the "first" image, no roll-forward was performed: the disk was marked as illegal and the VM was not deleted.
Consequence: The VM was left behind and the user could possibly not remove it; the disk could possibly not be removed if it did not exist on the storage domain.
Fix: When a disk does not exist on the storage domain, it is now removed from the engine when an attempt is made to delete it. When removing a VM, the VM itself is removed as the first step; any of its disks that fail to be removed are left floating in illegal status and can be removed afterwards.
Result: When attempting to remove a VM, the VM is removed; any of its disks that could not be removed remain floating with illegal status. (A minimal illustrative sketch of this behaviour appears after the field list below.)
Clone Of:
Environment:
Last Closed:
oVirt Team: Storage
Target Upstream Version:
Embargoed:
amureini: Triaged+
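
The behaviour change described in the Doc Text above can be summarised with a short sketch. This is a hypothetical Python illustration only; the actual fix is the Java ovirt-engine change tracked by oVirt gerrit 13611, and every class and function name below is made up for clarity.

# Hypothetical sketch of the roll-forward behaviour from the Doc Text above.
# None of these names exist in ovirt-engine; they only illustrate the ordering.

class StorageOperationFailed(Exception):
    """Stands in for a failed VDSM deleteImage task (e.g. the LV is held open)."""

class Disk:
    def __init__(self, image_id, exists_on_domain=True, locked=False):
        self.image_id = image_id
        self.exists_on_domain = exists_on_domain
        self.locked = locked
        self.status = "OK"
        self.floating = False

def remove_disk(engine_disks, disk):
    # Fix, part 1: an image that no longer exists on the storage domain is
    # simply dropped from the engine instead of failing the whole operation.
    if not disk.exists_on_domain:
        engine_disks.discard(disk)
        return
    if disk.locked:
        raise StorageOperationFailed("Cannot remove Logical Volume")
    disk.exists_on_domain = False
    engine_disks.discard(disk)

def remove_vm(engine_vms, engine_disks, vm_name, disks):
    # Fix, part 2: the VM record is removed first; disks whose removal fails
    # are left floating in ILLEGAL status and can be removed afterwards.
    engine_vms.discard(vm_name)
    for disk in disks:
        try:
            remove_disk(engine_disks, disk)
        except StorageOperationFailed:
            disk.status = "ILLEGAL"
            disk.floating = True

With this ordering, a held-open logical volume (as in the reproduction below) no longer leaves an undeletable VM behind; only a floating ILLEGAL disk remains, which can be removed later.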


Attachments (Terms of Use)
engine.log (19.37 KB, application/x-gunzip), 2013-04-07 13:01 UTC, Daniel Paikov
vdsm.log (1.21 MB, application/x-gunzip), 2013-04-07 13:02 UTC, Daniel Paikov


Links
oVirt gerrit 13611

Description Daniel Paikov 2013-04-07 13:01:14 UTC
Created attachment 732338 [details]
engine.log

* iSCSI/FCP DC with template and VM based on template.
* Open the child VM's device with python to cause VM removal to fail:
[root@orange-vdse ~]# python 
Python 2.6.6 (r266:84292, Oct 12 2012, 14:23:48) 
[GCC 4.4.6 20120305 (Red Hat 4.4.6-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> v = open('/dev/<vg name>/<lv name>', "r")
* Try to remove the VM.
* Removal fails with:
Thread-2140::ERROR::2013-04-07 15:36:04,046::dispatcher::67::Storage.Dispatcher.Protect::(run) {'status': {'message': 'Cannot remove Logical Volume: (\'d38426a3-fe2d-42ab-800e-0ce7c5ffb95a\', "{\'58d60d63-393e-4b5b-b688-a11e5ccf59b7\': ImgsPar(imgs=[\'750bfe61-34f3-454d-92f8-80fce9419e13\'], parent=\'ae0f4f90-b15c-4f78-8d3a-7f9b048d2ff9\')}")', 'code': 551}}
* There is no roll-forward in the engine; the VM continues to exist.
* Unlock the device and try to remove the VM again.
* Removal fails with:
Thread-2187::ERROR::2013-04-07 15:37:23,671::hsm::1450::Storage.HSM::(deleteImage) Empty or not found image 750bfe61-34f3-454d-92f8-80fce9419e13 in SD d38426a3-fe2d-42ab-800e-0ce7c5ffb95a. {'ae0f4f90-b15c-4f78-8d3a-7f9b048d2ff9': ImgsPar(imgs=['659a325e-bef7-4cad-a859-0549ea04dcad'], parent='00000000-0000-0000-0000-000000000000'), '32d2282e-50ec-4a6e-964a-be518b0abec6': ImgsPar(imgs=['13eb9b31-6a97-48c6-8a1c-d18182bee867'], parent='00000000-0000-0000-0000-000000000000')}
* Since this image now exists in the engine but not in VDSM, it is impossible to remove the VM, the template it is based on, or its storage domain.
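
For illustration, here is a minimal sketch of the roll-forward handling that the second failure above calls for: a "does not exist" answer from deleteImage should let the engine drop its stale image record instead of failing again. The names and the error code below are placeholders for this report, not the real VDSM/ovirt-engine API.

IMAGE_DOES_NOT_EXIST = 268  # placeholder value; the real code is defined by VDSM

def handle_delete_image_response(response, engine_images, image_id):
    # Treat "image does not exist on the domain" the same as a successful
    # deletion: drop the engine record (roll-forward) instead of keeping a
    # stale image that blocks removal of the VM, its template and its domain.
    code = response["status"]["code"]
    if code == 0 or code == IMAGE_DOES_NOT_EXIST:
        engine_images.discard(image_id)
        return True
    return False  # a genuine failure; keep the record and mark the disk illegal

# The failed second attempt from this description, expressed in these terms:
engine_images = {"750bfe61-34f3-454d-92f8-80fce9419e13"}
response = {"status": {"code": IMAGE_DOES_NOT_EXIST,
                       "message": "Image does not exist in domain"}}
handle_delete_image_response(response, engine_images,
                             "750bfe61-34f3-454d-92f8-80fce9419e13")
assert not engine_images  # stale record gone, so the VM, template and domain can be removed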

Comment 1 Daniel Paikov 2013-04-07 13:02:23 UTC
Created attachment 732339 [details]
vdsm.log

Comment 2 Maor 2013-04-08 08:11:41 UTC
I suspect this is a scenario similar to BZ916554,
which is a duplicate of BZ884635.

The following error can be seen in the log:
IRSGenericException: IRSErrorException: Image does not exist in domain: 'image=750bfe61-34f3-454d-92f8-80fce9419e13, domain=d38426a3-fe2d-42ab-800e-0ce7c5ffb95a'

Once BZ884635 is fixed, this should be solved as well.

Comment 3 Maor 2013-04-08 08:18:18 UTC
Should we close as duplicate?

Comment 4 Allon Mureinik 2013-04-08 11:31:35 UTC
I added bug 884635 as a blocker - IMHO, this should be left open since it describes a different scenario, which may pass/fail QA independently of the original bug.

Comment 5 Allon Mureinik 2013-04-25 09:05:22 UTC
The patch was reverted as it breaks QE automation tests.
This needs to be revisited once the automation tests are fixed.

Comment 6 Eyal Edri 2013-04-28 07:23:39 UTC
Moved back to POST since the fix was reverted on sf14.
Please move to MODIFIED after syncing with QE on the fixed tests.

Comment 7 Eyal Edri 2013-04-28 07:34:29 UTC
Moved back to ON_DEV per development request.

Comment 8 Allon Mureinik 2013-05-08 11:24:50 UTC
Moving to MODIFIED based on the fix described in the external tracker, which was pushed for another bug.

Liron - please document the behavior change.

Comment 9 Elad 2013-05-23 16:44:15 UTC
Removal of a running VM succeeded. The disk remains in the system and is manually removable.

Verified on RHEVM-3.2-SF17.1:
vdsm-4.10.2-21.0.el6ev.x86_64
rhevm-3.2.0-11.28.el6ev.noarch

Comment 10 Itamar Heim 2013-06-11 08:48:34 UTC
3.2 has been released

Comment 11 Itamar Heim 2013-06-11 08:48:37 UTC
3.2 has been released

Comment 12 Itamar Heim 2013-06-11 08:48:37 UTC
3.2 has been released

Comment 13 Itamar Heim 2013-06-11 08:53:35 UTC
3.2 has been released

Comment 14 Itamar Heim 2013-06-11 09:24:15 UTC
3.2 has been released

