Bug 949283

Summary: RHEVM - Backend: No roll-forward in engine on failed removal of VM
Product: Red Hat Enterprise Virtualization Manager
Reporter: Daniel Paikov <dpaikov>
Component: ovirt-engine
Assignee: Liron Aravot <laravot>
Status: CLOSED CURRENTRELEASE
QA Contact: Elad <ebenahar>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.2.0
CC: acathrow, amureini, dyasny, eedri, hateya, iheim, lpeer, Rhev-m-bugs, scohen, yeylon, ykaul
Target Milestone: ---
Flags: amureini: Triaged+
Target Release: 3.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Fixed In Version: sf16
Doc Type: Bug Fix
Doc Text:
Cause: When an attempt to delete a VM failed to create a deletion task for the "first" image, no roll-forward was performed: the disk was marked as illegal and the VM was not deleted.
Consequence: The VM was left in place and the user possibly could not remove it. The disk possibly could not be removed either if it did not exist on the storage domain.
Fix: When a disk does not exist on the storage domain, it is removed from the engine when the user attempts to delete it. When removing a VM, the VM is removed as the first step; any of its disks that fail to be removed are left floating in illegal status and can be removed afterwards.
Result: An attempt to remove a VM removes the VM. Any of its disks that were not removed remain floating with illegal status.
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 884635    
Bug Blocks:    
Attachments:
Description    Flags
engine.log     none
vdsm.log       none

Description Daniel Paikov 2013-04-07 13:01:14 UTC
Created attachment 732338 [details]
engine.log

* iSCSI/FCP DC with template and VM based on template.
* Open the child VM's device with python to cause VM removal to fail:
[root@orange-vdse ~]# python 
Python 2.6.6 (r266:84292, Oct 12 2012, 14:23:48) 
[GCC 4.4.6 20120305 (Red Hat 4.4.6-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> v = open('/dev/<vg name>/<lv name>', "r")
* Try to remove the VM.
* Removal fails for:
Thread-2140::ERROR::2013-04-07 15:36:04,046::dispatcher::67::Storage.Dispatcher.Protect::(run) {'status': {'message': 'Cannot remove Logical Volume: (\'d38426a3-fe2d-42ab-800e-0ce7c5ffb95a\', "{\'58d60d63-393e-4b5b-b688-a11e5ccf59b7\': ImgsPar(imgs=[\'750bfe61-34f3-454d-92f8-80fce9419e13\'], parent=\'ae0f4f90-b15c-4f78-8d3a-7f9b048d2ff9\')}")', 'code': 551}}
* There is no roll-forward in engine, the VM continues to exist.
* Unlock the device and try to remove the VM again.
* Removal fails for:
Thread-2187::ERROR::2013-04-07 15:37:23,671::hsm::1450::Storage.HSM::(deleteImage) Empty or not found image 750bfe61-34f3-454d-92f8-80fce9419e13 in SD d38426a3-fe2d-42ab-800e-0ce7c5ffb95a. {'ae0f4f90-b15c-4f78-8d3a-7f9b048d2ff9': ImgsPar(imgs=['659a325e-bef7-4cad-a859-0549ea04dcad'], parent='00000000-0000-0000-0000-000000000000'), '32d2282e-50ec-4a6e-964a-be518b0abec6': ImgsPar(imgs=['13eb9b31-6a97-48c6-8a1c-d18182bee867'], parent='00000000-0000-0000-0000-000000000000')}
* Since this image now exists in engine and not in VDSM, it is impossible to remove the VM, or the template it's based on, or its domain.
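
The reproduction steps rely on a process keeping the LV's device node open while the removal is attempted. A minimal sketch of doing the same from a script rather than an interactive session follows; the hold_open helper is hypothetical (it is not part of VDSM or the engine), and the device path is the same placeholder used in the steps.

```python
import contextlib


@contextlib.contextmanager
def hold_open(path):
    """Keep a read-only handle on `path` open for the duration of the block.

    Hypothetical helper for reproducing the failure; `path` would be the
    LV device node, e.g. '/dev/<vg name>/<lv name>'.
    """
    handle = open(path, "rb")
    try:
        yield handle
    finally:
        # Closing the handle corresponds to the "unlock the device" step.
        handle.close()


# Intended use (not run here, since the path is a placeholder):
#
#     with hold_open('/dev/<vg name>/<lv name>'):
#         ...  # try to remove the VM while the device is held open
```

Leaving the with block closes the handle, after which the second removal attempt from the steps can be made.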

Comment 1 Daniel Paikov 2013-04-07 13:02:23 UTC
Created attachment 732339 [details]
vdsm.log

Comment 2 Maor 2013-04-08 08:11:41 UTC
I suspect this is a similar scenario to BZ916554,
which is a duplicate of BZ884635.

The log shows the following error:
IRSGenericException: IRSErrorException: Image does not exist in domain: 'image=750bfe61-34f3-454d-92f8-80fce9419e13, domain=d38426a3-fe2d-42ab-800e-0ce7c5ffb95a'

Once BZ884635 is fixed, this should be solved as well.
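
One way to read the behavior described in the Doc Text ("when disk doesn't exist on the storage domain, it will be removed from the engine") is as an idempotent delete: a "does not exist" answer from storage means the goal is already met. The sketch below only illustrates that idea in Python; the engine itself is not written this way, and ImageDoesNotExist, remove_disk, and the set-based model of the engine's disk records are all assumptions made for the example.

```python
class ImageDoesNotExist(Exception):
    """Stand-in for the storage-side 'Image does not exist in domain' error."""


def remove_disk(engine_disks, image_id, delete_on_storage):
    """Remove a disk from the engine's records, tolerating a missing image.

    engine_disks: set of image IDs the engine knows about (hypothetical model).
    delete_on_storage: callable that deletes the image on the storage domain
    and raises ImageDoesNotExist if the image is not there.
    """
    try:
        delete_on_storage(image_id)
    except ImageDoesNotExist:
        # The image is already gone from storage, so the purpose of the
        # delete is met; fall through and drop the engine record too.
        pass
    # Any other storage error propagates and leaves the record in place.
    engine_disks.discard(image_id)
```

With this shape, the second removal attempt from the description would clear the stale record instead of failing on it.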

Comment 3 Maor 2013-04-08 08:18:18 UTC
Should we close as duplicate?

Comment 4 Allon Mureinik 2013-04-08 11:31:35 UTC
I added bug 884635 as a blocker - IMHO, this should be left open since it describes a different scenario, which may pass/fail QA independently of the original bug.

Comment 5 Allon Mureinik 2013-04-25 09:05:22 UTC
Patch was reverted as it breaks QE automation tests.
Need to revisit once automations are fixed.

Comment 6 Eyal Edri 2013-04-28 07:23:39 UTC
Moved back to POST since the patch was reverted on sf14.
Please move to MODIFIED after syncing with QE on the fixed tests.

Comment 7 Eyal Edri 2013-04-28 07:34:29 UTC
Moved back to ON_DEV per development request.

Comment 8 Allon Mureinik 2013-05-08 11:24:50 UTC
Moving to MODIFIED based on the fix described in the external tracker, which was pushed for another bug.

Liron - please document the behavior change.

Comment 9 Elad 2013-05-23 16:44:15 UTC
Removal of a running VM succeeded. The disk remains in the system and can be removed manually.

Verified on RHEVM-3.2-SF17.1:
vdsm-4.10.2-21.0.el6ev.x86_64
rhevm-3.2.0-11.28.el6ev.noarch
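
The verified behavior matches the roll-forward described in the Doc Text: the VM is removed first, and disks whose deletion fails are left floating in illegal status. A rough Python sketch of that ordering, under the assumption of a simple dictionary model, is shown below; remove_vm, the dictionaries, and the "ILLEGAL" string are illustrative names, not the engine's actual code or data model.

```python
def remove_vm(vm, vms, floating_disks, delete_image):
    """Remove `vm` first, then try each disk; failures roll forward.

    vm: dict with 'name' and 'disks' (list of image IDs), hypothetical model.
    vms: dict mapping VM name -> vm.
    floating_disks: dict mapping image ID -> status for disks with no VM.
    delete_image: callable that raises if the image cannot be deleted.
    Returns the list of image IDs that could not be deleted.
    """
    # Step 1: the VM record goes away regardless of what happens to its disks.
    del vms[vm["name"]]

    failed = []
    for image_id in vm["disks"]:
        try:
            delete_image(image_id)
        except Exception:
            # Roll forward instead of restoring the VM: keep the disk as a
            # floating disk in illegal status so it can be removed later.
            floating_disks[image_id] = "ILLEGAL"
            failed.append(image_id)
    return failed
```

The point of the ordering is that a failure on any single disk can no longer leave the VM itself stuck.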

Comment 10 Itamar Heim 2013-06-11 08:48:34 UTC
3.2 has been released