
Bug 911231

Summary:

engine: VM removed with wipe=true is reported as removed although VDSM failed and the VM was not actually removed (NFS storage)

Product:

Red Hat Enterprise Virtualization Manager

Reporter:

Dafna Ron <dron>

Component:

ovirt-engine

Assignee:

Maor <mlipchuk>

Status:

CLOSED WONTFIX

QA Contact:

Dafna Ron <dron>

Severity:

urgent

Docs Contact:

Priority:

unspecified

Version:

3.1.3

CC:

abaron, acathrow, dyasny, iheim, lpeer, Rhev-m-bugs, scohen, yeylon, ykaul

Target Milestone:

---

Keywords:

Regression

Target Release:

3.2.0

Flags:

scohen: Triaged+

Hardware:

x86_64

OS:

Linux

Whiteboard:

storage

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2013-02-27 11:42:55 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

Storage

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
logs	none

Description Dafna Ron 2013-02-14 15:37:53 UTC

Created attachment 697259
logs

Description of problem:

I created a VM in an iSCSI DC with wipe=true.
After exporting the VM, I imported it to an NFS domain.
When I removed the VM, the event log reported it as successfully removed, but it was not actually removed because of an error in VDSM, and it still exists in the data domain.

Version-Release number of selected component (if applicable):

3.1.3 
vdsm-4.10.2-1.4.el6.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Create a VM in an iSCSI DC with wipe=true.
2. Export the VM.
3. Import the VM to an NFS DC.
4. Remove the VM.
5. Try to import the VM again.
  
Actual results:

The VM is not actually removed (the engine log even shows an error), but we report that it was removed.
When we try to import the VM again, we find that it already exists in the setup.

Expected results:

We should not report success and remove the VM from the DB when VDSM reports a delete failure; the engine should first make sure that the image was actually removed.
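A minimal sketch of the expected result above, assuming a toy model in which the engine learns each finished task's outcome as a success flag plus a message. `TaskOutcome`, `finish_remove_vm`, and the `db` dictionary are hypothetical illustrations, not ovirt-engine classes or APIs; the task ID and error text are copied from the log excerpt in this report. The VM record is dropped only when the delete task succeeded.

```python
from dataclasses import dataclass


@dataclass
class TaskOutcome:
    """Hypothetical summary of a finished VDSM task, as polled by the engine."""
    task_id: str
    succeeded: bool
    message: str = ""


def finish_remove_vm(db: dict, vm_name: str, outcome: TaskOutcome) -> str:
    """Return the event text to log; drop the VM from `db` only on success.

    `db` is a hypothetical stand-in for the engine DB: VM name -> image IDs.
    """
    if outcome.succeeded:
        db.pop(vm_name, None)
        return f"VM {vm_name} was successfully removed."
    # Keep the record so the VM and its images stay visible and the
    # removal can be retried once the storage-side error is resolved.
    return f"Failed to remove VM {vm_name}: {outcome.message}"


db = {"WIPE": ["dd474627-6829-4375-bd39-3c7d06389789"]}
failed = TaskOutcome("dac9423a-4cdf-40fa-88f4-8559fbd64da9", False,
                     "Error during source image manipulation")
print(finish_remove_vm(db, "WIPE", failed))
print("WIPE" in db)
```

This models the reporter's expectation only; it deliberately ignores roll-forward semantics.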

Additional info: logs

events reported the vm was removed:

2013-Feb-14, 16:45  VM WIPE was successfully removed.
2013-Feb-14, 16:45  Removal of VM WIPE was initiated by admin@internal.

trying to import again: 

Error while executing action: Cannot copy Template. The Storage Domain already contains the target disk(s).

image no longer exists in db:

[root@daffi-linux ~]# psql --expanded -U postgres engine  -c "SELECT image_guid from images" |grep dd474627-6829-4375-bd39-3c7d06389789
Password for user postgres: 
[root@daffi-linux ~]# 


image exists in the domain: 

[root@orion images]# ls -l
total 4
drwxr-xr-x 2 vdsm kvm 4096 Feb 14 16:42 dd474627-6829-4375-bd39-3c7d06389789
[root@orion images]# pwd
/export/Dafna/data/36c9b553-5da3-483c-8f29-7b60880c1548/images
[root@orion images]# 
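The two checks above (the ID is gone from the engine DB, yet its directory still exists under the domain's `images/` path) can be combined into a small orphan finder. A sketch, assuming the set of IDs the engine DB still references has already been collected (the report used a `psql` query against the `images` table); `find_orphan_images` is a hypothetical helper, not part of VDSM or ovirt-engine.

```python
import os
import tempfile


def find_orphan_images(images_dir: str, known_ids: set[str]) -> list[str]:
    """List image directories on the domain that the engine DB no longer references.

    `known_ids` is assumed to hold the IDs exported from the engine DB.
    """
    on_disk = {
        name for name in os.listdir(images_dir)
        if os.path.isdir(os.path.join(images_dir, name))
    }
    return sorted(on_disk - known_ids)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as domain_images:
        # Recreate the situation from the report: one leftover image directory.
        os.mkdir(os.path.join(domain_images, "dd474627-6829-4375-bd39-3c7d06389789"))
        print(find_orphan_images(domain_images, known_ids=set()))
```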


task is reported as failed in vdsm and logged to the engine log:

2013-02-14 16:45:13,167 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-4) [1468c4bc] Failed in HSMGetAllTasksStatusesVDS method
2013-02-14 16:45:13,167 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-4) [1468c4bc] Error code SourceImageActionError and error message VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Error during source image manipulation
2013-02-14 16:45:13,167 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-4) [1468c4bc] SPMAsyncTask::PollTask: Polling task dac9423a-4cdf-40fa-88f4-8559fbd64da9 (Parent Command RemoveVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) returned status finished, result 'cleanSuccess'.
2013-02-14 16:45:13,385 ERROR [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-4) [1468c4bc] BaseAsyncTask::LogEndTaskFailure: Task dac9423a-4cdf-40fa-88f4-8559fbd64da9 (Parent Command RemoveVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended with failure:
-- Result: cleanSuccess
-- Message: VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Error during source image manipulation,
-- Exception: VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Error during source image manipulation

Comment 3 Ayal Baron 2013-02-24 06:24:29 UTC

Maor, why do we proceed with deletion when vdsm reports that it failed to remove the image?

Comment 5 Ayal Baron 2013-02-24 06:51:38 UTC

Dafna, what's the difference between this bug and bug 911209 ?

Comment 6 Maor 2013-02-24 08:16:46 UTC

(In reply to comment #3)
> Maor, why do we proceed with deletion when vdsm reports that it failed to
> remove the image?
Remove VM uses roll forward.
Once the engine has sent a task to VDSM, it is VDSM's responsibility to remove the disk.
What good would a hybrid VM left in the setup do?
We already have orphaned disks of deleted VMs that remain in the setup only because VDSM is not aware that the engine wants to delete them.

Comment 7 Maor 2013-02-24 08:18:46 UTC

Why is it a regression?

Comment 8 Dafna Ron 2013-02-24 09:06:09 UTC

(In reply to comment #5)
> Dafna, what's the difference between this bug and bug 911209 ?

This one is about the engine rollback.
The logs show that the engine gets an error from VDSM but still removes the VM from the DB.
Bug 911209 is about the actual problem (a VM sent with wipe=true is not removed from the storage because of a VDSM exception), but I think it should be a VDSM bug rather than an engine bug, since through the API we are able to send wipe=true on NFS storage.

Comment 9 Dafna Ron 2013-02-24 09:08:41 UTC

(In reply to comment #7)
> Why is it a regression?

Because it did not happen in earlier versions.

Comment 12 Ayal Baron 2013-02-27 11:42:55 UTC

From a behaviour point of view, delete is defined as roll forward.
VDSM should collect the garbage, but this is not new.
The only thing that may have changed is VDSM's behaviour: it used to try again (once) after deleting.
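For contrast with the reporter's expectation, a minimal sketch of the roll-forward position in Comments 6 and 12, assuming a toy in-memory model: the VM record is removed as soon as the delete is issued, and any image the storage fails to delete is queued for later garbage collection. `RollForwardStore`, `gc_queue`, and `collect_garbage` are hypothetical names; this is not VDSM's actual garbage collector, and it does not model the single retry mentioned above.

```python
from typing import Callable


class RollForwardStore:
    """Hypothetical model: DB record goes away at once, leftovers are queued for GC."""

    def __init__(self) -> None:
        self.db: dict[str, list[str]] = {}  # VM name -> image IDs
        self.gc_queue: list[str] = []       # image IDs still present on storage

    def remove_vm(self, vm_name: str, delete_image: Callable[[str], bool]) -> None:
        for image_id in self.db.pop(vm_name, []):
            if not delete_image(image_id):
                # Roll forward: never restore the VM, just remember the leftover.
                self.gc_queue.append(image_id)

    def collect_garbage(self, delete_image: Callable[[str], bool]) -> None:
        # Retry every leftover; keep only the ones that still fail to delete.
        self.gc_queue = [i for i in self.gc_queue if not delete_image(i)]


store = RollForwardStore()
store.db["WIPE"] = ["dd474627-6829-4375-bd39-3c7d06389789"]
store.remove_vm("WIPE", delete_image=lambda image_id: False)  # storage error
print("WIPE" in store.db, store.gc_queue)
store.collect_garbage(delete_image=lambda image_id: True)     # later retry succeeds
print(store.gc_queue)
```

The trade-off discussed in this bug is visible here: the engine forgets the VM immediately, so until garbage collection actually removes the leftover image, a re-import collides with it, as in the "Storage Domain already contains the target disk(s)" error above.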