Bug 911231
| Summary: | engine: engine reports a vm removed with wipe=true as removed when in actuality there was an error in vdsm and the vm was not removed (NFS storage) | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Dafna Ron <dron> | ||||
| Component: | ovirt-engine | Assignee: | Maor <mlipchuk> | ||||
| Status: | CLOSED WONTFIX | QA Contact: | Dafna Ron <dron> | ||||
| Severity: | urgent | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 3.1.3 | CC: | abaron, acathrow, dyasny, iheim, lpeer, Rhev-m-bugs, scohen, yeylon, ykaul | ||||
| Target Milestone: | --- | Keywords: | Regression | ||||
| Target Release: | 3.2.0 | Flags: | scohen: Triaged+ | ||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | storage | ||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2013-02-27 11:42:55 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: | ||||||
Maor, why do we proceed with deletion when vdsm reports that it failed to remove the image?

Dafna, what's the difference between this bug and bug 911209?

(In reply to comment #3)
> Maor, why do we proceed with deletion when vdsm reports that it failed to
> remove the image?

Remove VM uses roll forward. If the engine has already sent a task to VDSM, then it is VDSM's responsibility to remove the disk. What good would a hybrid VM in the setup do? Right now we already have orphaned disks of deleted VMs, and the only reason they are left in the setup is that VDSM is not aware the engine wants to delete them.

Why is it a regression?

(In reply to comment #5)
> Dafna, what's the difference between this bug and bug 911209?

This bug is about the engine rollback: the logs show that the engine gets an error from vdsm but still removes the vm from the db. Bug 911209 is about the actual problem (a vm sent with wipe=true is not removed from the storage because of a vdsm exception), but I think it should be a vdsm bug and not an engine bug, since through the API we are able to send wipe=true on NFS storage.

(In reply to comment #7)
> Why is it a regression?

Because it did not happen in earlier versions.

Delete is defined as roll forward from a behaviour point of view. vdsm should collect garbage, but this is not new. The only thing that may have changed is the vdsm behaviour, which used to try again (once) after deleting.
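The disagreement in the comments above is about what the engine should do when the storage task for a removal ends with a failure. The following is a minimal sketch of the two policies, not the engine's actual code: `FakeDb`, `end_remove_vm`, the `policy` parameter, and the failure message are all hypothetical names introduced for illustration. Only the success message is taken from the event log in this report.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDb:
    """Hypothetical stand-in for the engine database (VM names only)."""
    vms: set = field(default_factory=set)

    def remove_vm(self, name: str) -> None:
        self.vms.discard(name)


def end_remove_vm(db: FakeDb, vm_name: str, task_succeeded: bool,
                  policy: str = "roll_forward") -> str:
    """Finish a remove-VM command and return the event message.

    roll_forward: the VM is dropped from the db even if the storage task
                  failed; leftover images are treated as garbage for the
                  storage side to collect.
    rollback:     the VM stays in the db when the storage task failed.
    """
    if task_succeeded or policy == "roll_forward":
        db.remove_vm(vm_name)
        return f"VM {vm_name} was successfully removed."
    # Hypothetical failure message, not taken from the engine.
    return f"Failed to remove VM {vm_name}."


if __name__ == "__main__":
    db = FakeDb(vms={"WIPE"})
    print(end_remove_vm(db, "WIPE", task_succeeded=False))
    print(db.vms)
```

Under the roll-forward policy the sketch reports success and drops the VM even though the task failed, which matches the behaviour described in this report. Under the rollback policy the VM would stay in the db until the image is confirmed removed.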
Created attachment 697259 [details]
logs

Description of problem:
I have a vm that I created in an iscsi DC with wipe=true. After exporting the vm, I imported it to an NFS domain. When I removed the vm, the event log reported the vm as successfully removed, but it was not removed due to an error in vdsm and still exists in the data domain.

Version-Release number of selected component (if applicable):
3.1.3
vdsm-4.10.2-1.4.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. create vm in iscsi DC with wipe=true
2. export the vm
3. import the vm to NFS DC
4. remove the vm
5. try to import the vm again

Actual results:
The vm is not actually removed (the engine log even gets an error), but we report that the vm was removed. When we try to import the vm again, we find that it already exists in the setup.

Expected results:
When vdsm reports a failure in delete, we should not report success and remove the vm from the db before actually making sure that the image was removed.

Additional info: logs

Events reported the vm was removed:
2013-Feb-14, 16:45 VM WIPE was successfully removed.
2013-Feb-14, 16:45 Removal of VM WIPE was initiated by admin@internal.

Trying to import again:
Error while executing action: Cannot copy Template. The Storage Domain already contains the target disk(s).

Image no longer exists in db:
[root@daffi-linux ~]# psql --expanded -U postgres engine -c "SELECT image_guid from images" |grep dd474627-6829-4375-bd39-3c7d06389789
Password for user postgres:
[root@daffi-linux ~]#

Image exists in the domain:
[root@orion images]# ls -l
total 4
drwxr-xr-x 2 vdsm kvm 4096 Feb 14 16:42 dd474627-6829-4375-bd39-3c7d06389789
[root@orion images]# pwd
/export/Dafna/data/36c9b553-5da3-483c-8f29-7b60880c1548/images
[root@orion images]#

Task is reported with failure in vdsm and logged to engine log:
2013-02-14 16:45:13,167 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-4) [1468c4bc] Failed in HSMGetAllTasksStatusesVDS method
2013-02-14 16:45:13,167 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-4) [1468c4bc] Error code SourceImageActionError and error message VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Error during source image manipulation
2013-02-14 16:45:13,167 INFO [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-4) [1468c4bc] SPMAsyncTask::PollTask: Polling task dac9423a-4cdf-40fa-88f4-8559fbd64da9 (Parent Command RemoveVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) returned status finished, result 'cleanSuccess'.
2013-02-14 16:45:13,385 ERROR [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-4) [1468c4bc] BaseAsyncTask::LogEndTaskFailure: Task dac9423a-4cdf-40fa-88f4-8559fbd64da9 (Parent Command RemoveVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended with failure:
-- Result: cleanSuccess
-- Message: VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Error during source image manipulation,
-- Exception: VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Error during source image manipulation
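
The manual check above (grep the db for the ID, then list the domain's images directory) can be automated roughly as follows. This is only a sketch under stated assumptions: `find_orphaned_image_dirs` is a hypothetical helper, and the caller is assumed to have already fetched the set of IDs the db knows about (for example with the psql query shown in this report). Which db column corresponds to the directory name is not confirmed here; the report greps `image_guid`.

```python
import os
import tempfile


def find_orphaned_image_dirs(images_dir, known_ids):
    """Return the names of sub-directories of images_dir that are not in known_ids.

    images_dir: path to a storage domain's images directory.
    known_ids:  iterable of IDs still present in the engine db
                (assumed to be collected by the caller).
    """
    on_storage = {
        entry
        for entry in os.listdir(images_dir)
        if os.path.isdir(os.path.join(images_dir, entry))
    }
    return sorted(on_storage - set(known_ids))


if __name__ == "__main__":
    # Demo on a temporary directory instead of a real storage domain.
    with tempfile.TemporaryDirectory() as images_dir:
        os.mkdir(os.path.join(images_dir, "dd474627-6829-4375-bd39-3c7d06389789"))
        # The db no longer lists the image, so it is reported as orphaned.
        print(find_orphaned_image_dirs(images_dir, known_ids=[]))
```

Any directory the function returns is a candidate orphan: present in the domain but unknown to the engine, which is the state this bug leaves behind.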