Bug 1199314

Summary: [BLOCKED] RHEV export recovery doesn't handle multiple disks
Product: Red Hat Enterprise Virtualization Manager
Reporter: Gordon Watson <gwatson>
Component: vdsm
Assignee: Liron Aravot <laravot>
Status: CLOSED WONTFIX
QA Contact: Aharon Canan <acanan>
Severity: medium
Priority: medium
Version: 3.5.0
CC: aefrat, amureini, bazulay, ebenahar, ecohen, gwatson, iheim, laravot, lpeer, lsurette, pzhukov, ssekidde, tnisan, yeylon, ylavi
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Linux
Whiteboard: storage
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2015-10-22 09:00:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: Storage
Cloudforms Team: ---
Bug Depends On: 1185830

Description Gordon Watson 2015-03-05 23:21:07 UTC
Description of problem:

If a VM with more than one disk is exported and the copy of one disk fails, any disk copy that has already completed to the export domain is not removed, i.e. there is no recovery/rollback mechanism in place for this.

The result is that one of the disks now has its images on the export domain, but the RHEV-M GUI does not reflect this.

If the problem that caused the failure is resolved and the export is repeated, the export now fails with "volume already exists" reported against the disk that was successfully copied previously.

Finally, using 'Force Override' allows the export to succeed, but the user is not prompted to use this and is unaware of what actually transpired.
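
For reference, a minimal sketch of how the leftover images can be spotted on an NFS export domain. This is an inspection aid only, not vdsm code; the mount path and storage domain UUID below are placeholders, not values taken from this report:

    # Inspection aid only (not vdsm code): list the image directories present
    # on the export domain after the failed export. The path is a placeholder;
    # substitute the actual NFS mount point and storage domain UUID.
    import os

    EXPORT_DOMAIN = "/rhev/data-center/mnt/<server:_export_path>/<sd_uuid>"  # placeholder

    images_dir = os.path.join(EXPORT_DOMAIN, "images")
    for image_uuid in sorted(os.listdir(images_dir)):
        entries = os.listdir(os.path.join(images_dir, image_uuid))
        print("image %s: %d file(s)" % (image_uuid, len(entries)))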



Version-Release number of selected component (if applicable):

RHEV 3.5
RHEV-H 6.6 (20150114.0) w/vdsm-4.16.8.1-5


How reproducible:

Every time.



Steps to Reproduce:

1. Create a VM with 2 disks.
2. Start an export.
3. Depending upon the type of disk image (raw or qcow2) and upon whether 'Collapse Snapshots' was selected, kill one of the 'dd' or 'qemu-img convert' processes (see the sketch after this list).
4. The export will fail.
5. Check the export domain's 'VM Import' tab in the GUI. The VM will not be displayed.
6. Check the export domain. One disk will physically exist there.
7. Retry the export.
8. It should fail with "volume already exists".
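
A possible way to perform step 3 on the host running the copy, as a hedged sketch. The process names are the usual copy commands ('dd' for raw images, 'qemu-img convert' for qcow2 or collapsed exports); adjust the pattern to whatever is actually running:

    # Reproduction aid (run on the host performing the copy, e.g. the SPM):
    # find the copy process by command-line pattern and kill it mid-copy.
    import os
    import signal
    import subprocess

    def kill_copy_process(pattern):
        """Kill every process whose command line matches 'pattern'."""
        try:
            pids = subprocess.check_output(["pgrep", "-f", pattern]).split()
        except subprocess.CalledProcessError:
            print("no process matching %r" % pattern)
            return
        for pid in pids:
            os.kill(int(pid), signal.SIGKILL)
            print("killed pid %d (%s)" % (int(pid), pattern))

    # Use a pattern specific enough to match only the copy process,
    # e.g. "qemu-img convert"; a bare "dd" may match unrelated processes.
    kill_copy_process("qemu-img convert")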


Actual results:

When the copy of one disk fails during an export, the disks that were successfully copied are left on the export domain.


Expected results:

When the copy of one disk fails during an export, either all of the successfully copied disks should be removed from the export domain as part of a rollback sequence, or the user should be notified that they exist there and told to use 'Force Override' when retrying.
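
To illustrate the first alternative, here is a minimal sketch of the rollback idea. The copy/delete operations are passed in as callables because the real vdsm/engine verbs are not shown in this report:

    # Sketch only: copy each disk, remember which copies succeeded, and undo
    # them if a later copy fails, so a retry does not hit "volume already exists".
    class CopyFailedError(Exception):
        pass

    def export_disks(disks, copy_disk, delete_disk):
        copied = []
        try:
            for disk in disks:
                copy_disk(disk)          # copy one disk to the export domain
                copied.append(disk)
        except CopyFailedError:
            for disk in copied:          # roll back the copies that did succeed
                delete_disk(disk)
            raise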



Additional info:

Comment 2 Allon Mureinik 2015-03-08 10:52:40 UTC
Tal/Liron, didn't we already have such a BZ in our sights?

Comment 3 Liron Aravot 2015-03-08 21:05:02 UTC
This wasn't resolved for export of a VM (I just looked over the code); therefore this BZ is still relevant.

Comment 5 Allon Mureinik 2015-03-22 14:34:27 UTC
(In reply to Liron Aravot from comment #3)
> This wasn't resolved for export of a VM (I just looked over the code);
> therefore this BZ is still relevant.

The real problem here is that we don't have a mechanism to track the delete tasks we would like to start once we detect a failure.
Rewriting this command on the non-SPM infrastructure should provide a plausible solution for this. Blocking on bug 1185830.
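
One way to picture the missing mechanism, as a sketch under assumptions rather than the planned implementation: record a delete task for each completed copy, and start them all once a failure is detected.

    # Hypothetical tracker, not oVirt code: collects the delete tasks that would
    # undo completed copies and runs them once a failure is detected.
    class CleanupTracker(object):
        def __init__(self):
            self._delete_tasks = []

        def register(self, delete_task):
            # Called after each disk copy completes successfully.
            self._delete_tasks.append(delete_task)

        def on_failure(self):
            # Called once the export is determined to have failed.
            for task in self._delete_tasks:
                task()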