Description of problem:
When a VM with more than one disk is exported and one of the disk copies fails, any disk copy that completed remains on the export domain and is not removed, i.e. there is no recovery/rollback mechanism in place for this.
The result will be that one of the disks will now have its images on the export domain. However, the RHEV-M GUI will not reflect this.
If the problem that caused the failure is resolved and the export is repeated, the export will now fail with "volume already exists" reported against the disk that was successfully copied previously.
Finally, using 'Force Override' allows the retry to succeed, but the user is not prompted to use this and is unaware of what actually transpired.
Version-Release number of selected component (if applicable):
RHEV 3.5
RHEV-H 6.6 (20150114.0) w/vdsm-4.16.8.1-5
How reproducible:
Every time.
Steps to Reproduce:
1. Create a VM with 2 disks.
2. Start an export.
3. Depending upon the type of disk image (raw or qcow2) and upon whether 'Collapse Snapshots' was selected or not, kill one of the 'dd' or 'qemu-img convert' processes on the SPM host (see the sketch after this list).
4. The export will fail.
5. Check the export domain's 'VM Import' tab in the GUI. The VM will not be displayed.
6. Check the export domain. One disk will physically exist there.
7. Retry the export.
8. It should fail with "volume already exists".
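
A minimal sketch of steps 3 and 6, assuming the script is run as root on the SPM host and the export domain is an NFS domain mounted under /rhev/data-center/mnt/. The mount path and script itself are illustrative only and are not taken from the original report.

#!/usr/bin/env python
# Sketch for steps 3 and 6 of the reproducer (illustrative, hypothetical paths).
import os
import signal
import subprocess

def find_copy_pids():
    # Return PIDs of the 'qemu-img convert' or 'dd' processes doing the copy.
    out = subprocess.check_output(["ps", "-eo", "pid,args"]).decode()
    pids = []
    for line in out.splitlines()[1:]:
        pid, _, args = line.strip().partition(" ")
        if "qemu-img convert" in args or args.startswith("dd "):
            pids.append(int(pid))
    return pids

if __name__ == "__main__":
    # Step 3: kill one of the copy processes to make the export fail.
    pids = find_copy_pids()
    if pids:
        print("killing copy process %d" % pids[0])
        os.kill(pids[0], signal.SIGKILL)

    # Step 6: the disk that finished copying is still present under the
    # export domain's images/ directory even though the export failed.
    export_mnt = "/rhev/data-center/mnt/server:_export"   # hypothetical path
    for dom in os.listdir(export_mnt):
        images = os.path.join(export_mnt, dom, "images")
        if os.path.isdir(images):
            for img in os.listdir(images):
                print("leftover image group: " + os.path.join(images, img))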
Actual results:
When one disk of an export fails to be copied, other disks are left on the export domain.
Expected results:
When one disk of an export fails to be copied, either all of the successfully copied disks should be removed from the export domain as part of a rollback sequence, or the user should be notified that they exist there and instructed to use 'Force Override' when retrying.
Additional info:
(In reply to Liron Aravot from comment #3)
> this wasn't resolved for export of vm (just looked over the code). therefore
> this BZ is relevant.
The real problem here is that we don't have a mechanism to track the delete tasks we would like to start once we determine a failure.
Rewriting this command in a non-SPM infrastructure should provide a plausible solution for this. Blocking on bug 1185830.
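
The following is an illustrative sketch of the mechanism comment #3 asks for: remember a delete task for every disk that has already been copied to the export domain, and run those tasks once a failure is detected. All names here (ExportSession, copy_disk_to_export, delete_exported_disk) are hypothetical and do not correspond to real engine or vdsm APIs.

class ExportSession(object):
    def __init__(self):
        self._cleanup_tasks = []   # delete tasks registered as disks complete

    def export_vm(self, disks, copy_disk_to_export, delete_exported_disk):
        try:
            for disk in disks:
                copy_disk_to_export(disk)
                # Register the compensating delete right after a successful copy.
                self._cleanup_tasks.append(lambda d=disk: delete_exported_disk(d))
        except Exception:
            # One copy failed: roll back every disk that already landed on the
            # export domain so a retry does not hit "volume already exists".
            for task in reversed(self._cleanup_tasks):
                try:
                    task()
                except Exception:
                    pass   # best effort; a real implementation would log this
            raise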