Description of problem:

An Export to OVA operation was performed. A new volume was created per disk and the snapshot operation was executed, albeit a day later, but the operation then failed on the engine and was rolled back. The rollback/reversion sequence removed the new volumes. However, these volumes were at that point in use as the active volumes of the 'qemu-kvm' process. So when the VM was later restarted, it came up with the parent volumes as its active volumes, and all of the data that had been written to the removed volumes was lost.

Version-Release number of selected component (if applicable):

RHV 4.3.4
RHVH 4.3-0.8
libvirt-4.5.0-10.el7_6.10.x86_64
qemu-kvm-rhev-2.12.0-18.el7_6.5.x86_64
vdsm-4.30.17-1.el7ev.x86_64

How reproducible:

Not reproduced so far.

Steps to Reproduce:
1.
2.
3.

Actual results:

The active volumes were removed while still in use by the VM, instead of a Live Merge being issued, resulting in loss of data when the VM was later restarted.

Expected results:

A Live Merge should have been performed. Even no rollback of the operation at all would have been better than what happened.

Additional info:
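For illustration, a minimal sketch (not from the original report; the PID is a placeholder and the path filter is an assumption) of how the mismatch could have been observed on the host: the qemu-kvm process still held the removed volumes open, which is visible as open file descriptors under /proc, with paths of volumes removed while in use suffixed ' (deleted)':

import os

def open_images(pid: int) -> list[str]:
    """List storage-domain image paths the qemu-kvm process holds open (sketch only)."""
    fd_dir = f"/proc/{pid}/fd"
    paths = []
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue
        # RHV storage-domain volumes live under /rhev/data-center/;
        # descriptors of volumes removed while still in use show a
        # ' (deleted)' suffix on the resolved path.
        if "/rhev/data-center/" in target:
            paths.append(target)
    return sorted(set(paths))

# The PID of the VM's qemu-kvm process is a placeholder here; on a real
# host it would come from 'ps' or the vdsm logs for the specific VM.
if os.path.isdir("/proc/4242"):
    print("\n".join(open_images(4242)))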
ExportVmToOva creates a snapshot. It seems the new volumes were created successfully and SnapshotVDSCommand was then called:

2020-09-22 08:20:00,088+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-84) [2c2b4425-3c9f-44e7-9bab-d5c864f319b2] START, SnapshotVDSCommand
2020-09-22 08:20:02,354+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-84) [2c2b4425-3c9f-44e7-9bab-d5c864f319b2] FINISH, SnapshotVDSCommand

It succeeded - an indication of that is the call to DumpXmls, which returned a modified domain XML, meaning the VM had started using the new volumes. But the create-snapshot task was later detected as failed (probably expired - I see other commands that expired around that time), so endVmCommand of CreateSnapshotForVm was presumably called and decided to end the actions on the disks with failure - and what that does is remove the created volumes.

So in that very unlikely situation where CreateSnapshotForVm fails after calling SnapshotVDSCommand (note that it is invoked differently in 4.4), we should not remove the created volumes on failure (we can probably determine that from the phase the command reached).

Also note that the likelihood of this happening was probably reduced significantly in 4.4, where our virt ansible tasks no longer block CoCo from monitoring other commands. And yes, this may also happen in clone VM, I suppose, since it also calls create-snapshot.
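To make the suggested guard concrete, here is an illustrative sketch in Python (the actual engine code is Java, and all names below - the phases, the helper, the volume IDs - are hypothetical): on failure, the newly created volumes are removed only if the snapshot verb was never dispatched to the host:

from enum import Enum, auto

class Phase(Enum):
    CREATING_VOLUMES = auto()      # new volumes being allocated
    SNAPSHOT_VERB_SENT = auto()    # SnapshotVDSCommand dispatched to the host
    SNAPSHOT_CONFIRMED = auto()    # host switched the VM to the new volumes

def end_with_failure(reached_phase: Phase, new_volumes: list[str]) -> None:
    if reached_phase is Phase.CREATING_VOLUMES:
        # Safe to roll back: the VM never saw these volumes.
        for vol in new_volumes:
            print(f"removing unused volume {vol}")
    else:
        # The host may already be writing to the new volumes as the
        # active layer; deleting them here is exactly the data-loss
        # scenario in this bug. Keep them and let a later live merge
        # (or manual cleanup) reconcile the chain.
        print("snapshot verb already sent; keeping volumes:", new_volumes)

end_with_failure(Phase.SNAPSHOT_VERB_SENT, ["vol-1234", "vol-5678"])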
Hi,

As Arik said, this can happen in every flow in which we create a snapshot, as well as in the snapshot operation standalone. We had many bugs in that area and I think we solved all of them in 4.4.

In this case, the CreateSnapshotCommand failure led to unwanted cleanup of the volume. This was fixed, and the fix was backported to 4.3: this bug report is against RHV 4.3.4, while the fix is in RHV 4.3.6. The engine now checks whether a volume is in use by the VM and skips its deletion if it is. Therefore I'm closing this as a duplicate.

*** This bug has been marked as a duplicate of bug 1746730 ***
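For reference, a rough sketch of the kind of in-use check described above, using the real libvirt-python API (libvirt.open, lookupByName, XMLDesc); the VM name and volume path are placeholders, and the actual fix is implemented engine-side (see bug 1746730):

import xml.etree.ElementTree as ET
import libvirt

def volume_in_use(conn: libvirt.virConnect, vm_name: str, vol_path: str) -> bool:
    """Return True if vol_path is among the running VM's live disk sources (sketch only)."""
    dom = conn.lookupByName(vm_name)
    if not dom.isActive():
        return False  # VM is down; no live chain to collide with
    root = ET.fromstring(dom.XMLDesc(0))
    active = {
        src.get("file") or src.get("dev")
        for src in root.findall("./devices/disk/source")
    }
    return vol_path in active

conn = libvirt.open("qemu:///system")
if volume_in_use(conn, "example-vm", "/rhev/data-center/.../vol-1234"):
    print("volume is the active layer of a running VM - skipping deletion")
else:
    print("safe to delete")
conn.close()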