Description of problem:
Exporting a vDisk as OVA on RHV that times out after the default 30 minutes leaves an orphan logical volume consuming space in the storage domain. TeardownImageVDSCommand and DeleteImageGroupVDSCommand fail because the volume is still `in use`.

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.7.5-0.1.el7ev.noarch
vdsm-4.20.43-1.el7ev.x86_64

How reproducible:
Whenever ansible times out exporting/importing an OVA image

Steps to Reproduce:
1. Export a VM with a big disk as an OVA image; any size that exceeds the default 30-minute ansible timeout will do the trick.
2. The task fails, and attempting to tear down the image and remove it also fails because the volume is still in use.
3. The volume remains in the storage domain, orphaned, with the remove_me tag.

Actual results:
Volume remains in the storage domain without being used

Expected results:
Volume should be removed

Additional info:
Will attach engine.log, vdsm.log and ansible.log
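The orphaned volume from step 3 can be identified by its remove_me tag. A minimal sketch (the helper name and the sample JSON are illustrative, not taken from the attached logs) of filtering `lvs -o vg_name,lv_name,lv_tags --reportformat json` output for such LVs:

```python
import json

def find_remove_me_lvs(lvs_json: str):
    """Return (vg, lv) pairs whose LVM tags include a remove_me* tag."""
    report = json.loads(lvs_json)
    orphans = []
    for entry in report["report"]:
        for lv in entry.get("lv", []):
            tags = lv.get("lv_tags", "").split(",")
            if any(t.startswith("remove_me") for t in tags):
                orphans.append((lv["vg_name"], lv["lv_name"]))
    return orphans

# Sample output shaped like `lvs --reportformat json` (illustrative values)
sample = json.dumps({
    "report": [{"lv": [
        {"vg_name": "sd-uuid", "lv_name": "img-uuid", "lv_tags": "remove_me_xxx,IU_img"},
        {"vg_name": "sd-uuid", "lv_name": "other-lv", "lv_tags": "IU_other"},
    ]}]
})

print(find_remove_me_lvs(sample))  # [('sd-uuid', 'img-uuid')]
```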
Arik, can you please have a look?
Javier, do we know if the VM was running? It's not clear from the logs (doesn't start early enough)
So, we can increase the timeout here, but it's not a guarantee that the guest disks are unlocked. Tal - since the VM is down, we'll need help from storage to investigate why the LVs were locked. Any ideas?
(In reply to Ryan Barry from comment #8)
> So, we can increase the timeout here, but it's not a guarantee that the
> guest disks are unlocked. Tal - since the VM is down, we'll need help from
> storage to investigate why the LVs were locked. Any ideas?

I suppose that unlocking the LV failed because the qemu-img process was still running and using the disk. Only the SSH connection was closed by the timeout.
I thought this also, but comment #6 suggests it's down (I'm waiting to get the rest of the engine log to see if something else brought it back up).
(In reply to Ryan Barry from comment #10)
> I thought this also, but comment#6 looks like it's down (I'm waiting to get
> the rest of the engine log to see if something else brought it back up)

2019-02-08 is a week before the issue. Why do we need to take this into account? The engine.log in attachment 1539639 [details] starts from 2019-02-09, and the Engine is running at that moment.
That's why you're on the bug ;) Because I didn't find entries in the current engine log to indicate whether the VM being exported was up or down. If it was down a week beforehand, the qemu-img process shouldn't still be running and using the disk, unless I've missed something in the logs.
(In reply to Ryan Barry from comment #12)
> Because I didn't find entries in the current engine log to indicate whether
> the VM which was being exported was up or down. If it was down a week
> beforehand, the qemu-img process shouldn't still be running and using the
> disk unless I've missed something in the logs

The qemu-img process is not related to the VM being up. During the export, the pack_ova.py script executes qemu-img to convert the disks.
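For context, the conversion step looks roughly like the following. This is a hedged sketch of the kind of command pack_ova.py runs, not its actual code; the paths, format flag, and helper name are illustrative. While such a process runs it keeps the source volume open, which is why teardown fails with `in use` when the engine gives up before it exits.

```python
def build_qemu_img_convert(src: str, dst: str, dst_format: str = "qcow2"):
    """Build the argv for a qemu-img convert run like the export performs.

    src: source disk path (e.g. the LV device under /dev/<vg>/<lv>)
    dst: destination file inside the OVA staging area
    """
    return [
        "qemu-img", "convert",
        "-O", dst_format,  # output format for the disk inside the OVA
        src,
        dst,
    ]

argv = build_qemu_img_convert("/dev/vg/lv", "/tmp/disk.qcow2")
print(" ".join(argv))
```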
Shmuel, what are the next steps?
I am currently looking for an easy way to populate the SSH channel with some traffic during the conversion process, and verifying that it does not break the functionality. I'll post the patch soon.
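The idea above can be sketched as emitting periodic output while the long-running conversion executes, so the SSH channel never looks idle. A minimal, self-contained illustration (not the actual patch; the `sleep` command stands in for qemu-img, and the function name is hypothetical):

```python
import subprocess
import sys
import time

def run_with_keepalive(cmd, interval=1.0):
    """Run cmd, writing a heartbeat to stdout while it runs.

    Any traffic on the channel prevents an idle-timeout from closing
    the SSH connection before the child process finishes.
    """
    proc = subprocess.Popen(cmd)
    beats = 0
    while proc.poll() is None:
        time.sleep(interval)
        if proc.poll() is None:
            sys.stdout.write(".")  # heartbeat: keeps the channel busy
            sys.stdout.flush()
            beats += 1
    sys.stdout.write("\n")
    return proc.returncode, beats

# Stand-in for a long qemu-img convert
rc, beats = run_with_keepalive(["sleep", "3"], interval=1.0)
print("exit:", rc, "heartbeats:", beats)
```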
Moving to Virt since you are looking into the solution
Since we have a patch, can you please target this bug? 4.4? 4.3.5?
sync2jira
WARN: Bug status (ON_QA) wasn't changed but the following should be fixed: [Found non-acked flags: '{}', ] For more info please contact: rhv-devops
Verified:
ovirt-engine-4.4.1.1-0.5.el8ev.noarch
vdsm-4.40.17-1.el8ev.x86_64
libvirt-daemon-6.0.0-22.module+el8.2.1+6815+1c792dc8.x86_64
qemu-kvm-4.2.0-22.module+el8.2.1+6758+cb8d64c2.x86_64

Verification scenario:
1. Create a VM with a 200GB preallocated NFS disk. Install the latest RHEL 8 OS on it and verify the OS runs properly.
2. Export the OVA to a host NFS mount (NFS is used to make the export take longer). Verify the OVA exports successfully and takes more than 30 minutes (it took 43:51 minutes to export).
3. Import the exported OVA (import took 2:26 minutes).
4. Run the imported VM. Verify the OS runs properly and the disk size is 200GB with thin provision allocation policy.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHV RHEL Host (ovirt-host) 4.4), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:3246