From time to time, exporting a VM to an OVA fails. Something seems to happen right after reaching the step of invoking the packing script (pack_ova.py) that makes the engine think the operation failed. It doesn't seem to be a problem with the host, though, since an attempt to export a template to OVA on that host succeeds right after.
Filip, can you please also check why OST didn't fail on that and instead tried to import that OVA?
The way OST checks the result of this export doesn't seem very reliable. During the export a temporary VM snapshot is created and then deleted after the export is complete. OST only checks that this snapshot is no longer present, which holds whether the export fails or succeeds. Perhaps we should check for the actual .ova file on the host?
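The ambiguity described above can be sketched roughly like this (a minimal illustration, not OST's actual code; the snapshot description string and the helper names are assumptions):

```python
import time

# Hypothetical description of the temporary snapshot created during an
# OVA export; the exact string the engine uses may differ.
EXPORT_SNAPSHOT_DESC = "Auto-generated for Export To OVA"

def wait_for_export_to_finish(list_snapshot_descriptions, timeout=300, interval=5):
    """Poll until the temporary export snapshot disappears.

    Note the ambiguity discussed in this comment: the snapshot is removed
    both when the export succeeds AND when it fails, so returning True
    only means the export command finished, not that it produced an OVA.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if EXPORT_SNAPSHOT_DESC not in list_snapshot_descriptions():
            return True  # snapshot gone: export ended, either way
        time.sleep(interval)
    return False  # still exporting (or stuck) after the timeout
```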
(In reply to Filip Januška from comment #5)
> The way OST checks the result of this export doesn't seem very reliable.
> During the export a temporary VM snapshot is created and then deleted after
> the export is complete. OST only checks that this snapshot is not present,
> which holds whether the export fails or succeeds. Perhaps we should
> check for the actual .ova file on the host?

Ah yes, that takes me back to the time it was added... The rationale was not to make any "heavy" call but just to check if the snapshot is still there in order to determine whether the export command is still executed. Then, when we know the export command (which is executed asynchronously) is completed, we can check whether it succeeded or not.

I wouldn't change the way we check if the export command is completed, and I wouldn't add another check for the existence of the OVA file (because that's what the import command does at the beginning), but I think we should check whether the export command succeeded by inspecting its execution job. If we're able to say "export command ended with failure", it would be less confusing (initially I suspected that another job removed the OVA before we got to the import phase).
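The job-based check suggested here could look roughly like this, assuming we can fetch recent engine jobs as (description, status) pairs; the status strings mirror the engine's JobStatus values, but the matching on the description is purely illustrative:

```python
def export_outcome(jobs):
    """Decide how an OVA export ended from the engine's execution jobs.

    `jobs` is an iterable of (description, status) pairs, newest first;
    the status strings follow the engine's JobStatus values (STARTED,
    FINISHED, FAILED, ABORTED, UNKNOWN). The substring match on the
    description is a simplification for this sketch.
    """
    for description, status in jobs:
        if "Export" in description and "OVA" in description:
            if status == "FAILED":
                return "export command ended with failure"
            if status == "FINISHED":
                return "export command succeeded"
            return "export command still executing"
    return "no export job found"
```

Reporting "export command ended with failure" explicitly, as suggested above, removes the ambiguity of the snapshot-based check.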
still executed -> still executing
This one is very difficult to reproduce; only regression testing by automation is required.
Verified with: ovirt-engine-4.4.7.6-0.11.el8ev.noarch No regression was found in automation tests regarding exporting a VM to OVA.
This bugzilla is included in oVirt 4.4.7 release, published on July 6th 2021. Since the problem described in this bug report should be resolved in oVirt 4.4.7 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
Happened again with the fix:

converting disk: /rhev/data-center/mnt/192.168.202.2:_exports_nfs_share2/49b69e88-f7a9-4daa-8add-69c07629c465/images/699a4e21-8eb5-4d54-a21b-fa1bb877bca9/7b1674d2-185e-4c2a-aeb7-1ceb4cc1c8d6, offset 19968
losetup: /var/tmp/ova_vm.ova.tmp: failed to set up loop device: Resource temporarily unavailable

And in 'messages':

Jul 7 11:31:06 lago-basic-suite-master-host-0 kernel: loop: module loaded
Jul 7 11:31:06 lago-basic-suite-master-host-0 kernel: loop0: detected capacity change from 0 to 414208
Jul 7 11:31:06 lago-basic-suite-master-host-0 kernel: loop_set_status: loop0 () has still dirty pages (nrpages=1)

Seems that others have faced this issue as well: https://www.spinics.net/lists/kernel/msg3975499.html
Their proposed fix: https://www.spinics.net/lists/kernel/msg3977449.html

If we can't ensure that there are no dirty pages at the time we set up the loopback device, we can take the same approach in pack_ova.py (since create-ova needs to work on clusters that won't get the fix [1]): identify when the error is "Resource temporarily unavailable" (which is EAGAIN) and retry 64 times. We need to think of how to avoid iterating 64 times when losetup would also iterate 64 times internally, though.

[1] https://github.com/karelzak/util-linux/commit/3e03cb680668e4d47286bc7e6ab43e47bb84c989
(In reply to Arik from comment #13)
> If we can't ensure that there are no dirty pages at the time we setup the
> loop back device, we can take the same approach (since create-ova needs to
> work on clusters that won't get the fix [1]) at pack_ova.py by identifying
> if the error is "Resource temporarily unavailable" (which is EAGAIN) and
> retry 64 times. Need to think of how to avoid iterating 64 times when
> losetup would also iterate 64 times internally though.

Of course, it doesn't have to be 64 times; we can check every second, or every few seconds, several times.
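The retry approach could look roughly like this in pack_ova.py (a sketch, not the actual patch: the function and parameter names are made up, the stderr substring is assumed stable across losetup versions, and the `run` callable is injectable only to keep the sketch testable):

```python
import subprocess
import time

def losetup_with_retry(cmd_args, retries=10, delay=1.0, run=subprocess.run):
    """Run losetup, retrying when it fails with EAGAIN.

    While the backing file still has dirty pages, losetup can fail with
    'Resource temporarily unavailable' (EAGAIN); a short sleep between
    attempts gives the kernel time to flush them.
    """
    last_stderr = ""
    for attempt in range(retries):
        result = run(["losetup"] + cmd_args, capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout.strip()  # e.g. the loop device path
        last_stderr = result.stderr
        if "Resource temporarily unavailable" not in last_stderr:
            # A different failure: don't mask it by retrying.
            raise RuntimeError(last_stderr)
        time.sleep(delay)
    raise RuntimeError(
        "losetup still failing with EAGAIN after %d attempts: %s"
        % (retries, last_stderr)
    )
```

A few attempts with a one-second delay, as suggested in the comment above, is likely enough in practice and avoids multiplying losetup's own internal retry loop.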
Verified with: ovirt-engine-4.4.8.3-0.10.el8ev.noarch No regression issues were found.
This bugzilla is included in oVirt 4.4.8 release, published on August 19th 2021. Since the problem described in this bug report should be resolved in oVirt 4.4.8 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.