Bug 1975225
| Summary: | Occasional failures to export VM to OVA | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Arik <ahadas> |
| Component: | BLL.Virt | Assignee: | Liran Rotenberg <lrotenbe> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Qin Yuan <qiyuan> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.4.7 | CC: | bugs, dfodor, fjanuska |
| Target Milestone: | ovirt-4.4.8 | Keywords: | Reopened |
| Target Release: | --- | Flags: | pm-rhel: ovirt-4.4+ |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ovirt-engine-4.4.8 | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-08-19 06:23:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description Arik 2021-06-23 09:49:33 UTC
Filip, can you please also check how come OST didn't fail on that and tried to import that OVA?

---

The way OST checks the result of this export doesn't seem very reliable. During the export a temporary VM snapshot is created and then deleted after the export is complete. OST only checks that this snapshot is no longer present, which it won't be whether the export fails or succeeds. Perhaps we should check for the actual .ova file on the host?

---

(In reply to Filip Januška from comment #5)
> The way OST checks the result of this export doesn't seem very reliable.
> During the export a temporary VM snapshot is created and then deleted after
> the export is complete. OST only checks that this snapshot is no longer
> present, which it won't be whether the export fails or succeeds. Perhaps we
> should check for the actual .ova file on the host?

Ah yes, that takes me back to the time it was added. The rationale was not to make any "heavy" call, but just to check whether the snapshot is still there in order to determine whether the export command is still executing. Then, once we know the export command (which is executed asynchronously) has completed, we can check whether it succeeded or not.

I wouldn't change the way we check whether the export command has completed, and I wouldn't add another check for the existence of the OVA file (because that's what the import command does at the beginning), but I think we should check whether the export command succeeded by its execution job. If we're able to say "export command ended with failure", it would be less confusing (initially I suspected that another job removed the OVA before we got to the import phase).

---

This one is very difficult to reproduce; only regression testing by automation is required.

---

Verified with: ovirt-engine-4.4.7.6-0.11.el8ev.noarch

No regression was found in the automation tests regarding exporting a VM to OVA.

---

This bugzilla is included in oVirt 4.4.7 release, published on July 6th 2021. Since the problem described in this bug report should be resolved in oVirt 4.4.7 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.

---

Happened again with the fix applied:

    converting disk: /rhev/data-center/mnt/192.168.202.2:_exports_nfs_share2/49b69e88-f7a9-4daa-8add-69c07629c465/images/699a4e21-8eb5-4d54-a21b-fa1bb877bca9/7b1674d2-185e-4c2a-aeb7-1ceb4cc1c8d6, offset 19968
    losetup: /var/tmp/ova_vm.ova.tmp: failed to set up loop device: Resource temporarily unavailable

And in 'messages':

    Jul 7 11:31:06 lago-basic-suite-master-host-0 kernel: loop: module loaded
    Jul 7 11:31:06 lago-basic-suite-master-host-0 kernel: loop0: detected capacity change from 0 to 414208
    Jul 7 11:31:06 lago-basic-suite-master-host-0 kernel: loop_set_status: loop0 () has still dirty pages (nrpages=1)

Seems that others have faced this issue as well: https://www.spinics.net/lists/kernel/msg3975499.html

Their proposed fix: https://www.spinics.net/lists/kernel/msg3977449.html

If we can't ensure that there are no dirty pages at the time we set up the loopback device, we can take the same approach in pack_ova.py (since create-ova needs to work on clusters that won't get the fix [1]): identify whether the error is "Resource temporarily unavailable" (which is EAGAIN) and retry 64 times. Need to think of how to avoid iterating 64 times when losetup would also iterate 64 times internally, though.

[1] https://github.com/karelzak/util-linux/commit/3e03cb680668e4d47286bc7e6ab43e47bb84c989

---
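A minimal sketch of the retry approach floated above, assuming the loop device is set up by shelling out to losetup; the function name, flags, and retry/delay values here are illustrative, not the actual ovirt-engine patch:

```python
# Illustrative sketch only, not the actual ovirt-engine fix: retry losetup
# when the kernel still sees dirty pages on the backing file and fails with
# EAGAIN ("Resource temporarily unavailable").
import subprocess
import time


def attach_loop_device(path, offset, retries=10, delay=1):
    """Attach 'path' at 'offset' to a free loop device, retrying on EAGAIN."""
    for _ in range(retries):
        proc = subprocess.run(
            ['losetup', '--find', '--show', '--offset', str(offset), path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        if proc.returncode == 0:
            return proc.stdout.strip()  # e.g. /dev/loop0
        if 'Resource temporarily unavailable' not in proc.stderr:
            # A different, permanent failure: surface it immediately.
            raise RuntimeError(proc.stderr.strip())
        # Dirty pages have not been flushed yet; wait and try again instead of
        # hammering losetup, which already retries 64 times internally.
        time.sleep(delay)
    raise RuntimeError('losetup kept failing with EAGAIN after %d attempts' % retries)
```

Sleeping between attempts reflects the suggestion in the follow-up below: a few checks spaced a second or more apart, rather than another tight 64-iteration loop stacked on losetup's own internal retries.

---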
(In reply to Arik from comment #13)
> If we can't ensure that there are no dirty pages at the time we set up the
> loopback device, we can take the same approach in pack_ova.py (since
> create-ova needs to work on clusters that won't get the fix [1]): identify
> whether the error is "Resource temporarily unavailable" (which is EAGAIN)
> and retry 64 times. Need to think of how to avoid iterating 64 times when
> losetup would also iterate 64 times internally, though.

Of course it doesn't have to be 64 times; we can check it after a second or a few seconds, several times.

---

Verified with: ovirt-engine-4.4.8.3-0.10.el8ev.noarch

No regression issues were found.

---

This bugzilla is included in oVirt 4.4.8 release, published on August 19th 2021. Since the problem described in this bug report should be resolved in oVirt 4.4.8 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
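Going back to the earlier point about checking the export's execution job rather than only the temporary snapshot: a rough sketch of what such a check could look like, assuming ovirt-engine-sdk-python (ovirtsdk4); the connection details and the job-description substring are placeholders, not values taken from this report.

```python
# Rough sketch, not OST's actual implementation: poll the engine's jobs and
# report whether the asynchronous OVA export finished successfully, instead of
# only checking that the temporary snapshot has disappeared.
import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types


def export_job_succeeded(connection, description_substring, timeout=600, poll=5):
    """Return True if all jobs whose description contains the given substring
    ended FINISHED, False if any ended FAILED or ABORTED."""
    jobs_service = connection.system_service().jobs_service()
    deadline = time.time() + timeout
    while time.time() < deadline:
        jobs = [j for j in jobs_service.list()
                if j.description and description_substring in j.description]
        if jobs and all(j.status != types.JobStatus.STARTED for j in jobs):
            return all(j.status == types.JobStatus.FINISHED for j in jobs)
        time.sleep(poll)
    raise TimeoutError('OVA export job did not complete within %s seconds' % timeout)


# Example wiring (all values are placeholders):
# connection = sdk.Connection(url='https://engine.example/ovirt-engine/api',
#                             username='admin@internal', password='***',
#                             ca_file='ca.pem')
# assert export_job_succeeded(connection, 'as a Virtual Appliance')
```

This matches the suggestion in the thread: keep the cheap snapshot check for detecting completion if desired, but decide success or failure from the execution job so a failed export is reported as such instead of surfacing later as a missing OVA during import.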