Bug 1923178
| Summary: | Can not download VM disks due to 'Cannot transfer Virtual Disk: Disk is locked' | |||
|---|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Ilan Zuckerman <izuckerm> | |
| Component: | BLL.Storage | Assignee: | Pavel Bar <pbar> | |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Amit Sharir <asharir> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 4.4.5.3 | CC: | aefrat, ahadas, asharir, bugs, bzlotnik, dfodor, eshames, eshenitz, jean-louis, lsvaty, nsoffer, pbar, sfishbai, tnisan, Yury.Panchenko | |
| Target Milestone: | ovirt-4.4.9 | Keywords: | Automation, Reopened, ZStream | |
| Target Release: | --- | Flags: | pm-rhel: ovirt-4.4+, eshenitz: blocker+, asharir: testing_plan_complete+ | |
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2014017 | Environment: | ||
| Last Closed: | 2021-10-21 07:27:14 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 1976020 | |||
| Bug Blocks: | 1883949, 1984308, 1994001, 2014017 | |||
|
Comment 1
Nir Soffer
2021-02-01 14:43:47 UTC
*** This bug has been marked as a duplicate of bug 1849861 ***

This blocks v2v bug 1976020.

In this comment we can see that the image transfer switched state to FINISHED 9 seconds before the lock was released:
https://bugzilla.redhat.com/show_bug.cgi?id=1976020#c7

*** Bug 2001894 has been marked as a duplicate of this bug. ***

Hi,

I tried to reproduce bug 1994001 on RHEV 4.4.9-3 and encountered an error similar to the one seen in this bug.
During my reproduction scenario, I ran QE's automation test case "TestUploadImages", which uses the upload_disk.py and checksum_disk.py SDK example scripts.
After the upload, the flow does a checksum of the disk.
In my tests, I got the following errors:

2021-10-05 15:37:13,716 - ThreadPoolExecutor-2_1 - VDS - ERROR - [10.46.12.145] Failed to run command ['python3', '/usr/share/doc/python3-ovirt-engine-sdk4/examples/checksum_disk.py', '-c', 'engine', 'bc443017-6542-46b7-8e3c-61c277f68f44'] ERR: Traceback (most recent call last):
  File "/usr/share/doc/python3-ovirt-engine-sdk4/examples/checksum_disk.py", line 50, in <module>
    connection, disk, types.ImageTransferDirection.DOWNLOAD)
  File "/usr/share/doc/python3-ovirt-engine-sdk4/examples/helpers/imagetransfer.py", line 200, in create_transfer
    transfer = transfers_service.add(transfer)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/services.py", line 14153, in add
    return self._internal_add(image_transfer, headers, query, wait)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 232, in _internal_add
    return future.wait() if wait else future
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 55, in wait
    return self._code(response)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 229, in callback
    self._check_fault(response)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 132, in _check_fault
    self._raise_error(response, body)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 118, in _raise_error
    raise error
ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Cannot transfer Virtual Disk: Disk is locked. Please try again later.]". HTTP response code is 409.
OUT:

What is weird is that the test fails on types.ImageTransferDirection.DOWNLOAD although the test doesn't perform a download operation at all.
Attaching relevant logs.

Pavel, can you please advise on this?

(In reply to Amit Sharir from comment #13)

The error that you encountered has the same cause as this bug (1923178).
After an image upload, "imagetransfer.finalize_transfer()" checks for "ImageTransferPhase.FINISHED_SUCCESS/FINISHED_FAILURE" and returns the moment one of the final statuses appears, allowing the next command to start running. But the final status appears too soon, before all the disk locks have been released. Thus the following image upload / download operation might fail.

Regarding your checksum issue and your surprise at seeing the seemingly unrelated "ImageTransferDirection.DOWNLOAD" there: the checksum is actually a two-step operation: (1) download the image and (2) calculate the checksum of the image downloaded in step #1.
https://github.com/oVirt/python-ovirt-engine-sdk4/blob/main/examples/checksum_disk.py#L50
So it is actually an upload+download scenario that fails due to the "locks not yet released although the first command's status is already updated as finished" issue.

As bug 2001894 was marked as a duplicate of this bug, we/QE need to make sure we also cover the flow of the dup bug.
@Amit, please add the bug 2001894 flow to the verification effort.

From the dup bug description:
"
We have a script that does the following flow:
1. Create snapshot
2. Download VM at the time of that snapshot
3. Delete snapshot

sometimes a failure when removing the snapshot
"

Version:
ovirt-engine-4.4.9.2-0.6.el8ev.noarch
vdsm-4.40.90.2-1.el8ev.x86_64

Verification flow:
I used multiple automation/manual tests (with different storage types) that reproduced this issue in the past, and could not reproduce it.

Verification Conclusions:
The expected output matched the actual output.
All the flows I mentioned completed with no errors.

Bug verified.

This bugzilla is included in the oVirt 4.4.9 release, published on October 20th 2021.

Since the problem described in this bug report should be resolved in the oVirt 4.4.9 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.
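For readers on releases older than 4.4.9, the race described in this bug (the transfer reports a final phase before the disk locks are released) can be worked around on the client side by polling the disk status before starting the next operation. The following is only a minimal sketch of that idea, not the fix that went into the engine. The helper name wait_until_unlocked and its parameters are hypothetical, and the status is read through a caller-supplied function so the sketch runs without an engine. With the Python SDK one would presumably pass something like the lambda shown in the comment, but that wiring is an assumption and has not been verified against this bug.

```python
import time
from typing import Callable


def wait_until_unlocked(
    get_status: Callable[[], str],
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Poll get_status() until it returns "ok" and return the seconds waited.

    Hypothetical helper, not part of ovirtsdk4. Raises TimeoutError if the
    disk does not leave the locked state within `timeout` seconds.
    """
    start = clock()
    deadline = start + timeout
    while True:
        status = get_status()
        if status == "ok":
            return clock() - start
        if clock() >= deadline:
            raise TimeoutError(
                f"disk still {status!r} after {timeout} seconds")
        sleep(interval)


# Assumed wiring with the Python SDK (untested here):
#
#   disk_service = connection.system_service().disks_service() \
#       .disk_service(disk_id)
#   wait_until_unlocked(lambda: disk_service.get().status.value)
#
# Simulated run: the disk reports "locked" twice, then "ok".
if __name__ == "__main__":
    fake_statuses = iter(["locked", "locked", "ok"])
    waited = wait_until_unlocked(lambda: next(fake_statuses), interval=0.1)
    print(f"disk unlocked after {waited:.1f} seconds")
```

A script such as the upload-then-checksum flow above would call the helper between finalizing the first transfer and creating the second one. Polling the disk rather than the transfer phase is the design choice here, since the transfer phase is exactly the signal that was shown to flip too early.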