Description of problem:
Disk uploads can fail due to uploading malformed disks. The problem is that these failed uploads leave the image locked instead of cleaning it up. The customer cannot clean up by themselves and requires GSS to manually intervene. It would be nice to gracefully handle these problems.

Version-Release number of selected component (if applicable):
rhvm-4.3.5.1-0.1.el7.noarch (reproduced)
rhvm-4.3.3.7-0.1.el7.noarch (customer)

How reproducible:
100%

Steps to Reproduce:
1. Create a qcow2 with a disk size larger than its virtual size:
   $ qemu-img create -f qcow2 test.qcow2 10G
   $ truncate -s 20G test.qcow2
2. Upload via Admin Portal

Actual results:
Disk locked on failure.

Expected results:
Disk unlocked on failure; user able to clean up by themselves.

Additional info:
1. For the case described in "Steps to Reproduce", createVolume fails with:

2019-07-12 14:08:36,943+1000 ERROR (tasks/1) [storage.Volume] Unexpected error (volume:1267)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1224, in create
    initialSize=initialSize)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 498, in _create
    size, initialSize)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 543, in calculate_volume_alloc_size
    preallocate, capacity, initial_size)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 344, in calculate_volume_alloc_size
    initial_size)

2. Engine sees the failure to create the volume, but does not end the failure gracefully.

2019-07-12 14:08:51,627+10 WARN [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-1) [2b1cdfc] Validation of action 'RemoveDisk' failed for user admin@internal-authz. Reasons: VAR__ACTION__REMOVE,VAR__TYPE__DISK,ACTION_TYPE_FAILED_VM_IMAGE_DOES_NOT_EXIST
2019-07-12 14:08:51,629+10 INFO [org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-1) [2b1cdfc] Lock freed to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='[]'}'
2019-07-12 14:08:51,629+10 INFO [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-1) [2b1cdfc] Lock freed to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='[]'}'
2019-07-12 14:08:51,674+10 ERROR [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-1) [2b1cdfc] Error during log command: org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand. Exception null
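The createVolume failure boils down to the engine requesting an initial allocation equal to the 20G apparent file size, which is larger than what vdsm permits for a 10G-capacity qcow2 volume. A minimal sketch of that size check, assuming a qcow2 metadata overhead factor of 1.1 (the function name and factor here are illustrative, not vdsm's exact code):

```python
GiB = 1024 ** 3

def check_initial_size(capacity, initial_size, overhead=1.1):
    """Reject an initial allocation larger than the volume capacity plus
    qcow2 metadata overhead. Simplified sketch of the validation that
    vdsm's calculate_volume_alloc_size performs."""
    max_size = int(capacity * overhead)
    if initial_size > max_size:
        raise ValueError(
            "The requested initial size %d is bigger than the max size %d"
            % (initial_size, max_size))
    return initial_size

# The reproduce steps yield a 10G-capacity qcow2 truncated to 20G, so
# the engine effectively requests initial_size=20G against capacity=10G:
try:
    check_initial_size(10 * GiB, 20 * GiB)
except ValueError as e:
    print(e)
```

This is why truncating the file past its virtual size is enough to trigger the bug: the upload path sizes the new volume from the file on disk, not from the qcow2 header.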
Created attachment 1621820 [details] upload log
Moving back to 'Assigned' as this bug reproduces; see the last comment by Yosi.
Hi,

After trying to reproduce the bug in several ways, one of them using the steps provided in the bug description, I see that there is no trace of the failure on the engine side; the upload works as expected. I tried to make an upload fail, but in the ways I tried, the engine cleaned the images up.

I remember that I faced the manual cleanup of uploaded images, but at that time I was working on another bug and imageio was in the process of being upgraded. Please verify that this still affects users, and whether there are other reproduction steps.
(In reply to Ahmad Khiet from comment #15)
> Hi
>
> After trying to reproduce the bug in several ways, one way is using the
> steps provided in the bug description, I see that there is no trace for the
> result from the engine side.
> the upload works as expected.
> I tried to make a failed upload but, the ways I tried, the engine cleaned
> them up.
>
> I remember that I faced the manual cleanup of uploaded images but at that
> time I was working on another bug and imageio was in the upgrading process.
> please verify that it still affects the users and if there are other
> reproducing steps.

Ilan, can you see if this reproduces? (Use Yossi's comment 10, as he managed to reproduce it a while back.) This bug was opened by Germano, not QE, so QE does not know if this still affects users/customers. This is a question for the GSS focal, Germano; putting NeedInfo on him.
I was able to reproduce this issue on:
ovirt-engine-4.4.4.4-0.9.el8ev.noarch
rhv-release-4.4.4-5-001.noarch

Steps as described in comment #10, with the same kind of errors in the engine and vdsm logs. Attaching the logs.
After testing on master I can't reproduce the bug; maybe there is some missing info on why it reproduces for Ilan. I tried with different image sizes, but not 20G.

Following Ilan's logs, I see an error in AddDiskCommandCallback caused by the vdsm error raised at:
https://github.com/oVirt/vdsm/blob/master/lib/vdsm/storage/blockVolume.py#L320

with the following error:

log.error("The requested initial %s is bigger "
          "than the max size %s", initial_size, max_size)

which is reflected back to the engine. There is another similar bug, which was solved by Benny: https://bugzilla.redhat.com/show_bug.cgi?id=1758048

Ilan, did you upload to a file storage domain or a block one?
I uploaded the disk to a block storage domain.
Now it reproduced on iscsi. There are two separate issues here: the first, solved by Benny in https://gerrit.ovirt.org/c/ovirt-engine/+/111196, is about the exceptions in the logs. The second, about the image size, is the subject of this bug.
Trying to verify on rhv-4.4.5-7 according to the steps from comment #10.

After a failed attempt to upload the disk using the UI, it remains in the 'disks' section with 'illegal' status. Is that an expected result?

From engine.log:

2021-03-02 14:22:05,174+02 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-3) [] BaseAsyncTask::logEndTaskFailure: Task 'ebc26725-f650-4b43-96df-416384c89fa5' (Parent Command 'AddImageFromScratch', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = value=Error creating a new volume: ("Volume creation 336e6d35-49aa-476d-ac3b-c9bd26194a9e failed: Invalid parameter: 'initial_size=21474836480'",) abortedcode=205, code = 205',
-- Exception: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = value=Error creating a new volume: ("Volume creation 336e6d35-49aa-476d-ac3b-c9bd26194a9e failed: Invalid parameter: 'initial_size=21474836480'",) abortedcode=205, code = 205'
. . .
2021-03-02 14:22:09,861+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [] EVENT_ID: USER_ADD_DISK_FINISHED_FAILURE(2,022), Add-Disk operation failed to complete.
2021-03-02 14:22:09,896+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [f6aaeaff-e4b4-4d3d-bac6-e2c4bffed7b1] Updating image transfer 6a3b8980-c0d5-4a19-b09c-96e3904b397f phase to Finished Failure

Now, attempting to remove the disk from the UI:

2021-03-02 14:28:58,365+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-37485) [8475c7d8-4584-4862-9e90-2322b8473bcd] EVENT_ID: IRS_BROKER_COMMAND_FAILURE(10,803), VDSM command DeleteImageGroupVDS failed: Image does not exist in domain: 'image=df7c7581-bbcb-4eac-b3c4-489bac0387fa, domain=9db95765-0fb7-485e-91f2-381354a66d13'
. . .
2021-03-02 14:28:58,639+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-37485) [8475c7d8-4584-4862-9e90-2322b8473bcd] EVENT_ID: USER_FINISHED_REMOVE_DISK(2,014), Disk test was successfully removed from domain iscsi_0 (User admin@internal-authz).

Result: the disk was removed from the UI.
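The verification logs above show DeleteImageGroupVDS failing with "Image does not exist in domain", yet the removal still succeeds: since the upload failed before the volume was created on storage, the engine treats the missing storage-side image as already deleted and clears its own record anyway. A hedged sketch of that behavior; the names (vdsm_delete, db_remove, ImageDoesNotExistError) are illustrative stand-ins, not the engine's actual API:

```python
class ImageDoesNotExistError(Exception):
    """Stand-in for VDSM's 'Image does not exist in domain' failure."""

def remove_disk(vdsm_delete, db_remove, image_id):
    """Delete the image on storage, tolerating its absence, then always
    remove the engine-side record so the user can clean up a failed
    upload without GSS intervention."""
    try:
        vdsm_delete(image_id)
    except ImageDoesNotExistError:
        # The upload failed before the volume reached storage;
        # there is nothing to delete there.
        pass
    db_remove(image_id)

# Usage sketch with stubs: the storage-side delete fails, but the
# engine-side record is still removed.
removed = []
def fake_vdsm_delete(image_id):
    raise ImageDoesNotExistError(image_id)
remove_disk(fake_vdsm_delete, removed.append, "df7c7581")
```

This mirrors why the final log line still reports USER_FINISHED_REMOVE_DISK despite the preceding IRS_BROKER_COMMAND_FAILURE.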
Hi,

That is the expected behavior; otherwise, the image will still be 'locked' and can't be removed.
(In reply to Ahmad Khiet from comment #28)
> Hi,
>
> that is the expected behavior, otherwise, the image will still 'locked' and
> can't be removed.

Good, switching to 'verified' according to comment #27.
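The distinction verified above matters because the engine blocks user operations on locked images but allows removing illegal ones, which is the whole point of the fix. A toy illustration (the status names mirror the UI; the check itself is hypothetical, not the engine's code):

```python
# Statuses a user may remove a disk in; LOCKED deliberately excluded,
# since a locked disk previously required manual GSS cleanup.
REMOVABLE_STATUSES = {"OK", "ILLEGAL"}

def user_can_remove(disk_status):
    return disk_status.upper() in REMOVABLE_STATUSES
```

With the fix, a failed upload ends in ILLEGAL rather than LOCKED, so the user's own "Remove" action goes through.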
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: RHV Manager (ovirt-engine) 4.4.z [ovirt-4.4.5] security, bug fix, enhancement), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:1169