Description of problem: After a reboot of the RHV-M machine and while ovirt-engine was still initializing and connecting to the hosts, a new image transfer was requested and failed with a java.lang.NullPointerException This caused the image transfer command to be stuck in phase 0 in the image_transfers table and the disk locked. Version-Release number of selected component (if applicable): ovirt-engine-4.5.3.7-1.el8ev.noarch How reproducible: Happened several times in customer environment. Not reproduced locally Steps to Reproduce: I guest the issue will be easier to reproduce with a large number of hosts 1. Reboot RHV-M 2. Right after ovirt-engine starts, and while not all hosts are still connected and Up, create a image transfer Actual results: ~~~ 2023-05-11 03:21:42,550+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-4) [fd0145f9-2b02-4369-a2fe-711a77da1681] Starting image transfer: ImageTransfer:{id='9ef9defd-c9fc-483f-829d-7810f38b8cf6', phase='Initializing', type='Download', active='false', lastUpdated='Thu May 11 03:21:41 CEST 2023', message='null', vdsId='null', diskId='null', imagedTicketId='null', proxyUri='null', bytesSent='null', bytesTotal='2281701376', clientInactivityTimeout='3600', timeoutPolicy='legacy', imageFormat='COW', transferClientType='Transfer via API', shallow='false'} 2023-05-11 03:21:42,561+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (default task-4) [fd0145f9-2b02-4369-a2fe-711a77da1681] START, GetVolumeInfoVDSCommand(HostName = host1.example.com, GetVolumeInfoVDSCommandParameters:{hostId='09b256a2-8a6b-4fe0-b825-9b9248ba1c6d', storagePoolId='268c347f-d249-411f-a806-9ec0f50eb418', storageDomainId='4484db8c-4e14-4f86-852b-b2cffe079c02', imageGroupId='7ce86012-dce2-4ca7-9dfb-e93c33f688be', imageId='d1298146-e125-48d5-be67-f62257a3d001'}), log id: 2284cc75 2023-05-11 03:21:42,601+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (default task-4) [fd0145f9-2b02-4369-a2fe-711a77da1681] FINISH, GetVolumeInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.storage.DiskImage@84fc36da, log id: 2284cc75 2023-05-11 03:21:42,637+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-4) [fd0145f9-2b02-4369-a2fe-711a77da1681] Successfully added Download disk 'foo.example.com_Disk1' (disk id: '7ce86012-dce2-4ca7-9dfb-e93c33f688be', image id: 'd1298146-e125-48d5-be67-f62257a3d001') for image transfer '9ef9defd-c9fc-483f-829d-7810f38b8cf6' 2023-05-11 03:21:42,692+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-4) [fd0145f9-2b02-4369-a2fe-711a77da1681] Command 'org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand' failed: null 2023-05-11 03:21:42,692+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-4) [fd0145f9-2b02-4369-a2fe-711a77da1681] Exception: java.lang.NullPointerException at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand.lambda$checkForActiveVds$2(TransferDiskImageCommand.java:1141) at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:176) at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655) at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) [...] 2023-05-11 03:21:42,758+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-4) [fd0145f9-2b02-4369-a2fe-711a77da1681] EVENT_ID: TRANSFER_IMAGE_INITIATED(1,031), Image Download with disk foo.example.com_Disk1 was initiated by admin@internal-authz. 2023-05-11 03:21:42,773+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-4) [fd0145f9-2b02-4369-a2fe-711a77da1681] Lock freed to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='[7ce86012-dce2-4ca7-9dfb-e93c33f688be=DISK, efe27b6c-115e-4cc9-bfbe-a3732258061e=VM]'}' ~~~ Image transfer present in the database: engine=> select command_id, disk_id, imaged_ticket_id, phase, last_updated, proxy_uri, bytes_sent, bytes_total from image_transfers; command_id | disk_id | imaged_ticket_id | phase | last_updated | proxy_uri | bytes_sent | bytes_total --------------------------------------+--------------------------------------+------------------+-------+----------------------------+-----------+------------+------------- 9ef9defd-c9fc-483f-829d-7810f38b8cf6 | 7ce86012-dce2-4ca7-9dfb-e93c33f688be | | 1 | 2023-05-11 01:21:42.639+00 | | 0 | 2281701376 Volume locked (imagestatus=2) Expected results: NullPointerException handled gracefully and transfer cancelled. Additional info:
Verified After discussing with Arik the verification covered by Automation Regression tests (tier1, tier2, and tier3). Versions: ovirt-engine-4.5.3.9-0.zstream.20230605102913.git3bfac2e0eac.el8.noarch vdsm-4.50.3.8-1.el8ev.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat Virtualization security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:3771