Created attachment 1852800 [details]
engine.log

Description of problem:
After a failed OVA import we cannot place any of the hosts into maintenance mode. We get the error "Error while executing action: General command validation failure." All hosts and the engine were rebooted. There is no active task or image transfer.

Version-Release number of selected component (if applicable):
ovirt 4.4.10.5-1.el8

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
Error while executing action: General command validation failure.

Expected results:

Additional info:
Hi,

Can you share the output of the following query?

$ select * from image_transfers
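For reference, a minimal sketch of how to run this against the engine database. The helper-script path and database name are assumptions based on a default ovirt-engine installation, so adjust them to your setup; both commands are read-only:

  # Assumed path of the dbscripts helper shipped with ovirt-engine
  /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "SELECT * FROM image_transfers;"

  # Alternatively, connect as the postgres superuser (assumes the DB is named 'engine')
  su - postgres -c "psql -d engine -c 'SELECT * FROM image_transfers;'"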
Created attachment 1853123 [details] image_transfers
(In reply to tm-pi from comment #3)
> Created attachment 1853123 [details]
> image_transfers

I see all transfers are in phase 7, which is FINALIZING_SUCCESS, and some of them are fairly recent. Can you share the engine log covering the entire run of one of these transfers?
Created attachment 1853313 [details] engine.log-20220122.gz
Created attachment 1853314 [details] engine.log-20220123.gz
Thanks, I see a lot of:

2022-01-21 03:21:05,484+01 INFO [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-73) [a25f4e14-ef1e-425e-88ec-bdddacc306c9] Not invoking command's TransferDiskImage (1687a3da-0a4e-4114-8aee-8ff704f9f374) doPolling method callback's pollOnExecutionFailed is false.
2022-01-21 03:21:15,492+01 INFO [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-46) [a25f4e14-ef1e-425e-88ec-bdddacc306c9] Not invoking command's TransferDiskImage (1687a3da-0a4e-4114-8aee-8ff704f9f374) doPolling method callback's pollOnExecutionFailed is false.
2022-01-21 03:21:25,499+01 INFO [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-91) [a25f4e14-ef1e-425e-88ec-bdddacc306c9] Not invoking command's TransferDiskImage (1687a3da-0a4e-4114-8aee-8ff704f9f374) doPolling method callback's pollOnExecutionFailed is false.
2022-01-21 03:21:35,505+01 INFO [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-45) [a25f4e14-ef1e-425e-88ec-bdddacc306c9] Not invoking command's TransferDiskImage (1687a3da-0a4e-4114-8aee-8ff704f9f374) doPolling method callback's pollOnExecutionFailed is false.
2022-01-21 03:21:45,511+01 INFO [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-18) [a25f4e14-ef1e-425e-88ec-bdddacc306c9] Not invoking command's TransferDiskImage (1687a3da-0a4e-4114-8aee-8ff704f9f374) doPolling method callback's pollOnExecutionFailed is false.

but the logs don't go back far enough, so I can't see exactly what failed. Can you provide additional, older logs?
Created attachment 1853356 [details]
engine.log-20211220

engine.log-20211220, where the first occurrence of "a25f4e14-ef1e-425e-88ec-bdddacc306c9" was found.
Thanks, I think I understand what happens:

1. The download is of a snapshot, so the disk_id field in image_transfers is empty; this is what causes the NPE.

2. Another thing is that the downloads are stuck in phase 7. I can't see the failures for all of them, but based on the one in the logs:

2021-12-19 14:49:17,384+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-488) [a25f4e14-ef1e-425e-88ec-bdddacc306c9] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM node-ib2-bl7.vdi.in.cz.net command GetVolumeInfoVDS failed: Resource timeout: ()
2021-12-19 14:49:17,384+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (default task-488) [a25f4e14-ef1e-425e-88ec-bdddacc306c9] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand' return value 'VolumeInfoReturn:{status='Status [code=851, message=Resource timeout: ()]'}'

TransferImage failed during the execution phase, likely because of a storage issue. This causes the callback to be stuck, hence all the "Not invoking..." messages. However, in that case the transfer would be stuck on "Initializing", so there is probably another call to imagetransfers/<id>/finalize which sets it to FINALIZING_SUCCESS.

I'll post fixes soon. In the meantime, can you share how you downloaded the snapshots? Was it with a custom script or with backup vendor software?

To work around the issue, you can remove the inactive entries from the image_transfers table.

As for the OVA import, the issue is:

2022-01-22 03:38:52,976+01 WARN [org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsAsyncVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [] Unexpected return value: Status [code=-32603, message=Internal JSON-RPC error: {'reason': "'str' object has no attribute 'decode'"}]

which has been fixed in https://bugzilla.redhat.com/show_bug.cgi?id=2026809
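For reference, a minimal sketch of a query to spot the stuck entries before removing anything. Phase 7 corresponds to FINALIZING_SUCCESS as noted above; the column names are assumptions based on a typical image_transfers schema, so verify them with \d image_transfers first:

  -- List transfers stuck in FINALIZING_SUCCESS (phase 7); column names are assumptions
  SELECT command_id, disk_id, phase, last_updated
  FROM image_transfers
  WHERE phase = 7;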
These snapshots were made by Veritas NetBackup during backup. In my opinion it is quite unusable software for oVirt backup, full of bugs.

As for the OVA import, I upgraded oVirt to the latest version and I think that helped. The only remaining problem is that something is blocking switching a host to maintenance.
You mentioned the workaround "remove the inactive entries from the image_transfers".
Is it safe to simply delete all entries from this table? Are there no other record dependencies?
(In reply to tm-pi from comment #11)
> You mentioned the workaround "remove the inactive entries from the
> image_transfers".
> Is it safe to simply delete all entries from this table? Are there no other
> record dependencies?

If these transfers are inactive, it is safe (of course, make sure a backup is in place).
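For completeness, a minimal sketch of the workaround itself, assuming the rows to remove are exactly the inactive transfers stuck in phase 7 observed in this report; take a database backup first, as noted above:

  -- Remove the stuck, inactive transfer entries (phase 7 = FINALIZING_SUCCESS)
  DELETE FROM image_transfers
  WHERE phase = 7;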
Thank you. The workaround helped. We can now put the node into maintenance.
*** Bug 2070491 has been marked as a duplicate of this bug. ***
Fixed in 4.5.0 (ovirt-engine-4.5.0.1)
Another consequence of this situation (multiple image transfers left in phase FINALIZING_SUCCESS) is described in bz 2083652.