Created attachment 1836585 [details]
logs

Description of problem:
During verification of bug 2015121 a new issue was found: during a full backup taken while the guest is writing to a thin disk (so the disk is being extended), finalizing the backup fails.

Versions:
ovirt-engine-4.4.9.3-0.3.el8ev.noarch
vdsm-4.40.90.3-1.el8ev.x86_64
libvirt-7.6.0-5.module+el8.5.0+12933+58cb48a1

Steps to reproduce:
1. Start a VM with a 20g thin disk (incremental backup enabled)
2. In the VM, start a dd process writing 20g to the thin disk
   # dd if=/dev/zero bs=1M count=20480 of=/dev/sda oflag=direct conv=fsync status=progress
3. While dd is writing, start a full backup
   ./backup_vm.py -c engine full dc411543-5a9c-440e-9567-12b83214ea28

Expected results:
Finalizing the backup should complete successfully.

Actual results:
./backup_vm.py -c engine full b5621374-29b7-4d67-8928-9a8237699648
[ 0.0 ] Starting full backup for VM 'b5621374-29b7-4d67-8928-9a8237699648'
[ 1.7 ] Waiting until backup 'b7135713-df89-4325-8438-ec2e6bee7ec4' is ready
[ 29.3 ] Created checkpoint 'cc2580d6-b905-47e4-8b68-f202ba2e9472' (to use in --from-checkpoint-uuid for the next incremental backup)
[ 29.4 ] Downloading full backup for disk '684cb233-c78c-4225-8356-1f106719694c'
[ 30.7 ] Finalizing backup
Traceback (most recent call last):
  File "./backup_vm.py", line 525, in <module>
    main()
  File "./backup_vm.py", line 191, in main
    args.command(args)
  File "./backup_vm.py", line 207, in cmd_full
    download_backup(connection, backup.id, args)
  File "./backup_vm.py", line 407, in download_backup
    incremental=has_incremental)
  File "./backup_vm.py", line 425, in download_disk
    timeout_policy=types.ImageTransferTimeoutPolicy(args.timeout_policy))
  File "/usr/share/doc/python3-ovirt-engine-sdk4/examples/helpers/imagetransfer.py", line 235, in create_transfer
    .format(transfer.id, transfer.phase))
RuntimeError: Unexpected transfer 284a6ee2-0c19-4a0a-8799-a23f1564c9e0 phase finalizing_failure
The reproduction of this issue is not entirely clear. I assume it may reproduce when "enable incremental backup" is checked for the created disk: I could reproduce it 3 times with "enable incremental backup" marked, and when I disabled it, it did not happen (I am not sure this is really the trigger).
Also reproduces in 4.4.8:

# ./backup_vm.py -c engine full 36653930-1a0a-4ecf-97af-da3f36c00e8d
[ 0.0 ] Starting full backup for VM '36653930-1a0a-4ecf-97af-da3f36c00e8d'
[ 1.3 ] Waiting until backup '71929277-32e3-469f-a111-f3545b3218ca' is ready
[ 25.8 ] Created checkpoint '844c83fe-c987-4787-8e0b-41c593ed1355' (to use in --from-checkpoint-uuid for the next incremental backup)
[ 25.8 ] Downloading full backup for disk '53c73291-ff86-43c9-8d0a-df6690630dd6'
[ 27.3 ] Finalizing backup
Traceback (most recent call last):
  File "./backup_vm.py", line 525, in <module>
    main()
  File "./backup_vm.py", line 191, in main
    args.command(args)
  File "./backup_vm.py", line 207, in cmd_full
    download_backup(connection, backup.id, args)
  File "./backup_vm.py", line 407, in download_backup
    incremental=has_incremental)
  File "./backup_vm.py", line 425, in download_disk
    timeout_policy=types.ImageTransferTimeoutPolicy(args.timeout_policy))
  File "/usr/share/doc/python3-ovirt-engine-sdk4/examples/helpers/imagetransfer.py", line 235, in create_transfer
    .format(transfer.id, transfer.phase))
RuntimeError: Unexpected transfer a9bef242-484a-4f04-b771-1f1055b492a1 phase finalizing_failure

Versions:
4.4.8.6-0.1.el8ev
vdsm-4.40.80.6-1.el8ev.x86_64
libvirt-7.0.0-14.4.module+el8.4.0+12413+d23780e6
2021-10-21 16:59:36,412+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.AddImageTicketVDSCommand] (default task-5) [3424710f-6a2a-4d41-8a03-5ae96996053a] START, AddImageTicketVDSCommand(HostName = host_mixed_2, AddImageTicketVDSCommandParameters:{hostId='eefe7565-0aa2-4e75-bc10-7e99817aaef9', ticketId='e5dcc834-aa06-4c84-aa0f-dc36433dad53', timeout='300', operations='[read]', size='21474836480', url='null', filename='null', sparse='false', transferId='284a6ee2-0c19-4a0a-8799-a23f1564c9e0', dirty='false'}), log id: 27770105
2021-10-21 16:59:36,428+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.AddImageTicketVDSCommand] (default task-5) [3424710f-6a2a-4d41-8a03-5ae96996053a] Failed in 'AddImageTicketVDS' method
2021-10-21 16:59:36,434+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-5) [3424710f-6a2a-4d41-8a03-5ae96996053a] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_2 command AddImageTicketVDS failed: Image daemon request failed: "status=400, reason=Bad Request, error=Invalid ticket: Invalid value for 'url': None: expecting a <class 'str'> value\n"

The problem is that the url was empty. This happened because the VM was running but got paused, so StartVmBackup performed a cold backup, while TransferDiskImage expects a live backup because the VM is not down [1]:

    private boolean isLiveBackup() {
        return isBackup() && getBackupVm() != null && !getBackupVm().isDown();
    }

[1] https://github.com/oVirt/ovirt-engine/blob/e921ac72c181872004c02db674ca835470149894/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/storage/disk/image/TransferDiskImageCommand.java#L1123
(In reply to sshmulev from comment #0)
...
> Steps to reproduce:
> 1. Start a VM with 20g thin disk (enable incremental)
> 2. In the VM, starts dd process, writing 20g to the think disk
...
> 3. during the writing started backup
>
> ./backup_vm.py -c engine full dc411543-5a9c-440e-9567-12b83214ea28
...
> Actual results:
> ./backup_vm.py -c engine full b5621374-29b7-4d67-8928-9a8237699648
> [ 0.0 ] Starting full backup for VM 'b5621374-29b7-4d67-8928-9a8237699648'
> [ 1.7 ] Waiting until backup 'b7135713-df89-4325-8438-ec2e6bee7ec4' is
> ready
> [ 29.3 ] Created checkpoint 'cc2580d6-b905-47e4-8b68-f202ba2e9472' (to use
> in --from-checkpoint-uuid for the next incremental backup)
> [ 29.4 ] Downloading full backup for disk
> '684cb233-c78c-4225-8356-1f106719694c'
> [ 30.7 ] Finalizing backup
> Traceback (most recent call last):
...
> File "./backup_vm.py", line 407, in download_backup
> incremental=has_incremental)
> File "./backup_vm.py", line 425, in download_disk
> timeout_policy=types.ImageTransferTimeoutPolicy(args.timeout_policy))
> File
> "/usr/share/doc/python3-ovirt-engine-sdk4/examples/helpers/imagetransfer.py",
> line 235, in create_transfer
> .format(transfer.id, transfer.phase))
> RuntimeError: Unexpected transfer 284a6ee2-0c19-4a0a-8799-a23f1564c9e0 phase
> finalizing_failure

This is not an error in finalizing the backup, but an error in downloading the disk 684cb233-c78c-4225-8356-1f106719694c. This is a high severity issue.

I'm missing more info on the system state after this failure (a way to check is sketched below):
- Did the transfer finish?
- Did the backup finish?

If a transfer or backup gets stuck, the disk and VM remain locked; these are urgent issues that should never happen.
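A minimal sketch (not part of the original report) of how both questions can be checked with the Python SDK; the engine URL, credentials, and VM id below are placeholders:

import ovirtsdk4 as sdk

# Placeholder connection details - adjust to the environment under test.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    ca_file='ca.pem',
)
system_service = connection.system_service()

# Any transfer still listed with a non-final phase did not finish cleanly.
for transfer in system_service.image_transfers_service().list():
    print('transfer', transfer.id, 'phase', transfer.phase)

# List the VM's backups and their phases (an empty list here means no backup is still running).
vm_service = system_service.vms_service().vm_service('b5621374-29b7-4d67-8928-9a8237699648')
for backup in vm_service.backups_service().list():
    print('backup', backup.id, 'phase', backup.phase)

connection.close()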
(In reply to Benny Zlotnik from comment #3)
> The problem is that url was empty, this happened because the VM was running
> but was paused, so StartVmBackup performed cold backup but TransferDiskImage
> expects it to be a live backup because the VM is not down[1]
>
>     private boolean isLiveBackup() {
>         return isBackup() && getBackupVm() != null &&
> !getBackupVm().isDown();
>     }

This is a very bad bug that can cause data corruption. A cold backup should be used only when the VM is DOWN, never when the VM is PAUSED or in any other state that is not DOWN.

We have 3 states (a client-side illustration follows below):
- live backup: the VM is UP
- cold backup: the VM is DOWN
- backup not possible: any other state
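A client-side illustration of that three-state policy, as a hedged sketch only (this is not the engine fix; the connection details and VM id are placeholders):

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    ca_file='ca.pem',
)

# Placeholder VM id.
vm = connection.system_service().vms_service().vm_service('VM-UUID').get()

if vm.status == types.VmStatus.UP:
    mode = 'live'    # live backup of the running VM
elif vm.status == types.VmStatus.DOWN:
    mode = 'cold'    # cold backup of the stopped VM
else:
    raise RuntimeError('Backup not possible while the VM is {}'.format(vm.status))

print('Would run a {} backup'.format(mode))
connection.close()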
(In reply to Nir Soffer from comment #4)
> (In reply to sshmulev from comment #0)
> ...
> > Steps to reproduce:
> > 1. Start a VM with 20g thin disk (enable incremental)
> > 2. In the VM, starts dd process, writing 20g to the think disk
> ...
> > 3. during the writing started backup
> >
> > ./backup_vm.py -c engine full dc411543-5a9c-440e-9567-12b83214ea28
> ...
> > Actual results:
> > ./backup_vm.py -c engine full b5621374-29b7-4d67-8928-9a8237699648
> > [ 0.0 ] Starting full backup for VM 'b5621374-29b7-4d67-8928-9a8237699648'
> > [ 1.7 ] Waiting until backup 'b7135713-df89-4325-8438-ec2e6bee7ec4' is
> > ready
> > [ 29.3 ] Created checkpoint 'cc2580d6-b905-47e4-8b68-f202ba2e9472' (to use
> > in --from-checkpoint-uuid for the next incremental backup)
> > [ 29.4 ] Downloading full backup for disk
> > '684cb233-c78c-4225-8356-1f106719694c'
> > [ 30.7 ] Finalizing backup
> > Traceback (most recent call last):
> ...
> > File "./backup_vm.py", line 407, in download_backup
> > incremental=has_incremental)
> > File "./backup_vm.py", line 425, in download_disk
> > timeout_policy=types.ImageTransferTimeoutPolicy(args.timeout_policy))
> > File
> > "/usr/share/doc/python3-ovirt-engine-sdk4/examples/helpers/imagetransfer.py",
> > line 235, in create_transfer
> > .format(transfer.id, transfer.phase))
> > RuntimeError: Unexpected transfer 284a6ee2-0c19-4a0a-8799-a23f1564c9e0 phase
> > finalizing_failure
>
> This is not an error in finalizing the backup, but an error in downloading
> the disk 684cb233-c78c-4225-8356-1f106719694c. This is a high severity
> issue.
>
> I'm missing more info on the system state after this failure:
> - Did the transfer finish?
> - Did the backup finish?
>
> If a transfer or backup get stuck, the disk and vm remain locked, these are
> urgent issues that should never happen.

@Sophie, can you please check this and answer Nir?
Adding more clarifications for reproducing the bug:

1) The "enable incremental backup" setting of the disk does not influence it - the issue can also happen when it is disabled.

2) When it fails in the transfer state, the disk's transfer stays in the "finalizing_cleanup" state:

/api/imagetransfers

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<image_transfers>
    <image_transfer href="/ovirt-engine/api/imagetransfers/27ee03f2-9da7-4635-a2dc-c6e0dd4525a9" id="27ee03f2-9da7-4635-a2dc-c6e0dd4525a9">
        <actions>
            <link href="/ovirt-engine/api/imagetransfers/27ee03f2-9da7-4635-a2dc-c6e0dd4525a9/resume" rel="resume"/>
            <link href="/ovirt-engine/api/imagetransfers/27ee03f2-9da7-4635-a2dc-c6e0dd4525a9/cancel" rel="cancel"/>
            <link href="/ovirt-engine/api/imagetransfers/27ee03f2-9da7-4635-a2dc-c6e0dd4525a9/pause" rel="pause"/>
            <link href="/ovirt-engine/api/imagetransfers/27ee03f2-9da7-4635-a2dc-c6e0dd4525a9/extend" rel="extend"/>
            <link href="/ovirt-engine/api/imagetransfers/27ee03f2-9da7-4635-a2dc-c6e0dd4525a9/finalize" rel="finalize"/>
        </actions>
        <active>false</active>
        <direction>download</direction>
        <format>raw</format>
        <inactivity_timeout>60</inactivity_timeout>
        <phase>finalizing_cleanup</phase>
        <shallow>false</shallow>
        <timeout_policy>cancel</timeout_policy>
        <transferred>0</transferred>
        <image id="5878643b-20d4-48aa-807a-1001418ae031"/>
    </image_transfer>
</image_transfers>

3) The backup finished - checked via the REST API: /ovirt-engine/api/vms/vm-id/backups was empty of backups.

4) The disk status is OK in the REST API, only the transfer is marked "finalizing_cleanup" - meaning the disks are not locked.

So moving the severity back to high.
So we have a transfer stuck in a non-final state.

This can break code expecting a single image transfer per disk at the same time.

In 4.4.7 we fixed a horrible bug where the same disk was reported multiple times in both the UI and the API, because we had finished transfers and the system was not filtering out finished transfers when querying the database.

Please test:

1. We can perform more backups for this VM when we have a transfer stuck in this phase.
2. We can download that disk (after stopping the VM).
3. If we can download the disk, do we report the right status for the disk during the download, or does it report "finalizing_cleanup"?
4. If we can download the disk, do we see duplicate disks in the UI/API? (A sketch for this check follows below.)
   UI: vms > vm name -> disks
   API: /api/vms/vm-id/diskattachments/
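A minimal sketch (not from the original thread) for check 4 on the API side, listing the VM's disk attachments and flagging duplicate disk ids; the connection details and VM id are placeholders:

from collections import Counter

import ovirtsdk4 as sdk

# Placeholder connection details and VM id.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    ca_file='ca.pem',
)

attachments_service = (
    connection.system_service()
    .vms_service()
    .vm_service('VM-UUID')
    .disk_attachments_service()
)

# /api/vms/vm-id/diskattachments/ - each attachment references one disk id.
disk_ids = [attachment.disk.id for attachment in attachments_service.list()]
duplicates = [disk_id for disk_id, count in Counter(disk_ids).items() if count > 1]
print('duplicate disk ids:', duplicates or 'none')

connection.close()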
(In reply to Nir Soffer from comment #8)
> So we have a transfer stuck in non final state.
>
> This can break code expecting single image transfer per disk at the same
> time.
>
> In 4.4.7 we fixed a horrible bug when same disk was reported multiple times
> both in UI/API, because we had finished transfers, and the system was not
> filtering out finished transfers in when querying the database.
>
> Please test:
>
> 1. We can perform more backups for this vm when we have a transfer stuck
>    in this phase

Seems like a problem: I repeatedly get this error when trying to create another backup while the disk is in this state:

# ./backup_vm.py -c engine full 2cc0f3d4-8328-4949-813a-07e1a6261576
[ 0.0 ] Starting full backup for VM '2cc0f3d4-8328-4949-813a-07e1a6261576'
[ 1.4 ] Waiting until backup '8293c90f-4fbd-42d3-be51-fc832e92c219' is ready
Traceback (most recent call last):
  File "./backup_vm.py", line 525, in <module>
    main()
  File "./backup_vm.py", line 191, in main
    args.command(args)
  File "./backup_vm.py", line 205, in cmd_full
    backup = start_backup(connection, args)
  File "./backup_vm.py", line 347, in start_backup
    backup = get_backup(connection, backup_service, backup.id)
  File "./backup_vm.py", line 519, in get_backup
    .format(backup_uuid, last_event))
RuntimeError: Backup 8293c90f-4fbd-42d3-be51-fc832e92c219 has failed, last reported event: {'code': 10792, 'description': 'Backup 8293c90f-4fbd-42d3-be51-fc832e92c219 for VM VM_2016913 failed (User: admin@internal-authz).'}

> 2. We can download that disk (after stopping the VM)

Passed:

[root@storage-ge3-vdsm3 examples]# python3.6 download_disk.py -c engine 5878643b-20d4-48aa-807a-1001418ae031 /tmp/bla
[ 0.0 ] Connecting...
[ 0.2 ] Creating image transfer...
[ 2.9 ] Transfer ID: 99a7c2f1-a5fe-4a40-a238-f907671b908f
[ 2.9 ] Transfer host name: host_mixed_3
[ 2.9 ] Downloading disk...
[ 100.00% ] 20.00 GiB, 51.01 seconds, 401.50 MiB/s
[ 53.9 ] Finalizing image transfer...
[ 57.9 ] Download completed successfully
[root@storage-ge3-vdsm3 examples]#

> 3. If we can download the disk, do we report the right status for
>    the disk during the download, or it reports "finalizing_cleanup"?

We can download the disk, although another instance of this disk appears during the download (and disappears once the download is done). The status is reported correctly for the new, temporary one, but the original transfer still keeps the "finalizing_cleanup" state - it seems the disk is stuck in that state.

> 4. If we can download the disk, do we see duplicate disks in the UI/API?
>    UI: vms > vm name -> disks
>    API: /api/vms/vm-id/diskattachments/

The duplicate disks appear only during the download process itself, and then they are gone.
Bumping severity based on comment 9.
(In reply to Benny Zlotnik from comment #3)
> 2021-10-21 16:59:36,412+03 INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.AddImageTicketVDSCommand]
> (default task-5) [3424710f-6a2a-4d41-8a03-5ae96996053a] START,
> AddImageTicketVDSCommand(HostName = host_mixed_2,
> AddImageTicketVDSCommandParameters:{hostId='eefe7565-0aa2-4e75-bc10-
> 7e99817aaef9', ticketId='e5dcc834-aa06-4c84-aa0f-dc36433dad53',
> timeout='300', operations='[read]', size='21474836480', url='null',
> filename='null', sparse='false',
> transferId='284a6ee2-0c19-4a0a-8799-a23f1564c9e0', dirty='false'}), log id:
> 27770105
> 2021-10-21 16:59:36,428+03 ERROR
> [org.ovirt.engine.core.vdsbroker.vdsbroker.AddImageTicketVDSCommand]
> (default task-5) [3424710f-6a2a-4d41-8a03-5ae96996053a] Failed in
> 'AddImageTicketVDS' method
> 2021-10-21 16:59:36,434+03 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (default task-5) [3424710f-6a2a-4d41-8a03-5ae96996053a] EVENT_ID:
> VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_2 command
> AddImageTicketVDS failed: Image daemon request failed: "status=400,
> reason=Bad Request, error=Invalid ticket: Invalid value for 'url': None:
> expecting a <class 'str'> value\n"
>
> The problem is that url was empty, this happened because the VM was running
> but was paused, so StartVmBackup performed cold backup but TransferDiskImage
> expects it to be a live backup because the VM is not down[1]
>
>     private boolean isLiveBackup() {
>         return isBackup() && getBackupVm() != null &&
> !getBackupVm().isDown();
>     }
>
> [1]
> https://github.com/oVirt/ovirt-engine/blob/
> e921ac72c181872004c02db674ca835470149894/backend/manager/modules/bll/src/
> main/java/org/ovirt/engine/core/bll/storage/disk/image/
> TransferDiskImageCommand.java#L1123

Thanks Benny, you're right here.

The backup started when the VM was in the 'UP' state, and the VM moved to 'PAUSED' exactly when the scratch disks for the live backup were created, just before the backup operation started.

Scratch disk creation ended (for live backup only):

2021-10-21 16:59:33,171+03 INFO [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-72) [d0b2e1ad-d59e-4b33-aa71-8f3cd2f34fa5] Scratch disks prepared for the backup

The VM moved to the 'PAUSED' state:

2021-10-21 16:59:34,746+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-13) [7858da35] VM 'b5621374-29b7-4d67-8928-9a8237699648'(New_VM) moved from 'Up' --> 'Paused'
2021-10-21 16:59:34,762+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-13) [7858da35] EVENT_ID: VM_PAUSED(1,025), VM New_VM has been paused.
2021-10-21 16:59:34,773+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-13) [7858da35] EVENT_ID: VM_PAUSED_ENOSPC(138), VM New_VM has been paused due to no Storage space error.

So the engine checked again whether the backup is live or cold and proceeded accordingly; in this case the VM was 'PAUSED' and was treated as a cold backup, so a new bitmap was added by the engine instead of using libvirt:

2021-10-21 16:59:35,248+03 INFO [org.ovirt.engine.core.bll.storage.disk.image.AddVolumeBitmapCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-85) [d0b2e1ad-d59e-4b33-aa71-8f3cd2f34fa5] Running command: AddVolumeBitmapCommand internal: true.
Now, the VM got back to the 'UP' state before the image transfer started:

2021-10-21 16:59:35,864+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-27) [7858da35] VM 'b5621374-29b7-4d67-8928-9a8237699648'(New_VM) moved from 'Paused' --> 'Up'
2021-10-21 16:59:35,879+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-27) [7858da35] EVENT_ID: VM_RECOVERED_FROM_PAUSE_ERROR(196), VM New_VM has recovered from paused back to up.

So the image transfer operation tried to download the backup as a live backup and not a cold backup, and the disk's URL was missing.

The fix should handle all the ways a VM can enter the 'PAUSED' state during the live backup operation and continue the live backup operation as it would for a running VM. Should be a fairly simple fix.

Also, this flow can happen only if the VM paused right before the backup started, so it is a specific corner case.
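From the client's point of view, the race above falls inside the window where backup_vm.py waits for the backup to become ready. A hedged sketch (not from the original thread; connection details, VM id and backup id are placeholders) of observing both the backup phase and the VM status during that window:

import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    ca_file='ca.pem',
)
system_service = connection.system_service()
vm_service = system_service.vms_service().vm_service('VM-UUID')                 # placeholder
backup_service = vm_service.backups_service().backup_service('BACKUP-UUID')     # placeholder

# Poll until the backup is ready, logging the VM status on the way so a
# transient Up -> Paused -> Up transition (as in the logs above) is visible.
while True:
    backup = backup_service.get()
    vm = vm_service.get()
    print('backup phase:', backup.phase, 'vm status:', vm.status)
    if backup.phase == types.BackupPhase.READY:
        break
    time.sleep(1)

connection.close()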
This can happen also with incremental backup:

./backup_vm.py -c engine incremental --from-checkpoint-uuid 76e329b7-0cf0-4811-afa4-d36431bf4f75 3e5f62b4-134a-42c4-bf39-46eb34042a16
[ 0.0 ] Starting incremental backup for VM '3e5f62b4-134a-42c4-bf39-46eb34042a16'
[ 1.6 ] Waiting until backup '6676d238-2f5e-4659-897e-9684c63fdfc8' is ready
[ 18.8 ] Created checkpoint 'f4967aba-9534-4968-ae4e-e7a12ab334a5' (to use in --from-checkpoint-uuid for the next incremental backup)
[ 18.9 ] Downloading incremental backup for disk 'cc45e439-5a0c-4fda-abc3-681743a6d0b7'
[ 20.0 ] Finalizing backup
Traceback (most recent call last):
  File "./backup_vm.py", line 525, in <module>
    main()
  File "./backup_vm.py", line 191, in main
    args.command(args)
  File "./backup_vm.py", line 225, in cmd_incremental
    download_backup(connection, backup.id, args, incremental=True)
  File "./backup_vm.py", line 407, in download_backup
    incremental=has_incremental)
  File "./backup_vm.py", line 425, in download_disk
    timeout_policy=types.ImageTransferTimeoutPolicy(args.timeout_policy))
  File "/usr/share/doc/python3-ovirt-engine-sdk4/examples/helpers/imagetransfer.py", line 235, in create_transfer
    .format(transfer.id, transfer.phase))
RuntimeError: Unexpected transfer 6cfd393e-8c53-4f9e-9bb1-522927ce27ec phase finalizing_failure

The same thing happens - the disk is stuck in the "finalizing_cleanup" state and it's impossible to do another incremental backup or full backup:

[ 0.0 ] Starting incremental backup for VM '3e5f62b4-134a-42c4-bf39-46eb34042a16'
[ 1.4 ] Waiting until backup 'e0231ac6-f27b-4bb6-8163-e38449088b1a' is ready
Traceback (most recent call last):
  File "./backup_vm.py", line 525, in <module>
    main()
  File "./backup_vm.py", line 191, in main
    args.command(args)
  File "./backup_vm.py", line 223, in cmd_incremental
    backup = start_backup(connection, args)
  File "./backup_vm.py", line 347, in start_backup
    backup = get_backup(connection, backup_service, backup.id)
  File "./backup_vm.py", line 519, in get_backup
    .format(backup_uuid, last_event))
RuntimeError: Backup e0231ac6-f27b-4bb6-8163-e38449088b1a has failed, last reported event: {'code': 10792, 'description': 'Backup e0231ac6-f27b-4bb6-8163-e38449088b1a for VM vm1 failed (User: admin@internal-authz).'}

./backup_vm.py -c engine full 3e5f62b4-134a-42c4-bf39-46eb34042a16
[ 0.0 ] Starting full backup for VM '3e5f62b4-134a-42c4-bf39-46eb34042a16'
[ 1.3 ] Waiting until backup '195e3422-ef92-47d6-9a2c-e5c53addd818' is ready
Traceback (most recent call last):
  File "./backup_vm.py", line 525, in <module>
    main()
  File "./backup_vm.py", line 191, in main
    args.command(args)
  File "./backup_vm.py", line 205, in cmd_full
    backup = start_backup(connection, args)
  File "./backup_vm.py", line 347, in start_backup
    backup = get_backup(connection, backup_service, backup.id)
  File "./backup_vm.py", line 519, in get_backup
    .format(backup_uuid, last_event))
RuntimeError: Backup 195e3422-ef92-47d6-9a2c-e5c53addd818 has failed, last reported event: {'code': 10792, 'description': 'Backup 195e3422-ef92-47d6-9a2c-e5c53addd818 for VM vm1 failed (User: admin@internal-authz).'}
Verified.

Versions:
vdsm-4.40.90.4-1.el8ev.x86_64
ovirt-engine-4.4.9.4-0.1.el8ev.noarch

Steps to reproduce (tried 5 times - all succeeded):
1. Start a VM with a 20g thin disk
2. In the VM, start a dd process writing 20g to the thin disk
   # dd if=/dev/zero bs=1M count=20480 of=/dev/sda oflag=direct conv=fsync status=progress
3. While dd is writing, start a backup
   ./backup_vm.py -c engine full <VM_ID>
4. ./backup_vm.py -c engine incremental --from-checkpoint-uuid <CHECKPOINT_ID> <VM_ID>

Engine logs:

2021-10-31 10:54:15,455+02 INFO [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-20) [bf6ca867-4e43-464a-900d-7dd525cf4853] Scratch disks created for the backup
2021-10-31 10:54:29,347+02 INFO [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-2) [bf6ca867-4e43-464a-900d-7dd525cf4853] Scratch disks prepared for the backup
2021-10-31 10:54:18,574+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-31) [3812da67] VM 'c453af58-10c8-4d7d-851a-605ec72fee4a'(vm3) moved from 'Up' --> 'Paused'
2021-10-31 10:54:19,450+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-17) [3812da67] VM 'c453af58-10c8-4d7d-851a-605ec72fee4a'(vm3) moved from 'Paused' --> 'Up'
2021-10-31 10:54:31,843+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-26) [91924898-e812-4f54-8253-1c7132bfaead] Creating ImageTransfer entity for command '238b39c2-d277-4c8b-a62f-2ffd30970805', proxyEnabled: true
2021-10-31 10:54:31,853+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-26) [91924898-e812-4f54-8253-1c7132bfaead] Successfully added Download disk 'latest-rhel-guest-image-8.5-infra' (disk id: '2b4ce1b9-c3e5-44d2-a8cb-133cd53349b0', image id: '090d0bda-4d08-4d5a-8e3b-fe6c820d8cc0') for image transfer command '238b39c2-d277-4c8b-a62f-2ffd30970805'
2021-10-31 10:54:31,887+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.AddImageTicketVDSCommand] (default task-26) [91924898-e812-4f54-8253-1c7132bfaead] START, AddImageTicketVDSCommand(HostName = host_mixed_2, AddImageTicketVDSCommandParameters:{hostId='281e0d9c-c7fd-4a19-9a4c-dd49a3379326', ticketId='9ef86998-500e-4c73-85ca-9e392aba8aa7', timeout='300', operations='[read]', size='10737418240', url='nbd:unix:/run/vdsm/backup/7b8f5974-0745-4856-84b6-50ce13061b12:exportname=vda', filename='null', sparse='false', transferId='238b39c2-d277-4c8b-a62f-2ffd30970805', dirty='false'}), log id: 342b3b4c
2021-10-31 10:54:31,898+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.AddImageTicketVDSCommand] (default task-26) [91924898-e812-4f54-8253-1c7132bfaead] FINISH, AddImageTicketVDSCommand, return: StatusOnlyReturn [status=Status [code=0, message=Done]], log id: 342b3b4c
2021-10-31 10:54:31,898+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-26) [91924898-e812-4f54-8253-1c7132bfaead] Started transfer session with ticket id 9ef86998-500e-4c73-85ca-9e392aba8aa7, timeout 300 seconds
2021-10-31 10:54:31,898+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-26) [91924898-e812-4f54-8253-1c7132bfaead] Adding proxy image ticket, id 9ef86998-500e-4c73-85ca-9e392aba8aa7
2021-10-31 10:54:31,902+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-26)
[91924898-e812-4f54-8253-1c7132bfaead] Started transfer session with transfer id 238b39c2-d277-4c8b-a62f-2ffd30970805, timeout 300 seconds
2021-10-31 10:54:31,911+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (default task-26) [91924898-e812-4f54-8253-1c7132bfaead] Updating image transfer 238b39c2-d277-4c8b-a62f-2ffd30970805 (image 2b4ce1b9-c3e5-44d2-a8cb-133cd53349b0) phase to Transferring
2021-10-31 10:54:31,912+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-26) [91924898-e812-4f54-8253-1c7132bfaead] Returning from proceedCommandExecution after starting transfer session for image transfer command '238b39c2-d277-4c8b-a62f-2ffd30970805'
2021-10-31 10:54:31,922+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-26) [91924898-e812-4f54-8253-1c7132bfaead] EVENT_ID: TRANSFER_IMAGE_INITIATED(1,031), Image Download with disk latest-rhel-guest-image-8.5-infra was initiated by admin@internal-authz.
This bugzilla is included in the async update of oVirt engine (4.4.9.4) for the oVirt 4.4.9 release, published on October 28th 2021.

Since the problem described in this bug report should be resolved in the oVirt 4.4.9 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.