Bug 2027260

Summary: Cold backup fails in various ways - backup is reported ready before add_bitmap jobs complete
Product: [oVirt] ovirt-engine Reporter: Nir Soffer <nsoffer>
Component: Backup-Restore.VMs    Assignee: Benny Zlotnik <bzlotnik>
Status: CLOSED CURRENTRELEASE QA Contact: Amit Sharir <asharir>
Severity: urgent Docs Contact: bugs <bugs>
Priority: unspecified    
Version: 4.4.10    CC: ahadas, asharir, bugs, bzlotnik, eshames, lsvaty, pbar
Target Milestone: ovirt-4.4.10    Keywords: ZStream
Target Release: 4.4.10.1    Flags: pm-rhel: ovirt-4.4+
lsvaty: blocker-
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovirt-engine-4.4.10.1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-01-19 07:00:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage    RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Vdsm log showing several cold backup failures (flags: none)

Description Nir Soffer 2021-11-29 09:00:41 UTC
Created attachment 1843966 [details]
Vdsm log showing several cold backup failures

Description of problem:

Cold backup fails randomly (most of the time) because the engine reports the
backup as ready before the add_bitmap jobs complete.

When running a cold backup, the engine adds a new bitmap to the top volume. The
engine has this loop:

    for (DiskImage diskImage : vmBackup.getDisks()) {
        ...
        ActionReturnValue returnValue = runInternalActionWithTasksContext(
            ActionType.AddVolumeBitmap, parameters);

Once the internal action returns, the engine switches the state to READY and the
user can start the backup process. However, when the user starts the backup, the
add_bitmap storage jobs are still running.

The add_bitmap job does the following:

1. Activate the logical volume, including parent logical volumes
2. List the bitmaps in the qcow2 image
3. Add the bitmap to the qcow2 image
4. Deactivate the logical volume, including parent logical volumes

Disk backup does the following (both sequences are sketched below):

1. Activate the logical volume, including parent logical volumes
2. For incremental backup, prepare an overlay on top of the original image
3. Merge bitmaps from the entire chain into the overlay
4. Start qemu-nbd with the logical volume (full backup) or the overlay (incremental backup)
5. Delete the overlay when doing an incremental backup
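
A minimal sketch of the two sequences in Python, approximated from the commands
visible in the logs below (the lvchange calls and helper names are illustrative,
not vdsm's actual code):

    import subprocess

    def run(*cmd):
        # Run a command and raise on failure, similar to vdsm's operation wrapper.
        subprocess.run(cmd, check=True)

    def add_bitmap_job(volume, bitmap):
        # Storage job started by AddVolumeBitmap; runs later on a tasks/N worker.
        run("lvchange", "-ay", volume)                        # 1. activate the LV chain (illustrative)
        run("qemu-img", "info", "--output", "json", volume)   # 2. list existing bitmaps
        run("qemu-img", "bitmap", "--add", "--enable", volume, bitmap)  # 3. add the new bitmap
        run("lvchange", "-an", volume)                        # 4. deactivate the LV chain

    def prepare_incremental_backup(volume, overlay, bitmap):
        # Disk backup preparation done by start_nbd_server.
        run("lvchange", "-ay", volume)                        # 1. activate the LV chain
        run("qemu-img", "create", "-f", "qcow2",
            "-b", volume, "-F", "qcow2", overlay)             # 2. create the overlay
        run("qemu-img", "bitmap", "--merge", bitmap,
            "-F", "qcow2", "-b", volume, overlay, bitmap)     # 3. merge bitmaps into the overlay
        # 4. start qemu-nbd on the overlay, 5. delete the overlay when done

If both functions run concurrently on the same volume, the job's step 4 can
deactivate the LV under the backup, and the job's "qemu-img bitmap" holds a write
lock on the volume while the merge in the backup flow needs one.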

When an add_bitmap job runs at the same time as the backup, we see various failures:

## Failure 1: qemu-img fails to open the volume

The add_bitmap job deactivates the volume right after the volume was activated
for preparing a backup.
 
/var/log/vdsm/vdsm.log:

2021-11-28 22:30:11,446+0200 INFO  (jsonrpc/7) [vdsm.api] FINISH start_nbd_server error=Command ['/usr/bin/qemu-img', 'info', '--output', 'json', '-f', 'qcow2', '--backing-chain', '/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840'] failed with rc=1 out=b'' err=b"qemu-img: Could not open '/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/9d63f782-7467-4243-af1e-5c1f8b49c111': Could not open '/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/9d63f782-7467-4243-af1e-5c1f8b49c111': No such file or directory\n" from=::ffff:192.168.122.11,53010, flow_id=22b1fe4d-3498-4a63-adfc-c4ff99043585, task_id=6caa126e-1493-47fd-bba8-bf9d7ee5cb93 (api:52)


## Failure 2: qemu-nbd fails to accept connections

The add_bitmap job deactivates the parent volumes after qemu-nbd was started, so
the client fails to connect to qemu-nbd.

/var/log/ovirt-imageio/daemon.log:

2021-11-28 22:24:18,376 INFO    (Thread-15) [images] [::ffff:192.168.122.1] OPTIONS ticket=4cebb41f-e0e2-40b8-8968-e4420ddb8687
2021-11-28 22:24:18,376 ERROR   (Thread-15) [http] Server error
Traceback (most recent call last):
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/backends/__init__.py", line 66, in get
    return ticket.get_context(req.connection_id)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/auth.py", line 146, in get_context
    return self._connections[con_id]
KeyError: 15

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/http.py", line 785, in __call__
    self.dispatch(req, resp)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/http.py", line 830, in dispatch
    return method(req, resp, *match.groups())
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/cors.py", line 84, in wrapper
    return func(self, req, resp, *args)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/images.py", line 246, in options
    ctx = backends.get(req, ticket, self.config)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/backends/__init__.py", line 85, in get
    cafile=ca_file)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/backends/nbd.py", line 42, in open
    client = nbd.open(url, dirty=dirty)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/nbd.py", line 321, in open
    return Client(address, name, dirty=dirty)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/nbd.py", line 408, in __init__
    self._sock = self._connect(address)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/nbd.py", line 483, in _connect
    return self._create_unix_connection(address)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/nbd.py", line 513, in _create_unix_connection
    sock.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused
2021-11-28 22:24:18,377 INFO    (Thread-15) [http] CLOSE connection=15 client=::ffff:192.168.122.1 [connection 1 ops, 0.001430 s] [dispatch 1 ops, 0.000770 s]


## Failure 3: "qemu-img bitmap" fails to lock the image

The "qemu-img bitmap" merge run by start_nbd_server fails because the
"qemu-img bitmap" run by the add_bitmap storage job holds a write lock.

2021-11-19 00:18:42,065+0200 ERROR (jsonrpc/3) [storage.dispatcher] FINISH start_nbd_server error=Command ['/usr/bin/qemu-img', 'bitmap', '--merge', '0345dff0-cd6a-4a5f-bf3a-bf230318c5c0', '-F', 'qcow2', '-b', '/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840', '/var/lib/vdsm/storage/transient_disks/fbf383f0-8de9-4e14-9a4c-2f9194ade81d/overlay', '0345dff0-cd6a-4a5f-bf3a-bf230318c5c0'] failed with rc=1 out=b'' err=b'qemu-img: Could not open \'/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840\': Failed to get shared "write" lock\nIs another process using the image [/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840]?\n' (dispatcher:87)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/dispatcher.py", line 74, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 110, in wrapper
    return m(self, *a, **kw)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 1190, in prepare
    raise self.error
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 884, in _run
    return fn(*args, **kargs)
  File "<decorator-gen-231>", line 2, in start_nbd_server
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 3757, in start_nbd_server
    url = nbd.start_server(server_id, config)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/nbd.py", line 130, in start_server
    path = _create_overlay(server_id, vol.volumePath, cfg.bitmap)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/nbd.py", line 207, in _create_overlay
    src_img, bitmap, "qcow2", overlay, bitmap).run()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/operation.py", line 83, in run
    self._finalize(out, err)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/operation.py", line 179, in _finalize
    raise cmdutils.Error(self._cmd, rc, out, err)
vdsm.common.cmdutils.Error: Command ['/usr/bin/qemu-img', 'bitmap', '--merge', '0345dff0-cd6a-4a5f-bf3a-bf230318c5c0', '-F', 'qcow2', '-b', '/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840', '/var/lib/vdsm/storage/transient_disks/fbf383f0-8de9-4e14-9a4c-2f9194ade81d/overlay', '0345dff0-cd6a-4a5f-bf3a-bf230318c5c0'] failed with rc=1 out=b'' err=b'qemu-img: Could not open \'/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840\': Failed to get shared "write" lock\nIs another process using the image [/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840]?\n'


## Failure 4: "qemu-img bitmap" fails to lock the image

This is a variant of failure 3 - the "qemu-img bitmap" run by the add_bitmap
storage job fails because the "qemu-img bitmap" run by the start_nbd_server
code took the lock.

In this case the current backup may succeed, but the next incremental
backup will fail since the bitmap was not added.


The root cause is not waiting for the storage jobs to complete before
reporting the backup as READY.

In the engine we should have this flow:

    for disk in disks:
        start add bitmap job

    wait until add bitmap jobs complete

    report backup as ready
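
A minimal sketch of this ordering in Python (the helpers are hypothetical
placeholders; the real fix belongs in the engine's StartVmBackupCommand flow,
which is Java):

    import time

    def wait_for_bitmaps_and_report_ready(backup, poll_interval=1.0):
        # Start an add_bitmap job for every disk in the backup.
        jobs = [start_add_bitmap_job(disk) for disk in backup.disks]

        # Wait until all add_bitmap jobs complete before exposing the backup.
        while not all(job_finished(job) for job in jobs):
            time.sleep(poll_interval)

        if any(job_failed(job) for job in jobs):
            set_backup_phase(backup, "FAILED")   # do not let the user start a broken backup
        else:
            set_backup_phase(backup, "READY")    # only now report the backup as ready

start_add_bitmap_job(), job_finished(), job_failed() and set_backup_phase() stand
in for the engine's AddVolumeBitmap command and command-callback machinery.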


Version-Release number of selected component (if applicable):
4.5.0-0.0.master.20211118145311.el8

How reproducible:
Randomly, but mostly fails.

Steps to Reproduce:
1. Start full or incremental backup

Actual results:
Backup fails in various ways; logs show that add_bitmap jobs were still
running when the backup was started.

Expected results:
Backup succeeds; logs show that add_bitmap jobs completed before the backup
was started.

Additional info:


## Complete flow showing failure 3:

Starting the incremental backup succeeded...

$ ./backup_vm.py -c engine-dev incremental --from-checkpoint-uuid 0345dff0-cd6a-4a5f-bf3a-bf230318c5c0 --backup-dir /data/scratch/nsoffer/backups 6e95d38f-d9b8-4955-878c-da6d631d0ab2
[   0.0 ] Starting incremental backup for VM '6e95d38f-d9b8-4955-878c-da6d631d0ab2'
[   0.3 ] Waiting until backup '9b66cc4e-4814-4162-8200-aba58aa54b18' is ready
[   1.3 ] Created checkpoint 'be87e5a5-5ed3-4aa5-915a-84f992a7bcf5' (to use in --from-checkpoint-uuid for the next incremental backup)
[   1.4 ] Downloading incremental backup for disk '242c082f-ca51-4cf4-82b9-fd3a3c92f38b'

But the download was stuck, and finally timed out:

[  62.6 ] Finalizing backup
Traceback (most recent call last):
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 525, in <module>
    main()
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 191, in main
    args.command(args)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 225, in cmd_incremental
    download_backup(connection, backup.id, args, incremental=True)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 405, in download_backup
    download_disk(
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 420, in download_disk
    transfer = imagetransfer.create_transfer(
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/helpers/imagetransfer.py", line 240, in create_transfer
    raise RuntimeError(
RuntimeError: Timed out waiting for transfer e97a0008-b57a-4931-8153-d87113da9431

In engine event log we see this error:

VDSM host4 command StartNbdServerVDS failed: Command ['/usr/bin/qemu-img', 'bitmap', '--merge', '0345dff0-cd6a-4a5f-bf3a-bf230318c5c0', '-F', 'qcow2', '-b', '/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840', '/var/lib/vdsm/storage/transient_disks/fbf383f0-8de9-4e14-9a4c-2f9194ade81d/overlay', '0345dff0-cd6a-4a5f-bf3a-bf230318c5c0'] failed with rc=1 out=b'' err=b'qemu-img: Could not open \'/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840\': Failed to get shared "write" lock\nIs another process using the image [/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840]?\n'

Looks like someone was accessing the image while we tried to prepare the disk
for cold backup.

But when this happens, starting the nbd server fails. Looking in vdsm
log we see:

2021-11-19 00:18:41,805+0200 DEBUG (jsonrpc/3) [jsonrpc.JsonRpcServer] Calling 'NBD.start_server' in bridge with {'server_id': 'fbf383f0-8de9-4e14-9a4c-2f9194ade81d', 'config': {'detect_zeroes': True, 'discard': False, 'readonly': True, 'bitmap': '0345dff0-cd6a-4a5f-bf3a-bf230318c5c0', 'sd_id': 'feab3738-c158-4d48-8a41-b5a95c057a50', 'img_id': '242c082f-ca51-4cf4-82b9-fd3a3c92f38b', 'vol_id': '44d498a1-54a5-4371-8eda-02d839d7c840', 'backing_chain': True}} (__init__:333)

We created an overlay image using transient disk:

2021-11-19 00:18:41,941+0200 INFO  (jsonrpc/3) [storage.transientdisk] Creating transient disk /var/lib/vdsm/storage/transient_disks/fbf383f0-8de9-4e14-9a4c-2f9194ade81d/overlay (transientdisk:61)

Then we create qcow2 image in the overlay using volume
44d498a1-54a5-4371-8eda-02d839d7c840 as a backing file:

2021-11-19 00:18:41,941+0200 DEBUG (jsonrpc/3) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/nice -n 19 /usr/bin/ionice -c 3 /usr/bin/qemu-img create -f qcow2 -o compat=1.1 -b /rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840 -F qcow2 /var/lib/vdsm/storage/transient_disks/fbf383f0-8de9-4e14-9a4c-2f9194ade81d/overlay (cwd None) (commands:154)

But at the same time there is another thread accessing the same volume!

2021-11-19 00:18:41,952+0200 DEBUG (tasks/4) [storage.misc.exccmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n512 bytes copied, 0.000743612 s, 689 kB/s\n'; <rc> = 0 (commands:247)

2021-11-19 00:18:41,952+0200 DEBUG (tasks/4) [storage.misc] err: [b'1+0 records in', b'1+0 records out', b'512 bytes copied, 0.000743612 s, 689 kB/s'], size: 512 (misc:112)

2021-11-19 00:18:41,952+0200 INFO  (tasks/4) [storage.volumemanifest] Starting volume operation on 44d498a1-54a5-4371-8eda-02d839d7c840 (generation=2, set_illegal=False) (volume:691)

tasks/4 is adding a bitmap in the volume used as the backing file of
the overlay...

2021-11-19 00:18:41,953+0200 INFO  (tasks/4) [storage.bitmaps] Add bitmap be87e5a5-5ed3-4aa5-915a-84f992a7bcf5 to '/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840' (bitmaps:168)

Thread tasks/4 is starting to add a bitmap to volume 44d498a1-54a5-4371-8eda-02d839d7c840

2021-11-19 00:18:41,953+0200 DEBUG (tasks/4) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/nice -n 19 /usr/bin/ionice -c 3 /usr/bin/qemu-img bitmap --add --enable /rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840 be87e5a5-5ed3-4aa5-915a-84f992a7bcf5 (cwd /rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b) (commands:154)

Thread jsonrpc/3 finished creating the overlay image...

2021-11-19 00:18:42,002+0200 DEBUG (jsonrpc/3) [storage.operation] SUCCESS: <err> = b''; <rc> = 0 (operation:170)

Thread jsonrpc/3 adds a bitmap in the overlay...

2021-11-19 00:18:42,003+0200 DEBUG (jsonrpc/3) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/nice -n 19 /usr/bin/ionice -c 3 /usr/bin/qemu-img bitmap --add --enable /var/lib/vdsm/storage/transient_disks/fbf383f0-8de9-4e14-9a4c-2f9194ade81d/overlay 0345dff0-cd6a-4a5f-bf3a-bf230318c5c0 (cwd /var/lib/vdsm/storage/transient_disks/fbf383f0-8de9-4e14-9a4c-2f9194ade81d) (commands:154)

2021-11-19 00:18:42,035+0200 DEBUG (jsonrpc/3) [storage.operation] SUCCESS: <err> = b''; <rc> = 0 (operation:170)

Thread jsonrpc/3 tries to merge bitmaps into the overlay...

2021-11-19 00:18:42,035+0200 DEBUG (jsonrpc/3) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/nice -n 19 /usr/bin/ionice -c 3 /usr/bin/qemu-img bitmap --merge 0345dff0-cd6a-4a5f-bf3a-bf230318c5c0 -F qcow2 -b /rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840 /var/lib/vdsm/storage/transient_disks/fbf383f0-8de9-4e14-9a4c-2f9194ade81d/overlay 0345dff0-cd6a-4a5f-bf3a-bf230318c5c0 (cwd /rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b) (commands:154)

And it fails, since someone else (thread tasks/4) holds a lock on the backing file.

2021-11-19 00:18:42,063+0200 DEBUG (jsonrpc/3) [storage.operation] FAILED: <err> = b'qemu-img: Could not open \'/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840\': Failed to get shared "write" lock\nIs another process using the image [/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840]?\n'; <rc> = 1 (operation:170)

And tasks/4 finished.

2021-11-19 00:18:42,089+0200 DEBUG (tasks/4) [storage.operation] SUCCESS: <err> = b''; <rc> = 0 (operation:170)

We modify the image being backed up - bad bad bug.

Now the NBD.start_server request fails, as expected:

2021-11-19 00:18:42,065+0200 ERROR (jsonrpc/3) [storage.dispatcher] FINISH start_nbd_server error=Command ['/usr/bin/qemu-img', 'bitmap', '--merge', '0345dff0-cd6a-4a5f-bf3a-bf230318c5c0', '-F', 'qcow2', '-b', '/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840', '/var/lib/vdsm/storage/transient_disks/fbf383f0-8de9-4e14-9a4c-2f9194ade81d/overlay', '0345dff0-cd6a-4a5f-bf3a-bf230318c5c0'] failed with rc=1 out=b'' err=b'qemu-img: Could not open \'/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840\': Failed to get shared "write" lock\nIs another process using the image [/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840]?\n' (dispatcher:87)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/dispatcher.py", line 74, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 110, in wrapper
    return m(self, *a, **kw)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 1190, in prepare
    raise self.error
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 884, in _run
    return fn(*args, **kargs)
  File "<decorator-gen-231>", line 2, in start_nbd_server
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 3757, in start_nbd_server
    url = nbd.start_server(server_id, config)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/nbd.py", line 130, in start_server
    path = _create_overlay(server_id, vol.volumePath, cfg.bitmap)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/nbd.py", line 207, in _create_overlay
    src_img, bitmap, "qcow2", overlay, bitmap).run()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/operation.py", line 83, in run
    self._finalize(out, err)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/operation.py", line 179, in _finalize
    raise cmdutils.Error(self._cmd, rc, out, err)
vdsm.common.cmdutils.Error: Command ['/usr/bin/qemu-img', 'bitmap', '--merge', '0345dff0-cd6a-4a5f-bf3a-bf230318c5c0', '-F', 'qcow2', '-b', '/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840', '/var/lib/vdsm/storage/transient_disks/fbf383f0-8de9-4e14-9a4c-2f9194ade81d/overlay', '0345dff0-cd6a-4a5f-bf3a-bf230318c5c0'] failed with rc=1 out=b'' err=b'qemu-img: Could not open \'/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840\': Failed to get shared "write" lock\nIs another process using the image [/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840]?\n'

On engine side, we see the error:

2021-11-19 00:18:42,077+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.StartNbdServerVDSCommand] (default task-1) [ea5f41c1-2fc3-4414-b85b-36c6da71d6af] FINISH, StartNbdServerVDSCommand, return: , log id: 78f11471
2021-11-19 00:18:42,078+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-1) [ea5f41c1-2fc3-4414-b85b-36c6da71d6af] Failed to prepare image for transfer session: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to StartNbdServerVDS, error = Command ['/usr/bin/qemu-img', 'bitmap', '--merge', '0345dff0-cd6a-4a5f-bf3a-bf230318c5c0', '-F', 'qcow2', '-b', '/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840', '/var/lib/vdsm/storage/transient_disks/fbf383f0-8de9-4e14-9a4c-2f9194ade81d/overlay', '0345dff0-cd6a-4a5f-bf3a-bf230318c5c0'] failed with rc=1 out=b'' err=b'qemu-img: Could not open \'/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840\': Failed to get shared "write" lock\nIs another process using the image [/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840]?\n', code = 100 (Failed with error GeneralException and code 100): org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to StartNbdServerVDS, error = Command ['/usr/bin/qemu-img', 'bitmap', '--merge', '0345dff0-cd6a-4a5f-bf3a-bf230318c5c0', '-F', 'qcow2', '-b', '/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840', '/var/lib/vdsm/storage/transient_disks/fbf383f0-8de9-4e14-9a4c-2f9194ade81d/overlay', '0345dff0-cd6a-4a5f-bf3a-bf230318c5c0'] failed with rc=1 out=b'' err=b'qemu-img: Could not open \'/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840\': Failed to get shared "write" lock\nIs another process using the image [/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840]?\n', code = 100 (Failed with error GeneralException and code 100)

But the backup was started successfully!

Following the events for this backup in the engine log:

2021-11-19 00:18:39,868+02 INFO [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (default task-1) [17244d70-d10c-403f-999d-0ce6cf969cdf] Change VM '6e95d38f-d9b8-4955-878c-da6d631d0ab2' backup '9b66cc4e-4814-4162-8200-aba58aa54b18' phase from 'INITIALIZING' to 'STARTING'

2021-11-19 00:18:40,585+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.AddVolumeBitmapCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-17) [17244d70-d10c-403f-999d-0ce6cf969cdf] Running command: AddVolumeBitmapCommand internal: true.

2021-11-19 00:18:40,630+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.AddVolumeBitmapVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-17) [17244d70-d10c-403f-999d-0ce6cf969cdf] FINISH, AddVolumeBitmapVDSCommand, return: , log id: 73d3acaa2

Creating first bitmap finished in 0.045 seconds

2021-11-19 00:18:40,647+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.AddVolumeBitmapCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-17) [17244d70-d10c-403f-999d-0ce6cf969cdf] Running command: AddVolumeBitmapCommand internal: true.

2021-11-19 00:18:40,800+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.AddVolumeBitmapVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-17) [17244d70-d10c-403f-999d-0ce6cf969cdf] START, AddVolumeBitmapVDSCommand(HostName = host4, VolumeBitmapVDSCommandParameters:{hostId='85a424f7-7cd6-4a89-993c-539d6c10fd0c', storageDomainId='feab3738-c158-4d48-8a41-b5a95c057a50', jobId='b2875d2b-ea1c-4cc7-bafb-8ebfd5d81cbc'}), log id: 135c503a

Creating a second bitmap finished in 0.153 seconds.

2021-11-19 00:18:40,820+02 INFO [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-17) [17244d70-d10c-403f-999d-0ce6cf969cdf] Change VM '6e95d38f-d9b8-4955-878c-da6d631d0ab2' backup '9b66cc4e-4814-4162-8200-aba58aa54b18' phase from 'STARTING' to 'READY'

In vdsm we see:

2021-11-19 00:18:40,623+0200 INFO  (jsonrpc/7) [vdsm.api] START sdm_add_bitmap(job_id='cce515dc-3d31-4178-a2ae-66c5e5eea641', vol_info={'generation': 2, 'sd_id': 'feab3738-c158-4d48-8a41-b5a95c057a50', 'img_id': '242c082f-ca51-4cf4-82b9-fd3a3c92f38b', 'vol_id': '44d498a1-54a5-4371-8eda-02d839d7c840'}, bitmap='be87e5a5-5ed3-4aa5-915a-84f992a7bcf5') from=::ffff:192.168.122.11,45386, flow_id=17244d70-d10c-403f-999d-0ce6cf969cdf, task_id=4c8986ec-fa86-4b7b-8f92-ffc9f3dc0742 (api:48)

2021-11-19 00:18:40,624+0200 DEBUG (jsonrpc/7) [storage.taskmanager] scheduled job add_bitmap for task 4c8986ec-fa86-4b7b-8f92-ffc9f3dc0742 (taskManager:79)

2021-11-19 00:18:40,624+0200 INFO  (jsonrpc/7) [vdsm.api] FINISH sdm_add_bitmap return=None from=::ffff:192.168.122.11,45386, flow_id=17244d70-d10c-403f-999d-0ce6cf969cdf, task_id=4c8986ec-fa86-4b7b-8f92-ffc9f3dc0742 (api:54)

The command finished in 1 millisecond - because it did not run yet - it is
scheduled to run later, like all storage jobs.
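
To illustrate the pattern (this is not vdsm's code, just the same fire-and-forget
shape using a standard thread pool):

    import time
    from concurrent.futures import ThreadPoolExecutor

    pool = ThreadPoolExecutor(max_workers=4)   # stands in for vdsm's tasks/N workers

    def run_add_bitmap(job_id):
        # The real work (activating LVs, running qemu-img bitmap, ...) happens here.
        time.sleep(1.8)
        print(f"FINISH task for job {job_id}")

    def sdm_add_bitmap(job_id):
        # The API verb only schedules the work and returns immediately,
        # which is why the vdsm.api log shows FINISH after about 1 millisecond.
        pool.submit(run_add_bitmap, job_id)
        return None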

2021-11-19 00:18:40,624+0200 INFO  (tasks/4) [storage.threadpool.workerthread] START task 4c8986ec-fa86-4b7b-8f92-ffc9f3dc0742 (cmd=<bound method Task.commit of <vdsm.storage.task.Task object at 0x7f8e47f18048>>, args=None) (threadPool:146)

Yes, this is tasks/4 we met before. It finished in 1.8 seconds.

2021-11-19 00:18:42,445+0200 INFO  (tasks/4) [storage.threadpool.workerthread] FINISH task 4c8986ec-fa86-4b7b-8f92-ffc9f3dc0742 (threadPool:148)


## Complete flow showing failure 1:

1. Job b2bb3f7c-702f-481b-bbbf-a92cb73e29b5 starting

2021-11-28 22:30:10,073+0200 INFO  (jsonrpc/0) [vdsm.api] START sdm_add_bitmap(job_id='b2bb3f7c-702f-481b-bbbf-a92cb73e29b5', vol_info={'generation': 7, 'sd_id': 'feab3738-c158-4d48-8a41-b5a95c057a50', 'img_id': '242c082f-ca51-4cf4-82b9-fd3a3c92f38b', 'vol_id': '44d498a1-54a5-4371-8eda-02d839d7c840'}, bitmap='a2993507-d5eb-4191-b769-a5cd943e795b') from=::ffff:192.168.122.11,53010, flow_id=96fec517-538f-46d2-9a88-024240de4824, task_id=9542a71b-8eba-473e-bd0a-92332af9b99b (api:48)

2. Job b2bb3f7c-702f-481b-bbbf-a92cb73e29b5 scheduled task 9542a71b-8eba-473e-bd0a-92332af9b99b

2021-11-28 22:30:10,073+0200 INFO  (jsonrpc/0) [vdsm.api] FINISH sdm_add_bitmap return=None from=::ffff:192.168.122.11,53010, flow_id=96fec517-538f-46d2-9a88-024240de4824, task_id=9542a71b-8eba-473e-bd0a-92332af9b99b (api:54)

3. Task 9542a71b-8eba-473e-bd0a-92332af9b99b starting on thread tasks/2

2021-11-28 22:30:10,105+0200 INFO  (tasks/2) [storage.threadpool.workerthread] START task 9542a71b-8eba-473e-bd0a-92332af9b99b (cmd=<bound method Task.commit of <vdsm.storage.task.Task object at 0x7f782197cac8>>, args=None) (threadPool:146)

4. Engine prepares image for image transfer

2021-11-28 22:30:10,996+0200 INFO  (jsonrpc/4) [vdsm.api] START prepareImage(sdUUID='feab3738-c158-4d48-8a41-b5a95c057a50', spUUID='3f8e08e6-5485-429e-a686-63754906b27b', imgUUID='242c082f-ca51-4cf4-82b9-fd3a3c92f38b', leafUUID='44d498a1-54a5-4371-8eda-02d839d7c840', allowIllegal=True) from=::ffff:192.168.122.11,53010, flow_id=22b1fe4d-3498-4a63-adfc-c4ff99043585, task_id=be70f82f-cab3-49cd-b4ef-84658b2a3fce (api:48)

5. LV was already active (activated by thread tasks/2), so it is refreshed

2021-11-28 22:30:11,103+0200 INFO  (jsonrpc/4) [storage.lvm] Refreshing active lvs: vg=feab3738-c158-4d48-8a41-b5a95c057a50 lvs=['9d63f782-7467-4243-af1e-5c1f8b49c111'] (lvm:1678)

6. Thread tasks/2 deactivates the LV

2021-11-28 22:30:11,137+0200 INFO  (tasks/2) [storage.lvm] Deactivating lvs: vg=feab3738-c158-4d48-8a41-b5a95c057a50 lvs=['9d63f782-7467-4243-af1e-5c1f8b49c111'] (lvm:1693)

7. Prepare image finished, but LV was deactivated by tasks/2

2021-11-28 22:30:11,413+0200 INFO  (jsonrpc/4) [vdsm.api] FINISH prepareImage return={'path': '/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840', 'info': {'type': 'block', 'path': '/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840'}, 'imgVolumesInfo': [{'domainID': 'feab3738-c158-4d48-8a41-b5a95c057a50', 'imageID': '242c082f-ca51-4cf4-82b9-fd3a3c92f38b', 'volumeID': '44d498a1-54a5-4371-8eda-02d839d7c840', 'path': '/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840', 'leasePath': '/dev/feab3738-c158-4d48-8a41-b5a95c057a50/leases', 'leaseOffset': 110100480}, {'domainID': 'feab3738-c158-4d48-8a41-b5a95c057a50', 'imageID': '242c082f-ca51-4cf4-82b9-fd3a3c92f38b', 'volumeID': '9d63f782-7467-4243-af1e-5c1f8b49c111', 'path': '/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/9d63f782-7467-4243-af1e-5c1f8b49c111', 'leasePath': '/dev/feab3738-c158-4d48-8a41-b5a95c057a50/leases', 'leaseOffset': 109051904}]} from=::ffff:192.168.122.11,53010, flow_id=22b1fe4d-3498-4a63-adfc-c4ff99043585, task_id=be70f82f-cab3-49cd-b4ef-84658b2a3fce (api:54)

8. Task 9542a71b-8eba-473e-bd0a-92332af9b99b finished

2021-11-28 22:30:11,328+0200 INFO  (tasks/2) [storage.threadpool.workerthread] FINISH task 9542a71b-8eba-473e-bd0a-92332af9b99b (threadPool:148)

9. Engine starts NBD server

2021-11-28 22:30:11,416+0200 INFO  (jsonrpc/7) [vdsm.api] START start_nbd_server(server_id='e1e63d2b-1253-4fbb-9b48-5637c7664d15', config={'detect_zeroes': True, 'discard': False, 'readonly': True, 'bitmap': 'd333d467-da40-4169-9963-c490fc7620c0', 'sd_id': 'feab3738-c158-4d48-8a41-b5a95c057a50', 'img_id': '242c082f-ca51-4cf4-82b9-fd3a3c92f38b', 'vol_id': '44d498a1-54a5-4371-8eda-02d839d7c840', 'backing_chain': True}) from=::ffff:192.168.122.11,53010, flow_id=22b1fe4d-3498-4a63-adfc-c4ff99043585, task_id=6caa126e-1493-47fd-bba8-bf9d7ee5cb93 (api:48)

10. Starting NBD server failed, since the LV was deactivated by tasks/2.

2021-11-28 22:30:11,446+0200 INFO  (jsonrpc/7) [vdsm.api] FINISH start_nbd_server error=Command ['/usr/bin/qemu-img', 'info', '--output', 'json', '-f', 'qcow2', '--backing-chain', '/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/44d498a1-54a5-4371-8eda-02d839d7c840'] failed with rc=1 out=b'' err=b"qemu-img: Could not open '/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/9d63f782-7467-4243-af1e-5c1f8b49c111': Could not open '/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/242c082f-ca51-4cf4-82b9-fd3a3c92f38b/9d63f782-7467-4243-af1e-5c1f8b49c111': No such file or directory\n" from=::ffff:192.168.122.11,53010, flow_id=22b1fe4d-3498-4a63-adfc-c4ff99043585, task_id=6caa126e-1493-47fd-bba8-bf9d7ee5cb93 (api:52)

Comment 3 Nir Soffer 2021-11-29 09:15:30 UTC
How to reproduce:

1. Setup

VM with 2 disks:
- OS qcow2 disk using a template (I used 6g fedora image with ~1.6g of data)
- Empty 10g qcow2 data disk

Single host - otherwise add_bitmap jobs may run on another host, hiding the
incorrect flow.

2. Start full backup to create a checkpoint:

    ./backup_vm.py -c engine-dev full \
        --backup-dir backups \
        6e95d38f-d9b8-4955-878c-da6d631d0ab2

I tried 3 times; the first 2 invocations failed.

3. Run incremental backup using the checkpoint created in step 2.

    ./backup_vm.py -c engine-dev incremental \
        --from-checkpoint-uuid 0345dff0-cd6a-4a5f-bf3a-bf230318c5c0 \
        --backup-dir backups \
        6e95d38f-d9b8-4955-878c-da6d631d0ab2

I tried a few times; most runs failed.

Since this is a race, you need to run both full backup and incremental backup
in a loop to reproduce the errors.
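
A small driver for that loop, equivalent to the shell loops shown in the following
comments (the script path, flags, and UUIDs are the ones already used in this
report):

    import subprocess
    import sys

    VM = "6e95d38f-d9b8-4955-878c-da6d631d0ab2"
    CHECKPOINT = "0345dff0-cd6a-4a5f-bf3a-bf230318c5c0"   # checkpoint created by the full backup in step 2

    def backup(*args):
        # Wraps the backup_vm.py invocations shown in steps 2 and 3 above.
        cmd = ["./backup_vm.py", "-c", "engine-dev", *args, "--backup-dir", "backups", VM]
        return subprocess.run(cmd).returncode == 0

    failures = 0
    for i in range(10):
        print(f"iteration {i + 1}")
        if not backup("incremental", "--from-checkpoint-uuid", CHECKPOINT):
            failures += 1
    print(f"{failures}/10 incremental backups failed")
    sys.exit(1 if failures else 0)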

You can run incremental backup multiple times using the same checkpoint. The
backup creates a new checkpoint, but backing up the same checkpoint multiple
times is valid, and will create identical backups.

Comment 4 Nir Soffer 2021-11-29 10:24:09 UTC
Example reproduction with full backup:

10 backups, 1 succeeded, 9 failed!

$ for i in $(seq 10); do echo iteration $i; ./backup_vm.py -c engine-dev full --backup-dir /data/scratch/backups 6e95d38f-d9b8-4955-878c-da6d631d0ab2; echo "---"; done
iteration 1
[   0.0 ] Starting full backup for VM '6e95d38f-d9b8-4955-878c-da6d631d0ab2'
[   0.6 ] Waiting until backup 'ce9da0d8-7237-4ad0-85ec-61e36df7c43a' is ready
[   1.6 ] Created checkpoint '88fb7415-9267-4171-add9-1607239a3213' (to use in --from-checkpoint-uuid for the next incremental backup)
[   1.7 ] Downloading full backup for disk '242c082f-ca51-4cf4-82b9-fd3a3c92f38b'
[   3.7 ] Image transfer '8babfd11-b878-480d-bc60-ae02115c8072' is ready
[   3.7 ] Finalizing image transfer0 bytes/s                                   
[   6.7 ] Download completed successfully
[   6.7 ] Finalizing backup
Traceback (most recent call last):
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 525, in <module>
    main()
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 191, in main
    args.command(args)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 207, in cmd_full
    download_backup(connection, backup.id, args)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 405, in download_backup
    download_disk(
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 444, in download_disk
    client.download(
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/client/_api.py", line 150, in download
    with _open_http(
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/client/_api.py", line 514, in _open_http
    return http.open(url, mode, cafile=cafile, secure=secure)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 48, in open
    return Backend(url, **options)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 76, in __init__
    self._connect()
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 119, in _connect
    options = self._options()
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 482, in _options
    raise self._reraise(res.status, body)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 585, in _reraise
    raise http.Error(status, msg)
ovirt_imageio._internal.http.Error: Server failed to perform the request, check logs
---
iteration 2
[   0.0 ] Starting full backup for VM '6e95d38f-d9b8-4955-878c-da6d631d0ab2'
[   0.2 ] Waiting until backup 'be2c34f0-46bd-427d-9227-43902636f7b2' is ready
[   1.2 ] Created checkpoint 'b0be8af4-59e6-4a26-b7b4-3d54978a1cdb' (to use in --from-checkpoint-uuid for the next incremental backup)
[   1.3 ] Downloading full backup for disk '242c082f-ca51-4cf4-82b9-fd3a3c92f38b'
[   3.1 ] Image transfer 'f87e12e7-fe19-4130-be2e-08b20ba1c878' is ready
[   3.2 ] Finalizing image transfer0 bytes/s                                   
[   6.2 ] Download completed successfully
[   6.2 ] Finalizing backup
Traceback (most recent call last):
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 525, in <module>
    main()
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 191, in main
    args.command(args)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 207, in cmd_full
    download_backup(connection, backup.id, args)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 405, in download_backup
    download_disk(
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 444, in download_disk
    client.download(
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/client/_api.py", line 150, in download
    with _open_http(
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/client/_api.py", line 514, in _open_http
    return http.open(url, mode, cafile=cafile, secure=secure)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 48, in open
    return Backend(url, **options)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 76, in __init__
    self._connect()
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 119, in _connect
    options = self._options()
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 482, in _options
    raise self._reraise(res.status, body)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 585, in _reraise
    raise http.Error(status, msg)
ovirt_imageio._internal.http.Error: Server failed to perform the request, check logs
---
iteration 3
[   0.0 ] Starting full backup for VM '6e95d38f-d9b8-4955-878c-da6d631d0ab2'
[   0.2 ] Waiting until backup 'e07bde95-31c8-4cde-823f-3e473e076df6' is ready
[   1.2 ] Created checkpoint '8172c36d-8e2f-49d3-a6d9-dfefcda65141' (to use in --from-checkpoint-uuid for the next incremental backup)
[   1.3 ] Downloading full backup for disk '242c082f-ca51-4cf4-82b9-fd3a3c92f38b'
[   3.1 ] Image transfer '73e3088a-5e26-4513-b940-ccd06461a2fb' is ready
[   3.1 ] Finalizing image transfer0 bytes/s                                   
[   6.2 ] Download completed successfully
[   6.2 ] Finalizing backup
Traceback (most recent call last):
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 525, in <module>
    main()
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 191, in main
    args.command(args)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 207, in cmd_full
    download_backup(connection, backup.id, args)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 405, in download_backup
    download_disk(
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 444, in download_disk
    client.download(
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/client/_api.py", line 150, in download
    with _open_http(
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/client/_api.py", line 514, in _open_http
    return http.open(url, mode, cafile=cafile, secure=secure)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 48, in open
    return Backend(url, **options)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 76, in __init__
    self._connect()
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 119, in _connect
    options = self._options()
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 482, in _options
    raise self._reraise(res.status, body)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 585, in _reraise
    raise http.Error(status, msg)
ovirt_imageio._internal.http.Error: Server failed to perform the request, check logs
---
iteration 4
[   0.0 ] Starting full backup for VM '6e95d38f-d9b8-4955-878c-da6d631d0ab2'
[   0.2 ] Waiting until backup '84de46ed-4917-474d-897d-77ed56030084' is ready
[   1.2 ] Created checkpoint 'f51a45e3-10f6-4e99-bd25-9833c1256a8b' (to use in --from-checkpoint-uuid for the next incremental backup)
[   1.3 ] Downloading full backup for disk '242c082f-ca51-4cf4-82b9-fd3a3c92f38b'
[   3.1 ] Image transfer 'dda45081-f8dd-477c-ab78-e49ed7f237e5' is ready
[   3.1 ] Finalizing image transfer0 bytes/s                                   
[   5.2 ] Download completed successfully
[   5.2 ] Finalizing backup
Traceback (most recent call last):
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 525, in <module>
    main()
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 191, in main
    args.command(args)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 207, in cmd_full
    download_backup(connection, backup.id, args)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 405, in download_backup
    download_disk(
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 444, in download_disk
    client.download(
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/client/_api.py", line 150, in download
    with _open_http(
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/client/_api.py", line 514, in _open_http
    return http.open(url, mode, cafile=cafile, secure=secure)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 48, in open
    return Backend(url, **options)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 76, in __init__
    self._connect()
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 119, in _connect
    options = self._options()
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 482, in _options
    raise self._reraise(res.status, body)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 585, in _reraise
    raise http.Error(status, msg)
ovirt_imageio._internal.http.Error: Server failed to perform the request, check logs
---
iteration 5
[   0.0 ] Starting full backup for VM '6e95d38f-d9b8-4955-878c-da6d631d0ab2'
[   0.2 ] Waiting until backup 'd438aafb-d57d-4027-87db-cf33f0817dd3' is ready
[   1.2 ] Created checkpoint '86911693-4156-4452-aa55-4acb1a90be23' (to use in --from-checkpoint-uuid for the next incremental backup)
[   1.3 ] Downloading full backup for disk '242c082f-ca51-4cf4-82b9-fd3a3c92f38b'
[   2.9 ] Image transfer '0ae2fe92-6b95-4631-8b02-4d7e549eb4e5' is ready
[   2.9 ] Finalizing image transfer0 bytes/s                                   
[   5.9 ] Download completed successfully
[   5.9 ] Finalizing backup
Traceback (most recent call last):
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 525, in <module>
    main()
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 191, in main
    args.command(args)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 207, in cmd_full
    download_backup(connection, backup.id, args)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 405, in download_backup
    download_disk(
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 444, in download_disk
    client.download(
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/client/_api.py", line 150, in download
    with _open_http(
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/client/_api.py", line 514, in _open_http
    return http.open(url, mode, cafile=cafile, secure=secure)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 48, in open
    return Backend(url, **options)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 76, in __init__
    self._connect()
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 119, in _connect
    options = self._options()
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 482, in _options
    raise self._reraise(res.status, body)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 585, in _reraise
    raise http.Error(status, msg)
ovirt_imageio._internal.http.Error: Server failed to perform the request, check logs
---
iteration 6
[   0.0 ] Starting full backup for VM '6e95d38f-d9b8-4955-878c-da6d631d0ab2'
[   0.2 ] Waiting until backup 'ba327596-5c27-48da-beb8-f2c2e51db0ad' is ready
[   1.2 ] Created checkpoint '3672cb60-edad-4794-b38e-50ca3473d40d' (to use in --from-checkpoint-uuid for the next incremental backup)
[   1.3 ] Downloading full backup for disk '242c082f-ca51-4cf4-82b9-fd3a3c92f38b'
[   3.1 ] Image transfer '6375c4b6-7fd9-405e-990c-978e88de1bb1' is ready
[   3.1 ] Finalizing image transfer0 bytes/s                                   
[   6.1 ] Download completed successfully
[   6.1 ] Finalizing backup
Traceback (most recent call last):
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 525, in <module>
    main()
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 191, in main
    args.command(args)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 207, in cmd_full
    download_backup(connection, backup.id, args)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 405, in download_backup
    download_disk(
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 444, in download_disk
    client.download(
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/client/_api.py", line 150, in download
    with _open_http(
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/client/_api.py", line 514, in _open_http
    return http.open(url, mode, cafile=cafile, secure=secure)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 48, in open
    return Backend(url, **options)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 76, in __init__
    self._connect()
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 119, in _connect
    options = self._options()
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 482, in _options
    raise self._reraise(res.status, body)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 585, in _reraise
    raise http.Error(status, msg)
ovirt_imageio._internal.http.Error: Server failed to perform the request, check logs
---
iteration 7
[   0.0 ] Starting full backup for VM '6e95d38f-d9b8-4955-878c-da6d631d0ab2'
[   0.2 ] Waiting until backup 'ff7d4150-561b-4cf3-b5fc-a8ff98f256fe' is ready
[   2.2 ] Created checkpoint '88179ce2-093e-48e6-b44a-e71e04b7f32d' (to use in --from-checkpoint-uuid for the next incremental backup)
[   2.3 ] Downloading full backup for disk '242c082f-ca51-4cf4-82b9-fd3a3c92f38b'
[   3.6 ] Image transfer '1e1cd2cb-b15e-4e66-a9f0-04bc99e3a50b' is ready
[ 100.00% ] 6.00 GiB, 4.75 seconds, 1.26 GiB/s                                 
[   8.4 ] Finalizing image transfer
[  18.6 ] Download completed successfully
[  18.6 ] Downloading full backup for disk '74d8e801-f555-494f-a788-a7e64c8e7606'
[  20.2 ] Image transfer 'df782eb3-799d-40c5-aa95-20f0791f6bcd' is ready
[ 100.00% ] 10.00 GiB, 0.07 seconds, 135.31 GiB/s                              
[  20.2 ] Finalizing image transfer
[  22.3 ] Download completed successfully
[  22.3 ] Finalizing backup
[  31.5 ] Full backup 'ff7d4150-561b-4cf3-b5fc-a8ff98f256fe' completed successfully
---
iteration 8
[   0.0 ] Starting full backup for VM '6e95d38f-d9b8-4955-878c-da6d631d0ab2'
[   0.2 ] Waiting until backup 'b965e31e-7f7a-4ba5-8173-00749b2f5f19' is ready
[   1.2 ] Created checkpoint 'cd2d2948-20b7-4a6a-96ab-0feab49148fb' (to use in --from-checkpoint-uuid for the next incremental backup)
[   1.2 ] Downloading full backup for disk '242c082f-ca51-4cf4-82b9-fd3a3c92f38b'
[   2.8 ] Image transfer '76d82aad-60af-4f41-85f5-3a98809cfcf0' is ready
[   2.8 ] Finalizing image transfer0 bytes/s                                   
[   4.9 ] Download completed successfully
[   4.9 ] Finalizing backup
Traceback (most recent call last):
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 525, in <module>
    main()
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 191, in main
    args.command(args)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 207, in cmd_full
    download_backup(connection, backup.id, args)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 405, in download_backup
    download_disk(
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 444, in download_disk
    client.download(
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/client/_api.py", line 150, in download
    with _open_http(
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/client/_api.py", line 514, in _open_http
    return http.open(url, mode, cafile=cafile, secure=secure)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 48, in open
    return Backend(url, **options)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 76, in __init__
    self._connect()
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 119, in _connect
    options = self._options()
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 482, in _options
    raise self._reraise(res.status, body)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 585, in _reraise
    raise http.Error(status, msg)
ovirt_imageio._internal.http.Error: Server failed to perform the request, check logs
---
iteration 9
[   0.0 ] Starting full backup for VM '6e95d38f-d9b8-4955-878c-da6d631d0ab2'
[   0.2 ] Waiting until backup '9a072ad5-c1dc-4750-a33a-49f6e5889a49' is ready
[   1.2 ] Created checkpoint '6e90df94-9a7d-4050-b4a7-0cb0801b13df' (to use in --from-checkpoint-uuid for the next incremental backup)
[   1.3 ] Downloading full backup for disk '242c082f-ca51-4cf4-82b9-fd3a3c92f38b'
[   2.9 ] Image transfer 'a59b4603-9870-43a5-a621-152bb66c5a16' is ready
[   2.9 ] Finalizing image transfer0 bytes/s                                   
[   5.9 ] Download completed successfully
[   5.9 ] Finalizing backup
Traceback (most recent call last):
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 525, in <module>
    main()
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 191, in main
    args.command(args)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 207, in cmd_full
    download_backup(connection, backup.id, args)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 405, in download_backup
    download_disk(
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 444, in download_disk
    client.download(
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/client/_api.py", line 150, in download
    with _open_http(
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/client/_api.py", line 514, in _open_http
    return http.open(url, mode, cafile=cafile, secure=secure)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 48, in open
    return Backend(url, **options)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 76, in __init__
    self._connect()
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 119, in _connect
    options = self._options()
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 482, in _options
    raise self._reraise(res.status, body)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 585, in _reraise
    raise http.Error(status, msg)
ovirt_imageio._internal.http.Error: Server failed to perform the request, check logs
---
iteration 10
[   0.0 ] Starting full backup for VM '6e95d38f-d9b8-4955-878c-da6d631d0ab2'
[   0.2 ] Waiting until backup '9ff432bc-59cb-4ac1-b73e-e2d0028ff4ad' is ready
[   1.2 ] Created checkpoint '713ea2f3-4b1b-4981-adf8-01489aa4925b' (to use in --from-checkpoint-uuid for the next incremental backup)
[   1.2 ] Downloading full backup for disk '242c082f-ca51-4cf4-82b9-fd3a3c92f38b'
[   2.9 ] Image transfer '88b473b0-19f0-4b5c-af3d-9eadc9db64de' is ready
[   3.0 ] Finalizing image transfer0 bytes/s                                   
[   6.0 ] Download completed successfully
[   6.0 ] Finalizing backup
Traceback (most recent call last):
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 525, in <module>
    main()
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 191, in main
    args.command(args)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 207, in cmd_full
    download_backup(connection, backup.id, args)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 405, in download_backup
    download_disk(
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 444, in download_disk
    client.download(
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/client/_api.py", line 150, in download
    with _open_http(
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/client/_api.py", line 514, in _open_http
    return http.open(url, mode, cafile=cafile, secure=secure)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 48, in open
    return Backend(url, **options)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 76, in __init__
    self._connect()
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 119, in _connect
    options = self._options()
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 482, in _options
    raise self._reraise(res.status, body)
  File "/usr/lib64/python3.10/site-packages/ovirt_imageio/_internal/backends/http.py", line 585, in _reraise
    raise http.Error(status, msg)
ovirt_imageio._internal.http.Error: Server failed to perform the request, check logs
---

Comment 5 Nir Soffer 2021-11-29 10:37:35 UTC
Example reproduction with incremental backup.

First do full backup - I was lucky and it succeeded on the first attempt:

$ ./backup_vm.py -c engine-dev full --backup-dir /data/scratch/backups 6e95d38f-d9b8-4955-878c-da6d631d0ab2
[   0.0 ] Starting full backup for VM '6e95d38f-d9b8-4955-878c-da6d631d0ab2'
[   0.2 ] Waiting until backup '5cf4f02c-2eaf-4e2b-aa91-c90b6431e592' is ready
[   2.3 ] Created checkpoint 'e802b0b4-1765-4f55-8119-3d79399b5f2d' (to use in --from-checkpoint-uuid for the next incremental backup)
[   2.3 ] Downloading full backup for disk '242c082f-ca51-4cf4-82b9-fd3a3c92f38b'
[   3.8 ] Image transfer '28ac3eb2-e5f1-4e72-9ed3-f7e96268b6ae' is ready
[ 100.00% ] 6.00 GiB, 4.37 seconds, 1.37 GiB/s                                 
[   8.2 ] Finalizing image transfer
[  10.3 ] Download completed successfully
[  10.3 ] Downloading full backup for disk '74d8e801-f555-494f-a788-a7e64c8e7606'
[  11.8 ] Image transfer 'e81ad73c-6758-4883-bc82-9d3bd2a76aa4' is ready
[ 100.00% ] 10.00 GiB, 0.07 seconds, 141.55 GiB/s                              
[  11.9 ] Finalizing image transfer
[  14.9 ] Download completed successfully
[  14.9 ] Finalizing backup
[  21.1 ] Full backup '5cf4f02c-2eaf-4e2b-aa91-c90b6431e592' completed successfully

Now run incremental backups in a loop, all using checkpoint e802b0b4-1765-4f55-8119-3d79399b5f2d.

The first attempt succeeded, the second attempt timed out after 60 seconds,
and the third attempt blocked forever waiting for backup completion.

$ for i in $(seq 10); do echo iteration $i; ./backup_vm.py -c engine-dev incremental --from-checkpoint-uuid e802b0b4-1765-4f55-8119-3d79399b5f2d --backup-dir /data/scratch/backups 6e95d38f-d9b8-4955-878c-da6d631d0ab2; echo "---"; done
iteration 1
[   0.0 ] Starting incremental backup for VM '6e95d38f-d9b8-4955-878c-da6d631d0ab2'
[   0.2 ] Waiting until backup 'da90ef42-4bbc-4fe6-8b6f-feb9938854bc' is ready
[   1.2 ] Created checkpoint '964c2cb0-5f47-4ba1-b4c6-929db015ed85' (to use in --from-checkpoint-uuid for the next incremental backup)
[   1.3 ] Downloading incremental backup for disk '242c082f-ca51-4cf4-82b9-fd3a3c92f38b'
[   2.9 ] Image transfer '42038e4a-0b7b-4a42-8a66-7fe4741acf13' is ready
[ 100.00% ] 6.00 GiB, 0.08 seconds, 71.81 GiB/s                                
[   3.0 ] Finalizing image transfer
[   5.0 ] Download completed successfully
[   5.0 ] Downloading incremental backup for disk '74d8e801-f555-494f-a788-a7e64c8e7606'
[   6.5 ] Image transfer '685dafe2-a8f4-4851-9a94-a82b42836bdc' is ready
[ 100.00% ] 10.00 GiB, 0.07 seconds, 152.78 GiB/s                              
[   6.6 ] Finalizing image transfer
[   9.6 ] Download completed successfully
[   9.6 ] Finalizing backup
[  12.8 ] Incremental backup 'da90ef42-4bbc-4fe6-8b6f-feb9938854bc' completed successfully
---
iteration 2
[   0.0 ] Starting incremental backup for VM '6e95d38f-d9b8-4955-878c-da6d631d0ab2'
[   0.2 ] Waiting until backup 'd21d6285-fca6-4b3d-bc23-bc7367a802c0' is ready
[   1.2 ] Created checkpoint 'b91f3a63-7457-4349-8e13-845e9da2347a' (to use in --from-checkpoint-uuid for the next incremental backup)
[   1.2 ] Downloading incremental backup for disk '242c082f-ca51-4cf4-82b9-fd3a3c92f38b'
[  62.6 ] Finalizing backup
Traceback (most recent call last):
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 525, in <module>
    main()
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 191, in main
    args.command(args)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 225, in cmd_incremental
    download_backup(connection, backup.id, args, incremental=True)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 405, in download_backup
    download_disk(
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 420, in download_disk
    transfer = imagetransfer.create_transfer(
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/helpers/imagetransfer.py", line 240, in create_transfer
    raise RuntimeError(
RuntimeError: Timed out waiting for transfer 411acaea-dc6b-4ff9-904c-e3e47e6d9394
---
iteration 3
[   0.0 ] Starting incremental backup for VM '6e95d38f-d9b8-4955-878c-da6d631d0ab2'
[   0.2 ] Waiting until backup 'fd31a96d-7df0-4674-9bee-1ab7d4120a02' is ready
[   1.2 ] Created checkpoint 'a428b068-766d-4e0d-8671-22b59ab79eb0' (to use in --from-checkpoint-uuid for the next incremental backup)
[   1.2 ] Downloading incremental backup for disk '74d8e801-f555-494f-a788-a7e64c8e7606'
[  62.4 ] Finalizing backup

(I interrupted the script after several minutes)

[  62.4 ] Finalizing backup
^CTraceback (most recent call last):
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 225, in cmd_incremental
    download_backup(connection, backup.id, args, incremental=True)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 405, in download_backup
    download_disk(
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 420, in download_disk
    transfer = imagetransfer.create_transfer(
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/helpers/imagetransfer.py", line 240, in create_transfer
    raise RuntimeError(
RuntimeError: Timed out waiting for transfer 08864ab4-9745-450a-b5ca-98f107c43de1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 525, in <module>
    main()
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 191, in main
    args.command(args)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 228, in cmd_incremental
    stop_backup(connection, backup.id, args)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 369, in stop_backup
    time.sleep(1)
KeyboardInterrupt


Checking the VM's backups:

https://engine-dev/ovirt-engine/api/vms/6e95d38f-d9b8-4955-878c-da6d631d0ab2/backups

<backups>
<backup href="/ovirt-engine/api/vms/6e95d38f-d9b8-4955-878c-da6d631d0ab2/backups/fd31a96d-7df0-4674-9bee-1ab7d4120a02" id="fd31a96d-7df0-4674-9bee-1ab7d4120a02">
<actions>
<link href="/ovirt-engine/api/vms/6e95d38f-d9b8-4955-878c-da6d631d0ab2/backups/fd31a96d-7df0-4674-9bee-1ab7d4120a02/finalize" rel="finalize"/>
</actions>
<link href="/ovirt-engine/api/vms/6e95d38f-d9b8-4955-878c-da6d631d0ab2/backups/fd31a96d-7df0-4674-9bee-1ab7d4120a02/disks" rel="disks"/>
<creation_date>2021-11-29T12:27:59.893+02:00</creation_date>
<from_checkpoint_id>e802b0b4-1765-4f55-8119-3d79399b5f2d</from_checkpoint_id>
<modification_date>2021-11-29T12:29:02.228+02:00</modification_date>
<phase>finalizing</phase>
<to_checkpoint_id>a428b068-766d-4e0d-8671-22b59ab79eb0</to_checkpoint_id>
<host href="/ovirt-engine/api/hosts/00000000-0000-0000-0000-000000000000" id="00000000-0000-0000-0000-000000000000"/>
<vm href="/ovirt-engine/api/vms/6e95d38f-d9b8-4955-878c-da6d631d0ab2" id="6e95d38f-d9b8-4955-878c-da6d631d0ab2"/>
</backup>
</backups>

The backup was stuck in the finalizing phase.
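
For reference, here is a minimal sketch (assuming the Python SDK is installed,
with placeholder connection details to adjust) that fetches the same phase
information with ovirtsdk4 instead of reading the XML by hand:

    import ovirtsdk4 as sdk

    # Placeholder connection details - adjust to your engine and credentials.
    connection = sdk.Connection(
        url="https://engine-dev/ovirt-engine/api",
        username="admin@internal",
        password="password",
        ca_file="ca.pem")

    def get_backup_phase(connection, vm_id, backup_id):
        # Same resource as /vms/{vm_id}/backups/{backup_id} above.
        backup_service = (connection.system_service()
                          .vms_service()
                          .vm_service(vm_id)
                          .backups_service()
                          .backup_service(backup_id))
        return backup_service.get().phase

    phase = get_backup_phase(
        connection,
        "6e95d38f-d9b8-4955-878c-da6d631d0ab2",
        "fd31a96d-7df0-4674-9bee-1ab7d4120a02")
    print("backup phase:", phase)  # a stuck backup keeps reporting finalizing
    connection.close()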

Pavel, isn't this fixed in engine 4.5.0-0.0.master.20211118145311.el8?

Comment 6 Nir Soffer 2021-11-29 11:01:35 UTC
Testing incremental backup again with the latest engine (built yesterday):

$ rpm -q ovirt-engine
ovirt-engine-4.5.0-0.2.master.20211128100822.el8.noarch

$ for i in $(seq 10); do echo iteration $i; ./backup_vm.py -c engine-dev incremental --from-checkpoint-uuid e802b0b4-1765-4f55-8119-3d79399b5f2d --backup-dir /data/scratch/backups 6e95d38f-d9b8-4955-878c-da6d631d0ab2; echo "---"; done
iteration 1
[   0.0 ] Starting incremental backup for VM '6e95d38f-d9b8-4955-878c-da6d631d0ab2'
[   0.6 ] Waiting until backup 'e0b2608b-7652-4fec-b8f2-10e82c545d80' is ready
[   2.7 ] Created checkpoint 'cebe36f1-1aed-4440-9eff-fa54fda14e00' (to use in --from-checkpoint-uuid for the next incremental backup)
[   2.8 ] Downloading incremental backup for disk '74d8e801-f555-494f-a788-a7e64c8e7606'
[   4.7 ] Image transfer '6fa5ea1f-67ee-47be-b63a-55f30774a050' is ready
[ 100.00% ] 10.00 GiB, 0.07 seconds, 144.69 GiB/s                              
[   4.8 ] Finalizing image transfer
[   7.8 ] Download completed successfully
[   7.9 ] Downloading incremental backup for disk '242c082f-ca51-4cf4-82b9-fd3a3c92f38b'
[   9.5 ] Image transfer '36d590c7-60f2-460f-bdfb-1e9d4e16b361' is ready
[ 100.00% ] 6.00 GiB, 0.08 seconds, 78.79 GiB/s                                
[   9.6 ] Finalizing image transfer
[  12.6 ] Download completed successfully
[  12.6 ] Incremental backup not available for disk '242c082f-ca51-4cf4-82b9-fd3a3c92f38b'
[  12.6 ] Downloading full backup for disk '242c082f-ca51-4cf4-82b9-fd3a3c92f38b'
[  12.7 ] Finalizing backup

(I interrupted the script after a few minutes)

^CTraceback (most recent call last):
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 225, in cmd_incremental
    download_backup(connection, backup.id, args, incremental=True)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 405, in download_backup
    download_disk(
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 420, in download_disk
    transfer = imagetransfer.create_transfer(
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/helpers/imagetransfer.py", line 200, in create_transfer
    transfer = transfers_service.add(transfer)
  File "/home/nsoffer/.local/lib/python3.10/site-packages/ovirt_engine_sdk_python-4.4.15-py3.10-linux-x86_64.egg/ovirtsdk4/services.py", line 14153, in add
    return self._internal_add(image_transfer, headers, query, wait)
  File "/home/nsoffer/.local/lib/python3.10/site-packages/ovirt_engine_sdk_python-4.4.15-py3.10-linux-x86_64.egg/ovirtsdk4/service.py", line 232, in _internal_add
    return future.wait() if wait else future
  File "/home/nsoffer/.local/lib/python3.10/site-packages/ovirt_engine_sdk_python-4.4.15-py3.10-linux-x86_64.egg/ovirtsdk4/service.py", line 55, in wait
    return self._code(response)
  File "/home/nsoffer/.local/lib/python3.10/site-packages/ovirt_engine_sdk_python-4.4.15-py3.10-linux-x86_64.egg/ovirtsdk4/service.py", line 229, in callback
    self._check_fault(response)
  File "/home/nsoffer/.local/lib/python3.10/site-packages/ovirt_engine_sdk_python-4.4.15-py3.10-linux-x86_64.egg/ovirtsdk4/service.py", line 132, in _check_fault
    self._raise_error(response, body)
  File "/home/nsoffer/.local/lib/python3.10/site-packages/ovirt_engine_sdk_python-4.4.15-py3.10-linux-x86_64.egg/ovirtsdk4/service.py", line 118, in _raise_error
    raise error
ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Cannot transfer Virtual Disk. The VM backup is not in READY phase, backup phase is FAILED.]". HTTP response code is 409.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 525, in <module>
    main()
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 191, in main
    args.command(args)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 228, in cmd_incremental
    stop_backup(connection, backup.id, args)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 370, in stop_backup
    backup = get_backup(connection, backup_service, backup_uuid)
  File "/home/nsoffer/src/python-ovirt-engine-sdk4/examples/./backup_vm.py", line 510, in get_backup
    backup = backup_service.get()
  File "/home/nsoffer/.local/lib/python3.10/site-packages/ovirt_engine_sdk_python-4.4.15-py3.10-linux-x86_64.egg/ovirtsdk4/services.py", line 33917, in get
    return self._internal_get(headers, query, wait)
  File "/home/nsoffer/.local/lib/python3.10/site-packages/ovirt_engine_sdk_python-4.4.15-py3.10-linux-x86_64.egg/ovirtsdk4/service.py", line 211, in _internal_get
    return future.wait() if wait else future
  File "/home/nsoffer/.local/lib/python3.10/site-packages/ovirt_engine_sdk_python-4.4.15-py3.10-linux-x86_64.egg/ovirtsdk4/service.py", line 54, in wait
    response = self._connection.wait(self._context)
  File "/home/nsoffer/.local/lib/python3.10/site-packages/ovirt_engine_sdk_python-4.4.15-py3.10-linux-x86_64.egg/ovirtsdk4/__init__.py", line 496, in wait
    return self.__wait(context, failed_auth)
  File "/home/nsoffer/.local/lib/python3.10/site-packages/ovirt_engine_sdk_python-4.4.15-py3.10-linux-x86_64.egg/ovirtsdk4/__init__.py", line 532, in __wait
    self._multi.select(1.0)
KeyboardInterrupt


It looks like backup_vm.py detected a failed backup:

ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Cannot transfer Virtual Disk. The VM backup is not in READY phase, backup phase is FAILED.]". HTTP response code is 409.

It then tried to finalize the backup, but the backup is stuck in the
finalizing phase again:

https://engine-dev/ovirt-engine/api/vms/6e95d38f-d9b8-4955-878c-da6d631d0ab2/backups

<backups>
<backup href="/ovirt-engine/api/vms/6e95d38f-d9b8-4955-878c-da6d631d0ab2/backups/e0b2608b-7652-4fec-b8f2-10e82c545d80" id="e0b2608b-7652-4fec-b8f2-10e82c545d80">
<actions>
<link href="/ovirt-engine/api/vms/6e95d38f-d9b8-4955-878c-da6d631d0ab2/backups/e0b2608b-7652-4fec-b8f2-10e82c545d80/finalize" rel="finalize"/>
</actions>
<link href="/ovirt-engine/api/vms/6e95d38f-d9b8-4955-878c-da6d631d0ab2/backups/e0b2608b-7652-4fec-b8f2-10e82c545d80/disks" rel="disks"/>
<creation_date>2021-11-29T12:54:25.777+02:00</creation_date>
<from_checkpoint_id>e802b0b4-1765-4f55-8119-3d79399b5f2d</from_checkpoint_id>
<modification_date>2021-11-29T12:54:38.019+02:00</modification_date>
<phase>finalizing</phase>
<to_checkpoint_id>cebe36f1-1aed-4440-9eff-fa54fda14e00</to_checkpoint_id>
<host href="/ovirt-engine/api/hosts/00000000-0000-0000-0000-000000000000" id="00000000-0000-0000-0000-000000000000"/>
<vm href="/ovirt-engine/api/vms/6e95d38f-d9b8-4955-878c-da6d631d0ab2" id="6e95d38f-d9b8-4955-878c-da6d631d0ab2"/>
</backup>
</backups>

Looks like the bug where the backup gets stuck in the finalizing state is
not fixed yet in 4.5.

Comment 7 Nir Soffer 2021-11-29 13:27:58 UTC
Notes for testing on 4.4.9/10:

1. Test with a single host - this makes the issue much more reproducible.

2. Test with the standard environment (3 hosts) - closer to a real
   system. This will show how likely users are to hit this failure.

Comment 8 Eyal Shenitzky 2021-11-29 14:21:49 UTC
> Looks like the bug where the backup gets stuck in the finalizing state is
> not fixed yet in 4.5.

Right, it is already reported in bug 1985746 and it is not fixed yet.

Amit, can you please check if this bug was also found on 4.4.9/4.4.10?

Comment 9 Eyal Shenitzky 2021-11-29 14:30:53 UTC
Nir, what is the severity of this bug? Why is it urgent?

Comment 10 Amit Sharir 2021-11-29 14:49:55 UTC
Following our meeting today, I reproduced the issue on 
vdsm-4.40.100.1-1.el8ev.x86_64 / ovirt-engine-4.4.10-0.17.el8ev.noarch

I created a VM with 2 disks:
- an OS qcow2 disk created from a template
- an empty 10 GiB qcow2 data disk

For the "full backup" procedure with 3 hosts I got: 3/10 failures.
 
2/3 failures were:
Error message- 
Traceback (most recent call last):
  File "./backup_vm.py", line 525, in <module>
    main()
  File "./backup_vm.py", line 191, in main
    args.command(args)
  File "./backup_vm.py", line 207, in cmd_full
    download_backup(connection, backup.id, args)
  File "./backup_vm.py", line 407, in download_backup
    incremental=has_incremental)
  File "./backup_vm.py", line 425, in download_disk
    timeout_policy=types.ImageTransferTimeoutPolicy(args.timeout_policy))
  File "/usr/share/doc/python3-ovirt-engine-sdk4/examples/helpers/imagetransfer.py", line 241, in create_transfer
    "Timed out waiting for transfer {}".format(transfer.id))
RuntimeError: Timed out waiting for transfer c9945bd1-0d4b-4c07-96dc-b655225ae515

In 1 of the 3 failures the backup got stuck in the finalizing state, as
mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1985746

Comment 11 Amit Sharir 2021-11-29 15:28:47 UTC
I did the same flow mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=2027260#c10 with fewer active hosts (full backup, 2 hosts, for a VM with 2 disks):
The issue didn't reproduce on the older version:
vdsm-4.40.90.4-1.el8ev.x86_64/ovirt-engine-4.4.9.5-0.1.el8ev.noarch


Please give me a "need info" if any additional information/reproductions are required from QE.

Comment 12 Amit Sharir 2021-11-30 11:07:18 UTC
When using only 1 active host it was easier to reproduce the issue.
Tested on vdsm-4.40.100.1-1.el8ev.x86_64 / ovirt-engine-4.4.10-0.17.el8ev.noarch

I created a VM with 2 disks:
- an OS qcow2 disk created from a template
- an empty 10 GiB qcow2 data disk

For the "full backup" procedure I got: 4/6 failures (in the 6th out of 10 iterations of the script, the backup was stuck in finalizing).

Examples of the error messages I received:


Traceback (most recent call last):
  File "./backup_vm.py", line 525, in <module>
    main()
  File "./backup_vm.py", line 191, in main
    args.command(args)
  File "./backup_vm.py", line 207, in cmd_full
    download_backup(connection, backup.id, args)
  File "./backup_vm.py", line 407, in download_backup
    incremental=has_incremental)
  File "./backup_vm.py", line 425, in download_disk
    timeout_policy=types.ImageTransferTimeoutPolicy(args.timeout_policy))
  File "/usr/share/doc/python3-ovirt-engine-sdk4/examples/helpers/imagetransfer.py", line 241, in create_transfer
    "Timed out waiting for transfer {}".format(transfer.id))
RuntimeError: Timed out waiting for transfer bf600dd1-8675-4164-a6ac-64ad106c4747

---

Traceback (most recent call last):
  File "./backup_vm.py", line 525, in <module>
    main()
  File "./backup_vm.py", line 191, in main
    args.command(args)
  File "./backup_vm.py", line 207, in cmd_full
    download_backup(connection, backup.id, args)
  File "./backup_vm.py", line 407, in download_backup
    incremental=has_incremental)
  File "./backup_vm.py", line 452, in download_disk
    **extra_args)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/client/_api.py", line 155, in download
    proxy_url=proxy_url) as src:
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/client/_api.py", line 514, in _open_http
    return http.open(url, mode, cafile=cafile, secure=secure)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/backends/http.py", line 48, in open
    return Backend(url, **options)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/backends/http.py", line 76, in __init__
    self._connect()
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/backends/http.py", line 119, in _connect
    options = self._options()
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/backends/http.py", line 482, in _options
    raise self._reraise(res.status, body)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/backends/http.py", line 585, in _reraise
    raise http.Error(status, msg)
ovirt_imageio._internal.http.Error: Server failed to perform the request, check logs
---

Comment 13 Nir Soffer 2021-11-30 13:46:40 UTC
(In reply to Amit Sharir from comment #12)
> When using only 1 active host it was easier to reproduce the issue.
> Tested on vdsm-4.40.100.1-1.el8ev.x86_64 /
> ovirt-engine-4.4.10-0.17.el8ev.noarch
...
> For the "full backup" procedure I got: 4/6 failures (in the 6th out of 10
> iterations of the script, the backup was stuck in finalizing).

So we tried 6 times, 1 succeeded, 4 failed, 1 stuck forever.

So this is not a 4.5 regression.

Comment 14 Nir Soffer 2021-11-30 13:55:47 UTC
(In reply to Eyal Shenitzky from comment #9)
> Nir, what is the severity of this bug? Why is it urgent?

From a development point of view, this is a major issue; we are doing
things that are not safe (see the sketch after this list):
- modifying an image during backup
- running operations that activate/deactivate or lock/unlock the same image
  at the same time
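
To make the race concrete, here is a minimal Python sketch (illustration only,
not actual engine or vdsm code; add_bitmap_job and the executor are
placeholders for the asynchronous per-disk storage jobs). Reporting READY
right after submitting the jobs lets the backup run while the jobs are still
touching the volumes; the safe variant waits for all of them first:

    from concurrent.futures import ThreadPoolExecutor, wait

    def add_bitmap_job(disk):
        # Placeholder for the asynchronous add_bitmap storage job, which
        # activates the volume, adds the qcow2 bitmap and deactivates it.
        pass

    def start_backup_racy(disks, executor):
        # Unsafe pattern: READY is reported immediately after submitting the
        # jobs, so the backup can run while they still activate/deactivate
        # the volumes.
        for disk in disks:
            executor.submit(add_bitmap_job, disk)
        return "READY"

    def start_backup_safe(disks, executor):
        # Safe pattern: report READY only after every add_bitmap job has
        # completed.
        futures = [executor.submit(add_bitmap_job, disk) for disk in disks]
        wait(futures)
        return "READY"

    with ThreadPoolExecutor() as executor:
        print(start_backup_safe(["disk-1", "disk-2"], executor))

The only point of the sketch is the ordering: READY must not be reported
before the add_bitmap jobs complete.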

From a user point of view, this makes cold backup unreliable; it fails
randomly. This is very bad for RHV and for Red Hat; we should not
release code that fails randomly.

However, this bug is less severe than the bug where the backup gets stuck
in the finalizing state - in that case you cannot use the system since the
VM is locked.

Looking at the glass half full, backup is not widely used
yet, and this bug affects only cold backup, which is not the main flow.

Comment 15 Amit Sharir 2021-11-30 18:19:24 UTC
Reproduced this issue again with only 1 active host.
Tested on vdsm-4.40.100.1-1.el8ev.x86_64 / ovirt-engine-4.4.10-0.17.el8ev.noarch

I created a VM with 2 disks:
- an OS qcow2 disk created from a template
- an empty 10 GiB qcow2 data disk

For the "full backup" procedure I got: 3/9 failures (in the 9th out of 10 iterations of the script, the backup was stuck in finalizing).

Example of an error message I received:

Traceback (most recent call last):
  File "./backup_vm.py", line 525, in <module>
    main()
  File "./backup_vm.py", line 191, in main
    args.command(args)
  File "./backup_vm.py", line 207, in cmd_full
    download_backup(connection, backup.id, args)
  File "./backup_vm.py", line 407, in download_backup
    incremental=has_incremental)
  File "./backup_vm.py", line 425, in download_disk
    timeout_policy=types.ImageTransferTimeoutPolicy(args.timeout_policy))
  File "/usr/share/doc/python3-ovirt-engine-sdk4/examples/helpers/imagetransfer.py", line 241, in create_transfer
    "Timed out waiting for transfer {}".format(transfer.id))
RuntimeError: Timed out waiting for transfer 8008b4c6-0840-413d-b932-9e8775a120d8


Attaching ovirt-imageio/daemon.log, vdsm.log and engine logs in the following comments.

Comment 21 Amit Sharir 2021-11-30 22:29:27 UTC
Hi Benny, do my reproductions look like a bitmap-related issue (the backup is reported ready before the add_bitmap jobs complete)?
Please note that we have a finalization issue from another bug (https://bugzilla.redhat.com/show_bug.cgi?id=1985746).

In other words, I want to be sure that what I reproduced in the comments above is a result of the new issue mentioned in this bug and not something else. 


Thanks,

Comment 22 Benny Zlotnik 2021-12-02 14:39:21 UTC
Looks like the same issue - the engine logs "Ready to start image transfers" a few milliseconds after AddVolumeBitmapVDSCommand returns, while the add_bitmap storage job can still be running on the host, and the PrepareImage call for the transfer then fails:

2021-11-30 20:02:52,222+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.AddVolumeBitmapVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-15) [387aad7f-ce02-419f-933e-a41a2d8dcfc0] START, AddVolumeBitmapVDSCommand(HostName = host_mixed_1, VolumeBitmapVDSCommandParameters:{hostId='19738a5d-76d7-4eb9-b702-1c4c6c00c1da', storageDomainId='826ae9c9-4b07-4358-af82-3f76598c2dd2', jobId='f8a2c5d2-d85a-461c-a4f7-873008716632'}), log id: 158354ca
2021-11-30 20:02:52,226+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.AddVolumeBitmapVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-15) [387aad7f-ce02-419f-933e-a41a2d8dcfc0] FINISH, AddVolumeBitmapVDSCommand, return: , log id: 158354ca
2021-11-30 20:02:52,242+02 INFO  [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-15) [387aad7f-ce02-419f-933e-a41a2d8dcfc0] Ready to start image transfers
2021-11-30 20:02:52,499+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-14) [b0a9a961-5ed3-4dcc-8a7d-0e89fccf1bc5] Lock Acquired to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='[]'}'
2021-11-30 20:02:52,505+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-14) [b0a9a961-5ed3-4dcc-8a7d-0e89fccf1bc5] Running command: TransferDiskImageCommand internal: false. Entities affected :  ID: 627dc8c0-a6c3-4ac7-836f-ec15592d4883 Type: DiskAction group EDIT_DISK_PROPERTIES with role type USER
2021-11-30 20:02:52,505+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-14) [b0a9a961-5ed3-4dcc-8a7d-0e89fccf1bc5] Creating ImageTransfer entity for command '8008b4c6-0840-413d-b932-9e8775a120d8', proxyEnabled: true
2021-11-30 20:02:52,515+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-14) [b0a9a961-5ed3-4dcc-8a7d-0e89fccf1bc5] Successfully added Download disk 'test1949_Disk1' (disk id: '627dc8c0-a6c3-4ac7-836f-ec15592d4883', image id: '79e48285-a7fd-48cc-81f8-2a6d76e02be3') for image transfer command '8008b4c6-0840-413d-b932-9e8775a120d8'
2021-11-30 20:02:52,527+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (default task-14) [b0a9a961-5ed3-4dcc-8a7d-0e89fccf1bc5] START, PrepareImageVDSCommand(HostName = host_mixed_1, PrepareImageVDSCommandParameters:{hostId='19738a5d-76d7-4eb9-b702-1c4c6c00c1da'}), log id: 54e36abc
2021-11-30 20:02:53,020+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (default task-14) [b0a9a961-5ed3-4dcc-8a7d-0e89fccf1bc5] Failed in 'PrepareImageVDS' method, for vds: 'host_mixed_1'; host: 'storage-ge11-vdsm1.lab.eng.tlv2.redhat.com': null
2021-11-30 20:02:53,020+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (default task-14) [b0a9a961-5ed3-4dcc-8a7d-0e89fccf1bc5] Command 'PrepareImageVDSCommand(HostName = host_mixed_1, PrepareImageVDSCommandParameters:{hostId='19738a5d-76d7-4eb9-b702-1c4c6c00c1da'})' execution failed: null
2021-11-30 20:02:53,020+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.PrepareImageVDSCommand] (default task-14) [b0a9a961-5ed3-4dcc-8a7d-0e89fccf1bc5] FINISH, PrepareImageVDSCommand, return: , log id: 54e36abc
2021-11-30 20:02:53,021+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-14) [b0a9a961-5ed3-4dcc-8a7d-0e89fccf1bc5] Failed to prepare image for transfer session: EngineException: java.lang.NullPointerException (Failed with error ENGINE and code 5001): org.ovirt.engine.core.common.errors.EngineException: EngineException: java.lang.NullPointerException (Failed with error ENGINE and code 5001)

Comment 24 Amit Sharir 2021-12-29 12:51:40 UTC
Version:
vdsm-4.40.100.2-1.el8ev.x86_64
ovirt-engine-4.4.10.2-0.1.el8ev.noarch

Verification flow:
I used both the full and the incremental backup flows mentioned in comments 15 and 12.
I also tested with only 1 vdsm host active (I put the others into maintenance mode).

Verification Conclusions:
The expected output matched the actual output.
I checked the logs and didn't find any issues regarding the "AddVolumeBitmapVDSCommand".


Bug verified.

Comment 25 Sandro Bonazzola 2022-01-19 07:00:13 UTC
This bugzilla is included in oVirt 4.4.10 release, published on January 18th 2022.

Since the problem described in this bug report should be resolved in oVirt 4.4.10 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

Comment 26 Red Hat Bugzilla 2023-09-15 01:17:38 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days