Bug 2015121 - VM with thin disk on block storage pauses during backup
Summary: VM with thin disk on block storage pauses during backup
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.50
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.4.9
Target Release: 4.40.90.3
Assignee: Nir Soffer
QA Contact: sshmulev
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-10-18 12:51 UTC by Nir Soffer
Modified: 2022-02-09 23:46 UTC (History)
CC: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Libvirt did not report allocation information for thin disks during backup, so vdsm did not extend a disk when it became full.
Consequence: The VM could pause during a backup and could not be resumed until the backup completed.
Fix: Change vdsm to extend the disk when allocation information is not available.
Result: The VM does not pause during backup, or resumes quickly after the disk is extended.
Clone Of:
Environment:
Last Closed: 2021-10-25 06:28:54 UTC
oVirt Team: Storage
Embargoed:
sbonazzo: ovirt-4.4+
lsvaty: exception+


Attachments
vdsm log (66.73 KB, application/x-xz)
2021-10-18 12:51 UTC, Nir Soffer
no flags


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-43827 0 None None None 2021-10-18 13:18:40 UTC
Red Hat Issue Tracker RHV-43858 0 None None None 2021-10-25 06:30:24 UTC
Red Hat Knowledge Base (Solution) 6720361 0 None None None 2022-02-09 23:46:38 UTC
oVirt gerrit 117178 0 master MERGED drivemonitor: Fix extend during backup 2021-10-18 19:56:52 UTC
oVirt gerrit 117196 0 ovirt-4.4.z MERGED drivemonitor: Fix extend during backup 2021-10-19 07:18:13 UTC

Description Nir Soffer 2021-10-18 12:51:15 UTC
Created attachment 1834217 [details]
vdsm log

Description of problem:

If a thin disk on block storage becomes full during backup, the VM will pause and
the disk will not be extended until the backup is finished.

If the VM is an HA VM and the backup takes a long time, the VM may be killed on the
host by the code that kills paused VMs, aborting the backup.

Version-Release number of selected component (if applicable):
vdsm-4.50.0.1

How reproducible:
Always, if you can manage to write data fast enough during a backup.
During a real backup, unknown; it is hard to time the guest writes
against the backup.

Steps to Reproduce:
1. Run a VM with empty 2g thin disk on block storage
2. Start backup (do not download backup yet)
3. In the guest, try to write 2g to the empty data disk

The VM will pause

4. Download backup disks
5. Finalize the backup

The VM will resume.

Actual results:
VM pauses during the backup and does not resume until the backup is stopped.
Vdsm does not try to extend the disk during the backup.

Expected results:
Thin disk is extended during backup without pausing the VM, or with short
pauses if the guest writes data faster than vdsm can extend the disk.

Additional info:

Here are the detailed steps:

1. Start VM with 2 disks

disk 1 - os disk, specs not important.

disk 2:
    size: 2g
    format: qcow2
    storage domain: iSCSI
    allocation policy: thin
    uuid: 25178339-51f7-42dd-968e-a28b19a2d640

2. Start backup

$ ./backup_vm.py -c engine-dev start 6e95d38f-d9b8-4955-878c-da6d631d0ab2
[   0.0 ] Starting full backup for VM '6e95d38f-d9b8-4955-878c-da6d631d0ab2'
[   0.3 ] Waiting until backup 'c9753476-fe7f-4e9d-a7dc-38186a7997cb' is ready
[  26.7 ] Created checkpoint 'cf36474c-22ae-4b89-b164-b7824ba7995a' (to use in --from-checkpoint-uuid for the next incremental backup)
[  26.8 ] Backup 'c9753476-fe7f-4e9d-a7dc-38186a7997cb' is ready

3. In the guest, write 2g to the data disk

# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0    6G  0 disk 
├─sda1   8:1    0    1M  0 part 
├─sda2   8:2    0    1G  0 part /boot
├─sda3   8:3    0  615M  0 part [SWAP]
└─sda4   8:4    0  4.4G  0 part /
sdb      8:16   0    2G  0 disk 
sr0     11:0    1 1024M  0 rom  

# realpath /dev/disk/by-id/*25178339-51f7-42dd-968e-a28b19a2d640
/dev/sdb

# dd if=/dev/zero bs=1M count=2048 of=/dev/sdb oflag=direct conv=fsync status=progress
882900992 bytes (883 MB, 842 MiB) copied, 4 s, 221 MB/s

VM paused.

In vdsm log we see:

3.a. Block threshold exceeded for drive sdb

2021-10-18 15:06:42,843+0300 INFO  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') Block threshold 536870912 exceeded by 327680 for drive 'sdb[26]' (/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/25178339-51f7-42dd-968e-a28b19a2d640/b302faf4-ad02-4163-be25-c56043b52207) (drivemonitor:173)

3.b. VM was paused

2021-10-18 15:06:45,519+0300 INFO  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') abnormal vm stop device  error enospc (vm:5193)
2021-10-18 15:06:45,519+0300 INFO  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') CPU stopped: onIOError (vm:6028)

3.c. No drives were extended

2021-10-18 15:06:45,541+0300 INFO  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') No VM drives were extended (vm:5200)

Vdsm does not try to extend the disk after this point.


4. Download the backup for this VM

$ ./backup_vm.py -c engine-dev download --backup-uuid c9753476-fe7f-4e9d-a7dc-38186a7997cb 6e95d38f-d9b8-4955-878c-da6d631d0ab2
[   0.0 ] Downloading VM '6e95d38f-d9b8-4955-878c-da6d631d0ab2' disks
[   0.1 ] Downloading full backup for disk '242c082f-ca51-4cf4-82b9-fd3a3c92f38b'
[   1.3 ] Image transfer 'fefd5712-57ae-47f6-953a-73efbccefac8' is ready
[ 100.00% ] 6.00 GiB, 4.74 seconds, 1.27 GiB/s
[   6.0 ] Finalizing image transfer
[   7.1 ] Download completed successfully
[   7.1 ] Downloading full backup for disk '25178339-51f7-42dd-968e-a28b19a2d640'
[   8.2 ] Image transfer '51f8321b-4e2b-483a-be66-4274a1b5b7d9' is ready
[ 100.00% ] 2.00 GiB, 0.11 seconds, 18.89 GiB/s
[   8.3 ] Finalizing image transfer
[  10.4 ] Download completed successfully
[  10.5 ] Finished downloading disks

Checking backups:

$ qemu-img info 242c082f-ca51-4cf4-82b9-fd3a3c92f38b.202110181508.full.qcow2
image: 242c082f-ca51-4cf4-82b9-fd3a3c92f38b.202110181508.full.qcow2
file format: qcow2
virtual size: 6 GiB (6442450944 bytes)
disk size: 1.58 GiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false

$ qemu-img info 25178339-51f7-42dd-968e-a28b19a2d640.202110181508.full.qcow2
image: 25178339-51f7-42dd-968e-a28b19a2d640.202110181508.full.qcow2
file format: qcow2
virtual size: 2 GiB (2147483648 bytes)
disk size: 196 KiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false

The backup of the data disk is correct; this disk contains no data
since the disk was empty when the backup was started.

5. Finalize the backup

$ ./backup_vm.py -c engine-dev stop 6e95d38f-d9b8-4955-878c-da6d631d0ab2 c9753476-fe7f-4e9d-a7dc-38186a7997cb
[   0.0 ] Finalizing backup 'c9753476-fe7f-4e9d-a7dc-38186a7997cb'
[   6.2 ] Backup 'c9753476-fe7f-4e9d-a7dc-38186a7997cb' completed successfully

The VM resumed after the backup was stopped.

In the guest, the dd command was completed:

# dd if=/dev/zero bs=1M count=2048 of=/dev/sdb oflag=direct conv=fsync status=progress
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 11 s, 193 MB/s
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 11.1553 s, 193 MB/s


In vdsm log we see:

Backup was stopped:

2021-10-18 15:11:06,724+0300 INFO  (jsonrpc/6) [api.virt] START stop_backup(backup_id='c9753476-fe7f-4e9d-a7dc-38186a7997cb') from=::ffff:192.168.122.11,48604, flow_id=3d72a830-95a6-4a57-9a60-4676939c9f0e, vmId=6e95d38f-d9b8-4955-878c-da6d631d0ab2 (api:48)
...
2021-10-18 15:11:06,730+0300 INFO  (jsonrpc/6) [api.virt] FINISH stop_backup return={'status': {'code': 0, 'message': 'Done'}} from=::ffff:192.168.122.11,48604, flow_id=3d72a830-95a6-4a57-9a60-4676939c9f0e, vmId=6e95d38f-d9b8-4955-878c-da6d631d0ab2 (api:54)

Soon after that, vdsm detects that the disk needs extension:

2021-10-18 15:11:07,335+0300 DEBUG (periodic/1) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') Extension info for drive sdb volume b302faf4-ad02-4163-be25-c56043b52207: BlockInfo(capacity=2147483648, allocation=1074135040, physical=1073741824) (vm:1277)

And tries to extend the disk:

2021-10-18 15:11:07,335+0300 INFO  (periodic/1) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') Requesting extension for volume b302faf4-ad02-4163-be25-c56043b52207 on domain feab3738-c158-4d48-8a41-b5a95c057a50 (apparent: 1073741824, capacity: 2147483648, allocated: 1074135040, physical: 1073741824 threshold_state: exceeded) (vm:1350)

The data disk was extended:

# lvs feab3738-c158-4d48-8a41-b5a95c057a50/b302faf4-ad02-4163-be25-c56043b52207
  LV                                   VG                                   Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  b302faf4-ad02-4163-be25-c56043b52207 feab3738-c158-4d48-8a41-b5a95c057a50 -wi-ao---- 2.25g


The root cause is that vdsm does not try to extend thin disks during backup.
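
For illustration, here is a minimal sketch of the decision the drive monitor has to make; the names and structure are simplified and do not match vdsm's actual code. The point of the fix is to treat an exceeded block threshold as enough evidence that the drive needs extension, even when libvirt reports no allocation info:

# Minimal sketch of the drive-monitor decision described above; illustrative
# names only, not vdsm's real modules.
import logging
from collections import namedtuple

BlockInfo = namedtuple("BlockInfo", "capacity allocation physical")
THRESHOLD_EXCEEDED = "exceeded"

def needs_extension(block_info, threshold_state, log):
    """Decide whether a thin volume should be extended."""
    if block_info.allocation == 0:
        # During a backup libvirt may report no allocation info (libvirt
        # bug 2015281). The old behavior effectively stopped here, so the
        # drive was never extended and the VM stayed paused until the
        # backup ended.
        if threshold_state == THRESHOLD_EXCEEDED:
            log.warning("No allocation info for drive, but block threshold "
                        "was exceeded - assuming that drive needs extension")
            return True
        return False
    # Normal case (simplified): extend when the guest allocation reaches
    # the physical size of the volume.
    return block_info.allocation >= block_info.physical

# The situation from the log above: allocation reported as 0 while the
# block threshold was already exceeded now triggers an extension request.
logging.basicConfig()
log = logging.getLogger("drivemonitor")
print(needs_extension(BlockInfo(21474836480, 0, 7516192768),
                      THRESHOLD_EXCEEDED, log))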

Comment 1 Nir Soffer 2021-10-18 12:52:29 UTC
Suggesting for 4.4.9.

Comment 3 Nir Soffer 2021-10-18 13:45:14 UTC
In a normal backup we start the backup and download the disks immediately. Trying
to reproduce this with the normal backup flow:

1. Start a VM with 20g thin disk
2. In the VM, start a dd process writing 20g to the thin disk

# dd if=/dev/zero bs=1M count=20480 of=/dev/sdb oflag=direct conv=fsync status=progress

3. Perform full backup

$ ./backup_vm.py -c engine-dev full 6e95d38f-d9b8-4955-878c-da6d631d0ab2
[   0.0 ] Starting full backup for VM '6e95d38f-d9b8-4955-878c-da6d631d0ab2'
[   0.3 ] Waiting until backup 'd9d862e4-c48a-4b30-ad59-4d6bf98adcfa' is ready
[  20.5 ] Created checkpoint '1a63fe50-31d3-4c60-bda2-f9b26f541e96' (to use in --from-checkpoint-uuid for the next incremental backup)
[  20.5 ] Downloading full backup for disk '242c082f-ca51-4cf4-82b9-fd3a3c92f38b'
[  21.6 ] Image transfer '95f28835-b224-4bd9-adaf-7f1f9f4c0fd7' is ready
[ 100.00% ] 6.00 GiB, 5.45 seconds, 1.10 GiB/s                                 
[  27.1 ] Finalizing image transfer
[  28.1 ] Download completed successfully
[  28.1 ] Downloading full backup for disk 'd6aee2d9-01a7-432b-978d-c8f61b5bbd21'
[  29.3 ] Image transfer '30bca4a2-62d1-4be5-82dc-31fd50f345d2' is ready
[ 100.00% ] 20.00 GiB, 15.14 seconds, 1.32 GiB/s                               
[  44.4 ] Finalizing image transfer
[  53.5 ] Download completed successfully
[  53.5 ] Finalizing backup
[  65.7 ] Full backup 'd9d862e4-c48a-4b30-ad59-4d6bf98adcfa' completed successfully

The VM paused after starting the backup, and resumed when the backup was done.

Checking vdsm log:

Backup started:

2021-10-18 16:14:34,860+0300 INFO  (jsonrpc/1) [api.virt] START start_backup(config={'backup_id': 'd9d862e4-c48a-4b30-ad59-4d6bf98adcfa', 'disks': [{'checkpoint': True, 'scratch_disk': {'path': '/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/fa3d4701-e435-4ff8-86b7-1a7d5ea4b29d/622c57d7-5b7c-4f4c-b5e1-d56596b81a53', 'type': 'block'}, 'imageID': '242c082f-ca51-4cf4-82b9-fd3a3c92f38b', 'volumeID': '44d498a1-54a5-4371-8eda-02d839d7c840', 'backup_mode': 'full', 'domainID': 'feab3738-c158-4d48-8a41-b5a95c057a50'}, {'checkpoint': True, 'scratch_disk': {'path': '/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/ea177d1a-c720-4b34-bf38-1fbeb3601c16/84355ad9-1d51-4d49-82c9-c34f9275ec93', 'type': 'block'}, 'imageID': 'd6aee2d9-01a7-432b-978d-c8f61b5bbd21', 'volumeID': '5354890d-8a69-44db-ab5f-ef76271d8cdf', 'backup_mode': 'full', 'domainID': 'feab3738-c158-4d48-8a41-b5a95c057a50'}], 'from_checkpoint_id': None, 'parent_checkpoint_id': 'cf36474c-22ae-4b89-b164-b7824ba7995a', 'require_consistency': False, 'to_checkpoint_id': '1a63fe50-31d3-4c60-bda2-f9b26f541e96'}) from=::ffff:192.168.122.11,48604, flow_id=a5cfc7e5-385d-4d9b-bb1a-9750406156b4, vmId=6e95d38f-d9b8-4955-878c-da6d631d0ab2 (api:48)

2021-10-18 16:14:35,419+0300 DEBUG (jsonrpc/1) [api] FINISH start_backup response={'result': {'disks': {'242c082f-ca51-4cf4-82b9-fd3a3c92f38b': 'nbd:unix:/run/vdsm/backup/d9d862e4-c48a-4b30-ad59-4d6bf98adcfa:exportname=sda', 'd6aee2d9-01a7-432b-978d-c8f61b5bbd21': 'nbd:unix:/run/vdsm/backup/d9d862e4-c48a-4b30-ad59-4d6bf98adcfa:exportname=sde'}, ...

VM paused:

2021-10-18 16:14:40,688+0300 INFO  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') abnormal vm stop device  error enospc (vm:5193)
2021-10-18 16:14:40,689+0300 INFO  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') CPU stopped: onIOError (vm:6028)
2021-10-18 16:14:40,969+0300 INFO  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') No VM drives were extended (vm:5200)

Backup stopped:

2021-10-18 16:15:08,803+0300 DEBUG (jsonrpc/6) [api] START stop_backup args=(<vdsm.API.VM object at 0x7f75001529b0>, 'd9d862e4-c48a-4b30-ad59-4d6bf98adcfa') kwargs={} (api:122)
2021-10-18 16:15:08,811+0300 DEBUG (jsonrpc/6) [api] FINISH stop_backup response=None (api:137)

Vdsm tries to extend the disk:

2021-10-18 16:15:08,894+0300 DEBUG (periodic/2) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') Extension info for drive sde volume 5354890d-8a69-44db-ab5f-ef76271d8cdf: BlockInfo(capacity=21474836480, allocation=7516520448, physical=7516192768) (vm:1277)
2021-10-18 16:15:08,894+0300 INFO  (periodic/2) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') Requesting extension for volume 5354890d-8a69-44db-ab5f-ef76271d8cdf on domain feab3738-c158-4d48-8a41-b5a95c057a50 (apparent: 7516192768, capacity: 21474836480, allocated: 7516520448, physical: 7516192768 threshold_state: exceeded) (vm:1350)

Guest resumed:

2021-10-18 16:15:13,225+0300 DEBUG (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') event Resumed detail 0 opaque None (vm:5634)
2021-10-18 16:15:13,226+0300 INFO  (mailbox-hsm/0) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') CPU running: continue (vm:6028)


Timeline:

16:14:34  backup started
16:14:40  vm paused
16:15:08  backup stopped
16:15:13  vm resumed

VM was paused for 61 seconds during the backup.

So this is easy to reproduce (and test) with normal backup flow.

Comment 4 Nir Soffer 2021-10-18 13:52:28 UTC
(In reply to Nir Soffer from comment #3)
> VM was paused for 61 seconds during the backup.

Correction - vm was paused for 33 seconds during backup.

Comment 5 Nir Soffer 2021-10-18 15:23:19 UTC
Same backup flow with the fix:

1. In the guest, start writing 20g:

# dd if=/dev/zero bs=1M count=20480 of=/dev/sdb oflag=direct conv=fsync status=progress

2. Run full backup

$ ./backup_vm.py -c engine-dev full --backup-dir /data/scratch/nsoffer/backups 6e95d38f-d9b8-4955-878c-da6d631d0ab2
[   0.0 ] Starting full backup for VM '6e95d38f-d9b8-4955-878c-da6d631d0ab2'
[   0.3 ] Waiting until backup '87e1cf21-911d-4207-bf2f-0f23dab05825' is ready
[  27.6 ] Created checkpoint 'e67f5ec0-7e66-43c0-bcd7-ad0c8c2cd9c5' (to use in --from-checkpoint-uuid for the next incremental backup)
[  27.7 ] Downloading full backup for disk '58550e51-f836-4bfc-8ae6-d40b5dc750b4'
[  28.8 ] Image transfer '191e9516-535d-472c-bce9-4542b41b4484' is ready
[ 100.00% ] 20.00 GiB, 14.69 seconds, 1.36 GiB/s                               
[  43.5 ] Finalizing image transfer
[  52.6 ] Download completed successfully
[  52.6 ] Downloading full backup for disk '242c082f-ca51-4cf4-82b9-fd3a3c92f38b'
[  53.7 ] Image transfer '4947b46b-d9dd-42b2-9918-1927f052ea04' is ready
[ 100.00% ] 6.00 GiB, 6.06 seconds, 1014.58 MiB/s                              
[  59.8 ] Finalizing image transfer
[  67.9 ] Download completed successfully
[  67.9 ] Finalizing backup
[  73.0 ] Full backup '87e1cf21-911d-4207-bf2f-0f23dab05825' completed successfully

Because all backup data is zero, we expect a very small backup:

$ qemu-img info /data/scratch/nsoffer/backups/58550e51-f836-4bfc-8ae6-d40b5dc750b4.202110181747.full.qcow2
image: /data/scratch/nsoffer/backups/58550e51-f836-4bfc-8ae6-d40b5dc750b4.202110181747.full.qcow2
file format: qcow2
virtual size: 20 GiB (21474836480 bytes)
disk size: 960 KiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false

We backed up about 5g of zeroes at the start of the disk:

$ qemu-img map --output json /data/scratch/nsoffer/backups/58550e51-f836-4bfc-8ae6-d40b5dc750b4.202110181747.full.qcow2
[{ "start": 0, "length": 5404360704, "depth": 0, "present": true, "zero": true, "data": false},
{ "start": 5404360704, "length": 16070475776, "depth": 0, "present": false, "zero": true, "data": false}]

3. Run incremental backup

$ ./backup_vm.py -c engine-dev incremental --backup-dir /data/scratch/nsoffer/backups --from-checkpoint-uuid e67f5ec0-7e66-43c0-bcd7-ad0c8c2cd9c5 6e95d38f-d9b8-4955-878c-da6d631d0ab2
[   0.0 ] Starting incremental backup for VM '6e95d38f-d9b8-4955-878c-da6d631d0ab2'
[   0.4 ] Waiting until backup '3369d87d-dd01-4d46-93fb-904a480a79ca' is ready
[  15.5 ] Created checkpoint '2761217d-c942-4827-8b4e-ccfa20186f1c' (to use in --from-checkpoint-uuid for the next incremental backup)
[  15.6 ] Downloading incremental backup for disk '242c082f-ca51-4cf4-82b9-fd3a3c92f38b'
[  16.7 ] Image transfer 'a070fb56-38a4-4eac-b2c6-47d70ce154c6' is ready
[ 100.00% ] 6.00 GiB, 0.22 seconds, 27.31 GiB/s                                
[  16.9 ] Finalizing image transfer
[  18.0 ] Download completed successfully
[  18.0 ] Downloading incremental backup for disk '58550e51-f836-4bfc-8ae6-d40b5dc750b4'
[  19.1 ] Image transfer 'd04a873e-327b-4a22-941d-09e79e5ba82a' is ready
[ 100.00% ] 20.00 GiB, 35.66 seconds, 574.39 MiB/s                             
[  54.8 ] Finalizing image transfer
[  63.9 ] Download completed successfully
[  63.9 ] Finalizing backup
[  70.2 ] Incremental backup '3369d87d-dd01-4d46-93fb-904a480a79ca' completed successfully

The incremental backup contains more zeroes, so it is also a relatively
small file:

$ qemu-img info /data/scratch/nsoffer/backups/58550e51-f836-4bfc-8ae6-d40b5dc750b4.202110181749.incremental.qcow2 
image: /data/scratch/nsoffer/backups/58550e51-f836-4bfc-8ae6-d40b5dc750b4.202110181749.incremental.qcow2
file format: qcow2
virtual size: 20 GiB (21474836480 bytes)
disk size: 2.12 MiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false

When I started the incremental backup, the write in the guest was almost
finished, so we seem to get the rest of the zeroes until the end of the disk.

$ qemu-img map --output json /data/scratch/nsoffer/backups/58550e51-f836-4bfc-8ae6-d40b5dc750b4.202110181749.incremental.qcow2 
[{ "start": 0, "length": 5403312128, "depth": 0, "present": false, "zero": true, "data": false},
{ "start": 5403312128, "length": 16071524352, "depth": 0, "present": true, "zero": true, "data": false}]


The VM paused a few times during the backup for a very short time, which is expected
when writing as fast as possible to a thin disk with the default configuration.

Looking in vdsm log:

1. Full backup starting...

2021-10-18 17:47:44,862+0300 INFO  (jsonrpc/1) [api.virt] START start_backup(config={'backup_id': '87e1cf21-911d-4207-bf2f-0f23dab05825', 'disks': [{'checkpoint': True, 'scratch_disk': {'path': '/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/782f6dce-00cf-4e56-b595-6f990f1764b6/214b2565-72fd-4e4d-b4ed-a96f33040348', 'type': 'block'}, 'imageID': '58550e51-f836-4bfc-8ae6-d40b5dc750b4', 'volumeID': '344b8342-47ea-4033-bf68-008806eeeeac', 'backup_mode': 'full', 'domainID': 'feab3738-c158-4d48-8a41-b5a95c057a50'}, {'checkpoint': True, 'scratch_disk': {'path': '/rhev/data-center/mnt/blockSD/feab3738-c158-4d48-8a41-b5a95c057a50/images/cfc5096e-a8e0-4e9e-b65e-341daeaff190/ec6d9517-3291-4703-8b6a-5d1e7536533e', 'type': 'block'}, 'imageID': '242c082f-ca51-4cf4-82b9-fd3a3c92f38b', 'volumeID': '44d498a1-54a5-4371-8eda-02d839d7c840', 'backup_mode': 'full', 'domainID': 'feab3738-c158-4d48-8a41-b5a95c057a50'}], 'from_checkpoint_id': None, 'parent_checkpoint_id': '1a63fe50-31d3-4c60-bda2-f9b26f541e96', 'require_consistency': False, 'to_checkpoint_id': 'e67f5ec0-7e66-43c0-bcd7-ad0c8c2cd9c5'}) from=::ffff:192.168.122.11,50832, flow_id=2a7b49d0-2c11-47b7-9f88-2e84d65693ee, vmId=6e95d38f-d9b8-4955-878c-da6d631d0ab2 (api:48)

2. VM was paused

2021-10-18 17:47:44,875+0300 INFO  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') abnormal vm stop device ua-58550e51-f836-4bfc-8ae6-d40b5dc750b4 error enospc (vm:5193)
2021-10-18 17:47:44,875+0300 INFO  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') CPU stopped: onIOError (vm:6028)

3. Vdsm extends the disk

2021-10-18 17:47:44,893+0300 DEBUG (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') Extension info for drive sdb volume 344b8342-47ea-4033-bf68-008806eeeeac: BlockInfo(capacity=21474836480, allocation=5369757696, physical=5368709120) (vm:1277)
2021-10-18 17:47:44,893+0300 INFO  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') Requesting extension for volume 344b8342-47ea-4033-bf68-008806eeeeac on domain feab3738-c158-4d48-8a41-b5a95c057a50 (apparent: 5368709120, capacity: 21474836480, allocated: 5369757696, physical: 5368709120 threshold_state: exceeded) (vm:1350)

4. Full backup was started

2021-10-18 17:47:45,650+0300 DEBUG (jsonrpc/1) [api] FINISH start_backup response={'result': {'disks': {'58550e51-f836-4bfc-8ae6-d40b5dc750b4': 'nbd:unix:/run/vdsm/backup/87e1cf21-911d-4207-bf2f-0f23dab05825:exportname=sdb', '242c082f-ca51-4cf4-82b9-fd3a3c92f38b': 'nbd:unix:/run/vdsm/backup/87e1cf21-911d-4207-bf2f-0f23dab05825:exportname=sda'} ...

5. VM resumed

2021-10-18 17:47:45,397+0300 DEBUG (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') event Resumed detail 0 opaque None (vm:5634)
2021-10-18 17:47:45,405+0300 INFO  (mailbox-hsm/4) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') CPU running: continue (vm:6028)

6. VM paused

2021-10-18 17:48:08,265+0300 INFO  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') abnormal vm stop device  error enospc (vm:5193)
2021-10-18 17:48:08,266+0300 INFO  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') CPU stopped: onIOError (vm:6028)

7. Vdsm tries to extend the disk

2021-10-18 17:48:08,279+0300 WARN  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') No allocation info for drive sdb, but block threshold was exceeded - assuming that drive needs extension (drivemonitor:251)
2021-10-18 17:48:08,279+0300 INFO  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') Requesting extension for volume 344b8342-47ea-4033-bf68-008806eeeeac on domain feab3738-c158-4d48-8a41-b5a95c057a50 (apparent: 7516192768, capacity: 21474836480, allocated: 0, physical: 7516192768 threshold_state: exceeded) (vm:1350)

8. VM resumed

2021-10-18 17:48:09,429+0300 INFO  (mailbox-hsm/3) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') CPU running: continue (vm:6028)

9. Backup stopped

2021-10-18 17:48:26,699+0300 INFO  (jsonrpc/3) [api.virt] START stop_backup(backup_id='87e1cf21-911d-4207-bf2f-0f23dab05825') from=::ffff:192.168.122.11,50832, flow_id=2529ef2a-973e-4425-9ea0-999b6d4550f0, vmId=6e95d38f-d9b8-4955-878c-da6d631d0ab2 (api:48)


Collecting all pause/resume events during the backup:

# egrep 'CPU stopped:|CPU running:' vdsm.log 
2021-10-18 17:47:44,875+0300 INFO  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') CPU stopped: onIOError (vm:6028)
2021-10-18 17:47:44,898+0300 INFO  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') CPU stopped: onSuspend (vm:6028)
2021-10-18 17:47:44,903+0300 INFO  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') CPU stopped: onIOError (vm:6028)
2021-10-18 17:47:45,405+0300 INFO  (mailbox-hsm/4) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') CPU running: continue (vm:6028)
2021-10-18 17:47:45,498+0300 INFO  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') CPU running: onResume (vm:6028)

Pause time: 0.5 seconds

2021-10-18 17:48:08,266+0300 INFO  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') CPU stopped: onIOError (vm:6028)
2021-10-18 17:48:08,286+0300 INFO  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') CPU stopped: onSuspend (vm:6028)
2021-10-18 17:48:09,429+0300 INFO  (mailbox-hsm/3) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') CPU running: continue (vm:6028)
2021-10-18 17:48:09,525+0300 INFO  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') CPU running: onResume (vm:6028)

Pause time: 1.2 seconds

2021-10-18 17:48:24,250+0300 INFO  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') CPU stopped: onIOError (vm:6028)
2021-10-18 17:48:24,329+0300 INFO  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') CPU stopped: onSuspend (vm:6028)
2021-10-18 17:48:25,404+0300 INFO  (mailbox-hsm/3) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') CPU running: continue (vm:6028)
2021-10-18 17:48:25,502+0300 INFO  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') CPU running: onResume (vm:6028)

Pause time: 1.2 seconds

We can see that vdsm handles ENOSPC events correctly, and the VM is paused for a
short time and resumes when vdsm completes the extension.
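
To avoid computing these pause times by hand, a small sketch like the following can pair the CPU stopped/running events from vdsm.log; it assumes the timestamp format shown above and collapses repeated stop events within a single pause:

# Sketch: compute VM pause durations from "CPU stopped"/"CPU running"
# events in vdsm.log. Assumes the timestamp format shown in the log above.
import re
import sys
from datetime import datetime

EVENT = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}).*CPU (stopped|running)")

paused_at = None
with open(sys.argv[1]) as log:
    for line in log:
        m = EVENT.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if m.group(2) == "stopped" and paused_at is None:
            paused_at = ts                    # first stop event of this pause
        elif m.group(2) == "running" and paused_at is not None:
            print("paused %.1f seconds" % (ts - paused_at).total_seconds())
            paused_at = None                  # pause ended

Run against the vdsm.log excerpted here, it should print the same pause times listed above.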

Comment 6 Nir Soffer 2021-10-18 15:31:04 UTC
We log a new warning when libvirt reports allocation=0 while checking if a disk should be extended.

Here are the new warnings during backup:

# grep 'No allocation info' vdsm.log 
2021-10-18 17:47:49,174+0300 WARN  (periodic/3) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') No allocation info for drive sdb, but block threshold was exceeded - assuming that drive needs extension (drivemonitor:251)
2021-10-18 17:47:51,114+0300 WARN  (periodic/0) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') No allocation info for drive sdb, but block threshold was exceeded - assuming that drive needs extension (drivemonitor:251)
2021-10-18 17:48:07,092+0300 WARN  (periodic/3) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') No allocation info for drive sdb, but block threshold was exceeded - assuming that drive needs extension (drivemonitor:251)
2021-10-18 17:48:08,279+0300 WARN  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') No allocation info for drive sdb, but block threshold was exceeded - assuming that drive needs extension (drivemonitor:251)
2021-10-18 17:48:09,101+0300 WARN  (periodic/1) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') No allocation info for drive sdb, but block threshold was exceeded - assuming that drive needs extension (drivemonitor:251)
2021-10-18 17:48:13,140+0300 WARN  (periodic/0) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') No allocation info for drive sdb, but block threshold was exceeded - assuming that drive needs extension (drivemonitor:251)
2021-10-18 17:48:15,153+0300 WARN  (periodic/2) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') No allocation info for drive sdb, but block threshold was exceeded - assuming that drive needs extension (drivemonitor:251)
2021-10-18 17:48:23,098+0300 WARN  (periodic/2) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') No allocation info for drive sdb, but block threshold was exceeded - assuming that drive needs extension (drivemonitor:251)
2021-10-18 17:48:24,316+0300 WARN  (libvirt/events) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') No allocation info for drive sdb, but block threshold was exceeded - assuming that drive needs extension (drivemonitor:251)
2021-10-18 17:48:25,118+0300 WARN  (periodic/1) [virt.vm] (vmId='6e95d38f-d9b8-4955-878c-da6d631d0ab2') No allocation info for drive sdb, but block threshold was exceeded - assuming that drive needs extension (drivemonitor:251)

Comment 7 Nir Soffer 2021-10-18 19:29:56 UTC
The root cause is libvirt bug 2015281.

Comment 8 Nir Soffer 2021-10-18 20:28:57 UTC
A build for 4.4.9 is available here in case you want to test it
before the official compose:
https://jenkins.ovirt.org/job/vdsm_standard-check-patch/30223/artifact/build-artifacts.build-py3.el8stream.x86_64/

Comment 9 Avihai 2021-10-19 06:56:31 UTC
Hi Nir,
Is this a new issue (regression) in RHEL 8.5 / RHV 4.4.9?

Will we need this workaround backported to earlier RHV 4.4.8 / RHEL 8.4 builds as well?

Comment 11 Nir Soffer 2021-10-20 11:51:54 UTC
(In reply to Avihai from comment #9)
> Is this a RHEL8.5/RHV4.4.9 new issue(regression) ?

I did not test.

> Will we need this workaround backported to earlier RHV4.4.8/RHEL8.4 builds
> as well?

Unlikely.

Comment 12 sshmulev 2021-10-21 08:25:02 UTC
Failed verification according to the steps in the comment above.

Versions:
ovirt-engine-4.4.9.3-0.3.el8ev.noarch
vdsm-4.40.90.3-1.el8ev.x86_64
libvirt-7.6.0-5.module+el8.5.0+12933+58cb48a1


Steps to reproduce:

1. Start a VM with 20g thin disk
2. In the VM, start a dd process writing 20g to the thin disk

# dd if=/dev/zero bs=1M count=20480 of=/dev/sda oflag=direct conv=fsync status=progress

3. Perform full backup
[root@storage-ge3-vdsm3 examples]# ./backup_vm.py -c engine full 4faa9712-c1aa-4cc2-8396-169138858eea
[   0.0 ] Starting full backup for VM '4faa9712-c1aa-4cc2-8396-169138858eea'
Traceback (most recent call last):
  File "./backup_vm.py", line 525, in <module>
    main()
  File "./backup_vm.py", line 191, in main
    args.command(args)
  File "./backup_vm.py", line 205, in cmd_full
    backup = start_backup(connection, args)
  File "./backup_vm.py", line 336, in start_backup
    description=args.description
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/services.py", line 34151, in add
    return self._internal_add(backup, headers, query, wait)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 232, in _internal_add
    return future.wait() if wait else future
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 55, in wait
    return self._code(response)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 229, in callback
    self._check_fault(response)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 132, in _check_fault
    self._raise_error(response, body)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 118, in _raise_error
    raise error
ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Cannot backup VM. The Virtual Machine should be in Up or Down status.]". HTTP response code is 409.



Host:
2021-10-21 11:15:49,471+0300 INFO  (libvirt/events) [virt.vm] (vmId='4faa9712-c1aa-4cc2-8396-169138858eea') abnormal vm stop device ua-02921e71-459d-4b52-844d-380b91e3f110 error enospc (vm:5193)
2021-10-21 11:15:49,471+0300 INFO  (libvirt/events) [virt.vm] (vmId='4faa9712-c1aa-4cc2-8396-169138858eea') CPU stopped: onIOError (vm:6028)
2021-10-21 11:15:49,489+0300 INFO  (libvirt/events) [virt.vm] (vmId='4faa9712-c1aa-4cc2-8396-169138858eea') Requesting extension for volume 6296825e-4e12-4314-a13a-1be6df45950f on domain 759f583c-6331-4820-9ba6-ec971ef62b3c (apparent: 1073741824, capacity: 21474836480, allocated: 1074135040, physical: 1073741824 threshold_state: exceeded) (vm:1350)
2021-10-21 11:15:49,489+0300 DEBUG (libvirt/events) [storage.TaskManager.Task] (Task='e339e041-0511-4154-b807-261d8fc795bb') moving from state finished -> state preparing (task:624)

Comment 14 sshmulev 2021-10-21 11:45:37 UTC
Adding some information about the VM during this error:

The VM paused and resumed a couple of times while writing to the disk, and got these errors each time.
The disk was fully used at the end of the dd operation, and the VM then returned to the Active state.

Comment 15 Nir Soffer 2021-10-21 11:52:29 UTC
(In reply to sshmulev from comment #12)
> Failed verification according to steps in comment
> 
> Versions:
> ovirt-engine-4.4.9.3-0.3.el8ev.noarch
> vdsm-4.40.90.3-1.el8ev.x86_64
> libvirt-7.6.0-5.module+el8.5.0+12933+58cb48a1
> 
> 
> steps to reproduce:
> 
> 1. Start a VM with 20g thin disk
> 2. In the VM, starts dd process, writing 20g to the think disk
> 
> # dd if=/dev/zero bs=1M count=20480 of=/dev/sda oflag=direct conv=fsync
> status=progress
> 
> 3. Perform full backup
> [root@storage-ge3-vdsm3 examples]# ./backup_vm.py -c engine full
> 4faa9712-c1aa-4cc2-8396-169138858eea
> [   0.0 ] Starting full backup for VM '4faa9712-c1aa-4cc2-8396-169138858eea'
> Traceback (most recent call last):

...
> ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is
> "[Cannot backup VM. The Virtual Machine should be in Up or Down status.]".
> HTTP response code is 409.

This is an expected failure; we don't allow backup when the VM is paused.

You should retry the backup operation in this case.
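
For scripted backups, a simple retry loop around the start call handles this transient failure; a rough sketch only (the start_backup callable and the timings are placeholders, this is not part of backup_vm.py):

# Illustrative retry wrapper for starting a backup while the VM may be
# temporarily paused. ovirtsdk4 is the SDK used by backup_vm.py; the
# start_backup callable and the retry timings here are assumptions.
import time

import ovirtsdk4 as sdk

def start_backup_with_retry(start_backup, retries=6, delay=10):
    for attempt in range(1, retries + 1):
        try:
            return start_backup()
        except sdk.Error as e:
            # The engine refuses to start a backup unless the VM is Up or
            # Down, e.g. while it is briefly paused waiting for an extension.
            if "Cannot backup VM" not in str(e) or attempt == retries:
                raise
            time.sleep(delay)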

> Host:
> 2021-10-21 11:15:49,471+0300 INFO  (libvirt/events) [virt.vm]
> (vmId='4faa9712-c1aa-4cc2-8396-169138858eea') abnormal vm stop device
> ua-02921e71-459d-4b52-844d-380b91e3f110 error enospc (vm:5193)
> 2021-10-21 11:15:49,471+0300 INFO  (libvirt/events) [virt.vm]
> (vmId='4faa9712-c1aa-4cc2-8396-169138858eea') CPU stopped: onIOError
> (vm:6028)
> 2021-10-21 11:15:49,489+0300 INFO  (libvirt/events) [virt.vm]
> (vmId='4faa9712-c1aa-4cc2-8396-169138858eea') Requesting extension for
> volume 6296825e-4e12-4314-a13a-1be6df45950f on domain
> 759f583c-6331-4820-9ba6-ec971ef62b3c (apparent: 1073741824, capacity:
> 21474836480, allocated: 1074135040, physical: 1073741824 threshold_state:
> exceeded) (vm:1350)

If this happens during the backup, it shows that the issue is fixed. Without
the fix we do not see extend attempts.

Please check comment 5 and comment 6 carefully; they show what to look for
in the vdsm logs.

Comment 16 sshmulev 2021-10-21 15:51:38 UTC
Verified.

Versions:
ovirt-engine-4.4.9.3-0.3.el8ev.noarch
vdsm-4.40.90.3-1.el8ev.x86_64
libvirt-7.6.0-5.module+el8.5.0+12933+58cb48a1

Steps to reproduce:
1. Start a VM with 20g thin disk
2. In the VM, start a dd process writing 20g to the thin disk

# dd if=/dev/zero bs=1M count=20480 of=/dev/sda oflag=direct conv=fsync status=progress
During the write, a backup was started.
Full backup:
[root@storage-ge3-vdsm3 examples]# ./backup_vm.py -c engine full dc411543-5a9c-440e-9567-12b83214ea28
[   0.0 ] Starting full backup for VM 'dc411543-5a9c-440e-9567-12b83214ea28'
[   1.3 ] Waiting until backup 'bf26ebff-b3b3-447e-b284-5871e54f553f' is ready
[  21.6 ] Created checkpoint '09a6a04a-8b82-4844-b2dc-06cd8f2fdd9b' (to use in --from-checkpoint-uuid for the next incremental backup)
[  21.7 ] Downloading full backup for disk '27616873-0fb2-4df5-9336-1a52b95614d8'
[  22.9 ] Image transfer '2a4551db-bf02-4e8d-8e8f-30ed1fbe6edc' is ready
[ 100.00% ] 20.00 GiB, 17.28 seconds, 1.16 GiB/s                               
[  40.1 ] Finalizing image transfer
[  48.3 ] Download completed successfully
[  48.3 ] Downloading full backup for disk 'a95c2a14-8bd1-4ab4-8123-a5967bf7787a'
[  49.4 ] Image transfer '360cd5b3-e09e-4cff-8029-2f777719eda2' is ready
[ 100.00% ] 10.00 GiB, 13.56 seconds, 755.12 MiB/s                             
[  63.0 ] Finalizing image transfer
[  74.2 ] Download completed successfully
[  74.2 ] Finalizing backup
[  86.5 ] Full backup 'bf26ebff-b3b3-447e-b284-5871e54f553f' completed successfully

And then an incremental backup:
[root@storage-ge3-vdsm3 examples]# ./backup_vm.py -c engine incremental --from-checkpoint-uuid 09a6a04a-8b82-4844-b2dc-06cd8f2fdd9b b08f99b4-ddad-46c3-93ff-74b3140e4824
[   0.0 ] Starting incremental backup for VM 'b08f99b4-ddad-46c3-93ff-74b3140e4824'
[   0.2 ] VM 'b08f99b4-ddad-46c3-93ff-74b3140e4824' does not exist
[root@storage-ge3-vdsm3 examples]# ./backup_vm.py -c engine incremental --from-checkpoint-uuid 09a6a04a-8b82-4844-b2dc-06cd8f2fdd9b dc411543-5a9c-440e-9567-12b83214ea28
[   0.0 ] Starting incremental backup for VM 'dc411543-5a9c-440e-9567-12b83214ea28'
[   1.4 ] Waiting until backup '3d217d41-f249-471f-9c23-2277ce434bdb' is ready
[  18.6 ] Created checkpoint 'e160c9ed-1d37-455e-8c9d-7b3d3e786c5e' (to use in --from-checkpoint-uuid for the next incremental backup)
[  18.7 ] Downloading incremental backup for disk '27616873-0fb2-4df5-9336-1a52b95614d8'
[  19.8 ] Image transfer '2ee53e1c-b0b3-40c7-b506-6a2b3ef13fc0' is ready
[ 100.00% ] 20.00 GiB, 48.53 seconds, 422.04 MiB/s                             
[  68.3 ] Finalizing image transfer
[  75.5 ] Download completed successfully
[  75.5 ] Downloading incremental backup for disk 'a95c2a14-8bd1-4ab4-8123-a5967bf7787a'
[  76.6 ] Image transfer '98be34d3-5c19-4295-bc9a-eb80ab4df2ac' is ready
[ 100.00% ] 10.00 GiB, 0.17 seconds, 57.49 GiB/s                               
[  76.7 ] Finalizing image transfer
[  79.8 ] Download completed successfully
[  79.8 ] Finalizing backup
[  84.0 ] Incremental backup '3d217d41-f249-471f-9c23-2277ce434bdb' completed successfully


start_backup:
2021-10-21 18:12:11,043+0300 INFO  (jsonrpc/1) [api.virt] START start_backup(config={'backup_id': 'bf26ebff-b3b3-447e-b284-5871e54f553f', 'disks': [{'checkpoint': True, 'scratch_disk': {'path': '/rhev/data-center/mnt/blockSD/d754ab8a-cffc-49d0-b45b-e41b9502a7a1/images/32ede744-e380-415f-aa59-2138a38ce618/8707d29a-1ff4-4326-b3e5-3833c918edb7', 'type': 'block'}, 'imageID': '27616873-0fb2-4df5-9336-1a52b95614d8', 'volumeID': '7e392d9b-4ba0-4bbb-81b1-b9ee3562e048', 'backup_mode': 'full', 'domainID': 'd754ab8a-cffc-49d0-b45b-e41b9502a7a1'}, {'checkpoint': True, 'scratch_disk': {'path': '/rhev/data-center/mnt/blockSD/759f583c-6331-4820-9ba6-ec971ef62b3c/images/05a23cf2-0288-495b-b152-f20175619b66/3a029b94-1119-498a-825c-019bd6e5b020', 'type': 'block'}, 'imageID': 'a95c2a14-8bd1-4ab4-8123-a5967bf7787a', 'volumeID': '01205056-6033-4b98-8cda-751a1d2150a3', 'backup_mode': 'full', 'domainID': '759f583c-6331-4820-9ba6-ec971ef62b3c'}], 'from_checkpoint_id': None, 'parent_checkpoint_id': None, 'require_consistency': False, 'to_checkpoint_id': '09a6a04a-8b82-4844-b2dc-06cd8f2fdd9b'}) from=::ffff:10.46.12.144,52796, flow_id=e747aac2-b2e0-43e8-bc7c-db6eb09df60d, vmId=dc411543-5a9c-440e-9567-12b83214ea28 (api:48)

VM paused:
2021-10-21 18:12:11,870+0300 INFO  (libvirt/events) [virt.vm] (vmId='dc411543-5a9c-440e-9567-12b83214ea28') CPU stopped: onIOError (vm:6028)
2021-10-21 18:12:11,892+0300 INFO  (libvirt/events) [virt.vm] (vmId='dc411543-5a9c-440e-9567-12b83214ea28') CPU stopped: onSuspend (vm:6028)
2021-10-21 18:12:11,044+0300 INFO  (jsonrpc/1) [virt.vm] (vmId='dc411543-5a9c-440e-9567-12b83214ea28') Freezing guest filesystems (vm:4276)
2021-10-21 18:12:11,097+0300 INFO  (jsonrpc/1) [virt.vm] (vmId='dc411543-5a9c-440e-9567-12b83214ea28') 1 guest filesystems frozen (vm:4295)


Requesting extension:
2021-10-21 18:12:11,870+0300 INFO  (libvirt/events) [virt.vm] (vmId='dc411543-5a9c-440e-9567-12b83214ea28') abnormal vm stop device  error enospc (vm:5193)
2021-10-21 18:12:11,870+0300 INFO  (libvirt/events) [virt.vm] (vmId='dc411543-5a9c-440e-9567-12b83214ea28') CPU stopped: onIOError (vm:6028)
2021-10-21 18:12:11,887+0300 WARN  (libvirt/events) [virt.vm] (vmId='dc411543-5a9c-440e-9567-12b83214ea28') No allocation info for drive sda, but block threshold was exceeded - assuming that drive needs extension (drivemonitor:251)
2021-10-21 18:12:11,887+0300 INFO  (libvirt/events) [virt.vm] (vmId='dc411543-5a9c-440e-9567-12b83214ea28') Requesting extension for volume 7e392d9b-4ba0-4bbb-81b1-b9ee3562e048 on domain d754ab8a-cffc-49d0-b45b-e41b9502a7a1 (apparent: 5368709120, capacity: 21474836480, allocated: 0, physical: 5368709120 threshold_state: exceeded) (vm:1350)

2021-10-21 18:12:11,098+0300 INFO  (jsonrpc/1) [virt.vm] (vmId='dc411543-5a9c-440e-9567-12b83214ea28') Starting backup for backup_id: 'bf26ebff-b3b3-447e-b284-5871e54f553f', backup xml: <?xml version='1.0' encoding='utf-8'?>

VM running again:
2021-10-21 18:12:11,219+0300 INFO  (jsonrpc/1) [virt.vm] (vmId='dc411543-5a9c-440e-9567-12b83214ea28') Thawing guest filesystems (vm:4303)
2021-10-21 18:12:11,230+0300 INFO  (jsonrpc/1) [virt.vm] (vmId='dc411543-5a9c-440e-9567-12b83214ea28') 1 guest filesystems thawed (vm:4322)
2021-10-21 18:12:15,221+0300 INFO  (mailbox-hsm/3) [virt.vm] (vmId='dc411543-5a9c-440e-9567-12b83214ea28') CPU running: continue (vm:6028)
2021-10-21 18:12:15,319+0300 INFO  (libvirt/events) [virt.vm] (vmId='dc411543-5a9c-440e-9567-12b83214ea28') CPU running: onResume (vm:6028)


start_backup:
2021-10-21 18:13:49,512+0300 INFO  (jsonrpc/5) [api.virt] START start_backup(config={'backup_id': '3d217d41-f249-471f-9c23-2277ce434bdb', 'disks': [{'checkpoint': True, 'scratch_disk': {'path': '/rhev/data-center/mnt/blockSD/d754ab8a-cffc-49d0-b45b-e41b9502a7a1/images/4db0c94c-5702-4802-8baa-5c1b4b7c1f30/85733056-4b8b-4bce-b3d7-79de890c6a18', 'type': 'block'}, 'imageID': '27616873-0fb2-4df5-9336-1a52b95614d8', 'volumeID': '7e392d9b-4ba0-4bbb-81b1-b9ee3562e048', 'backup_mode': 'incremental', 'domainID': 'd754ab8a-cffc-49d0-b45b-e41b9502a7a1'}, {'checkpoint': True, 'scratch_disk': {'path': '/rhev/data-center/mnt/blockSD/759f583c-6331-4820-9ba6-ec971ef62b3c/images/a8e4b6e7-a3db-46e9-8cf1-1bc7f21c54e8/e1461e78-d7fa-4a1f-93c7-569e5367f57e', 'type': 'block'}, 'imageID': 'a95c2a14-8bd1-4ab4-8123-a5967bf7787a', 'volumeID': '01205056-6033-4b98-8cda-751a1d2150a3', 'backup_mode': 'incremental', 'domainID': '759f583c-6331-4820-9ba6-ec971ef62b3c'}], 'from_checkpoint_id': '09a6a04a-8b82-4844-b2dc-06cd8f2fdd9b', 'parent_checkpoint_id': '09a6a04a-8b82-4844-b2dc-06cd8f2fdd9b', 'require_consistency': False, 'to_checkpoint_id': 'e160c9ed-1d37-455e-8c9d-7b3d3e786c5e'}) from=::ffff:10.46.12.144,52796, flow_id=f297676e-4aeb-4e04-a497-bf0bdd9a2769, vmId=dc411543-5a9c-440e-9567-12b83214ea28 (api:48)

VM paused:
2021-10-21 18:13:03,892+0300 INFO  (libvirt/events) [virt.vm] (vmId='dc411543-5a9c-440e-9567-12b83214ea28') CPU stopped: onIOError (vm:6028)
2021-10-21 18:13:03,914+0300 INFO  (libvirt/events) [virt.vm] (vmId='dc411543-5a9c-440e-9567-12b83214ea28') CPU stopped: onSuspend (vm:6028)
2021-10-21 18:13:49,513+0300 INFO  (jsonrpc/5) [virt.vm] (vmId='dc411543-5a9c-440e-9567-12b83214ea28') Freezing guest filesystems (vm:4276)
2021-10-21 18:13:49,544+0300 INFO  (jsonrpc/5) [virt.vm] (vmId='dc411543-5a9c-440e-9567-12b83214ea28') 1 guest filesystems frozen (vm:4295)

Requesting extension:
2021-10-21 18:13:52,728+0300 WARN  (periodic/3) [virt.vm] (vmId='dc411543-5a9c-440e-9567-12b83214ea28') No allocation info for drive sda, but block threshold was exceeded - assuming that drive needs extension (drivemonitor:251)
2021-10-21 18:13:52,729+0300 INFO  (periodic/3) [virt.vm] (vmId='dc411543-5a9c-440e-9567-12b83214ea28') Requesting extension for volume 7e392d9b-4ba0-4bbb-81b1-b9ee3562e048 on domain d754ab8a-cffc-49d0-b45b-e41b9502a7a1 (apparent: 20401094656, capacity: 21474836480, allocated: 0, physical: 20401094656 threshold_state: exceeded) (vm:1350)

VM resumes:
2021-10-21 18:13:08,894+0300 INFO  (mailbox-hsm/3) [virt.vm] (vmId='dc411543-5a9c-440e-9567-12b83214ea28') CPU stopped: continue (vm:6028)
2021-10-21 18:13:08,989+0300 INFO  (libvirt/events) [virt.vm] (vmId='dc411543-5a9c-440e-9567-12b83214ea28') CPU running: onResume (vm:6028)

stop_backup:
2021-10-21 18:14:51,156+0300 INFO  (jsonrpc/2) [api.virt] START stop_backup(backup_id='3d217d41-f249-471f-9c23-2277ce434bdb') from=::ffff:10.46.12.144,52796, flow_id=f350c4f4-f8f3-4e06-b2e4-aafd07d9cc65, vmId=dc411543-5a9c-440e-9567-12b83214ea28 (api:48)
2021-10-21 18:14:51,165+0300 INFO  (jsonrpc/2) [storage.backup] Removing scratch disks for backup id: 3d217d41-f249-471f-9c23-2277ce434bdb (backup:587)
2021-10-21 18:14:51,166+0300 DEBUG (jsonrpc/2) [storage.TaskManager.Task] (Task='effafdd1-7f1f-4830-a578-14d641cb3308') moving from state finished -> state preparing (task:624)
2021-10-21 18:14:51,166+0300 INFO  (jsonrpc/2) [vdsm.api] START list_transient_disks(owner_name='dc411543-5a9c-440e-9567-12b83214ea28') from=::ffff:10.46.12.144,52796, flow_id=f350c4f4-f8f3-4e06-b2e4-aafd07d9cc65, task_id=effafdd1-7f1f-4830-a578-14d641cb3308 (api:48)

