Bug 1913387 - [CBT] [RFE] Extend backup scratch disk as needed
Summary: [CBT] [RFE] Extend backup scratch disk as needed
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.40.40
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Nir Soffer
QA Contact: Evelina Shames
URL:
Whiteboard:
Duplicates: 2043175
Depends On: 1913315 2017928
Blocks: 1913389
 
Reported: 2021-01-06 16:02 UTC by Nir Soffer
Modified: 2022-03-22 10:40 UTC
CC: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-16 12:51:09 UTC
oVirt Team: Storage
Embargoed:
sbonazzo: ovirt-4.5-


Links
System ID Private Priority Status Summary Last Updated
Github oVirt vdsm pull 58 0 None Draft Prepare for monitoring scratch disks 2022-01-31 20:40:51 UTC
Github oVirt vdsm pull 64 0 None Draft Cleanup VolumeMonitor API 2022-02-02 16:00:53 UTC
oVirt gerrit 115436 0 master MERGED virt: storage: Support top volume index 2021-07-19 12:10:03 UTC
oVirt gerrit 115437 0 master MERGED drivemonitor: Allow setting threshold with index 2021-07-19 17:02:33 UTC
oVirt gerrit 115755 0 master MERGED tests: Add missing state for extending drives 2021-07-19 17:02:40 UTC
oVirt gerrit 115780 0 master MERGED drivemonitor: Always use block threshold events 2021-07-21 09:48:41 UTC
oVirt gerrit 117452 0 master MERGED drivemonitor: Fetch block info using block stats API 2021-11-18 20:09:42 UTC
oVirt gerrit 117456 0 master MERGED virdomain: Allow access to the underling libvirt.virDomain 2021-11-18 20:09:39 UTC
oVirt gerrit 117464 0 master MERGED virt: Improve comments 2021-11-06 07:34:10 UTC
oVirt gerrit 117465 0 master MERGED vm: Pass BlockInfo instead of unpacking it 2021-11-18 20:06:36 UTC
oVirt gerrit 117626 0 master MERGED livemerge: Do not use getExtendInfo for base volume 2021-11-18 20:09:44 UTC
oVirt gerrit 117628 0 master MERGED vm: Fix handling of ImprobableResizeRequestError 2021-11-18 20:09:49 UTC
oVirt gerrit 117629 0 master MERGED vm: Split the query from loop 2021-11-18 20:14:37 UTC
oVirt gerrit 117630 0 master MERGED vm: Replace blockInfo with block stats 2021-11-19 15:58:07 UTC
oVirt gerrit 117637 0 master MERGED tests: Remove unused fake blockInfo 2021-11-18 20:09:46 UTC
oVirt gerrit 117677 0 master MERGED virt: Remove vmdevices.storage.BlockInfo 2021-11-19 23:49:15 UTC
oVirt gerrit 117690 0 master MERGED tests: Access drives via fake vm 2021-11-18 14:04:15 UTC
oVirt gerrit 117691 0 master MERGED tests: Remove unneeded globals 2021-11-18 14:04:17 UTC
oVirt gerrit 117692 0 master MERGED transientdisk: Make owner_dir and disk_path public 2021-11-18 14:33:20 UTC
oVirt gerrit 117693 0 master MERGED tests: Use transientdisk.disk_path() 2021-11-18 16:50:28 UTC
oVirt gerrit 117700 0 master MERGED tests: Remove python 2 future imports 2021-11-18 16:50:30 UTC
oVirt gerrit 117701 0 master MERGED tests: Generate backup xml automatically 2021-11-18 20:08:05 UTC
oVirt gerrit 117715 0 master MERGED backup: Parse index from backup xml 2021-12-14 11:51:54 UTC
oVirt gerrit 117727 0 master MERGED backup: Parse incremental element 2021-12-14 11:52:44 UTC
oVirt gerrit 117728 0 master MERGED backup: Use exportname to create backup url 2021-12-14 11:52:46 UTC
oVirt gerrit 117729 0 master MERGED backup: Parse disk type from backup xml 2021-12-14 11:58:29 UTC
oVirt gerrit 117730 0 master MERGED backup: Improve naming 2021-12-14 11:58:31 UTC
oVirt gerrit 117731 0 master MERGED backup: Keep drive object in backup disk 2021-12-14 11:58:33 UTC
oVirt gerrit 117732 0 master MERGED backup: Keep scratch disk info in drive 2021-12-22 14:57:21 UTC

Description Nir Soffer 2021-01-06 16:02:02 UTC
Description of problem:

During backup, when the guest writes to data that is part of the backup, qemu
copies the old data from the disk to the scratch disk before writing the new
data to the disk. When the scratch disk becomes too full, vdsm needs to extend
it.

Libvirt will support monitoring the scratch disk block threshold in RHEL 8.4
(bug 1913315). Here is an example of backup xml showing the scratch disk
details:

$ virsh backup-dumpxml backup-test
<domainbackup mode='pull'>
  <server transport='tcp' name='localhost' port='1234'/>
  <disks>
    <disk name='vda' backup='yes' type='file' backupmode='full' exportname='vda' index='4'>
      <driver type='qcow2'/>
      <scratch file='/tmp/backup-test-images/scratch-vda.qcow2'/>
    </disk>
    <disk name='hda' backup='no'/>
  </disks>
</domainbackup>

Vdsm needs to extract the name ('vda') and index (index='4') and use them ("vda[4]") as the name argument to virDomain.setBlockThreshold().

After starting a backup, vdsm needs to set a write threshold on the scratch disk.
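
Roughly like this (illustration only, not vdsm code; set_scratch_thresholds()
and the threshold argument are made up, backupGetXMLDesc() and
setBlockThreshold() are the libvirt-python calls involved):

import xml.etree.ElementTree as ET

def set_scratch_thresholds(dom, threshold):
    # dom is a libvirt.virDomain; backupGetXMLDesc() returns the
    # backup xml shown above.
    root = ET.fromstring(dom.backupGetXMLDesc())
    for disk in root.findall("./disks/disk[@backup='yes']"):
        # name='vda' and index='4' combine into "vda[4]", addressing
        # the scratch disk node rather than the top volume.
        target = "%s[%s]" % (disk.get("name"), disk.get("index"))
        dom.setBlockThreshold(target, threshold)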

When handling the block threshold event, vdsm needs to mark the scratch disk
for extension and schedule an async extend operation.

The async extend operation needs to send an extend message to the SPM and
wait for the extend reply.

If an extend request fails or times out, vdsm needs to retry and send a new
extend request.

When the scratch disk is extended successfully, vdsm needs to refresh the
volume and set a new block threshold.
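
A rough sketch of this flow (mark_for_extension(), marked_drives(),
send_extend_request(), refresh_volume() and set_new_threshold() are
hypothetical placeholders for vdsm internals; the event id and callback
signature are libvirt's):

import libvirt

def register(conn):
    # Deliver block threshold events for all domains on this connection.
    conn.domainEventRegisterAny(
        None, libvirt.VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD,
        on_block_threshold, None)

def on_block_threshold(conn, dom, dev, path, threshold, excess, opaque):
    # For a scratch disk, dev looks like "vda[4]". Only mark the drive
    # here - no blocking work inside the event callback.
    mark_for_extension(dev)

def extend_cycle():
    # Runs periodically in the monitoring thread.
    for dev in marked_drives():
        try:
            # Extend message to the SPM; if it fails or times out the
            # drive stays marked and a new request is sent next cycle.
            send_extend_request(dev)
        except Exception:
            continue
        refresh_volume(dev)     # pick up the new LV size on this host
        set_new_threshold(dev)  # re-arm monitoring for the next chunk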

This mechanism is similar to the way normal disks are extended, but the
current mechanism assumes that only the top volume of the disk is monitored
or extended, so it cannot be used as is.

If vdsm was restarted during a backup, it may have missed the block threshold
event, so it needs to query the allocation of the scratch disk and trigger an
extend if needed. To check the current allocation we can use the bulk stats
API (virConnectGetAllDomainStats/virDomainListGetStats), which reports the
allocation as "block.<num>.allocation" in the VIR_DOMAIN_STATS_BLOCK group.
The current threshold value is reported as "block.<num>.threshold".
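
A sketch of such a query (query_scratch_allocation() is an illustrative name,
and matching the scratch node via "block.<num>.backingIndex" is an assumption;
the other stats keys are the ones named above):

import libvirt

def query_scratch_allocation(conn, dom, drive_name, index):
    # Include backing-chain nodes so the scratch disk shows up.
    _, stats = conn.domainListGetStats(
        [dom],
        libvirt.VIR_DOMAIN_STATS_BLOCK,
        libvirt.VIR_CONNECT_GET_ALL_DOMAINS_STATS_BACKING)[0]
    for n in range(stats["block.count"]):
        if (stats.get("block.%d.name" % n) == drive_name and
                stats.get("block.%d.backingIndex" % n) == index):
            return (stats.get("block.%d.allocation" % n),
                    stats.get("block.%d.threshold" % n))
    return None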

The chunk size and the threshold can reuse the existing configuration
(irs:volume_utilization_percent, irs:volume_utilization_chunk_mb).
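
For example, with the default values (volume_utilization_percent=50,
volume_utilization_chunk_mb=1024), a plausible threshold computation
(illustrative only, not vdsm's actual code) looks like:

MiB = 1024 ** 2
GiB = 1024 ** 3

chunk = 1024 * MiB   # irs:volume_utilization_chunk_mb
percent = 50         # irs:volume_utilization_percent
physical = 11 * GiB  # current scratch volume size (example value)

# Fire the event while chunk * (100 - percent) / 100 bytes (512 MiB
# here) are still free, leaving time to extend before qemu pauses
# the VM with ENOSPC.
threshold = physical - chunk * (100 - percent) // 100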

The same mechanism is also needed for live merge and live storage migration.

- In live merge, we use only an initial extension. This may not be enough for
  merging the active layer; in that case the live merge will fail and the
  user will have to retry it.

- In live storage migration, we have a complicated mechanism that monitors
  the source disk and extends the target disk. This can be replaced by
  monitoring and extending the target disk directly.

This will be hard to implement and may require more than one zstream cycle.

Comment 1 Jean-Louis Dupond 2021-03-22 09:22:57 UTC
I think this is a major issue currently, and it makes incremental backups useless at this moment.
If you are on oVirt 4.4.5 (which already uses scratch disks) but without this fix, you end up with paused VMs because the scratch disk is never extended.

Shouldn't we create scratch disks with size == disk size instead of thin provisioning them until this is fixed?
It's not an ideal situation, as you might need much more storage, but the current situation is even worse I think.

Comment 2 Nir Soffer 2021-03-22 09:32:15 UTC
(In reply to Jean-Louis Dupond from comment #1)
> Shouldn't we create scratch disks with size == disk size instead of thin
> provisioning them until this is fixed?

This is what we do now - we create a thin disk with an initial size equal
to the original disk's virtual size.

When the disk is created, you should be able to see the initial_size=
argument; it must be the virtual size of the original disk. This allocates a
logical volume of virtual size * 1.1 on storage (e.g. a ~550 GiB logical
volume for a 500 GiB disk) and creates a qcow2 image on this logical volume.

If this is not the case, this is a bug.

Comment 3 Jean-Louis Dupond 2021-08-30 06:56:49 UTC
I don't know what the ETA is for oVirt 4.5.0, but I think this bug deserves some more priority :)

The biggest issue now is that if you do concurrent incremental backups, you really need a ton of additional disk space on your iSCSI LUN.
Say you back up two 500G VMs - you need an additional 1TB of free disk space to back them up.
This is a huge blocker to start using incremental backups in production.

Also, ain't most of the work already done by Nir?

Comment 4 Eyal Shenitzky 2021-08-30 10:38:53 UTC
(In reply to Jean-Louis Dupond from comment #3)
> I don't know what the ETA is for oVirt 4.5.0, but I think this bug
> deserves some more priority :)
> 
> The biggest issue now is that if you do concurrent incremental backups, you
> really need a ton of additional disk space on your iSCSI LUN.
> Say you back up two 500G VMs - you need an additional 1TB of free disk
> space to back them up.
> This is a huge blocker to start using incremental backups in production.
> 
> Also, ain't most of the work already done by Nir?

You are right, and this feature is under development.
There is still much work to do, but it should be delivered in oVirt 4.5.

Comment 6 Arik 2022-01-19 15:32:09 UTC
Nir, are we missing anything for this bz?

Comment 7 Nir Soffer 2022-01-19 17:11:23 UTC
Yes - finishing the work. The merged patches are just preparation for the actual work.

Comment 8 Arik 2022-01-19 17:21:47 UTC
ah wow, that's a lot of preparation patches :)
ok, thanks

Comment 9 Arik 2022-01-24 15:15:03 UTC
*** Bug 2043175 has been marked as a duplicate of this bug. ***

Comment 10 Arik 2022-03-16 12:51:09 UTC
We took a different approach for backups that renders this bz redundant (the new method doesn't involve scratch disks).
The new method will be available for testing as of oVirt 4.5 alpha (see bz 2053669).

