Bug 1913387 - [CBT] [RFE] Extend backup scratch disk as needed
Summary: [CBT] [RFE] Extend backup scratch disk as needed
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.40.40
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Nir Soffer
QA Contact: Evelina Shames
URL:
Whiteboard:
Duplicates: 2043175
Depends On: 1913315 2017928
Blocks: 1913389
 
Reported: 2021-01-06 16:02 UTC by Nir Soffer
Modified: 2022-03-22 10:40 UTC
CC: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-16 12:51:09 UTC
oVirt Team: Storage
Embargoed:
sbonazzo: ovirt-4.5-


Links
System ID Private Priority Status Summary Last Updated
Github oVirt vdsm pull 58 0 None Draft Prepare for monitoring scratch disks 2022-01-31 20:40:51 UTC
Github oVirt vdsm pull 64 0 None Draft Cleanup VolumeMonitor API 2022-02-02 16:00:53 UTC
oVirt gerrit 115436 0 master MERGED virt: storage: Support top volume index 2021-07-19 12:10:03 UTC
oVirt gerrit 115437 0 master MERGED drivemonitor: Allow setting threshold with index 2021-07-19 17:02:33 UTC
oVirt gerrit 115755 0 master MERGED tests: Add missing state for extending drives 2021-07-19 17:02:40 UTC
oVirt gerrit 115780 0 master MERGED drivemonitor: Always use block threshold events 2021-07-21 09:48:41 UTC
oVirt gerrit 117452 0 master MERGED drivemonitor: Fetch block info using block stats API 2021-11-18 20:09:42 UTC
oVirt gerrit 117456 0 master MERGED virdomain: Allow access to the underling libvirt.virDomain 2021-11-18 20:09:39 UTC
oVirt gerrit 117464 0 master MERGED virt: Improve comments 2021-11-06 07:34:10 UTC
oVirt gerrit 117465 0 master MERGED vm: Pass BlockInfo instead of unpacking it 2021-11-18 20:06:36 UTC
oVirt gerrit 117626 0 master MERGED livemerge: Do not use getExtendInfo for base volume 2021-11-18 20:09:44 UTC
oVirt gerrit 117628 0 master MERGED vm: Fix handling of ImprobableResizeRequestError 2021-11-18 20:09:49 UTC
oVirt gerrit 117629 0 master MERGED vm: Split the query from loop 2021-11-18 20:14:37 UTC
oVirt gerrit 117630 0 master MERGED vm: Replace blockInfo with block stats 2021-11-19 15:58:07 UTC
oVirt gerrit 117637 0 master MERGED tests: Remove unused fake blockInfo 2021-11-18 20:09:46 UTC
oVirt gerrit 117677 0 master MERGED virt: Remove vmdevices.storage.BlockInfo 2021-11-19 23:49:15 UTC
oVirt gerrit 117690 0 master MERGED tests: Access drives via fake vm 2021-11-18 14:04:15 UTC
oVirt gerrit 117691 0 master MERGED tests: Remove unneeded globals 2021-11-18 14:04:17 UTC
oVirt gerrit 117692 0 master MERGED transientdisk: Make owner_dir and disk_path public 2021-11-18 14:33:20 UTC
oVirt gerrit 117693 0 master MERGED tests: Use transientdisk.disk_path() 2021-11-18 16:50:28 UTC
oVirt gerrit 117700 0 master MERGED tests: Remove python 2 future imports 2021-11-18 16:50:30 UTC
oVirt gerrit 117701 0 master MERGED tests: Generate backup xml automatically 2021-11-18 20:08:05 UTC
oVirt gerrit 117715 0 master MERGED backup: Parse index from backup xml 2021-12-14 11:51:54 UTC
oVirt gerrit 117727 0 master MERGED backup: Parse incremental element 2021-12-14 11:52:44 UTC
oVirt gerrit 117728 0 master MERGED backup: Use exportname to create backup url 2021-12-14 11:52:46 UTC
oVirt gerrit 117729 0 master MERGED backup: Parse disk type from backup xml 2021-12-14 11:58:29 UTC
oVirt gerrit 117730 0 master MERGED backup: Improve naming 2021-12-14 11:58:31 UTC
oVirt gerrit 117731 0 master MERGED backup: Keep drive object in backup disk 2021-12-14 11:58:33 UTC
oVirt gerrit 117732 0 master MERGED backup: Keep scratch disk info in drive 2021-12-22 14:57:21 UTC

Description Nir Soffer 2021-01-06 16:02:02 UTC
Description of problem:

During backup, when the guest writes to data that is part of the backup, qemu
copies the old data from the disk to the scratch disk before writing the new
data to the disk. When the scratch disk becomes too full, vdsm needs to extend
it.

Libvirt will support monitoring the scratch disk block threshold in RHEL 8.4
(bug 1913315). Here is an example of backup xml showing the scratch disk
details:

$ virsh backup-dumpxml backup-test
<domainbackup mode='pull'>
  <server transport='tcp' name='localhost' port='1234'/>
  <disks>
    <disk name='vda' backup='yes' type='file' backupmode='full' exportname='vda' index='4'>
      <driver type='qcow2'/>
      <scratch file='/tmp/backup-test-images/scratch-vda.qcow2'/>
    </disk>
    <disk name='hda' backup='no'/>
  </disks>
</domainbackup>

Vdsm needs to extract the name ('vda') and index (index='4') and use them ("vda[4]") as the name argument to virDomain.setBlockThreshold().

After starting a backup, vdsm needs to set a write threshold on the scratch disk.
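
Roughly like this (illustration only, not vdsm code; set_scratch_thresholds()
and the threshold argument are made up, backupGetXMLDesc() and
setBlockThreshold() are the libvirt-python calls involved):

import xml.etree.ElementTree as ET

def set_scratch_thresholds(dom, threshold):
    # dom is a libvirt.virDomain; backupGetXMLDesc() returns the
    # backup xml shown above.
    root = ET.fromstring(dom.backupGetXMLDesc())
    for disk in root.findall("./disks/disk[@backup='yes']"):
        # name='vda' and index='4' combine into "vda[4]", addressing
        # the scratch disk node rather than the top volume.
        target = "%s[%s]" % (disk.get("name"), disk.get("index"))
        dom.setBlockThreshold(target, threshold)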

When handling the block threshold event, vdsm needs to mark the scratch disk
for extension and schedule an async extend operation.

The async extend operation needs to send an extend message to the SPM and
wait for the extend reply.

If an extend request fails or times out, vdsm needs to retry and send a new
extend request.

When the scratch disk is extended successfully, vdsm needs to refresh the
volume and set a new block threshold.
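
A rough sketch of this flow (mark_for_extension(), marked_drives(),
send_extend_request(), refresh_volume() and set_new_threshold() are
hypothetical placeholders for vdsm internals; the event id and callback
signature are libvirt's):

import libvirt

def register(conn):
    # Deliver block threshold events for all domains on this connection.
    conn.domainEventRegisterAny(
        None, libvirt.VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD,
        on_block_threshold, None)

def on_block_threshold(conn, dom, dev, path, threshold, excess, opaque):
    # For a scratch disk, dev looks like "vda[4]". Only mark the drive
    # here - no blocking work inside the event callback.
    mark_for_extension(dev)

def extend_cycle():
    # Runs periodically in the monitoring thread.
    for dev in marked_drives():
        try:
            # Extend message to the SPM; if it fails or times out the
            # drive stays marked and a new request is sent next cycle.
            send_extend_request(dev)
        except Exception:
            continue
        refresh_volume(dev)     # pick up the new LV size on this host
        set_new_threshold(dev)  # re-arm monitoring for the next chunk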

This mechanism is similar to the way normal disks are extended, but the
current mechanism assumes that only the top volume of the disk is monitored
or extended, so it cannot be used as is.

If vdsm was restarted during a backup, it may have missed the block threshold
event, so it needs to query the allocation of the scratch disk and trigger an
extend if needed. To check the current allocation we can use the bulk stats
API (virConnectGetAllDomainStats/virDomainListGetStats), which reports the
allocation as "block.<num>.allocation" in the VIR_DOMAIN_STATS_BLOCK group.
The current threshold value is reported as "block.<num>.threshold".
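
A sketch of such a query (query_scratch_allocation() is an illustrative name,
and matching the scratch node via "block.<num>.backingIndex" is an assumption;
the other stats keys are the ones named above):

import libvirt

def query_scratch_allocation(conn, dom, drive_name, index):
    # Include backing-chain nodes so the scratch disk shows up.
    _, stats = conn.domainListGetStats(
        [dom],
        libvirt.VIR_DOMAIN_STATS_BLOCK,
        libvirt.VIR_CONNECT_GET_ALL_DOMAINS_STATS_BACKING)[0]
    for n in range(stats["block.count"]):
        if (stats.get("block.%d.name" % n) == drive_name and
                stats.get("block.%d.backingIndex" % n) == index):
            return (stats.get("block.%d.allocation" % n),
                    stats.get("block.%d.threshold" % n))
    return None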

The chunk size and the threshold can reuse the existing configuration
(irs:volume_utilization_percent, irs:volume_utilization_chunk_mb).
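
For example, with the default values (volume_utilization_percent=50,
volume_utilization_chunk_mb=1024), a plausible threshold computation
(illustrative only, not vdsm's actual code) looks like:

MiB = 1024 ** 2
GiB = 1024 ** 3

chunk = 1024 * MiB   # irs:volume_utilization_chunk_mb
percent = 50         # irs:volume_utilization_percent
physical = 11 * GiB  # current scratch volume size (example value)

# Fire the event while chunk * (100 - percent) / 100 bytes (512 MiB
# here) are still free, leaving time to extend before qemu pauses
# the VM with ENOSPC.
threshold = physical - chunk * (100 - percent) // 100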

The same mechanism is also needed for live merge and live storage migration.

- In live merge, we use only an initial extension. This may not be enough for
  merging the active layer; in that case the live merge will fail and the
  user will have to retry it.

- In live storage migration, we have a complicated mechanism that monitors
  the source disk and extends the target disk. This can be replaced by
  monitoring and extending the target disk directly.

This will be hard to implement and may require more than one zstream cycle.

Comment 1 Jean-Louis Dupond 2021-03-22 09:22:57 UTC
I think this is a major issue currently, and it makes incremental backups useless at this moment.
If you are on oVirt 4.4.5 (which already uses scratch disks) but without this fix, you end up with paused VMs because the scratch disk is never extended.

Shouldn't we create scratch disks with size == disk size instead of thin provisioning them until this is fixed?
It's not an ideal situation, as you might need much more storage, but the current situation is even worse I think.

Comment 2 Nir Soffer 2021-03-22 09:32:15 UTC
(In reply to Jean-Louis Dupond from comment #1)
> Shouldn't we create scratch disks with size == disk size instead of thin
> provisioning them until this is fixed?

This is what we do now - we create a thin disk with an initial size equal
to the original disk's virtual size.

When the disk is created, you should be able to see the initial_size=
argument; it must be the virtual size of the original disk. This allocates a
logical volume of virtual size * 1.1 on storage (e.g. a ~550 GiB logical
volume for a 500 GiB disk) and creates a qcow2 image on this logical volume.

If this is not the case, this is a bug.

Comment 3 Jean-Louis Dupond 2021-08-30 06:56:49 UTC
I don't know what the ETA is for oVirt 4.5.0, but I think this bug deserves some more priority :)

The biggest issue now is that if you do concurrent incremental backups, you really need a ton of additional disk space on your iSCSI LUN.
Say you back up two 500G VMs - you need an additional 1TB of free disk space to back them up.
This is a huge blocker to start using incremental backups in production.

Also, ain't most of the work already done by Nir?

Comment 4 Eyal Shenitzky 2021-08-30 10:38:53 UTC
(In reply to Jean-Louis Dupond from comment #3)
> I don't know what the ETA is for oVirt 4.5.0, but I think this bug
> deserves some more priority :)
> 
> The biggest issue now is that if you do concurrent incremental backups, you
> really need a ton of additional disk space on your iSCSI LUN.
> Say you back up two 500G VMs - you need an additional 1TB of free disk
> space to back them up.
> This is a huge blocker to start using incremental backups in production.
> 
> Also, ain't most of the work already done by Nir?

You are right, and this feature is under development.
There is still much work to do, but it should be delivered in oVirt 4.5.

Comment 6 Arik 2022-01-19 15:32:09 UTC
Nir, are we missing anything for this bz?

Comment 7 Nir Soffer 2022-01-19 17:11:23 UTC
Yes - finishing the work. The merged patches are just preparation for the actual work.

Comment 8 Arik 2022-01-19 17:21:47 UTC
ah wow, that's a lot of preparation patches :)
ok, thanks

Comment 9 Arik 2022-01-24 15:15:03 UTC
*** Bug 2043175 has been marked as a duplicate of this bug. ***

Comment 10 Arik 2022-03-16 12:51:09 UTC
We took a different approach for backups that renders this bz redundant (the new method doesn't involve scratch disks).
The new method will be available for testing as of oVirt 4.5 alpha (see bz 2053669).

