
Bug 1913387

Summary: [CBT] [RFE] Extend backup scratch disk as needed
Product: [oVirt] vdsm
Reporter: Nir Soffer <nsoffer>
Component: Core
Assignee: Nir Soffer <nsoffer>
Status: CLOSED WONTFIX
QA Contact: Evelina Shames <eshames>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.40.40
CC: aefrat, ahadas, bugs, eshames, jean-louis, md, yisun, Yury.Panchenko
Target Milestone: ---
Keywords: FutureFeature, ZStream
Target Release: ---
Flags: sbonazzo: ovirt-4.5-
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-03-16 12:51:09 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1913315, 2017928
Bug Blocks: 1913389

Description Nir Soffer 2021-01-06 16:02:02 UTC
Description of problem:

During backup, when the guest writes data that is part of the backup, qemu
copies the old data from the disk to the scratch disk before writing the new
data to the disk. When the scratch disk becomes too full, vdsm needs to extend
it.

Libvirt will support monitoring the scratch disk block threshold in RHEL 8.4
(bug 1913315). Here is an example of a backup xml showing the scratch disk
details:

$ virsh backup-dumpxml backup-test
<domainbackup mode='pull'>
  <server transport='tcp' name='localhost' port='1234'/>
  <disks>
    <disk name='vda' backup='yes' type='file' backupmode='full' exportname='vda' index='4'>
      <driver type='qcow2'/>
      <scratch file='/tmp/backup-test-images/scratch-vda.qcow2'/>
    </disk>
    <disk name='hda' backup='no'/>
  </disks>
</domainbackup>

Vdsm needs to extract the name ('vda') and index (index='4') and use them ("vda[4]") as the device name argument to virDomain.setBlockThreshold().

After starting the backup, vdsm needs to set a write threshold on the scratch disk.
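
For example, a minimal sketch (not the actual vdsm code) of extracting the
name/index pair from the backup xml above and arming the threshold via
libvirt-python; the 1 GiB threshold and the helper name are made up for the
example:

import xml.etree.ElementTree as ET

import libvirt

GiB = 1024**3

def set_scratch_thresholds(dom, threshold=1 * GiB):
    # Fetch the backup XML (as shown above) and find the backed-up disks.
    root = ET.fromstring(dom.backupGetXMLDesc())
    for disk in root.findall("./disks/disk[@backup='yes']"):
        # "vda" + index "4" -> "vda[4]", addressing the scratch disk node
        # rather than the disk's top layer.
        dev = "{}[{}]".format(disk.get("name"), disk.get("index"))
        dom.setBlockThreshold(dev, threshold)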

When handling the block threshold event, vdsm needs to mark the scratch disk
for extension and schedule an async extend operation.

The async extend operation needs to send an extend message to the SPM and wait
for the extend reply.

If an extend request fails or times out, vdsm needs to retry by sending a new
extend request.

When the scratch disk is extended successfully, vdsm needs to refresh the
volume, and set a new block threshold.
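
A rough, self-contained sketch of this flow (not vdsm code); the SPM call and
the volume refresh are placeholders, and the chunk/threshold policy below is
only illustrative:

import logging

log = logging.getLogger("scratch-extend")

GiB = 1024**3
CHUNK = 1 * GiB      # illustrative; see the config values discussed below
THRESHOLD_PCT = 50   # illustrative; re-arm at this % of the new size

def send_extend_to_spm(dev, new_size):
    """Placeholder for the extend message sent to the SPM."""
    raise NotImplementedError

def refresh_volume(dev):
    """Placeholder for refreshing the LV so this host sees the new size."""
    raise NotImplementedError

def extend_scratch(dom, dev, cur_size, retries=3):
    """Scheduled asynchronously after a block threshold event on `dev`."""
    new_size = cur_size + CHUNK
    for attempt in range(1, retries + 1):
        try:
            send_extend_to_spm(dev, new_size)
            break
        except TimeoutError:
            log.warning("extending %s timed out (attempt %d)", dev, attempt)
    else:
        raise RuntimeError("extending %s failed" % dev)
    refresh_volume(dev)
    # Re-arm monitoring so the next crossing triggers another extension.
    dom.setBlockThreshold(dev, new_size * THRESHOLD_PCT // 100)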

This mechanism is similar to the way normal disks are extended, but the
current mechanism assumes that only the top volume of the disk is monitored
or extended, so it cannot be used as is.

If vdsm was restarted during a backup, it may miss the block threshold event,
so it needs to query the allocation of the scratch disk and trigger an extend
if needed. The current allocation is available from the bulk stats API
(virConnectGetAllDomainStats/virDomainListGetStats) as
"block.<num>.allocation" in the VIR_DOMAIN_STATS_BLOCK group. The current
threshold value is reported as "block.<num>.threshold".
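
For example, a small sketch reading these values with libvirt-python; the
"name[backingIndex]" keying is only an illustration:

import libvirt

def block_allocations(conn, dom):
    """Return {device: (allocation, threshold)} from the bulk stats API."""
    flags = libvirt.VIR_CONNECT_GET_ALL_DOMAINS_STATS_BACKING
    _, stats = conn.domainListGetStats(
        [dom], libvirt.VIR_DOMAIN_STATS_BLOCK, flags)[0]
    result = {}
    for i in range(stats["block.count"]):
        name = stats["block.%d.name" % i]
        index = stats.get("block.%d.backingIndex" % i)
        # Key entries like the threshold device names, e.g. "vda[4]".
        key = "%s[%s]" % (name, index) if index is not None else name
        result[key] = (
            stats.get("block.%d.allocation" % i),
            stats.get("block.%d.threshold" % i),  # missing if no threshold set
        )
    return result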

The chunk size and the threshold can be derived from the existing configuration
(irs:volume_utilization_percent, irs:volume_utilization_chunk_mb).
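
For illustration, how these settings could translate into an extend step and a
new threshold; this assumes vdsm's config module, and the exact formula vdsm
ends up using may differ:

from vdsm.config import config

MiB = 1024**2

chunk = config.getint("irs", "volume_utilization_chunk_mb") * MiB
percent = config.getint("irs", "volume_utilization_percent")

def next_size_and_threshold(physical):
    # Grow by one chunk, and re-arm the event so it fires when the
    # scratch disk is `percent`% full again.
    new_size = physical + chunk
    return new_size, new_size * percent // 100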

The same mechanism is also needed for live merge and live storage migration.

- In live merge, we use only an initial extension. This may not be enough for
  merging the active layer. In this case live merge will fail and the user
  will have to retry the merge.

- In live storage migration we have a complicated mechanism that monitors the
  source disk and extends the target disk. This can be replaced by
  monitoring the target disk and extending it separately.

This will be hard to implement and may require more than one zstream cycle.

Comment 1 Jean-Louis Dupond 2021-03-22 09:22:57 UTC
I think this is a major issue currently, causing incremental backups to be useless at the moment.
Because if you are on oVirt 4.4.5 (which already uses scratch disks) but without this, you end up with paused VMs, as the scratch disk is never extended.

Shouldn't we create scratch disks with size == disk size instead of thin provisioning them until this is fixed?
It's not an ideal situation as you might need much more storage, but the current situation is even worse I think.

Comment 2 Nir Soffer 2021-03-22 09:32:15 UTC
(In reply to Jean-Louis Dupond from comment #1)
> Shouldn't we create scratch disks with size == disk size instead of thin
> provision them until this is fixed?

This is what we do now - we create a thin disk with an initial size equal
to the original disk's virtual size.

When the disk is created, you should be able to see the initial_size= argument;
it must be the virtual size of the original disk. This will allocate a logical
volume of virtual size * 1.1 on storage, and create a qcow2 image on this
logical volume.
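
For example, for a disk with a 100 GiB virtual size, the scratch disk is
created with initial_size = 100 GiB, so the underlying logical volume is about
110 GiB (100 GiB * 1.1).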

If this is not the case, this is a bug.

Comment 3 Jean-Louis Dupond 2021-08-30 06:56:49 UTC
I don't know what the ETA is for oVirt 4.5.0, but I think this bug deserves some more priority :)

The biggest issue now is that if you do concurrent incremental backups, you really need a ton of additional disk space on your iSCSI LUN.
Say you back up two 500G VMs, you need an additional 1TB of free disk space to be able to back them up.
This is a huge blocker to start using incremental backups in production.

Also ain't most of the work already done by Nir?

Comment 4 Eyal Shenitzky 2021-08-30 10:38:53 UTC
(In reply to Jean-Louis Dupond from comment #3)
> I don't know what the ETA is for oVirt 4.5.0, but I think this bug
> deserves some more priority :)
> 
> The biggest issue now is that if you do concurrent incremental backups, you
> really need a ton of additional disk space on your iSCSI LUN.
> Say you back up two 500G VMs, you need an additional 1TB of free disk space
> to be able to back them up.
> This is a huge blocker to start using incremental backups in production.
> 
> Also ain't most of the work already done by Nir?

You are right and this feature is under development.
There is still much work to do but this one should be delivered in oVirt 4.5.

Comment 6 Arik 2022-01-19 15:32:09 UTC
Nir, are we missing anything for this bz?

Comment 7 Nir Soffer 2022-01-19 17:11:23 UTC
Yes, finish the work. The merged patches are just preparation for the actual work.

Comment 8 Arik 2022-01-19 17:21:47 UTC
ah wow, that's a lot of preparation patches :)
ok, thanks

Comment 9 Arik 2022-01-24 15:15:03 UTC
*** Bug 2043175 has been marked as a duplicate of this bug. ***

Comment 10 Arik 2022-03-16 12:51:09 UTC
We took a different approach for backups that renders this bz redundant (the new method doesn't involve using scratch disks).
This new method will be available for testing as of oVirt 4.5 alpha (see bz 2053669).