Bug 1282795

Summary: RFE: streams: vol-upload/download to efficiently transfer sparseness
Product: [Community] Virtualization Tools Reporter: Daniel Berrangé <berrange>
Component: libvirt Assignee: Libvirt Maintainers <libvirt-maint>
Status: CLOSED DUPLICATE QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: unspecified CC: crobinso, kchamart, mbooth, rbalakri
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1282859 Environment:
Last Closed: 2016-04-10 22:44:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1282859    

Description Daniel Berrangé 2015-11-17 12:54:13 UTC
Description of problem:
The RPC protocol behind the virStorageVolUpload/Download APIs is pretty inefficient when it comes to handling sparse files. They are backed by the virStreamPtr APIs which asynchronously send the data packets as a continuous stream.  For sparse files this means we'll be potentially transferring many GBs worth of zeros. This is clearly stupid.

We could potentially improve this with a small enhancement to the RPC protocol.

Extend the virNetMessageType enum to add a VIR_NET_MESSAGE_TYPE_STREAM_HOLE value.

This is a variant on the VIR_NET_MESSAGE_TYPE_STREAM packet. Instead of the payload being the actual data to transfer, the payload would be a single 64-bit integer. This would represent the number of zero bytes associated with the hole.
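As a rough illustration of the payload described above, the hole length could be packed as a single unsigned 64-bit integer. This is only a sketch: the helper names `encode_hole` and `decode_hole` are hypothetical, and big-endian byte order is an assumption (chosen because the libvirt RPC protocol is XDR-based), not something this report specifies.

```python
import struct

def encode_hole(length: int) -> bytes:
    """Pack a hole length as one unsigned 64-bit big-endian integer.

    Hypothetical helper: the real wire format is not defined in this report.
    """
    if length < 0:
        raise ValueError("hole length must be non-negative")
    return struct.pack(">Q", length)

def decode_hole(payload: bytes) -> int:
    """Unpack the 8-byte hole payload back into a byte count."""
    (length,) = struct.unpack(">Q", payload)
    return length
```

With this encoding, a 4GB hole costs 8 bytes of payload rather than 4GB of zeros on the wire.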

We can wire this up to virStorageVolUpload/Download reasonably easily.

 - virStorageVolUpload - examine the data from the client app for regions of zeros, and turn these into VIR_NET_MESSAGE_TYPE_STREAM_HOLE packets instead of VIR_NET_MESSAGE_TYPE_STREAM packets if there are > N contiguous zeros, where N is, say, 512 bytes.

 - virStorageVolDownload - when receiving a VIR_NET_MESSAGE_TYPE_STREAM_HOLE packet, allocate a buffer of suitable size, fill it with zeros, and pass it on to the client app.
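The two bullets above could look roughly like the following. This is a minimal sketch, not libvirt code: `split_sparse` and `expand` are hypothetical names, packets are modelled as plain `("data", bytes)` or `("hole", int)` tuples, and the threshold follows the "> N, where N is 512" suggestion.

```python
import re

def split_sparse(data: bytes, n: int = 512):
    """Split data into ("data", bytes) and ("hole", length) packets.

    A run of zeros becomes a hole only if it is longer than n bytes.
    """
    packets = []
    pos = 0
    # Match maximal runs of at least n + 1 zero bytes.
    for m in re.finditer(rb"\x00{%d,}" % (n + 1), data):
        if m.start() > pos:
            packets.append(("data", data[pos:m.start()]))
        packets.append(("hole", m.end() - m.start()))
        pos = m.end()
    if pos < len(data):
        packets.append(("data", data[pos:]))
    return packets

def expand(packets) -> bytes:
    """Receiver side: turn each hole back into a zero-filled buffer."""
    out = bytearray()
    for kind, payload in packets:
        if kind == "hole":
            out += bytes(payload)  # bytes(k) is k zero bytes
        else:
            out += payload
    return bytes(out)
```

Short zero runs stay inside ordinary data packets, so the stream is not fragmented into many tiny hole packets.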

This avoids the need for any public API changes.

If we want to allow apps to opt-in to public API changes though, we could define new variants of virStreamSend/virStreamRecv that allowed for handling holes, without passing around buffers full of zeros.

The overall goal is that using virStorageVolUpload/Download should be on a par with rsync in terms of the amount of data it needs to transfer.

Separately, we should also consider whether to enable compression of storage vol uploads/downloads.

Version-Release number of selected component (if applicable):
1.2.19

Comment 1 Matthew Booth 2015-11-17 16:17:34 UTC
Don't know if this is the appropriate forum for a libvirt API discussion.

There are 2 parts to this. There's the protocol part which you mention above, which will be a huge performance improvement. With just this in place, the flow would be:

1. Client reads hole from relevant metadata.
2. Client generates 4GB of zeroes.
3. Client passes 4GB of zeroes to libvirt.
4. libvirt scans 4GB of zeroes, and determines that they're all zeroes.
5. libvirt sends hole across network.
6. dest libvirt generates 4GB of zeroes.
7. dest scans 4GB of zeroes, and determines that they're all zeroes.
8. dest writes hole to disk.

While it would require a new API, or an extension to the existing API, it would be much nicer to be able to do:

1. Client reads hole from relevant metadata.
2. Client sends hole to libvirt.
3. libvirt sends hole to dest.
4. dest writes hole to disk.
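Step 1 of the shorter flow ("reads hole from relevant metadata") can be sketched on Linux with lseek() and SEEK_DATA/SEEK_HOLE. This is an illustration under assumptions: `file_extents` is a hypothetical helper, and on filesystems without hole support the whole file is simply reported as one data extent.

```python
import errno
import os

def file_extents(path):
    """Return ("data" | "hole", offset, length) tuples covering the file."""
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        pos = 0
        while pos < size:
            try:
                data_start = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError as e:
                if e.errno != errno.ENXIO:
                    raise
                data_start = size  # no more data: only a trailing hole remains
            if data_start > pos:
                extents.append(("hole", pos, data_start - pos))
            if data_start >= size:
                break
            # End of file counts as an implicit hole, so this always succeeds.
            data_end = os.lseek(fd, data_start, os.SEEK_HOLE)
            extents.append(("data", data_start, data_end - data_start))
            pos = data_end
    finally:
        os.close(fd)
    return extents
```

A client with this extent list could send the data extents as ordinary stream packets and each hole as a single length, without ever generating or scanning buffers of zeros.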

Comment 2 Cole Robinson 2016-04-10 22:44:17 UTC
Since this is being tracked against RHEL, which is actually going to motivate change compared to the upstream tracker, duping to the RHEL bug

*** This bug has been marked as a duplicate of bug 1282859 ***