Bug 1282859 - RFE: Improve streams / virStorageVolUpload/Download to efficiently transfer sparseness.
Status: ON_QA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.2
Hardware: Unspecified OS: Unspecified
Priority: unspecified Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Michal Privoznik
QA Contact: yisun
Keywords: FutureFeature, Upstream
Duplicates: 1282795
Depends On: 1282795
Blocks:
Reported: 2015-11-17 11:18 EST by Matthew Booth
Modified: 2017-09-05 05:30 EDT

See Also:
Fixed In Version: libvirt-3.7.0-1.el7
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of: 1282795
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Matthew Booth 2015-11-17 11:18:17 EST
+++ This bug was initially created as a clone of Bug #1282795 +++

Description of problem:
The RPC protocol behind the virStorageVolUpload/Download APIs is pretty inefficient when it comes to handling sparse files. These APIs are backed by the virStreamPtr APIs, which asynchronously send the data packets as a continuous stream. For sparse files this means we'll potentially be transferring many GBs worth of zeros. This is clearly stupid.

We could potentially improve this with a small enhancement to the RPC protocol.

Extend the virNetMessageType enum to add a VIR_NET_MESSAGE_TYPE_STREAM_HOLE value.

This is a variant on the VIR_NET_MESSAGE_TYPE_STREAM packet. Instead of the payload being the actual data to transfer, the payload would be a single 64-bit integer. This would represent the number of zero bytes associated with the hole.
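As a rough illustration of the payload described above, a hole packet would carry nothing but a length. The sketch below models only that payload, assuming a big-endian unsigned 64-bit integer; the helper names are hypothetical, and the libvirt RPC header and actual wire encoding are not reproduced here.

```python
import struct

# Hypothetical helpers: they model only the payload of the proposed
# VIR_NET_MESSAGE_TYPE_STREAM_HOLE packet, not the libvirt RPC header.
# Assumption: the length is a big-endian unsigned 64-bit integer.
HOLE_PAYLOAD = struct.Struct(">Q")


def encode_hole_payload(length):
    """Return the 8-byte payload for a hole of `length` zero bytes."""
    return HOLE_PAYLOAD.pack(length)


def decode_hole_payload(payload):
    """Return the hole length carried by an 8-byte payload."""
    (length,) = HOLE_PAYLOAD.unpack(payload)
    return length
```

With this encoding a 4 GiB hole costs 8 payload bytes on the wire, regardless of the hole's size.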

We can wire this up to virStorageVolUpload/Download reasonably easily.

 - virStorageVolUpload - examine the data from the client app for regions of zeros, and turn these into VIR_NET_MESSAGE_TYPE_STREAM_HOLE packets instead of VIR_NET_MESSAGE_TYPE_STREAM packets if there are more than N contiguous zeros, where N is, say, 512 bytes.

 - virStorageVolDownload - when receiving a VIR_NET_MESSAGE_TYPE_STREAM_HOLE packet, allocate a buffer of the suitable size, fill it with zeros, and pass it on to the client app.
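
The two bullet points above can be sketched as a pair of functions. This is only an illustration of the zero-scanning idea, not libvirt code: `split_segments`, `expand_segments`, and the ("data", bytes) / ("hole", length) tuples are hypothetical stand-ins for the STREAM and STREAM_HOLE packets, and `MIN_HOLE` plays the role of N.

```python
import re

MIN_HOLE = 512  # the "N" from the description; longer zero runs become holes

# Matches any run of more than MIN_HOLE consecutive zero bytes.
_ZERO_RUN = re.compile(rb"\x00{%d,}" % (MIN_HOLE + 1))


def split_segments(data):
    """Upload side: split a buffer into ("data", bytes) and ("hole", length)."""
    segments = []
    pos = 0
    for match in _ZERO_RUN.finditer(data):
        if match.start() > pos:
            segments.append(("data", data[pos:match.start()]))
        segments.append(("hole", match.end() - match.start()))
        pos = match.end()
    if pos < len(data):
        segments.append(("data", data[pos:]))
    return segments


def expand_segments(segments):
    """Download side: rebuild the original bytes, filling holes with zeros."""
    out = bytearray()
    for kind, value in segments:
        # bytes(n) yields n zero bytes, standing in for the zero-filled buffer.
        out += bytes(value) if kind == "hole" else value
    return bytes(out)
```

For a buffer such as `b"abc" + bytes(4096) + b"xyz" + bytes(100)`, only the 4096-byte run is long enough to become a hole; the trailing 100 zeros stay inside a data segment.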

This avoids the need for any public API changes.

If we want to allow apps to opt-in to public API changes though, we could define new variants of virStreamSend/virStreamRecv that allowed for handling holes, without passing around buffers full of zeros.

The overall goal is that using virStorageVolUpload/Download should be on a par with rsync in terms of the amount of data it needs to transfer.

Separately, we should also consider whether to enable compression of storage vol uploads/downloads.

Version-Release number of selected component (if applicable):
1.2.19

--- Additional comment from Matthew Booth on 2015-11-17 16:17:34 GMT ---

Don't know if this is the appropriate forum for a libvirt API discussion.

There are 2 parts to this. There's the protocol part which you mention above, which will be a huge performance improvement. With just this in place, the flow would be:

1. Client reads hole from relevant metadata.
2. Client generates 4GB of zeroes.
3. Client passes 4GB of zeroes to libvirt.
4. libvirt scans 4GB of zeroes, and determines that they're all zeroes.
5. libvirt sends hole across network.
6. dest libvirt generates 4GB of zeroes.
7. dest scans 4GB of zeroes, and determines that they're all zeroes.
8. dest writes hole to disk.

While it would require a new API, or an extension to the existing API, it would be much nicer to be able to do:

1. Client reads hole from relevant metadata.
2. Client sends hole to libvirt.
3. libvirt sends hole to dest.
4. dest writes hole to disk.
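
Step 1 of the shorter flow ("client reads hole from relevant metadata") could look roughly like the sketch below on Linux, using `lseek` with `SEEK_DATA` and `SEEK_HOLE`. The function name and the (kind, offset, length) tuples are hypothetical, and on filesystems without hole reporting the whole file simply shows up as one data segment.

```python
import errno
import os


def file_segments(path):
    """Return [("data" | "hole", offset, length), ...] covering the file."""
    segments = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        pos = 0
        while pos < size:
            try:
                data_start = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError as err:
                if err.errno != errno.ENXIO:
                    raise
                data_start = size  # no more data: the rest is one hole
            if data_start > pos:
                segments.append(("hole", pos, data_start - pos))
            if data_start >= size:
                break
            # Every file has an implicit hole at EOF, so this finds an end.
            data_end = os.lseek(fd, data_start, os.SEEK_HOLE)
            segments.append(("data", data_start, data_end - data_start))
            pos = data_end
    finally:
        os.close(fd)
    return segments
```

Nothing here reads or scans the file contents; the holes come straight from the filesystem, which is the point of the shorter flow.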
Comment 2 Michal Privoznik 2015-12-07 08:21:32 EST
Matthew, this is a very interesting topic indeed. Let me post an RFC to the libvirt mailing list and see what our options are.
Comment 3 Michal Privoznik 2015-12-07 08:47:35 EST
Posted here:

https://www.redhat.com/archives/libvir-list/2015-December/msg00249.html
Comment 4 Cole Robinson 2016-04-10 18:44:17 EDT
*** Bug 1282795 has been marked as a duplicate of this bug. ***
Comment 5 Michal Privoznik 2016-04-28 06:07:23 EDT
Patches proposed here:

https://www.redhat.com/archives/libvir-list/2016-April/msg01869.html
Comment 9 Michal Privoznik 2017-04-13 09:33:19 EDT
Another attempt:

https://www.redhat.com/archives/libvir-list/2017-April/msg00671.html
Comment 10 Michal Privoznik 2017-04-20 07:58:47 EDT
And another one:

https://www.redhat.com/archives/libvir-list/2017-April/msg00889.html
Comment 11 Michal Privoznik 2017-05-16 10:05:05 EDT
And another one:

https://www.redhat.com/archives/libvir-list/2017-May/msg00499.html
Comment 12 Michal Privoznik 2017-05-18 02:07:42 EDT
I've just pushed the patches upstream:

commit 7823e2561b59d3738d62dd8e4b88d5d552f156e9 (HEAD -> master, origin/master, origin/HEAD, sparse_streams2)
Author:     Michal Privoznik <mprivozn@redhat.com>
AuthorDate: Wed Apr 27 14:21:10 2016 +0200
Commit:     Michal Privoznik <mprivozn@redhat.com>
CommitDate: Thu May 18 07:42:13 2017 +0200

    virsh: Implement sparse stream to vol-upload
    
    Similarly to previous commit, implement sparse streams feature
    for vol-upload. This is, however, slightly different approach,
    because we must implement a function that will tell us whether
    we are in a data section or in a hole. But there's no magic
    hidden in here.
    
    Signed-off-by: Michal Privoznik <mprivozn@redhat.com>

commit f03b44b2dfeae1a0a3ee122a181c0159c9a18400
Author:     Michal Privoznik <mprivozn@redhat.com>
AuthorDate: Tue Apr 12 15:35:04 2016 +0200
Commit:     Michal Privoznik <mprivozn@redhat.com>
CommitDate: Thu May 18 07:42:13 2017 +0200

    virsh: Implement sparse stream to vol-download
    
    Add a new --sparse switch that does nothing more than
    enables the sparse streams feature for this command. Among with
    the switch new helper function is introduced: virshStreamSkip().
    This is the callback that is called whenever daemon sends us a
    hole. In the callback we reflect the hole in underlying file by
    seeking as many bytes as told.
    
    Signed-off-by: Michal Privoznik <mprivozn@redhat.com>

/* plus a ton of patches before these two fellas */

v3.3.0-88-g7823e2561
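
The vol-download commit message above describes reflecting a hole in the underlying file by seeking over it. A minimal sketch of that idea follows; it is not the virsh implementation (virshStreamSkip() is C), and `write_sparse` and the ("data", bytes) / ("hole", length) tuples are hypothetical names used only for illustration.

```python
import os


def write_sparse(path, segments):
    """Write ("data", bytes) / ("hole", length) segments, seeking over holes."""
    with open(path, "wb") as out:
        for kind, value in segments:
            if kind == "hole":
                # Skip forward instead of writing zeros, so the filesystem
                # can leave the range unallocated.
                out.seek(value, os.SEEK_CUR)
            else:
                out.write(value)
        # A trailing hole leaves the position past EOF; truncate() sets the
        # file size to the current position so the hole is kept.
        out.truncate()
```

Whether the skipped ranges actually end up unallocated on disk depends on the filesystem; the file reads back as zeros there either way.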
