Description of problem:

The RPC protocol behind the virStorageVolUpload/Download APIs is pretty inefficient when it comes to handling sparse files. They are backed by the virStreamPtr APIs, which asynchronously send the data packets as a continuous stream. For sparse files this means we'll potentially be transferring many GBs worth of zeros. This is clearly stupid.

We could potentially improve this with a small enhancement to the RPC protocol. Extend the virNetMessageType enum to add a VIR_NET_MESSAGE_TYPE_STREAM_HOLE. This is a variant on the VIR_NET_MESSAGE_TYPE_STREAM packet. Instead of the payload being the actual data to transfer, the payload would be a single 64-bit integer. This would represent the number of zero bytes associated with the hole.

We can wire this up to virStorageVolUpload/Download reasonably easily.

 - virStorageVolUpload - examine the data from the client app for regions of zeros, and turn these into VIR_NET_MESSAGE_TYPE_STREAM_HOLE instead of VIR_NET_MESSAGE_TYPE_STREAM if there are > N contiguous zeros, where N is say 512 bytes.

 - virStorageVolDownload - when receiving a VIR_NET_MESSAGE_TYPE_STREAM_HOLE packet, allocate a buffer of the suitable size, fill it with zeros and pass it on to the client app.

This avoids the need for any public API changes. If we want to allow apps to opt in to public API changes though, we could define new variants of virStreamSend/virStreamRecv that allow for handling holes without passing around buffers full of zeros.

The overall goal is that using virStorageVolUpload/Download should be on a par with rsync in terms of the amount of data it needs to transfer.

Separately, we should also consider whether to enable compression of storage vol uploads/downloads.

Version-Release number of selected component (if applicable): 1.2.19
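To make the proposal above concrete, here is a rough sketch (in Python rather than libvirt's C, purely for brevity) of the two halves: the sending side scanning a buffer for zero runs longer than N, and the receiving side expanding hole packets back into zeros. Everything named here is hypothetical: the "data"/"hole" tags stand in for the two message types, split_into_packets and expand_packets are made-up helpers, and encoding the hole length as a big-endian 64-bit integer is an assumption for illustration, not the actual wire format.

```python
import struct
from typing import List, Tuple

# Hypothetical tags standing in for VIR_NET_MESSAGE_TYPE_STREAM and the
# proposed VIR_NET_MESSAGE_TYPE_STREAM_HOLE; these are not real enum values.
STREAM_DATA = "data"
STREAM_HOLE = "hole"

MIN_HOLE = 512  # the "N" from the description: only longer zero runs become holes

Packet = Tuple[str, bytes]


def split_into_packets(buf: bytes, min_hole: int = MIN_HOLE) -> List[Packet]:
    """Split buf into data packets and hole packets (sending side)."""
    packets: List[Packet] = []
    n = len(buf)
    pos = 0         # current scan position
    data_start = 0  # start of the data not yet emitted
    while pos < n:
        if buf[pos] != 0:
            pos += 1
            continue
        # Measure the run of zeros starting at pos.
        end = pos
        while end < n and buf[end] == 0:
            end += 1
        if end - pos > min_hole:
            # Flush the data before the run, then emit the hole.
            if pos > data_start:
                packets.append((STREAM_DATA, buf[data_start:pos]))
            # Assumed encoding: one unsigned 64-bit integer, big-endian.
            packets.append((STREAM_HOLE, struct.pack(">Q", end - pos)))
            data_start = end
        # Short runs stay part of the surrounding data packet.
        pos = end
    if data_start < n:
        packets.append((STREAM_DATA, buf[data_start:]))
    return packets


def expand_packets(packets: List[Packet]) -> bytes:
    """Rebuild the original byte stream (receiving side, no API change)."""
    out = bytearray()
    for kind, payload in packets:
        if kind == STREAM_HOLE:
            (length,) = struct.unpack(">Q", payload)
            out += bytes(length)  # buffer of zeros handed to the client app
        else:
            out += payload
    return bytes(out)
```

With this shape, a buffer consisting of 3 bytes of data, 1000 zeros and a short tail becomes a data packet, an 8-byte hole packet and another data packet, and expanding the packets gives back the original bytes.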
Don't know if this is the appropriate forum for a libvirt api discussion.

There are 2 parts to this. There's the protocol part which you mention above, which will be a huge performance improvement. With just this in place, the flow would be:

1. Client reads hole from relevant metadata.
2. Client generates 4GB of zeroes.
3. Client passes 4GB of zeroes to libvirt.
4. libvirt scans 4GB of zeroes, and determines that they're all zeroes.
5. libvirt sends hole across network.
6. dest libvirt generates 4GB of zeroes.
7. dest scans 4GB of zeroes, and determines that they're all zeroes.
8. dest writes hole to disk.

While it would require a new api, or an extension to the existing api, it would be much nicer to be able to do:

1. Client reads hole from relevant metadata.
2. Client sends hole to libvirt.
3. libvirt sends hole to dest.
4. dest writes hole to disk.
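For illustration, the shorter flow could look roughly like the sketch below, which reads hole locations from file metadata instead of scanning for zeros. It is a local file-to-file copy standing in for the client, libvirt and dest, written in Python for brevity. The helper names iter_extents and copy_sparse are made up, it assumes a platform where os.SEEK_DATA and os.SEEK_HOLE are available (e.g. Linux), and it assumes the destination is a freshly created, empty file.

```python
import errno
import os
from typing import Iterator, Tuple


def iter_extents(fd: int) -> Iterator[Tuple[str, int, int]]:
    """Yield ("data" | "hole", offset, length) extents covering the file."""
    size = os.fstat(fd).st_size
    pos = 0
    while pos < size:
        try:
            data = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as e:
            if e.errno != errno.ENXIO:
                raise
            data = size  # no data after pos: the rest of the file is a hole
        if data > pos:
            yield ("hole", pos, data - pos)
            pos = data
            continue
        # pos is inside data; find where that data ends.
        hole = os.lseek(fd, pos, os.SEEK_HOLE)
        yield ("data", pos, hole - pos)
        pos = hole


def copy_sparse(src_fd: int, dst_fd: int, chunk_size: int = 1 << 20) -> int:
    """Copy src to dst without ever materialising zeros for holes.

    Returns the number of data bytes actually transferred.
    Assumes dst_fd refers to a new, empty file.
    """
    size = os.fstat(src_fd).st_size
    os.ftruncate(dst_fd, size)  # whole destination starts out as a hole
    sent = 0
    for kind, offset, length in iter_extents(src_fd):
        if kind == "hole":
            continue  # "dest writes hole to disk": nothing to write here
        pos, remaining = offset, length
        while remaining > 0:
            chunk = os.pread(src_fd, min(remaining, chunk_size), pos)
            if not chunk:
                break  # file shrank underneath us
            remaining -= len(chunk)
            sent += len(chunk)
            view = memoryview(chunk)
            while view:  # cope with short writes
                written = os.pwrite(dst_fd, view, pos)
                view = view[written:]
                pos += written
    return sent
```

On a filesystem that doesn't report holes, the lseek calls simply describe the whole file as one data extent, so the copy still produces the right contents, just without the savings.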
Since this is being tracked against RHEL, which is actually going to motivate change compared to the upstream tracker, duping to the RHEL bug.

*** This bug has been marked as a duplicate of bug 1282859 ***