Bug 1282859 - RFE: Improve streams / virStorageVolUpload/Download to efficiently transfer sparseness.
Status: VERIFIED
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Michal Privoznik
QA Contact: yisun
Keywords: FutureFeature, Upstream
Duplicates: 1282795
Depends On: 1282795
 
Reported: 2015-11-17 11:18 EST by Matthew Booth
Modified: 2017-11-20 08:35 EST (History)
Fixed In Version: libvirt-3.7.0-1.el7
Doc Type: Enhancement
Clone Of: 1282795
Type: Bug


Description Matthew Booth 2015-11-17 11:18:17 EST
+++ This bug was initially created as a clone of Bug #1282795 +++

Description of problem:
The RPC protocol behind the virStorageVolUpload/Download APIs is pretty inefficient when it comes to handling sparse files. They are backed by the virStreamPtr APIs which asynchronously send the data packets as a continuous stream.  For sparse files this means we'll be potentially transferring many GBs worth of zeros. This is clearly stupid.

We could potentially improve this with a small enhancement to the RPC protocol.

Extend the virNetMessageType enum with a new VIR_NET_MESSAGE_TYPE_STREAM_HOLE value.

This is a variant on the VIR_NET_MESSAGE_TYPE_STREAM packet. Instead of the payload being the actual data to transfer, the payload would be a single 64-bit integer. This would represent the number of zero bytes associated with the hole.
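A minimal sketch of the payload described above, assuming the length is packed as one unsigned 64-bit big-endian integer (the byte order XDR uses for integers). The helper names are hypothetical, and the bare 8-byte layout follows this proposal only; it is not claimed to be the wire format libvirt finally shipped.

```python
import struct

# Hypothetical helpers: names and the bare 8-byte layout are assumptions
# taken from the proposal text, not from libvirt's actual protocol files.


def encode_hole_payload(length: int) -> bytes:
    """Pack a hole length as one unsigned 64-bit big-endian integer."""
    if length < 0 or length >= 2**64:
        raise ValueError("hole length must fit in an unsigned 64-bit integer")
    return struct.pack(">Q", length)


def decode_hole_payload(payload: bytes) -> int:
    """Unpack the hole length from an 8-byte payload."""
    if len(payload) != 8:
        raise ValueError("hole payload must be exactly 8 bytes")
    (length,) = struct.unpack(">Q", payload)
    return length


if __name__ == "__main__":
    payload = encode_hole_payload(4 * 1024**3)  # describe a 4 GiB hole
    print(len(payload), decode_hole_payload(payload))
```

The point of the sketch is the size ratio: a hole of any length costs a fixed 8 bytes of payload instead of the zeros themselves.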

We can wire this up to virStorageVolUpload/Download reasonably easily.

 - virStorageVolUpload - examine the data from the client app for regions of zeros, and turn these into VIR_NET_MESSAGE_TYPE_STREAM_HOLE instead of VIR_NET_MESSAGE_TYPE_STREAM if there are > N contiguous zeros, where N is, say, 512 bytes.

 - virStorageVolDownload - when receiving a VIR_NET_MESSAGE_TYPE_STREAM_HOLE packet, allocate a buffer of the suitable size, fill it with zeros, and pass it on to the client app.

This avoids the need for any public API changes.
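The two bullet points above can be sketched roughly as follows. This is an illustration under stated assumptions, not libvirt code: `split_segments`, `expand_segments`, and the `("data", bytes)` / `("hole", int)` tuples are hypothetical stand-ins for the STREAM and STREAM_HOLE packets, and `MIN_HOLE` is the example threshold N = 512.

```python
from typing import List, Tuple

MIN_HOLE = 512  # the example threshold N from the proposal; illustrative only

Segment = Tuple[str, object]  # ("data", bytes) or ("hole", int)


def split_segments(buf: bytes, min_hole: int = MIN_HOLE) -> List[Segment]:
    """Upload side: turn zero runs longer than min_hole into hole segments."""
    segments: List[Segment] = []
    data_start = 0  # start of the data not yet emitted
    pos = 0
    n = len(buf)
    while pos < n:
        if buf[pos] != 0:
            pos += 1
            continue
        run_start = pos
        while pos < n and buf[pos] == 0:
            pos += 1
        run_len = pos - run_start
        if run_len > min_hole:
            if run_start > data_start:
                segments.append(("data", buf[data_start:run_start]))
            segments.append(("hole", run_len))
            data_start = pos
        # Shorter zero runs stay inside the surrounding data segment.
    if data_start < n:
        segments.append(("data", buf[data_start:]))
    return segments


def expand_segments(segments: List[Segment]) -> bytes:
    """Download side: rebuild the byte stream, zero-filling every hole."""
    out = bytearray()
    for kind, value in segments:
        if kind == "hole":
            out.extend(bytes(value))  # bytes(n) is n zero bytes
        else:
            out.extend(value)
    return bytes(out)


if __name__ == "__main__":
    sample = b"A" * 10 + bytes(1000) + b"B" * 5
    print([(k, v if k == "hole" else len(v)) for k, v in split_segments(sample)])
```

The byte-by-byte scan is written for clarity; a real implementation would compare larger chunks at a time.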

If we want to allow apps to opt-in to public API changes though, we could define new variants of virStreamSend/virStreamRecv that allowed for handling holes, without passing around buffers full of zeros.

The overall goal is that using virStorageVolUpload/Download should be on a par with rsync in terms of the amount of data it needs to transfer.

Separately, we should also consider whether to enable compression of storage vol uploads/downloads.

Version-Release number of selected component (if applicable):
1.2.19

--- Additional comment from Matthew Booth on 2015-11-17 16:17:34 GMT ---

Don't know if this is the appropriate forum for a libvirt api discussion.

There are 2 parts to this. There's the protocol part which you mention above, which will be a huge performance improvement. With just this in place, the flow would be:

1. Client reads hole from relevant metadata.
2. Client generates 4GB of zeroes.
3. Client passes 4GB of zeroes to libvirt.
4. libvirt scans 4GB of zeroes, and determines that they're all zeroes.
5. libvirt sends hole across network.
6. dest libvirt generates 4GB of zeroes.
7. dest scans 4GB of zeroes, and determines that they're all zeroes.
8. dest writes hole to disk.

While it would require a new api, or an extension to the existing api, it would be much nicer to be able to do:

1. Client reads hole from relevant metadata.
2. Client sends hole to libvirt.
3. libvirt sends hole to dest.
4. dest writes hole to disk.
Comment 2 Michal Privoznik 2015-12-07 08:21:32 EST
Matthew, this is a very interesting topic indeed. Let me post an RFC to the libvirt mailing list and see what our options are.
Comment 3 Michal Privoznik 2015-12-07 08:47:35 EST
Posted here:

https://www.redhat.com/archives/libvir-list/2015-December/msg00249.html
Comment 4 Cole Robinson 2016-04-10 18:44:17 EDT
*** Bug 1282795 has been marked as a duplicate of this bug. ***
Comment 5 Michal Privoznik 2016-04-28 06:07:23 EDT
Patches proposed here:

https://www.redhat.com/archives/libvir-list/2016-April/msg01869.html
Comment 9 Michal Privoznik 2017-04-13 09:33:19 EDT
Another attempt:

https://www.redhat.com/archives/libvir-list/2017-April/msg00671.html
Comment 10 Michal Privoznik 2017-04-20 07:58:47 EDT
And another one:

https://www.redhat.com/archives/libvir-list/2017-April/msg00889.html
Comment 11 Michal Privoznik 2017-05-16 10:05:05 EDT
And another one:

https://www.redhat.com/archives/libvir-list/2017-May/msg00499.html
Comment 12 Michal Privoznik 2017-05-18 02:07:42 EDT
I've just pushed the patches upstream:

commit 7823e2561b59d3738d62dd8e4b88d5d552f156e9 (HEAD -> master, origin/master, origin/HEAD, sparse_streams2)
Author:     Michal Privoznik <mprivozn@redhat.com>
AuthorDate: Wed Apr 27 14:21:10 2016 +0200
Commit:     Michal Privoznik <mprivozn@redhat.com>
CommitDate: Thu May 18 07:42:13 2017 +0200

    virsh: Implement sparse stream to vol-upload
    
    Similarly to previous commit, implement sparse streams feature
    for vol-upload. This is, however, slightly different approach,
    because we must implement a function that will tell us whether
    we are in a data section or in a hole. But there's no magic
    hidden in here.
    
    Signed-off-by: Michal Privoznik <mprivozn@redhat.com>

commit f03b44b2dfeae1a0a3ee122a181c0159c9a18400
Author:     Michal Privoznik <mprivozn@redhat.com>
AuthorDate: Tue Apr 12 15:35:04 2016 +0200
Commit:     Michal Privoznik <mprivozn@redhat.com>
CommitDate: Thu May 18 07:42:13 2017 +0200

    virsh: Implement sparse stream to vol-download
    
    Add a new --sparse switch that does nothing more than
    enables the sparse streams feature for this command. Among with
    the switch new helper function is introduced: virshStreamSkip().
    This is the callback that is called whenever daemon sends us a
    hole. In the callback we reflect the hole in underlying file by
    seeking as many bytes as told.
    
    Signed-off-by: Michal Privoznik <mprivozn@redhat.com>

/* plus a ton of patches before these two fellas */

v3.3.0-88-g7823e2561
Comment 14 yisun 2017-10-31 06:53:27 EDT
Hi Michal, 
I have 2 questions about this fix, pls help with them. (with libvirt-3.8.0-1.el7.x86_64)

1. when vol-download a local sparse file, the original image's allocation is different with the new one, is this a problem? steps as follow:
=====================

## virsh vol-download /var/lib/libvirt/images/3G.raw /tmp/sparse --sparse

## qemu-img info /var/lib/libvirt/images/3G.raw 
image: /var/lib/libvirt/images/3G.raw
file format: raw
virtual size: 3.0G (3221225472 bytes)
disk size: ** 214M **

## qemu-img info /tmp/sparse 
image: /tmp/sparse
file format: raw
virtual size: 3.0G (3221225472 bytes)
disk size: ** 352M **

## diff /var/lib/libvirt/images/3G.raw /tmp/sparse; echo $?
0
<=== well, they have same content, so seems some holes are actually filled with zeros.


2. does this work with remote volume such as iscsi/gluster/ceph volumes? I tried with iscsi pool and error produced, as follow:
=====================

## virsh pool-dumpxml iscsi_pool
<pool type='iscsi'>
  <name>iscsi_pool</name>
  <uuid>485b3368-b46a-4ad4-a0fb-0924ad35822a</uuid>
  <capacity unit='bytes'>1048576000</capacity>
  <allocation unit='bytes'>1048576000</allocation>
  <available unit='bytes'>0</available>
  <source>
    <host name='10.66.5.64' port='3260'/>
    <device path='iqn.2016-03.com.virttest:logical-pool.target'/>
  </source>
  <target>
    <path>/dev/disk/by-path</path>
  </target>
</pool>

## virsh vol-list iscsi_pool
 Name                 Path                                    
------------------------------------------------------------------------------
 unit:0:0:0           /dev/disk/by-path/ip-10.66.5.64:3260-iscsi-iqn.2016-03.com.virttest:logical-pool.target-lun-0

## virsh vol-download --pool iscsi_pool unit:0:0:0 /tmp/iscsi_no_sparse --sparse
error: cannot close volume unit:0:0:0
error: Unable to seek to data: Invalid argument
<====== error happened
Comment 15 Michal Privoznik 2017-10-31 09:30:47 EDT
(In reply to yisun from comment #14)
> Hi Michal, 
> I have 2 questions about this fix, pls help with them. (with
> libvirt-3.8.0-1.el7.x86_64)
> 
> 1. when vol-download a local sparse file, the original image's allocation is
> different with the new one, is this a problem? steps as follow:
> =====================
> 
> ## virsh vol-download /var/lib/libvirt/images/3G.raw /tmp/sparse --sparse
> 
> ## qemu-img info /var/lib/libvirt/images/3G.raw 
> image: /var/lib/libvirt/images/3G.raw
> file format: raw
> virtual size: 3.0G (3221225472 bytes)
> disk size: ** 214M **
> 
> ## qemu-img info /tmp/sparse 
> image: /tmp/sparse
> file format: raw
> virtual size: 3.0G (3221225472 bytes)
> disk size: ** 352M **
> 
> ## diff /var/lib/libvirt/images/3G.raw /tmp/sparse; echo $?
> 0
> <=== well, they have same content, so seems some holes are actually filled
> with zeros.

This is expected. When "creating" a hole in the file, libvirt calls lseek() past the EOF and then writes the actual data. Some filesystems try to be clever and simply allocate the hole fully if it is small enough; XFS typically does this. On EXT4 I had better results. One way out of this would be to punch holes at the end of the copying process, but I think that would be overkill. The point of this feature is to transfer data efficiently between two hosts, not to preserve sparseness 1:1.
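The seek-then-write behaviour described here can be sketched as below. It is a minimal illustration, not the virsh implementation: `write_sparse` and the segment tuples are hypothetical names, and whether the skipped range ends up unallocated is left to the filesystem, as noted above.

```python
import os
import tempfile


def write_sparse(path: str, segments) -> None:
    """Write ("data", bytes) / ("hole", int) segments, seeking over holes."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for kind, value in segments:
            if kind == "hole":
                # Move the offset forward without writing anything.
                os.lseek(fd, value, os.SEEK_CUR)
            else:
                view = memoryview(value)
                while view:
                    written = os.write(fd, view)
                    view = view[written:]
        # A trailing hole only moves the offset; extend the file to match.
        os.ftruncate(fd, os.lseek(fd, 0, os.SEEK_CUR))
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse.img")
        write_sparse(path, [("data", b"head"), ("hole", 1 << 20), ("data", b"tail")])
        st = os.stat(path)
        # st_blocks is counted in 512-byte units on Linux.
        print("apparent:", st.st_size, "allocated:", st.st_blocks * 512)
```

The file contents are identical whether or not the filesystem keeps the hole; only the allocated size differs, which matches the `diff` result in the question.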

> 
> 
> 2. does this work with remote volume such as iscsi/gluster/ceph volumes? I
> tried with iscsi pool and error produced, as follow:
> =====================
> 
> ## virsh pool-dumpxml iscsi_pool
> <pool type='iscsi'>
>   <name>iscsi_pool</name>
>   <uuid>485b3368-b46a-4ad4-a0fb-0924ad35822a</uuid>
>   <capacity unit='bytes'>1048576000</capacity>
>   <allocation unit='bytes'>1048576000</allocation>
>   <available unit='bytes'>0</available>
>   <source>
>     <host name='10.66.5.64' port='3260'/>
>     <device path='iqn.2016-03.com.virttest:logical-pool.target'/>
>   </source>
>   <target>
>     <path>/dev/disk/by-path</path>
>   </target>
> </pool>
> 
> ## virsh vol-list iscsi_pool
>  Name                 Path                                    
> ------------------------------------------------------------------------------
>  unit:0:0:0           /dev/disk/by-path/ip-10.66.5.64:3260-iscsi-iqn.2016-03.com.virttest:logical-pool.target-lun-0
> 
> ## virsh vol-download --pool iscsi_pool unit:0:0:0 /tmp/iscsi_no_sparse --sparse
> error: cannot close volume unit:0:0:0
> error: Unable to seek to data: Invalid argument
> <====== error happened

Again, this is expected. iSCSI is a block layer. Not file system layer. Therefore it knows nothing about sparse files. In fact, it knows nothing about files at all.
Comment 16 yisun 2017-11-01 03:30:40 EDT
So I tried with nfs pool and still failed, the error message is not "error: Unable to seek to data: Invalid argument" but "error: Unable to seek to data: Operation not supported"
On NFS server:
# cat /etc/exports
/home/nfs *(rw,no_root_squash)

On test host:
## virsh pool-dumpxml nfs
<pool type='netfs'>
  <name>nfs</name>
  <uuid>e6fce6a1-fda5-445b-bce7-feabfa940def</uuid>
  <capacity unit='bytes'>437290795008</capacity>
  <allocation unit='bytes'>473956352</allocation>
  <available unit='bytes'>436816838656</available>
  <source>
    <host name='10.66.5.64'/>
    <dir path='/home/nfs/'/>
    <format type='auto'/>
  </source>
  <target>
    <path>/nfs</path>
    <permissions>
      <mode>0755</mode>
      <owner>0</owner>
      <group>0</group>
      <label>system_u:object_r:nfs_t:s0</label>
    </permissions>
  </target>
</pool>

## virsh vol-list --pool nfs
 Name                 Path                                    
------------------------------------------------------------------------------
 2G.raw               /nfs/2G.raw  


## qemu-img info /nfs/2G.raw 
image: /nfs/2G.raw
file format: raw
virtual size: 2.0G (2147483648 bytes)
disk size: 397M

## virsh vol-download --pool nfs 2G.raw /tmp/2GDownload_sparse.raw --sparse
error: cannot close volume 2G.raw
error: Unable to seek to data: Operation not supported


(In reply to Michal Privoznik from comment #15)
> Again, this is expected. iSCSI is a block layer. Not file system layer.
> Therefore it knows nothing about sparse files. In fact, it knows nothing
> about files at all.
Comment 17 yisun 2017-11-01 03:32:12 EDT
btw, I turned on virt_use_nfs
## getsebool virt_use_nfs
virt_use_nfs --> on
Comment 18 Michal Privoznik 2017-11-01 05:08:04 EDT
(In reply to yisun from comment #16)
> So I tried with nfs pool and still failed, the error message is not "error:
> Unable to seek to data: Invalid argument" but "error: Unable to seek to
> data: Operation not supported"

Yeah. NFS is stupid when it comes to sparse files. It allows you to create sparse file but then it acts like the file is fully allocated. Thus it fails on SEEK_DATA/SEEK_HOLE. I think it's better if we fail in this case since we can't really preserve sparseness, can we?
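For reference, a minimal sketch of walking a file with SEEK_DATA/SEEK_HOLE on Linux. `iter_extents` is a hypothetical helper, not a libvirt function. On storage that does not implement these seeks, the `lseek()` call raises `OSError`, which corresponds to the "Unable to seek to data" failures reported in this bug; the sketch lets that error propagate rather than guessing.

```python
import errno
import os


def iter_extents(fd: int):
    """Yield ("data" | "hole", offset, length) tuples covering the whole file."""
    size = os.fstat(fd).st_size
    pos = 0
    while pos < size:
        try:
            data_start = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno != errno.ENXIO:
                raise  # seek not supported here; let the caller decide
            data_start = size  # ENXIO: no data after pos, the rest is a hole
        if data_start > pos:
            yield ("hole", pos, data_start - pos)
            pos = data_start
            if pos >= size:
                break
        # pos is inside data; the end of file counts as an implicit hole.
        hole_start = os.lseek(fd, pos, os.SEEK_HOLE)
        yield ("data", pos, hole_start - pos)
        pos = hole_start


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as fh:
        for kind, offset, length in iter_extents(fh.fileno()):
            print(kind, offset, length)
```

A filesystem without hole tracking may legitimately report the whole file as a single data extent, so the extent list describes allocation, not content.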
Comment 19 yisun 2017-11-13 03:28:06 EST
(In reply to Michal Privoznik from comment #18)
> (In reply to yisun from comment #16)
> > So I tried with nfs pool and still failed, the error message is not "error:
> > Unable to seek to data: Invalid argument" but "error: Unable to seek to
> > data: Operation not supported"
> 
> Yeah. NFS is stupid when it comes to sparse files. It allows you to create
> sparse file but then it acts like the file is fully allocated. Thus it fails
> on SEEK_DATA/SEEK_HOLE. I think it's better if we fail in this case since we
> can't really preserve sparseness, can we?

OK, I'll cover this RFE with a simple case for now; let's see whether anyone needs more functionality here in the future.

Verified on libvirt-3.9.0-1.el7.x86_64

1. create a raw file and add it in vm
## qemu-img create -f raw /var/lib/libvirt/images/test.raw 1G
Formatting '/var/lib/libvirt/images/test.raw', fmt=raw size=1073741824

## virsh edit v
          <disk type='file' device='disk'>
            <driver name='qemu' type='raw'/>
            <source file='/var/lib/libvirt/images/test.raw'/>
            <target dev='sdb' bus='sata'/>
          </disk>


## virsh start v

2. mkfs the disk in vm and use dd to write 100M data into it
## virsh console v
	@guest# mkfs.ext4 -F /dev/sdb
	@guest# mount /dev/sdb /mnt
	@guest# dd if=/dev/urandom of=/mnt/file bs=1M count=100; sync


3. check the raw file disk size
## qemu-img info /var/lib/libvirt/images/test.raw
image: /var/lib/libvirt/images/test.raw
file format: raw
virtual size: 1.0G (1073741824 bytes)
disk size: 149M

4. use vol-download --sparse to download the volume and check the new file's disk size, which may be somewhat larger than 149M (see comment 15) but should be much smaller than 1.0G

## virsh vol-download /var/lib/libvirt/images/test.raw /tmp/download-sparse.raw --sparse

## qemu-img info /tmp/download-sparse.raw 
image: /tmp/download-sparse.raw
file format: raw
virtual size: 1.0G (1073741824 bytes)
disk size: 224M

5. prepare an existing vol in default pool
## qemu-img create -f raw /var/lib/libvirt/images/upload-sparse 1G; qemu-img info /var/lib/libvirt/images/upload-sparse 
Formatting '/var/lib/libvirt/images/upload-sparse', fmt=raw size=1073741824
image: /var/lib/libvirt/images/upload-sparse
file format: raw
virtual size: 1.0G (1073741824 bytes)
disk size: 0

## virsh pool-refresh default
Pool default refreshed

6. upload the sparse file to the default pool with --sparse
## virsh vol-upload --pool default /var/lib/libvirt/images/upload-sparse /tmp/download-sparse.raw --sparse

7. check the existing vol, it should actually use around 224M disk size.
## qemu-img info /var/lib/libvirt/images/upload-sparse 
image: /var/lib/libvirt/images/upload-sparse
file format: raw
virtual size: 1.0G (1073741824 bytes)
disk size: 224M
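The same comparison can be made without qemu-img. The sketch below assumes a raw image on Linux, where the apparent size corresponds to the "virtual size" and the allocated block count to the "disk size" shown above; `size_report` is a hypothetical helper name.

```python
import os


def size_report(path: str):
    """Return (apparent_bytes, allocated_bytes) for a file."""
    st = os.stat(path)
    # st_blocks is counted in 512-byte units on Linux,
    # regardless of the filesystem block size.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    import sys

    apparent, allocated = size_report(sys.argv[1])
    print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```

An allocated size well below the apparent size indicates that at least part of the sparseness survived the transfer.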
