Bug 1852528 - Support for Block Storage Sparseness
Summary: Support for Block Storage Sparseness
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.3
Assignee: Michal Privoznik
QA Contact: gaojianan
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-06-30 16:03 UTC by Steven Rosenberg
Modified: 2021-03-22 06:09 UTC
CC List: 10 users

Fixed In Version: libvirt-6.6.0-3.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-17 17:49:34 UTC
Type: Feature Request
Target Upstream Version:
Embargoed:



Description Steven Rosenberg 2020-06-30 16:03:20 UTC
Description of problem: When utilizing the libvirt sparseness functionality by specifying libvirt.VIR_STORAGE_VOL_DOWNLOAD_SPARSE_STREAM, the implementation does not support block storage, because a sparse file can exist only on a file system. However, an alternative is the approach used by this dd implementation:

https://github.com/coreutils/coreutils/blob/947c553ff92b6edbc6f5a9f43172ebf7e58a1d21/src/system.h#L512

This would allow libvirt to send only the data parts of the image and report
the holes, and allow the client to write only the data parts, so we save
network bandwidth twice: once on the libvirt connection, and once when
writing to shared storage, thus providing sparseness.
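
For illustration, here is a minimal sketch of how a sender could enumerate data sections and holes with SEEK_DATA/SEEK_HOLE, the same mechanism the linked coreutils code builds on (the helper name is hypothetical):

import os

def iter_extents(fd, size):
    # Yield ('data'|'hole', offset, length) tuples for a sparse file.
    # SEEK_DATA/SEEK_HOLE are available on Linux for most filesystems.
    offset = 0
    while offset < size:
        try:
            data = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError:
            # ENXIO: no data past this offset, the rest is one hole.
            yield ('hole', offset, size - offset)
            return
        if data > offset:
            yield ('hole', offset, data - offset)
        hole = os.lseek(fd, data, os.SEEK_HOLE)
        yield ('data', data, hole - data)
        offset = hole

Only the 'data' extents would be sent over the stream; the 'hole' extents would be reported by length alone.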

The mechanisms from the examples would need to change; for instance:

def recvSkipHandler(stream, length, opaque):
    opaque.done += length
    progress = min(99, opaque.done * 100 // opaque.estimated_size)
    write_progress(progress)
    fd = opaque.opaque
    cur = os.lseek(fd, length, os.SEEK_CUR)
    return os.ftruncate(fd, cur)


The above needs to change because ftruncate() does not work on block storage.

Truncating may also be wrong for file storage, since it changes the image
size, which for raw-sparse volumes must be exactly the size specified
in the vdsm metadata. If the volume size does not match the size in the vdsm
metadata, using this volume can lead to data corruption later.
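
A minimal sketch of a skip handler that avoids ftruncate() entirely, assuming the destination file has already been created at its final size (as raw-sparse volumes are in vdsm); progress tracking is omitted for brevity:

def recvSkipHandlerNoResize(stream, length, opaque):
    # Hypothetical variant: just advance the write position. Since the
    # file already has its final size, the skipped range remains a hole
    # and the image size recorded in vdsm metadata is never changed.
    fd = opaque.opaque
    os.lseek(fd, length, os.SEEK_CUR)
    return 0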

def bytesWriteHandler(stream, buf, opaque):
    fd = opaque.opaque
    return os.write(fd, buf)

os.write() does not support direct I/O. If you write a lot of data to
oVirt storage without using direct I/O, you risk the stability
of the entire system. Sanlock may time out, VMs may become
unresponsive, and storage monitoring may fail, causing a host to become
non-operational.
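
A minimal sketch of the direct I/O mechanics, assuming a 4096-byte alignment requirement (the real requirement depends on the device's logical block size); mmap is used here only because it returns page-aligned buffers, which O_DIRECT needs:

import mmap
import os

BLOCK_SIZE = 4096  # assumed alignment; query the device in real code

def open_for_direct_io(path):
    # O_DIRECT bypasses the host page cache, so a large download does
    # not evict other data or starve sanlock and storage monitoring.
    return os.open(path, os.O_WRONLY | os.O_DIRECT)

def write_direct(fd, data):
    # Buffer address, buffer length, and file offset must all be
    # aligned; the tail of the final block is zero-padded here.
    padded = -(-len(data) // BLOCK_SIZE) * BLOCK_SIZE
    buf = mmap.mmap(-1, padded)  # anonymous maps are page-aligned
    buf.write(data)
    return os.write(fd, buf)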

Here is a similar bug we had in glance import:
https://bugzilla.redhat.com/1832967

def download_disk_sparse(stream, estimated_size, size, dest, bufsize):
    fd = os.open(dest, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    op = Sparseness(fd, estimated_size)
    with progress(op, estimated_size):
        stream.sparseRecvAll(bytesWriteHandler, recvSkipHandler, op)
    stream.finish()
    os.close(fd)

The download does not flush data to storage at the end. Data sitting
in the host page cache or in the storage server's caches will be lost
if the host or storage has a critical failure at that point, and the
disk may be silently corrupted even though the download succeeded.
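
One possible fix, sketched here: flush before closing (fsync alone may still need to be combined with the direct I/O noted above):

def download_disk_sparse(stream, estimated_size, size, dest, bufsize):
    fd = os.open(dest, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    op = Sparseness(fd, estimated_size)
    with progress(op, estimated_size):
        stream.sparseRecvAll(bytesWriteHandler, recvSkipHandler, op)
    stream.finish()
    os.fsync(fd)  # flush page cache to storage before reporting success
    os.close(fd)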

This should support at least the COW file format with sparse allocation on block storage. Supporting the RAW file format with sparse allocation on block storage as well would be added value, since we do not support that elsewhere.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Call the libvirt storage volume download API with the libvirt.VIR_STORAGE_VOL_DOWNLOAD_SPARSE_STREAM flag to download a volume over a stream to a block storage device (see the sketch below).
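
A minimal sketch of the call in question via libvirt-python (connection URI, pool and volume names are hypothetical):

import libvirt

conn = libvirt.open('qemu:///system')
vol = conn.storagePoolLookupByName('images').storageVolLookupByName('vol1')
st = conn.newStream()
# Request a sparse stream: holes are reported instead of being read.
vol.download(st, 0, 0, libvirt.VIR_STORAGE_VOL_DOWNLOAD_SPARSE_STREAM)
# st.sparseRecvAll(write_handler, skip_handler, fd) then drives the
# transfer; the skip handler is where block storage support breaks.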

Actual results: Block Storage is not supported.


Expected results: The image is downloaded to the Block Storage with Thin Provisioning (Sparseness).


Additional info:

Comment 1 Michal Privoznik 2020-07-01 11:42:00 UTC
There are two problems here. The first is that the virStream callbacks provided by libvirt-python (recvSkipHandler, for instance) do not work with block devices. The second is that libvirt doesn't do any zero-block detection, so sparse streams (where only non-zero blocks are transferred) can't really be used with block devices. The first should be trivial to fix and can be worked around: users can implement their own callbacks to handle streams and pass them to stream.sparseRecvAll(). The second is slightly more problematic.
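
For example, a pair of hand-rolled callbacks along these lines (a sketch of the workaround, not the code that was eventually merged) could emulate holes on a block device by writing explicit zeroes:

import os

ZERO_BUF = b"\0" * (1 << 20)  # 1 MiB of zeroes

def block_write_handler(stream, buf, opaque):
    return os.write(opaque, buf)

def block_skip_handler(stream, length, opaque):
    # Block devices cannot be truncated and have no holes, so emulate
    # the skipped range by writing zeroes out.
    while length > 0:
        length -= os.write(opaque, ZERO_BUF[:min(length, len(ZERO_BUF))])
    return 0

# usage: st.sparseRecvAll(block_write_handler, block_skip_handler, fd)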

Comment 2 Michal Privoznik 2020-07-03 11:29:45 UTC
v1 posted upstream:

https://www.redhat.com/archives/libvir-list/2020-July/msg00145.html

Comment 4 Michal Privoznik 2020-08-24 11:47:15 UTC
Pushed upstream as:

fd6b531cb2 virfdstream: Emulate skip for block devices
9e0ba037cd virshStreamInData: Handle block devices
6e0306fa26 virfdstream: Allow sparse stream vol-download
c2e1c414ef virshStreamSkip: Emulate skip for block devices
8a0c327f11 virsh: Track if vol-upload or vol-download work over a block device
9e745a9717 virsh: Pass virshStreamCallbackDataPtr to virshStreamSink() and virshStreamSkip()
70b67c98d9 libvirt-storage: Document volume upload/download stream format

Note, automatic sparsification was dropped because it was deemed undesirable and an abuse of sparse streams. We would need a separate flag for automatic sparsification, at which point it could be used for regular files too, not just block devices. At this point, it is better to use some other tool instead of duplicating the code in libvirt. Anyway, the linked patches allow block devices to be at the sending/receiving end of a sparse stream (hole sections are emulated on write, and never detected on read).

Comment 5 Michal Privoznik 2020-08-28 10:39:22 UTC
Removing FutureFeature keyword per comment 4.

Comment 10 gaojianan 2020-09-17 01:16:22 UTC
Verified on libvirt version:
libvirt-6.6.0-5.virtcov.el8.x86_64

Step:
Setup env:
 1. Create a raw file and attach it to the VM
# qemu-img create -f raw /var/lib/libvirt/images/vol1 500M
Formatting '/var/lib/libvirt/images/vol1', fmt=raw size=524288000

# virsh edit v
          <disk type='file' device='disk'>
            <driver name='qemu' type='raw'/>
            <source file='/var/lib/libvirt/images/vol1'/>
            <target dev='sdb' bus='sata'/>
          </disk>

2. Make a filesystem on the disk in the VM and use dd to write 100M of data into it
# virsh console v
@guest# mkfs.ext4 -F /dev/sdb
@guest# mount /dev/sdb /mnt
@guest# dd if=/dev/urandom of=/mnt/file bs=1M count=100; sync

Scenario 1: Volume target is a block device --download
# cat pool.xml
<pool type='logical'>
  <name>logical-pool</name>
  <uuid>04edd85b-137b-4f13-8eec-31949a9ee9f5</uuid>
  <capacity unit='bytes'>53682896896</capacity>
  <allocation unit='bytes'>4261412864</allocation>
  <available unit='bytes'>49421484032</available>
  <source>
    <device path='/dev/sda'/>
    <name>logical-pool</name>
    <format type='lvm2'/>
  </source>
  <target>
    <path>/dev/logical-pool</path>
    <permissions>
      <mode>0755</mode>
      <owner>-1</owner>
      <group>-1</group>
    </permissions>
  </target>
</pool>

1.1 build and start the pool
# virsh pool-define pool.xml
Pool logical-pool defined from pool.xml

# virsh pool-build logical-pool
Pool logical-pool built

# virsh pool-start logical-pool
Pool logical-pool started

2. Prepare a volume in this logical pool
# virsh vol-create-as --pool logical-pool vol1 100M
Vol vol1 created

3. Try to download this volume with --sparse
# virsh vol-download --pool images vol1 /dev/logical-pool/vol1 --sparse

3.1 Check the volume after the download
# qemu-img info /dev/logical-pool/vol1
image: /dev/logical-pool/vol1
file format: raw
virtual size: 500 MiB (524288000 bytes)
disk size: 132 MiB

Scenario 2: Volume source and target are both block devices --download
3.1 Create a new lv to the logical pool
# virsh vol-create-as --pool logical-pool vol2 200M
Vol vol2 created

# qemu-img info /dev/logical-pool/vol2
image: /dev/logical-pool/vol2
file format: raw
virtual size: 200 MiB (209715200 bytes)
disk size: 0 B

3.2 Repeat step 3, this time with a block device as both source and target:
# virsh vol-download --pool logical-pool vol1 /dev/logical-pool/vol2 --sparse

Check the volume download again:
# qemu-img info /dev/logical-pool/vol2
image: /dev/logical-pool/vol2
file format: raw
virtual size: 200 MiB (209715200 bytes)
disk size: 200 MiB

Scenario 3: Volume source is a block device --upload
Create a new volume in the logical pool to use as the upload source
# virsh vol-upload --pool images vol1 /dev/logical-pool/vol6 --sparse
Check the volume after the upload:
# qemu-img info /dev/logical-pool/vol6
image: /dev/logical-pool/vol6
file format: raw
virtual size: 300 MiB (314572800 bytes)
disk size: 4 KiB

Scenario 4: Volume source and target are both block device --upload
# qemu-img create -f raw /dev/logical-pool/vol4 300M
Formatting '/dev/logical-pool/vol4', fmt=raw size=314572800
# virsh vol-upload --pool logical-pool vol2 /dev/logical-pool/vol4 --sparse

Check the volume after the upload:
# qemu-img info /dev/logical-pool/vol4
image: /dev/logical-pool/vol4
file format: raw
virtual size: 300 MiB (314572800 bytes)
disk size: 4 KiB

Works as expected; setting status to verified.

Comment 13 errata-xmlrpc 2020-11-17 17:49:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5137

Comment 14 Meina Li 2021-03-19 06:41:51 UTC
Hi Michal,

I tested this bug with the same verification steps but got different results from comment 10, on both RHEL-AV 8.3.1 and RHEL-AV 8.4.0. I'm not sure whether the previous test results were correct or whether this is a new bug.
Can you help check these issues? Thanks in advance.

RHEL-AV 8.3.1:
libvirt-6.6.0-13.module+el8.3.1+9548+0a8fede5.x86_64
qemu-kvm-5.1.0-18.module+el8.3.1+9507+32d6953c.x86_64

RHEL-AV 8.4.0:
libvirt-7.0.0-9.module+el8.4.0+10326+5e50a3b6.x86_64
qemu-kvm-5.2.0-12.module+el8.4.0+10354+98272afe.x86_64

Test Steps:
1. Create an image and use dd to write 100M of data into it in the VM.
# qemu-img create /var/lib/libvirt/images/test.img 500M 
Formatting '/var/lib/libvirt/images/test.img', fmt=raw size=524288000
# virsh console vm
@guest# mkfs.ext4 -F /dev/vdb
@guest# mount /dev/vdb /mnt
@guest# dd if=/dev/urandom of=/mnt/file bs=1M count=100; sync
# qemu-img info /var/lib/libvirt/images/test.img 
image: /var/lib/libvirt/images/test.img
file format: raw
virtual size: 500 MiB (524288000 bytes)
disk size: 120 MiB


2. Prepare a logical pool and create a volume in the logical pool
# virsh pool-list 
 Name           State    Autostart
------------------------------------
 default        active   yes
 logical-pool   active   no
# virsh vol-create-as --pool logical-pool vol1 100M   --- the size of vol1 is the same as in the previous verification steps
Vol vol1 created
# virsh vol-create-as --pool logical-pool vol2 500M   --- the size of vol2 is larger than the disk size of test.img
Vol vol2 created

3. Use vol1 to test vol-download
# virsh vol-download --pool default test.img  /dev/logical-pool/vol1 --sparse
error: cannot receive data from volume vol1
error: recv handler failed: No space left on device    --- it errors out now, but maybe this is correct?

4. Use vol2 to test vol-download
# virsh vol-download --pool default test.img /dev/logical-pool/vol2 --sparse
# qemu-img info /dev/logical-pool/vol2
image: /dev/logical-pool/vol2
file format: raw
virtual size: 500 MiB (524288000 bytes)
disk size: 0 B                                         --- not updated as expected

5. Use 'qemu-io' to check if /dev/logical-pool/vol2 has the same data as the source test.img in the default pool
# qemu-io -c "read -vC 0 500M" /dev/logical-pool/vol2 -r -U  | sed '$d' > vol2.download.sparse
# qemu-io -c "read -vC 0 500M" /var/lib/libvirt/images/test.img -r -U  | sed '$d' > vol1.original

Comment 15 Michal Privoznik 2021-03-19 08:41:49 UTC
(In reply to Meina Li from comment #14)
> Hi Michal,
> 
> I tested this bug with same verified steps but got different test result
> with comment 10 both in RHEL-AV 8.3.1 and RHEL-AV 8.4.0. I'm not sure if the
> previous test results are correct or if it's a new bug. 
> Can you help to check the issues? Thanks in advance. 
> 
> RHEL-AV 8.3.1:
> libvirt-6.6.0-13.module+el8.3.1+9548+0a8fede5.x86_64
> qemu-kvm-5.1.0-18.module+el8.3.1+9507+32d6953c.x86_64
> 
> RHEL-AV 8.4.0:
> libvirt-7.0.0-9.module+el8.4.0+10326+5e50a3b6.x86_64
> qemu-kvm-5.2.0-12.module+el8.4.0+10354+98272afe.x86_64
> 
> Test Steps:
> 1. Create a image and use dd to write 100M data into it in vm.
> # qemu-img create /var/lib/libvirt/images/test.img 500M 
> Formatting '/var/lib/libvirt/images/test.img', fmt=raw size=524288000
> # virsh console vm
> @guest# mkfs.ext4 -F /dev/vdb
> @guest# mount /dev/vdb /mnt
> @guest# dd if=/dev/urandom of=/mnt/file bs=1M count=100; sync
> # qemu-img info /var/lib/libvirt/images/test.img 
> image: /var/lib/libvirt/images/test.img
> file format: raw
> virtual size: 500 MiB (524288000 bytes)
> disk size: 120 MiB
> 
> 
> 2. Prepare a logical pool and create a volume in the logical pool
> # virsh pool-list 
>  Name           State    Autostart
> ------------------------------------
>  default        active   yes
>  logical-pool   active   no
> # virsh vol-create-as --pool logical-pool vol1 100M   ---The size of vol1 is
> same with the previous verified step
> Vol vol1 created
> # virsh vol-create-as --pool logical-pool vol2 500M   ---The size of vol2 is
> larger than the disk size of test.img
> Vol vol2 created
> 
> 3. Use vol1 to test vol-download
> # virsh vol-download --pool default test.img  /dev/logical-pool/vol1 --sparse
> error: cannot receive data from volume vol1
> error: recv handler failed: No space left on device    --- Has an error now
> but it may be correct?

This is definitely correct. You want to store 500MiB worth of data into an LV that has just 100MiB.
Stream sparseness is not strictly about preserving sparse files, but about using bandwidth effectively. That is, if you had a 200GiB image, full of zeroes except for the last 1GiB where some data is stored, then it makes no sense to read those 199GiB full of zeroes. What --sparse does is that it tells the other side how many zeroes there are (199GiB), and then sends just that last 1GiB. It's only a neat side effect that the receiving side might create a sparse file as it seeks forward over those 199GiB in the file it is storing data into.
Now on block devices there is no such thing as "holes", and the writer (stream receiver = virsh vol-download) has to write all those zeroes out (199GiB in my example, 500MiB - 120MiB in your example). At any rate, block devices have to be capable of storing at least the 'virtual size'.

> 
> 4. Use vol2 to test vol-download
> # virsh vol-download --pool default test.img /dev/logical-pool/vol2 --sparse
> # qemu-img info /dev/logical-pool/vol2
> image: /dev/logical-pool/vol2
> file format: raw
> virtual size: 500 MiB (524288000 bytes)
> disk size: 0 B                                         ---Not be changed as
> expected

Yes, I believe that here qemu-img failed to count how many bytes were actually written. I don't know exactly how qemu-img counts this, but if it derives it from the file size, then this obviously won't work on block devices (which have a fixed size).

> 
> 5. Use 'qemu-io' to check if /dev/logical-pool/vol2 has same data as '--pool
> default vol1'
> # qemu-io -c "read -vC 0 500M" /dev/logical-pool/vol2 -r -U  | sed '$d' >
> vol2.download.sparse
> # qemu-io -c "read -vC 0 500M" /var/lib/libvirt/images/test.img -r -U  | sed
> '$d' > vol1.original

Is there a difference here? Or what seems to be the problem?

Comment 16 Meina Li 2021-03-19 09:44:33 UTC
(In reply to Michal Privoznik from comment #15)
> > 3. Use vol1 to test vol-download
> > # virsh vol-download --pool default test.img  /dev/logical-pool/vol1 --sparse
> > error: cannot receive data from volume vol1
> > error: recv handler failed: No space left on device    --- Has an error now
> > but it may be correct?
> 
> This is definitely correct. You want to store 500MiB worth of data into an
> LV that has just 100MiB.
> Stream sparseness is not strictly about preserving sparse files, but about
> using bandwidth effectively. That is, if you had a 200GiB image, full of
> zeroes except for the last 1GiB where some data is stored, then it makes no
> sense to read those 199GiB full of zeroes. What --sparse does is that it
> tells the other side how many zeroes there are (199GiB), and then sends just
> that last 1GiB. It's only a neat side effect that the receiving side might
> create a sparse file as it seeks forward over those 199GiB in the file it is
> storing data into.
> Now on block devices there is no such thing as "holes", and the writer
> (stream receiver = virsh vol-download) has to write all those zeroes out
> (199GiB in my example, 500MiB - 120MiB in your example). At any rate, block
> devices have to be capable of storing at least the 'virtual size'.
>

Thanks for your detailed explanation.
So the verification steps in comment 10 were wrong: they used a smaller target volume and still happened to succeed.
I'll test again.

> > 
> > 4. Use vol2 to test vol-download
> > # virsh vol-download --pool default test.img /dev/logical-pool/vol2 --sparse
> > # qemu-img info /dev/logical-pool/vol2
> > image: /dev/logical-pool/vol2
> > file format: raw
> > virtual size: 500 MiB (524288000 bytes)
> > disk size: 0 B                                         ---Not be changed as
> > expected
> 
> Yes, I believe that here qemu-img failed to count how many bytes are
> actually written. I don't know how exactly does qemu-img count this, but if
> it deducts it from file size, then this obviously won't work on block
> devices (which have fixed size).
> 
> > 
> > 5. Use 'qemu-io' to check if /dev/logical-pool/vol2 has same data as '--pool
> > default vol1'
> > # qemu-io -c "read -vC 0 500M" /dev/logical-pool/vol2 -r -U  | sed '$d' >
> > vol2.download.sparse
> > # qemu-io -c "read -vC 0 500M" /var/lib/libvirt/images/test.img -r -U  | sed
> > '$d' > vol1.original
> 
> Is there a difference here? Or what seems to be the problem?

Sorry, I omitted one step:
# diff vol2.download.sparse vol1.original
This step shows that the original file and the downloaded volume contain the same data.
No other problems here.

Thanks again.

