Description of problem:

When using the libvirt sparseness functionality by specifying libvirt.VIR_STORAGE_VOL_DOWNLOAD_SPARSE_STREAM, the implementation does not support Block Storage, because a sparse file can exist only on a file system. However, an alternative is the approach used by this dd implementation:

https://github.com/coreutils/coreutils/blob/947c553ff92b6edbc6f5a9f43172ebf7e58a1d21/src/system.h#L512

This would allow libvirt to send only the data parts of the image and report the holes, and allow the client to write only the data parts. That way we save network bandwidth twice: once on the libvirt connection, and once when writing to shared storage, thus providing sparseness.

The mechanisms from the examples would need to change, such as:

    def recvSkipHandler(stream, length, opaque):
        opaque.done += length
        progress = min(99, opaque.done * 100 // opaque.estimated_size)
        write_progress(progress)
        fd = opaque.opaque
        cur = os.lseek(fd, length, os.SEEK_CUR)
        return os.ftruncate(fd, cur)

ftruncate() does not work on Block Storage. Truncating may also be wrong for file storage, since it changes the image size, which for raw-sparse volumes must be exactly the size specified in vdsm metadata. If the volume size does not match the size in vdsm metadata, using this volume can lead to data corruption later.

    def bytesWriteHandler(stream, buf, opaque):
        fd = opaque.opaque
        return os.write(fd, buf)

os.write() does not support direct I/O. Writing a lot of data to oVirt storage without direct I/O risks the stability of the entire system: sanlock may time out, VMs may become unresponsive, and storage monitoring may fail, causing a host to become non-operational.
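One possible direction, sketched below, is a skip handler that works on a block device by emulating the hole: instead of lseek() plus ftruncate() (which require a regular file), it writes explicit zeroes. This is an illustration only; the handler name `recvSkipHandlerBlock` and the buffer size are assumptions, not code from libvirt or vdsm, and it does not address the direct I/O concern.

```python
import os

# 1 MiB of zeroes, allocated once and reused for every hole section (assumed size)
ZERO_BUF = b"\0" * (1024 * 1024)

def recvSkipHandlerBlock(stream, length, opaque):
    """Skip handler usable with stream.sparseRecvAll() on a block device:
    emulate the hole by writing `length` bytes of zeroes, since block
    devices support neither sparse files nor ftruncate()."""
    fd = opaque.opaque
    remaining = length
    while remaining > 0:
        chunk = min(remaining, len(ZERO_BUF))
        written = os.write(fd, ZERO_BUF[:chunk])
        remaining -= written
    opaque.done += length
    return 0
```

The handler has the same signature as the recvSkipHandler in the snippet above, so it could be passed to sparseRecvAll() in its place when the destination is a block device.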
Here is a similar bug we had in glance import: https://bugzilla.redhat.com/1832967

    def download_disk_sparse(stream, estimated_size, size, dest, bufsize):
        fd = os.open(dest, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
        op = Sparseness(fd, estimated_size)
        with progress(op, estimated_size):
            stream.sparseRecvAll(bytesWriteHandler, recvSkipHandler, op)
        stream.finish()
        os.close(fd)

This does not flush data to storage at the end of the download. Data still in the host page cache or in the storage server's caches will be lost if the host or storage has a critical failure at this point, and the disk may be silently corrupted even though the download succeeded.

This should support at least the COW file format with Allocation Type Sparseness on Block Storage. Supporting the RAW file format with Allocation Type Sparseness on Block Storage would be added value, since we do not support that anywhere else.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Call the libvirt.VIR_STORAGE_VOL_DOWNLOAD_SPARSE_STREAM APIs to download files from a stream to a Block Storage device.
2.
3.

Actual results:
Block Storage is not supported.

Expected results:
The image is downloaded to the Block Storage with Thin Provisioning (Sparseness).

Additional info:
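The missing flush can be illustrated with a small standalone sketch: fsync() the descriptor before close so that a crash after "success" cannot silently lose data from the page cache. The function names and the `chunks` stand-in for stream callbacks are hypothetical, not the glance-import code.

```python
import os

def finish_download(fd):
    """Flush downloaded data to stable storage before reporting success.
    Without fsync(), dirty pages in the host page cache can be lost on a
    crash and the disk silently corrupted even though the download
    appeared to succeed."""
    os.fsync(fd)  # force dirty pages out to the device
    os.close(fd)

def download_to_file(dest, chunks):
    """Write received buffers to dest; 'chunks' stands in for the data
    delivered by the stream write callback."""
    fd = os.open(dest, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    try:
        for buf in chunks:
            os.write(fd, buf)
    except BaseException:
        os.close(fd)
        raise
    finish_download(fd)
```

In the snippet from the bug above, the equivalent fix would be an os.fsync(fd) between stream.finish() and os.close(fd).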
There are two problems here. The first is that the virStream callbacks provided by libvirt-python (recvSkipHandler, for instance) do not work with block devices. The second is that libvirt doesn't do any zero block detection, and thus sparse streams (where only non-zero blocks are transferred) can't really be used with block devices. The first should be trivial to fix and can be worked around: users can implement their own callbacks to handle streams and pass them to stream.sparseRecvAll(). The second is slightly more problematic.
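The second problem exists because block devices do not support lseek(SEEK_HOLE)/lseek(SEEK_DATA), so the sender has no cheap way to find holes. A client-side substitute is to scan the read buffer for all-zero blocks. The sketch below is illustrative only (the function name and 4 KiB block size are assumptions, not libvirt code):

```python
def split_data_and_holes(buf, block_size=4096):
    """Classify fixed-size blocks of buf as hole (all zeroes) or data,
    merging adjacent blocks of the same kind into runs.
    Returns a list of (is_hole, offset, length) tuples -- a poor man's
    substitute for lseek(SEEK_HOLE), which block devices don't support."""
    zero_block = b"\0" * block_size
    runs = []
    pos = 0
    while pos < len(buf):
        block = buf[pos:pos + block_size]
        is_hole = block == zero_block[:len(block)]
        if runs and runs[-1][0] == is_hole:
            kind, off, length = runs[-1]
            runs[-1] = (kind, off, length + len(block))
        else:
            runs.append((is_hole, pos, len(block)))
        pos += block_size
    return runs
```

A sender could then emit only the data runs and report the hole runs as (offset, length) metadata, at the cost of reading and scanning every block once.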
v1 posted upstream: https://www.redhat.com/archives/libvir-list/2020-July/msg00145.html
v2: https://www.redhat.com/archives/libvir-list/2020-July/msg00292.html
Pushed upstream as:

fd6b531cb2 virfdstream: Emulate skip for block devices
9e0ba037cd virshStreamInData: Handle block devices
6e0306fa26 virfdstream: Allow sparse stream vol-download
c2e1c414ef virshStreamSkip: Emulate skip for block devices
8a0c327f11 virsh: Track if vol-upload or vol-download work over a block device
9e745a9717 virsh: Pass virshStreamCallbackDataPtr to virshStreamSink() and virshStreamSkip()
70b67c98d9 libvirt-storage: Document volume upload/download stream format

Note, the automatic sparsification was dropped, because it was deemed undesirable and an abuse of sparse streams. We would need a separate flag for automatic sparsification, at which point it could be used for regular files too, not just block devices; and at that point it is better to use some other tool instead of duplicating the code in libvirt. Anyway, the linked patches allow block devices to be at the sending/receiving end of a sparse stream (hole sections are emulated on write, and never detected on read).
Removing FutureFeature keyword per comment 4.
To POST: http://post-office.corp.redhat.com/archives/rhvirt-patches/2020-August/msg00261.html

Scratch build can be found here:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=31002229
http://brew-task-repos.usersys.redhat.com/repos/scratch/mprivozn/libvirt/6.6.0/3.el8_rc.9b168aa093/libvirt-6.6.0-3.el8_rc.9b168aa093-scratch.repo
Verified on libvirt version: libvirt-6.6.0-5.virtcov.el8.x86_64

Steps:

Setup env:
1. Create a raw file and add it to a vm:
# qemu-img create -f raw /var/lib/libvirt/images/vol1 500M
Formatting '/var/lib/libvirt/images/vol1', fmt=raw size=524288000
# virsh edit v
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/vol1'/>
      <target dev='sdb' bus='sata'/>
    </disk>

2. mkfs the disk in the vm and use dd to write 100M of data into it:
# virsh console v
@guest# mkfs.ext4 -F /dev/sdb
@guest# mount /dev/sdb /mnt
@guest# dd if=/dev/urandom of=/mnt/file bs=1M count=100; sync

Scenario 1: Volume target is a block device --download
# cat pool.xml
<pool type='logical'>
  <name>logical-pool</name>
  <uuid>04edd85b-137b-4f13-8eec-31949a9ee9f5</uuid>
  <capacity unit='bytes'>53682896896</capacity>
  <allocation unit='bytes'>4261412864</allocation>
  <available unit='bytes'>49421484032</available>
  <source>
    <device path='/dev/sda'/>
    <name>logical-pool</name>
    <format type='lvm2'/>
  </source>
  <target>
    <path>/dev/logical-pool</path>
    <permissions>
      <mode>0755</mode>
      <owner>-1</owner>
      <group>-1</group>
    </permissions>
  </target>
</pool>

1.1 Build and start the pool:
# virsh pool-define pool.xml
Pool logical-pool defined from pool.xml
# virsh pool-build logical-pool
Pool logical-pool built
# virsh pool-start logical-pool
Pool logical-pool started

2. Prepare a volume in this logical pool:
# virsh vol-create-as --pool logical-pool vol1 100M
Vol vol1 created

3. Try to download this volume with --sparse:
# virsh vol-download --pool images vol1 /dev/logical-pool/vol1 --sparse

3.1 Check the downloaded volume:
# qemu-img info /dev/logical-pool/vol1
image: /dev/logical-pool/vol1
file format: raw
virtual size: 500 MiB (524288000 bytes)
disk size: 132 MiB

Scenario 2: Volume source and target are both block devices --download
3.1 Create a new lv in the logical pool:
# virsh vol-create-as --pool logical-pool vol2 200M
Vol vol2 created
# qemu-img info /dev/logical-pool/vol2
image: /dev/logical-pool/vol2
file format: raw
virtual size: 200 MiB (209715200 bytes)
disk size: 0 B

3.2 Try again as in step 3.1, with the target set to the block device:
# virsh vol-download --pool logical-pool vol1 /dev/logical-pool/vol2 --sparse

Check the downloaded volume again:
# qemu-img info /dev/logical-pool/vol2
image: /dev/logical-pool/vol2
file format: raw
virtual size: 200 MiB (209715200 bytes)
disk size: 200 MiB

Scenario 3: Volume source is a block device --upload
Create a new file for uploading in the logical pool:
# virsh vol-upload --pool images vol1 /dev/logical-pool/vol6 --sparse

Check the uploaded volume:
# qemu-img info /dev/logical-pool/vol6
image: /dev/logical-pool/vol6
file format: raw
virtual size: 300 MiB (314572800 bytes)
disk size: 4 KiB

Scenario 4: Volume source and target are both block devices --upload
# qemu-img create -f raw /dev/logical-pool/vol4 300M
Formatting '/dev/logical-pool/vol4', fmt=raw size=314572800
# virsh vol-upload --pool logical-pool vol2 /dev/logical-pool/vol4 --sparse

Check the uploaded volume again:
# qemu-img info /dev/logical-pool/vol4
image: /dev/logical-pool/vol4
file format: raw
virtual size: 300 MiB (314572800 bytes)
disk size: 4 KiB

Works as expected; setting status to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:5137
Hi Michal,

I tested this bug with the same verification steps but got different results from comment 10, on both RHEL-AV 8.3.1 and RHEL-AV 8.4.0. I'm not sure whether the previous test results were correct or whether this is a new bug. Can you help check the issues? Thanks in advance.

RHEL-AV 8.3.1:
libvirt-6.6.0-13.module+el8.3.1+9548+0a8fede5.x86_64
qemu-kvm-5.1.0-18.module+el8.3.1+9507+32d6953c.x86_64

RHEL-AV 8.4.0:
libvirt-7.0.0-9.module+el8.4.0+10326+5e50a3b6.x86_64
qemu-kvm-5.2.0-12.module+el8.4.0+10354+98272afe.x86_64

Test Steps:
1. Create an image and use dd to write 100M of data into it in the vm.
# qemu-img create /var/lib/libvirt/images/test.img 500M
Formatting '/var/lib/libvirt/images/test.img', fmt=raw size=524288000
# virsh console vm
@guest# mkfs.ext4 -F /dev/vdb
@guest# mount /dev/vdb /mnt
@guest# dd if=/dev/urandom of=/mnt/file bs=1M count=100; sync
# qemu-img info /var/lib/libvirt/images/test.img
image: /var/lib/libvirt/images/test.img
file format: raw
virtual size: 500 MiB (524288000 bytes)
disk size: 120 MiB

2. Prepare a logical pool and create volumes in it.
# virsh pool-list
 Name           State    Autostart
------------------------------------
 default        active   yes
 logical-pool   active   no
# virsh vol-create-as --pool logical-pool vol1 100M   --- the size of vol1 is the same as in the previous verification step
Vol vol1 created
# virsh vol-create-as --pool logical-pool vol2 500M   --- the size of vol2 is larger than the disk size of test.img
Vol vol2 created

3. Use vol1 to test vol-download.
# virsh vol-download --pool default test.img /dev/logical-pool/vol1 --sparse
error: cannot receive data from volume vol1
error: recv handler failed: No space left on device   --- there is an error now, but it may be correct?

4. Use vol2 to test vol-download.
# virsh vol-download --pool default test.img /dev/logical-pool/vol2 --sparse
# qemu-img info /dev/logical-pool/vol2
image: /dev/logical-pool/vol2
file format: raw
virtual size: 500 MiB (524288000 bytes)
disk size: 0 B   --- not changed as expected

5. Use 'qemu-io' to check whether /dev/logical-pool/vol2 has the same data as test.img in the default pool.
# qemu-io -c "read -vC 0 500M" /dev/logical-pool/vol2 -r -U | sed '$d' > vol2.download.sparse
# qemu-io -c "read -vC 0 500M" /var/lib/libvirt/images/test.img -r -U | sed '$d' > vol1.original
(In reply to Meina Li from comment #14)
> Hi Michal,
>
> I tested this bug with same verified steps but got different test result
> with comment 10 both in RHEL-AV 8.3.1 and RHEL-AV 8.4.0. I'm not sure if the
> previous test results are correct or if it's a new bug.
> Can you help to check the issues? Thanks in advance.
>
> RHEL-AV 8.3.1:
> libvirt-6.6.0-13.module+el8.3.1+9548+0a8fede5.x86_64
> qemu-kvm-5.1.0-18.module+el8.3.1+9507+32d6953c.x86_64
>
> RHEL-AV 8.4.0:
> libvirt-7.0.0-9.module+el8.4.0+10326+5e50a3b6.x86_64
> qemu-kvm-5.2.0-12.module+el8.4.0+10354+98272afe.x86_64
>
> Test Steps:
> 1. Create a image and use dd to write 100M data into it in vm.
> # qemu-img create /var/lib/libvirt/images/test.img 500M
> Formatting '/var/lib/libvirt/images/test.img', fmt=raw size=524288000
> # virsh console vm
> @guest# mkfs.ext4 -F /dev/vdb
> @guest# mount /dev/vdb /mnt
> @guest# dd if=/dev/urandom of=/mnt/file bs=1M count=100; sync
> # qemu-img info /var/lib/libvirt/images/test.img
> image: /var/lib/libvirt/images/test.img
> file format: raw
> virtual size: 500 MiB (524288000 bytes)
> disk size: 120 MiB
>
> 2. Prepare a logical pool and create a volume in the logical pool
> # virsh pool-list
>  Name           State    Autostart
> ------------------------------------
>  default        active   yes
>  logical-pool   active   no
> # virsh vol-create-as --pool logical-pool vol1 100M   --- the size of vol1
> is the same as in the previous verified step
> Vol vol1 created
> # virsh vol-create-as --pool logical-pool vol2 500M   --- the size of vol2
> is larger than the disk size of test.img
> Vol vol2 created
>
> 3. Use vol1 to test vol-download
> # virsh vol-download --pool default test.img /dev/logical-pool/vol1 --sparse
> error: cannot receive data from volume vol1
> error: recv handler failed: No space left on device   --- Has an error now
> but it may be correct?

This is definitely correct. You want to store 500MiB worth of data into an LV that has just 100MiB.
Stream sparseness is not strictly about preserving sparse files, but about using bandwidth effectively. That is, if you had a 200GiB image, full of zeroes except for the last 1GiB where some data is stored, then it makes no sense to read those 199GiB of zeroes. What --sparse does is tell the other side how many zeroes there are (199GiB) and then send just that last 1GiB. It's only a neat side effect that the receiving side (might) create a sparse file as it advances the file it is storing data into for those 199GiB.

Now, on block devices there is no such thing as "holes", and the writer (stream receiver = virsh vol-download) has to write all those zeroes out (199GiB in my example, 500MiB - 120MiB in your example). At any rate, block devices have to be capable of storing at least the 'virtual size'.

> 4. Use vol2 to test vol-download
> # virsh vol-download --pool default test.img /dev/logical-pool/vol2 --sparse
> # qemu-img info /dev/logical-pool/vol2
> image: /dev/logical-pool/vol2
> file format: raw
> virtual size: 500 MiB (524288000 bytes)
> disk size: 0 B   --- Not changed as expected

Yes, I believe that here qemu-img failed to count how many bytes are actually written. I don't know how exactly qemu-img counts this, but if it deduces it from the file size, then that obviously won't work on block devices (which have a fixed size).

> 5. Use 'qemu-io' to check if /dev/logical-pool/vol2 has same data as '--pool
> default vol1'
> # qemu-io -c "read -vC 0 500M" /dev/logical-pool/vol2 -r -U | sed '$d' > vol2.download.sparse
> # qemu-io -c "read -vC 0 500M" /var/lib/libvirt/images/test.img -r -U | sed '$d' > vol1.original

Is there a difference here? Or what seems to be the problem?
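The bandwidth arithmetic from the explanation above can be made concrete with a quick sketch, using the numbers from this thread (500 MiB image, 120 MiB of actual data). This is an illustration only; the variable names are not from any libvirt code.

```python
MiB = 1024 * 1024
virtual_size = 500 * MiB  # size of the raw image / target LV
data = 120 * MiB          # non-zero payload actually stored in the image

# With --sparse, only the data sections travel over the stream;
# hole sections are reported as (offset, length) metadata.
bytes_on_wire = data

# A regular-file receiver can skip over holes (lseek + ftruncate),
# so it writes only the data sections.
bytes_written_file = data

# A block device has no holes: the receiver must write the zeroes out too,
# so it always writes the full virtual size.
bytes_written_block = virtual_size

savings_on_wire = virtual_size - bytes_on_wire            # 380 MiB not sent
extra_writes_on_block = bytes_written_block - bytes_written_file  # 380 MiB of zeroes written
```

So --sparse still saves 380 MiB of network traffic here, but a block-device target gains nothing on the write side.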
(In reply to Michal Privoznik from comment #15)
> > 3. Use vol1 to test vol-download
> > # virsh vol-download --pool default test.img /dev/logical-pool/vol1 --sparse
> > error: cannot receive data from volume vol1
> > error: recv handler failed: No space left on device   --- Has an error now
> > but it may be correct?
>
> This is definitely correct. You want to store 500MiB worth of data into an
> LV that has just 100MiB.
>
> Stream sparseness is not strictly about preserving sparse files, but about
> using bandwidth effectively. That is, if you had 200GiB image, full of
> zeroes except for the last 1GiB where some data is stored, then it makes no
> sense to read those 199GiB full of zeroes. What --sparse does is that it
> tells the other side how many zeroes there are (199GiB), and then sends just
> that last 1GiB. It's only a neat side effect that the receiving side (might)
> create a sparse file as it advances the file it is storing data into for
> those 199GiB.
>
> Now on block devices there is no such thing as "holes" and writer (stream
> receiver = virsh vol-download) has to write all those zeroes out (199GiB in
> my example, 500MiB - 120MiB in your example). At any rate, block devices
> have to be capable to store at least the 'virtual size'.

Thanks for your detailed introduction to this. So the verification steps in comment 10 were wrong: they tested with a smaller size and still succeeded. I'll test again.

> > 4. Use vol2 to test vol-download
> > # virsh vol-download --pool default test.img /dev/logical-pool/vol2 --sparse
> > # qemu-img info /dev/logical-pool/vol2
> > image: /dev/logical-pool/vol2
> > file format: raw
> > virtual size: 500 MiB (524288000 bytes)
> > disk size: 0 B   --- Not changed as expected
>
> Yes, I believe that here qemu-img failed to count how many bytes are
> actually written. I don't know how exactly does qemu-img count this, but if
> it deducts it from file size, then this obviously won't work on block
> devices (which have fixed size).
>
> > 5. Use 'qemu-io' to check if /dev/logical-pool/vol2 has same data as '--pool
> > default vol1'
> > # qemu-io -c "read -vC 0 500M" /dev/logical-pool/vol2 -r -U | sed '$d' > vol2.download.sparse
> > # qemu-io -c "read -vC 0 500M" /var/lib/libvirt/images/test.img -r -U | sed '$d' > vol1.original
>
> Is there a difference here? Or what seems to be the problem?

Sorry, I omitted one step:
# diff vol2.download.sparse vol1.original

This step shows that the original file and the downloaded volume have the same data. No other problems with this. Thanks again.