Description of problem:

imageio supports the zero API since 1.3.0, but using it requires an advanced client that understands sparseness and speaks a non-standard HTTP API. Users with simple clients such as curl, or users uploading preallocated images full of zeros from the engine UI, will get a preallocated image on the destination volume.

We can support zero detection on the server side, writing zero blocks using the zero API. This will create a sparse image on storage, and will be much faster when using storage supporting sparseness. Here is how it can work:

1. User sends a PUT request
2. Server reads a chunk from the socket
3. Server uses ioutil.is_zero() to detect if the chunk is full of zeros
4. If the chunk is all zeros, use zero() or trim() internally
5. Otherwise use write()

Here is an example flow:

    client                daemon
    put  ---------------->
                           write
                           zero
                           write
         <----------------

For simplicity, I think we can use the same buffer_size configuration for zeroing granularity.

It would be nice to support this in the proxy as well, avoiding sending zeros over the wire, but we need chunked encoding support to do this. It can work like this:

1. Client sends a PUT request
2. Proxy reads a chunk and detects zeros
3. If the current request is data and the chunk is zero, or the current request is zero and the chunk is data, end the current request
4. Proxy starts a new request for this chunk (PUT or PATCH)
5. When all chunks have been read and sent to the daemon, return a response

Here is a flow:

    client                proxy                 daemon
    put  ---------------->
                           put  --------------->
                           zero --------------->
                           put  --------------->
         <--------------------------------------

If the storage does not support efficient zero or trim (detected by the first attempt to zero or trim), we can disable zero detection for the upload.

Alternative solution - use qemu-nbd. Once we support qemu-nbd, we will get this feature for free.
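The daemon-side loop described above can be sketched in Python. This is only an illustration of the idea - the `reader`, the `backend` interface, and this `is_zero` helper are stand-ins, not the actual imageio code:

```python
def is_zero(buf):
    # Comparing against a zero buffer of the same length runs in C
    # (bytes.__eq__), so this is fast in practice.
    return buf == bytes(len(buf))


def receive_upload(reader, backend, offset=0, buffer_size=8 * 1024**2):
    """Read chunks from the client socket; zero or write them on storage."""
    while True:
        chunk = reader.read(buffer_size)
        if not chunk:
            break
        if is_zero(chunk):
            # Entire chunk is zeros - deallocate instead of writing,
            # keeping the destination image sparse.
            backend.zero(offset, len(chunk))
        else:
            backend.write(offset, chunk)
        offset += len(chunk)
    backend.flush()
    return offset
```

Note that buffer_size doubles as the zeroing granularity here, matching the simplification suggested above.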
I think this can be solved by using the NBD backend with these options:

    # preallocated images
    --detect-zeroes=on

    # sparse images on file storage
    --detect-zeroes=unmap --discard=unmap

The options must be set in NBD.start_server() in vdsm. We may need to add a new configuration option to the API, so that engine can request zero detection.

Zero detection may slow down uploads when it is not needed, so we should not enable it by default without testing how it affects performance.
Fixing this on the server may be a good idea to support sparse uploads from the UI, but when we upload via the SDK, we should detect the zeros earlier. It looks like "qemu-img convert" does zero detection when reading, so maybe it also works for qemu-nbd with some --image-opts.
This request is not currently committed to 4.4.z, moving it to 4.5
I tested the server side, this works:

    $ fallocate --length 100m test.img

    $ ls -lhs test.img
    100M -rw-rw-r--. 1 nsoffer nsoffer 100M Jun 29 15:34 test.img

    $ qemu-nbd --socket /tmp/nbd.sock --format raw --discard=unmap \
          --detect-zeroes=unmap --cache=none --aio=native test.img

    $ python
    >>> from ovirt_imageio._internal import nbd
    >>> buf = b"\0" * 1024**2
    >>> with nbd.Client(nbd.UnixAddress("/tmp/nbd.sock")) as c:
    ...     for i in range(0, 100 * 1024**2, 1024**2):
    ...         c.write(i, buf)
    ...     c.flush()
    ...

    $ ls -lhs test.img
    0 -rw-rw-r--. 1 nsoffer nsoffer 100M Jun 29 15:37 test.img

We can enable this for uploads to sparse volumes.
This bug/RFE is more than 2 years old, it didn't get enough attention so far, and it is now flagged as pending close. Please review whether it is still relevant and provide additional details/justification/patches if you believe it should get more attention in the next oVirt release.
This bug didn't get any attention in a long time, and it's not planned for the foreseeable future. The oVirt development team has no plans to work on it. Please feel free to reopen if you have a plan for how to contribute this feature/bug fix.
This is still relevant, and we have an easy way to implement this in qemu-nbd. (see comment 4).
This should be fixed in vdsm by configuring --detect-zeroes=unmap for sparse volumes, and --detect-zeroes=on for preallocated volumes.
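The mapping above can be sketched as a small helper. This is illustrative only; the function names and the `sparse` parameter are assumptions, not the actual vdsm code:

```python
def detect_zeroes_mode(sparse):
    """Pick the qemu-nbd --detect-zeroes mode for a volume.

    For sparse volumes, zeroed chunks should be unmapped so the image
    stays thin; for preallocated volumes, they should be written as
    zeroes without deallocating space.
    """
    return "unmap" if sparse else "on"


def qemu_nbd_zero_args(sparse):
    # qemu-nbd requires --discard=unmap for --detect-zeroes=unmap
    # to take effect.
    args = ["--detect-zeroes=" + detect_zeroes_mode(sparse)]
    if sparse:
        args.append("--discard=unmap")
    return args
```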
The vdsm side is handled by https://gerrit.ovirt.org/c/vdsm/+/117009. The engine side needs a trivial patch to enable the new detect_zeroes option.
How to test - uploading empty image via browser

1. User uploads an empty image from the browser

    $ qemu-img create -f raw empty.raw 6g

   Upload the image via the browser.

2. Engine sends detect_zeroes: True

    2021-10-07 02:56:14,120+0300 DEBUG (jsonrpc/1) [jsonrpc.JsonRpcServer] Calling 'NBD.start_server' in bridge with {'server_id': '758fe7ca-9465-42b4-8b73-6e861764dd98', 'config': {'detect_zeroes': True, 'discard': True, 'readonly': False, 'bitmap': None, 'sd_id': '8ece2aae-5c72-4a5c-b23b-74bae65c88e1', 'img_id': 'e396be65-7821-4978-a14c-d39019e049b6', 'vol_id': '75a8af60-7d92-47ef-8483-1792a1006c21', 'backing_chain': True}} (__init__:333)

3. supervdsm starts qemu-nbd with --detect-zeroes=unmap

    MainProcess|jsonrpc/1::DEBUG::2021-10-07 02:56:14,128::commands::153::common.commands::(start) /usr/bin/taskset --cpu-list 0-1 /usr/bin/systemd-run --unit=vdsm-nbd-758fe7ca-9465-42b4-8b73-6e861764dd98.service --uid=36 --gid=36 /usr/bin/qemu-nbd --socket /run/vdsm/nbd/758fe7ca-9465-42b4-8b73-6e861764dd98.sock --persistent --shared=8 --export-name= --allocation-depth --cache=none --aio=native --discard=unmap --detect-zeroes=unmap 'json:{"driver": "raw", "file": {"driver": "file", "filename": "/rhev/data-center/mnt/alpine:_00/8ece2aae-5c72-4a5c-b23b-74bae65c88e1/images/e396be65-7821-4978-a14c-d39019e049b6/75a8af60-7d92-47ef-8483-1792a1006c21"}}' (cwd None)

4. Browser sends image data (6g of zeroes)

5. Disk actual size remains 0

    <disk href="/ovirt-engine/api/disks/e396be65-7821-4978-a14c-d39019e049b6" id="e396be65-7821-4978-a14c-d39019e049b6">
      <name>empty-6g</name>
      <actual_size>0</actual_size>
      <alias>empty-6g</alias>
      <format>raw</format>
      <provisioned_size>6442450944</provisioned_size>
      <sparse>true</sparse>
      ...
    </disk>
How to test - uploading empty image via browser with older vdsm

Example flow when using an old vdsm that does not support zero detection:

1. Upload an empty image from the browser (see comment 10)

2. Engine sends detect_zeroes: True

    2021-10-07 02:45:20,485+0300 DEBUG (jsonrpc/3) [jsonrpc.JsonRpcServer] Calling 'NBD.start_server' in bridge with {'server_id': '6f7687ce-ff8f-450b-a3d2-ee40f802d349', 'config': {'detect_zeroes': True, 'discard': True, 'readonly': False, 'bitmap': None, 'sd_id': '8ece2aae-5c72-4a5c-b23b-74bae65c88e1', 'img_id': 'd21e682b-a7a4-4cbf-b38d-ef86cd5e8ae8', 'vol_id': '40e9b046-89ab-4cd6-8992-a5bfd05daadf', 'backing_chain': True}} (__init__:333)

3. Vdsm ignores the unknown option and starts the nbd server normally

    MainProcess|jsonrpc/3::DEBUG::2021-10-07 02:45:20,493::supervdsm_server::95::SuperVdsm.ServerCallback::(wrapper) call nbd_start_transient_service with ('6f7687ce-ff8f-450b-a3d2-ee40f802d349', QemuNBDConfig(format='raw', readonly=False, discard=True, path='/rhev/data-center/mnt/alpine:_00/8ece2aae-5c72-4a5c-b23b-74bae65c88e1/images/d21e682b-a7a4-4cbf-b38d-ef86cd5e8ae8/40e9b046-89ab-4cd6-8992-a5bfd05daadf', backing_chain=True, is_block=False, bitmap=None)) {}

    MainProcess|jsonrpc/3::DEBUG::2021-10-07 02:45:20,494::commands::153::common.commands::(start) /usr/bin/taskset --cpu-list 0-1 /usr/bin/systemd-run --unit=vdsm-nbd-6f7687ce-ff8f-450b-a3d2-ee40f802d349.service --uid=36 --gid=36 /usr/bin/qemu-nbd --socket /run/vdsm/nbd/6f7687ce-ff8f-450b-a3d2-ee40f802d349.sock --persistent --shared=8 --export-name= --allocation-depth --cache=none --aio=native --discard=unmap 'json:{"driver": "raw", "file": {"driver": "file", "filename": "/rhev/data-center/mnt/alpine:_00/8ece2aae-5c72-4a5c-b23b-74bae65c88e1/images/d21e682b-a7a4-4cbf-b38d-ef86cd5e8ae8/40e9b046-89ab-4cd6-8992-a5bfd05daadf"}}' (cwd None)

4. Browser sends image data (all zeroes), creating a fully allocated image.
5. Check the disk actual size using the SDK; in this case the actual size will be equal to the virtual size.
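One way to check this without the full SDK is to fetch the disk from the REST API and parse the XML, as in this sketch (the element names match the `<disk>` samples in the comments above; fetching and authentication are omitted, and `disk_sizes` is a hypothetical helper):

```python
import xml.etree.ElementTree as ET

def disk_sizes(disk_xml):
    """Return (actual_size, provisioned_size) in bytes from a <disk> document."""
    root = ET.fromstring(disk_xml)
    return (int(root.findtext("actual_size")),
            int(root.findtext("provisioned_size")))
```

With zero detection working, actual_size stays 0 while provisioned_size is the full virtual size; with older vdsm, actual_size equals provisioned_size.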
How to test - uploading fully allocated zero image via SDK

Upload fully allocated image from SDK flow:

1. Create a fully allocated zero image

    $ dd if=/dev/zero bs=1M count=6144 of=zero-6g.raw
    6144+0 records in
    6144+0 records out
    6442450944 bytes (6.4 GB, 6.0 GiB) copied, 2.17719 s, 3.0 GB/s

2. Upload the image to a raw-sparse volume

    $ ./upload_disk.py -c engine-dev --sd-name=alpine-nfs-00 --disk-sparse zero-6g.raw
    [ 0.0 ] Checking image...
    [ 0.0 ] Image format: raw
    [ 0.0 ] Disk format: raw
    [ 0.0 ] Disk content type: data
    [ 0.0 ] Disk provisioned size: 6442450944
    [ 0.0 ] Disk initial size: 6442450944
    [ 0.0 ] Disk name: zero-6g.raw
    [ 0.0 ] Disk backup: False
    [ 0.0 ] Connecting...
    [ 0.0 ] Creating disk...
    [ 15.7 ] Disk ID: c35377e7-00e9-4020-a574-2eac0182adcb
    [ 15.7 ] Creating image transfer...
    [ 17.0 ] Transfer ID: 13058c68-4a3e-4b28-84b7-77cff2fb7d15
    [ 17.0 ] Transfer host name: host4
    [ 17.0 ] Uploading image...
    [ 100.00% ] 6.00 GiB, 10.39 seconds, 591.46 MiB/s
    [ 27.4 ] Finalizing image transfer...
    [ 31.5 ] Upload completed successfully

3.
In the imageio log, we see 4 connections writing data, no zero requests:

    2021-10-07 03:15:21,828 INFO (Thread-34) [http] CLOSE connection=34 client=::ffff:192.168.122.1 [connection 1 ops, 10.243405 s] [dispatch 14 ops, 10.171128 s] [write 12 ops, 10.160605 s, 1.50 GiB, 151.17 MiB/s] [write.read 192 ops, 7.048409 s, 1.50 GiB, 217.92 MiB/s] [write.write 192 ops, 3.096443 s, 1.50 GiB, 496.05 MiB/s] [flush 1 ops, 0.000577 s]

    2021-10-07 03:15:21,853 INFO (Thread-37) [http] CLOSE connection=37 client=::ffff:192.168.122.1 [connection 1 ops, 10.205645 s] [dispatch 13 ops, 10.177969 s] [write 12 ops, 10.152687 s, 1.50 GiB, 151.29 MiB/s] [write.read 192 ops, 7.063180 s, 1.50 GiB, 217.47 MiB/s] [write.write 192 ops, 3.073920 s, 1.50 GiB, 499.69 MiB/s] [flush 1 ops, 0.004650 s]

    2021-10-07 03:15:21,945 INFO (Thread-35) [http] CLOSE connection=35 client=::ffff:192.168.122.1 [connection 1 ops, 10.306756 s] [dispatch 13 ops, 10.292804 s] [write 12 ops, 10.284349 s, 1.50 GiB, 149.35 MiB/s] [write.read 192 ops, 7.294485 s, 1.50 GiB, 210.57 MiB/s] [write.write 192 ops, 2.973314 s, 1.50 GiB, 516.60 MiB/s] [flush 1 ops, 0.000138 s]

    2021-10-07 03:15:21,963 INFO (Thread-36) [http] CLOSE connection=36 client=::ffff:192.168.122.1 [connection 1 ops, 10.321620 s] [dispatch 13 ops, 10.305040 s] [write 12 ops, 10.288394 s, 1.50 GiB, 149.29 MiB/s] [write.read 192 ops, 7.098438 s, 1.50 GiB, 216.39 MiB/s] [write.write 192 ops, 3.174500 s, 1.50 GiB, 483.86 MiB/s] [flush 1 ops, 0.000137 s]

4. Image actual size is 0

    <disk href="/ovirt-engine/api/disks/c35377e7-00e9-4020-a574-2eac0182adcb" id="c35377e7-00e9-4020-a574-2eac0182adcb">
      <name>zero-6g.raw</name>
      <description>Uploaded disk</description>
      <actual_size>0</actual_size>
      <alias>zero-6g.raw</alias>
      <format>raw</format>
      <provisioned_size>6442450944</provisioned_size>
      <sparse>true</sparse>
      ...
    </disk>
Performance improvement example:

Uploading to a raw preallocated image:

    $ ./upload_disk.py -c engine-dev --sd-name=alpine-nfs-00 /data/scratch/nsoffer/zero-6g.raw
    [ 0.0 ] Checking image...
    [ 0.0 ] Image format: raw
    [ 0.0 ] Disk format: raw
    [ 0.0 ] Disk content type: data
    [ 0.0 ] Disk provisioned size: 6442450944
    [ 0.0 ] Disk initial size: 6442450944
    [ 0.0 ] Disk name: zero-6g.raw
    [ 0.0 ] Disk backup: False
    [ 0.0 ] Connecting...
    [ 0.0 ] Creating disk...
    [ 8.5 ] Disk ID: d3657525-2c25-4121-b873-bde56fb5ac36
    [ 8.5 ] Creating image transfer...
    [ 9.8 ] Transfer ID: c0e243ff-ea56-49e2-b0c6-1852eb4baf76
    [ 9.8 ] Transfer host name: host4
    [ 9.8 ] Uploading image...
    [ 100.00% ] 6.00 GiB, 10.53 seconds, 583.75 MiB/s
    [ 20.4 ] Finalizing image transfer...
    [ 23.5 ] Upload completed successfully

Uploading the same image when zero detection is disabled:

    $ ./upload_disk.py -c engine-dev --sd-name=alpine-nfs-00 /data/scratch/nsoffer/zero-6g.raw
    [ 0.0 ] Checking image...
    [ 0.0 ] Image format: raw
    [ 0.0 ] Disk format: raw
    [ 0.0 ] Disk content type: data
    [ 0.0 ] Disk provisioned size: 6442450944
    [ 0.0 ] Disk initial size: 6442450944
    [ 0.0 ] Disk name: zero-6g.raw
    [ 0.0 ] Disk backup: False
    [ 0.0 ] Connecting...
    [ 0.0 ] Creating disk...
    [ 8.5 ] Disk ID: f9e9a7bc-be5c-4d6f-8272-648aef3d4c6d
    [ 8.5 ] Creating image transfer...
    [ 9.8 ] Transfer ID: 5fbf5a82-6302-466d-8370-48046ca9aa9b
    [ 9.8 ] Transfer host name: host4
    [ 9.8 ] Uploading image...
    [ 100.00% ] 6.00 GiB, 18.16 seconds, 338.35 MiB/s
    [ 28.0 ] Finalizing image transfer...
    [ 34.1 ] Upload completed successfully

1.72 times faster with zero detection.
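The 1.72x figure follows directly from the two reported transfer times for the same 6.00 GiB image:

```python
# Times reported by upload_disk.py for the same 6.00 GiB image.
with_detection = 10.53      # seconds, zero detection enabled
without_detection = 18.16   # seconds, zero detection disabled

speedup = without_detection / with_detection
print(f"{speedup:.2f} times faster with zero detection")
# -> 1.72 times faster with zero detection
```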
More info on the problem and how it is fixed now.

## The problem

There are 2 use cases:

1. Uploading a sparse image with a client that does not understand sparseness (e.g. a browser).

   When uploading using a smart client (such as ovirt-imageio-client), unallocated areas in a raw image are uploaded using an efficient PATCH/zero request, without sending the zeroes over the wire. When uploading to a disk with thin allocation policy, these areas remain unallocated. Regardless of the disk type, uploading using a PATCH/zero request is usually extremely fast.

   When uploading using a client that does not understand sparseness, like a browser, the unallocated areas are sent as actual zeroes to the imageio server and written to storage, allocating space. This is incorrect when using a disk with thin allocation policy, and much slower.

2. Uploading an image with a lot of data extents containing actual zeroes.

   Images containing a lot of zeroes can be the result of downloading disks via a browser, or with an older imageio client that did not support sparsifying downloads (bug 2010067), or old qcow2 disks using the qcow2 v2 format.

   When uploading such images, a smart client should detect the zeroes in the data extents and convert them to fast zero requests, but no client supports this yet. These zeroes are sent over the wire and written to storage as data, which is incorrect when using a disk with thin allocation policy, and much slower.

## How we fixed it

With this fix, the actual zeroes are still sent over the wire, but they are detected on the server and converted to the fast zero method. When uploading to a disk with thin allocation policy, the space is deallocated to keep the disk sparse.

## Testing uploading sparse file via browser

1. Create an empty raw image

    $ qemu-img create -f raw test.img 10g

2. Upload the image via the browser

   In the upload dialog, select a file storage domain and thin allocation policy.

   WARNING: Creating sparse images works correctly only on NFS 4.2 and GlusterFS.

3.
Check the disk actual size using the UI/API

   The disk actual size should be 0 (<1 GiB in the UI). See comment 10 for more details.

## Testing uploading image with a lot of zeroed areas

1. Create a fully allocated zeroed raw image

    $ dd if=/dev/zero bs=1M count=10240 of=zero.img

2. Upload the image via the browser

   In the upload dialog, select a file storage domain and thin allocation policy.

3. Check the disk actual size using the UI/API

   The disk actual size should be 0 (<1 GiB in the UI).

4. Upload the image via the SDK

    $ ./upload_disk.py -c myengine --sd-name=nfs01 --disk-sparse zero.img

5. Check the disk actual size using the UI/API

   The disk actual size should be 0 (<1 GiB in the UI). See comment 12 for more details.

## Testing compatibility with older vdsm

Older vdsm did not support zero detection, creating fully allocated disks on storage in the same cases. To test, you can use vdsm < 4.50.1 (e.g. vdsm 4.40.*) on the host. The same tests will create a fully allocated image (actual size == 10 GiB).

## Testing performance

Comparing upload time between old vdsm (4.40.*) and new vdsm (>= 4.50.1) should show a significant improvement. See comment 14 for more details.
(In reply to Nir Soffer from comment #12)
> How to test - uploading fully allocated zero image via SDK
> 
> Upload fully allocated image from SDK flow:
> 
> 1. Create fully allocated zero image
> 
> $ dd if=/dev/zero bs=1M count=6144 of=zero-6g.raw
> 6144+0 records in
> 6144+0 records out
> 6442450944 bytes (6.4 GB, 6.0 GiB) copied, 2.17719 s, 3.0 GB/s
> 
> 2. Upload image to raw-sparse volume
> 
> $ ./upload_disk.py -c engine-dev --sd-name=alpine-nfs-00 --disk-sparse
> zero-6g.raw
> [ 0.0 ] Checking image...
> [ 0.0 ] Image format: raw
> [ 0.0 ] Disk format: raw
> [ 0.0 ] Disk content type: data
> [ 0.0 ] Disk provisioned size: 6442450944
> [ 0.0 ] Disk initial size: 6442450944
> [ 0.0 ] Disk name: zero-6g.raw
> [ 0.0 ] Disk backup: False
> [ 0.0 ] Connecting...
> [ 0.0 ] Creating disk...
> [ 15.7 ] Disk ID: c35377e7-00e9-4020-a574-2eac0182adcb
> [ 15.7 ] Creating image transfer...
> [ 17.0 ] Transfer ID: 13058c68-4a3e-4b28-84b7-77cff2fb7d15
> [ 17.0 ] Transfer host name: host4
> [ 17.0 ] Uploading image...
> [ 100.00% ] 6.00 GiB, 10.39 seconds, 591.46 MiB/s
> 
> [ 27.4 ] Finalizing image transfer...
> [ 31.5 ] Upload completed successfully
> 
> 3.
in imageio log, we see 4 connections writing data, no zero requests
> 
> 2021-10-07 03:15:21,828 INFO (Thread-34) [http] CLOSE connection=34 client=::ffff:192.168.122.1 [connection 1 ops, 10.243405 s] [dispatch 14 ops, 10.171128 s] [write 12 ops, 10.160605 s, 1.50 GiB, 151.17 MiB/s] [write.read 192 ops, 7.048409 s, 1.50 GiB, 217.92 MiB/s] [write.write 192 ops, 3.096443 s, 1.50 GiB, 496.05 MiB/s] [flush 1 ops, 0.000577 s]
> 
> 2021-10-07 03:15:21,853 INFO (Thread-37) [http] CLOSE connection=37 client=::ffff:192.168.122.1 [connection 1 ops, 10.205645 s] [dispatch 13 ops, 10.177969 s] [write 12 ops, 10.152687 s, 1.50 GiB, 151.29 MiB/s] [write.read 192 ops, 7.063180 s, 1.50 GiB, 217.47 MiB/s] [write.write 192 ops, 3.073920 s, 1.50 GiB, 499.69 MiB/s] [flush 1 ops, 0.004650 s]
> 
> 2021-10-07 03:15:21,945 INFO (Thread-35) [http] CLOSE connection=35 client=::ffff:192.168.122.1 [connection 1 ops, 10.306756 s] [dispatch 13 ops, 10.292804 s] [write 12 ops, 10.284349 s, 1.50 GiB, 149.35 MiB/s] [write.read 192 ops, 7.294485 s, 1.50 GiB, 210.57 MiB/s] [write.write 192 ops, 2.973314 s, 1.50 GiB, 516.60 MiB/s] [flush 1 ops, 0.000138 s]
> 
> 2021-10-07 03:15:21,963 INFO (Thread-36) [http] CLOSE connection=36 client=::ffff:192.168.122.1 [connection 1 ops, 10.321620 s] [dispatch 13 ops, 10.305040 s] [write 12 ops, 10.288394 s, 1.50 GiB, 149.29 MiB/s] [write.read 192 ops, 7.098438 s, 1.50 GiB, 216.39 MiB/s] [write.write 192 ops, 3.174500 s, 1.50 GiB, 483.86 MiB/s] [flush 1 ops, 0.000137 s]
> 
> 4. Image actual size is 0
> 
> <disk href="/ovirt-engine/api/disks/c35377e7-00e9-4020-a574-2eac0182adcb" id="c35377e7-00e9-4020-a574-2eac0182adcb">
>   <name>zero-6g.raw</name>
>   <description>Uploaded disk</description>
>   <actual_size>0</actual_size>
>   <alias>zero-6g.raw</alias>
>   <format>raw</format>
>   <provisioned_size>6442450944</provisioned_size>
>   <sparse>true</sparse>
>   ...
> </disk>

Verified on vdsm-4.50.0.10-1.el8ev.x86_64 and engine-4.5.0-0.237.el8ev, on gluster and nfs 4.2.

Image actual size is 0 as expected. Moving to 'Verified'.
This bugzilla is included in oVirt 4.5.0 release, published on April 20th 2022. Since the problem described in this bug report should be resolved in oVirt 4.5.0 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.