Description of problem:
imageio client.upload() supports only local files, but qemu-nbd, which it uses, also supports URLs like "http://", "https://", "ftp://", and "ftps://". Using a URL allows uploading directly from a remote web server, without downloading the image to a local file.

Using a URL may require additional settings (e.g. timeout, authentication). These settings are specified using the --image-opts argument. When it is used, the source file should be JSON text with image options instead of a filename.

We have an old initial patch that needs rebasing and adapting to the current code:
https://gerrit.ovirt.org/c/106022/

Initial tests show that uploading from a URL is less efficient than downloading the file and uploading from a local file:

$ time python3 upload_disk.py ... --disk-format raw \
    https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud-1907.qcow2c
Image format: qcow2
Disk format: raw
Disk content type: data
Disk provisioned size: 8589934592
Disk initial size: 8589934592
Disk name: CentOS-7-x86_64-GenericCloud-1907.raw
...
[ 100.00% ] 8.00 GiB, 103.66 seconds, 79.03 MiB/s

real 1m56.339s
user 0m8.399s
sys 0m4.211s

For reference, here is the previous flow, downloading the disk to a temporary file and uploading the temporary file:

$ time wget https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud-1907.qcow2c
...
2020-01-01 00:02:33 (9.08 MB/s) - ‘CentOS-7-x86_64-GenericCloud-1907.qcow2c’ saved [417088512/417088512]

real 0m44.631s
user 0m1.851s
sys 0m2.788s

$ time python3 upload_disk.py ... --disk-format raw CentOS-7-x86_64-GenericCloud-1907.qcow2c
...
Image format: qcow2
Disk format: raw
Disk content type: data
Disk provisioned size: 8589934592
Disk initial size: 8589934592
Disk name: CentOS-7-x86_64-GenericCloud-1907.raw
...
[ 100.00% ] 8.00 GiB, 2.82 seconds, 2.84 GiB/s

real 0m7.609s
user 0m5.440s
sys 0m1.764s

Together the two steps take about 52 seconds (44.6 + 7.6), less than half of the 116 seconds needed when uploading from the URL.

Maybe we can optimize this by tuning the curl driver in qemu-nbd.
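A minimal sketch of building such options for a remote image. Assumptions: it uses qemu's dotted key=value syntax for --image-opts (e.g. driver=qcow2,file.driver=https,...) rather than JSON, the curl block driver option names url, timeout, readahead and sslverify, and the helper names curl_image_opts and quote_opt are hypothetical.

```python
def quote_opt(value):
    """Escape a value for qemu's key=value option syntax (',' becomes ',,')."""
    return str(value).replace(",", ",,")


def curl_image_opts(url, fmt="qcow2", timeout=None, readahead=None, sslverify=None):
    """Build an --image-opts string that reads an fmt image from url via curl."""
    scheme = url.split(":", 1)[0].lower()
    # The curl block driver name matches the URL scheme.
    if scheme not in ("http", "https", "ftp", "ftps"):
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    # Top level is the image format driver; "file." options go to curl.
    opts = [("driver", fmt), ("file.driver", scheme), ("file.url", url)]
    if timeout is not None:
        opts.append(("file.timeout", timeout))      # seconds
    if readahead is not None:
        opts.append(("file.readahead", readahead))  # bytes
    if sslverify is not None:
        opts.append(("file.sslverify", "on" if sslverify else "off"))
    return ",".join(f"{key}={quote_opt(value)}" for key, value in opts)
```

The returned string would be passed as the source argument after --image-opts, for example qemu-nbd --read-only --image-opts 'driver=qcow2,file.driver=https,file.url=https://example.com/a.qcow2,file.timeout=60'. Escaping commas matters because a URL may legally contain them.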
TODO:
- Profile to see where time is spent
- Compare with qemu-img convert from https: to local file
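For the second TODO item, a small timing harness could look like this. It is a sketch, assuming qemu-img is in PATH and the source image is qcow2; the script name convert_bench.py and the helpers convert_command and timed_run are hypothetical.

```python
import subprocess
import sys
import time


def convert_command(src_url, dst_path, src_fmt="qcow2", dst_fmt="raw"):
    """Build a qemu-img command converting a remote image to a local file."""
    # -p prints progress, -f is the source format, -O the output format.
    return ["qemu-img", "convert", "-p", "-f", src_fmt, "-O", dst_fmt,
            src_url, dst_path]


def timed_run(cmd):
    """Run cmd (raising CalledProcessError on failure), return wall time in seconds."""
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    return time.monotonic() - start


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 convert_bench.py URL DST
    elapsed = timed_run(convert_command(sys.argv[1], sys.argv[2]))
    print(f"qemu-img convert: {elapsed:.3f} seconds")
```

Comparing this number with the "real" time of the upload from URL helps show whether the slowdown comes from qemu's curl driver or from the upload path.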
Moving to 4.4.3. We have a proof of concept that needs a rewrite for the current imageio code, but the performance results so far are not good enough. This requires more research and may need changes in qemu to get decent performance.

The main advantage is doing an upload from a URL in one step without using temporary files. But if using a temporary file is faster, we can make a one-step process (from the user's point of view): download to a temporary file, upload the temporary file, and delete it. The user experience would be the same with much less effort.
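The one-step flow described above could be sketched as follows. Assumptions: upload is a caller-supplied callable taking a local path (for example a functools.partial around the imageio client upload call; its exact arguments are not shown here), and upload_from_url is a hypothetical helper name.

```python
import os
import shutil
import tempfile
import urllib.request


def upload_from_url(url, upload, tmpdir=None, chunk_size=1024 * 1024):
    """Download url to a temporary file, call upload(path), then delete the file."""
    # tmpdir lets the caller pick a filesystem with enough space for the image.
    fd, path = tempfile.mkstemp(dir=tmpdir, suffix=".download")
    try:
        # Stream the remote image to disk in chunks instead of reading it into memory.
        with os.fdopen(fd, "wb") as dst, urllib.request.urlopen(url) as src:
            shutil.copyfileobj(src, dst, chunk_size)
        return upload(path)
    finally:
        # The temporary file is removed even if the download or upload fails.
        os.unlink(path)
```

From the user's point of view this is a single call, while the upload itself still reads a local file.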
We have a new use case, uploading disks from an OVA file, which can use upload from an NBD URL.

To expose images from an OVA, we need to find the offset of the image data in the OVA file. The info is available via the tarfile module:

$ python -c '
import tarfile
f = tarfile.open("vm.ova")
print(list({"name": m.name, "offset": m.offset_data} for m in iter(f)))
'
[{'name': 'disk1.qcow2', 'offset': 512}, {'name': 'disk2.qcow2', 'offset': 455168}]

We can expose the image using qemu-nbd:

$ qemu-nbd --persistent --socket=/tmp/nbd1.sock --read-only --offset=512 vm.ova

Now the qcow2 image data is exposed using /tmp/nbd1.sock:

$ qemu-img info --output json "nbd+unix://?socket=/tmp/nbd1.sock"
{
    "virtual-size": 209715200,
    "filename": "nbd+unix://?socket=/tmp/nbd1.sock",
    "cluster-size": 65536,
    "format": "qcow2",
    "format-specific": {
        "type": "qcow2",
        "data": {
            "compat": "1.1",
            "lazy-refcounts": false,
            "refcount-bits": 16,
            "corrupt": false
        }
    },
    "dirty-flag": false
}

imageio client cannot consume this data if we want to support format conversion, but we can add another NBD server layer to convert the format:

$ qemu-nbd --persistent --socket=/tmp/nbd2.sock --read-only "nbd+unix://?socket=/tmp/nbd1.sock"

And now we have access to raw guest data:

$ qemu-img info --output json "nbd+unix://?socket=/tmp/nbd2.sock"
{
    "virtual-size": 209715200,
    "filename": "nbd+unix://?socket=/tmp/nbd2.sock",
    "format": "raw"
}

Another option is using the nbdkit tar plugin, but it is not available downstream.

So from the imageio point of view, the missing part is being able to upload from a URL, in this case the NBD URL: nbd+unix://?socket=/tmp/nbd1.sock.

So the pipeline will be:

ova file -> qemu-nbd 1 -> qemu-nbd 2 -> imageio client

In client.upload(filename, transfer_url), filename should support:
- local file: "/path/to/image"
- NBD URL: "nbd+unix://?socket=/tmp/nbd1.sock"
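The OVA steps above can be sketched in Python. Assumptions: the OVA is an uncompressed tar (mode "r:" makes tarfile reject compressed archives), the helper names ova_disks, nbd_unix_url, qemu_nbd_layers and is_nbd_url are hypothetical, and is_nbd_url only recognizes the nbd and nbd+unix schemes. The commands are built as argument lists, so the NBD URL needs no shell quoting.

```python
import tarfile
from urllib.parse import urlsplit


def ova_disks(ova_path):
    """Return (name, offset, size) for each regular file in an uncompressed OVA."""
    # "r:" opens without compression; offset_data is the byte offset of the
    # member's data in the archive, which is what qemu-nbd --offset needs.
    with tarfile.open(ova_path, "r:") as tar:
        return [(m.name, m.offset_data, m.size) for m in tar if m.isfile()]


def nbd_unix_url(socket_path):
    """NBD URL for an export served on a unix socket."""
    return f"nbd+unix://?socket={socket_path}"


def qemu_nbd_layers(ova_path, offset, sock1, sock2):
    """Commands for the two layers: image bytes at offset, then format conversion."""
    # Layer 1 exposes the image bytes (e.g. qcow2) starting at offset.
    layer1 = ["qemu-nbd", "--persistent", f"--socket={sock1}", "--read-only",
              f"--offset={offset}", ova_path]
    # Layer 2 opens layer 1 through its NBD URL and exposes raw guest data.
    layer2 = ["qemu-nbd", "--persistent", f"--socket={sock2}", "--read-only",
              nbd_unix_url(sock1)]
    return layer1, layer2


def is_nbd_url(filename):
    """True for NBD URLs, False for local paths such as /path/to/image."""
    # TLS variants (nbds, nbds+unix) are left out of this sketch.
    return urlsplit(filename).scheme in ("nbd", "nbd+unix")
```

A dispatch like is_nbd_url is one way client.upload() could decide between serving a local file and connecting to an existing NBD export.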
For the ova use case we have a simpler solution, see bug 1849981.
The attached patch needs a rewrite, moving back to NEW.
We are past the 4.5.0 feature freeze; please re-target.
Nir, is there a concrete use case that this can be used for, or is it a general enhancement to open the door for future changes/improvements (in which case we had better convert this to a GitHub issue on imageio)?
The main use case is uploading an image published via HTTP. Currently you need to download the image to a temporary file, and then upload from the temporary file. This is less efficient and harder to do than uploading directly from the URL. I don't know of any customer request for this feature, so converting it to an upstream imageio issue is a good idea.
Ack, filed: https://github.com/oVirt/ovirt-imageio/issues/65