Bug 1829047 - [RFE] Support upload from URL
Summary: [RFE] Support upload from URL
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: ovirt-imageio
Classification: oVirt
Component: Common
Version: 2.0.0
Hardware: Unspecified
OS: Unspecified
medium
unspecified
Target Milestone: ---
: ---
Assignee: Nir Soffer
QA Contact: Avihai
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-04-28 18:56 UTC by Nir Soffer
Modified: 2022-04-24 16:02 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-04-24 16:02:31 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.5?
pm-rhel: planning_ack?
pm-rhel: devel_ack?
pm-rhel: testing_ack?



Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 106022 | 0 | master | ABANDONED | client: Support upload from URL | 2020-08-03 12:11:17 UTC

Description Nir Soffer 2020-04-28 18:56:18 UTC
Description of problem:
 
imageio client.upload() supports only local files, but qemu-nbd, which it uses,
also supports URLs such as "http://", "https://", "ftp://", and "ftps://". Using
a URL allows uploading directly from a remote web server, without downloading
the image to a local file.

Using a URL may require additional settings (e.g. timeout, authentication).
These settings are specified using the --image-opts argument. When using it,
the source should be JSON text with image options instead of a filename.
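
A rough sketch of what such image options could look like as JSON text. The
helper name curl_image_opts is hypothetical and not part of imageio, and the
option names ("driver", "file", "url", "timeout") are assumptions based on
qemu's curl block driver that should be checked against the qemu version in
use:

```python
import json

# URL schemes mentioned in this bug as supported by qemu-nbd.
CURL_SCHEMES = ("http", "https", "ftp", "ftps")


def curl_image_opts(url, image_format="qcow2", timeout=None):
    """Build JSON image options for an image served from a URL.

    Hypothetical helper: the option names are assumed, not taken from
    imageio or from the abandoned patch.
    """
    scheme = url.split("://", 1)[0]
    if scheme not in CURL_SCHEMES:
        raise ValueError(f"Unsupported URL scheme: {scheme!r}")

    # Protocol layer: where the bytes come from.
    file_opts = {"driver": scheme, "url": url}
    if timeout is not None:
        file_opts["timeout"] = timeout

    # Format layer: how to interpret the bytes.
    return json.dumps({"driver": image_format, "file": file_opts})


print(curl_image_opts("https://example.com/disk.qcow2", timeout=30))
```

Keeping the protocol options nested under "file" separates connection
settings from the image format, so authentication settings could be added
later without touching the format layer.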

We have an old initial patch here that needs a rebase and adapting to the current code:
https://gerrit.ovirt.org/c/106022/

Initial tests show that uploading from a URL is less efficient than
downloading the file and uploading from the local file:

$ time python3 upload_disk.py ... --disk-format raw \
    https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud-1907.qcow2c
Image format: qcow2
Disk format: raw
Disk content type: data
Disk provisioned size: 8589934592
Disk initial size: 8589934592
Disk name: CentOS-7-x86_64-GenericCloud-1907.raw
...
[ 100.00% ] 8.00 GiB, 103.66 seconds, 79.03 MiB/s

real	1m56.339s
user	0m8.399s
sys	0m4.211s

For reference, here is the previous flow, downloading the disk to
a temporary file, and uploading the temporary file:

$ time wget https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud-1907.qcow2c
...
2020-01-01 00:02:33 (9.08 MB/s) - ‘CentOS-7-x86_64-GenericCloud-1907.qcow2c’ saved [417088512/417088512]

real	0m44.631s
user	0m1.851s
sys	0m2.788s

$ time python3 upload_disk.py ... --disk-format raw CentOS-7-x86_64-GenericCloud-1907.qcow2c
...
Image format: qcow2
Disk format: raw
Disk content type: data
Disk provisioned size: 8589934592
Disk initial size: 8589934592
Disk name: CentOS-7-x86_64-GenericCloud-1907.raw
...
[ 100.00% ] 8.00 GiB, 2.82 seconds, 2.84 GiB/s

real	0m7.609s
user	0m5.440s
sys	0m1.764s


Maybe we can optimize this by tuning the curl driver in qemu-nbd.
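
To make the comparison explicit, here is a small calculation using only the
"real" times reported above. It compares one run of each flow, so it is a
rough indication rather than a benchmark:

```python
def seconds(minutes, secs):
    """Convert a `time` style "XmY.ZZZs" value to seconds."""
    return minutes * 60 + secs


# "real" times copied from the runs above.
direct = seconds(1, 56.339)    # upload directly from the https URL
download = seconds(0, 44.631)  # wget to a local file
upload = seconds(0, 7.609)     # upload from the local file

two_step = download + upload
ratio = direct / two_step

print(f"direct: {direct:.1f}s, two-step: {two_step:.1f}s, ratio: {ratio:.2f}x")
# prints: direct: 116.3s, two-step: 52.2s, ratio: 2.23x
```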

Comment 1 Nir Soffer 2020-04-28 19:14:51 UTC
TODO:
- Profile to see where time is spent
- Compare with qemu-img convert from https: to local file

Comment 2 Nir Soffer 2020-06-17 12:28:57 UTC
Moving to 4.4.3. We have a proof of concept that needs a rewrite on top of the
current imageio code, but the performance results so far are not good enough.

This requires more research and may need changes in qemu to get decent
performance.

The main advantage is doing an upload from a URL in one step without using
temporary files, but if using a temporary file is faster, we can provide a
one-step process (from the user's point of view) that downloads to a temporary
file, uploads the temporary file, and deletes the file. The user experience
would be the same with much less effort.
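
A minimal sketch of that temporary-file fallback, assuming the upload itself
is done by a caller-supplied function. The name upload_via_tempfile is
hypothetical, and the upload argument only stands in for something like
client.upload(); this is not how imageio implements it:

```python
import os
import shutil
import tempfile
import urllib.request


def upload_via_tempfile(url, upload, tmpdir=None):
    """Download url to a temporary file, upload it, and delete the file.

    upload is any callable accepting a local path (an assumption made
    for this sketch). The temporary file is removed even if the
    download or the upload fails.
    """
    fd, path = tempfile.mkstemp(dir=tmpdir)
    try:
        # Stream the response to disk without loading it into memory.
        with os.fdopen(fd, "wb") as dst, urllib.request.urlopen(url) as src:
            shutil.copyfileobj(src, dst)
        return upload(path)
    finally:
        os.unlink(path)
```

From the caller's side this is a single call, for example
upload_via_tempfile(url, lambda path: print("uploading", path)). The cost is
temporary disk space for the full image file.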

Comment 3 Nir Soffer 2020-06-22 20:08:20 UTC
We have a new use case, uploading disks from an OVA file, which can use
upload from an NBD URL.

To expose images from an OVA, we need to find the offset of the image
data in the OVA file. The info is available via the tarfile module:

$ python -c '
import tarfile
f = tarfile.open("vm.ova")
print(list({"name": m.name, "offset": m.offset_data} for m in iter(f)))
'
[{'name': 'disk1.qcow2', 'offset': 512}, {'name': 'disk2.qcow2', 'offset': 455168}]
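
The same lookup as a reusable function, as a sketch. The name member_extent
is hypothetical; it also returns the member size, which may be useful if the
exposed range needs to be bounded:

```python
import tarfile


def member_extent(ova_path, member_name):
    """Return (offset, size) of a member's data inside a tar archive.

    Opens the archive with mode "r:" (no compression), since data
    offsets are only meaningful for an uncompressed tar file.
    Raises KeyError if the member does not exist.
    """
    with tarfile.open(ova_path, "r:") as tar:
        member = tar.getmember(member_name)
        return member.offset_data, member.size
```

For the archive above, member_extent("vm.ova", "disk1.qcow2") would return
the offset 512 together with the size of disk1.qcow2.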

We can expose the image using qemu-nbd:

$ qemu-nbd --persistent --socket=/tmp/nbd1.sock --read-only --offset=512 vm.ova

Now the image qcow2 data is exposed using /tmp/nbd1.sock:

$ qemu-img info --output json "nbd+unix://?socket=/tmp/nbd1.sock"
{
    "virtual-size": 209715200,
    "filename": "nbd+unix://?socket=/tmp/nbd1.sock",
    "cluster-size": 65536,
    "format": "qcow2",
    "format-specific": {
        "type": "qcow2",
        "data": {
            "compat": "1.1",
            "lazy-refcounts": false,
            "refcount-bits": 16,
            "corrupt": false
        }
    },
    "dirty-flag": false
}

imageio client cannot consume this data if we want to support format conversion,
but we can add another layer of NBD server to convert the format:

$ qemu-nbd --persistent --socket=/tmp/nbd2.sock --read-only nbd+unix://?socket=/tmp/nbd1.sock

And now we have access to raw guest data:

$ qemu-img info --output json "nbd+unix://?socket=/tmp/nbd2.sock"
{
    "virtual-size": 209715200,
    "filename": "nbd+unix://?socket=/tmp/nbd2.sock",
    "format": "raw"
}

Another option is using nbdkit tar plugin, but it is not available downstream.

So from the imageio point of view, the missing part is being able to upload from
a URL, in this case the NBD URL: nbd+unix://?socket=/tmp/nbd1.sock.

So the pipeline will be:

 ova file -> qemu-nbd 1 -> qemu-nbd 2 -> imageio client
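
As a sketch, the two qemu-nbd stages of this pipeline can be expressed as
argument lists, using only the options from the commands shown earlier. The
function name pipeline_commands is hypothetical, and the sketch only builds
the commands without running them:

```python
def pipeline_commands(ova_path, offset, sock1, sock2):
    """Build argv lists for the two qemu-nbd stages.

    Stage 1 exposes the image data inside the OVA file at the given
    offset; stage 2 reads from stage 1 and exposes raw guest data.
    """
    stage1 = [
        "qemu-nbd", "--persistent", f"--socket={sock1}",
        "--read-only", f"--offset={offset}", ova_path,
    ]
    stage2 = [
        "qemu-nbd", "--persistent", f"--socket={sock2}",
        "--read-only", f"nbd+unix://?socket={sock1}",
    ]
    return stage1, stage2


for cmd in pipeline_commands("vm.ova", 512, "/tmp/nbd1.sock", "/tmp/nbd2.sock"):
    print(" ".join(cmd))
```

Passing such lists to subprocess without a shell avoids quoting problems with
the "?" in the NBD URL.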

In client.upload(filename, transfer_url), filename should support:

- local file: "/path/to/image"
- NBD URL: nbd+unix://?socket=/tmp/nbd1.sock
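
One way the client could tell the two kinds of source apart, as a sketch
only. The function name classify_source and the returned dict layout are
hypothetical and not part of the imageio API:

```python
from urllib.parse import parse_qs, urlsplit

NBD_SCHEMES = {"nbd", "nbd+unix"}


def classify_source(filename):
    """Classify an upload source as a local file or an NBD URL."""
    parts = urlsplit(filename)
    if parts.scheme in NBD_SCHEMES:
        # For nbd+unix URLs the socket path is in the query string.
        socket = parse_qs(parts.query).get("socket", [None])[0]
        return {"kind": "nbd", "url": filename, "socket": socket}
    return {"kind": "file", "path": filename}


print(classify_source("/path/to/image"))
print(classify_source("nbd+unix://?socket=/tmp/nbd1.sock"))
```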

Comment 4 Nir Soffer 2020-06-23 10:49:46 UTC
For the ova use case we have a simpler solution, see bug 1849981.

Comment 5 Nir Soffer 2020-06-23 10:50:49 UTC
The attached patch needs a rewrite, moving back to NEW.

Comment 7 Sandro Bonazzola 2022-03-29 16:16:40 UTC
We are past 4.5.0 feature freeze, please re-target.

Comment 8 Arik 2022-04-13 13:20:28 UTC
Nir, is there a concrete use case that this can be used for, or is it a general enhancement to open the door for future changes/improvements (in which case we had better convert this to a GitHub issue on imageio)?

Comment 9 Nir Soffer 2022-04-13 13:40:30 UTC
The main use case is uploading an image published via HTTP. Currently you need to download
the image to a temporary file, and then upload from the temporary file. This is less
efficient and harder to do compared to uploading directly from the URL.

I don't know of any customer request for this feature, so converting it to an upstream
imageio issue is a good idea.

Comment 10 Arik 2022-04-24 16:02:31 UTC
Ack, filed: https://github.com/oVirt/ovirt-imageio/issues/65

