Bug 1591439 - [RFE] [v2v] - imageio performance - concurrent I/O
Summary: [RFE] [v2v] - imageio performance - concurrent I/O
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-imageio
Classification: oVirt
Component: Common
Version: 1.2.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.4.1
Target Release: 2.0.8
Assignee: Nir Soffer
QA Contact: Tzahi Ashkenazi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-06-14 17:54 UTC by Nir Soffer
Modified: 2020-08-05 06:25 UTC
CC List: 16 users

Fixed In Version: ovirt-imageio-2.0.8
Clone Of: 1527050
Environment:
Last Closed: 2020-08-05 06:25:31 UTC
oVirt Team: Storage
Embargoed:
dfediuck: ovirt-4.4?
mtessun: planning_ack+
rule-engine: devel_ack?
mlehrer: testing_ack+


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1527050 0 high CLOSED [v2v] - ImageIO - Performance - upload disk is slow on a pure 10G environment 2021-02-22 00:41:40 UTC
oVirt gerrit 91645 0 master ABANDONED fileio: Use concurrent I/O for upload 2020-10-11 09:58:24 UTC
oVirt gerrit 105949 0 master MERGED client: Support multiple connections 2020-10-11 09:58:14 UTC

Internal Links: 1527050

Description Nir Soffer 2018-06-14 17:54:29 UTC
+++ This bug was initially created as a clone of Bug #1527050 +++

In the current release (<= 1.3.0), reads and writes are not concurrent.

Here is an example upload:

Client
------
$ ./upload_disk.py --direct /dev/shm/10G.raw
Uploaded 10.00g in 37.32 seconds (274.40m/s)

Server
------
Operation stats: <Clock(total=37.32, read=20.55, write=14.41, sync=0.00)>

write:  710 MiB/s
read:   498 MiB/s
total:  274 MiB/s

Upload throughput is limited by the naive read-write loop - while we are reading
from the socket we are not writing to storage, and while we are writing to
storage we are not reading data from the socket.
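For intuition only, here is a minimal sketch of such a sequential loop (this is
illustrative, not the actual imageio server code; sock and dst are hypothetical
handles). With the rates above (read 498 MiB/s, write 710 MiB/s) a strictly
sequential loop tops out around 1/(1/498 + 1/710) ≈ 293 MiB/s, in the same
ballpark as the measured 274 MiB/s:

    # Illustrative sketch only - not the actual imageio server code.
    def naive_upload(sock, dst, size, buf_size=8 * 1024 * 1024):
        done = 0
        while done < size:
            chunk = sock.recv(min(buf_size, size - done))  # storage is idle here
            if not chunk:
                raise RuntimeError("connection closed before end of image")
            dst.write(chunk)                               # socket is idle here
            done += len(chunk)
        dst.flush()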

We have patches in review showing good improvement:
https://gerrit.ovirt.org/#/q/topic:cio+project:ovirt-imageio+is:open

Client
------
$ ./upload_disk.py --direct /dev/shm/10G.raw
Uploaded 10.00g in 23.29 seconds (439.73m/s)

Server
------
Operation stats: <Clock(total=23.28, read=23.21, write=14.56, sync=0.00)>

write:  703 MiB/s
read:   441 MiB/s
total:  439 MiB/s

Upload throughput is limited by reading from the SSL socket. If we could read
faster from the SSL socket, we could upload at the same rate we can write to
storage.

We probably have the same problem when downloading images.

We probably have the same issue in the proxy.

This affects importing VMs (virt-v2v), backup and restore, and users uploading
and downloading images for other reasons.

Comment 1 Nir Soffer 2018-07-10 12:11:31 UTC
We have patches for improving upload throughput which show a 50-60% improvement
in the upload_disk.py example; however, when using a lot of small requests as
virt-v2v does, these changes do not help.

Also, these changes are not yet compatible with fast zero, which does give a
great improvement for virt-v2v, so we will need more work to support this.

To support concurrent I/O that will be useful for all use cases, we need to do
something like this (a rough sketch follows the list):

1. Add the concept of a session - we need to be able to detect the start of an
   upload or download. Currently we have add-ticket and remove-ticket events, but
   since a ticket supports multiple clients, we cannot use these events for
   sessions.

2. When a session starts, open the underlying file and start a worker thread for
   this session.

3. When the user sends a PUT or PATCH/zero request, submit the request to the
   worker thread queue.

4. If the request does not require flushing and does not need to return data
   from the image, return an ack to the client so it can send the next request
   while the worker is processing the current one.

5. When the worker queue is full, the http thread should wait until there is 
   room for handling new requests. This means we reach the maximum concurrency
   we want.
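
The sketch below is illustrative only (the Session class, the request tuple and
the queue size are made up for this example, not the actual imageio code); it
shows the shape of steps 1-5: one open file and one worker thread per session,
with a bounded queue so the http thread can ack a request as soon as it is
queued and blocks only when the queue is full:

    import queue
    import threading

    class Session:
        """Per-transfer session: one open file and one worker thread (steps 1-2)."""

        def __init__(self, path, max_pending=4):
            self._file = open(path, "r+b")
            # Bounded queue: when it is full, the http thread blocks in put(),
            # which caps the concurrency per session (step 5).
            self._queue = queue.Queue(maxsize=max_pending)
            self._worker = threading.Thread(target=self._run, daemon=True)
            self._worker.start()

        def submit_write(self, offset, data):
            # Called from the http thread for PUT/PATCH requests (step 3).
            # Returns as soon as the request is queued, so the client can be
            # acked and send the next request while the worker writes (step 4).
            self._queue.put((offset, data))

        def close(self):
            self._queue.put(None)   # sentinel: no more requests
            self._worker.join()
            self._file.close()

        def _run(self):
            while True:
                item = self._queue.get()
                if item is None:
                    break
                offset, data = item
                self._file.seek(offset)
                self._file.write(data)
            self._file.flush()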

This is not an easy change, and a lot of work, so I think we should defer this
to 4.3. Supporting fast zero and NBD is much more important, and is also needed
for incremental backup.

Comment 4 Daniel Gur 2018-11-20 09:33:28 UTC
Please provide validation instructions.


(For example - do we need to run virt-v2v, or would some upload command be enough?)

Comment 5 Nir Soffer 2018-11-20 09:41:24 UTC
There is nothing to test yet, so no instructions. We will update the bug if and
when we work on this.

Comment 6 Daniel Gur 2018-11-20 13:20:52 UTC
Nir, the bug is in POST, meaning the implementation is finished.
Should it be moved back to ASSIGNED?

Comment 7 Nir Soffer 2018-11-20 13:50:10 UTC
POST means that some patches were submitted. This does not mean that development
is finished. Since we need to rework the patches, and we are not working on this
now, moving to NEW.

Comment 8 Sandro Bonazzola 2019-01-28 09:36:55 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 9 Nir Soffer 2020-01-14 22:30:27 UTC
We have pending patches adding support for multiple connections:
https://gerrit.ovirt.org/c/105949/

I think this is the only feasible solution: virt-v2v should switch to using the
imageio client.upload(), which supports multiple connections.

Comment 10 Nir Soffer 2020-06-08 00:41:23 UTC
I updated the old patches. I think this can be ready for 4.4.2.

Comment 12 Nir Soffer 2020-06-16 12:19:08 UTC
The problem with older imageio is that it uses synchronous I/O and a single
thread per transfer. This limits the possible throughput when transferring a
single disk. When transferring multiple disks we get good overall throughput
even with an older imageio server.

We considered supporting pipelining in the server, which makes the I/O
kind of asynchronous. This is tricky to get right, is typically not
supported by http clients (e.g. python http.client used in virt-v2v),
and would not give enough concurrency (see comment 1).

Instead, we added support for multiple connections on the server side, and we
use multiple connections on the client side to improve upload and download
throughput when transferring a single disk or a few disks.

With this change, the imageio client uses 4 connections per transfer, with 4
threads on both the client and server side to transfer image data (like the
"qemu-img convert -W" option). This makes a single transfer fast.

I tested this change in f17-h31-000-1029p.rdu2.scalelab - both single
upload and multiple uploads.

transfers     size            time            rate
-----------------------------------------------------
 1           100 GiB     89.7 seconds     1.18 GiB/s
10          1000 GiB    895.0 seconds     1.11 GiB/s 

So with this change we upload and download a single image efficiently, and
we reach the maximum possible throughput with a single transfer.

Unfortunately virt-v2v does not use the imageio client for upload, and it
cannot use the client without a major rewrite of the rhv plugin, so this
change in imageio is not expected to improve anything in current virt-v2v.

However, we made other changes in the imageio server during the 4.4 development
cycle, so virt-v2v imports should be faster with 4.4 even with the current
virt-v2v code. The most interesting change is fixing the way we manage
buffers; we have bug 1836858 for testing it.

Also there were some changes in nbdkit that can improve throughput like:
https://github.com/libguestfs/nbdkit/commit/39701410487c7e58c2829aeb44556f7dc9eacba9

I did a quick test of a virt-v2v import from a local 100g disk with 48 GiB
of data.

for n in $(seq -w 10); do
    virt-v2v ... > v2v-10/v2v-$n.log 2>&1 &
done

$ grep Finishing v2v-10/*.log
v2v-10/v2v.01.log:[ 895.3] Finishing off
v2v-10/v2v.02.log:[ 893.2] Finishing off
v2v-10/v2v.03.log:[ 892.4] Finishing off
v2v-10/v2v.04.log:[ 880.2] Finishing off
v2v-10/v2v.05.log:[ 882.4] Finishing off
v2v-10/v2v.06.log:[ 906.6] Finishing off
v2v-10/v2v.07.log:[ 927.1] Finishing off
v2v-10/v2v.08.log:[ 881.0] Finishing off
v2v-10/v2v.09.log:[ 896.4] Finishing off
v2v-10/v2v.10.log:[ 897.2] Finishing off

In this test we imported 1000 GiB of data in 927 seconds (1.07 GiB/s).

This is the total time including creating the disk and vm in RHV,
and doing the initial conversion. If we look in a single virt-v2v log:

[   0.7] Opening the source -i disk ./fedora-31-100g-50p.raw
[   0.8] Creating an overlay to protect the source from being modified
[   0.8] Opening the overlay
[   4.2] Inspecting the overlay
[   8.5] Checking for sufficient free disk space in the guest
[   8.5] Estimating space required on target for each disk
[   8.5] Converting Fedora 31 (Thirty One) to run on KVM
virt-v2v: warning: could not determine a way to update the configuration of 
Grub2
virt-v2v: warning: /files/boot/grub2/device.map/hd0 references unknown 
device "vda".  You may have to fix this entry manually after conversion.
virt-v2v: This guest has virtio drivers installed.
[  59.8] Mapping filesystem data to avoid copying unused and blank areas
[  60.6] Closing the overlay
[  60.7] Assigning disks to buses
[  60.7] Checking if the guest needs BIOS or UEFI to boot
[  60.7] Initializing the target -o rhv-upload -oa preallocated -oc https://rhev-red-03.rdu2.scalelab.redhat.com/ovirt-engine/api -op password -os L0_Group_0_SD
[  62.0] Copying disk 1/1 to qemu URI json:{ "file.driver": "nbd", "file.path": "/var/tmp/rhvupload.4cWCCs/nbdkit0.sock", "file.export": "/" } (raw)
    (0.00/100%)^M    (1.00/100%)^M    ...   (100.00/100%)
[ 904.8] Creating output metadata
[ 906.6] Finishing off

We can see that the upload started at 62.0 and ended at 904.8, so the upload
time was 842.8 seconds (121.49 MiB/s). This time includes the time to create
a disk. The actual transfer time can be extracted from the imageio connection
stats, but I did not check this.

When I tested the same flow a few years ago I saw much lower total throughput
when using an image with 33% utilization:
https://bugzilla.redhat.com/show_bug.cgi?id=1615144#c3

Since we have bug 1836858 for testing virt-v2v, I think this bug should
focus on testing upload and download using upload_disk.py and download_disk.py
scripts from ovirt-engine-sdk. These tests are mainly relevant to backup
vendors which will use the same APIs.


I think we should test:

- images with 30%, 50%, 70% utilization, to test how image utilization affects
  throughput. If we want to limit the testing, 50% utilization looks like a
  good choice for standard tests.

- download single disk to qcow2 format

- upload single qcow2 image to qcow2 disk

- 10 concurrent downloads

- 10 concurrent uploads

- iSCSI and FC storage

- local and remote transfer

- proxy transfer (add --use-proxy in download/upload commands)

Notes:

- remote transfers use the management network, so we need to use a fast 10g network.

- testing downloads should use a fast NFS server over a 10g network. Testing
  download to a slow local disk does not test imageio but the local disk.

- imageio client supports transfer from raw/qcow2 to raw/qcow2, so we have 4
  possible combinations (raw to raw, qcow2 to qcow2, raw to qcow2, qcow2 to raw).
  Since upload and download are important in backup and restore context, and in
  this context qcow2 is the most useful format, I think we should focus only on
  qcow2 format.

- downloading raw disks on block storage is not efficient since we don't have 
  any information on image sparseness. We can test this as the worst case 
  scenario.

- concurrent uploads are less important to test in the backup context, since
  we don't expect mass restore from backup. We do expect mass backup operations
  daily, so we should focus on download tests. Concurrent uploads are relevant
  to virt-v2v.

- We should test both iSCSI and FC storage, but if we want to limit testing, we
  can focus on FC storage, since our most important users use FC and we want to
  make sure we have good performance on the best storage.

- proxy transfer is not efficient and not recommended, but it is possible that
  some users will have to use the proxy for backup. I think we need to test at
  least a single transfer with the proxy.

How to test downloads:

    $ /usr/share/doc/python3-ovirt-engine-sdk4/examples/download_disk.py \
        --engine-url https://my.engine/ \
        --username admin@internal \
        --password-file password \
        --cafile /etc/pki/vdsm/certs/cacert.pem \
        --format qcow2 \
        {disk-uuid} \
        /mnt/backup-media/{disk-uuid}.qcow2

How to test uploads:

    $ /usr/share/doc/python3-ovirt-engine-sdk4/examples/upload_disk.py \
        --engine-url https://my.engine/ \
        --username admin@internal \
        --password-file password \
        --cafile /etc/pki/vdsm/certs/cacert.pem \
        --disk-format qcow2 \
        --disk-sparse \
        --sd-name {my.storage} \
        /mnt/backup-media/{disk-uuid}.qcow2

Comment 13 Richard W.M. Jones 2020-06-17 09:58:46 UTC
"imageio-client" is https://github.com/danielerez/imageio-client ?
It seems as if it only supports uploads from a local file although
it's a bit hard to tell from the source.

What we would really like would be an NBD endpoint.

Comment 14 Nir Soffer 2020-06-17 10:56:54 UTC
(In reply to Richard W.M. Jones from comment #13)
> "imageio-client" is https://github.com/danielerez/imageio-client ?

No, this is an early version of the java client used by engine to control the
ovirt-imageio service on the engine host.

> It seems as if it only supports uploads from a local file although
> it's a bit hard to tell from the source.

The client is here:

https://github.com/oVirt/ovirt-imageio/blob/master/daemon/ovirt_imageio/client/_api.py#L31

The upload function transfers the image, supporting any image format, using
multiple connections, and a unix socket if possible, using this pipeline:

   local image -> qemu-nbd -> imageio client -> imageio server
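
As a usage sketch only (the exact signature is in _api.py linked above, and may
differ; the image path, transfer URL and CA file here are placeholders):

    # Usage sketch only - check _api.py (linked above) for the exact signature.
    # The transfer URL and CA file come from the image transfer created with
    # the engine SDK (as upload_disk.py does); the values here are placeholders.
    from ovirt_imageio import client

    client.upload(
        "/var/tmp/fedora-31.qcow2",                # local image, any qemu-img format
        "https://host:54322/images/TICKET-UUID",   # transfer_url of the image transfer
        "/etc/pki/vdsm/certs/cacert.pem",          # CA file used to verify the host
    )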

On the server side we have a similar pipeline (since oVirt 4.3):

    imageio-server -> qemu-nbd -> shared storage

This is only the transfer; for creating a disk you can use the upload_disk.py
example from ovirt-engine-sdk:
https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/upload_disk.py

Some stuff implemented in rhv-upload-plugin like selecting a host is still
missing, but we have patches to add that:
https://gerrit.ovirt.org/c/109609

We also have features that rhv-upload-plugin does not have, like transferring
via the proxy if the transfer_url is not available:
https://gerrit.ovirt.org/c/109305

It is currently significantly faster than "qemu-img convert" with block storage
because of bug 1847192 - see tests results here:
https://github.com/oVirt/ovirt-imageio/commit/97f2e277458db579023ba54a4a4bd122b36f543e

To use this we need to rewrite the rhv-upload flow like this:

1. pre checks
2. run virt-v2v with --no-copy to only do the conversion step
3. run upload_disk.py with the overlay
4. Create the vm with the uploaded disks

But note that in my tests, while uploading a single disk is much slower with
current virt-v2v, uploading 10 disks concurrently is slightly faster.
I think this is because virt-v2v uses the imageio file backend, which is
slightly faster.

We will package upload_disk.py as a proper command-line tool, see bug 1626262.

> What we would really like would be an NBD endpoint.

I agree, this should be available in 4.4.z.

But it will require the same rewrite of the plugin - separating the transfer
part from the management part.

Comment 16 Nir Soffer 2020-06-25 19:47:12 UTC
Moving to 4.4.2 based on comment 15.

Comment 17 Shir Fishbain 2020-06-30 15:37:12 UTC
Mordechai please give your qa_ack, it's a scale bug. Thanks.

Comment 21 Nir Soffer 2020-07-09 15:28:18 UTC
Since this was moved to 4.4.2 and we started testing it, I don't think the 
current needinfo requests are needed.

Comment 22 mlehrer 2020-07-09 15:51:20 UTC
(In reply to Nir Soffer from comment #21)
> Since this was moved to 4.4.2 and we started testing it, I don't think the 
> current needinfo requests are needed.

We had a request to push forward on this for 4.4.1. Currently we have a few
cases remaining, and then we will update with results.

Comment 23 Tzahi Ashkenazi 2020-07-13 15:41:49 UTC
Tested on versions:
         vdsm-4.40.22-1.el8ev.x86_64
         rhv-release-4.4.1-10-001.noarch

On rhev-red01 with 2 hosts, using a single iSCSI SD for uploading disks,
and a local NVMe device (900 GB) for downloads.

Disk size 100 GB, 66% disk utilization, as used in v2v.

Two Supermicro 1029P servers with 256 GB RAM and 32 cores.

Network setup on the hosts was the following:
 1 network on 1 GiB  - display and default route
 1 network on 10 GiB - management
 1 network on 10 GiB - migration and VMs

The full report can be found here:

         https://docs.google.com/spreadsheets/d/1QTHnuq5nxRdFLeBD-QN5JRh3s0BrpAZ2L-he4OrrIWE/edit?usp=sharing

Comment 24 Tzahi Ashkenazi 2020-07-15 14:26:11 UTC
BZ 1591439 summary test results

Environment & info:
Engine: rhev-red01
    vdsm-4.40.22-1.el8ev.x86_64
    rhv-release-4.4.1-10-001.noarch
    ovirt-imageio-2.0.9-1
    2 Supermicro model 1029P servers with 256 GB RAM and 32 cores
    Single iSCSI SD for uploading disks
    NVMe device, size 900 GB, for downloads
    Disk size 100 GB, 66% disk utilization (70 GB)
    Network setup on the hosts was the following:
        1 network on 1 GiB  - display and default route
        1 network on 10 GiB - management
        1 network on 10 GiB - migration and VMs

The main cases were local & remote upload and download, for a single disk and
for 10 disks in parallel, measuring throughput in real time using ibmonitor &
the API, and the total duration for each test case.

Results:
    Download local:
        10 disks concurrent - (Duration 52m24.617s) (Throughput: ibmonitor - 258.08 MiB/s, API - 3127.51 MiB/s)
        1 disk              - (Duration 3m33.361s)  (Throughput: ibmonitor - 377.05 MiB/s, API - 200.36 MiB/s)

    Download remote:
        10 disks concurrent - (Duration 111m55.285s)
                (Throughput: ibmonitor - ens2f3 (H-26) - 117.58 MiB/s, ovirtmgmt (H-29) - 136.42 MiB/s; API - 6700.42 seconds, 15.28 MiB/s)

        1 disk              - (Duration 10m23.070s) (Throughput: ibmonitor - ens2f3 (H-26) - 120.83 MiB/s) (API - ovirtmgmt (H-29) - 116.42 MiB/s)

    Upload local:
        10 disks concurrent - (Duration 16m30.925s) (Throughput: ibmonitor - 877.32 MiB/s, API - 108.54 MiB/s)
        1 disk              - (Duration 2m11.530s)  (Throughput: ibmonitor - 757.297 MiB/s, API - 976.64 MiB/s)

    Upload remote:
        1 disk - (Duration 10m23.070s) (Throughput: ibmonitor - ens2f3 (H-26) - 120.83 MiB/s, ovirtmgmt (H-29) - 116.42 MiB/s; API - 168.64 MiB/s)

  
Conclusions:
The local upload results were as expected. The local download tests performed
slower than expected and can likely be improved by running management & default
route both on a 10g interface.

Comment 25 Nir Soffer 2020-08-03 13:09:10 UTC
(In reply to Tzahi Ashkenazi from comment #24)
Tzahi, I want to inspect the imageio logs from these tests, but I cannot find the
logs in the full report.

Comment 26 Tzahi Ashkenazi 2020-08-03 13:33:57 UTC
Hi Nir,

The full logs and files from the task can be found here:

https://drive.google.com/drive/folders/1qqyn7tYEvuHYra4_961_mPJEtvKfKjgP?usp=sharing

If you need any other logs please tell me and I can search for them on the servers.

In the link above there are also the daemon logs that you requested in our last sync.

Comment 27 Nir Soffer 2020-08-03 13:39:32 UTC
Thanks Tzahi, I need access to this folder.

Comment 28 Sandro Bonazzola 2020-08-05 06:25:31 UTC
This bugzilla is included in the oVirt 4.4.1 release, published on July 8th 2020.

Since the problem described in this bug report should be resolved in the oVirt 4.4.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

