Bug 1836858 - [v2v] Improve performance when using small requests
Summary: [v2v] Improve performance when using small requests
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-imageio
Classification: oVirt
Component: Common
Version: 2.0.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: ovirt-4.4.1
Target Release: 2.0.7
Assignee: Nir Soffer
QA Contact: Maayan Hadasi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-05-18 10:46 UTC by Nir Soffer
Modified: 2020-08-05 06:25 UTC
CC List: 6 users

Fixed In Version: ovirt-imageio-2.0.7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-05 06:25:16 UTC
oVirt Team: Storage
Embargoed:
sbonazzo: ovirt-4.4?
sbonazzo: planning_ack?
sbonazzo: devel_ack?
sbonazzo: testing_ack?




Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 109068 0 master MERGED backends: Use buffer per connection 2021-01-25 15:27:14 UTC

Description Nir Soffer 2020-05-18 10:46:01 UTC
Description of problem:

The imageio server has pretty good performance when uploading an image in a
minimal number of requests. For example, uploading a Fedora 31 image with the
imageio client results in:

2020-05-16 21:14:51,744 INFO    (Thread-62) [http] CLOSE client=local [connection 1 ops, 3.014458 s] [dispatch 201 ops, 2.904726 s] [operation 201 ops, 2.870450 s] [read 228 ops, 0.933881 s, 1.19 GiB, 1.28 GiB/s] [write 228 ops, 1.813568 s, 1.19 GiB, 673.19 MiB/s] [zero 99 ops, 0.076165 s, 4.81 GiB, 63.12 GiB/s] [flush 1 ops, 0.000139 s]

This is 201 requests:
- 100 write requests transferring 1.19 GiB
- 99 zero requests zeroing 4.81 GiB
- 1 flush request
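
For reference, an upload like this can be driven from Python with the imageio
client. A minimal sketch, assuming the ovirt_imageio.client API from imageio
2.x; the transfer URL, ticket UUID, and CA file below are placeholders that a
real flow gets from an engine image transfer:

from ovirt_imageio import client

# Upload a local image to the imageio daemon (placeholder values).
client.upload(
    "Fedora-31.img",
    "https://host:54322/images/TICKET-UUID",
    "ca.pem",
)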

However, when using virt-v2v to upload the same image, qemu-img generates a
huge number of small requests:

2020-05-16 22:55:30,071 INFO    (Thread-19) [http] CLOSE client=local [connection 1 ops, 6.682791 s] [dispatch 3560 ops, 4.570003 s] [operation 3560 ops, 3.996812 s] [read 2458 ops, 0.734830 s, 1.21 GiB, 1.64 GiB/s] [write 2458 ops, 2.336602 s, 1.21 GiB, 528.16 MiB/s] [zero 1099 ops, 0.591710 s, 4.79 GiB, 8.10 GiB/s] [flush 3 ops, 0.000016 s]

This is:

- 2458 write requests transferring 1.21 GiB
- 1099 zero requests zeroing 4.79 GiB
- 3 flush requests
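
That is roughly 0.5 MiB per write request (1.21 GiB / 2458), compared with
roughly 12 MiB per write request (1.19 GiB / 100) in the imageio client run
above - about 25 times as many requests to move the same amount of data.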

The image content is not exactly the same, since virt-v2v uses fstrim to
sparsify the image before the upload and may do some manipulation of the
image that makes it more fragmented, but the images are very similar.

We could test this more precisely by preparing the overlay using virt-v2v and
uploading it instead of the original image, but this is good enough to
illustrate the problem.

It is impossible to change qemu-img to optimize the requests because of the
way it is designed and how it interacts with the nbdkit NBD server.

For the mostly empty 6 GiB image the difference is only a few seconds, which
is less than the time to create a new disk, but when testing big VM disks
(100 GiB) with higher utilization (70%) we see 60,000 requests per image,
and the upload takes around 15 minutes.

So imageio should be improved to work better with a lot of small requests.

An easy change we can make is to manage buffers in a better way. imageio
allocates one aligned buffer per request, and uses it to transfer data from
the socket to the backend (file or nbd). This works great when uploading
images in a few big requests (e.g. 128 MiB per request), but does not make
sense when handling up to 2 MiB per request (the virt-v2v case).

This can be avoided by reusing the same buffer for the entire connection, the
same way we reuse the backend: open on the first request accessing the
backend, and close when the connection is closed.
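
A minimal sketch of the idea in Python (names and the buffer size are
illustrative assumptions, not the actual imageio code):

class Connection:

    BUFFER_SIZE = 8 * 1024**2  # assumed size, comfortably above 2 MiB requests

    def __init__(self):
        self._buffer = None

    @property
    def buffer(self):
        # Allocated lazily on the first request and reused for every later
        # request, the same way the backend is opened on first access.
        if self._buffer is None:
            self._buffer = bytearray(self.BUFFER_SIZE)
        return self._buffer

    def handle_write(self, sock, backend, length):
        # Copy request data from the socket to the backend through the shared
        # buffer, instead of allocating a new buffer for every request.
        view = memoryview(self.buffer)
        todo = length
        while todo:
            n = sock.recv_into(view[:min(todo, len(view))])
            if n == 0:
                raise RuntimeError("Client disconnected")
            backend.write(view[:n])
            todo -= n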

Comment 1 Nir Soffer 2020-05-18 10:48:06 UTC
This should be verified by the scale team, assigning to Daniel.

Comment 2 Nir Soffer 2020-05-18 10:57:41 UTC
To verify, do a virt-v2v import of the standard VM, with 1 and 10 concurrent
imports.

Run once with imageio 2.0.6, which does not include the fix, and once with
imageio 2.0.7, which should include the change. Switching imageio versions
should be a simple dnf downgrade/upgrade.

Use the new detailed stats in the connection CLOSE logs to compare results.
We should see lower "operation", "dispatch" and "connection" times.

I see a 15% improvement on my development setup, but the results with a real
server and storage may be different.
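
The per-phase times can be extracted from the CLOSE lines with a small script;
a hypothetical helper, not part of imageio, assuming each file below contains
one CLOSE log line:

import re

# Each phase appears as "[<name> <ops> ops, <seconds> s ...]" in a CLOSE line.
SEGMENT = re.compile(r"\[(\w+) (\d+) ops, ([\d.]+) s")

def close_stats(line):
    # Return {phase: (ops, seconds)} for one http CLOSE log line.
    return {name: (int(ops), float(secs))
            for name, ops, secs in SEGMENT.findall(line)}

old = close_stats(open("close-2.0.6.log").read())
new = close_stats(open("close-2.0.7.log").read())
for phase in ("connection", "dispatch", "operation"):
    print(phase, old[phase][1], "->", new[phase][1])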

Comment 3 Maayan Hadasi 2020-07-08 08:42:45 UTC
Verified as fixed.

Verification steps:
virt-v2v imports from VMware to RHV 4.4 of a single VM (100 GiB disk, 66% utilization),
once with ovirt-imageio-2.0.6-0 and once with ovirt-imageio-2.0.8-1 (using the same VM).

The stats with ovirt-imageio-2.0.8-1 indeed show an improvement in "connection", "dispatch" and "operation" times:

ovirt-imageio-2.0.8:
2020-07-02 16:14:16,151 INFO    (Thread-4625) [images] [local] FLUSH ticket=deb69596-a46a-4427-bfe4-b52b0eee2e05
2020-07-02 16:14:16,151 INFO    (Thread-4625) [http] CLOSE client=local [connection 1 ops, 1309.313930 s] [dispatch 38429 ops, 219.900553 s] [operation 38429 ops, 212.216340 s] [read 36124 ops, 12.739692 s, 65.99 GiB, 5.18 GiB/s] [write 36124 ops, 197.436119 s, 65.99 GiB, 342.26 MiB/s] [zero 2302 ops, 0.614156 s, 34.01 GiB, 55.38 GiB/s] [flush 3 ops, 0.000030 s]

ovirt-imageio-2.0.6:
2020-07-02 17:02:18,645 INFO    (Thread-49) [images] [local] FLUSH ticket=3b507c0c-d12e-4fdc-b3ff-673041685649
2020-07-02 17:02:18,646 INFO    (Thread-49) [http] CLOSE client=local [connection 1 ops, 1331.895737 s] [dispatch 38425 ops, 258.481324 s] [operation 38425 ops, 251.419609 s] [read 36120 ops, 36.441080 s, 65.99 GiB, 1.81 GiB/s] [write 36120 ops, 202.668536 s, 65.99 GiB, 333.42 MiB/s] [zero 2302 ops, 0.655233 s, 34.01 GiB, 51.91 GiB/s] [flush 3 ops, 0.000066 s]


Verified with the following versions:
CFME 5.11.6 + v2v-conversion-host-ansible-1.16.2-3.el8ev
RHV 4.4 host-RHEL 8.2 + v2v-conversion-host-wrapper-1.16.2-3.el8ev

Comment 4 Maayan Hadasi 2020-07-09 07:20:06 UTC
The issue was also verified with a local upload of 10 images to RHV 4.4.
Here are the results:

ovirt-imageio-2.0.8:

2020-07-08 11:10:56,491 INFO    (Thread-7) [images] [local] FLUSH ticket=66539021-083e-43d2-bfd0-6dc9f30127cc
2020-07-08 11:10:56,491 INFO    (Thread-7) [http] CLOSE client=local [connection 1 ops, 871.465700 s] [dispatch 49850 ops, 791.168056 s] [operation 49850 ops, 774.136375 s] [read 38388 ops, 55.257172 s, 65.96 GiB, 1.19 GiB/s] [write 38388 ops, 708.259856 s, 65.96 GiB, 95.37 MiB/s] [zero 11459 ops, 7.838978 s, 34.04 GiB, 4.34 GiB/s] [flush 3 ops, 0.000286 s]

2020-07-08 11:11:00,563 INFO    (Thread-3) [images] [local] FLUSH ticket=89319871-f39f-49e7-bb98-451379079b54
2020-07-08 11:11:00,563 INFO    (Thread-3) [http] CLOSE client=local [connection 1 ops, 876.750510 s] [dispatch 49850 ops, 796.260899 s] [operation 49850 ops, 779.298741 s] [read 38388 ops, 54.024483 s, 65.96 GiB, 1.22 GiB/s] [write 38388 ops, 714.781434 s, 65.96 GiB, 94.50 MiB/s] [zero 11459 ops, 7.769955 s, 34.04 GiB, 4.38 GiB/s] [flush 3 ops, 0.000059 s]

2020-07-08 11:11:28,860 INFO    (Thread-19) [images] [local] FLUSH ticket=f3c4e6c2-7bd9-43ae-a6df-00bc0f84d3b6
2020-07-08 11:11:28,863 INFO    (Thread-19) [http] CLOSE client=local [connection 1 ops, 894.309295 s] [dispatch 49841 ops, 808.544976 s] [operation 49841 ops, 785.396972 s] [read 38379 ops, 44.065759 s, 65.96 GiB, 1.50 GiB/s] [write 38379 ops, 722.600736 s, 65.96 GiB, 93.47 MiB/s] [zero 11459 ops, 15.960544 s, 34.04 GiB, 2.13 GiB/s] [flush 3 ops, 0.001410 s]

2020-07-08 11:11:31,633 INFO    (Thread-25) [images] [local] FLUSH ticket=b2d34101-fdf9-4ce6-85cf-9ea72a98e86d
2020-07-08 11:11:31,634 INFO    (Thread-25) [http] CLOSE client=local [connection 1 ops, 895.459180 s] [dispatch 49831 ops, 805.214064 s] [operation 49831 ops, 781.214521 s] [read 38369 ops, 40.833787 s, 65.96 GiB, 1.62 GiB/s] [write 38369 ops, 720.941077 s, 65.96 GiB, 93.69 MiB/s] [zero 11459 ops, 16.523259 s, 34.04 GiB, 2.06 GiB/s] [flush 3 ops, 0.000196 s]

2020-07-08 11:11:33,094 INFO    (Thread-21) [images] [local] FLUSH ticket=35609199-5785-4d8e-b6c3-e23bb3a0e8ef
2020-07-08 11:11:33,095 INFO    (Thread-21) [http] CLOSE client=local [connection 1 ops, 898.346917 s] [dispatch 49845 ops, 809.690525 s] [operation 49845 ops, 785.704530 s] [read 38383 ops, 45.190629 s, 65.96 GiB, 1.46 GiB/s] [write 38383 ops, 721.416630 s, 65.96 GiB, 93.63 MiB/s] [zero 11459 ops, 16.220123 s, 34.04 GiB, 2.10 GiB/s] [flush 3 ops, 0.000028 s]

2020-07-08 11:11:33,195 INFO    (Thread-46) [images] [local] FLUSH ticket=2d0fd1b4-00bf-41f0-95c9-1d46accc0a3c
2020-07-08 11:11:33,196 INFO    (Thread-46) [http] CLOSE client=local [connection 1 ops, 893.556515 s] [dispatch 49833 ops, 803.733619 s] [operation 49833 ops, 779.756088 s] [read 38371 ops, 42.029462 s, 65.96 GiB, 1.57 GiB/s] [write 38371 ops, 718.359141 s, 65.96 GiB, 94.03 MiB/s] [zero 11459 ops, 16.481716 s, 34.04 GiB, 2.07 GiB/s] [flush 3 ops, 0.000204 s]

2020-07-08 11:11:33,840 INFO    (Thread-17) [images] [local] FLUSH ticket=8e65bfeb-6626-45a4-8fa6-c240625bc7a2
2020-07-08 11:11:33,842 INFO    (Thread-17) [http] CLOSE client=local [connection 1 ops, 899.463596 s] [dispatch 49838 ops, 810.074476 s] [operation 49838 ops, 786.044137 s] [read 38376 ops, 43.681442 s, 65.96 GiB, 1.51 GiB/s] [write 38376 ops, 724.038678 s, 65.96 GiB, 93.29 MiB/s] [zero 11459 ops, 15.469207 s, 34.04 GiB, 2.20 GiB/s] [flush 3 ops, 0.000154 s]

2020-07-08 11:11:34,006 INFO    (Thread-39) [images] [local] FLUSH ticket=956f3222-8c6a-4119-a2d4-912e62edb1a8
2020-07-08 11:11:34,007 INFO    (Thread-39) [http] CLOSE client=local [connection 1 ops, 895.710220 s] [dispatch 49838 ops, 805.555595 s] [operation 49838 ops, 781.548155 s] [read 38376 ops, 39.687935 s, 65.96 GiB, 1.66 GiB/s] [write 38376 ops, 722.720946 s, 65.96 GiB, 93.46 MiB/s] [zero 11459 ops, 16.261900 s, 34.04 GiB, 2.09 GiB/s] [flush 3 ops, 0.000384 s]

2020-07-08 11:11:35,136 INFO    (Thread-34) [images] [local] FLUSH ticket=c19a7f99-6fe8-43bc-ae5f-f022fc0f917d
2020-07-08 11:11:35,136 INFO    (Thread-34) [http] CLOSE client=local [connection 1 ops, 897.288351 s] [dispatch 49848 ops, 808.532739 s] [operation 49848 ops, 785.021453 s] [read 38386 ops, 41.737599 s, 65.96 GiB, 1.58 GiB/s] [write 38386 ops, 724.452326 s, 65.96 GiB, 93.24 MiB/s] [zero 11459 ops, 16.000292 s, 34.04 GiB, 2.13 GiB/s] [flush 3 ops, 0.000116 s]

2020-07-08 11:11:35,580 INFO    (Thread-37) [images] [local] FLUSH ticket=59d8516d-9314-4fff-b105-f1d7087af719
2020-07-08 11:11:35,581 INFO    (Thread-37) [http] CLOSE client=local [connection 1 ops, 897.455723 s] [dispatch 49835 ops, 808.450046 s] [operation 49835 ops, 785.319982 s] [read 38373 ops, 44.122220 s, 65.96 GiB, 1.49 GiB/s] [write 38373 ops, 722.931654 s, 65.96 GiB, 93.43 MiB/s] [zero 11459 ops, 15.481154 s, 34.04 GiB, 2.20 GiB/s] [flush 3 ops, 0.000050 s]

========================================================================================================================================================================================================

ovirt-imageio-2.0.6:

2020-07-07 21:25:24,321 INFO    (Thread-4030) [images] [local] FLUSH ticket=ed8b3be0-9838-46a4-a388-89ac72353d9f
2020-07-07 21:25:24,322 INFO    (Thread-4030) [http] CLOSE client=local [connection 1 ops, 910.199021 s] [dispatch 49875 ops, 820.754516 s] [operation 49875 ops, 791.891278 s] [read 38413 ops, 73.624902 s, 65.96 GiB, 917.41 MiB/s] [write 38413 ops, 686.290078 s, 65.96 GiB, 98.42 MiB/s] [zero 11459 ops, 16.809187 s, 34.04 GiB, 2.03 GiB/s] [flush 3 ops, 0.000029 s]

2020-07-07 21:25:28,463 INFO    (Thread-4028) [images] [local] FLUSH ticket=607601b3-7bfd-464a-83ba-357973f32c16
2020-07-07 21:25:28,464 INFO    (Thread-4028) [http] CLOSE client=local [connection 1 ops, 914.445997 s] [dispatch 49876 ops, 824.511814 s] [operation 49876 ops, 795.921556 s] [read 38414 ops, 73.465722 s, 65.96 GiB, 919.40 MiB/s] [write 38414 ops, 689.927281 s, 65.96 GiB, 97.90 MiB/s] [zero 11459 ops, 17.007725 s, 34.04 GiB, 2.00 GiB/s] [flush 3 ops, 0.000413 s]

2020-07-07 21:25:40,472 INFO    (Thread-4042) [images] [local] FLUSH ticket=e6f710c2-f288-45d2-9cf5-4633ee11a212
2020-07-07 21:25:40,473 INFO    (Thread-4042) [http] CLOSE client=local [connection 1 ops, 923.752838 s] [dispatch 49842 ops, 830.510485 s] [operation 49842 ops, 799.672787 s] [read 38380 ops, 72.605780 s, 65.96 GiB, 930.29 MiB/s] [write 38380 ops, 692.509090 s, 65.96 GiB, 97.54 MiB/s] [zero 11459 ops, 19.191354 s, 34.04 GiB, 1.77 GiB/s] [flush 3 ops, 0.000838 s]

2020-07-07 21:25:43,917 INFO    (Thread-4044) [images] [local] FLUSH ticket=5aee93f3-fce0-4cbd-92ed-a057185ea126
2020-07-07 21:25:43,918 INFO    (Thread-4044) [http] CLOSE client=local [connection 1 ops, 927.019595 s] [dispatch 49830 ops, 833.128935 s] [operation 49830 ops, 801.519181 s] [read 38368 ops, 73.411818 s, 65.96 GiB, 920.07 MiB/s] [write 38368 ops, 693.239877 s, 65.96 GiB, 97.43 MiB/s] [zero 11459 ops, 19.376150 s, 34.04 GiB, 1.76 GiB/s] [flush 3 ops, 0.000757 s]

2020-07-07 21:25:44,435 INFO    (Thread-4039) [images] [local] FLUSH ticket=a0a3b322-c5a2-404a-bb40-a0fdd178e5e9
2020-07-07 21:25:44,437 INFO    (Thread-4039) [http] CLOSE client=local [connection 1 ops, 928.266260 s] [dispatch 49841 ops, 835.066073 s] [operation 49841 ops, 803.109747 s] [read 38379 ops, 73.307832 s, 65.96 GiB, 921.38 MiB/s] [write 38379 ops, 694.934859 s, 65.96 GiB, 97.20 MiB/s] [zero 11459 ops, 19.175492 s, 34.04 GiB, 1.78 GiB/s] [flush 3 ops, 0.000029 s]

2020-07-07 21:25:48,402 INFO    (Thread-4063) [images] [local] FLUSH ticket=6767b2b0-3151-4ef2-869b-c0b3f689080f
2020-07-07 21:25:48,403 INFO    (Thread-4063) [http] CLOSE client=local [connection 1 ops, 929.121078 s] [dispatch 49863 ops, 837.996348 s] [operation 49863 ops, 806.013008 s] [read 38401 ops, 73.732671 s, 65.96 GiB, 916.07 MiB/s] [write 38401 ops, 696.483147 s, 65.96 GiB, 96.98 MiB/s] [zero 11459 ops, 20.144000 s, 34.04 GiB, 1.69 GiB/s] [flush 3 ops, 0.000045 s]

2020-07-07 21:25:48,930 INFO    (Thread-4061) [images] [local] FLUSH ticket=721bdc8a-2af6-4932-9b68-e67d51a01ee9
2020-07-07 21:25:48,931 INFO    (Thread-4061) [http] CLOSE client=local [connection 1 ops, 929.870410 s] [dispatch 49846 ops, 836.822083 s] [operation 49846 ops, 805.053413 s] [read 38384 ops, 73.430449 s, 65.96 GiB, 919.84 MiB/s] [write 38384 ops, 696.269801 s, 65.96 GiB, 97.01 MiB/s] [zero 11459 ops, 19.567980 s, 34.04 GiB, 1.74 GiB/s] [flush 3 ops, 0.000207 s]

2020-07-07 21:25:48,942 INFO    (Thread-4055) [images] [local] FLUSH ticket=f0431525-4ec1-49cb-93dc-8b061f80651e
2020-07-07 21:25:48,943 INFO    (Thread-4055) [http] CLOSE client=local [connection 1 ops, 930.501935 s] [dispatch 49863 ops, 836.732170 s] [operation 49863 ops, 805.044100 s] [read 38401 ops, 73.842539 s, 65.96 GiB, 914.71 MiB/s] [write 38401 ops, 696.132520 s, 65.96 GiB, 97.03 MiB/s] [zero 11459 ops, 19.414753 s, 34.04 GiB, 1.75 GiB/s] [flush 3 ops, 0.000170 s]

2020-07-07 21:25:49,041 INFO    (Thread-4057) [images] [local] FLUSH ticket=4bf92dbf-8c36-4544-ac52-0c4fd45dc464
2020-07-07 21:25:49,041 INFO    (Thread-4057) [http] CLOSE client=local [connection 1 ops, 930.329503 s] [dispatch 49841 ops, 836.867675 s] [operation 49841 ops, 805.064638 s] [read 38379 ops, 73.163855 s, 65.96 GiB, 923.19 MiB/s] [write 38379 ops, 696.440017 s, 65.96 GiB, 96.99 MiB/s] [zero 11459 ops, 19.757155 s, 34.04 GiB, 1.72 GiB/s] [flush 3 ops, 0.000017 s]

2020-07-07 21:26:10,378 INFO    (Thread-4059) [images] [local] FLUSH ticket=51610e49-ad2b-4365-b3e6-a3315f130680
2020-07-07 21:26:10,378 INFO    (Thread-4059) [http] CLOSE client=local [connection 1 ops, 951.487325 s] [dispatch 49851 ops, 863.509005 s] [operation 49851 ops, 835.458679 s] [read 38389 ops, 72.674797 s, 65.96 GiB, 929.41 MiB/s] [write 38389 ops, 680.173379 s, 65.96 GiB, 99.30 MiB/s] [zero 11459 ops, 66.992465 s, 34.04 GiB, 520.29 MiB/s] [flush 3 ops, 0.000019 s]

Comment 5 Maayan Hadasi 2020-08-03 15:12:15 UTC
For reference, here is the script that was used for the local disk upload
(comment #4):

[root@leopard03 ~]# cat v2v.sh
LIBGUESTFS_BACKEND=direct virt-v2v \
        -i disk /var/tmp/test.img \
        -o rhv-upload \
        -oc https://jenkins-vm-10.lab.eng.tlv2.redhat.com/ovirt-engine/api \
        -op password_file \
        -on v2v-$1 \
        -os fcp_0 \
        -of raw \
        -oa preallocated \
        -oo rhv-cafile=/etc/pki/vdsm/certs/cacert.pem \
        -oo rhv-cluster=golden_env_mixed_1 \
        -oo rhv-direct=true


In order to start 10 commands in parallel, I used:
for i in $(seq -w 10); do
    sh v2v.sh $i > v2v-$i.log 2>&1 &
done

Comment 6 Sandro Bonazzola 2020-08-05 06:25:16 UTC
This bug is included in the oVirt 4.4.1 release, published on July 8th 2020.

Since the problem described in this bug report should be resolved in the
oVirt 4.4.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

