Bug 1740489 - [RFE][v2v][Scale][RHV] Add total writes/zero and the rate to the logs
Summary: [RFE][v2v][Scale][RHV] Add total writes/zero and the rate to the logs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-imageio-common
Version: 4.3.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.4.1
: 4.4.1
Assignee: Nir Soffer
QA Contact: Maayan Hadasi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-08-13 05:50 UTC by Ilanit Stein
Modified: 2020-08-04 13:23 UTC
6 users

Fixed In Version: rhv-4.4.1-3
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-04 13:23:38 UTC
oVirt Team: Scale
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2020:3309 0 None None None 2020-08-04 13:23:56 UTC
oVirt gerrit 108533 0 master MERGED ops: Log more detailed transfer stats 2021-01-25 15:05:25 UTC

Description Ilanit Stein 2019-08-13 05:50:37 UTC
Description of problem:
We have some stats in the imageio logs for every upload that give the actual rate at which we wrote to storage
(example from my testing env with gluster storage on VMs, very poor performance):

2019-08-05 01:57:56,004 INFO    (Thread-80) [tickets] [local] ADD ticket={u'uuid': u'9a7f79c1-2769-4c98-a8bb-af087b8e1bf0', u'ops': [u'write'], u'url': u'nbd:unix:/var/run/vdsm/nbd/9a7f79c1-2769-4c98-a8bb-af087b8e1bf0.sock', u'sparse': True, u'timeout': 300, u'transfer_id': u'639439cd-9a30-43e7-8e21-bbc61886f3dc', u'size': 4294967296}
...
2019-08-05 01:57:56,159 INFO    (Thread-83) [http] OPEN client=local
...
2019-08-05 02:00:54,626 INFO    (Thread-83) [http] CLOSE client=local [connection=178.467355/1, dispatch=178.421271/56, operation=178.391597/56, read=1.260019/190, write=170.811010/190, zero=5.373557/40, flush=0.683994/1]

This means we had:
1 connection alive for 178.467355 seconds
56 requests, total time 178.421271 seconds
56 operations, total time 178.391597 seconds
190 reads (from client socket), total time 1.260019 seconds
190 writes (to storage), total time 170.811010 seconds
40 zeros, total time 5.373557 seconds
1 flush, total time 0.683994 seconds
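In this old format, each stat on the CLOSE line is packed as `name=total_seconds/count`. A minimal Python sketch that unpacks it (the field layout is taken from the example above; this helper is not part of imageio):

```python
import re

# Unpack old-style CLOSE stats like "write=170.811010/190" into
# {"write": (170.811010, 190)} - i.e. (total seconds, operation count).
def parse_old_close_stats(line):
    stats = {}
    for name, seconds, count in re.findall(r"(\w+)=([\d.]+)/(\d+)", line):
        stats[name] = (float(seconds), int(count))
    return stats

line = ("CLOSE client=local [connection=178.467355/1, dispatch=178.421271/56, "
        "operation=178.391597/56, read=1.260019/190, write=170.811010/190, "
        "zero=5.373557/40, flush=0.683994/1]")
stats = parse_old_close_stats(line)
print(stats["write"])
```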

What we don't have here is the total amount of data written and the total size of zero operations.

To find the amount of data written, you can do:

$ awk '/WRITE.+ticket=9a7f79c1-2769-4c98-a8bb-af087b8e1bf0/ {print $8}' /var/log/ovirt-imageio-daemon/daemon.log | awk -F= '{sum+=$2} END {print sum}'
1494855680

1494855680 bytes / 170.811010 seconds -> 8.34 MiB/s

To find zero rate you can do:

$ awk '/ZERO.+ticket=9a7f79c1-2769-4c98-a8bb-af087b8e1bf0/ {print $8}' /var/log/ovirt-imageio-daemon/daemon.log | awk -F= '{sum+=$2} END {print sum}'
2800111616

2800111616 bytes / 5.373557 seconds -> 496.95 MiB/s
(with gluster zero is much faster than write)
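The same arithmetic the awk pipelines above perform, sketched in Python (the byte totals are the sums computed above; the conversion factor is 1 MiB = 1024 * 1024 bytes):

```python
# Rate = total bytes summed from the WRITE/ZERO log lines, divided by
# the total operation time reported on the CLOSE line, in MiB/s.
def rate_mib_per_s(total_bytes, total_seconds):
    return total_bytes / total_seconds / (1024 * 1024)

write_rate = rate_mib_per_s(1494855680, 170.811010)  # ~8.3 MiB/s
zero_rate = rate_mib_per_s(2800111616, 5.373557)     # ~497 MiB/s
print(f"write: {write_rate:.2f} MiB/s, zero: {zero_rate:.2f} MiB/s")
```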

If we want to evaluate the performance of v2v imports, we need to collect these logs for every upload.

I think we can add the total writes/zeros and the rate to the logs if this can help the scale team.
Please file an RFE for this if needed.

Version-Release number of selected component (if applicable):
vdsm-4.30.19-1.el7ev.x86_64
ovirt-imageio-common-1.5.1-0.el7ev.x86_64

Comment 1 Ilanit Stein 2019-08-13 05:55:20 UTC
The description content was written by nsoffer.

Comment 2 Michal Skrivanek 2020-03-18 15:46:27 UTC
This bug didn't get any attention for a while; we didn't have the capacity to make any progress. If you deeply care about it or want to work on it, please assign/target accordingly.

Comment 4 Michal Skrivanek 2020-04-01 14:47:33 UTC
ok, closing. Please reopen if still relevant/you want to work on it.

Comment 6 Nir Soffer 2020-04-26 00:49:44 UTC
We need this to make performance testing easier.

Comment 7 Nir Soffer 2020-04-27 22:02:49 UTC
For reference, here are new logs for uploads and downloads from UI
via proxy, and from SDK (via proxy or daemon).

Note that the default logs are much smaller now - WRITE/READ/ZERO log
messages are now displayed only when using the DEBUG log level. This is
possible since we have detailed stats when the connection is closed.

Also note the new EXTENTS log, used during downloads with a smart
client such as the imageio client library. This will be used for full
and incremental backup.


## Upload from UI

### Proxy

2020-04-26 05:04:30,325 INFO    (Thread-1) [http] OPEN client=192.168.122.1
2020-04-26 05:04:49,977 INFO    (Thread-2) [tickets] [127.0.0.1] ADD ticket={'dirty': False, 'ops': ['write'], 'size': 6442450944, 'sparse': True, 'transfer_id': 'b277c237-2d0d-4d0e-8ac3-a8ac259f3151', 'uuid': '908a3197-b6af-4f63-8cfe-9f09c26feace', 'timeout': 300, 'url': 'https://host2:54322/images/908a3197-b6af-4f63-8cfe-9f09c26feace'}
2020-04-26 05:04:51,511 INFO    (Thread-1) [images] [192.168.122.1] OPTIONS ticket=908a3197-b6af-4f63-8cfe-9f09c26feace
2020-04-26 05:05:39,359 INFO    (Thread-1) [images] [192.168.122.1] OPTIONS ticket=908a3197-b6af-4f63-8cfe-9f09c26feace
2020-04-26 05:05:39,728 INFO    (Thread-1) [http] CLOSE client=192.168.122.1 [connection 1 ops, 69.403170 s] [dispatch 65 ops, 44.522528 s] [operation 62 ops, 44.485924 s] [read 799 ops, 12.584285 s, 6.00 GiB, 488.23 MiB/s] [write 799 ops, 31.788956 s, 6.00 GiB, 193.27 MiB/s] [sync 1 ops, 0.001576 s]

### Daemon

2020-04-26 05:04:49,962 INFO    (Thread-1) [tickets] [local] ADD ticket={'dirty': False, 'ops': ['write'], 'size': 6442450944, 'sparse': True, 'transfer_id': 'b277c237-2d0d-4d0e-8ac3-a8ac259f3151', 'uuid': '908a3197-b6af-4f63-8cfe-9f09c26feace', 'timeout': 300, 'url': 'nbd:unix:/var/run/vdsm/nbd/908a3197-b6af-4f63-8cfe-9f09c26feace.sock'}
2020-04-26 05:04:51,521 INFO    (Thread-3) [http] OPEN client=192.168.122.12
2020-04-26 05:04:51,522 INFO    (Thread-3) [images] [192.168.122.12] OPTIONS ticket=908a3197-b6af-4f63-8cfe-9f09c26feace
2020-04-26 05:05:39,727 INFO    (Thread-3) [images] [192.168.122.12] FLUSH ticket=908a3197-b6af-4f63-8cfe-9f09c26feace
2020-04-26 05:05:39,730 INFO    (Thread-3) [http] CLOSE client=192.168.122.12 [connection 1 ops, 48.207891 s] [dispatch 801 ops, 30.059915 s] [operation 800 ops, 29.737612 s] [read 799 ops, 18.877159 s, 6.00 GiB, 325.47 MiB/s] [write 799 ops, 9.991013 s, 6.00 GiB, 614.95 MiB/s] [flush 1 ops, 0.000351 s]


## Download from UI

### Proxy

2020-04-26 05:06:57,795 INFO    (Thread-4) [tickets] [127.0.0.1] ADD ticket={'dirty': False, 'ops': ['read'], 'filename': 'test.raw', 'size': 6442450944, 'sparse': True, 'transfer_id': '48c9d098-f2a5-46f5-94cd-9f968a241f35', 'uuid': '627a13f2-ae84-4f09-9f2c-748050efd250', 'timeout': 300, 'url': 'https://host2:54322/images/627a13f2-ae84-4f09-9f2c-748050efd250'}
2020-04-26 05:06:57,858 INFO    (Thread-5) [http] OPEN client=192.168.122.1
2020-04-26 05:07:59,475 INFO    (Thread-6) [tickets] [127.0.0.1] EXTEND timeout=300 ticket=627a13f2-ae84-4f09-9f2c-748050efd250
2020-04-26 05:08:05,198 INFO    (Thread-5) [http] CLOSE client=192.168.122.1 [connection 1 ops, 67.339662 s] [dispatch 1 ops, 67.339435 s] [operation 1 ops, 67.330861 s] [read 768 ops, 37.780378 s, 6.00 GiB, 162.62 MiB/s] [write 768 ops, 29.480514 s, 6.00 GiB, 208.41 MiB/s]

### Daemon

2020-04-26 05:06:57,791 INFO    (Thread-11) [tickets] [local] ADD ticket={'dirty': False, 'ops': ['read'], 'filename': 'test.raw', 'size': 6442450944, 'sparse': True, 'transfer_id': '48c9d098-f2a5-46f5-94cd-9f968a241f35', 'uuid': '627a13f2-ae84-4f09-9f2c-748050efd250', 'timeout': 300, 'url': 'file:///rhev/data-center/mnt/nfs1:_export_1/766d8f9a-77e5-4f54-a76b-198e4b69f419/images/76573b3a-06ea-4761-ba84-8e1400d33ab6/94bbcad6-e302-4569-897e-2278033d4a08'}
2020-04-26 05:06:57,863 INFO    (Thread-12) [http] OPEN client=192.168.122.12
2020-04-26 05:06:57,863 INFO    (Thread-12) [images] [192.168.122.12] OPTIONS ticket=627a13f2-ae84-4f09-9f2c-748050efd250
2020-04-26 05:06:57,864 INFO    (Thread-12) [extents] [192.168.122.12] EXTENTS ticket=627a13f2-ae84-4f09-9f2c-748050efd250 context=zero
2020-04-26 05:08:05,199 INFO    (Thread-12) [http] CLOSE client=192.168.122.12 [connection 1 ops, 67.336100 s] [dispatch 770 ops, 35.098325 s] [extents 1 ops, 0.000230 s] [operation 768 ops, 34.430660 s] [read 768 ops, 21.035536 s, 6.00 GiB, 292.08 MiB/s] [write 768 ops, 11.909811 s, 6.00 GiB, 515.88 MiB/s]


## Upload using upload_disk.py

### Proxy

2020-04-26 05:19:33,619 INFO    (Thread-8) [tickets] [127.0.0.1] ADD ticket={'dirty': False, 'ops': ['write'], 'size': 6442450944, 'sparse': True, 'transfer_id': '766f7664-93c7-402b-b2db-78cc1890d37e', 'uuid': '358365b5-7cd1-4385-aa71-50a00d041cd4', 'timeout': 300, 'url': 'https://host2:54322/images/358365b5-7cd1-4385-aa71-50a00d041cd4'}
2020-04-26 05:19:33,708 INFO    (Thread-9) [http] OPEN client=192.168.122.1
2020-04-26 05:19:33,709 INFO    (Thread-9) [images] [192.168.122.1] OPTIONS ticket=358365b5-7cd1-4385-aa71-50a00d041cd4
2020-04-26 05:19:41,898 INFO    (Thread-9) [images] [192.168.122.1] FLUSH ticket=358365b5-7cd1-4385-aa71-50a00d041cd4
2020-04-26 05:19:41,901 INFO    (Thread-9) [http] CLOSE client=192.168.122.1 [connection 1 ops, 8.192244 s] [dispatch 202 ops, 8.055845 s] [operation 201 ops, 7.998543 s] [read 228 ops, 2.336442 s, 1.19 GiB, 522.54 MiB/s] [write 228 ops, 5.375953 s, 1.19 GiB, 227.10 MiB/s] [zero 99 ops, 0.229418 s, 4.81 GiB, 20.96 GiB/s] [flush 1 ops, 0.002069 s]

### Daemon

2020-04-26 05:19:33,616 INFO    (Thread-25) [tickets] [local] ADD ticket={'dirty': False, 'ops': ['write'], 'size': 6442450944, 'sparse': True, 'transfer_id': '766f7664-93c7-402b-b2db-78cc1890d37e', 'uuid': '358365b5-7cd1-4385-aa71-50a00d041cd4', 'timeout': 300, 'url': 'nbd:unix:/var/run/vdsm/nbd/358365b5-7cd1-4385-aa71-50a00d041cd4.sock'}
2020-04-26 05:19:33,717 INFO    (Thread-26) [http] OPEN client=192.168.122.12
2020-04-26 05:19:33,718 INFO    (Thread-26) [images] [192.168.122.12] OPTIONS ticket=358365b5-7cd1-4385-aa71-50a00d041cd4
2020-04-26 05:19:41,899 INFO    (Thread-26) [images] [192.168.122.12] FLUSH ticket=358365b5-7cd1-4385-aa71-50a00d041cd4
2020-04-26 05:19:41,902 INFO    (Thread-26) [http] CLOSE client=192.168.122.12 [connection 1 ops, 8.184716 s] [dispatch 329 ops, 5.151182 s] [operation 328 ops, 5.037843 s] [read 228 ops, 3.096498 s, 1.19 GiB, 394.28 MiB/s] [write 228 ops, 1.669468 s, 1.19 GiB, 731.30 MiB/s] [zero 99 ops, 0.121834 s, 4.81 GiB, 39.46 GiB/s] [flush 1 ops, 0.001154 s]


## Download using download_disk.py

### Proxy

2020-04-26 05:24:33,601 INFO    (Thread-14) [tickets] [127.0.0.1] ADD ticket={'dirty': False, 'ops': ['read'], 'size': 6442450944, 'sparse': True, 'transfer_id': 'e8bb443e-ee23-4475-900e-ba068b9d5bd0', 'uuid': '698949d9-d595-46d4-9599-d3ecfd00a91f', 'timeout': 300, 'url': 'https://host2:54322/images/698949d9-d595-46d4-9599-d3ecfd00a91f'}
2020-04-26 05:24:33,633 INFO    (Thread-15) [http] OPEN client=192.168.122.1
2020-04-26 05:24:33,633 INFO    (Thread-15) [images] [192.168.122.1] OPTIONS ticket=698949d9-d595-46d4-9599-d3ecfd00a91f
2020-04-26 05:24:33,634 INFO    (Thread-15) [extents] [192.168.122.1] EXTENTS ticket=698949d9-d595-46d4-9599-d3ecfd00a91f context=zero
2020-04-26 05:24:43,014 INFO    (Thread-15) [http] CLOSE client=192.168.122.1 [connection 1 ops, 9.380285 s] [dispatch 103 ops, 8.235288 s] [extents 1 ops, 0.013373 s] [operation 101 ops, 8.197933 s] [read 228 ops, 5.384928 s, 1.32 GiB, 251.77 MiB/s] [write 228 ops, 2.757307 s, 1.19 GiB, 442.78 MiB/s]

### Daemon

2020-04-26 05:24:33,598 INFO    (Thread-37) [tickets] [local] ADD ticket={'dirty': False, 'ops': ['read'], 'size': 6442450944, 'sparse': True, 'transfer_id': 'e8bb443e-ee23-4475-900e-ba068b9d5bd0', 'uuid': '698949d9-d595-46d4-9599-d3ecfd00a91f', 'timeout': 300, 'url': 'nbd:unix:/var/run/vdsm/nbd/698949d9-d595-46d4-9599-d3ecfd00a91f.sock'}
2020-04-26 05:24:33,638 INFO    (Thread-38) [http] OPEN client=192.168.122.12
2020-04-26 05:24:33,638 INFO    (Thread-38) [images] [192.168.122.12] OPTIONS ticket=698949d9-d595-46d4-9599-d3ecfd00a91f
2020-04-26 05:24:33,639 INFO    (Thread-38) [extents] [192.168.122.12] EXTENTS ticket=698949d9-d595-46d4-9599-d3ecfd00a91f context=zero
2020-04-26 05:24:43,015 INFO    (Thread-38) [http] CLOSE client=192.168.122.12 [connection 1 ops, 9.377116 s] [dispatch 230 ops, 4.804260 s] [extents 1 ops, 0.010611 s] [operation 228 ops, 4.748868 s] [read 228 ops, 2.374451 s, 1.32 GiB, 570.97 MiB/s] [write 228 ops, 2.200837 s, 1.32 GiB, 616.02 MiB/s]
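The new-style CLOSE lines shown above can be parsed with a short sketch like this (the bracketed layout and units are inferred from the examples; this parser is not part of imageio):

```python
import re

# Each bracketed group is "[name N ops, S s]" with an optional
# ", SIZE, RATE" suffix for operations that move data.
STAT = re.compile(
    r"\[(\w+) (\d+) ops, ([\d.]+) s"
    r"(?:, ([\d.]+ [KMGT]iB), ([\d.]+ [KMGT]iB/s))?\]")

def parse_close_stats(line):
    stats = {}
    for name, ops, secs, size, rate in STAT.findall(line):
        stats[name] = {"ops": int(ops), "seconds": float(secs),
                       "size": size or None, "rate": rate or None}
    return stats

line = ("CLOSE client=local [connection 1 ops, 8.184716 s] "
        "[write 228 ops, 1.669468 s, 1.19 GiB, 731.30 MiB/s] "
        "[flush 1 ops, 0.001154 s]")
print(parse_close_stats(line)["write"])
```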


I did not test with virt-v2v yet. Ilanit, I hope you can test
this soon.

Comment 9 mlehrer 2020-05-03 12:21:48 UTC
I see from your replies on this bug that the logs are now reduced, with a nice rate summary shown.
What is the verification flow required here that needs scale? Or is this mostly to assist v2v efforts?
Either way, can you clarify what needs to be checked here?

Comment 10 Nir Soffer 2020-05-03 13:52:14 UTC
(In reply to mlehrer from comment #9)
I think a good test is to run the standard v2v import test (e.g. 10 x 100g)
and check that we get good info in the logs.

Now that we have much smaller logs, we expect all the imports to be in the first
log file. Previously, importing 10 VMs would generate about 4-5 log files.

Upstream virt-v2v also has more useful logs, including similar stats from
nbdkit, both for the vddk and imageio sides:
https://github.com/libguestfs/nbdkit/blob/b5d5811e5ffc0e9f40264cbcd80f60c02da4d995/filters/stats/nbdkit-stats-filter.pod

I hope that this change is already in the virt-v2v version in RHEL 8.2.

Using both stats we can tell where the bottleneck is during an import.

Comment 11 Nir Soffer 2020-06-07 13:21:07 UTC
This is available since ovirt-imageio 2.0.6.

Comment 13 Maayan Hadasi 2020-07-02 12:30:33 UTC
Verified as fixed

Verification steps:
Import 100 GB x 10 VMs from VMware to RHV 4.4, via CFME, and check /var/log/ovirt-imageio/daemon.log (on the RHV host)

The stats for an upload in the log are:

2020-07-01 15:37:19,317 INFO    (Thread-308) [tickets] [local] ADD ticket={'dirty': False, 'ops': ['write'], 'size': 107374182400, 'sparse': False, 'transfer_id': '6b580f7b-e169-4df9-a304-30026de1eaa0', 'uuid': '990e1693-949a-44b5-8d2d-cf68e7183e86', 'timeout': 300, 'url': 'file:///rhev/data-center/mnt/blockSD/f163ac90-cb40-416a-83b3-f419160e4039/images/7889175c-d226-4095-8de7-042fd16b22ca/d25743e6-32c7-4d28-bd73-fb9f07d83d56'}

2020-07-01 16:40:16,224 INFO    (Thread-759) [images] [local] FLUSH ticket=990e1693-949a-44b5-8d2d-cf68e7183e86
2020-07-01 16:40:16,224 INFO    (Thread-759) [images] [local] FLUSH ticket=990e1693-949a-44b5-8d2d-cf68e7183e86
2020-07-01 16:40:16,225 INFO    (Thread-759) [images] [local] FLUSH ticket=990e1693-949a-44b5-8d2d-cf68e7183e86
2020-07-01 16:40:16,226 INFO    (Thread-759) [http] CLOSE client=local [connection 1 ops, 3391.541182 s] [dispatch 38422 ops, 308.717402 s] [operation 38422 ops, 301.024433 s] [read 36119 ops, 21.054336 s, 65.99 GiB, 3.13 GiB/s] [write 36119 ops, 213.980215 s, 65.99 GiB, 315.79 MiB/s] [zero 2300 ops, 64.465514 s, 34.01 GiB, 540.24 MiB/s] [flush 3 ops, 0.000020 s]


The total amount of written data (65.99 GiB) and the total size of zero operations (34.01 GiB) were added.
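As a sanity check on these totals: for a full upload, written plus zeroed data should cover the virtual size from the ADD ticket ('size': 107374182400). A small sketch using the numbers from the log lines above:

```python
# Written data plus zeroed data should cover the disk's virtual size.
# Numbers are copied from the ADD ticket and CLOSE line above.
GiB = 1024**3
disk_size = 107374182400     # 'size' from the ADD ticket (100 GiB)
written = 65.99 * GiB        # total from [write ...] on the CLOSE line
zeroed = 34.01 * GiB         # total from [zero ...] on the CLOSE line
print(f"covered: {(written + zeroed) / disk_size:.4f} of virtual size")
```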


The issue was verified with versions:
CFME 5.11.6 + v2v-conversion-host-ansible-1.16.2-3.el8ev
RHV 4.4 host-8.2 + v2v-conversion-host-wrapper-1.16.2-3.el8ev
ovirt-imageio 2.0.8-1.el8ev

Comment 15 errata-xmlrpc 2020-08-04 13:23:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV Engine and Host Common Packages 4.4), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:3309

