Bug 2051997 - [RFE] Default thin provisioning extension thresholds should match modern hardware
Summary: [RFE] Default thin provisioning extension thresholds should match modern hardware
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.50.0.5
Hardware: All
OS: All
high
medium
Target Milestone: ovirt-4.5.0
Target Release: 4.50.0.10
Assignee: Nir Soffer
QA Contact: sshmulev
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-02-08 13:49 UTC by Janos Bonic
Modified: 2022-05-03 06:46 UTC
CC: 11 users

Fixed In Version: vdsm-4.50.0.10
Doc Type: Enhancement
Doc Text:
Feature: Adapt the thin provisioning defaults to modern hardware with faster writes and larger capacity. The minimum allocation size was increased from 1 GiB to 2.5 GiB, and the minimum free space threshold was increased from 512 MiB to 2 GiB.
Reason: On modern hardware, virtual machines sometimes paused temporarily when writing to thin disks on block-based storage.
Result: The system allocates more space earlier, minimizing virtual machine pauses.
Clone Of:
Environment:
Last Closed: 2022-05-03 06:46:19 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.5?
pm-rhel: planning_ack?
pm-rhel: devel_ack+
pm-rhel: testing_ack+


Attachments
Extend script for testing paused during extend flow (813 bytes, text/plain), 2022-02-22 10:28 UTC, Nir Soffer
Logs from testing various configurations (563.85 KB, application/x-xz), 2022-02-22 10:45 UTC, Nir Soffer


Links
Github oVirt vdsm pull 80: Merged, Improve extend timing (2022-02-21 20:32:48 UTC)
Github oVirt vdsm pull 81: open, tests: Test drive exceeded time behavior (2022-02-21 20:26:20 UTC)
Github oVirt vdsm pull 82: open, config: Increase thin provisioning threshold (2022-02-22 10:13:42 UTC)
Red Hat Issue Tracker RHV-44647 (2022-02-08 13:53:00 UTC)
Red Hat Knowledge Base (Solution) 130843 (2022-02-08 13:49:02 UTC)

Description Janos Bonic 2022-02-08 13:49:03 UTC
Description of problem:

The current defaults for thin provisioning (volume_utilization_percent and volume_utilization_chunk_mb) are too low for modern workloads (e.g. Kubernetes). Please update the VDSM default values. If possible, please also preallocate at least 5-10% of the disk to avoid VM pauses on first start.

Comment 2 Arik 2022-02-09 13:42:01 UTC
This goes against other ideas aimed at minimizing storage consumption (e.g., bz 2041352).
I think we need to distinguish between OCP on RHV and traditional virtualization workloads.
At one extreme, that of OCP on RHV, we don't expect high overcommit and the virtualization workloads are supposed to run a relatively high number of nested workloads.
At the other extreme, that of VDI solutions, the ratio of VMs to hosts is higher and the virtualization workloads are less IO intensive.

Nir, what do you think?
How about simplifying the tuning instead of changing the defaults?

Comment 3 Nir Soffer 2022-02-09 14:18:29 UTC
(In reply to Arik from comment #2)
> This goes against other ideas aimed to minimize storage consumption (e.g.,
> bz 2041352)
> I think we need to distinguish between OCP on RHV and traditional
> virtualization workloads
> At one extreme, that of OCP on RHV, we don't expect high overcommit and the
> virtualization workloads are supposed to run relatively high number of
> nested workloads
> At the other extreme, that of VDI solution, the ratio between VMs to hosts
> is higher and the virtualization workloads are less IO extensive
> 
> Nir, what do you think?
> How about simplifying the tuning instead of changing the defaults?

I agree that the defaults are too low. Extending a disk takes 2-6 seconds
in normal conditions, and we extend a disk only when it has 512 MiB free
space. Fast storage can easily write 1 GiB in one second, so a VM is likely
to pause for several seconds before the extend request is completed.

When we create a VM we don't know how the VM will access storage, but maybe
we need different defaults for different kinds of VMs.

Currently we recommend using preallocated disks for VMs that need the best
performance and reliability. Thin disks are best used for VMs with low
storage needs (e.g. the VDI use case).

The current tunables for thin block storage are in vdsm, and they are global.
A better way would be to have defaults per:
- system
- cluster
- vm
- disk

For example, if we have a DB VM, the DB disk(s) should use different tuning
compared with a mostly idle desktop VM that never writes to storage.

So the tunables should move to engine and be passed to vdsm when starting a VM.
Vdsm can use the provided values to monitor the disks.

Regarding pre-extending a disk when creating it - this is possible via the engine
API, and is already used in many example scripts, such as:
https://github.com/oVirt/python-ovirt-engine-sdk4/blob/76dc3e7405b9acfefb34147a63172d164d8d7016/examples/upload_disk.py#L230

This example allocates the exact size needed to write the disk to storage.

If users want to allocate additional space to avoid pausing right after
the VM starts, they can add additional size when creating the disks.

As a quick way to improve the situation, we can change the vdsm defaults to use
a 2 GiB chunk size instead of 1 GiB. This will mitigate most cases with
minimal additional storage space. But this change should also be done in
engine, which keeps the same default.

Comment 4 Nir Soffer 2022-02-09 14:38:39 UTC
Regarding bz 2041352 - the issue there is using 10% extra allocation when
the possible allocation is more like 1%. So that bug does contradict the
requirement for a more aggressive chunk size.

For example, we use the same chunk size (1 GiB) for a 5 GiB disk and a 500 GiB
disk. We could adapt the chunk size to the disk size, and/or to the current
allocation.

Ideally we could adapt the chunk size to the recent write rate, so when you
start writing 10 GiB of data, instead of extending 10 times and pausing many
times during the write, we can use bigger chunks and avoid most pauses.

This is possible by using an adaptive block threshold - instead of registering
the threshold at the point where we want to extend, register a threshold that
triggers an event earlier, so we can measure the current write rate. Based on
the write rate, we can adapt the next extend size.
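
To illustrate the idea, here is a minimal sketch of such an adaptive policy.
The names, constants and the 10-second target below are made up for
illustration; this is not vdsm's implementation:

    # Hypothetical sketch: choose the next extension size from the observed
    # write rate instead of using a fixed chunk.
    GiB = 1024**3

    MIN_CHUNK = 1 * GiB
    MAX_CHUNK = 10 * GiB
    TARGET_SECONDS = 10  # try to buy ~10 seconds of writing per extension

    def next_chunk_size(write_rate):
        """write_rate is the recent guest write rate in bytes per second."""
        wanted = int(write_rate * TARGET_SECONDS)
        return max(MIN_CHUNK, min(wanted, MAX_CHUNK))

    # A guest writing 300 MiB/s gets roughly 3 GiB per extension, so it
    # needs far fewer extensions (and pauses) than with fixed 1 GiB chunks.
    print(next_chunk_size(300 * 1024**2) / GiB)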

Comment 5 Arik 2022-02-09 14:41:16 UTC
> When we create a VM we don't know how the VM will access storage, but maybe we need different defaults for different kinds of VMs.

If by "kind of vms" you mean to the vm type (server/desktop/high-performance) then yes, it also crossed my mind - to configure higher tuning properties for high performance (assuming the master nodes are set as high performance vms)

> As quick way to improve the situation, we can change vdsm defaults to use 2 GiB chunk size instead of 1 GiB. This will mitigate most case, with  minimal additional storage space. But this change should be done also one engine, which also keeps the same default.

Yes but then we really need to reproduce this issue and see if this change is enough
If we go with bigger changes, then it conflicts with efficient storage consumption
If we go with smaller changes like this, they may not be enough
Janos, were you able to reproduce this issue?

Comment 6 Janos Bonic 2022-02-09 17:18:21 UTC
We have VM pauses on startup in our CI system for every installation, since the disk starts at 0. Other than that, OCS seems to do a pretty good job of pausing the VM under high load on the OSDs. I agree that being able to tune it to the workload would be ideal, but that may call for a larger code change. Bumping the values to match most use cases would be very welcome.

Comment 7 Nir Soffer 2022-02-09 18:56:20 UTC
(In reply to Janos Bonic from comment #6)
> We have VM pauses on startup in our CI system for every installation since
> the disk starts at 0.

What do you mean by "the disk starts at 0"? If the disk is a snapshot, it starts as
an empty 1 GiB logical volume. It will be extended to 2 GiB when the allocation
reaches 512 MiB.

If you want to allocate more than 1 GiB, you can use larger initial_size
when you create the snapshot. 

> Other than that, OCS seems to do a pretty good job
> pausing the VM under high load on the OSDs.

Do you mean Ceph storage? Do we run Ceph nodes in RHV VMs, using thin disk
on block storage?

> I agree that being able to tune
> it to the workload would be ideal, but that may call for a larger code
> change. Bumping the values to match most use cases would be very welcome.

We cannot optimize the defaults for one use case when that is unwanted for other
use cases. We can improve the defaults so they work for most use cases, but
they will never be the optimal values for a specific use case.

For best results you should optimize the specific system. For example,
"our CI" is a system you fully control, so you can use preallocated disks, or
a large initial size when provisioning this system.

If the issue is not "our CI" but all OCP on RHV installations, the best solution
is to fix this in the installer.

Using your own defaults is very easy - you install them in a drop-in file:

$ cat /etc/vdsm/vdsm.conf.d/99-ocp.conf
# Settings optimized for running OCP on RHV.

[irs]
# Size of extension chunk in megabytes.
# default value:
# 	volume_utilization_chunk_mb = 1024
volume_utilization_chunk_mb = 2048

And restart the vdsm service if you installed the file after vdsm was started.

Comment 10 Arik 2022-02-09 22:31:58 UTC
> If the issue is not "our CI" but all OCP on RHV installations, the best solution is to fix this in the installer

+1
I'd like to add that the assumption that Kubernetes represents modern workloads that run on traditional data centers is debatable (it's more of an exception). Unless we come up with a magic number that fits (almost) all scenarios, I would prefer to make the parameters easier to tune. I don't see a reason to rush this, though, because specifically for OCP on RHV the installer can be changed to provision VMs with preallocated disks, or users can be advised to tune it as explained in comment 7.

Comment 11 Michal Skrivanek 2022-02-10 10:35:58 UTC
It's not only Kubernetes use cases; we have had the same issue with the general "database" use case for ages. The reality is that nowadays most storage is very much capable of writing more than 512 MB in a few seconds, just as the default RAM size of 1 GB is not really usable for anything today.
We need to match the hardware capabilities in general.

Instead of fancy profiles, I think this RFE should just track an update of our defaults to keep the right balance between performance and over-allocation for current hardware.

Comment 12 Michal Skrivanek 2022-02-10 10:40:15 UTC
To keep this simple enough I'd like to focus only on a simple change and decide if we should
- just change the values to another predefined value for volume_utilization_chunk_mb
- change volume_utilization_percent as well
- and/or make them proportional to volume size

Comment 14 Arik 2022-02-10 11:08:42 UTC
(In reply to Michal Skrivanek from comment #11)
> it's not only Kubernetes usecases, we do know we have the same issue for
> general "database" usecase for ages. It really ids that nowadays most
> storage is very much capable of writing more than 512MB in few seconds. Same
> as the default RAM size of 1GB is not really usable for anything today.
> We need to match the hardware capabilities in general.
> 
> Instead of fancy profiles I think this RFE should track just an update of
> our defaults to keep the right balance between performance and
> over-allocation for current hw.

What blocks us from preallocating the disks for k8s and such database workloads?
Is there a real demand for having these workloads with thin-provisioning?

Comment 16 Michal Skrivanek 2022-02-10 16:14:00 UTC
(In reply to Arik from comment #14)
> (In reply to Michal Skrivanek from comment #11)
> > it's not only Kubernetes usecases, we do know we have the same issue for
> > general "database" usecase for ages. It really ids that nowadays most
> > storage is very much capable of writing more than 512MB in few seconds. Same
> > as the default RAM size of 1GB is not really usable for anything today.
> > We need to match the hardware capabilities in general.
> > 
> > Instead of fancy profiles I think this RFE should track just an update of
> > our defaults to keep the right balance between performance and
> > over-allocation for current hw.
> 
> What blocks us from preallocating the disks for k8s and such database
> workloads?
>
> Is there a real demand for having these workloads with thin-provisioning?

Storage overprovisioning is important; it's the whole reason thin provisioning even exists. I do believe it has high value, and people use it widely and will continue to do so.

Comment 17 Arik 2022-02-10 16:47:32 UTC
(In reply to Michal Skrivanek from comment #16)
> storage overprovisioning is important, it's the whole reason why thin
> provisioning even exists. I do believe it has high value and people widely
> use it and will continue to do so.

+1
Exactly, and the importance of over-provisioning is why I think we should be careful about changing the settings in a way that may lead to over-allocation.
My point is whether there really are users that have IO-intensive operations and expect to take advantage of thin provisioning, or whether it is rather just an incorrect configuration that the OCP on RHV installer applies.

Comment 18 Arik 2022-02-14 15:38:17 UTC
Discussed offline, we can go with minimal changes that won't necessarily fix any real-world issue but will also not (significantly) hinder over-commitment.

About making it configurable - that requires a separate RFE

Comment 19 Benny Zlotnik 2022-02-17 10:01:54 UTC
A previous bug for reference https://bugzilla.redhat.com/show_bug.cgi?id=1408594#c8

Comment 20 Nir Soffer 2022-02-21 11:07:50 UTC
Created attachment 1862381 [details]
Extract extend stats from vdsm log

Usage:

    python3 extend-stats.py < /var/log/vdsm/vdsm.log
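
The attached script itself is not inlined in this bug. As a rough idea, a
parser producing the same min/avg/max output could look like the sketch
below, assuming the "completed <Clock(total=...)>" log lines shown in
comment 24 (this is not the attached script):

    # Hypothetical sketch: extract the total extend time from vdsm.log lines
    # like "... Extend volume ... completed <Clock(total=2.99, wait=0.62, ...)>"
    # and print min/avg/max over all completed extends.
    import re
    import sys

    CLOCK = re.compile(r"completed <Clock\(total=([\d.]+)")

    totals = [
        float(m.group(1))
        for line in sys.stdin
        if (m := CLOCK.search(line))
    ]

    if totals:
        print("min={:.3f} avg={:.3f} max={:.3f}".format(
            min(totals), sum(totals) / len(totals), max(totals)))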

Comment 21 Nir Soffer 2022-02-21 11:11:27 UTC
I posted https://github.com/oVirt/vdsm/pull/80, improving extend
timing. With this change we log the time from receiving a block
threshold event until the new size of the disk is visible
to the guest.

I collected extend info for a VM with a 50 GiB disk, extending the disk
50 times.
https://github.com/oVirt/vdsm/pull/80#issuecomment-1046745262

From this log we can extract the total extend time using the
attached script (attachment 1862381 [details]):

$ python3 extend-stats.py <vdsm.log
min=2.270 avg=3.668 max=6.240

With this data we can adapt the defaults so we don't pause in
the common case, with a typical write rate on typical servers.

Comment 22 Nir Soffer 2022-02-22 10:17:30 UTC
I posted a pr with new defaults:
https://github.com/oVirt/vdsm/pull/82

Test results with the new defaults show that we can now cope with a
4x faster write rate before the VM starts to pause.

Before:

write rate  extends   pauses
----------------------------
 75 MiB/s        50        0
100 MiB/s        50        4
125 MiB/s        50        4
150 MiB/s        53       24

After:

write rate  extends   pauses
----------------------------
200 MiB/s        20        0
250 MiB/s        20        0
300 MiB/s        20        0
350 MiB/s        21        0
400 MiB/s        20        1
450 MiB/s        20        2
500 MiB/s        22        7
550 MiB/s        23        7

Comment 23 Nir Soffer 2022-02-22 10:28:47 UTC
Created attachment 1862576 [details]
Extend script for testing paused during extend flow

The attached script runs inside the guest and causes a 50 GiB disk to be fully
extended by a very busy guest. A minimal sketch of such a writer is shown after
the steps below.

To use the script:

1. Create VM with 50 GiB thin data disk on fast iSCSI/FC storage

2. If the disk in the guest is not /dev/sdb, update the script

    For example testing disk /dev/sdc

    PATH = "/dev/sdc"

3. Set the required write rate (RATE)

    For example, testing write rate of 500 MiB/s:

    RATE = 500 * 1024**2

4. Clear vdsm log

    # echo -n >/var/log/vdsm/vdsm.log

5. Run the script

    # python3 extend.py
    100.00% 50.00 GiB 340.81 seconds 150.23 MiB/s

6. Copy and analyze the vdsm log:

    # cp /var/log/vdsm/vdsm.log 150mbs.log
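
For reference, here is a minimal sketch of what such a rate-limited writer can
look like. This is not the attached extend.py (the maintained equivalent is
tests/storage/stress/thinp.py in the vdsm repository); names and constants are
illustrative:

    # Hypothetical sketch: write SIZE bytes of non-zero data to PATH,
    # throttled to about RATE bytes per second, and report the effective rate.
    import os
    import time

    PATH = "/dev/sdb"        # thin test disk inside the guest
    RATE = 150 * 1024**2     # target write rate: 150 MiB/s
    SIZE = 50 * 1024**3      # amount to write: 50 GiB
    CHUNK = 8 * 1024**2      # write in 8 MiB chunks

    buf = os.urandom(CHUNK)
    fd = os.open(PATH, os.O_WRONLY)
    start = time.monotonic()
    written = 0
    try:
        while written < SIZE:
            written += os.write(fd, buf)
            os.fsync(fd)  # push the data to storage after every chunk
            behind = written / RATE - (time.monotonic() - start)
            if behind > 0:
                time.sleep(behind)  # throttle to the requested rate
    finally:
        os.close(fd)

    elapsed = time.monotonic() - start
    print("{:.2f} GiB {:.2f} seconds {:.2f} MiB/s".format(
        written / 1024**3, elapsed, written / elapsed / 1024**2))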

Comment 24 Nir Soffer 2022-02-22 10:45:08 UTC
Created attachment 1862578 [details]
Logs from testing various configurations

The tarball includes logs from testing the old and new configurations:

$ tree .
.
├── 1024-50
│   ├── 100mbs.log
│   ├── 125mbs.log
│   ├── 150mbs.log
│   └── 75mbs.log
├── 2560-20
│   ├── 200mbs.log
│   ├── 250mbs.log
│   ├── 300mbs.log
│   ├── 350mbs.log
│   ├── 400mbs.log
│   ├── 450mbs.log
│   ├── 500mbs.log
│   └── 550mbs.log
└── extend-stats.py

2 directories, 13 files

The script extend-stats.py extracts extend stats from a vdsm log.

The directory 1024-50 contains logs from testing the old configuration:
[irs]
volume_utilization_chunk_mb = 1024
volume_utilization_percent = 50

The directory 2560-20 contains logs from testing the new configuration:
[irs]
volume_utilization_chunk_mb = 2560
volume_utilization_percent = 20
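
For reference, a small sketch of how these two values map to the extension
size and the free-space threshold. The relation below is inferred from the
512 MiB (old) and 2 GiB (new) figures quoted in this bug, not copied from
the vdsm source:

    # Illustration only (inferred, not vdsm code): the chunk is the amount
    # added on each extension, and the volume is extended when its free
    # space drops below chunk * (100 - utilization_percent) / 100.
    MiB = 1024**2

    def thin_params(chunk_mb, utilization_percent):
        chunk = chunk_mb * MiB
        threshold = chunk * (100 - utilization_percent) // 100
        return chunk // MiB, threshold // MiB

    print(thin_params(1024, 50))   # old: (1024, 512)  -> 1 GiB chunk, 512 MiB free
    print(thin_params(2560, 20))   # new: (2560, 2048) -> 2.5 GiB chunk, 2 GiB free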

Examples from the attached logs:

$ for n in */*.log; do echo $n; python3 extend-stats.py <$n; echo; done
1024-50/100mbs.log
min=2.320 avg=3.427 max=6.400

1024-50/125mbs.log
min=2.320 avg=3.569 max=5.960

1024-50/150mbs.log
min=2.350 avg=3.597 max=7.880

1024-50/75mbs.log
min=2.430 avg=3.599 max=6.320

2560-20/200mbs.log
min=2.660 avg=3.640 max=5.860

2560-20/250mbs.log
min=2.310 avg=3.341 max=4.800

2560-20/300mbs.log
min=2.500 avg=3.604 max=5.970

2560-20/350mbs.log
min=2.290 avg=3.664 max=6.970

2560-20/400mbs.log
min=2.600 avg=3.696 max=5.610

2560-20/450mbs.log
min=2.410 avg=3.598 max=6.040

2560-20/500mbs.log
min=2.720 avg=4.332 max=8.580

2560-20/550mbs.log
min=2.510 avg=4.199 max=8.350

In general, the total extend time is stable regardless of the write rate, but with a high
write rate the maximum extend time increases.


How is the time spent during an extend?

$ grep 'completed <Clock' 2560-20/400mbs.log | head -1
2022-02-22 10:18:27,125+0200 INFO  (mailbox-hsm/0) [virt.vm] (vmId='d0730833-98aa-4138-94d1-8497763807c7') Extend volume 3a69ed39-055a-4a30-bd74-b1f51a5ed5cc completed <Clock(total=2.99, wait=0.62, extend-volume=2.11, refresh-volume=0.25)> (thinp:567)

total=2.99 - total time from receiving the block threshold event until the new size is available to the guest
wait=0.62 - time from receiving the event until we start the extend attempt
extend-volume=2.11 - time from sending the extend request until we get the extend reply
refresh-volume=0.25 - time to refresh the volume so the new size is visible on the host

Comment 25 Nir Soffer 2022-02-22 10:48:34 UTC
How to get extend and pause stats from the logs:

Number of completed extends:

$ grep 'completed <Clock(' 2560-20/400mbs.log | wc -l
20

Number of pauses:
[nsoffer@host4 bug-2051997]$ grep 'onResume' 2560-20/400mbs.log | wc -l
1

We check the number of resumes since we get multiple pause events for every
pause, but only one resume event after a VM is resumed.

Comment 28 Nir Soffer 2022-03-24 22:36:16 UTC
This change improved the maximum write rate from 75 MiB/s to 350 MiB/s.
We have an upstream PR improving the maximum write rate to 610 MiB/s:
https://github.com/oVirt/vdsm/pull/103

See the upstream issue describing why and how this works:
https://github.com/oVirt/vdsm/issues/102

oVirt 4.5 will be 8 times better than oVirt 4.4, reducing the
chance of pausing VMs when writing quickly to fast storage.

Example stats from 100 extensions:

# ./extend-stats <vdsm.log

Total time
min=0.860 avg=2.001 max=3.270

Wait time
min=0.050 avg=1.005 max=1.990

Extend time
min=0.530 avg=0.819 max=2.080

Refresh time
min=0.150 avg=0.175 max=0.210

Note that the wait time (0.05-1.99 seconds) is completely unneeded. Eliminating
it will improve the write rate by a factor of 2. This is tracked upstream in
https://github.com/oVirt/vdsm/issues/85

Comment 30 Nir Soffer 2022-04-20 12:56:21 UTC
(In reply to Shir Fishbain from comment #29)

Comment 22 and comment 23 explain how to reproduce and test, but since
they were written we have better tools for testing, and the system was
improved to support a higher write rate (using mailbox events).

## Setup

1. In the guest, download the thinp.py tool from the vdsm stress tests:

   wget https://raw.githubusercontent.com/oVirt/vdsm/master/tests/storage/stress/thinp.py

2. On the host running the VM, download the extend-stats tool:

   wget https://raw.githubusercontent.com/oVirt/vdsm/master/contrib/extend-stats
   chmod +x extend-stats

## How to test write rate

1. Add a 50 GiB thin disk on block storage to a running VM
2. Clear vdsm logs on the host running the VM

   echo -n >/var/log/vdsm/vdsm.log

3. Run in the guest

   python3 thinp.py --rate XXX

This writes 50 GiB to the disk, triggering 20 extensions per run (50 GiB written in 2.5 GiB chunks).

4. Copy vdsm log for inspection

   cp /var/log/vdsm/vdsm.log 75m.log

5. Extract stats from vdsm log

   extend-stats <XXX.log

6. Check whether the VM paused during the test

If the VM paused, we will have multiple onIOError logs when the VM pauses
with ENOSPC, and a single onResume log when the VM is resumed after the
extension.

On the engine events page, we will see an error event about the VM pausing
because of no space, once for each pause.

7. Deactivate and remove the test disk

You can test only once with the same disk. Once the disk has been extended to
its maximum size (53 GiB), the VM will not pause when writing to it.

## Test variants

1. Multiple hosts - running VM on one host, the SPM is on the other host
2. Single host - running VM on the SPM host
3. Test disk is on the master storage domain
4. Test disk is on another storage domain

## Expected results

RHV 4.4:
- Writing at 75 MiB/s: the VM does not pause during the run
- Writing at 100 MiB/s: the VM pauses at least once during the test
- Writing at 150 MiB/s: the VM pauses many times during the test

RHV 4.4sp1:
- Writing at 600 MiB/s: the VM does not pause during the run
- Writing at 700 MiB/s: the VM pauses at least once during the test
- Writing at 1200 MiB/s: the VM pauses many times during the test

To find the limits of RHV 4.4sp1 you need to use fast storage. I tested using
local storage on the hosts, exposed as FC devices.

Please attach the output from the thinp.py tool, extend-stats, and the vdsm
logs from all runs.

Comment 31 sshmulev 2022-05-01 14:31:05 UTC
According to the Doc Text, "The minimum allocation size was increased from 1 GiB to 2.5 GiB".
I don't see this change; according to the logs and the REST API, the disk's actual size is still 1 GiB when choosing thin provisioning.

Comment 32 Nir Soffer 2022-05-02 10:56:53 UTC
(In reply to sshmulev from comment #31)
> According to the Doc text "The minimum allocation size was increased from 1
> GiB to 2.5G".
> I don't see this change, according to logs and the rest API the disk is
> still 1 GiB in the actual size when choosing provision.

It is not clear what "the disk" is. If you create a 1 GiB disk, we create a 1 GiB disk.

If you create a 10 GiB disk, the initial size of the disk is 2.5 GiB.
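
In other words, the initial allocation is the configured chunk size, capped by
the disk's virtual size. A hypothetical illustration (not vdsm code):

    # Illustration only: initial allocation of a thin disk on block storage.
    GiB = 1024**3
    CHUNK = 2560 * 1024**2   # volume_utilization_chunk_mb = 2560 (2.5 GiB)

    def initial_allocation(virtual_size):
        return min(virtual_size, CHUNK)

    print(initial_allocation(1 * GiB) / GiB)    # 1.0 -> a 1 GiB disk stays 1 GiB
    print(initial_allocation(10 * GiB) / GiB)   # 2.5 -> a 10 GiB disk starts at 2.5 GiB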

Comment 33 sshmulev 2022-05-02 16:22:12 UTC
Verified.
I couldn't make the VM pause even at the highest rate of 1200 MB/s.
I checked writing at this rate with an older version, 4.4.10-7, and could immediately reproduce the issue there: the VM paused as soon as it started writing, and several more times afterwards.

Version tested:
rhv-4.5.0-8
ovirt-engine-4.5.0.5-0.7
vdsm-4.50.0.13-1

Results:
With iSCSI, SPM host, disk on another SD

python3 thinp.py /dev/sda --rate 600
50.00 GiB, 121.64 s, 420.9 MiB/s   

# ./extend-stats < spm-host_600.log

Total time
min=1.300 avg=2.394 max=3.720

Wait time
min=0.110 avg=0.908 max=1.990

Extend time
min=0.590 avg=0.945 max=1.160

Refresh time
min=0.330 avg=0.540 max=1.200

The VM didn't pause at all.
---------------------------------------------------------
python3 thinp.py /dev/sda --rate 700
50.00 GiB, 108.56 s, 471.6 MiB/s  


./extend-stats < spm-host_700.log

Total time
min=1.330 avg=2.333 max=3.460

Wait time
min=0.170 avg=1.033 max=2.010

Extend time
min=0.560 avg=0.929 max=1.140

Refresh time
min=0.290 avg=0.369 max=0.760

The VM didn't pause at all.

---------------------------------------------------------

python3 thinp.py /dev/sda --rate 1200
50.00 GiB, 110.53 s, 463.2 MiB/s   

./extend-stats < spm-host_1200.log

Total time
min=1.180 avg=2.891 max=4.090

Wait time
min=0.210 avg=1.091 max=1.930

Extend time
min=0.570 avg=0.932 max=1.150

Refresh time
min=0.390 avg=0.864 max=1.650


#####################################################################
With FCP, non-SPM host, disk on another SD

# python3 thinp.py /dev/sda --rate 600
50.00 GiB, 106.01 s, 483.0 MiB/s  

# ./extend-stats < non-spm-host_600.log

Total time
min=1.700 avg=2.635 max=4.290

Wait time
min=0.130 avg=0.969 max=1.970

Extend time
min=0.560 avg=0.927 max=1.150

Refresh time
min=0.190 avg=0.737 max=1.820

---------------------------------------------------------

# python3 thinp.py /dev/sda --rate 700
50.00 GiB, 78.98 s, 648.3 MiB/s   

# ./extend-stats < non-spm-host_700.log

Total time
min=0.940 avg=1.899 max=2.430

Wait time
min=0.170 avg=0.762 max=1.090

Extend time
min=0.570 avg=0.931 max=1.140

Refresh time
min=0.180 avg=0.208 max=0.240

---------------------------------------------------------

# python3 thinp.py /dev/sda --rate 1200
50.00 GiB, 103.55 s, 494.5 MiB/s  

# ./extend-stats < non-spm-host_1200.log

Total time
min=1.870 avg=2.514 max=3.380

Wait time
min=0.060 avg=0.935 max=1.870

Extend time
min=0.580 avg=0.952 max=1.590

Refresh time
min=0.210 avg=0.627 max=1.230

------------------------------------------------------------
 python3 thinp.py /dev/sda --rate 2500
50.00 GiB, 78.87 s, 649.2 MiB/s

# ./extend-stats < non-spm-host_2500.log

Total time
min=1.530 avg=2.442 max=3.030

Wait time
min=0.660 avg=1.344 max=1.720

Extend time
min=0.550 avg=0.892 max=1.130

Refresh time
min=0.190 avg=0.205 max=0.280


Comment 35 Nir Soffer 2022-05-02 17:14:44 UTC
(In reply to sshmulev from comment #33)
> Verified.
> I couldn't make the VM pause even at the highest rate of 1200 MB/s.

The actual output from the command shows that in most tests you could not write
more than 480 MiB/s.

> Checked writing with this rate with an older version of 4.4.10-7 and I could
> immediately reproduce it there, just when it started writing the VM paused,
> and several times more.
> 
> Version tested:
> rhv-4.5.0-8
> ovirt-engine-4.5.0.5-0.7
> vdsm-4.50.0.13-1
> 
> results:
> With ISCSI, spm-host, disk on another SD
> 
> python3 thinp.py /dev/sda --rate 600
> 50.00 GiB, 121.64 s, 420.9 MiB/s   
> 
> # ./extend-stats < spm-host_600.log
> 
> Total time
> min=1.300 avg=2.394 max=3.720
> 
> Wait time
> min=0.110 avg=0.908 max=1.990
> 
> Extend time
> min=0.590 avg=0.945 max=1.160
> 
> Refresh time
> min=0.330 avg=0.540 max=1.200
> 
> The vm didn't pause at all.
> ---------------------------------------------------------
> python3 thinp.py /dev/sda --rate 700
> 50.00 GiB, 108.56 s, 471.6 MiB/s  
> 
> 
> ./extend-stats < spm-host_700.log
> 
> Total time
> min=1.330 avg=2.333 max=3.460
> 
> Wait time
> min=0.170 avg=1.033 max=2.010
> 
> Extend time
> min=0.560 avg=0.929 max=1.140
> 
> Refresh time
> min=0.290 avg=0.369 max=0.760
> 
> The VM didn't pause at all.
> 
> ---------------------------------------------------------
> 
> python3 thinp.py /dev/sda --rate 1200
> 50.00 GiB, 110.53 s, 463.2 MiB/s   
> 
> ./extend-stats < spm-host_1200.log
> 
> Total time
> min=1.180 avg=2.891 max=4.090
> 
> Wait time
> min=0.210 avg=1.091 max=1.930
> 
> Extend time
> min=0.570 avg=0.932 max=1.150
> 
> Refresh time
> min=0.390 avg=0.864 max=1.650
> 
> 
> #####################################################################
> With FCP, non-spm-host, disk on another SD
> 
> # python3 thinp.py /dev/sda --rate 600
> 50.00 GiB, 106.01 s, 483.0 MiB/s  
> 
> # ./extend-stats < non-spm-host_600.log
> 
> Total time
> min=1.700 avg=2.635 max=4.290
> 
> Wait time
> min=0.130 avg=0.969 max=1.970
> 
> Extend time
> min=0.560 avg=0.927 max=1.150
> 
> Refresh time
> min=0.190 avg=0.737 max=1.820
> 
> ---------------------------------------------------------
> 
> # python3 thinp.py /dev/sda --rate 700
> 50.00 GiB, 78.98 s, 648.3 MiB/s   
>
> # ./extend-stats < non-spm-host_700.log
> 
> Total time
> min=0.940 avg=1.899 max=2.430
> 
> Wait time
> min=0.170 avg=0.762 max=1.090
> 
> Extend time
> min=0.570 avg=0.931 max=1.140
> 
> Refresh time
> min=0.180 avg=0.208 max=0.240
> 
> ---------------------------------------------------------
> 
> # python3 thinp.py /dev/sda --rate 1200
> 50.00 GiB, 103.55 s, 494.5 MiB/s  
> 
> # ./extend-stats < non-spm-host_1200.log
> 
> Total time
> min=1.870 avg=2.514 max=3.380
> 
> Wait time
> min=0.060 avg=0.935 max=1.870
> 
> Extend time
> min=0.580 avg=0.952 max=1.590
> 
> Refresh time
> min=0.210 avg=0.627 max=1.230
> 
> ------------------------------------------------------------
>  python3 thinp.py /dev/sda --rate 2500
> 50.00 GiB, 78.87 s, 649.2 MiB/s
> 
> # ./extend-stats < non-spm-host_2500.log
> 
> Total time
> min=1.530 avg=2.442 max=3.030
> 
> Wait time
> min=0.660 avg=1.344 max=1.720
> 
> Extend time
> min=0.550 avg=0.892 max=1.130
> 
> Refresh time
> min=0.190 avg=0.205 max=0.280
> 
> 50.00 GiB, 78.87 s, 649.2 MiB/s

Looks good!

Comment 36 Sandro Bonazzola 2022-05-03 06:46:19 UTC
This bugzilla is included in oVirt 4.5.0 release, published on April 20th 2022.

Since the problem described in this bug report should be resolved in oVirt 4.5.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

