Bug 1201482 - Storage QoS is not applying on a Live VM/disk
Summary: Storage QoS is not applying on a Live VM/disk
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: mom
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ovirt-4.0.4
Target Release: 4.0.1
Assignee: Andrej Krejcir
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard:
Duplicates: 1149523
Depends On:
Blocks: 1324919 1328731 1346754
 
Reported: 2015-03-12 19:03 UTC by nijin ashok
Modified: 2019-12-16 04:44 UTC
CC List: 27 users

Fixed In Version: mom-0.5.6-1
Doc Type: Enhancement
Doc Text:
To properly support disk hot plug and disk QoS changes for a running virtual machine, MoM has been updated to read the I/O QoS settings from the VM metadata and set the corresponding ioTune limits on the running virtual machine's disks.
Clone Of:
Clones: 1324919 1328731
Environment:
Last Closed: 2016-10-17 13:06:55 UTC
oVirt Team: SLA
Target Upstream Version:
Embargoed:


Attachments
vdsm server and engine logs (1.42 MB, application/x-gzip)
2016-07-28 18:15 UTC, Kevin Alon Goldblatt
vdsm, mom, server and engine logs (1.57 MB, application/x-gzip)
2016-08-02 11:31 UTC, Kevin Alon Goldblatt


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:1696 0 normal SHIPPED_LIVE mom bug fix and enhancement update for RHV 4.0 2016-08-24 00:36:29 UTC
oVirt gerrit 52743 0 master MERGED core: Enable live QoS change for cpu and IO 2020-09-15 08:49:28 UTC
oVirt gerrit 52746 0 master MERGED Apply storage QoS on running VM 2020-09-15 08:49:27 UTC
oVirt gerrit 52748 0 master MERGED Expose IO limits to policies 2020-09-15 08:49:27 UTC
oVirt gerrit 53438 0 master MERGED core: Refactor - created helper class for IoTune 2020-09-15 08:49:27 UTC
oVirt gerrit 54208 0 master MERGED Add MoM scripts to change storage QoS on running VM 2020-09-15 08:49:26 UTC
oVirt gerrit 55056 0 master MERGED core: Dao functions take lists of ids 2020-09-15 08:49:26 UTC
oVirt gerrit 55820 0 ovirt-3.6 MERGED Apply storage QoS on running VM 2020-09-15 08:49:26 UTC
oVirt gerrit 55821 0 ovirt-3.6 MERGED Add MoM scripts to change storage QoS on running VM 2020-09-15 08:49:26 UTC
oVirt gerrit 55834 0 master MERGED Bump mom dependency to version 0.5.3 2020-09-15 08:49:26 UTC
oVirt gerrit 55867 0 ovirt-3.6 MERGED Bump mom dependency to version 0.5.3 2020-09-15 08:49:25 UTC
oVirt gerrit 55899 0 ovirt-engine-3.6 MERGED core: Refactor - created helper class for IoTune 2020-09-15 08:49:25 UTC
oVirt gerrit 55901 0 ovirt-engine-3.6 MERGED core: Dao functions take lists of ids 2020-09-15 08:49:24 UTC
oVirt gerrit 55930 0 ovirt-engine-3.6.5 ABANDONED core: Refactor - created helper class for IoTune 2020-09-15 08:49:24 UTC
oVirt gerrit 55931 0 ovirt-engine-3.6.5 ABANDONED core: Dao functions take lists of ids 2020-09-15 08:49:25 UTC
oVirt gerrit 55943 0 ovirt-engine-3.6.5 ABANDONED core: Enable live QoS change for cpu and IO 2020-09-15 08:49:25 UTC
oVirt gerrit 55944 0 ovirt-engine-3.6 MERGED core: Enable live QoS change for cpu and IO 2020-09-15 08:49:24 UTC
oVirt gerrit 61947 0 master MERGED Do not ignore ioTune when there is not policy yet 2020-09-15 08:49:24 UTC

Description nijin ashok 2015-03-12 19:03:02 UTC
Description of problem:

A storage QoS disk profile is not applied to a live VM disk. The virtual machine must be restarted from the Manager for a new QoS/disk profile to take effect on the disk.

Version-Release number of selected component (if applicable):
RHEV 3.5

How reproducible:
100%

Steps to Reproduce:
1. Disk profile with unlimited QoS

[root@vm85 ~]# dd bs=1M count=100 if=/dev/zero of=/case/test conv=fdatasync
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 1.1826 s, 88.7 MB/s

2. Changed the disk profile to one with a QoS write limit of 10 MB/s:

[root@vm85 ~]# dd bs=1M count=100 if=/dev/zero of=/case/test conv=fdatasync
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 1.44883 s, 72.4 MB/s

3. Powered off and started the VM from the RHEV-M manager. Now the QoS is correctly applied:

[root@vm85 ~]# dd bs=1M count=100 if=/dev/zero of=/case/test conv=fdatasync
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 10.041 s, 10.4 MB/s

Actual results:

QoS is not applied to a live disk.

Expected results:

QoS should be applied to a live VM/disk.

Additional info:
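One way to cross-check on the hypervisor whether the limit actually reached libvirt (the VM name and the target device "vda" below are placeholders):

# list the VM's disk target names, then query the live ioTune limits
# (on a RHEV host, virsh may prompt for the SASL credentials set up by vdsm)
virsh domblklist <vm-name>
virsh blkdeviotune <vm-name> vda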

Comment 1 Allon Mureinik 2015-03-14 09:03:17 UTC
Gilad, is there a way to apply QoS dynamically to a running VM?
If not, this should be blocked (CDA + GUI).
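For reference, libvirt itself can change these limits on a running domain, so the question is mainly whether the engine/VDSM/MoM stack drives it. A minimal sketch, with placeholder domain and device names and a 10 MB/s write limit:

# apply a write limit of 10485760 bytes/sec to the running domain only
virsh blkdeviotune <vm-name> vda --write-bytes-sec 10485760 --live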

Comment 4 Roy Golan 2015-03-15 07:52:47 UTC
Gilad was this planned to be a part of updateVmPolicy?

Comment 5 Gilad Chaplik 2015-03-15 10:19:42 UTC
(In reply to Roy Golan from comment #4)
> Gilad was this planned to be a part of updateVmPolicy?

Yes. Martin, did VDSM support make it to 3.5 as well?

Comment 6 nijin ashok 2015-03-23 12:45:56 UTC
Also, the read QoS is not working even after taking the VM/disk offline.

I configured a QoS of 10 MB/s for read operations, then powered off and started the VM from RHEV-M, but the read rate is still not limited.

# dd if=/case/test of=/dev/null bs=1M
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.0324113 s, 3.2 GB/s

Comment 7 Doron Fediuck 2015-04-16 11:40:00 UTC
(In reply to nijin ashok from comment #6)
> Also, the read QoS is not working even after taking the VM/disk offline.
> 
> I configured a QoS of 10 MB/s for read operations, then powered off and
> started the VM from RHEV-M, but the read rate is still not limited.
> 
> # dd if=/case/test of=/dev/null bs=1M
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB) copied, 0.0324113 s, 3.2 GB/s

This may be a libvirt issue that deserves a separate BZ.
Can you verify that the limits are present in libvirt's domain XML?
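For example (VM name is a placeholder; a 10 MB/s write limit would show up as a write_bytes_sec value of 10485760 inside the disk's <iotune> element):

# dump the live domain XML and look for per-disk <iotune> elements
virsh dumpxml <vm-name> | grep -A 6 '<iotune>'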

Comment 9 Doron Fediuck 2015-05-17 08:11:23 UTC
*** Bug 1149523 has been marked as a duplicate of this bug. ***

Comment 11 Roy Golan 2015-07-19 08:25:59 UTC
We didn't hook the disk commands into the update-policy flow;
we only call the update policy command when the VM moves to UP.

We should observe all relevant VM updates (update CPU, update disk, update NIC) and fire the updateVmPolicy command.

Comment 14 Roy Golan 2015-07-20 10:29:20 UTC
Just to make sure: is the cluster compatibility version for this VM 3.5? Otherwise the ioTune parameters will not be sent along with the VM creation info.

Comment 15 nijin ashok 2015-07-22 07:15:22 UTC
(In reply to Roy Golan from comment #14)
> Just to make sure: is the cluster compatibility version for this VM 3.5?
> Otherwise the ioTune parameters will not be sent along with the VM creation
> info.

The compatibility version is 3.5

Comment 16 Roy Golan 2015-07-22 08:22:10 UTC
I need the VDSM log from the time this VM was created. We should see "ioTune" in the specParams of the disk device. If it's not there, then the engine didn't send it, and it is either a config issue or a bug.
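A quick way to check this on the host (standard VDSM log location assumed):

# look for the ioTune spec params sent by the engine at VM creation time
grep -i 'iotune' /var/log/vdsm/vdsm.log*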

Comment 17 Roy Golan 2015-07-30 06:36:59 UTC
any news?

Comment 18 nijin ashok 2015-08-02 07:00:46 UTC
(In reply to Roy Golan from comment #16)
> I need the VDSM log from the time this VM was created. We should see
> "ioTune" in the specParams of the disk device. If it's not there, then the
> engine didn't send it, and it is either a config issue or a bug.

As the bug was reproduced on my end, I am attaching the VDSM and engine logs from my test environment.

Disk profile with unlimited QoS

# dd bs=1M count=100 if=/dev/zero of=/case/test conv=fdatasync
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 1.51712 s, 69.1 MB/s

Disk profile with limited QoS (10 MB/s)

# dd bs=1M count=100 if=/dev/zero of=/case/test conv=fdatasync
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 1.17152 s, 89.5 MB/s


After VM shutdown

# dd bs=1M count=100 if=/dev/zero of=/case/test conv=fdatasync
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 13.9455 s, 7.5 MB/s

The read rate is the same after the VM shutdown too (the read limit is still not applied):

# dd if=/case/test of=/dev/null bs=1M
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.0231245 s, 4.5 GB/s

Comment 20 Roy Golan 2015-08-11 11:24:42 UTC
The engine log shows that we do send the ioTune directive. It's now the job of MoM to instruct libvirt. Please attach a log collector archive so we can see the MoM log as well.

Thanks
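If running the full log collector is not practical, the MoM log can also be pulled from the hypervisor directly; the path below assumes the MoM instance embedded in VDSM:

# MoM's log normally sits next to vdsm.log on the host
grep -i 'iotune' /var/log/vdsm/mom.log*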

Comment 24 Yaniv Kaul 2015-09-21 07:42:33 UTC
Aharon - disregard previous comment, I see that GSS provided logs already.

Comment 26 Moran Goldboim 2016-02-10 11:57:37 UTC
Due to the complexity of the fix, we have decided to include it for 3.6 zstream.

Comment 27 Paul Cuzner 2016-02-18 01:18:30 UTC
I'm seeing similar issues using rhevm 3.6.3 beta. When the base (default) disk profile has a limiting QoS, I can see that the qemu command is invoked with the appropriate parameters (e.g. aio=threads,iops=200), which is good.

However, I am currently unable to actually change the disk profile of a vdisk. The UI allows me to update the value and apply the change. It doesn't report any error, but when I look again at the vdisk the original disk profile is still in place. This is the same whether the VM is up or down.

We have QoS as one of the features to help with the hyperconverged use case, providing a mechanism to limit the potential of noisy neighbours taking over the glusterfs brick(s).
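For reference, the qemu-side check described above can be scripted roughly like this (qemu-kvm process name assumed):

# show any throttling parameters on the running qemu command lines
ps -C qemu-kvm -o cmd= -ww | tr ',' '\n' | grep -E 'iops|bps'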

Comment 28 Andrej Krejcir 2016-02-22 09:36:24 UTC
That is bug 1297734.

Comment 29 Mike McCune 2016-03-28 22:14:36 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation; please contact mmccune with any questions.

Comment 31 Martin Sivák 2016-04-20 07:51:44 UTC
Moving this bug to mom component. I will create a clone for ovirt-engine.

Comment 32 Elad 2016-05-11 08:48:23 UTC
Martin, does the fix here include the engine fix? Should I test it through the engine, or should it be tested the same way as in https://bugzilla.redhat.com/show_bug.cgi?id=1324919#c9 ?

Comment 33 Kevin Alon Goldblatt 2016-06-08 12:30:27 UTC
Tested with the following code:
------------------------------------------
rhevm-4.0.0.2-0.1.el7ev.noarch
vdsm-4.18.1-11.gita92976e.el7ev.x86_64


Tested with the following scenario:
1. Created a VM with disks and wrote to the disk with 'dd' as described in the description
2. Added a new disk profile via the Data Center tab
3. Attached this profile to the storage domain via the Storage tab
4. Via the VM tab, selected a disk from the VM's Disks sub-tab and pressed Edit; however, the disk profile field is greyed out.

Please advise whether this is a bug or whether there is some other new method to edit the disk profile in the Webadmin. This option was working in 3.6.x.

Comment 34 Roman Mohr 2016-06-23 14:32:08 UTC
(In reply to Kevin Alon Goldblatt from comment #33)
> Tested with the following code:
> ------------------------------------------
> rhevm-4.0.0.2-0.1.el7ev.noarch
> vdsm-4.18.1-11.gita92976e.el7ev.x86_64
> 
> 
> Tested with the following scenario:
> 1. Created a VM with disks and wrote to the disk with 'dd' as described in
> the description
> 2. Added a new disk profile via the Data Center tab
> 3. Attached this profile to the storage domain via the Storage tab
> 4. Via the VM tab, selected a disk from the VM's Disks sub-tab and pressed
> Edit; however, the disk profile field is greyed out.
> 
Interesting, I have seen that only on the master branch, so there was probably a backport that did that to the UI. It should be changeable from there.

Anyway, as long as https://bugzilla.redhat.com/show_bug.cgi?id=1328731 is not fixed, nothing will be propagated to MoM when you change the disk profile. What you can do is change the quota values or change the quota itself in the cluster; this will be propagated.

> Please advise whether this is a bug or whether there is some other new
> method to edit the disk profile in the Webadmin. This option was working in
> 3.6.x.

Comment 35 Roman Mohr 2016-06-23 14:41:25 UTC
> 4. Via the VM tab, selected a disk from the VM's Disks sub-tab and pressed
> Edit; however, the disk profile field is greyed out.

Filed https://bugzilla.redhat.com/show_bug.cgi?id=1349498 so that this does not get lost. Should be fixed.

Comment 36 Kevin Alon Goldblatt 2016-07-28 18:07:02 UTC
Tested with the following code:
-------------------------------------
rhevm-4.0.2-0.1.rc.el7ev.noarch
vdsm-4.18.8-1.el7ev.x86_64

Verified with the following scenario:
-------------------------------------
Steps to reproduce:
1. Created a VM with disks, started the VM, and wrote to the disk with 'dd' as described in the description:
dd bs=1M count=100 if=/dev/zero of=/100Ma conv=fdatasync
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 1.68398 s, 62.3 MB/s

2. Added a new disk profile via the Data Center tab

3. Attached this profile to the storage domain via the Storage tab

4. Via the VM tab, selected a disk from the VM's Disks sub-tab, pressed Edit, and changed the profile to the newly created profile with a 10 MB/s write limit

5. Wrote to the disk again, but the limit is NOT APPLIED:
dd bs=1M count=100 if=/dev/zero of=/100Mb conv=fdatasync
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 2.09324 s, 50.1 MB/s


6. After shutting the VM down and starting it again, ran the write operation again. Now it works fine:
dd bs=1M count=100 if=/dev/zero of=/100Mf conv=fdatasync
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 10.1281 s, 10.4 MB/s


MOVING TO ASSIGNED!

Comment 37 Kevin Alon Goldblatt 2016-07-28 18:15:09 UTC
Created attachment 1185247 [details]
vdsm server and engine logs

Comment 38 Roman Mohr 2016-07-29 04:07:47 UTC
Hi Kevin,

Some questions:

1) Could you also attach mom.log? It is the one which finally applies QoS changes.
2) How long was the VM running? The VM has to be in "Running" state when you apply QoS to immediately see an effect. If not, the engine will wait until the VM changes from starting to running to apply the changes. This can take a little bit of time.

Comment 39 Martin Sivák 2016-08-02 09:19:46 UTC
Kevin, can you please attach the virsh xml dump of the VM too? Just to see what parts are set and what parts are not.

Comment 40 Kevin Alon Goldblatt 2016-08-02 11:29:42 UTC
(In reply to Roman Mohr from comment #38)
> Hi Kevin,
> 
> Some questions:
> 
> 1) Could you also attach mom.log? It is the one which finally applies QoS
> changes.

Adding mom log

> 2) How long was the VM running? The VM has to be in "Running" state when you
> apply QoS to immediately see an effect. If not, the engine will wait until
> the VM changes from starting to running to apply the changes. This can take
> a little bit of time.

The VM was up and running for at least 10 minutes beforehand, as I did the previous write before adding the disk profile to the DC and the domain, and then changing the disk profile on the VM -> Disks tab.

Comment 41 Kevin Alon Goldblatt 2016-08-02 11:31:46 UTC
Created attachment 1186761 [details]
vdsm, mom, server and engine logs

Adding logs with mom log

Comment 44 Roman Mohr 2016-08-02 22:46:29 UTC
(In reply to Martin Sivák from comment #39)
> Kevin, can you please attach the virsh xml dump of the VM too? Just to see
> what parts are set and what parts are not.

Martin and Kevin, I replayed the scenario and it seems like the engine is sending the correct metadata to VDSM, but from there on it is not picked up or applied by MoM. I did not see any strange log entries so far.

Martin, could you confirm that the engine is sending it?

Comment 45 Martin Sivák 2016-08-03 09:11:13 UTC
I do not see any updateVmPolicy call in the VDSM log (the log starts at about 20:00). I do see multiple calls to UpdateVmPolicyVDSCommand on the engine side, but none after 20:00. Both logs end at about 21:00.

The attached MOM log is useless, because it does not have debug level enabled.

I can't tell which VM or which disk was used from the test report either.
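To narrow this down on the next run, a sketch of the checks and the logging change, assuming default log locations and the VDSM-embedded MoM configuration (exact config path and keys may differ between versions):

# engine side: was the policy update issued?
grep UpdateVmPolicyVDSCommand /var/log/ovirt-engine/engine.log

# host side: did VDSM receive it?
grep -i updatevmpolicy /var/log/vdsm/vdsm.log*

# raise MoM verbosity before re-testing, then restart VDSM
# (edit /etc/vdsm/mom.conf, section [logging]: set "verbosity: debug")
systemctl restart vdsmd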

Comment 46 Roy Golan 2016-08-03 10:21:34 UTC
(In reply to Martin Sivák from comment #45)
> I do not see any updateVmPolicy call in the VDSM log (the log starts at
> about 20:00). I do see multiple calls to UpdateVmPolicyVDSCommand on the
> engine side, but none after 20:00. Both logs end at about 21:00.
> 

I too see the calls from the engine, and the calls complete without an error, which means VDSM got them.

Kevin, are you sure the logs are from camel-vdsb?

> The attached MOM log is useless, because it does not have debug level
> enabled.
> 
> I can't tell which VM or which disk was used from the test report either.

Comment 52 Aharon Canan 2016-08-21 08:22:32 UTC
Can't be moved to ON_QA as we do not have a build yet.

Please correct the versions.

Comment 53 Yaniv Lavi 2016-08-21 12:02:06 UTC
Should this be moved to MODIFIED?

Comment 57 errata-xmlrpc 2016-08-23 21:09:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1696.html

Comment 58 Martin Sivák 2016-08-24 12:01:32 UTC
There is still one issue pending.

Comment 59 Kevin Alon Goldblatt 2016-09-07 13:25:37 UTC
Tested with the following code:
----------------------------------------
rhevm-4.0.4-0.1.el7ev.noarch
vdsm-4.18.12-1.el7ev.x86_64

Tested with the following scenario:

Steps to Reproduce:
1. Created a VM with disks, started the VM and wrote to the disk with 'dd' as described in the description
dd bs=1M count=100 if=/dev/zero of=/100Ma2 conv=fdatasync
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 1.64193 s, 63.9 MB/s


2. Added a new disk profile via the Data Center tab

3. Attached this profile to the storage domain via the Storage tab

4. Via the VM tab, selected a disk from the VM's Disks sub-tab, pressed Edit, and changed the profile to the newly created profile with a 10 MB/s write limit

5. Wrote to the disk again and this time the limit IS applied:
 dd bs=1M count=100 if=/dev/zero of=/100Ma5 conv=fdatasync
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 10.3228 s, 10.2 MB/s


The new QoS profile with a 10 MB/s write limit was successfully applied.


Actual results:
The new QoS profile with a 10 MB/s write limit was successfully applied.

Expected results:



Moving to VERIFIED!

