Bug 1181665

Summary: [RFE][scale] Use events and not polling to detect disk usage [using the improvement from platform bug 1181659]
Product: [oVirt] vdsm Reporter: Francesco Romani <fromani>
Component: RFEs Assignee: Francesco Romani <fromani>
Status: CLOSED CURRENTRELEASE QA Contact: guy chen <guchen>
Severity: high Docs Contact:
Priority: high    
Version: --- CC: bazulay, bgraveno, bugs, eberman, fromani, lpeer, lsurette, mgoldboi, michal.skrivanek, mtessun, srevivo, ykaul
Target Milestone: ovirt-4.2.0 Keywords: FutureFeature, Performance
Target Release: 4.20.9 Flags: rule-engine: ovirt-4.2+
rule-engine: blocker+
sherold: Triaged+
mtessun: planning_ack+
michal.skrivanek: devel_ack+
eberman: testing_ack+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Enhancement
Doc Text:
VDSM uses the block threshold event available in libvirt 3.2.0 to obtain information about the allocation of chunked block drives, which improves the efficiency of the thin provisioning implementation. This enables VDSM to consume fewer system resources.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-02-12 10:10:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1181648, 1181659    
Bug Blocks:    

Description Francesco Romani 2015-01-13 14:48:52 UTC
Description of problem:
RHEV makes heavy use of thin-provisioned disks. VDSM's support for them includes
monitoring their usage and transparently resizing them, without the VM noticing.

This feature is built on polling disk usage, because there is no other means
to detect the disk usage, and thus when an extension is needed.

The polling may be very frequent, and it is among the biggest sources of load on libvirt, if not the single biggest one.

To improve scalability and resource usage in general, we need an event to notify when disk usage exceeds a threshold. This will allow the feature to work
with much less system load.

The existing polling has to be partially kept as a fallback or recovery option.
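The mechanism described above (arm a threshold, react to a one-shot event by extending the disk, keep polling as a fallback) can be sketched as follows. This is a minimal Python model under stated assumptions, not VDSM's actual code: `Drive`, `DriveMonitor`, `on_threshold_event`, `guest_write` and the chunk/watermark sizes are all hypothetical, and the hypervisor is simulated instead of calling libvirt.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical sizes, chosen only for illustration (not VDSM's settings).
GIB = 1024 ** 3
CHUNK_SIZE = 1 * GIB            # grow a thin volume by this much per extension
FREE_WATERMARK = GIB // 2       # extend when less than this much space is left


@dataclass
class Drive:
    name: str
    physical: int                     # current size of the thin volume, bytes
    allocation: int = 0               # highest offset written by the guest, bytes
    threshold: Optional[int] = None   # armed threshold; None when not armed


class DriveMonitor:
    """Hypothetical event-driven monitor with a polling fallback."""

    def __init__(self, extend: Callable[[Drive], None]) -> None:
        self._extend = extend
        self.drives: Dict[str, Drive] = {}

    def add(self, drive: Drive) -> None:
        self.drives[drive.name] = drive
        self._arm(drive)

    def _arm(self, drive: Drive) -> None:
        # Stand-in for asking the hypervisor to notify us at this offset
        # (in libvirt terms, roughly virDomainSetBlockThreshold).
        drive.threshold = max(0, drive.physical - FREE_WATERMARK)

    def on_threshold_event(self, name: str) -> None:
        # Handler for the threshold event. Modeled as one-shot: the
        # threshold is cleared when it fires and re-armed after extending.
        drive = self.drives[name]
        drive.threshold = None
        self._extend(drive)
        self._arm(drive)

    def poll(self) -> List[str]:
        # Fallback/recovery path: inspect every drive directly, e.g. when
        # events may have been missed. Returns the names that were extended.
        extended = []
        for drive in self.drives.values():
            if drive.physical - drive.allocation < FREE_WATERMARK:
                self._extend(drive)
                self._arm(drive)
                extended.append(drive.name)
        return extended


def extend(drive: Drive) -> None:
    # Simulated storage extension: the volume grows by one chunk.
    drive.physical += CHUNK_SIZE


def guest_write(monitor: DriveMonitor, name: str, new_allocation: int) -> None:
    # Simulated guest I/O: crossing an armed threshold delivers the event.
    drive = monitor.drives[name]
    drive.allocation = new_allocation
    if drive.threshold is not None and drive.allocation > drive.threshold:
        monitor.on_threshold_event(name)


monitor = DriveMonitor(extend)
monitor.add(Drive("vda", physical=2 * GIB))
guest_write(monitor, "vda", 1700 * 1024 ** 2)   # past the 1.5 GiB threshold
print(monitor.drives["vda"].physical // GIB)      # prints 3
```

In this model the monitor only does work for a drive when its threshold is crossed, while `poll()` keeps the old behavior available as a safety net, as the description requires.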

This bug tracks the implementation of this feature in RHEV.


Version-Release number of selected component (if applicable):
4.17.0

Comment 1 Michal Skrivanek 2015-01-14 08:30:38 UTC
tentatively planned for 3.6 depending on support in QEMU and libvirt

Comment 2 Michal Skrivanek 2015-05-25 11:31:46 UTC
libvirt side in progress, https://www.redhat.com/archives/libvir-list/2015-May/msg00580.html

this may be a late delivery

Comment 3 Francesco Romani 2015-06-23 14:42:16 UTC
As per the last update from the libvirt developers, support will most likely slip to RHEL 7.3, so we cannot implement this yet.

Comment 4 Michal Skrivanek 2015-07-02 06:04:29 UTC
as per last comment moving to 4.0 due to platform dependency

Comment 6 Yaniv Kaul 2016-03-14 11:17:47 UTC
Moving to 4.1, as platform bug 1181659 is not yet approved for 7.3.

Comment 11 Red Hat Bugzilla Rules Engine 2016-12-27 16:38:13 UTC
This request has been proposed for two releases. This is invalid flag usage. The ovirt-future release flag has been cleared. If you wish to change the release flag, you must clear one release flag and then set the other release flag to ?.

Comment 13 Francesco Romani 2017-02-08 15:08:40 UTC
(eventually) moved to NEW because I can't work on this until we have the libvirt support available.

Comment 18 Yaniv Kaul 2017-10-15 08:21:51 UTC
Moving back to ASSIGNED, as attached patch was abandoned.

Comment 19 Michal Skrivanek 2017-11-20 15:25:51 UTC
design: https://github.com/oVirt/vdsm/blob/master/doc/thin-provisioning.md

Comment 20 Michal Skrivanek 2017-11-20 15:26:10 UTC
implementation: https://gerrit.ovirt.org/#/q/project:vdsm+branch:master+topic:drivemonitor_event

Comment 21 Michal Skrivanek 2017-11-24 15:10:56 UTC
this is now completed and enabled by default

Comment 22 guy chen 2018-02-04 06:23:12 UTC
Tested and verified in the performance environment, and a good improvement was seen:


Lab Topology
System topology (using bare metal hosts):
3 DCs
5 Clusters
235 Hosts
  Hera: 4 hosts
  leopard: 2 hosts
  Nested hosts: 229 VMs
3 SDs
1020 VMs
  Hera: 690 VMs
  leopard: 330 VMs

Scenario matrix

Delta between the tests
Test Step	Old	New Build 06.12	Delta in HH:MM:SS	Delta in percentage (relative to Old)
VM Stop	0:00:08	0:00:06	-0:00:02	-25.00%
VM Start	0:00:36	0:01:00	0:00:24	66.67%
Create VM From template	0:01:09	0:01:21	0:00:12	17.39%
Nested Host stop	0:00:34	0:00:27	-0:00:07	-20.59%
Nested Host start	0:00:50	0:00:30	-0:00:20	-40.00%
Create Nested Host from template	0:01:17	0:01:15	-0:00:02	-2.60%
Sent maintenance to 10 hosts	0:00:08	0:00:07	-0:00:01	-12.50%
Reboot 10 nested hosts	0:05:15	0:05:25	0:00:10	3.17%
Reboot 50 nested hosts	0:05:57	0:05:46	-0:00:11	-3.08%
Reboot 80 nested hosts	0:10:35	0:10:34	-0:00:01	-0.16%
Reboot 100 nested hosts	0:15:09	0:11:40	-0:03:29	-22.99%
Engine restart	0:00:47	0:00:55	0:00:08	17.02%
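The two delta columns are the new time minus the old time, and that difference as a percentage of the old time, so a negative value means the new build was faster. A small, hypothetical Python helper (only for checking the arithmetic, not part of any test tooling) reproduces them from the H:MM:SS values:

```python
from typing import Tuple


def to_seconds(hms: str) -> int:
    """Parse an H:MM:SS duration such as '0:05:15' into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_delta(seconds: int) -> str:
    """Format a signed number of seconds as H:MM:SS, e.g. -209 -> '-0:03:29'."""
    sign = "-" if seconds < 0 else ""
    seconds = abs(seconds)
    return f"{sign}{seconds // 3600}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


def delta(old: str, new: str) -> Tuple[str, float]:
    """Return (delta as H:MM:SS, delta as a percentage of the old time)."""
    old_s, new_s = to_seconds(old), to_seconds(new)
    diff = new_s - old_s
    return format_delta(diff), round(100.0 * diff / old_s, 2)


# "Reboot 100 nested hosts" row from the table above.
print(delta("0:15:09", "0:11:40"))  # prints ('-0:03:29', -22.99)
```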

Comment 23 Sandro Bonazzola 2018-02-12 10:10:45 UTC
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be
resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.