Bug 1430876
Summary: | [RFE] Increase supported per-manager host limit | ||
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Ashton Davis <asdavis> |
Component: | ovirt-engine | Assignee: | Daniel Gur <dagur> |
Status: | CLOSED ERRATA | QA Contact: | Daniel Gur <dagur> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | unspecified | CC: | ahoness, dougsland, Egarciad, fgarciad, hyupark, jclaretm, lbopf, lsurette, mgoldboi, mkalinin, molasaga, mperina, rbalakri, rgolan, Rhev-m-bugs, rzaleski, sradco, srevivo, subhat, usurse, ykaul, ylavi |
Target Milestone: | ovirt-4.2.2 | Keywords: | FutureFeature, Performance |
Target Release: | --- | Flags: | lsvaty: testing_plan_complete- |
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | No Doc Update | |
Doc Text: |
undefined
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2018-05-15 17:41:09 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1520566 |
Description
Ashton Davis
2017-03-09 18:44:17 UTC
Roy, did we understand what the bottlenecks QE saw were, or were they specific to their environment?

After continuous profiling sessions it is clear that most of the engine's effort goes into host and VM statistics collection. Even though my env is mostly fake VMs and hosts, it simulates the load on the engine without a problem. It looks like we can increase the number of hosts with no real problem, and with the help of PostgreSQL 9.5, which is coming in 4.2, and of course a decent drive for it, it is doable.

What I still want to do is decrease the polling frequency of the statistics, raising the interval to 30 seconds from the current 15s. This is essentially a config option change. With just a tiny effort we can leave the VM monitoring still polling the VM list every 15s, just to cover gaps and to keep the system behaving as is (ahadas's advice).

Overall, the CPU consumption of the engine on this big setup didn't surpass 40% and was mostly at 15%. Overall memory consumption fluctuated between 200 and 1200 MB, but with frequent GC cycles (every ~30s) - most of the garbage is young objects created by monitoring code.

Supporting a large number of hosts should also take VM density into consideration. High density is usually a VDI deployment with thin VMs, and this means more effort on the VDSM side to monitor the disk watermark - this should be improved by libvirt events in 4.2 as well. There is nothing preventing the deployment of lots of hosts with high density, but we usually don't see this (cmiiw here).

I'm fine with decreasing the polling frequency. I wonder if we should do it by default or only for large environments. Please send an email to the devel mailing list asking about the pros/cons.

Can we please estimate the improvement from this change, so we can decide on the benefit on that basis?

This bug should be moved to MODIFIED or ON_QA as soon as:
1. PG 9.5 is in (https://gerrit.ovirt.org/#/q/status:open+project:ovirt-engine+branch:master+topic:postgres9.5).
2. Ravi's native threads work is in (https://gerrit.ovirt.org/#/q/status:merged+project:ovirt-engine+branch:master+topic:threading) - it already is.
3. Ravi's 2nd part of the threads series is in (https://gerrit.ovirt.org/#/q/status:open+project:ovirt-engine+branch:master+topic:threading).

Reverting changes done by automatic bots.

We've completed all the work that we intended to perform for RHV 4.2 in this RFE. We've already seen QE running with 400 hosts, and we believe we can get to higher numbers with the additional improvements we made in 4.2.2. Moving to ON_QA for QE to verify.

Removing Need Info, as this bug is already closed and the info was provided.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2018:1488

*** Bug 1698310 has been marked as a duplicate of this bug. ***

BZ<2>Jira Resync
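As a rough aid for the request above to estimate the benefit of the statistics polling change, the sketch below computes only the raw polling call rate before and after. It assumes each host is polled exactly once per interval, which is a simplification and not something confirmed in this bug. The function names are hypothetical, and the 400-host figure is the QE run mentioned in the comments. It says nothing about the actual CPU or GC improvement, which would still have to be measured.

```python
# Back-of-the-envelope estimate of statistics polling load.
# Assumption (not from the bug): every host is polled once per interval.

def polls_per_second(hosts: int, interval_s: float) -> float:
    """Average number of statistics polls issued per second."""
    return hosts / interval_s


def relative_reduction(old_interval_s: float, new_interval_s: float) -> float:
    """Fraction by which the poll rate drops when the interval grows."""
    return 1 - old_interval_s / new_interval_s


HOSTS = 400          # host count QE is reported to have run with
OLD_INTERVAL_S = 15  # current statistics polling interval
NEW_INTERVAL_S = 30  # proposed statistics polling interval

before = polls_per_second(HOSTS, OLD_INTERVAL_S)
after = polls_per_second(HOSTS, NEW_INTERVAL_S)
reduction = relative_reduction(OLD_INTERVAL_S, NEW_INTERVAL_S)

print(f"before: {before:.1f} polls/s")
print(f"after:  {after:.1f} polls/s")
print(f"reduction: {reduction:.0%}")
```

Under this assumption the call rate halves regardless of host count, so the open question from the thread is really how much of the engine's CPU and young-object garbage scales with that call rate.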