Bug 1301587
Summary: | [scale] - hosts initialization taking too long (with 500 fake hosts and 10K vms) | |
---|---|---|---
Product: | [oVirt] ovirt-engine | Reporter: | Eldad Marciano <emarcian>
Component: | Backend.Core | Assignee: | Martin Perina <mperina>
Status: | CLOSED CURRENTRELEASE | QA Contact: | eberman
Severity: | medium | Docs Contact: |
Priority: | unspecified | |
Version: | 3.6.2 | CC: | bugs, mperina
Target Milestone: | ovirt-4.2.0 | Keywords: | Performance
Target Release: | 4.2.0 | Flags: | rule-engine: ovirt-4.2+, rule-engine: planning_ack+, rule-engine: devel_ack+, eberman: testing_ack+
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2018-01-12 12:57:15 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1364791 | |
Bug Blocks: | | |
Attachments: | | |

Description
Eldad Marciano
2016-01-25 12:56:51 UTC
Created attachment 1117980: engine thread dumps

Eldad,
- Can you provide engine logs?
- What is 'long', in the sense of how many are initialized per minute? (Is it linear, is it slowing down?) Is it still the same for 100 hosts, for example?
- Does the number change depending on the number of VMs? Is it the same without the VMs?
- Have you seen any difference between fake and real hosts?

Lastly, rhevm 3.6.2.0-1 is a bit old. While I don't think there were critical changes in this area, the latest is rhevm-3.6.2.6-0.1.

(In reply to Yaniv Kaul from comment #2)
> Eldad,
> - Can you provide engine logs?
Yes, I will.

> - What is 'long', in the sense of how many are initialized per minute? (Is
> it linear, is it slowing down?) Is it still the same for 100 hosts, for example?
Didn't test. I noticed the problem when I restarted the engine, and it seems to be serial.

> - Does the number change depending on the number of VMs? Is it the same
> without the VMs?
Didn't test it.

> - Have you seen any difference between fake and real hosts?
Yes, we have 37 real hosts vs. 500 fake hosts.

> Lastly, rhevm 3.6.2.0-1 is a bit old. While I don't think there were
> critical changes in this area, the latest is rhevm-3.6.2.6-0.1
We'll upgrade it ASAP.

(In reply to Eldad Marciano from comment #3)
> (In reply to Yaniv Kaul from comment #2)
> > Eldad,
> > - Can you provide engine logs?
> Yes, I will.
>
> > - What is 'long', in the sense of how many are initialized per minute? (Is
> > it linear, is it slowing down?) Is it still the same for 100 hosts, for example?
> Didn't test. I noticed the problem when I restarted the engine, and it seems
> to be serial.

I'd be interested in knowing exactly how long it takes.

Currently targeting to 3.6.4, but for such scale we might address it only in 4.0.

Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.

Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has already been released and the bug is not ON_QA.

oVirt 4.0 beta has been released, moving to RC milestone.

Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.

(In reply to Oved Ourfali from comment #4)
> (In reply to Eldad Marciano from comment #3)
> > (In reply to Yaniv Kaul from comment #2)
> > > Eldad,
> > > - Can you provide engine logs?
> > Yes, I will.
> >
> > > - What is 'long', in the sense of how many are initialized per minute? (Is
> > > it linear, is it slowing down?) Is it still the same for 100 hosts, for example?
> > Didn't test. I noticed the problem when I restarted the engine, and it seems
> > to be serial.
>
> I'd be interested in knowing exactly how long it takes.

Currently we don't have this kind of scale capacity; I'll update once we have it.

Is this bug still relevant in terms of topology? We would like to reproduce it with 500 hosts and 10K VMs.

(In reply to Eldad Marciano from comment #13)
> Is this bug still relevant in terms of topology?
> We would like to reproduce it with 500 hosts and 10K VMs.

We haven't made any improvements to engine startup time, but I think the fixes for BZ1438497 might also help here. Please test with 500 hosts and 10K VMs; if there is still an issue, we will try to optimize.

(In reply to Martin Perina from comment #14)
> (In reply to Eldad Marciano from comment #13)
> > Is this bug still relevant in terms of topology?
> > We would like to reproduce it with 500 hosts and 10K VMs.
>
> We haven't made any improvements to engine startup time, but I think the
> fixes for BZ1438497 might also help here. Please test with 500 hosts and
> 10K VMs; if there is still an issue, we will try to optimize.

Tested with:
Version: 4.2.0-0.0.master.20171121184703.git173fe83.el7.centos

3 DC
3 Clusters
406 Hosts
- Hera: 6 Hosts
- leopard: 2 Hosts
- UCS: 1
- Nested hosts: 400 VMS
3 SD
1700 VMS up (from 1800)
- Hera: 1005 VMS
- leopard: 515 VMS
- UCS: 180

From the UI perspective, all response times were very reasonable, and everything responded much better. Tested with Chrome and Firefox; didn't notice any latency issues in the UX/UI.

Scenario matrix (relevant scenarios performed):

Test Step | 1700 VMS, 400 Nested hosts
---|---
Sent maintenance to 10 hosts | 0:00:08
Reboot 10 nested hosts | 0:05:21
Reboot 50 nested hosts | 0:06:00
Reboot 80 nested hosts | 0:10:14
Reboot 100 nested hosts | 0:11:15
Engine restart | 0:01:08
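
The thread asks whether the time grows linearly with the number of hosts. For the reboot scenarios reported above, one rough way to check is to normalize each duration by the host count. The following is a minimal sketch and not part of the original report: the names `REBOOT_TIMINGS`, `to_seconds`, and `seconds_per_host` are illustrative, and the durations are copied from the results above. It covers only the reboot scenarios, not host initialization at engine startup.

```python
from datetime import timedelta

# Reboot durations (H:MM:SS) keyed by number of nested hosts,
# copied from the scenario results in this bug; nothing is measured here.
REBOOT_TIMINGS = {
    10: "0:05:21",
    50: "0:06:00",
    80: "0:10:14",
    100: "0:11:15",
}


def to_seconds(hms: str) -> int:
    """Convert an H:MM:SS string to a whole number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    delta = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return int(delta.total_seconds())


def seconds_per_host(timings: dict) -> dict:
    """Return the average wall-clock seconds spent per host for each run."""
    return {hosts: to_seconds(hms) / hosts for hosts, hms in timings.items()}


if __name__ == "__main__":
    for hosts, per_host in seconds_per_host(REBOOT_TIMINGS).items():
        print(f"{hosts:>3} hosts: {per_host:.2f} s/host")
```

A per-host figure that stays roughly constant as the host count rises would suggest serial handling, while a falling figure would suggest a fixed overhead or some parallelism. Four data points from a single environment are not enough to draw a firm conclusion either way.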