Bug 1267697
Summary: | Much higher memory usage in 5.5 | |
---|---|---|---
Product: | Red Hat CloudForms Management Engine | Reporter: | Alex Krzos <akrzos>
Component: | Performance | Assignee: | Keenan Brock <kbrock>
Status: | CLOSED ERRATA | QA Contact: | Alex Krzos <akrzos>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 5.5.0 | CC: | apatters, cpelland, dajohnso, dmetzger, jhardy, jocarter, kbrock, mfeifer, nachandr, obarenbo, perfbz, simaishi
Target Milestone: | Beta 2 | |
Target Release: | 5.5.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | 5.5.0.8 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2015-12-08 13:33:53 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Alex Krzos, 2015-09-30 17:24:59 UTC
This is impacting QE test automation runs. Dennis, can we get someone to look into this soon, please? TIA

* The memory measurement in this comment and comment #0 is the difference in RSS consumed from immediately after the rails console starts until the benchmark completes; in the benchmark's code this is mrss_change. It is less than the total memory used by the benchmark, since the total would also include the overhead of spawning the rails console.

Additional tests show that larger-scale RHEVM providers and VMware providers are also affected.

Initial Refresh, 5.5:
* RHEVM large provider - between 1095MiB and 1216MiB
* VMware small provider - between 93MiB and 94MiB
* VMware medium provider - between 376MiB and 395MiB
* VMware large provider - ~1248MiB (only one sample at this time)

Initial Refresh, 5.4:
* RHEVM large provider - between 606MiB and 610MiB
* VMware small provider - between 58MiB and 63MiB
* VMware medium provider - between 227MiB and 233MiB
* VMware large provider - between 551MiB and 607MiB

* In addition to the EmsRefresh memory bloat, VMware providers have a VimBroker worker which might also bloat in memory usage.

Capacity and Utilization benchmarks show that RSS memory utilization is also higher for VMware VM/Host perf_captures and for RHEVM/VMware perf_capture_timer. That means this memory growth will affect more than the refresh worker/feature. (Adjusting the BZ title to match.)

VM.perf_capture, 5.5, 99%ile of 4 samples:
* RHEVM small provider - 9MiB
* RHEVM medium provider - 11MiB
* RHEVM large provider - 19MiB
* VMware small provider - 72MiB
* VMware medium provider - 78MiB
* VMware large provider - 72MiB

VM.perf_capture, 5.4, 99%ile of 4 samples:
* RHEVM small provider - 10MiB
* RHEVM medium provider - 14MiB
* RHEVM large provider - 19MiB
* VMware small provider - 35MiB
* VMware medium provider - 36MiB
* VMware large provider - 44MiB

Host.perf_capture, 5.5, 99%ile of 4 samples:
* RHEVM small provider - 9.2MiB
* RHEVM medium provider - 9.8MiB
* RHEVM large provider - 9.8MiB
* VMware small provider - 81.2MiB
* VMware medium provider - 87.3MiB
* VMware large provider - 83.1MiB

Host.perf_capture, 5.4, 99%ile of 4 samples:
* RHEVM small provider - 10.0MiB
* RHEVM medium provider - 10.4MiB
* RHEVM large provider - 10.5MiB
* VMware small provider - 40.4MiB
* VMware medium provider - 35.0MiB
* VMware large provider - 39.7MiB

* While some of the RHEVM perf_capture tests show memory growth, timing/memory values are much harder to measure for RHEVM perf_captures due to: https://bugzilla.redhat.com/show_bug.cgi?id=1085988. During these measurements the simulators had dwhd stopped, so there should not have been any data to collect from RHEVM; however, in some cases there is still apparent memory growth.

perf_capture_timer, 5.5, 99%ile of 4 samples:
* RHEVM small provider - 21.5MiB
* RHEVM medium provider - 43.5MiB
* RHEVM large provider - 95.1MiB
* VMware small provider - 34.8MiB
* VMware medium provider - 110.3MiB
* VMware large provider - 143.3MiB

perf_capture_timer, 5.4, 99%ile of 4 samples:
* RHEVM small provider - 21.8MiB
* RHEVM medium provider - 120.3MiB
* RHEVM large provider - 237.2MiB
* VMware small provider - 23.7MiB
* VMware medium provider - 41.3MiB
* VMware large provider - 84.2MiB
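For context on how the numbers above were gathered: mrss_change is an RSS delta taken inside the rails console. Below is a minimal sketch of that approach, assuming a Linux appliance with /proc available; the helper name and workload call are illustrative, not the actual benchmark harness.

```ruby
# Minimal sketch of an mrss_change-style measurement (illustrative, not the
# actual benchmark code): record RSS right after the rails console is up,
# run the workload, and report the difference.
def rss_mib
  # VmRSS in /proc/self/status is reported in kB
  File.read("/proc/self/status")[/^VmRSS:\s+(\d+)/, 1].to_i / 1024.0
end

before = rss_mib
# ... run the workload under test here, e.g. an initial provider refresh ...
after = rss_mib
puts format("mrss_change: %.1f MiB", after - before)
```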
Correction to Comment 4: RHEVM provider memory utilization was reversed for 5.4 vs 5.5. Below is the correct data for the RHEVM perf_capture_timer benchmarks. RHEVM providers have higher memory utilization during this benchmark in 5.5 alpha.

perf_capture_timer, 5.5, 99%ile of 4 samples:
* RHEVM small provider - 21.8MiB
* RHEVM medium provider - 120.3MiB
* RHEVM large provider - 237.2MiB

perf_capture_timer, 5.4, 99%ile of 4 samples:
* RHEVM small provider - 21.5MiB
* RHEVM medium provider - 43.5MiB
* RHEVM large provider - 95.1MiB

*** Bug 1267695 has been marked as a duplicate of this bug. ***

In order to characterize this issue against CFME 5.4, I deployed a 5.4.3.0 appliance and a 5.5.0.3 appliance, managed the same provider on each, and captured worker rss/virt memory usage over 20 minutes (a sketch of this style of sampling appears at the end of this report). The environment managed was a medium-sized VMware environment consisting of 1000 VMs, 50 hosts, and 61 datastores. The only applied workload was to add the provider and allow CFME to inventory it. evmserverd was then restarted and memory utilization was tracked for 20 minutes.

The additional memory used by 5.5 appliances, by worker:
* 115MiB more for Refresh Worker
* 52MiB more for MiqVimBrokerWorker
* 102MiB more for MiqEmsRefreshCoreWorker
* ~80MiB more for MiqGenericWorker (2x)
* ~50MiB more for MiqPriorityWorker (2x)
* 43MiB more for MiqScheduleWorker
* 48MiB more for MiqUiWorker
* 46MiB more for MiqWebServiceWorker
* ~48MiB more for MiqReportingWorker (2x)
* 44MiB more for MiqEventHandler
* 73MiB more for Event Catcher Worker
* + 169MiB for MiqAutomateWorker (2x)

This totals an additional 1217MiB to manage the same sized provider in 5.5.

Performing the same sequence, only this time turning on C&U collections for the entire region, results in even greater memory usage over 5.4. Significantly changed workers:
* 95MiB more for MiqVimBrokerWorker
* ~46MiB more for MiqGenericWorker (2x)
* ~93MiB more for MiqPriorityWorker (2x)
* 79MiB more for MiqUiWorker
* ~148MiB more for Collector Worker (2x)
* ~107MiB more for MiqEmsMetricsProcessorWorker (2x)
* 112MiB more for Refresh Worker
* 93MiB more for MiqEmsRefreshCoreWorker
* 50.9MiB more for MiqScheduleWorker
* 48MiB more for MiqWebServiceWorker
* ~47MiB more for MiqReportingWorker (2x)
* 43MiB more for MiqEventHandler
* 72MiB more for Event Catcher Worker
* + 169MiB for MiqAutomateWorker (2x)

This totals an additional 1813MiB to manage and collect metrics on the same sized provider in 5.5.

There have been numerous commits over the past couple of weeks, all relating to reducing appliance memory. Current memory utilization supports small/medium environments with the desired 6GB memory configuration. This ticket is therefore being closed; however, development will continue to monitor and evaluate the application memory footprint closely.

This has been addressed and merged. Please open BZs with specific memory errors for the rest of this release.

Fixed in 5.5.0.12. As stated by Keenan, individual BZs addressing memory usage will be opened as seen fit under further analysis.

Many fixes were applied to reduce or accommodate the memory footprint of 5.5. These include (see the GC tuning sketch at the end of this report):
* Removal of Automate Workers
* GC tuning to reduce and cap the rate of memory growth
* Reduced Vim Broker Worker memory usage
* Default appliance memory was raised to 8GiB

With the above fixes, I can now manage small (100 total VMs, 50 online) and medium (1000 total VMs, 500 online) VMware environments with a default appliance configuration with Capacity and Utilization turned on.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2551
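As referenced in the 20-minute comparison above, per-worker RSS can be sampled externally over time. This is a minimal sketch of that style of sampling, assuming the appliance's worker processes carry an "MIQ:" process title; it is an assumed approach, not the exact tooling used for the numbers above.

```ruby
# Minimal sketch of sampling per-worker RSS over ~20 minutes (assumed
# approach, not the exact tooling used for the comparison above).
SAMPLE_INTERVAL = 60 # seconds; 20 samples ~= 20 minutes

def worker_pids
  # MIQ workers set a process title such as "MIQ: MiqGenericWorker id: ..."
  `pgrep -f '^MIQ:'`.split.map(&:to_i)
end

def rss_mib(pid)
  File.read("/proc/#{pid}/status")[/^VmRSS:\s+(\d+)/, 1].to_i / 1024.0
rescue Errno::ENOENT
  nil # worker exited between listing and reading
end

20.times do
  worker_pids.each do |pid|
    mib = rss_mib(pid)
    puts "#{Time.now.utc} pid=#{pid} rss=#{mib.round(1)}MiB" if mib
  end
  sleep SAMPLE_INTERVAL
end
```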
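On the GC tuning item in the fix list: MRI's garbage collector growth can be capped through environment variables read at interpreter startup. The variable names below are standard MRI (2.1+) tunables, but the values are examples only; the exact settings shipped in 5.5.0.8+ are not reproduced here.

```ruby
# Illustrative GC tuning of the kind described in the fix list above.
# These environment variables are read at Ruby startup, e.g. from the
# appliance's service environment; the values are examples, not the
# settings CFME actually shipped.
#
#   RUBY_GC_HEAP_GROWTH_FACTOR=1.1        # grow the object heap more slowly
#   RUBY_GC_HEAP_GROWTH_MAX_SLOTS=300000  # cap slots added per heap expansion
#   RUBY_GC_MALLOC_LIMIT_MAX=32000000     # cap growth of the malloc GC trigger
#
# The effect can be observed from a rails console:
p GC.stat.values_at(:heap_live_slots, :heap_free_slots) # MRI 2.2-era keys
puts "GC runs so far: #{GC.count}"
```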