1357003 – Workers Reset due to extra memory usage during initial C&U capture when connected to RHEVM environments


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat CloudForms Management Engine
Classification:	Red Hat
Component:	Performance
Sub Component:
Version:	5.6.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	GA
Target Release:	5.7.0
Assignee:	Boriso
QA Contact:	Pradeep Kumar Surisetty
Docs Contact:
URL:
Whiteboard:	perf:c&u:rhev
Depends On:
Blocks:

Reported:	2016-07-15 13:06 UTC by Alex Krzos
Modified:	2019-08-06 20:07 UTC
CC List:	9 users
Fixed In Version:	5.7.0.1
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-01-04 12:57:40 UTC
Category:	---
Cloudforms Team:	---
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2017:0012	0	normal	SHIPPED_LIVE	CFME 5.7.0 bug fixes and enhancement update	2017-01-04 17:50:36 UTC

Description Alex Krzos 2016-07-15 13:06:13 UTC

Description of problem:
When connecting CloudForms 5.6.0.13 to a RHEVM environment that has data warehousing installed and configured (ovirt_engine_history), memory usage spikes during the initial C&U collections, resulting in reset workers and much higher than "normal" resource usage.

The cause is that the ovirt_engine_history database can contain more than 24 hours of realtime samples, and the collector workers collect every realtime sample they can obtain. This not only spikes the collector workers' memory but also produces a spike of perf_rollup messages, which keeps the processor workers on the CPU longer than they would be when connected to another infra provider type of similar size with C&U turned on (for example, a VMware environment).
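
To give a rough sense of scale, the sketch below counts how many samples a single resource contributes for different capture windows. The helper name `samples_in_window` is hypothetical, and the 60-second sampling interval is an assumption made only for illustration; this report does not state the actual interval.

```ruby
# Hypothetical helper: number of samples one resource contributes for a
# capture window, given a fixed sampling interval (both in seconds).
def samples_in_window(window_seconds, interval_seconds)
  window_seconds / interval_seconds
end

hour     = 3600
day      = 24 * hour
interval = 60 # ASSUMPTION: sampling interval in seconds, not taken from this report

# Compare a 1-hour window with a 24-hour window.
puts "1 hour:   #{samples_in_window(hour, interval)} samples per resource"
puts "24 hours: #{samples_in_window(day, interval)} samples per resource"
```

Whatever the real interval is, the number of samples per resource grows linearly with the window length, so a 24-hour window holds 24 times as many samples as a 1-hour window, multiplied again by the number of VMs and hosts being captured.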


Version-Release number of selected component (if applicable):
5.6.0.13

How reproducible:
Almost always reproducible, unless your ovirt_engine_history database is brand new and hasn't accumulated many realtime samples.

Steps to Reproduce:
1. Connect CFME to RHEVM environment
2. Turn on C&U
3. Observe the CPU and memory usage of the CFME appliance

Actual results:
The amount of memory growth and the resulting number of reset workers in CFME vary with the size of the environment and with how recently a "purge" job ran on the ovirt_engine_history database.

Expected results:
Collector workers collect C&U data without huge spikes in memory growth during the initial collection/processing.

Additional info:
Previous versions of this same issue are documented here: https://bugzilla.redhat.com/show_bug.cgi?id=1085988

Potentially, the capture of realtime metrics from beyond an hour can be viewed as a feature of CFME managing RHEVM, since we capture more historical samples than with VMware, which only stores 1 hour of realtime samples.

Comment 2 CFME Bot 2016-08-28 12:41:02 UTC

https://github.com/ManageIQ/manageiq/pull/10828

Comment 3 CFME Bot 2016-09-08 14:15:48 UTC

New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/e3e19a9660ed8db07f34a6ad7b669343e7e79cd3

commit e3e19a9660ed8db07f34a6ad7b669343e7e79cd3
Author:     borod108 <bodnopoz>
AuthorDate: Sun Aug 28 15:24:18 2016 +0300
Commit:     borod108 <bodnopoz>
CommitDate: Tue Aug 30 10:18:42 2016 +0300

    Change the range of captured metric data on initialization
    
    Captured metric data on RHV will now follow the setting of "initial_capture_days"
    and not capture 7 days back by default on initialization.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1357003
    https://bugzilla.redhat.com/show_bug.cgi?id=1348879

 .../providers/redhat/infra_manager/metrics_capture.rb    |  2 +-
 spec/factories/authentication.rb                         |  4 ++++
 spec/factories/ext_management_system.rb                  |  7 +++++++
 .../redhat/infra_manager/metrics_capture_spec.rb         | 16 ++++++++++++++++
 4 files changed, 28 insertions(+), 1 deletion(-)
 create mode 100644 spec/models/manageiq/providers/redhat/infra_manager/metrics_capture_spec.rb
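
The idea described in the commit message, bounding the first capture by a configurable number of days, can be sketched roughly as follows. This is not the code from the commit: the method names `initial_capture_start` and `capture_range` and their arguments are hypothetical, and only the setting name "initial_capture_days" comes from the commit message above.

```ruby
# Hypothetical sketch of limiting the initial capture window.
# Not the actual ManageIQ implementation.

# Earliest timestamp to request on the very first capture.
def initial_capture_start(now, initial_capture_days)
  now - initial_capture_days * 86_400 # Time minus seconds
end

# Returns [start_time, end_time] for the next capture.
# last_capture_at is nil when nothing has been captured yet.
def capture_range(now, last_capture_at, initial_capture_days)
  start_time = last_capture_at || initial_capture_start(now, initial_capture_days)
  [start_time, now]
end

now = Time.utc(2016, 8, 30, 12, 0, 0)

# First capture: go back only as far as the setting allows.
p capture_range(now, nil, 1)

# Later captures: resume from the last captured timestamp.
p capture_range(now, now - 1200, 1)
```

With a bound like this, the size of the first capture depends on the configured number of days rather than on how much history the provider happens to hold.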

Comment 4 CFME Bot 2016-09-30 03:07:39 UTC

https://github.com/ManageIQ/manageiq/pull/10721

Comment 8 errata-xmlrpc 2017-01-04 12:57:40 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0012.html
