Bug 1325405

Summary: C&U Metrics Processor memory and timeout issues associated with 'perf_rollup' method and VMware host and VM instances
Product: Red Hat CloudForms Management Engine
Reporter: Chris Pelland <cpelland>
Component: Performance
Assignee: Keenan Brock <kbrock>
Status: CLOSED ERRATA
QA Contact: Nandini Chandra <nachandr>
Severity: urgent
Priority: high
Version: 5.5.0
CC: carnott, cpelland, dmetzger, fdewaley, jhardy, jprause, kbrock, mfeifer, nachandr, obarenbo, thenness
Target Milestone: GA
Keywords: ZStream
Target Release: 5.5.4
Hardware: All
OS: All
Whiteboard: c&u
Fixed In Version: 5.5.4.0
Doc Type: Bug Fix
Doc Text:
Previously, the Capacity and Utilization metrics processor worker fetched all historical performance data to report metrics, causing the query to fail due to the extremely large amount of data to process. This has been fixed in the code by only loading recent performance state records. As a result, the process no longer times out and the Capacity and Utilization metrics are reported successfully.
Clone Of: 1322485
Last Closed: 2016-05-31 13:42:24 UTC
Type: Bug
Bug Depends On: 1322485

Comment 8 Keenan Brock 2016-05-06 19:58:11 UTC
Let me describe the actual bug:

before:
    When rolling up metrics, the system fetches all historical performance data (vim_performance_states).
    This query alone took over 24 minutes to run and timed out.

after:
    Only the performance data for the current hour/day is fetched.
    A date condition is added to the query: SELECT "vim_performance_states".* FROM "vim_performance_states" ...
    Errors with the text "timed out after " no longer appear.
    "Timed Out Active Message" errors no longer appear.
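
The before/after difference can be illustrated with a minimal Python sketch. It is not the actual fix (that lives in the CFME code base and is not shown in this report); it simply models vim_performance_states as an in-memory list of rows. The field names `resource_id` and `timestamp`, and the helpers `rollup_window`, `fetch_states_before` and `fetch_states_after`, are hypothetical and chosen for illustration.

```python
from datetime import datetime, timedelta


def rollup_window(ts, interval):
    """Return the [start, end) range of the hour or day that contains ts."""
    if interval == "hourly":
        start = ts.replace(minute=0, second=0, microsecond=0)
        return start, start + timedelta(hours=1)
    if interval == "daily":
        start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        return start, start + timedelta(days=1)
    raise ValueError(f"unknown interval: {interval!r}")


def fetch_states_before(states, resource_ids):
    # Old behaviour: every historical row for the listed resources (id list only).
    return [s for s in states if s["resource_id"] in resource_ids]


def fetch_states_after(states, resource_ids, ts, interval):
    # New behaviour: the same id list, plus a date range for the current hour/day.
    start, end = rollup_window(ts, interval)
    return [
        s for s in states
        if s["resource_id"] in resource_ids and start <= s["timestamp"] < end
    ]
```

With 30 days of hourly rows for two resources, the "before" variant returns all 1,440 rows, while an hourly rollup in the "after" variant returns one row per resource. The result size then stays constant no matter how long C&U has been running.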

reproduction 1:
    Access a system that has run C&U for many days (any provider type).
    Compare the logs before and after the fix to note the change in error rates.
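
The error-rate comparison in reproduction 1 could be scripted roughly as follows. This is a sketch only: it matches the two error strings quoted above as plain substrings. The function names are hypothetical, and the log file location is not given in this report, so the path is left to the caller.

```python
# Error markers quoted in this comment, matched as plain substrings.
TIMEOUT_MARKERS = ("timed out after ", "Timed Out Active Message")


def count_timeouts(lines):
    """Count the log lines that contain each timeout marker."""
    counts = {marker: 0 for marker in TIMEOUT_MARKERS}
    for line in lines:
        for marker in TIMEOUT_MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts


def count_timeouts_in_file(path):
    # The path is supplied by the caller; this report does not name the log file.
    with open(path, encoding="utf-8", errors="replace") as f:
        return count_timeouts(f)
```

Running it over logs captured before and after upgrading to 5.5.4.0 should show both counts dropping if the fix is effective.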

reproduction 2:
    Change the log level to debug (so you can see the SQL running on the server).
    Note the vim_performance_states query. It still contains the long ID list, but it now also includes a date condition.
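
For reproduction 2, a small script can sort the logged vim_performance_states queries into those with and without a date condition. The query prefix comes from this report. The date predicate regex assumes the column is named "timestamp", which is an assumption not stated here. As Comment 9 found, the query may not show up in the logs at all, so a (0, 0) result is inconclusive.

```python
import re

# Query prefix as quoted in this report.
QUERY_RE = re.compile(r'SELECT "vim_performance_states"\.\* FROM "vim_performance_states"')
# Hypothetical date predicate: assumes a column named "timestamp".
DATE_RE = re.compile(r'"vim_performance_states"\."timestamp"\s*(?:BETWEEN|[<>=])')


def classify_queries(lines):
    """Return (queries with a date condition, queries without one)."""
    with_date = without_date = 0
    for line in lines:
        if QUERY_RE.search(line):
            if DATE_RE.search(line):
                with_date += 1
            else:
                without_date += 1
    return with_date, without_date
```

Before the fix, every matching query would land in the second count. After the fix, they should land in the first.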

Comment 9 Nandini Chandra 2016-05-12 19:56:42 UTC
On my appliance, I changed the log levels to debug, but I wasn't able to see this query in the logs at all:

SELECT "vim_performance_states".* FROM "vim_performance_states" ...

Reproducer:
1) Manage a provider and enable C&U collection for the provider.
2) Capture C&U data for a few hours/days.
3) Disable C&U collection for at least 1 day.
4) Re-enable C&U collection.

Before fix:
When C&U collection is re-enabled, CFME fetches all historical performance data.

After fix:
When C&U collection is re-enabled, CFME fetches performance data for the current hour only.

Verified that CFME fetches performance data for the current hour only by
looking at the DB itself. Marking this as VERIFIED.

Verified in 5.5.4.0.

Comment 11 errata-xmlrpc 2016-05-31 13:42:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1101