Previously, the Capacity and Utilization metrics processor worker fetched all historical performance data to report metrics, causing the query to fail due to the extremely large amount of data to process. This has been fixed in the code by only loading recent performance state records. As a result, the process no longer times out and the Capacity and Utilization metrics are reported successfully.
Let me describe the actual bug:
before:
When rolling up metrics, the system fetches all historical performance data (vim_performance_states).
This query alone took over 24 minutes to run and timed out.
after:
Just fetch the performance data for the current hour/day.
date added to query: SELECT "vim_performance_states".* FROM "vim_performance_states" ...
no longer see errors with text "timed out after "
no longer see "Timed Out Active Message"
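To illustrate the before/after difference, here is a minimal sketch using an in-memory SQLite table. It is not the CFME code: the column names (resource_id, timestamp), the helper names, and the exact shape of the date condition are assumptions made for the example. Only the table name comes from the query above.

```python
import sqlite3
from datetime import datetime, timedelta


def make_db(rows):
    """Create an in-memory stand-in for vim_performance_states (assumed columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vim_performance_states ("
        "id INTEGER PRIMARY KEY, resource_id INTEGER, timestamp TEXT)"
    )
    conn.executemany(
        "INSERT INTO vim_performance_states (resource_id, timestamp) VALUES (?, ?)",
        rows,
    )
    return conn


def states_before_fix(conn, resource_ids):
    """Before: every historical row for the given ids is loaded."""
    marks = ",".join("?" * len(resource_ids))
    sql = f"SELECT * FROM vim_performance_states WHERE resource_id IN ({marks})"
    return conn.execute(sql, list(resource_ids)).fetchall()


def states_after_fix(conn, resource_ids, hour_start):
    """After: same id list, plus a date condition limiting rows to one hour."""
    hour_end = hour_start + timedelta(hours=1)
    marks = ",".join("?" * len(resource_ids))
    sql = (
        f"SELECT * FROM vim_performance_states WHERE resource_id IN ({marks}) "
        "AND timestamp >= ? AND timestamp < ?"
    )
    # ISO-8601 strings of the same format sort chronologically, so a plain
    # string comparison is enough for this sketch.
    params = [*resource_ids, hour_start.isoformat(), hour_end.isoformat()]
    return conn.execute(sql, params).fetchall()
```

With 30 days of hourly rows for one resource, the first function returns all 720 rows while the second returns only the row for the requested hour, which is the reduction that lets the rollup finish without timing out.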
reproduction 1:
access a system that has run cap&u for many days. (any provider type)
see before and after to note change in error rates
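One way to compare error rates before and after is to count the two messages quoted above in a log file. This is a rough sketch: the function name is made up for the example, and the log path is left to the caller because it depends on the appliance.

```python
# The two message fragments quoted in this bug report.
MARKERS = ("timed out after ", "Timed Out Active Message")


def count_timeout_errors(lines):
    """Return how many lines contain each timeout marker."""
    counts = {marker: 0 for marker in MARKERS}
    for line in lines:
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts
```

Run it against a log captured before the update and one captured after, for example `count_timeout_errors(open(path))`, and compare the two results.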
reproduction 2:
change log levels to debug (so you will see the sql running on the server)
note the vim_performance_states query. It will still have the long id list, but it will also have a date query in it
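To check debug logs for this, something like the following sketch could flag whether each vim_performance_states query carries a date condition. It assumes the date condition references a column named "timestamp" and that each query is logged on a single line; neither is confirmed here, so adjust the patterns to what the log actually shows.

```python
import re

# Matches any query that reads from the table named in this report.
STATE_QUERY = re.compile(r'FROM\s+"vim_performance_states"', re.IGNORECASE)
# Assumption: the date condition references a "timestamp" column.
DATE_FILTER = re.compile(r'"vim_performance_states"\."timestamp"', re.IGNORECASE)


def classify_state_queries(lines):
    """For each vim_performance_states query line, report True if it has a date filter."""
    results = []
    for line in lines:
        if STATE_QUERY.search(line):
            results.append(bool(DATE_FILTER.search(line)))
    return results
```

A result of all True values would match the expected post-fix behavior; any False value points at a query that still loads the full history.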
On my appliance, I changed the log levels to debug, but I wasn't able to see this query at all in the logs.
the SELECT "vim_performance_states".* FROM "vim_performance_states" ...
Reproducer:
1) Manage a provider and enable C&U collection for the provider.
2) Capture C&U data for a few hours/days.
3) Disable C&U collection for at least 1 day.
4) Re-enable C&U collection.
Before fix:
When C&U collection is re-enabled, CFME fetches all historical performance data.
After fix:
When C&U collection is re-enabled, CFME fetches performance data for the current hour only.
Verified that CFME fetches performance data for the current hour only by
looking at the DB itself. Marking this as VERIFIED.
Verified in 5.5.4.0.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2016:1101