Red Hat Bugzilla – Bug 535240
make CheckForTimedOutOperationsJob run in a short and predictable amount of time
Last modified: 2014-05-05 14:30:40 EDT
When working on RHQ-1950, and talking with Joseph, it became clear that execution time for our CheckForTimedOutOperationsJob will most likely grow as the the size of the RHQ_OPERATION_HISTORY table grows. This is because we execute ResourceOperationHistory.QUERY_FIND_ALL_IN_STATUS to find all IN_PROGRESS queries. The status field is stored as an unindexed, string field. So, we're looking at a table scan for each execution of the query.
This jop currently executes at 1 minute intervals. This is down from 10 minutes due to the work in RHQ-1950, but the interval is probably unimportant. What is important is the amount of time the job can run if the op history table grows large. How large before this may be a problem is not currently known and should be determined as the first step of this Jira. In that way we can determine whether we need to do some query/db work to resolve this issue.
Putting in index on the current field probably will not solve the problem given it's character nature and low cardinality of values. An alternative suggested by Joe would be to change the field to a numeric, with enum values corresponding to the progressive states of an op. E.g. (inprocess, cancelled, completed, failed). Giving this field an ordered index could give us the ability to do range scans efficiently.
I suggest we at least understand the perf of the current query in the 1.2 timeframe so as to know whether a change is required in the short term.
workaround is to periodically delete your operation history items. User can go into the Operation history tab and start purging old histories.
Lets see if we can include this in the perf work for 1.3
Postgres is very much able to use an index when looking for entries that are relatively rare like "INPROGRESS", so I think this can help.
A quick test on a table with >2000 (2078x success, 39x failure, 3x inprogress) entries shows that for failure and inprogress, the index is used, while for success a table scan is going over the whole table.
Moving features/improvements to 1.4
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1958
This bug relates to RHQ-2302
This bug relates to RHQ-2343
mass add of key word FutureFeature to help track
This hasn't been a problem. Closing and we can open a new BZ if and when this is an issue in the future.