Red Hat Bugzilla – Bug 621597
Perf: precompute select min() from rhq_measurement_data_*
Last modified: 2014-06-18 15:33:25 EDT
Every hour the hourly job is running a select min() from rhq_measurement_data_* (I think 1h, 6h 1d only) which requires at least a full index scan on Oracle, which can be very expensive and run for minutes if much metrics have been collected. See attached screenshot for an OEM output.
We need to also check for select max() calls on the same tables.
Related is also the compression:
INSERT INTO RHQ_MEASUREMENT_DATA_NUM_1H (SELECT :1 , ft.schedule_id, AVG(value), MIN(value), MAX(value) FROM RHQ_MEAS_DATA_NUM_R09 ft WHERE ft.time_stamp >= :2 AND ft.time_stamp < :3 GROUP BY ft.schedule_id)
Those min/avg/max values can be computed on the fly when measurement reports come in and stored locally (in a distributed cache) and just be written out to the 1h table each hour without the need to find all data again.
Some cleverness needs to be done to cater for the time from server start to next full hour and/or start of hour until server-shutdown. Basically on shutdown the server would need to write the data for the hour until now to some tmp space. On shutdown it would check if the data in the tmp space is from the current hour and if so, read it and continue with processing.
Same would apply for 6h and 1d tables.
I agree that caching the value of min() should be something we strive for.
This obviously requires a preprocessing routine over every measurement report across the wire, but it's better to amortize the load on the servers rather than require the database to perform an expensive operation later.
Each server upon starting up would load the min value for every raw measurement table, would calculate which table the measurement report would be written to, process the entire report against that current minimum, and finally if the min value changes update the row in the db that represents that cached min value for the table.
Since the frequency of changes to the min value for any given table should be relatively low, this implementation (storing the precomputed/cached min value for each table in the DB) should provide the performance improvement being sought after without the need for a distributed caching layer.
With any caching solution, there is a possibility of staleness. In this case, the min values cached by any server might be out-of-date. This can be compensated for by only updating the min value in the DB if the value truly is the minimum (UPDATE metricTableCache SET min=nextMinValue WHERE table=someTable AND min=prevMinValue). In this way, the min value only gets updated if the servers previous min value was up-to-date. If the query returns 0 updated rows, you know that the value was already out-of-date, and so the server should query for the latest. Otherwise the value was current, the min value got updated, and the server's cache of that min value is thus accurate.
A similar solution could be used to keep a cache of the max value for each table, but the value is mitigated due to the higher write-access profile, so it might not be valuable to attempt caching this value for the active raw table. For every other table, however, it would be valuable to cache the max.
I would strongly suggest against trying to keep a cache of running values to make the compression job less impactful. In order to do this, you would need to keep track of how many data points you have, because that is a required input to any algorithm that computes a running average. Also, running averages introduce far more computational error than if you compute the average using all the data points at once. Furthermore, you have to segment these by scheduleId, and then keep separate caches for different blocks of time. Even if all of that is done perfectly, the workflow is now more complicated because the measurement report processed by any server needs to be replicated (in a guaranteed way) to other servers, and those servers must ack and incorporate that data into their running averages. The workflow in this case thus also becomes more complicated - if measurement reports are coming in from all different servers and replicating their running averages to other servers, then the replication needs to occur in serial fashion as well (otherwise rollbacks are impossible upon failure). But serialization of requests would slow down measurement report processing significantly. And don't forget, we still need to work in the case where only one server is stood up, one that might even be restarted frequently (dev mode) - and all of this caching thus introduces a significant possibility of losing data.
This is out of date, optimizations now must be targeted at storage node storage.