Bug 1114202
| Summary: | Data aggregation should be fault tolerant | |||
|---|---|---|---|---|
| Product: | [Other] RHQ Project | Reporter: | John Sanda <jsanda> | |
| Component: | Core Server, Storage Node | Assignee: | Nobody <nobody> | |
| Status: | ON_QA --- | QA Contact: | ||
| Severity: | unspecified | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 4.11 | CC: | hrupp | |
| Target Milestone: | --- | |||
| Target Release: | RHQ 4.13 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Bug Fix | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1114203 (view as bug list) | Environment: | ||
| Last Closed: | Type: | Bug | ||
| Bug Depends On: | 1114199 | |||
| Bug Blocks: | 1133605, 1114203 | |||
Description
John Sanda
2014-06-28 15:21:52 UTC
I am re-targeting this for RHQ 4.13 due to issues found in 4.12. I am moving this back to ASSIGNED because there are a couple of problems with the solution implemented in 4.12. It handles failures in raw data aggregation, but only partially handles failures in 1 hour or 6 hour data aggregation.

When there is a failure, the corresponding schedule ids are not deleted from the metrics index table. On a subsequent run of the data aggregation job, we query the raw data index for the prior time slice and attempt to aggregate the data again. If the 6 hour time slice has passed, we also recompute the 6 hour data, and if the 24 hour time slice has passed, we also recompute the 24 hour data.

Now suppose it is 12:00 and the data aggregation job runs. We compute 1 hour data for the 11:00 - 12:00 hour. We also compute 6 hour metrics for the 06:00 - 12:00 time slice. If there is an error aggregating the 1 hour data, we will not attempt to recompute the 6 hour metrics on a later run (assuming there are no errors aggregating raw data during the 06:00 - 12:00 time slice), since we only look at the raw data index for previous time slices. We need to look at the 1 hour and 6 hour data indexes as well when looking at past time slices for any metrics that need to be recomputed.

Changes have been pushed to master.
commit hash: 574393c12f2a
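
The 12:00 scenario described above can be illustrated with a small sketch. This is not RHQ code: the bucket names, the `SLICE_LENGTH` table, the in-memory `index` dictionary, and the `pending_schedules` function are all hypothetical, and the sketch assumes an index entry is removed only after its aggregation succeeds. It shows why a scan limited to the raw data index misses a failed 1 hour aggregation, while a scan over all three indexes finds it.

```python
from datetime import datetime, timedelta

# Hypothetical buckets and the length of the time slice each index entry covers
# (assumed: raw data rolls up hourly, 1 hour data every 6 hours, 6 hour data daily).
SLICE_LENGTH = {
    "raw": timedelta(hours=1),
    "one_hour": timedelta(hours=6),
    "six_hour": timedelta(hours=24),
}


def pending_schedules(index, now, buckets):
    """Return index entries for finished time slices in the given buckets.

    `index` maps (bucket, slice_start) -> set of schedule ids. An entry that is
    still present for a finished slice means its aggregation did not complete.
    """
    pending = {}
    for (bucket, slice_start), schedule_ids in index.items():
        finished = slice_start + SLICE_LENGTH[bucket] <= now
        if bucket in buckets and finished and schedule_ids:
            pending[(bucket, slice_start)] = set(schedule_ids)
    return pending


now = datetime(2014, 6, 28, 12, 0)

# Raw aggregation for 11:00 - 12:00 succeeded, so no raw entry remains.
# Aggregating 1 hour data for the 06:00 - 12:00 slice failed for schedule 101,
# so its entry is still in the 1 hour index.
index = {("one_hour", datetime(2014, 6, 28, 6, 0)): {101}}

raw_only = pending_schedules(index, now, {"raw"})
all_buckets = pending_schedules(index, now, {"raw", "one_hour", "six_hour"})
```

With only the raw index consulted, `raw_only` is empty and schedule 101 is never retried. Consulting all three indexes returns the leftover 1 hour entry, so the 6 hour metrics for 06:00 - 12:00 can be recomputed.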